Mapping with purrr

Jose M Sallan 2022-11-30 8 min read

In this post, I will present the functionalities of the purrr package for mapping (iterating) along vectors, lists or data frames using functional programming. purrr is part of the tidyverse and is loaded together with the tidyverse (meta-)package.

I will also be using kableExtra to present data frames nicely.

library(tidyverse)
library(kableExtra)

Mapping one list

In mathematics, a mapping is an operation that associates each element of a given set (the domain) with an element of a second set (the codomain). This is precisely what the map function of purrr does: it associates each element of its input with the result of applying a function to it.

map takes two main arguments. The first one is a list, vector or data frame. The second is the function to be applied to each element of the first argument.
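As a minimal sketch of these two arguments, the call below squares each element of a small made-up vector. The helper square is hypothetical and only serves the illustration.

```r
library(purrr)

# Hypothetical helper used only for this illustration
square <- function(x) x^2

# First argument: the vector to iterate over; second argument: the function
squares <- map(1:3, square)

# map() returns a list with one element per input element
str(squares)
```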

Let’s build a list of a vector and two data frames:

l <- list(a = LETTERS, b = iris, c = mtcars)

The outcome of map is always a list:

map(l, length)
## $a
## [1] 26
## 
## $b
## [1] 5
## 
## $c
## [1] 11

We can obtain the same output using the base R function lapply:

lapply(l, length)
## $a
## [1] 26
## 
## $b
## [1] 5
## 
## $c
## [1] 11

The length function always returns an integer, so it makes sense to obtain a vector of integers instead of a list. We can achieve that with map_int.

map_int(l, length)
##  a  b  c 
## 26  5 11

Again, we can obtain a similar output using sapply:

sapply(l, length)
##  a  b  c 
## 26  5 11

All mapping functions of purrr include variants that allow specifying the class of the output. Using map_int, map_dbl, map_chr and map_lgl we obtain, if possible, outputs of class integer, double, character and logical respectively.

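Let’s obtain the output of length for list l as a character vector. This relies on map_chr converting integers to strings, as it does in purrr 0.3.5, the version used in this post. purrr 1.0.0 and later deprecate this automatic conversion, so with those versions it is safer to convert explicitly, for instance with ~ as.character(length(.)).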

map_chr(l, length)
##    a    b    c 
## "26"  "5" "11"
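A short sketch of how the typed variants behave, using the list l defined above (rebuilt here so that the block runs on its own). A variant works when every result fits the requested type, and it stops with an error when the results cannot be converted.

```r
library(purrr)

l <- list(a = LETTERS, b = iris, c = mtcars)

# class() returns a single string for each of these objects, so map_chr() fits
classes <- map_chr(l, class)

# is.data.frame() returns TRUE or FALSE, so map_lgl() fits
is_df <- map_lgl(l, is.data.frame)

# Strings cannot be turned into doubles, so map_dbl() raises an error;
# tryCatch() captures it so that the script keeps running
failed <- tryCatch(
  {
    map_dbl(l, class)
    FALSE
  },
  error = function(e) TRUE
)
```

Here classes and is_df are named vectors with names a, b and c, and failed is TRUE.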

The first argument of the map family functions can also be a data frame. Those functions treat a data frame as a list of columns. Let’s see how we can calculate the mean of each of the columns of mtcars.

map_dbl(mtcars, mean)
##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500

Using base R, we can obtain the same result by applying mean across columns with apply:

apply(mtcars, 2, mean)
##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500
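The two approaches differ when the columns have mixed types. As a sketch with the iris data frame: the map functions visit each column separately and keep its type, whereas apply first converts the data frame to a matrix. keep() is used here only to select the numeric columns for the illustration.

```r
library(purrr)

# Each column is visited on its own, so its class is preserved
column_classes <- map_chr(iris, ~ class(.x)[1])

# keep() retains the columns for which is.numeric() is TRUE
numeric_means <- iris %>%
  keep(is.numeric) %>%
  map_dbl(mean)

# apply() converts iris to a character matrix because of the Species column,
# so every column reaches the function as character
apply_classes <- apply(iris, 2, class)
```

With mtcars the difference does not show, because all its columns are numeric.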

In purrr functions we can use function shortcuts, where the function is introduced with ~. In map, the current element of the list or data frame is represented by a dot (.):

map_dbl(mtcars, ~ round(mean(.), 4))
##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
##  20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
##       am     gear     carb 
##   0.4062   3.6875   2.8125

We can produce the same output using base R, but not with the ~ shortcut. Here we use the anonymous function shorthand \(i), available from R 4.1:

sapply(mtcars, \(i) round(mean(i), 4))
##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
##  20.0906   6.1875 230.7219 146.6875   3.5966   3.2172  17.8487   0.4375 
##       am     gear     carb 
##   0.4062   3.6875   2.8125
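For comparison, the sketch below writes the same computation in three equivalent ways: with the ~ shortcut, with a classic anonymous function, and with the \(x) shorthand, which assumes R 4.1 or later. Inside the shortcut, .x can be used as a synonym for the dot.

```r
library(purrr)

# Formula shortcut: .x (or .) stands for the current column
with_formula <- map_dbl(mtcars, ~ round(mean(.x), 4))

# Classic anonymous function
with_function <- map_dbl(mtcars, function(x) round(mean(x), 4))

# Backslash shorthand for function(x), available from R 4.1
with_lambda <- map_dbl(mtcars, \(x) round(mean(x), 4))

# Check that the three calls return the same named vector
same_result <- identical(with_formula, with_function) &&
  identical(with_function, with_lambda)
```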

Mapping two lists

The map2 family of functions allows iterating a function with two arguments along two lists.
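A minimal sketch with made-up vectors: map2_dbl calls the function on matching pairs of elements, and an input of length one is recycled along the other.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)

# The function receives one element of x and the matching element of y
sums <- map2_dbl(x, y, function(a, b) a + b)
# sums is 11, 22, 33

# A length-one input is reused for every element of the other input
shifted <- map2_dbl(x, 100, function(a, b) a + b)
# shifted is 101, 102, 103
```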

To illustrate how map2 functions work, let’s build a function that tells us if we have improved or worsened our performance when comparing past and present grades:

check_improvement <- function(past, present){
  
  if(past < present){
    report <- "improved"
  }else{
    report <- "not improved"
  }
  
  return(report)
}

We want to apply check_improvement to two vectors of past and present grades:

set.seed(1111)
past_grades <- sample(1:10, 10, replace = TRUE)
present_grades <- sample(1:10, 10, replace = TRUE)

We cannot apply check_improvement to the vectors past_grades and present_grades directly, as if only evaluates logical conditions of length one. We can iterate along these two vectors using map2. Inside the function shortcut, the two inputs are labeled .x and .y.

map2(past_grades, present_grades, ~ check_improvement(.x, .y))
## [[1]]
## [1] "not improved"
## 
## [[2]]
## [1] "improved"
## 
## [[3]]
## [1] "not improved"
## 
## [[4]]
## [1] "not improved"
## 
## [[5]]
## [1] "improved"
## 
## [[6]]
## [1] "not improved"
## 
## [[7]]
## [1] "not improved"
## 
## [[8]]
## [1] "not improved"
## 
## [[9]]
## [1] "not improved"
## 
## [[10]]
## [1] "improved"
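Since check_improvement returns a single string, map2_chr gives a more compact result than map2. The sketch below uses a condensed copy of the function so that it runs on its own, and three made-up grades instead of the sampled ones. It also shows that calling the function directly on the vectors does not work: depending on the R version, if() raises an error (R 4.2 or later) or a warning when its condition has length greater than one.

```r
library(purrr)

# Condensed copy of check_improvement with the same behavior
check_improvement <- function(past, present) {
  if (past < present) "improved" else "not improved"
}

# Made-up grades for the illustration
past <- c(6, 2, 10)
present <- c(6, 8, 2)

# Direct call: the condition past < present has length three, which if()
# does not accept; tryCatch() records whether an error or warning occurred
direct_call_ok <- tryCatch(
  {
    check_improvement(past, present)
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE
)

# Element-by-element call, returned as a character vector
report <- map2_chr(past, present, check_improvement)
# report is "not improved", "improved", "not improved"
```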

Using the *_dfr and *_dfc variants we can present the output as tibbles constructed by rows or columns, respectively. Let’s modify the function above to return a named list for each observation, which will become one row of the data frame.

check_improvement2 <- function(past, present){
  
  if(past < present){
    report <- "improved"
  }else{
    report <- "not improved"
  }
  
  return(list(past = past, present = present, report = report))
}

Now we obtain the data frame by binding rows with map2_dfr:

map2_dfr(past_grades, present_grades, ~ check_improvement2(.x, .y)) %>%
  kbl() %>%
  kable_styling(full_width = FALSE)

past  present  report
6     6        not improved
2     8        improved
10    2        not improved
4     2        not improved
1     7        improved
6     5        not improved
6     1        not improved
10    5        not improved
7     4        not improved
1     4        improved
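As an alternative sketch, each call can return a one-row tibble and the rows can be bound afterwards with bind_rows from dplyr. In purrr 1.0.0 and later the *_dfr and *_dfc variants are superseded, so this pattern may age better. The function check_improvement_row and the grades are made up for the illustration.

```r
library(purrr)
library(dplyr)

# Hypothetical variant of check_improvement2 that returns a one-row tibble
check_improvement_row <- function(past, present) {
  tibble(
    past = past,
    present = present,
    report = if (past < present) "improved" else "not improved"
  )
}

# Made-up grades for the illustration
past <- c(6, 2, 10)
present <- c(6, 8, 2)

# map2() returns a list of one-row tibbles
rows <- map2(past, present, check_improvement_row)

# bind_rows() stacks them into a single tibble
grades_report <- bind_rows(rows)
```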

Mapping more than two lists

We can map functions taking three or more arguments using the pmap family. They work similarly to map2, but in the function shortcut the arguments take the form ..1, ..2, ..3 and so on. The input of those functions is a list with the elements to iterate. Let’s see how the pmap functions work with an example.

Let’s consider a quiz where you are betting on the results of football matches. If your result has the same winning team as the real match, or you correctly guess a tie, you get two points. If your bet matches the exact result, you get three points. If f1 and f2 are the forecasted goals for each team, and r1 and r2 the real result, we can get the points of the bet with the function:

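score <- function(f1, f2, r1, r2){
  
  # no points by default
  points <- 0
  
  # two points for the right outcome (same winner, or a correctly guessed tie)
  if(sign(f1-f2) == sign(r1-r2))
    points <- 2
  
  # three points for the exact result; && because the inputs are scalars
  if(f1 == r1 && f2 == r2)
    points <- 3
  
  return(points)
}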

Let’s test the function with matches_list, a list of bets and results, and the pmap_dbl function:

matches_list <- list(mf1 = c(0, 0, 0, 1),
                     mf2 = c(0, 2, 3, 1),
                     mr1 = c(1, 0, 1, 1),
                     mr2 = c(1, 2, 0, 1))

pmap_dbl(matches_list, ~score(..1, ..2, ..3, ..4))
## [1] 2 3 0 3
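When the names of the list elements match the argument names of the function, pmap can call the function directly and match the arguments by name, without the ..1 notation. A minimal sketch, with score repeated in condensed form so that the block runs on its own, and a hypothetical list named_matches whose elements are named f1, f2, r1 and r2:

```r
library(purrr)

# Condensed copy of score
score <- function(f1, f2, r1, r2) {
  points <- 0
  if (sign(f1 - f2) == sign(r1 - r2)) points <- 2
  if (f1 == r1 && f2 == r2) points <- 3
  points
}

# Element names equal the argument names of score()
named_matches <- list(f1 = c(0, 0, 0, 1),
                      f2 = c(0, 2, 3, 1),
                      r1 = c(1, 0, 1, 1),
                      r2 = c(1, 2, 0, 1))

# No shortcut needed: each element is passed to the argument of the same name
points <- pmap_dbl(named_matches, score)
# points is 2, 3, 0, 3, as in the output above
```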

Mapping in a data frame

We can use the map, map2 and pmap families of functions inside a data frame or tibble using mutate. For the example above we can store bets and results in a tibble, and then use mutate to add a column with the results of the score function. Note that the argument of pmap can also be a data frame.

matches <- tibble(mf1 = c(0, 0, 0, 1),
                  mf2 = c(0, 2, 3, 1),
                  mr1 = c(1, 0, 1, 1),
                  mr2 = c(1, 2, 0, 1))

matches <- matches %>%
  mutate(result = pmap_dbl(matches, ~score(..1, ..2, ..3, ..4)))

matches %>%
  kbl() %>%
  kable_styling(full_width = FALSE)

mf1  mf2  mr1  mr2  result
0    0    1    1    2
0    2    0    2    3
0    3    1    0    0
1    1    1    1    3
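Inside mutate the columns are visible by name, so the list passed to pmap can also be built from the columns themselves instead of referring to the whole tibble. A sketch under that assumption, with score repeated in condensed form so that the block is self-contained. The column match_id is made up to show that other columns do not interfere.

```r
library(dplyr)
library(purrr)

# Condensed copy of score
score <- function(f1, f2, r1, r2) {
  points <- 0
  if (sign(f1 - f2) == sign(r1 - r2)) points <- 2
  if (f1 == r1 && f2 == r2) points <- 3
  points
}

# Same bets and results as above, plus a hypothetical identifier column
matches <- tibble(match_id = c("A", "B", "C", "D"),
                  mf1 = c(0, 0, 0, 1),
                  mf2 = c(0, 2, 3, 1),
                  mr1 = c(1, 0, 1, 1),
                  mr2 = c(1, 2, 0, 1))

# The unnamed list is matched to the arguments of score() by position
matches_scored <- matches %>%
  mutate(result = pmap_dbl(list(mf1, mf2, mr1, mr2), score))
```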

If we use functions of the map or map2 family, the arguments of the functions are the columns of the data frame.

grades <- tibble(past = past_grades, present = present_grades)

grades <- grades %>%
  mutate(check = map2_chr(past, present, ~ check_improvement(.x, .y)))

grades %>%
  kbl() %>%
  kable_styling(full_width = FALSE)

past  present  check
6     6        not improved
2     8        improved
10    2        not improved
4     2        not improved
1     7        improved
6     5        not improved
6     1        not improved
10    5        not improved
7     4        not improved
1     4        improved

Session info

## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19.2
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
## 
## locale:
##  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.4 forcats_0.5.2    stringr_1.4.1    dplyr_1.0.10    
##  [5] purrr_0.3.5      readr_2.1.3      tidyr_1.2.1      tibble_3.1.8    
##  [9] ggplot2_3.4.0    tidyverse_1.3.1 
## 
## loaded via a namespace (and not attached):
##  [1] svglite_2.1.0     lubridate_1.9.0   assertthat_0.2.1  digest_0.6.30    
##  [5] utf8_1.2.2        R6_2.5.1          cellranger_1.1.0  backports_1.4.1  
##  [9] reprex_2.0.2      evaluate_0.17     highr_0.9         httr_1.4.4       
## [13] blogdown_1.9      pillar_1.8.1      rlang_1.0.6       readxl_1.4.1     
## [17] rstudioapi_0.13   jquerylib_0.1.4   rmarkdown_2.14    webshot_0.5.3    
## [21] munsell_0.5.0     broom_1.0.1       compiler_4.2.2    modelr_0.1.10    
## [25] xfun_0.34         pkgconfig_2.0.3   systemfonts_1.0.4 htmltools_0.5.3  
## [29] tidyselect_1.1.2  bookdown_0.26     fansi_1.0.3       viridisLite_0.4.1
## [33] crayon_1.5.2      tzdb_0.3.0        dbplyr_2.2.1      withr_2.5.0      
## [37] grid_4.2.2        jsonlite_1.8.3    gtable_0.3.0      lifecycle_1.0.3  
## [41] DBI_1.1.2         magrittr_2.0.3    scales_1.2.1      cli_3.4.1        
## [45] stringi_1.7.8     fs_1.5.2          xml2_1.3.3        bslib_0.3.1      
## [49] ellipsis_0.3.2    generics_0.1.2    vctrs_0.5.0       tools_4.2.2      
## [53] glue_1.6.2        hms_1.1.2         fastmap_1.1.0     yaml_2.3.6       
## [57] timechange_0.1.1  colorspace_2.0-3  rvest_1.0.3       knitr_1.40       
## [61] haven_2.5.1       sass_0.4.1