In this post, I will present the functionality of the purrr package for mapping (iterating) over vectors, lists or data frames using functional programming. purrr is part of the tidyverse and is loaded with the tidyverse (meta-)package.
I will also be using kableExtra to present data frames nicely.
library(tidyverse)
library(kableExtra)
Mapping one list
In mathematics, a mapping is an operation that associates each element of a given set (the domain) with an element of a second set (the codomain). This is precisely what the map function of purrr does: it applies a function to each element of its input.
map takes two main arguments. The first one is a list, vector or data frame. The second is the function to be applied to each element of the first argument.
Let’s build a list of a vector and two data frames:
l <- list(a = LETTERS, b = iris, c = mtcars)
The outcome of map
is always a list:
map(l, length)
## $a
## [1] 26
##
## $b
## [1] 5
##
## $c
## [1] 11
We can obtain the same output using the base R function lapply:
lapply(l, length)
## $a
## [1] 26
##
## $b
## [1] 5
##
## $c
## [1] 11
The length function always returns an integer, so it makes sense to obtain a vector of integers instead of a list. We can achieve that with map_int.
map_int(l, length)
## a b c
## 26 5 11
Again, we can obtain a similar output using sapply:
sapply(l, length)
## a b c
## 26 5 11
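All mapping functions of purrr have variants that fix the type of the output. Using map_int, map_dbl, map_chr and map_lgl we obtain atomic vectors of type integer, double, character and logical respectively, or an error if the results cannot be converted to that type.
Let’s obtain the output of length for list l as a character vector. This relies on the implicit integer-to-character coercion of purrr 0.3.5, the version used here; purrr 1.0.0 and later deprecate it, and map_chr(l, ~ as.character(length(.))) is the explicit equivalent: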
map_chr(l, length)
## a b c
## "26" "5" "11"
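As a small sketch of the other typed variants, map_lgl fits functions that return a single TRUE or FALSE, and a typed variant raises an error when a result does not have the requested type. The names is_df and int_fails are only illustrative.

```r
library(purrr)

l <- list(a = LETTERS, b = iris, c = mtcars)

# is.data.frame() returns a single logical value, so map_lgl() fits
is_df <- map_lgl(l, is.data.frame)

# mean() returns a non-integer double here, so map_int() refuses the result;
# tryCatch() turns the error into a flag so that the script keeps running
int_fails <- tryCatch(
  {
    map_int(mtcars, mean)
    FALSE
  },
  error = function(e) TRUE
)
```

Here is_df marks the two data frames in l, and int_fails records that map_int stopped with an error.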
The first argument of the map family of functions can also be a data frame. These functions treat a data frame as a list of columns. Let’s see how we can calculate the mean of each column of mtcars.
map_dbl(mtcars, mean)
## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
Using base R we can obtain the same result with apply across columns (the second argument, 2, selects columns):
apply(mtcars, 2, mean)
## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
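The two approaches differ when a data frame has non-numeric columns. apply first converts the data frame to a matrix, so for iris, which includes the factor column Species, every column becomes character and mean cannot be computed. Because map works column by column, we can filter the columns first. A minimal sketch, using keep from purrr to select the numeric columns:

```r
library(purrr)

# iris has four numeric columns and one factor column (Species)
numeric_means <- iris %>%
  keep(is.numeric) %>%   # drop columns that are not numeric
  map_dbl(mean)          # one mean per remaining column

round(numeric_means, 2)
```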
In purrr functions we can use function shortcuts, where an anonymous function is introduced with ~. In map, the current element of the list or data frame is represented by a dot (.):
map_dbl(mtcars, ~ round(mean(.), 4))
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
We can produce the same output in base R. The ~ shortcut is not available there, so we use an anonymous function instead (the \(i) shorthand requires R 4.1 or later).
sapply(mtcars, \(i) round(mean(i), 4))
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
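The shortcut is only one of several equivalent ways of supplying the function. The sketch below compares three of them and also shows that extra arguments placed after the function are passed on to it; the trim value is an arbitrary choice for illustration.

```r
library(purrr)

# Three ways of writing the same function for map_dbl()
with_formula <- map_dbl(mtcars, ~ round(mean(.), 4))            # purrr shortcut
with_lambda  <- map_dbl(mtcars, \(x) round(mean(x), 4))         # R >= 4.1 shorthand
with_named   <- map_dbl(mtcars, function(x) round(mean(x), 4))  # classic form

# Extra arguments after the function are passed on to it
trimmed <- map_dbl(mtcars, mean, trim = 0.1)
```

The first three results are identical, so the choice between them is mostly a matter of style.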
Mapping two lists
The map2 family of functions allows iterating a function with two arguments along two lists.
To illustrate how map2 functions work, let’s build a function that tells us whether our performance has improved or not when comparing past and present grades:
check_improvement <- function(past, present){
  # if() needs a single TRUE or FALSE, so past and present must be scalars
  if(past < present){
    report <- "improved"
  }else{
    report <- "not improved"
  }
  return(report)
}
We want to apply check_improvement
to two vectors of past and present grades:
set.seed(1111)
past_grades <- sample(1:10, 10, replace = TRUE)
present_grades <- sample(1:10, 10, replace = TRUE)
We cannot apply check_improvement to the vectors past_grades and present_grades directly, as if only evaluates logical conditions of length one. We can iterate along these two vectors using map2. Inside the function shortcut, the two inputs are labeled .x and .y.
map2(past_grades, present_grades, ~ check_improvement(.x, .y))
## [[1]]
## [1] "not improved"
##
## [[2]]
## [1] "improved"
##
## [[3]]
## [1] "not improved"
##
## [[4]]
## [1] "not improved"
##
## [[5]]
## [1] "improved"
##
## [[6]]
## [1] "not improved"
##
## [[7]]
## [1] "not improved"
##
## [[8]]
## [1] "not improved"
##
## [[9]]
## [1] "not improved"
##
## [[10]]
## [1] "improved"
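Since check_improvement always returns a single string, a character vector is a more compact output than a list. A minimal sketch with map2_chr, next to mapply as the closest base R counterpart; the three pairs of grades are hypothetical values chosen for illustration.

```r
library(purrr)

check_improvement <- function(past, present) {
  if (past < present) "improved" else "not improved"
}

# Hypothetical grades, chosen for illustration
past    <- c(6, 2, 10)
present <- c(6, 8, 2)

# map2_chr() returns a character vector instead of a list
with_purrr <- map2_chr(past, present, check_improvement)

# mapply() is the closest base R counterpart
with_base <- mapply(check_improvement, past, present)

with_purrr
```

The function can be passed by name here because its two arguments are filled in order by the elements of past and present.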
Using the *_dfr and *_dfc variants we can present the output as tibbles constructed by binding rows or columns, respectively. Let’s modify the function above to return a named list, which becomes one row of a data frame for each observation.
check_improvement2 <- function(past, present){
  if(past < present){
    report <- "improved"
  }else{
    report <- "not improved"
  }
  # a named list: each name becomes a column of the final data frame
  return(list(past = past, present = present, report = report))
}
Now we obtain the data frame by binding rows with map2_dfr:
map2_dfr(past_grades, present_grades, ~ check_improvement2(.x, .y)) %>%
  kbl() %>%
  kable_styling(full_width = FALSE)
past | present | report |
---|---|---|
6 | 6 | not improved |
2 | 8 | improved |
10 | 2 | not improved |
4 | 2 | not improved |
1 | 7 | improved |
6 | 5 | not improved |
6 | 1 | not improved |
10 | 5 | not improved |
7 | 4 | not improved |
1 | 4 | improved |
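In purrr 1.0.0 and later the *_dfr and *_dfc variants are marked as superseded, and the suggested pattern is to map first and bind afterwards with list_rbind. The following is a sketch under the assumption that purrr 1.0.0 or later is installed (newer than the 0.3.5 used in this post), with hypothetical grades chosen for illustration.

```r
library(purrr)   # assumes purrr >= 1.0.0 for list_rbind()
library(tibble)

check_improvement2 <- function(past, present) {
  report <- if (past < present) "improved" else "not improved"
  list(past = past, present = present, report = report)
}

# Hypothetical grades, chosen for illustration
past    <- c(6, 2, 10)
present <- c(6, 8, 2)

# Build one single-row tibble per pair of grades, then bind the rows
grades <- map2(past, present, \(x, y) as_tibble(check_improvement2(x, y))) |>
  list_rbind()
```

The result has one row per pair of grades and the columns past, present and report, as with map2_dfr.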
Mapping more than two lists
We can map functions taking three or more arguments using the pmap family. They work similarly to map2, but inside the function shortcut the arguments are referred to as ..1, ..2, ..3 and so on. The input of those functions is a list whose elements are the vectors to iterate over. Let’s see how the pmap functions work with an example.
Let’s consider a quiz where you bet on the results of football matches. If your forecast has the same winning team as the real match, or you correctly guess a tie, you get two points. If your forecast matches the exact result, you get three points. If f1 and f2 are the forecasted goals for each team, and r1 and r2 the real result, we can get the points of the bet with the function:
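score <- function(f1, f2, r1, r2){
  points <- 0
  # same winner, or a tie forecast for a tied match
  if(sign(f1 - f2) == sign(r1 - r2))
    points <- 2
  # exact result; && is the scalar operator intended for if() conditions
  if(f1 == r1 && f2 == r2)
    points <- 3
  return(points)
}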
Let’s test the function with a list of bets and results, matches_list, and the pmap_dbl function:
matches_list <- list(mf1 = c(0, 0, 0, 1),
mf2 = c(0, 2, 3, 1),
mr1 = c(1, 0, 1, 1),
mr2 = c(1, 2, 0, 1))
pmap_dbl(matches_list, ~score(..1, ..2, ..3, ..4))
## [1] 2 3 0 3
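When the names of the list match the argument names of the function, pmap matches them by name, so the ..1, ..2 placeholders are not needed and the order of the list elements no longer matters. A minimal sketch with the same bets and results, stored in a hypothetical list called bets:

```r
library(purrr)

score <- function(f1, f2, r1, r2) {
  points <- 0
  if (sign(f1 - f2) == sign(r1 - r2)) points <- 2   # same winner or tie
  if (f1 == r1 && f2 == r2) points <- 3             # exact result
  points
}

# List names match the argument names of score(), in a different order
bets <- list(r1 = c(1, 0, 1, 1),
             r2 = c(1, 2, 0, 1),
             f1 = c(0, 0, 0, 1),
             f2 = c(0, 2, 3, 1))

points <- pmap_dbl(bets, score)
```

Matching by name tends to be less error-prone than positional placeholders when the function has many arguments.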
Mapping in a data frame
We can use the map, map2 and pmap families of functions inside a data frame or tibble using mutate. For the example above we can store bets and results in a tibble, and then use mutate to add a column with the results of the score function. Note that the argument of pmap can also be a data frame, in which case each row supplies one set of arguments.
matches <- tibble(mf1 = c(0, 0, 0, 1),
mf2 = c(0, 2, 3, 1),
mr1 = c(1, 0, 1, 1),
mr2 = c(1, 2, 0, 1))
matches <- matches %>%
mutate(result = pmap_dbl(matches, ~score(..1, ..2, ..3, ..4)))
matches %>%
kbl() %>%
kable_styling(full_width = FALSE)
mf1 | mf2 | mr1 | mr2 | result |
---|---|---|---|---|
0 | 0 | 1 | 1 | 2 |
0 | 2 | 0 | 2 | 3 |
0 | 3 | 1 | 0 | 0 |
1 | 1 | 1 | 1 | 3 |
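Passing the whole tibble to pmap depends on the order of its columns, and on the tibble not containing any other column. An alternative sketch, assuming the same score function and data, passes only the needed columns as a list inside mutate:

```r
library(dplyr)
library(purrr)
library(tibble)

score <- function(f1, f2, r1, r2) {
  points <- 0
  if (sign(f1 - f2) == sign(r1 - r2)) points <- 2   # same winner or tie
  if (f1 == r1 && f2 == r2) points <- 3             # exact result
  points
}

matches <- tibble(mf1 = c(0, 0, 0, 1),
                  mf2 = c(0, 2, 3, 1),
                  mr1 = c(1, 0, 1, 1),
                  mr2 = c(1, 2, 0, 1))

# The unnamed list of columns is matched to score() by position
matches <- matches %>%
  mutate(result = pmap_dbl(list(mf1, mf2, mr1, mr2), score))
```

This version keeps working if more columns are added to the tibble later.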
If we use functions of the map or map2 family inside mutate, we can pass the columns of the data frame directly as arguments.
grades <- tibble(past = past_grades, present = present_grades)
grades <- grades %>%
mutate(check = map2_chr(past, present, ~ check_improvement(.x, .y)))
grades %>%
kbl() %>%
kable_styling(full_width = FALSE)
past | present | check |
---|---|---|
6 | 6 | not improved |
2 | 8 | improved |
10 | 2 | not improved |
4 | 2 | not improved |
1 | 7 | improved |
6 | 5 | not improved |
6 | 1 | not improved |
10 | 5 | not improved |
7 | 4 | not improved |
1 | 4 | improved |
References
purrr page in tidyverse website. https://purrr.tidyverse.org/
21: Iteration, in Wickham, H. and Grolemund, G. (in progress). R for data science. https://r4ds.had.co.nz/iteration.html
Session info
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19.2
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
##
## locale:
## [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.3.4 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10
## [5] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8
## [9] ggplot2_3.4.0 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] svglite_2.1.0 lubridate_1.9.0 assertthat_0.2.1 digest_0.6.30
## [5] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0 backports_1.4.1
## [9] reprex_2.0.2 evaluate_0.17 highr_0.9 httr_1.4.4
## [13] blogdown_1.9 pillar_1.8.1 rlang_1.0.6 readxl_1.4.1
## [17] rstudioapi_0.13 jquerylib_0.1.4 rmarkdown_2.14 webshot_0.5.3
## [21] munsell_0.5.0 broom_1.0.1 compiler_4.2.2 modelr_0.1.10
## [25] xfun_0.34 pkgconfig_2.0.3 systemfonts_1.0.4 htmltools_0.5.3
## [29] tidyselect_1.1.2 bookdown_0.26 fansi_1.0.3 viridisLite_0.4.1
## [33] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1 withr_2.5.0
## [37] grid_4.2.2 jsonlite_1.8.3 gtable_0.3.0 lifecycle_1.0.3
## [41] DBI_1.1.2 magrittr_2.0.3 scales_1.2.1 cli_3.4.1
## [45] stringi_1.7.8 fs_1.5.2 xml2_1.3.3 bslib_0.3.1
## [49] ellipsis_0.3.2 generics_0.1.2 vctrs_0.5.0 tools_4.2.2
## [53] glue_1.6.2 hms_1.1.2 fastmap_1.1.0 yaml_2.3.6
## [57] timechange_0.1.1 colorspace_2.0-3 rvest_1.0.3 knitr_1.40
## [61] haven_2.5.1 sass_0.4.1