Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations by combining simple pieces in a standard way.

Install it with:

install.packages("purrr")

Purrr wouldn’t be possible without Lionel Henry. He wrote a lot of the package and his insightful comments helped me rapidly iterate towards a stable, useful, and understandable package.

Map functions

The core of purrr is a set of functions for manipulating vectors (atomic vectors, lists, and data frames). The goal is similar to dplyr: help you tackle the most common 90% of data manipulation challenges. But where dplyr focusses on data frames, purrr focusses on vectors. For example, the following code splits the built-in mtcars dataset up by number of cylinders (using the base split() function), fits a linear model to each piece, summarises each model, then extracts the the (R^2):

mtcars %>% split(.$cyl) %>% map(~lm(mpg ~ wt, data = .)) %>% map(summary) %>% map_dbl("r.squared") #> 4 6 8 #> 0.509 0.465 0.423

The first argument to all map functions is the vector to operate on. The second argument, .f specifies what to do with each piece. It can be:

A function, like summary() .

A formula, which is converted to an anonymous function, so that ~ lm(mpg ~ wt, data = .) is shorthand for function(x) lm(mpg ~ wt, data = x) .

A string or number, which is used to extract components, i.e. "r.squared" is shorthand for function(x) x[[r.squared]] and 1 is shorthand for function(x) x[[1]] .

Map functions come in a few different variations based on their inputs and output:

map() takes a vector (list or atomic vector) and returns a list. map_lgl() , map_int() , map_dbl() , and map_chr() take a vector and return an atomic vector. flatmap() works similarly, but allows the function to return arbitrary length vectors.

map_if() only applies .f to those elements of the list where .p is true. For example, the following snippet converts factors into characters:

iris %>% map_if(is.factor, as.character) %>% str() #> 'data.frame': 150 obs. of 5 variables: #> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... #> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... #> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... #> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... #> $ Species : chr "setosa" "setosa" "setosa" "setosa" ...

map_at() works similarly but instead of working with a logical vector or predicate function, it works with a integer vector of element positions.

map2() takes a pair of lists and iterates through them in parallel:

map2(1:3, 2:4, c) #> [[1]] #> [1] 1 2 #> #> [[2]] #> [1] 2 3 #> #> [[3]] #> [1] 3 4 map2(1:3, 2:4, ~ .x * (.y - 1)) #> [[1]] #> [1] 1 #> #> [[2]] #> [1] 4 #> #> [[3]] #> [1] 9

map3() does the same thing for three lists, and map_n() does it in general.

invoke() , invoke_lgl() , invoke_int() , invoke_dbl() , and invoke_chr() take a list of functions, and call each one with the supplied arguments:

list(m1 = mean, m2 = median) %>% invoke_dbl(rcauchy(100)) #> m1 m2 #> 9.765 0.117

walk() takes a vector, calls a function on piece, and returns its original input. It’s useful for functions called for their side-effects; it returns the input so you can use it in a pipe.

Purrr and dplyr

I’m becoming increasingly enamoured with the list-columns in data frames. The following example combines purrr and dplyr to generate 100 random test-training splits in order to compute an unbiased estimate of prediction quality. These tools are still experimental (and currently need quite a bit of extra scaffolding), but I think the basic approach is really appealing.

library(dplyr) random_group <- function(n, probs) { probs <- probs / sum(probs) g <- findInterval(seq(0, 1, length = n), c(0, cumsum(probs)), rightmost.closed = TRUE) names(probs)[sample(g)] } partition <- function(df, n, probs) { n %>% replicate(split(df, random_group(nrow(df), probs)), FALSE) %>% zip_n() %>% as_data_frame() } msd <- function(x, y) sqrt(mean((x - y) ^ 2)) # Genearte 100 random test-training splits, cv <- mtcars %>% partition(100, c(training = 0.8, test = 0.2)) %>% mutate( # Fit the model model = map(training, ~ lm(mpg ~ wt, data = .)), # Make predictions on test data pred = map2(model, test, predict), # Calculate mean squared difference diff = map2(pred, test %>% map("mpg"), msd) %>% flatten() ) cv #> Source: local data frame [100 x 5] #> #> test training model pred diff #> (list) (list) (list) (list) (dbl) #> 1 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 3.70 #> 2 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 2.03 #> 3 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 2.29 #> 4 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 4.88 #> 5 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 3.20 #> 6 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 4.68 #> 7 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 3.39 #> 8 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 3.82 #> 9 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 2.56 #> 10 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]> 3.40 #> .. ... ... ... ... ... mean(cv$diff) #> [1] 3.22

Other functions

There are too many other pieces of purrr to describe in detail here. A few of the most useful functions are noted below:

zip_n() allows you to turn a list of lists “inside-out”:

x <- list(list(a = 1, b = 2), list(a = 2, b = 1)) x %>% str() #> List of 2 #> $ :List of 2 #> ..$ a: num 1 #> ..$ b: num 2 #> $ :List of 2 #> ..$ a: num 2 #> ..$ b: num 1 x %>% zip_n() %>% str() #> List of 2 #> $ a:List of 2 #> ..$ : num 1 #> ..$ : num 2 #> $ b:List of 2 #> ..$ : num 2 #> ..$ : num 1 x %>% zip_n(.simplify = TRUE) %>% str() #> List of 2 #> $ a: num [1:2] 1 2 #> $ b: num [1:2] 2 1

keep() and discard() allow you to filter a vector based on a predicate function. compact() is a helpful wrapper that throws away empty elements of a list.

1:10 %>% keep(~. %% 2 == 0) #> [1] 2 4 6 8 10 1:10 %>% discard(~. %% 2 == 0) #> [1] 1 3 5 7 9 list(list(x = TRUE, y = 10), list(x = FALSE, y = 20)) %>% keep("x") %>% str() #> List of 1 #> $ :List of 2 #> ..$ x: logi TRUE #> ..$ y: num 10 list(NULL, 1:3, NULL, 7) %>% compact() %>% str() #> List of 2 #> $ : int [1:3] 1 2 3 #> $ : num 7

lift() (and friends) allow you to convert a function that takes multiple arguments into a function that takes a list. It helps you compose functions by lifting their domain from a kind of input to another kind. The domain can be changed to and from a list (l), a vector (v) and dots (d).

cross2() , cross3() and cross_n() allow you to create the Cartesian product of the inputs (with optional filtering).

A number of functions let you manipulate functions: negate() , compose() , partial() .

A complete set of predicate functions provides predictable versions of the is.* functions: is_logical() , is_list() , is_bare_double() , is_scalar_character() , etc.

Other equivalents functions wrap existing base R functions into to the consistent design of purrr: replicate() -> rerun() , Reduce() -> reduce() , Find() -> detect() , Position() -> detect_index() .

Design philosophy

The goal of purrr is not try and turn R into Haskell in R: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R.