Airbnb, the property-rental marketplace that helps you find a place to stay when you're travelling, uses R to scale data science. Airbnb is a famously data-driven company, and has recently gone through a period of rapid growth. To accommodate the influx of data scientists (80% of whom are proficient in R, and 64% of whom use R as their primary data analysis language), Airbnb organizes monthly week-long data bootcamps for new hires and current team members.

But just as important as the training program is the engineering process Airbnb uses to scale data science with R. Rather than have data scientists write R functions independently (which not only risks duplicating work, but also reduces transparency and slows productivity), Airbnb has invested in building an internal R package called Rbnb that implements collaborative solutions to common problems, standardizes visual presentations, and avoids reinventing the wheel. (Incidentally, the development and use of internal R packages is a common pattern I've seen at many companies with large data science teams.)

The Rbnb package used at Airbnb includes more than 60 functions and is still growing under the guidance of several active developers. Airbnb's engineering, data science, analytics and user experience teams use it to do things like move aggregated or filtered data from a Hadoop or SQL environment into R, impute missing values, compute year-over-year trends, and perform common data aggregations. It has been used to create more than 500 research reports and to solve problems like automating the detection of host preferences and using guest ratings to predict rebooking rates.
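
Rbnb itself is internal to Airbnb, so its functions aren't available to look at. To give a flavor of the kind of small, shared helper such a package might collect, here is a minimal base R sketch of a year-over-year calculation. The function name `yoy_change`, its `lag` argument, and the booking numbers are all made up for illustration and are not taken from Rbnb.

```r
# Hypothetical helper (NOT part of Rbnb): year-over-year percentage change
# for a regularly spaced series, using base R only.
# `lag` is the number of observations in one year (12 for monthly data).
yoy_change <- function(values, lag = 12) {
  stopifnot(is.numeric(values), lag >= 1)
  n <- length(values)
  # With less than a year of history there is nothing to compare against.
  if (n <= lag) {
    return(rep(NA_real_, n))
  }
  # Shift the series forward by `lag` so each value lines up with the
  # value from one year earlier; the first `lag` entries have no match.
  previous <- c(rep(NA_real_, lag), values[seq_len(n - lag)])
  (values - previous) / previous * 100
}

# Made-up monthly booking counts for two years (illustrative data only).
bookings <- c(100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
              150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315)
yoy <- yoy_change(bookings)
```

The first twelve entries of `yoy` are `NA` because there is no prior year to compare with. Putting a helper like this in one shared package, rather than in each analyst's own scripts, is the sort of duplication-avoiding step described above.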

The package is also widely used to visualize data using a standard Airbnb “look”. It includes custom themes, scales, and geoms for ggplot2; CSS templates for htmlwidgets and Shiny; and custom R Markdown templates for different types of reports. You can see several examples in the blog post by Ricardo Bion linked below, including a gorgeous visualization of the 500,000 top Airbnb trips.

Medium (AirbnbEng): Using R packages and education to scale Data Science at Airbnb