Airbnb uses R to scale data science
2016-04-06 11:54
(This article was first published on Revolutions, and kindly contributed to R-bloggers)
Airbnb, the property-rental marketplace that helps you find a place to stay when you're travelling, uses R to scale data science. Airbnb is a famously data-driven company, and has recently gone through a period of rapid growth. To accommodate the influx of data scientists (80% of whom are proficient in R, and 64% of whom use R as their primary data analysis language), Airbnb organizes monthly week-long data bootcamps for new hires and current team members.
But just as important as the training program is the engineering process Airbnb uses to scale data science with R. Rather than just have data scientists write R functions independently (which is not only likely to duplicate work, but also inhibits transparency and slows down productivity), Airbnb has invested in building an internal R package called Rbnb that implements collaborative solutions to common problems, standardizes visual presentations, and avoids reinventing the wheel. (Incidentally, the development and use of internal R packages is a common pattern I've seen at many companies with large data science teams.)
The Rbnb package used at Airbnb includes more than 60 functions and is still growing under the guidance of several active developers. It's actively used by Airbnb's engineering, data science, analytics and user experience teams to do things like move aggregated or filtered data from a Hadoop or SQL environment into R, impute missing values, compute year-over-year trends, and perform common data aggregations. It has been used to create more than 500 research reports and to solve problems like automating the detection of host preferences and using guest ratings to predict rebooking rates.
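To give a feel for the kind of helper an internal package might collect, here is a minimal sketch in base R of two of the tasks mentioned above: imputing missing values and computing a year-over-year trend. Rbnb is not public, so the function names `impute_median` and `yoy_change`, the choice of median imputation, and the sample data are all assumptions made for illustration, not Airbnb's actual code.

```r
# Hypothetical helpers for illustration; these names are NOT taken from Rbnb.

# Replace NA values in a numeric vector with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Year-over-year change for a regularly spaced series:
# (value / value `lag` periods earlier) - 1.
# The first `lag` entries have no prior-year value and are returned as NA.
yoy_change <- function(x, lag = 12) {
  n <- length(x)
  if (n <= lag) return(rep(NA_real_, n))
  prior <- c(rep(NA_real_, lag), x[seq_len(n - lag)])
  x / prior - 1
}

# Made-up data: 24 months of bookings with one missing month.
bookings <- c(100, 110, NA, 130, 140, 150, 160, 170, 180, 190, 200, 210,
              120, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240, 252)

clean  <- impute_median(bookings)  # fill the gap before computing trends
growth <- yoy_change(clean)        # NA for the first 12 months
```

Putting small functions like these in one shared package means every analyst fills gaps and computes growth the same way, which is the duplication-avoiding benefit the article describes.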
The package is also widely used to visualize data using a standard Airbnb "look". The package includes custom themes, scales, and geoms for ggplot2; CSS templates for htmlwidgets and Shiny; and
custom R Markdown templates for different types of reports. You can see several examples in the blog post by Ricardo
Bion linked below, including this gorgeous visualization of the 500,000 top Airbnb trips.
Medium (AirbnbEng): Using R packages and education to scale Data Science at Airbnb