What We Learned Analyzing Hundreds of Data Science Interviews

by Roger Huang | September 2, 2016

This post on analyzing hundreds of data science interviews originally appeared on the Springboard blog. Springboard is on a mission to make high-quality education accessible to everyone in the world.

Introduction

Top data science teams are doing incredible work on some of the most interesting datasets in the world.

Google has more data on human interests than every 20th-century researcher, while Uber seamlessly coordinates the itinerary and pricing of more than 1 million trips every day. With machine learning and artificial intelligence, top data science teams are changing the way we ingest and process data, and they are producing actionable insights that affect the lives of millions.

What if there were common patterns in the interviews top data science teams give that would let you master the data science interview process? What if the specific differences between teams and their interview practices could be enumerated, so that interviewing with a top data science team became more of a science than an art?

At Springboard, we teach data science skills, and many of our students take our course because they are looking to start a data science career. This has led us to write up a guide to data science jobs and a guide to data science interviews in order to help our students take the next step to an ideal job in the field. We’ve always been fascinated by the work top data science teams are doing, and we’ve tried to help our graduates understand what it takes to break into those teams.

Nobody had aggregated different interview stories from those companies so that you could have the data you’d need to ace the data science interview process. We sought to change that.

We took it upon ourselves to gather Glassdoor testimonials about data science interview questions from a selection of companies whose data science teams are considered world-class.

You’ll learn what an interview with top data science teams looks like, and how you can join those teams. Here’s everything we could learn about the data science interview process at Google, Airbnb, Facebook, Uber and other top companies.

Things we learned

We started this analysis because we wanted to understand how top data science teams interview and how you should prepare for that process. We’ve managed to condense what we’ve learned into six actionable points.

1. Research, research, research. Spend the time to understand what the data science team at each organization is working on. You’ll do better in the interview process, and you’ll relate better to future colleagues. You’ll be asked a lot of situational and product questions about the company’s current work, whether it’s People You May Know at LinkedIn or how Uber should match drivers with passengers.

2. Prepare for four categories of data science questions: statistics and probability questions, programming questions, business thinking questions, and culture/role fit questions. Practice statistical modeling and reasoning, describing machine learning concepts, and working in SQL, R, and Python, from the basics to more advanced work.

3. The data science interview process is pretty standard across companies: phone screens, tests, and then on-site interviews. You’ll want to come across well in both interviews and time-constrained assignments, so practice using SQL, R, and Python under time constraints. Many take-home assignments try to catch you by surprise here, testing your familiarity with the languages with very little time. Showing you can think in frameworks like Hadoop at speed impresses hiring companies, but don’t forget the basics: some companies ask basic statistical questions to make sure you’re on top of your game.

4. Get a referral. Four out of nine companies we surveyed (Google, Uber, Facebook, Airbnb) had internal referrals as their top source of interviews, and overall, referrals were the second-largest source of interviews. Get to know people at the company and have them advocate for you rather than just applying online.

5. Prepare your story. You’ll be asked to go over your past work in detail. Be prepared to walk through everything you’ve done with as much specificity as possible, from the tools you used to why you made different decisions. Be ready to weave a coherent narrative of how the amazing things you did improved business outcomes.

6. Prepare for a long, drawn-out process. Interviewing for a data science position can take months and multiple stages. Make sure you’re ready for the wait.

Above all else, we learned that the data science interview process is a complex beast that must be tackled with precise and practiced action.

Categories of Data Science Interview Questions

Among the 554 real data science questions shared by Glassdoor respondents, we found a treasure trove of data on the skills data science teams were testing. The largest categories of questions we spotted were the following:

No 1: Statistics and Probability Questions

Statistics and probability are often the meat of data science work. These questions are designed to test your thinking and how you reason with uncertainty, an essential skill for any data scientist to master.

Here’s an article to help you with statistics and probability questions: How Bayes Theorem, Probability, Logic and Data Intersect

Here’s a book to help you with statistics and probability questions: Think Stats, Probability and Statistics for Programmers

Here’s an interactive course to help you with statistics and probability questions: Probability and statistics with Khan Academy

No 2: Programming Questions

If statistics and probability are the meat of data science work, you can consider programming questions the potatoes that must come with the main meal. Data science means dealing with data at scale, which requires programming to automate the vast amount of work involved.

Here’s an article to help you with programming questions: Data science sexiness: Your guide to Python and R, and which one is best

Here’s a book to help you with programming questions: Cracking the Coding Interview

Here’s an interactive course from DataCamp to help you with programming questions: Intro to Python for Data Science

No 3: Business Thinking and Case Studies

The third plank of data science is explaining your findings in a way that drives business action and outcomes. These questions test your thinking about what might be causing the behaviors you observe.

Here’s an article to help you with business thinking and case studies questions: Tips for Data Scientists: Think Like a Business Executive

Here’s a book to help you with business thinking and case studies questions: Data Science for Business

Here’s an interactive course to help you with business thinking and case studies questions: Data Analytics for Business

No 4: Culture/Role Fit Questions

The fourth category of question asks about your fit with the role and the culture of the hiring organization. Treat this like a behavioral interview, and be honest about your expectations.

How do top data science teams interview?

After examining the categories of questions that 500+ data scientists were asked, we decided to look more deeply at a few data science teams we knew were highly respected across the industry, from Google to LinkedIn. These were large companies that could afford to spend on top data science talent and had a large collection of data science interview reviews, which allowed us to explore their interview processes in depth.

Of the selected processes, Google had the most difficult data science interview process on average, while JPMorgan had the least difficult. According to Glassdoor respondents, Google’s challenge lies not only in the number of questions but also in the number of people assigned to interview you.

Out of a sample of 113 respondents, collated by aggregating company profiles on Glassdoor, 44% applied online to get their interview and about 33% used an internal referral. Since an internal referral is much harder to obtain than an online application, the fact that a third of interviews still came through referrals shows just how important a referral can be.

The company with the most positive reviews was Google, with almost 60% of respondents reporting a positive experience. At the other end, Yelp and JPMorgan had zero positive reviews, though that was over a limited sample of nine respondents between the two of them.

We’ve found that former students of ours that use internal referrals are 8 times more likely to get an interview than those that apply online.

Facebook

Most data science interviews at Facebook went well, with 49% of respondents reporting a positive experience compared to 23% reporting a negative one. Most candidates were referred by current employees or a recruiter. The interview process is rated slightly above average in difficulty, at 3.4 on a scale of 1 to 5, with 5 being the most difficult.

The standard process was one phone screen, one take-home data challenge, one shared-screen SQL challenge, and then an on-site phase with multiple 1:1 interviews with everybody on the team. While the early phases of the process focused mostly on SQL, later ones focused heavily on machine learning and building an ads model (an obvious focus for Facebook). Open-ended scenario questions about how you’d design a specific Facebook feature were included as well, with a special focus on product management as well as data science.

Some candidates described the process as long and drawn-out, with an average waiting period of 3+ months, so don’t be surprised if it takes a while.

What the data science team at Facebook is doing: The research team at Facebook shares what they’re working on, including an in-depth analysis of what drives news cycles and how blind people interact with social networking sites.

Uber

Candidates’ experiences with Uber’s data science interviews were somewhat negative, with 61% of submitters saying they didn’t have a great experience. About as many candidates who reached the interview stage applied online as came in through a referral (35%). The interview process is rated average in difficulty, at 3.1.

This was a standard process with a phone call screen, a homework assignment that was timed to be done in two hours (split into SQL analysis and an open-ended problem with a sample dataset), and then an on-site interview series with a mix of technical and behavioral questions.

Uber’s technical questions are specific to Uber’s problems: you’ll be asked to deal with Poisson distributions, time series analysis, and problems related to how a driver should algorithmically accept bookings. Uber’s data science team is focused on optimizing fast, time-sensitive interactions, and it interviews accordingly.
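To make the Poisson piece concrete, here is a minimal Python sketch of the kind of warm-up you might practice. The scenario (ride requests arriving at an average of 3 per minute in one zone) and the function names are hypothetical illustrations, not an actual Uber interview question; the code uses only the standard library and the textbook formula P(X = k) = λ^k e^(-λ) / k!.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when events occur at average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k), computed as 1 minus the probability of 0 through k - 1 events."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Hypothetical practice scenario: ride requests in one zone arrive
# at an average of 3 per minute.
rate = 3.0
p_quiet = poisson_pmf(0, rate)    # a minute with no requests
p_busy = prob_at_least(5, rate)   # a minute with 5 or more requests
print(f"P(no requests)        = {p_quiet:.4f}")
print(f"P(5 or more requests) = {p_busy:.4f}")
```

Computing the upper tail as one minus the lower tail keeps the sum short; for large rates or production use, a statistics library would be the more robust choice.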

What the data science team at Uber is doing: This piece delves into the day-to-day of data science at Uber with Emi Wang, a current employee, who talks about alternating between writing production code, doing business analysis, and creating models for new projects, including reconciling supply and demand for Geosurge, Uber’s internal engine for surge pricing.

LinkedIn

LinkedIn interviews are largely positive, with twice as many positive responses as negative ones. Most candidates came in through online applications, so try your luck there! The interview process is rated slightly below average in difficulty, at 2.8.

A LinkedIn recruiter described the process as being a phone screen with a recruiter, a second phone screen with a team lead, then a fly-in interview. A lot of candidates received a take-home data science assignment that took anywhere between three and four hours.

LinkedIn data science interview questions revolved around areas of interest for LinkedIn, such as predicting employee salaries or working on features that have already been built (e.g., People You May Know). The LinkedIn team strongly values knowledge of Python and machine learning, though these are tested more at later stages. Earlier stages are designed to weed out weaker candidates through SQL and data mining questions.

What the data science team at LinkedIn is doing: Former LinkedIn Director of Product Data Science Daniel Tunkelang gave a brief overview of everybody on the product data science team at LinkedIn, and what they were working on in 2012. This includes updating the network stream so that it is more relevant for users and better at representing job titles.

Twitter

Twitter’s data science interview process received largely neutral reviews: 45% of responses were neutral, 27% positive, and 27% negative. Most applicants came in through online applications. The Twitter interview process was rated more difficult than average, at 3.5. Be prepared to be challenged.

Candidates reported getting replies quickly, though they described the overall process as quite long. First came an online coding test, then two phone calls, one on programming and one on statistical reasoning. Then came an on-site interview, which comprised two Skype calls, one focused on data science and one focused on coding.

The coding questions were fairly routine for a software engineering interview, but Twitter’s data science interview questions were open-ended and focused on what Twitter does now. Candidates were tested on their knowledge of A/B split tests, and they used collabedit.com for remote coding challenges. One candidate wrote that they received a lot of whiteboard questions on machine learning theory and algorithm design.

What the data science team at Twitter is working on: This Medium article shares the personal experience of a data scientist who has spent two years doing data science at Twitter. It covers work such as analyzing why certain countries have higher rates of multiple accounts and the causal factors that might go into that, and how many users are eligible for different notification types.

Airbnb

Many people reported a positive experience with interviews at Airbnb: 36% of responses were positive and 27% negative. Most interviews came through employee referrals; Airbnb seems to weigh its internal referral system heavily. The interview is rated more difficult than average, at 3.5.

The interview process is one of the few that has been described publicly at length, most notably by the head of data analytics at Airbnb. He describes using the phone screen to filter for people who have worked on data problems before, then a basic data challenge, then an in-house data challenge, followed by four interviews focused on culture fit and the ability to communicate with business partners.

The Glassdoor reviews confirm that this is the process in place: the take-home data challenge focuses on A/B split tests and the significance of certain results, and the in-house data challenge focuses on statistical modeling. You’ll want to be fluent in Python and R, because although the challenges are quite basic, they are tightly timed, so you’ll have to do your best with little time. The Airbnb data science team differentiates itself from the other teams in this analysis by caring deeply about how you think about the Airbnb product and how and when you’ve used it, so be prepared to field questions about your usage of the Airbnb app and what you think of it.
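Since both Twitter and Airbnb reportedly test A/B split tests and the significance of results, here is a minimal sketch of one common approach, a two-sided two-proportion z-test, in Python with only the standard library. The conversion counts are made up and the function name is a hypothetical illustration; this is not presented as Airbnb’s own method, and a real take-home challenge may expect a different test.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled proportion for the standard error, the usual choice
    under the null hypothesis that both groups share a single rate.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up results: control converts 200 of 5,000 visitors,
# the variant converts 250 of 5,000.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "not significant at the 5% level")
```

Pooling the proportions reflects the null hypothesis of no difference; an unpooled standard error is the more natural choice when building a confidence interval for the difference instead.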

What the data science team at Airbnb is working on: This piece describes how the data team is democratizing a data-driven culture across the Airbnb team.

Yelp

Most people applied online to get an interview at Yelp. The interview process was rated slightly above average in difficulty, at 3.3.

The process here is as follows: a timed online challenge, a phone screen, then an on-site interview with 4 individual face-to-face interviews.

Yelp has a fairly open culture that prides itself on sharing the tools it uses, much like Google. Yelp’s data science interview questions are fairly standard.

What the data science team at Yelp is working on: This article describes a sample project in which deep learning was used to classify restaurant images to determine whether they were images of food, or of the interior/exterior of the restaurant.

Google

Most Google interviews were positive, with 60% of submitters reporting a positive experience. Employee referral was the top way to get an interview, with 50% of respondents citing it as their path. The interview process was rated the hardest of any company in this analysis, with a difficulty rating of 3.7.

The process consisted of an initial phone screen, a technically focused phone interview, and then an intense in-person cycle of hour-long interviews with several current Google employees. The phone screen mixes basic computer science and statistics questions, with a focus on parsing data in R and on SQL. Google’s data science interview questions focus on how well you can slice and dice data.
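As an illustration of the kind of SQL slicing described above, here is a small sketch using Python’s built-in sqlite3 module. The searches table, its columns, and its rows are invented for this example and do not come from any Google interview; the query shows a common GROUP BY / HAVING pattern for computing a metric per segment.

```python
import sqlite3

# Hypothetical event table: one row per search, with the user's country
# and a flag for whether they clicked a result.
rows = [
    ("US", 1), ("US", 0), ("US", 1),
    ("DE", 0), ("DE", 1),
    ("JP", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (country TEXT, clicked INTEGER)")
conn.executemany("INSERT INTO searches VALUES (?, ?)", rows)

# Slice by country: search volume and click-through rate per group,
# keeping only countries with at least two searches.
query = """
    SELECT country,
           COUNT(*)     AS searches,
           AVG(clicked) AS ctr
    FROM searches
    GROUP BY country
    HAVING COUNT(*) >= 2
    ORDER BY searches DESC
"""
results = conn.execute(query).fetchall()
for country, n, ctr in results:
    print(f"{country}: {n} searches, CTR {ctr:.2f}")
conn.close()
```

HAVING filters on the aggregated groups rather than individual rows, which is why the single Japanese search drops out while every row still counts toward the other groups.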

What the data science team at Google is working on: The “unofficial” Google data science blog shares a wealth of projects that the team is working on, and includes a primer on how to join Google as a data scientist.

JPMorgan

JPMorgan candidates came evenly from campus recruiting, online applications, and recruiters. The interview process was rated below average in difficulty, at 2.7.

The process is an initial 30-minute phone call followed by video interviews with the hiring manager and a more junior person, then face-to-face interviews with several people. JPMorgan is mostly interested in testing for financial knowledge as well as knowledge of machine learning. They also placed an emphasis on communication with business teams, at one point asking a candidate how they would explain linear regression to a non-technical team member.
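For the linear regression prompt, it helps to have the mechanics straight before simplifying them. A plain-language version is: draw the straight line that sits as close as possible to all the points, where "close" means the smallest total squared vertical distance. The minimal Python sketch below fits that ordinary least squares line by hand; the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: x might be years as a customer, y the number of products held.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The slope is the piece to translate for a non-technical audience: with this made-up data, each one-unit increase in x goes with roughly 1.9 more units of y on average.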

What the data science team at JPMorgan is working on: JPMorgan uses Hadoop to take large amounts of customer and transactional data and combine them with social media mentions to get a complete view of the customers it serves.

Conclusion

The world of data science is full of potential, as companies look to leverage their data for insights that will help them compete on the frontiers of the 21st-century economy. With the insights we’ve garnered here, we hope you can turn this knowledge into actionable steps and break into a data science career at a top team.