A curated list of 70 data science interview questions

Interviewing for a role as a data scientist or analyst? You've come to the right place! Below we've curated a list of data science interview questions from multiple sources to help make your preparation easier.

For a data science or data analyst interview, expect questions on a wide range of topics: statistics, programming (Python and SQL), data modeling (including machine learning), and overall business acumen. This guide contains 70 data science interview questions, grouped by high-level topic.

If you're interested in practicing further (and having the option to receive solutions), sign up for our email newsletter, where we send a few interview questions per week.

Statistics

Statistics provides the guiding principles for collecting, organizing, and interpreting data. Sounds pretty core to data science, huh? Below are some data science interview questions covering statistics.

What is the importance of the Central Limit Theorem?
What is sampling?
Can you provide an example of a sampling method?
Can you provide an example of a time in the past when you needed to use sampling?
What is a Type I error, and how is it different from a Type II error?
How would you explain a linear regression to a non-technical person?
What are the assumptions of linear regression?
What is selection bias?
How would you explain a logistic regression to a non-technical person?
Be prepared to answer conditional probability questions and Bayesian probability questions.
What are the assumptions of a logistic regression?
Explain collinearity to me.
What are the key factors in running a successful A/B test?
What is a confidence interval and how do you interpret it?
How would you estimate the disease probability in a city given the probability is very low nationwide?
If you have a monthly collection of time series data, how can you tell if there is a "significant" difference between this month and previous month data?
In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?
You have a 50-50 mixture of two normal distributions with the same standard deviation. How far apart do the means need to be in order for this distribution to be bimodal?
A couple tells you that they have two children, at least one of which is a girl. What is the probability that they have two girls?
What is the probability of rolling a total of 4 or 7 with two six-sided dice?
On a dating site, users can select 1 out of 10 adjectives to describe themselves. A match is declared between two users if they match on at least 4 adjectives. Given a pool of 10 independent users, what is the probability that at least two will be a match?
You have two coins, one of which is fair and comes up heads with probability 1/2, and the other of which is biased and comes up heads with probability 3/4. You randomly pick a coin and flip it twice, and get heads both times. What is the probability that you picked the fair coin?
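
Several of the probability questions above can be checked with a few lines of exact arithmetic. The sketch below is one way to work through four of them, not an official answer key. It assumes the four 15-minute intervals are independent, that boys and girls are equally likely and independent across births, and that "rolling a 4 or 7" refers to the total of the two dice. All variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Shooting star: P(at least one in an hour) = 1 - P(none in four intervals),
# assuming the four 15-minute intervals are independent.
p_quarter = Fraction(1, 5)
p_hour = 1 - (1 - p_quarter) ** 4

# Two children, at least one girl: enumerate the equally likely
# (older, younger) pairs and condition on "at least one girl".
families = list(product("BG", repeat=2))
with_girl = [f for f in families if "G" in f]
p_two_girls = Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl))

# Two six-sided dice: count the outcomes whose total is 4 or 7.
rolls = list(product(range(1, 7), repeat=2))
p_four_or_seven = Fraction(sum((a + b) in (4, 7) for a, b in rolls), len(rolls))

# Two coins: Bayes' rule with a 1/2 prior on each coin and two heads observed.
prior = Fraction(1, 2)
like_fair = Fraction(1, 2) ** 2
like_biased = Fraction(3, 4) ** 2
p_fair_given_hh = (prior * like_fair) / (prior * like_fair + prior * like_biased)

print("shooting star in an hour:", p_hour)          # 369/625
print("two girls given at least one:", p_two_girls)  # 1/3
print("total of 4 or 7:", p_four_or_seven)           # 1/4
print("fair coin given HH:", p_fair_given_hh)        # 4/13
```

Using `Fraction` keeps the results exact, which makes it easier to compare them with a hand derivation in an interview.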

Modeling/Machine Learning

Understanding the fundamentals of modeling data will be important for your data science interview. A lot of data science is interpreting large data sets, and making sense of what is happening. Modeling can play a pivotal role in efficiently making sense of large, multidimensional data sets.

Can you explain the process of developing an ML algorithm?
How would you effectively represent data with 5 dimensions?
How is kNN different from k-means clustering?
How would you create a logistic regression model?
What are precision and recall? How do they relate to the ROC curve?
Explain the difference between L1 and L2 regularization methods.
What is one way that you would handle an imbalanced dataset that’s being used for prediction?
Do you think 50 small decision trees are better than a large one? Why?
Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy?
What is random forest?
If you have two algorithms, how do you know one performs better than the other?
Given location data of golf balls in games, how would you construct a model that can advise golfers where to aim?
You have data on all purchases of customers at a grocery store. Describe to me how you would program an algorithm that would cluster the customers into groups. How would you determine the appropriate number of clusters to include?
What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?
What are various ways to predict a binary response variable? Can you compare two of them and tell me when one would be more appropriate? What’s the difference between these? (SVM, Logistic Regression, Naive Bayes, Decision Tree, etc.)
Given a database of all previous alumni donations to your university, how would you predict which recent alumni are most likely to donate?
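
For the precision/recall and imbalanced-class questions above, it can help to have a tiny worked example in mind. The following is a minimal sketch using only the standard library; the function name `precision_recall` and the label lists are hypothetical and chosen purely for illustration. Recall is the true positive rate, which is the vertical axis of the ROC curve; precision does not appear on the ROC curve directly.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for one positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Convention used in this sketch: return 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical balanced example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
balanced = precision_recall(y_true, y_pred)

# Hypothetical imbalanced example: 5 positives out of 100, classifier predicts all 0.
imb_true = [1] * 5 + [0] * 95
imb_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(imb_true, imb_pred)) / len(imb_true)
imbalanced = precision_recall(imb_true, imb_pred)

print(balanced)              # (0.75, 0.75)
print(accuracy, imbalanced)  # 0.95 (0.0, 0.0)
```

The second case shows why accuracy alone can mislead on imbalanced classes: the classifier is 95% accurate while finding none of the positives.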

Programming

Another large portion of data science is aggregating and processing the data. Most data sets are super messy and require a lot of pre-processing, which is why employers want data scientists to understand the fundamentals of programming. For these problems, explaining your logic out loud to the interviewer is the most important thing to do. Once you come up with a solution, you’ll probably be asked to explain the run-time of your algorithm and come up with a way to make it run faster. Below are some practice data science interview questions covering Python and general programming.

What is your favorite programming language and why?
Tell me about an algorithm you’ve recently created.
How would you clean a data set in Python (or any other language)?
{Whiteboard} Given a list A of objects, and another list B which is identical to A except that one element is removed, how would you find which element is missing?
{Whiteboard} Write a Python function that displays the first n Fibonacci numbers.
{Whiteboard} Write a function that prints the least integer that is not present in a given list and cannot be represented by the summation of the sub-elements of the list.
Given a binary tree and a node, print all cousins of the given node. Note that siblings should not be printed.
Given a list of tweets, determine the top 10 most used hashtags.
You have a stream of data coming in of size n, but you don’t know what n is ahead of time. Can you write an algorithm that will take a random sample of k elements?
Can you write an algorithm that can calculate the square root of a number?
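
Three of the whiteboard questions above have compact, well-known solutions. The sketch below shows one possible approach to each: a multiset difference for the missing element (assuming the list elements are hashable), reservoir sampling (often called Algorithm R) for the stream of unknown size, and Newton's method for the square root. The function names and the tolerance value are illustrative choices, and an interviewer may expect a different technique.

```python
import random
from collections import Counter


def find_missing(a, b):
    """Return the element of a that b lacks; O(n) time, handles duplicates."""
    diff = Counter(a) - Counter(b)  # keeps only elements with a positive count
    return next(iter(diff))


def reservoir_sample(stream, k, rng=random):
    """Uniform random sample of k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item  # keep the new item with probability k/(i+1)
    return reservoir


def newton_sqrt(x, tol=1e-12):
    """Approximate the square root of a non-negative number with Newton's method."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = max(x, 1.0)
    while True:
        nxt = 0.5 * (guess + x / guess)
        if abs(nxt - guess) <= tol * nxt:  # stop on a small relative change
            return nxt
        guess = nxt


print(find_missing([3, 1, 4, 1, 5], [1, 3, 5, 1]))  # 4
print(reservoir_sample(iter(range(1000)), 5))       # five items; varies per run
print(newton_sqrt(2.0))
```

The reservoir sampler makes a single pass and stores only k items, which is the point of the question: it never needs to know n in advance.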

SQL

As a data scientist, being able to pull data from the source is very important to a lot of employers. Knowing basic SQL will help you glide right through the database questions of your interview with ease.

What does the GROUP BY clause in SQL do?
What is the difference between an inner join and a union?
How do you eliminate duplicate rows from a query result?
Given an impressions table with ad_id, click (a boolean indicator that the ad was clicked), and date, can you write a SQL query that will tell me the monthly click-through rate across all ads?
Can you write a query that returns the name of each department and a count of the number of employees in each? The table schemas are:

EMPLOYEES containing: Emp_ID (Primary key) and Emp_Name

EMPLOYEE_DEPT containing: Emp_ID (Foreign key) and Dept_ID (Foreign key)

DEPTS containing: Dept_ID (Primary key) and Dept_Name
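
To practice the two query questions above without a database server, one option is Python's built-in `sqlite3` module. The sketch below is one possible answer under stated assumptions: the sample rows are invented for illustration, `click` is stored as 0/1, `date` is stored as ISO text, and the month is extracted with SQLite's `strftime` (other databases use different date functions). A LEFT JOIN is used so that departments with no employees still appear with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEES (Emp_ID INTEGER PRIMARY KEY, Emp_Name TEXT);
CREATE TABLE DEPTS (Dept_ID INTEGER PRIMARY KEY, Dept_Name TEXT);
CREATE TABLE EMPLOYEE_DEPT (
    Emp_ID INTEGER REFERENCES EMPLOYEES(Emp_ID),
    Dept_ID INTEGER REFERENCES DEPTS(Dept_ID)
);
CREATE TABLE impressions (ad_id INTEGER, click INTEGER, date TEXT);

-- Hypothetical sample rows, for illustration only.
INSERT INTO EMPLOYEES VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO DEPTS VALUES (10, 'Analytics'), (20, 'Research'), (30, 'Sales');
INSERT INTO EMPLOYEE_DEPT VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO impressions VALUES
    (1, 1, '2024-01-05'), (1, 0, '2024-01-09'),
    (2, 0, '2024-01-20'), (2, 0, '2024-01-21'),
    (1, 1, '2024-02-02'), (2, 0, '2024-02-03');
""")

# Department name with employee count; COUNT(ed.Emp_ID) ignores the NULLs
# produced by the LEFT JOIN, so empty departments report 0.
dept_counts = conn.execute("""
    SELECT d.Dept_Name, COUNT(ed.Emp_ID) AS num_employees
    FROM DEPTS AS d
    LEFT JOIN EMPLOYEE_DEPT AS ed ON ed.Dept_ID = d.Dept_ID
    GROUP BY d.Dept_ID, d.Dept_Name
    ORDER BY d.Dept_Name
""").fetchall()

# Monthly click-through rate across all ads: clicks divided by impressions,
# which equals the average of the 0/1 click column.
monthly_ctr = conn.execute("""
    SELECT strftime('%Y-%m', date) AS month, AVG(click) AS ctr
    FROM impressions
    GROUP BY month
    ORDER BY month
""").fetchall()

print(dept_counts)  # [('Analytics', 2), ('Research', 1), ('Sales', 0)]
print(monthly_ctr)  # [('2024-01', 0.25), ('2024-02', 0.5)]
```

Whether empty departments should be listed is worth asking the interviewer; switching the LEFT JOIN to an INNER JOIN would drop them.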

Behavioral + Cultural fit

You can be a great data scientist or data analyst, but if you don’t mesh with the culture, that could prevent you from succeeding in the role or from being hired in the first place. Because of this, it's important to have a few situations prepared before interviewing that highlight your character and capabilities! Keep it cool, and let your personality shine through!

Introduce me to something you’re passionate about.
Tell me about a time you failed, and what you have learned from it.
What have you done in your previous job that you are really proud of?
What’s a project you would want to work on at our company?
Tell me about a time where you resolved a conflict.
Tell me about a time when you took initiative.
Suppose you encounter a tedious, boring task. How would you deal with it and motivate yourself to complete it?
What are your top 5 predictions for the next 20 years?
What personality traits do you butt heads with?
If you had one superpower, what would it be?

Interested in practicing for data scientist or analyst interviews?

We send 3 questions each week to thousands of data scientists and analysts preparing for interviews or just keeping their skills sharp. You can sign up to receive the questions for free on our home page.