Data Science Interview Questions and Solutions — Linear and Logistic regression

Linear Regression and Logistic Regression

Linear and logistic regression are among the most commonly used ML algorithms. In a data science interview you can usually expect at least one or two questions on this topic. They are the basis of many other ML algorithms, so a mistake in answering these questions during an interview might be the end of the interview.

Some questions on linear and logistic regression that are frequently asked in interviews:

1. Explain linear regression in layman’s terms.
2. What is linear regression, and what is logistic regression?
3. What is the role of linear regression in EDA (exploratory data analysis)?
4. How do you know which regression model you should use?
5. (Given a dataset) Analyze this dataset and give me a model that can predict this response variable.

[Figure. Source: Wikipedia]

If one pen costs $b, ten pens cost $10b. This is the most classic layman’s example of linear regression. The simplest form of the regression equation, with one dependent and one independent variable, is y = c + b*x, where y is the estimated value of the dependent variable, c is a constant (the intercept), b is the regression coefficient (the slope), and x is the value of the independent variable. In the pen example, c = 0, y is the cost of the pens and x is the number of pens, so once we know the unit cost b of one pen we can calculate the cost of any number of pens. A more complex form of linear regression, with many independent variables, is used for housing price prediction.
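A minimal sketch of fitting y = c + b*x by ordinary least squares in plain Python. The pen prices are made-up illustration data, and `fit_simple_linear_regression` is a hypothetical helper name, not a library function. The slope b is the covariance of x and y divided by the variance of x, and the intercept c makes the fitted line pass through the point of means.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (c, b) for y = c + b*x, minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance(x, y) / variance(x) (the 1/n factors cancel)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b = cov_xy / var_x
    # Intercept: the least-squares line passes through (mean_x, mean_y)
    c = mean_y - b * mean_x
    return c, b


# Hypothetical pen data: each pen costs $2, so cost = 0 + 2 * count
pens = [1, 2, 5, 10]
costs = [2.0, 4.0, 10.0, 20.0]
c, b = fit_simple_linear_regression(pens, costs)
print(f"c = {c:.2f}, b = {b:.2f}")  # c = 0.00, b = 2.00
print("cost of 7 pens:", c + b * 7)
```

With real prices the points would not lie exactly on a line, and b would be the best-fitting unit cost rather than the exact one.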

For any scenario-based problem in an interview, it is easy to make the mistake of starting with a complex ML algorithm. Many interviewees start with whatever the problem superficially resembles, such as neural networks or SVMs. ALWAYS start with linear/logistic regression if possible. This establishes the most basic benchmark performance for the solution. Approach the question like a programming interview, where you start with a baseline solution and proceed to a more optimized one.
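The benchmark-first approach can be sketched as follows, assuming a regression task scored by mean squared error. The toy data and the helper names `mse` and `fit_line` are hypothetical. Step 0 is a trivial model that always predicts the mean, step 1 is linear regression, and any more complex model tried afterwards (an SVM, a neural network) should be judged against `linear_mse`.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Closed-form least squares for y = c + b*x; returns (c, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


# Hypothetical toy data, roughly y = 3x + 1 with small deviations
xs = [0, 1, 2, 3, 4, 5]
ys = [1.2, 3.9, 7.1, 9.8, 13.2, 15.9]

# Step 0: trivial baseline that always predicts the mean of y
mean_y = sum(ys) / len(ys)
baseline_mse = mse(ys, [mean_y] * len(ys))

# Step 1: linear regression becomes the benchmark to beat
c, b = fit_line(xs, ys)
linear_mse = mse(ys, [c + b * x for x in xs])

print(f"mean baseline MSE: {baseline_mse:.3f}")
print(f"linear regression MSE: {linear_mse:.3f}")
```

If a complex model cannot clearly beat `linear_mse`, its extra complexity is hard to justify in an interview answer.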

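Linear regression is used for continuous targets, while logistic regression is used for binary targets: the sigmoid function in the logistic model squashes the linear combination of the features into a probability between 0 and 1, which is then thresholded (typically at 0.5) to predict class 0 or 1.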
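A small sketch of how logistic regression turns features into a class label, assuming two features and hand-picked, hypothetical weights rather than trained ones. The linear part `z` is the same as in linear regression; the sigmoid maps it to a probability in (0, 1), and a 0.5 threshold turns that probability into class 0 or 1. The sigmoid is written in a numerically stable form so `math.exp` does not overflow for large negative scores.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    # Linear part, identical to linear regression: w . x + bias
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    # Threshold the probability to get a hard 0/1 label
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical parameters, chosen for illustration only
weights, bias = [1.5, -2.0], 0.25
for x in ([2.0, 0.5], [0.0, 1.0]):
    p = predict_proba(weights, bias, x)
    print(x, round(p, 3), predict_class(weights, bias, x))
```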

This topic brings to light two types of supervised learning algorithms: classification (logistic regression) and regression (linear regression).

Exploratory data analysis (EDA) is something every data scientist does on every project. It is the analysis done before a predictive model is applied to a dataset. During EDA we find the characteristics of the dataset, plot graphs and decide which features we will use. We also get an idea of how to prepare the data, what challenges to expect (for example, feature selection) and how to measure the model: accuracy, precision-recall or ROC AUC for a classifier, or mean squared error and Pearson correlation for a regressor. Modeling should usually start with linear or logistic regression. Once these simple models are benchmarked and have revealed the main characteristics of the dataset, the behavior of more complex models becomes easier to understand. Sometimes linear or logistic regression alone gives great benchmark results: on the MNIST dataset, logistic regression reaches well over 90% accuracy, which is a great outcome for a preliminary analysis. For selecting between different regression models, there are some articles which explain the process here.
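Several of the metrics mentioned above can be written in a few lines of plain Python, a useful check that you understand them before reaching for a library such as scikit-learn's `sklearn.metrics` module. This is a sketch with hypothetical labels and predictions; the helper names are illustrative, not a library API.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average squared difference, the usual linear regression metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical classifier output
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print("accuracy:", accuracy(y_true, y_pred))
print("precision, recall:", precision_recall(y_true, y_pred))

# Hypothetical regressor output
print("MSE:", mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))
print("Pearson r:", pearson_r([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0]))
```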

It is worth investing time in understanding linear and logistic regression in depth, including how both models are derived. They are the basis of many ML models, so most interviewers want to dig deeper into them. Neural networks are one example: each neuron, the building block of the network, takes the inputs, computes the dot product with its weights, adds a bias and then applies a nonlinear function. When that nonlinear function is the sigmoid, a single neuron computes exactly a logistic regression model.
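For logistic regression with p = sigmoid(w·x + b) and the average log-loss L = -(1/n) Σ [y log p + (1 - y) log(1 - p)], the standard derivation gives the gradients ∂L/∂w = (1/n) Σ (p - y) x and ∂L/∂b = (1/n) Σ (p - y). The sketch below implements a single sigmoid "neuron" and trains it with batch gradient descent on exactly these gradients, which makes the neuron-as-logistic-regression point concrete. The toy data, learning rate, epoch count and helper names are all hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(weights, bias, x):
    # Dot product of inputs and weights, plus bias, through a sigmoid:
    # this is the logistic regression model p = sigmoid(w . x + b)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the average log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = neuron(weights, bias, xi) - yi  # (p - y)
            for j in range(n_features):
                grad_w[j] += error * xi[j] / n  # (1/n) * sum (p - y) x
            grad_b += error / n  # (1/n) * sum (p - y)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical 1-D data: class 1 when x is large
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)
preds = [1 if neuron(weights, bias, xi) >= 0.5 else 0 for xi in X]
print("weights:", weights, "bias:", bias)
print("predictions:", preds)
```

Linear regression can be derived the same way: replacing the sigmoid with the identity and the log-loss with mean squared error gives gradients of the same (prediction - y) * x form, up to a constant factor.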

Conclusion

The linear and logistic regression models are an indispensable part of a data scientist’s problem-solving toolkit, and hence an indispensable part of a data science interview as well. These basic algorithms help you do better data analysis, learn the basics of ML and understand neural networks well.