A sneak peek into what Linear Regression is and how it works.

Linear regression is a simple machine learning method that you can use to predict the value of an observation based on the relationship between the target variable and independent, linearly related numeric predictive features.

For example, imagine you have a dataset that describes key characteristics of a set of homes, such as land acreage, number of storeys, building area, and sales price. Based on these features and their relationship with the sales price of these homes, you could build a multivariate linear model that predicts the price a house can be sold for.

Linear regression is a statistical machine learning method you can use to quantify relationships between numerical variables and make predictions based on them. It assumes that the data is free from missing values and outliers.

It also assumes that there’s a linear relationship between the predictors and the predictand, and that all predictors are independent of each other.

Lastly, it assumes that residuals are normally distributed.
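The assumptions above can be checked roughly before trusting a model. The sketch below is one possible way to do it using NumPy only. The toy data, the 3-standard-deviation outlier rule, and the use of skewness as a quick symmetry check are all choices made for this illustration, not formal statistical tests.

```python
import numpy as np

# Toy data for illustration only (assumed, not from this article's notebook).
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 10.0, 300)
x2 = rng.uniform(0.0, 10.0, 300)  # generated independently of x1
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(0.0, 1.0, 300)

# 1. Missing values: count NaNs across all columns.
data = np.column_stack([x1, x2, y])
n_missing = int(np.isnan(data).sum())

# 2. Outliers: flag values more than 3 standard deviations from the column mean
#    (a common rule of thumb, not a formal test).
z_scores = (data - data.mean(axis=0)) / data.std(axis=0)
n_outliers = int((np.abs(z_scores) > 3).sum())

# 3. Independence of predictors: a correlation near 0 suggests no strong linear dependence.
predictor_corr = float(np.corrcoef(x1, x2)[0, 1])

# 4. Residuals: fit by least squares, then check they are centered and roughly symmetric.
design = np.column_stack([np.ones_like(x1), x1, x2])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coefs
residual_mean = float(residuals.mean())
residual_skew = float(np.mean(residuals**3) / np.std(residuals) ** 3)

print(n_missing, n_outliers, predictor_corr, residual_mean, residual_skew)
```

A skewness far from zero would hint that the residuals are not normally distributed, which is exactly the situation the last assumption rules out.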

Ready for a mini-project?

We have all the libraries we need in our Jupyter Notebook. Now let’s set up our plotting parameters. We want matplotlib to plot inline within our Jupyter Notebook, so we will use the %matplotlib inline magic command, and then set the dimensions of our data visualizations to 10 inches wide and eight inches high.



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

from pylab import rcParams
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale

%matplotlib inline
rcParams['figure.figsize'] = 10, 8

We’re going to create some synthetic data in order to do a linear regression. Let’s first create a variable called ‘rooms’. We set rooms equal to two times a set of random numbers plus three. The random numbers come from np.random.rand, which draws uniformly between 0 and 1, and we pass in 100 and 1 to get 100 rows in a single column. This is the equation we’re using to create a synthetic variable, with values between 3 and 5, that represents the number of rooms in a home. Slicing with rooms[1:10] then shows the records at positions 1 through 9.



rooms = 2 * np.random.rand(100, 1) + 3
rooms[1:10]

array([[4.04467357],
       [3.77241135],
       [3.14321164],
       [4.48142986],
       [3.18493126],
       [3.8132922 ],
       [4.72655406],
       [3.08916389],
       [3.89772928]])
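The snippet above calls np.random.rand, which is easy to mix up with np.random.randn. Here is a small sketch of the difference, with a seed added so the numbers repeat from run to run. The seed value 42 and the variable names are arbitrary choices for this example.

```python
import numpy as np

# Seeding makes the "random" numbers repeatable from run to run.
# The seed value 42 is an arbitrary choice for this sketch.
np.random.seed(42)

# rand: uniform on [0, 1), so 2 * rand + 3 always falls in [3, 5).
rooms_uniform = 2 * np.random.rand(100, 1) + 3

# randn: standard normal, so 2 * randn + 3 is centered on 3 and can go negative.
rooms_normal = 2 * np.random.randn(100, 1) + 3

print(rooms_uniform.shape, rooms_uniform.min(), rooms_uniform.max())
print(rooms_normal.shape, rooms_normal.min(), rooms_normal.max())
```

For a room count, the bounded uniform version is the safer pick, since it can never produce a negative number of rooms.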

Now, let’s create a synthetic variable called ‘price’. We’ll say that price is equal to 265 plus six times the number of rooms plus some random noise. For the noise, we call the abs function on the output of a random number generator, this time np.random.randn, again passing in 100 and 1, so the noise is never negative. Then let’s take a look at a few records with price[1:10], which shows positions 1 through 9.



price = 265 + 6 * rooms + abs(np.random.randn(100, 1))
price[1:10]

array([[290.20050075],
       [287.83631918],
       [284.26968068],
       [292.46209605],
       [285.20161696],
       [288.07388113],
       [293.77699261],
       [284.59783984],
       [289.71316513]])
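One detail worth noticing: because the noise is wrapped in abs, it is never negative, so every price sits on or above the line 265 + 6 * rooms. The absolute value of a standard normal variable follows a half-normal distribution with mean sqrt(2 / pi), so a fitted intercept can be expected to land somewhat above 265. A quick sketch to check that mean (the seed and sample size are arbitrary choices):

```python
import numpy as np

# abs(randn) follows a half-normal distribution; its mean is sqrt(2 / pi), about 0.798.
expected_noise_mean = np.sqrt(2 / np.pi)

# Use a large sample so the empirical mean sits close to the theoretical one.
# The seed is arbitrary.
rng = np.random.default_rng(7)
noise = np.abs(rng.standard_normal(1_000_000))
empirical_noise_mean = float(noise.mean())

print(expected_noise_mean, empirical_noise_mean)
```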

Now, let’s create a scatter plot of our synthetic variables just so we get an idea of what they look like and the relationship between them. To do that, we call the plot function, plt.plot, and pass in rooms and price. rooms is going to be on our x-axis and price is going to be on our y-axis. Let’s also pass the string 'r.', which specifies that red points should be plotted instead of the default line plot.



plt.plot(rooms, price, 'r.')
plt.xlabel("no. of rooms,2020 Average")
plt.ylabel("2020 Avg home price")
plt.show()

To see the plot, take a look at the cover image :)

What this plot says is that as the number of rooms increases, the price of the house increases.

Makes sense, right?

So now, let’s do a really simple linear regression. For our model, we’re going to use rooms as the predictor, so X is equal to rooms, and we want to predict the price, so y is equal to price. Let’s instantiate a linear regression object called LinReg by setting it equal to LinearRegression(), and then fit the model to the data by calling LinReg.fit and passing in our variables X and y. Finally, we print the fitted intercept and coefficient.



X = rooms
y = price

LinReg = LinearRegression()
LinReg.fit(X, y)
print(LinReg.intercept_, LinReg.coef_)

[265.39215904] [[6.10708427]]

Since rooms is the only feature in this model, a 1 unit increase in rooms is associated with an increase of 6.10708427 in price. That is close to the slope of 6 we used to generate the data.
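With a single predictor, the intercept and coefficient can also be worked out by hand, which helps show what the fit is actually doing. Below is a minimal sketch using the closed-form formulas for simple linear regression. It regenerates synthetic data with the same recipe but an arbitrary seed, so the numbers will differ slightly from the outputs shown above, and the 4-room home is a hypothetical example.

```python
import numpy as np

# Regenerate synthetic data with the same recipe as above; the seed is an
# arbitrary choice, so the numbers will differ from the article's outputs.
rng = np.random.default_rng(0)
rooms = 2 * rng.random((100, 1)) + 3
price = 265 + 6 * rooms + np.abs(rng.standard_normal((100, 1)))

x = rooms.ravel()
y = price.ravel()

# Simple linear regression in closed form:
# slope = covariance(x, y) / variance(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Predict the price of a hypothetical 4-room home.
predicted_price = intercept + slope * 4.0
print(intercept, slope, predicted_price)
```

The prediction is just the intercept plus the slope times the number of rooms, which is the same arithmetic a fitted model performs when asked to predict.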

The intercept (often labeled as the constant) is the point where the regression line crosses the y-axis, that is, the predicted value of y when x is zero. In some analyses, the regression model only becomes significant when we remove the intercept, and the regression line reduces to Y = bX + error. A regression without a constant means that the regression line goes through the origin, where the dependent variable and the independent variable are both equal to zero.
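To see what dropping the constant does in practice, here is a rough sketch comparing a fit with an intercept against a fit forced through the origin. It uses NumPy only, with the same synthetic recipe and an arbitrary seed. In scikit-learn, the equivalent switch is the fit_intercept parameter of LinearRegression, which can be set to False.

```python
import numpy as np

# Same synthetic recipe as above; the seed is arbitrary.
rng = np.random.default_rng(0)
x = 2 * rng.random(100) + 3
y = 265 + 6 * x + np.abs(rng.standard_normal(100))

# With an intercept: y = a + b * x
design = np.column_stack([np.ones_like(x), x])
(a, b), *_ = np.linalg.lstsq(design, y, rcond=None)

# Through the origin: y = b0 * x, with closed form b0 = sum(x * y) / sum(x * x)
b0 = np.sum(x * y) / np.sum(x * x)

# Compare fit quality through the residual sum of squares.
sse_with_intercept = np.sum((y - (a + b * x)) ** 2)
sse_through_origin = np.sum((y - b0 * x) ** 2)
print(a, b, b0, sse_with_intercept, sse_through_origin)
```

For this data, where prices start around 265 even for small homes, forcing the line through the origin inflates the slope and makes the fit much worse, so removing the intercept only makes sense when a zero input really should give a zero output.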



print(LinReg.score(X, y))

0.9679030603885265

Our linear regression model is performing really well! The score method returns the R squared value of the model, which here is close to 1, and that’s a good thing!
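If you are curious where that number comes from, R squared is one minus the ratio of the residual sum of squares to the total sum of squares. Here is a minimal sketch that computes it by hand with NumPy, using the same synthetic recipe with an arbitrary seed, so the value will not match the one above exactly.

```python
import numpy as np

# Same synthetic recipe as above; the seed is arbitrary.
rng = np.random.default_rng(0)
x = 2 * rng.random(100) + 3
y = 265 + 6 * x + np.abs(rng.standard_normal(100))

# Fit a straight line (degree-1 polynomial): polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)   # variation the line fails to explain
ss_tot = np.sum((y - y.mean()) ** 2) # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

Keep in mind that this score is measured on the same data the model was fitted on, so it says how well the line describes this sample, not how well it would predict new homes.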

This was just a small sneak peek into what Linear Regression is. I hope you got an idea as to how Linear Regression works through the mini-project!

Feel free to respond to this blog below for any doubts and clarifications!