Let’s start with a simple problem. Suppose we have a small dataset of house prices for a specific area in a city. The dataset contains two fields, the size of the house and its price (SIZE, PRICE), and I would like to know the price of a house of a specific size. The problem is that this size isn’t in my dataset. What should I do?

We already know from the title that the solution is linear regression, but to explain it more easily, I’ve collected a small dataset of house prices. The table below shows a snippet from it:

Table 1 : House prices dataset

Visualization helps a lot in identifying patterns in data, so to get a better view of our dataset, I’m going to plot it using the matplotlib Python library:
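A scatter plot like the one in figure 1 can be produced in a few lines. The values below are purely illustrative placeholders (the real numbers are the ones in table 1), and the column names SIZE and PRICE follow the fields mentioned earlier:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt

# Hypothetical values standing in for the dataset of table 1.
data = pd.DataFrame({
    "SIZE":  [65, 80, 95, 110, 130],   # size of the house (illustrative)
    "PRICE": [70, 90, 105, 125, 150],  # price of the house (illustrative)
})

plt.scatter(data["SIZE"], data["PRICE"])
plt.xlabel("SIZE")
plt.ylabel("PRICE")
plt.title("House prices")
plt.savefig("house_prices.png")
```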

Figure 1 : House prices

From the plot we can see that the price grows with the size, but the points don’t form a perfect line that would let us predict the price for a new size. So we need to find a linear function h(x) that passes close to all the points, though not necessarily through them. We call this function the hypothesis:

Equation 1 : the hypothesis

θ0 and θ1 are unknown parameters that we need to identify in order to obtain the linear function h(x). In figure 2, the green lines are the distances between the predicted and the real prices; we need to find a combination (θ0, θ1) such that the sum of all these distances is as small as possible. In mathematical notation:
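Written out explicitly, the one-variable linear hypothesis of equation 1 has the standard form:

```latex
h(x) = \theta_0 + \theta_1 x
```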

Equation 2 : The objective function

In equation 2, m is the size of our dataset, x⁽ⁱ⁾ is the ith size and y⁽ⁱ⁾ is the ith price in the dataset. We call J the error function (or objective function), and it is what we need to minimize.

Figure 2 : Distance between the hypothesis and dataset

There are other error functions (or estimators) in statistics that we could use, but in our case we’ll use the MSE, or mean squared error, estimator, because it makes finding our unknown parameters easier. Our function becomes:

Equation 3 : MSE estimator
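In its usual form, the MSE objective of equation 3 reads as follows (the 1/2 factor is a common convention that cancels when taking the derivative):

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2
```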

The estimator J takes two arguments, which means it can be drawn as a 3D surface; figure 3 shows what the function looks like in a 3D graph. Our goal is to find the minimum value, the lowest point in the graph below. Imagine dropping a ball onto the surface: the ball will slide down to the bottom of the shape.
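We can get a feel for this bowl shape numerically by evaluating J on a grid of (θ0, θ1) values. The data below is hypothetical, generated from price = 10 + 1.1 × size, so we know in advance that the bottom of the bowl sits near (10, 1.1):

```python
import numpy as np

# Hypothetical data generated from price = 10 + 1.1 * size,
# so the minimum of J should be near (theta0, theta1) = (10, 1.1).
sizes  = np.array([65.0, 80.0, 95.0, 110.0, 130.0])
prices = 10.0 + 1.1 * sizes
m = len(sizes)

def J(theta0, theta1):
    """Mean squared error of the hypothesis h(x) = theta0 + theta1 * x."""
    predictions = theta0 + theta1 * sizes
    return ((predictions - prices) ** 2).sum() / (2 * m)

# Evaluate J on a coarse grid around the true parameters.
t0_values = np.linspace(0.0, 20.0, 41)
t1_values = np.linspace(0.5, 1.7, 41)
errors = np.array([[J(t0, t1) for t1 in t1_values] for t0 in t0_values])

# Index of the lowest point on the grid: the bottom of the bowl.
i, j = np.unravel_index(errors.argmin(), errors.shape)
print(t0_values[i], t1_values[j])
```

Plotting the `errors` array as a surface (for example with `plot_surface` from mplot3d) reproduces the bowl of figure 3.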

Figure 3 : J(θ0, θ1) plotting

To find the lowest point of the surface, in other words to minimize the objective function, we’ll use the gradient descent algorithm, which is very simple to understand. To reach the bottom of the shape, we first pick a random point on the graph, that is, we set θ0 and θ1 to random values. At that point we need to decide: do we go up or down? In math, to know whether a function is increasing or decreasing we use the gradient, or slope, so we compute the derivative of J at the current point and update our position on the surface by subtracting the gradient from θ0 and θ1. After n iterations, J(θ0, θ1) will be close to the lowest value possible.
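Concretely, differentiating the MSE objective with respect to each parameter gives the two slopes used in the updates (the per-sample algorithm below applies them one data point at a time, dropping the sum):

```latex
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right),
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x^{(i)}
```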

Below is the gradient descent algorithm for a hypothesis h(x) with one feature. We use 𝛂, the learning rate, to control the speed of the updates, and we should choose it carefully: if 𝛂 is too large the algorithm may overshoot and never converge, or, to continue the analogy, the ball will not settle at the bottom of the shape if we throw it with too much speed. 𝛂 is usually between 0 and 1.

Loop {
    for i = 1 to m {
        θ₀ := θ₀ - 𝛂(h(x⁽ⁱ⁾) - y⁽ⁱ⁾)
        θ₁ := θ₁ - 𝛂(h(x⁽ⁱ⁾) - y⁽ⁱ⁾)x⁽ⁱ⁾
    }
}

Now we have reached the fun part: below is an implementation in Python, using numpy, pandas and matplotlib. The LinearRegression class takes three arguments: the dimension of the hypothesis (the number of parameters, in our case just one), the learning rate 𝛂, and max_iteration, the maximum number of iterations of the main loop in the algorithm above. The train method takes two arguments, the training input and the expected output; in our case the input is the size and the output is the price from our dataset. Finally, __call__ overrides the () operator, which lets us call our object like a function: it is the hypothesis function.
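In case the listing does not render here, a minimal sketch of such a class, following the interface just described, could look like the code below (the default values for 𝛂 and max_iteration are assumptions, not the article’s exact choices):

```python
import numpy as np

class LinearRegression:
    """Linear regression trained by per-sample gradient descent."""

    def __init__(self, dimension=1, alpha=0.0001, max_iteration=1000):
        # One weight per input dimension, plus the intercept theta_0.
        self.theta = np.zeros(dimension + 1)
        self.alpha = alpha
        self.max_iteration = max_iteration

    def train(self, x, y):
        """Fit theta to inputs x (sizes) and expected outputs y (prices)."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        for _ in range(self.max_iteration):
            for xi, yi in zip(x, y):
                error = self(xi) - yi
                # Per-sample update from the algorithm above.
                self.theta[0] -= self.alpha * error
                self.theta[1] -= self.alpha * error * xi

    def __call__(self, x):
        # The hypothesis h(x) = theta_0 + theta_1 * x.
        return self.theta[0] + self.theta[1] * x
```

With a model trained on the dataset, something like `model(120)` would then estimate the price of a house of size 120. Note that for large input values, 𝛂 must be kept small or the per-sample updates will diverge.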

The animation below shows the state of our hypothesis line after each iteration: