Code insight:

The code above is divided into 5 steps.

1. Load and pretty-print the dataset.

Fig: Importing the Dataset

from sklearn.datasets import load_boston

boston = load_boston()

The first line imports the load_boston function from the datasets module of the sklearn library.

The second line calls load_boston() and stores the returned dataset in the boston variable for further use. Simple enough…

import pprint

pprint.pprint(boston)

Here we import the pprint library, which formats output so that it is easier to read, and then we pretty-print the dataset.

Note: The last two pretty-print lines are not needed for prediction. They are only there to help you understand the contents of the dataset, so you don't have to print the data again once you know what it looks like.
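As a rough sketch of this step, the snippet below loads a dataset and pretty-prints an overview of it. One assumption to flag loudly: load_boston was removed from scikit-learn in version 1.2, so this sketch uses load_diabetes (another small regression dataset bundled with sklearn) as a stand-in. The variable name dataset is also just for illustration.

```python
from pprint import pprint

# Stand-in loader (assumption): load_boston was removed in scikit-learn 1.2,
# so load_diabetes is used here only to show the same loading pattern.
from sklearn.datasets import load_diabetes

dataset = load_diabetes()

# The loader returns a dict-like object, so its keys give a compact overview.
pprint(sorted(dataset.keys()))

# The feature matrix and the target vector are NumPy arrays.
print(dataset.data.shape)    # (number of samples, number of features)
print(dataset.target.shape)  # (number of samples,)

# Pretty-print the column names of the features.
pprint(list(dataset.feature_names))
```

Printing only the keys and shapes is usually easier to read than dumping the whole object, but pprint(dataset) works too if you want to see everything.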

2. Create the DataFrames.

Fig: Creating two DataFrames of data

In this step we create two DataFrames using the pandas library. A DataFrame is just a fancy word for a labeled table (rows and columns) of data. df_x contains the data, or features, of the houses, with columns = boston.feature_names, which is an array of column names present in the dataset that makes the table easier to understand. df_y contains the corresponding target prices.
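A minimal sketch of building the two DataFrames might look like the following. It assumes load_diabetes as a stand-in dataset (load_boston was removed in scikit-learn 1.2), and the column name "target" for df_y is an illustrative choice, not something taken from the original code.

```python
import pandas as pd

# Stand-in dataset (assumption): used only because load_boston is no longer
# available in recent scikit-learn versions.
from sklearn.datasets import load_diabetes

dataset = load_diabetes()

# Features: one row per sample, one named column per feature.
df_x = pd.DataFrame(dataset.data, columns=dataset.feature_names)

# Targets: a single column; the name "target" is an illustrative choice.
df_y = pd.DataFrame(dataset.target, columns=["target"])

# Peek at the first few rows of each table.
print(df_x.head())
print(df_y.head())
```

Passing the feature names as columns is what turns a bare array of numbers into a table you can read at a glance.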

3. Selecting the Model (i.e. LinearRegression)

Fig: Selecting LinearRegression() model

As simple as it can be: we import the LinearRegression model from the sklearn library and assign an instance of it, LinearRegression(), to the variable model for further use in the code.

4. Splitting Test and Train Datasets with randomness.

Fig: Train and Test Splitting

Here, we use the train_test_split function from the sklearn library. In the code above it is given four arguments: the data DataFrame, the target DataFrame, test_size, and random_state (a seed that makes the random shuffling reproducible).

Note: In the line above, train_size is automatically set to 0.8 because test_size = 0.2, so test_size + train_size = 0.2 + 0.8 = 1.
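To make the 0.2 / 0.8 arithmetic concrete, here is a small sketch on synthetic arrays. The data, the variable names, and random_state=42 are all assumptions chosen for illustration; the value used in the original code is not known.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.2 keeps 20% of the rows for testing; the remaining 80% are
# used for training. random_state is an arbitrary seed for the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(x_train), len(x_test))

# Using the same seed again reproduces exactly the same split.
x_train_again, x_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(np.array_equal(x_test, x_test_again))
```

Changing random_state gives a different (but still reproducible) shuffle, which is handy when you want to check that your results do not depend on one lucky split.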

5. Train and Predict.

Fig: Train and predict the dataset

In the code above, the fit function (a.k.a. training) takes two arguments, x_train and y_train, and trains the model on them.

Next, we predict the values for y_test from the x_test dataset using the selected model (i.e. LinearRegression()) and put the predicted array of values in the result variable. Then we just print the predicted value at index 5 (i.e. result[5]) and the full y_test dataset. You can change the 5 to something else; just remember that counting starts from 0, not 1, so result[5] is actually the sixth value in the list.
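The train-and-predict step can be sketched end to end as below. To keep it self-contained, it uses noise-free synthetic data (an assumption, not the housing data from the article), so the predictions match the true values almost exactly; on real data you should expect them to be close rather than equal. The seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): the target is an exact linear function
# of two features, y = 3*x0 + 2*x1 + 5, with no noise added.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 5.0

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Training: fit learns the coefficients from the training data.
model = LinearRegression()
model.fit(x_train, y_train)

# Prediction: one predicted value per row of x_test.
result = model.predict(x_test)

# Index 5 is the sixth element, because counting starts at 0.
print(result[5], y_test[5])
print(model.coef_, model.intercept_)
```

Comparing result[5] with y_test[5] side by side is the same spot check the article does; looking at the learned coefficients is a second sanity check that is only possible here because the true relationship is known.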

The Results

Fig: Results

As you can see, the first value, [ 25.43573299 ], is the prediction for the element at index 5 of the array, i.e. the row labeled 70 with the value 24.2. Pretty close. If you are wondering about the seemingly random numbers printed next to the values: they are the serial numbers (row indices) of the y_test values, which were selected randomly in the 4th step.
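The relationship between a position in y_test and its original serial number can be sketched as follows. The tiny DataFrames, the column names "feature" and "price", and the seed are made up for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data (assumption): 100 rows labeled 0..99.
df_x = pd.DataFrame({"feature": range(100)})
df_y = pd.DataFrame({"price": [v * 0.5 for v in range(100)]})

x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=0.2, random_state=42
)

# After the shuffle, y_test keeps the ORIGINAL row labels as its index.
position = 5                            # position inside y_test (0-based)
original_row = y_test.index[position]   # serial number in the full dataset
actual = y_test.iloc[position, 0]       # actual value at that position

print(original_row, actual)

# x_test and y_test are shuffled together, so their labels line up.
print(x_test.index.equals(y_test.index))
```

This is why a prediction such as result[5] has to be compared with the value at position 5 of y_test (via .iloc), not with the row whose label happens to be 5.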

Just a GIF of linear regression to help you understand it.