As the name of the suggests, cross-validation is the next fun thing after learning Linear Regression because it helps to improve your prediction using the K-Fold strategy. What is K-Fold you asked? Everything is explained below with Code.

The Full Code :)

Fig:- Cross Validation with Visualization

Code Insight:

The above code is divided into 4 steps…

Load and Divide target dataset.

Fig:-Load the dataset

We are copying the target in dataset to y variable. To see the dataset uncomment the print line.

2. Model Selection

Fig:- Model Selection (LinearRegression())

Just to make things simple we will use Linear Regression. To learn about it more click on “Linear Regression: The Easier Way” Post.

3. Cross-Validation :)

Fig:- Cross Validation in sklearn

It is a process and also a function in the sklearn

cross_val_predict(model, data, target, cv)

where,

model is the model we selected on which we want to perform cross-validation

data is the data.

target is the target values w.r.t. the data.

cv (optional)is the total number of folds (a.k.a. K-Fold ).

In this process, we don’t divide the data in two sets like usual (training and test set) like shown below.

Fig:- Train set (Blue)and Test set (Red)

But we divide the dataset into equal K parts (K-Folds or cv). To improve the prediction and to generalize better. Then train the model on the bigger dataset and test on the smaller dataset.Let us say the cv is 6.

Fig:- 6 equal folds or parts

Now, the first iteration of model splitting will look something like this where Red is test & Blue is Train data.

Fig:- cross_val first iteration

The second iteration will look like the Figure below.

Fig:- cross_val second iteration

and so on till the last or 6th iteration that will look something like this figure below.

Fig:- cross_val sixth iteration

4. Visualize the Data with Matplotlib.

Fig:- Visualize with Matplotlib

For visualisation we are importing matplotlib library. Then making a subplot.

Creating scatter points with black (i.e(0,0,0)) outline or edgecolors.

Using ax.plot to give minimum & maximum values for both axis with 'k--' representing the type of line with the line width i.e. lw = 4.

Next, giving label to x and y axis.

plt.show() at last to show the graph.

The Result

Fig:- Prediction

This graph represents the k- folds Cross Validation for the Boston dataset with Linear Regression model.