The optimization technique

Phew, finally we’ve reached the climax. The issue with the squared error function was that it took even the large error values from the correctly classified points into consideration. We want to extract as much information as we can from the wrongly classified points alone.

There are two cases where the straight line boundary would be wrong,

Case 1: When the straight line has moved too far into the red points’ region.

This is the case where a lot of red points are misclassified and almost none of the green points are.

The perceptron’s parameter set (w1, w2, b) characterizing the straight line is (5, 3, -2).

The prediction of the perceptron for all the misclassified red points is 1, but their actual label is 0.

In this case the line has to be pulled back towards the green cluster.
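To make the misclassification concrete, here is a minimal sketch of the prediction rule (the point coordinates below are made up for illustration; only the parameters (5, 3, -2) come from the text):

```python
def predict(x1, x2, w1=5, w2=3, b=-2):
    """Perceptron prediction: 1 if the weighted sum is positive, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# A hypothetical red point (true label 0) sitting past the line:
print(predict(x1=4, x2=6))  # 5*4 + 3*6 - 2 = 36 > 0, so the prediction is 1
```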

This can be done by subtracting a small fraction of each wrongly classified point’s coordinates from the parameters.

Here is the sequence of steps to be followed to optimize the parameters for this case,

initialize w1, w2 and b

For all wrongly classified red points (that is, when y = 0 and ŷ = 1):

    # x1 represents the test score of the misclassified red point
    w1 = w1 - (a small number) * x1

    # x2 represents the grades of the misclassified red point (its y-axis value)
    w2 = w2 - (a small number) * x2

    # update the bias
    b = b - (a small number)

To make sure that the updates to the parameters w1, w2 and b are not very large, we multiply the values by a small number before the update. This small number is called the learning rate and is represented by α (alpha). Usually the learning rate is of the order of 0.1 to 0.0001. The convergence of the parameters to their optimal values slows down as the learning rate decreases: with α = 0.1, for example, a misclassified point with test score x1 = 8 changes w1 by 0.8, while with α = 0.0001 the change is only 0.0008. But a very large learning rate causes problems of its own, since each update can overshoot the optimal values. A reasonable learning rate has to be set by trial and observation; the learning process itself doesn’t help you obtain one. Values that have to be chosen this way, outside of learning, are called hyperparameters.

Here is the snippet for updating the parameters for the first case,
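The original snippet isn’t reproduced here, so the following is a minimal sketch of what it does, assuming each point is stored as an (x1, x2, label) tuple with 0 for red and 1 for green; the function name update_for_case_1 and the default learning rate are my own choices, not the notebook’s.

```python
def update_for_case_1(points, w1, w2, b, learning_rate=0.01):
    """One optimization pass for case 1: pull the line back towards
    the green cluster."""
    for x1, x2, label in points:
        y_hat = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        if label == 0 and y_hat == 1:  # wrongly classified red point
            w1 = w1 - learning_rate * x1
            w2 = w2 - learning_rate * x2
            b = b - learning_rate
    return w1, w2, b
```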

Let us run one iteration of the optimization and see whether the classifier gets better. The number of wrongly classified points should reduce if the optimization is effective. Here are the results before and after the optimization,

The first plot corresponds to the perceptron’s prediction before the optimization; it wrongly classified 39 data points with its initial set of parameters.

But after the optimization the performance improves significantly, and the number of wrongly classified points reduces to just 6.
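The before-and-after error counts can be reproduced with a small helper like this one (again a sketch, assuming the same (x1, x2, label) layout and the hypothetical update_for_case_1 above):

```python
def count_misclassified(points, w1, w2, b):
    """Count the points whose predicted label differs from the true label."""
    return sum(
        1
        for x1, x2, label in points
        if (1 if w1 * x1 + w2 * x2 + b > 0 else 0) != label
    )

# errors_before = count_misclassified(points, 5, 3, -2)
# errors_after = count_misclassified(points, *update_for_case_1(points, 5, 3, -2))
```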

Here is the link to the notebook; make a copy, run it yourself and play around. Try running it with various values of the learning rate. The smaller the learning rate, the more times the optimization has to be run over the parameters.

Below is a series of images showing the state of the parameters, and thus the classification, after one run of the optimization for various values of the learning rate.
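A sweep like that can be sketched as follows, reusing the hypothetical update_for_case_1 and count_misclassified helpers above on made-up data rather than the notebook’s dataset:

```python
# A tiny hypothetical dataset: (test score, grades, label), 0 = red, 1 = green.
points = [(4, 6, 0), (7, 8, 1), (2, 1, 0), (6, 9, 1), (3, 5, 0)]

for alpha in (0.1, 0.01, 0.001, 0.0001):
    w1, w2, b = update_for_case_1(points, 5, 3, -2, learning_rate=alpha)
    print(f"alpha = {alpha}: {count_misclassified(points, w1, w2, b)} misclassified")
```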