Hyperplanes

Hyperplanes can be considered decision boundaries that classify data points into their respective classes in a multi-dimensional space. Data points falling on either side of the hyperplane can be attributed to different classes.

A hyperplane is a generalization of a plane:

in two dimensions, it’s a line.

in three dimensions, it’s a plane.

in higher dimensions, it's called a hyperplane.

Let's consider a two-dimensional space. Two-dimensional linearly separable data can be separated by a line, with the data points lying on either side representing the respective classes.

The equation of the line is y = ax + b. Treating x and y as features and renaming them x1 and x2, it can be rewritten as:

ax1−x2+b=0

If we define x = (x1, x2) and w = (a, −1), we get:

w⋅x+b=0

This equation was derived from two-dimensional vectors, but in fact it works for any number of dimensions: it is the general equation of a hyperplane.
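
As a minimal sketch of this idea, the snippet below hard-codes a hypothetical line (w = (2, −1), b = 1, i.e. y = 2x + 1) and uses the sign of w⋅x + b to tell which side of the hyperplane a point falls on. The specific numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical example: the line y = 2x + 1, i.e. 2*x1 - x2 + 1 = 0,
# in hyperplane form w.x + b = 0 with w = (a, -1) = (2, -1) and b = 1.
w = np.array([2.0, -1.0])
b = 1.0

def side(x):
    """The sign of w.x + b tells us which side of the hyperplane x is on."""
    return np.sign(np.dot(w, x) + b)

print(side(np.array([0.0, 3.0])))   # point above the line -> -1.0
print(side(np.array([0.0, -1.0])))  # point below the line -> +1.0
```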

Finding the best hyperplane

By looking at the data points and the resultant hyperplane, we can make the following observations:

Hyperplanes close to data points have smaller margins.

The farther a hyperplane is from a data point, the larger its margin will be.

This means that the optimal hyperplane is the one with the largest margin, because a larger margin ensures that slight deviations in the data points are less likely to affect the outcome of the model.

[Figure: the maximum-margin hyperplane for linear data and for non-linear data.]
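
The sketch below shows one way to find such a maximum-margin hyperplane in practice, using scikit-learn's SVC with a linear kernel. The tiny dataset and the very large C value (used to approximate a hard margin) are assumptions for illustration; for non-linear data one would typically swap in a non-linear kernel such as kernel="rbf".

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, linearly separable clusters (made-up data for illustration).
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM with a very large C approximates the hard-margin case,
# i.e. it looks for the separating hyperplane with the largest margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, ", b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))  # 2 / ||w||
```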

What is a large margin classifier?

SVM is known as a large margin classifier. The distance between the line and the closest data points is referred to as the margin. The best or optimal line that can separate the two classes is the line that has the largest margin. This is called the large-margin hyperplane.

The margin is calculated as the perpendicular distance from the line to only the closest points.
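
In formula form, the perpendicular distance from a point x to the hyperplane w⋅x + b = 0 is |w⋅x + b| / ||w||. Below is a minimal sketch of that calculation, reusing the hypothetical w and b from the earlier example.

```python
import numpy as np

# Perpendicular distance from a point x to the hyperplane w.x + b = 0.
def distance_to_hyperplane(x, w, b):
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Same hypothetical line as before: w = (2, -1), b = 1.
w, b = np.array([2.0, -1.0]), 1.0
print(distance_to_hyperplane(np.array([0.0, 3.0]), w, b))  # ~0.894
```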

What are support vectors?

Since the margin is calculated using only specific data points, those points get a name: support vectors are the data points closest to the hyperplane, and they influence its position and orientation.

Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane. These are the points that help us build our SVM.

In short, support vectors are the data points that serve as landmarks determining the position and orientation of the margin.
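
As a quick illustration (again with made-up data), scikit-learn exposes the fitted support vectors directly, so we can verify that only the points nearest the hyperplane pin it down:

```python
import numpy as np
from sklearn.svm import SVC

# Same made-up separable data as in the earlier sketch.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points nearest the hyperplane become support vectors;
# deleting any of them would move the fitted hyperplane.
print(clf.support_vectors_)
print(clf.n_support_)  # number of support vectors per class
```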

The objective of the SVM is to find the optimal separating hyperplane that maximizes the margin of the training data.