As we can see, it is impossible to draw a straight line that separates the blue points from the red points. Instead, our decision boundary has to have a rather complex shape. This is where multi-layer perceptrons come into play: They allow us to learn decision boundaries of a more complex shape than a straight line.

Computational graph

As their name suggests, multi-layer perceptrons (MLPs) are composed of multiple perceptrons stacked one after the other in a layer-wise fashion. Let’s look at a visualization of the computational graph:

As we can see, the input is fed into the first layer, which is a multidimensional perceptron with a weight matrix $W_1$ and bias vector $b_1$. The output of that layer is then fed into the second layer, which is again a perceptron with another weight matrix $W_2$ and bias vector $b_2$. This process continues through all $L$ layers. We refer to the last layer as the output layer and to every other layer as a hidden layer.

An MLP with one hidden layer computes the function

$$\sigma(\sigma(X \, W_1 + b_1) W_2 + b_2) \,,$$

an MLP with two hidden layers computes the function

$$\sigma(\sigma(\sigma(X \, W_1 + b_1) W_2 + b_2) \, W_3 + b_3) \,,$$

and, generally, an MLP with $L-1$ hidden layers computes the function

$$\sigma(\sigma( \cdots \sigma(\sigma(X \, W_1 + b_1) W_2 + b_2) \cdots) \, W_L + b_L) \,.$$
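To make the general formula concrete, here is a minimal NumPy sketch of this forward pass. The function and variable names (`sigma`, `mlp_forward`) and the shape conventions are illustrative choices, not part of the library built earlier:

```python
import numpy as np

def sigma(z):
    """Logistic sigmoid activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, weights, biases):
    """Compute sigma(... sigma(sigma(X W_1 + b_1) W_2 + b_2) ... W_L + b_L).

    X:       input matrix of shape (n_samples, n_features)
    weights: list [W_1, ..., W_L], where W_l has shape (d_{l-1}, d_l)
    biases:  list [b_1, ..., b_L], where b_l has shape (d_l,)
    """
    A = X
    for W, b in zip(weights, biases):
        # One layer: affine transformation followed by the activation.
        A = sigma(A @ W + b)
    return A
```

Each loop iteration corresponds to one application of $\sigma(\cdot \, W_l + b_l)$, so the list lengths determine the number of layers $L$.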

Implementation

Using the library we have built, we can now implement multi-layer perceptrons with hardly any additional code.
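The exact code depends on the building blocks defined earlier in this series. As a self-contained sketch, the following defines minimal `Linear`, `Sigmoid`, and `Sequential` stand-ins (hypothetical names, not necessarily those of the library) and composes them into an MLP:

```python
import numpy as np

class Linear:
    """A fully connected layer computing X W + b (hypothetical stand-in
    for the corresponding building block of the library)."""
    def __init__(self, n_in, n_out, rng=np.random.default_rng(0)):
        # Small random weights and zero biases as a simple initialization.
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, X):
        return X @ self.W + self.b

class Sigmoid:
    """Element-wise logistic sigmoid activation."""
    def __call__(self, X):
        return 1.0 / (1.0 + np.exp(-X))

class Sequential:
    """Chains modules, feeding each module's output into the next."""
    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, X):
        for module in self.modules:
            X = module(X)
        return X

# An MLP with one hidden layer of width 4 for 2-dimensional inputs:
# sigma(sigma(X W_1 + b_1) W_2 + b_2).
mlp = Sequential(Linear(2, 4), Sigmoid(), Linear(4, 1), Sigmoid())

X = np.array([[0.5, -1.0],
              [1.5,  2.0]])
print(mlp(X))  # shape (2, 1): one prediction per input row
```

Stacking more `Linear`/`Sigmoid` pairs into the `Sequential` container yields MLPs with more hidden layers, mirroring the general formula above.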



