It’s worth noting that when activation functions are applied to convolutional layers, they are applied to every element individually, and that when bias is applied, the same bias term is added to every element, i.e. there is only one bias term per filter.
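To make this concrete, here is a minimal numpy sketch (the feature map values are made up for illustration) showing a single shared bias being added to every element of one filter’s output, followed by an element-wise ReLU activation:

```python
import numpy as np

# Hypothetical 3x3 feature map produced by one convolutional filter.
feature_map = np.array([[ 1.0, -2.0,  0.5],
                        [-0.5,  3.0, -1.0],
                        [ 2.0, -0.2,  0.0]])

bias = 0.1  # one bias term, shared by every element of this filter's output

# The bias is added to every element, then the activation (ReLU here)
# is applied to each element individually.
activated = np.maximum(feature_map + bias, 0.0)
```

Note that a layer with, say, 64 filters would have exactly 64 bias terms, regardless of the size of the feature maps.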

When performing image classification, a series of convolutional layers is typically followed by a dense layer as the final layer. This is because convolutional layers produce an image as output, so a dense layer is required to take the image output of the final convolutional layer and produce the number(s) that make up the desired output (typically a class).

The outputs of all the neurons/filters in the last convolutional layer have to be “flattened” into 1D data (turned into one very long row of values instead of the rows and columns of an image) before being passed as input to the dense layer, as the weights of a dense layer are 1D (a single row of many weights).
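The flattening step can be sketched in a few lines of numpy (the filter count, map size, and class count here are arbitrary example values):

```python
import numpy as np

# Hypothetical output of the last convolutional layer:
# 4 filters, each producing an 8x8 feature map.
conv_output = np.random.rand(4, 8, 8)

# Flatten into a single 1D row of 4 * 8 * 8 = 256 values.
flat = conv_output.reshape(-1)

# A dense layer is then just one row of weights per output class
# applied to that flat vector, plus one bias per class.
num_classes = 10
weights = np.random.rand(num_classes, flat.size)
biases = np.random.rand(num_classes)
scores = weights @ flat + biases  # one score per class
```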

Other parts of a CNN

Pooling layers

While convolutional layers are very useful, every neuron produces an image as output, so for just the first layer alone (assuming 256 neurons on that layer) you are looking at 256 images that have to be stored in RAM; add in the images from all the other layers and you are looking at very high memory usage.
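A rough back-of-the-envelope calculation (using illustrative numbers, not any specific network) shows how quickly this adds up:

```python
# Memory estimate for the feature maps of a single convolutional layer.
num_filters = 256
height, width = 224, 224   # assuming the layer preserves the input size
bytes_per_value = 4        # 32-bit floats

memory_bytes = num_filters * height * width * bytes_per_value
memory_mb = memory_bytes / (1024 ** 2)  # 49 MB for this one layer alone
```

And that is just the activations of one layer for one image; multiply by the batch size and the number of layers and memory becomes a real constraint.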

Pooling layers resolve this by reducing the image size. The network only needs to know that shapes are occurring in approximate places: where features are relative to each other may be useful, but their precise positions are irrelevant (remember we want location invariance for the object we are trying to detect).

Pooling layers can also help to improve model accuracy, as well as reducing memory usage and therefore training time. The precise reason why this occurs is unknown, but common theories include that it helps to stop overfitting, that it provides further location invariance, and (in the case of max pooling) that it means the next layer receives the most “interesting”/“useful” input.

The two main types of pooling layers are max pooling (which keeps the maximum of the values it’s picking between) and average pooling (which takes the mean of the values it’s picking between). In the early days of neural networks, average pooling was typically used, as logic would suggest it should be the better option, but these days max pooling is typically used as it performs better in practice.
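Both variants can be sketched with plain numpy. This example (a hypothetical 4×4 feature map with a 2×2 pooling window and stride 2, the most common configuration) shows how each 2×2 block collapses to a single value:

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Apply 2x2 pooling with stride 2 to a 2D feature map."""
    h, w = feature_map.shape
    # Group the map into non-overlapping 2x2 blocks.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # max pooling
    return blocks.mean(axis=(1, 3))      # average pooling

fm = np.array([[1.0, 2.0, 0.0, 1.0],
               [3.0, 4.0, 1.0, 0.0],
               [0.0, 1.0, 2.0, 2.0],
               [1.0, 0.0, 2.0, 4.0]])

max_pooled = pool2x2(fm, "max")   # [[4, 1], [1, 4]]
avg_pooled = pool2x2(fm, "mean")  # [[2.5, 0.5], [0.5, 2.5]]
```

Either way, the 4×4 input becomes a 2×2 output, so each pooling layer quarters the memory needed for the feature maps that follow it.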

Pooling layers are normally either placed after every convolutional layer or after every other convolutional layer.

Further reading/watching

Please share this post on social media if you enjoyed it or found it useful. If there are any inaccuracies in this article, please let me know so I can improve my knowledge and avoid giving people wrong information. Please feel free to leave feedback in the comments so I know how to improve for the next post, and feel free to ask questions in the comments as well.