All Convolutional Network: (https://arxiv.org/abs/1412.6806)

Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: alternating convolution and max-pooling layers followed by a small number of fully connected layers. A recent paper noted that max-pooling can simply be replaced by a convolution layer with an increased stride, without loss in accuracy on several image recognition benchmarks. The other interesting idea in the paper is removing the fully connected layers and putting global average pooling in their place.
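To illustrate that second idea, here is a minimal Keras sketch (the 256-unit FC layer is just a made-up example size) contrasting a conventional fully connected classifier head with a global-average-pooling head:

```python
from tensorflow.keras import layers

# Conventional head: flatten the last feature map, then classify with FC layers.
def fc_head(x, num_classes):
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    return layers.Dense(num_classes, activation='softmax')(x)

# All-convolutional head: a 1x1 convolution produces one feature map per class,
# then global average pooling collapses each map to a single class score.
def gap_head(x, num_classes):
    x = layers.Conv2D(num_classes, kernel_size=1, activation='relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Activation('softmax')(x)
```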

Removing the fully connected layers may not come as a big surprise to everybody; people have been doing the “no FC layers” thing for a long time now. Yann LeCun even mentioned it on Facebook a while back, saying he has been doing it since the beginning.

Intuitively this makes sense: fully connected layers are nothing but convolution layers, the only difference being that the neurons in a convolution layer are connected only to a local region of the input, and that many of the neurons in a conv volume share parameters. The neurons in both kinds of layer still compute dot products, so their functional form is identical. It therefore turns out that it is possible to convert between FC and CONV layers, and sometimes to replace FC layers with CONV layers.
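To make the equivalence concrete, here is a small hypothetical sketch (the 7×7×512 feature map and the 4096 units are example numbers, not from the paper): an FC layer over a feature map can be rewritten as a CONV layer whose kernel covers the entire spatial extent, so each filter computes exactly the same dot product as one FC neuron.

```python
from tensorflow.keras import layers

# Example feature map: 7x7 spatial grid with 512 channels (assumed sizes).
inputs = layers.Input(shape=(7, 7, 512))

# Option 1: classic FC layer producing 4096 outputs.
fc = layers.Dense(4096)(layers.Flatten()(inputs))

# Option 2: equivalent CONV layer. A 7x7 kernel with 'valid' padding collapses
# the spatial dimensions to 1x1, so each of the 4096 filters computes the same
# dot product over the whole volume as one FC neuron would.
conv = layers.Conv2D(4096, kernel_size=(7, 7), padding='valid')(inputs)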

As mentioned, the next change is removing the spatial pooling operation from the network, and this may raise a few eyebrows. Let’s take a closer look at this concept.

Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the most important information.

For example, consider max pooling: we define a spatial window and take the largest element of the feature map within that window. Now recall how convolution works (Fig. 2). Intuitively, a convolution layer with a larger stride can serve the same subsampling and downsampling role, making the input representations smaller and more manageable. It also reduces the number of parameters and computations in the network, which helps control things like overfitting.
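As a rough sketch of the swap the paper proposes (filter counts and kernel sizes below are only illustrative), both blocks downsample a feature map by a factor of two, the first with max pooling and the second with a strided convolution:

```python
from tensorflow.keras import layers

# Conventional block: 3x3 convolution followed by 2x2 max pooling.
def conv_pool_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.MaxPooling2D(pool_size=2)(x)

# All-convolutional block: the pooling layer is replaced by a 3x3 convolution
# with stride 2, which downsamples while also learning its own weights.
def strided_conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
```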

Reducing the size of the representation by using a larger stride in a CONV layer once in a while can be a preferred option in many cases. Discarding pooling layers has also been found to be important in training good generative models, such as variational autoencoders (VAEs) or generative adversarial networks (GANs). It also seems likely that future architectures will feature very few to no pooling layers.

Considering all of the above tips and tweaks, we have published a Keras model implementing the All Convolutional Network on GitHub.
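For reference, a minimal sketch along the lines of the paper’s All-CNN-C architecture for CIFAR-10 might look like the following in Keras (layer sizes follow my reading of the paper; check the published repository for the exact model and training setup):

```python
from tensorflow.keras import layers, models

def all_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Sketch of an all-convolutional network: no max pooling, no FC layers."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(96, 3, padding='same', activation='relu')(inputs)
    x = layers.Conv2D(96, 3, padding='same', activation='relu')(x)
    # Strided convolution replaces the usual max-pooling layer.
    x = layers.Conv2D(96, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2D(192, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(192, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(192, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2D(192, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(192, 1, activation='relu')(x)
    # One 1x1 feature map per class; global average pooling replaces FC layers.
    x = layers.Conv2D(num_classes, 1, activation='relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Activation('softmax')(x)
    return models.Model(inputs, outputs)

model = all_cnn()
model.summary()
```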