ReLU

This is the standard ReLU activation function: it thresholds all incoming features at zero. In plain English, when you apply ReLU to the incoming features, any number less than zero is changed to zero, while all others are kept the same.
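As a quick illustration (the tensor values here are arbitrary):

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])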

MaxPool2d

This layer reduces the size of the image: with kernel_size set to 2, it halves the width and height of the feature map. What it essentially does is take the maximum of the pixels in each 2 x 2 region of the image and use that value to represent the entire region; hence four pixels become one.
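For example, applying it to a batch of one 3-channel 32 x 32 image halves both spatial dimensions (the input here is random, just to show the shapes):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)
x = torch.randn(1, 3, 32, 32)
print(pool(x).shape)  # torch.Size([1, 3, 16, 16])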

Linear

The final layer of our network is almost always a linear layer. It’s a standard, fully connected layer that computes the scores for each of our classes, ten in this case.

Note that we have to flatten the feature map produced by the last conv-ReLU layer before we pass it into the linear layer. The last layer has 24 output channels, and because of the 2 x 2 max pooling, our image has at this point become 16 x 16 (32/2 = 16). The flattened feature map therefore has dimension 16 x 16 x 24. We do this with the code:

output = output.view(-1, 16 * 16 * 24)

In our linear layer, we have to specify in_features to be 16 x 16 x 24 as well, and out_features should correspond to the number of classes we want.
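For our ten classes, that would look something like this (assuming the layer is stored in an attribute named fc, as it is later in this tutorial):

self.fc = nn.Linear(in_features=16 * 16 * 24, out_features=10)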

Note the simple rule for defining models in PyTorch: define the layers in the constructor, and wire the inputs through them in the forward function.

That hopefully gives you a basic understanding of constructing models in PyTorch.

Modularity

The code above is cool but not cool enough: if we were to write very deep networks, it would get cumbersome. The key to cleaner code is modularity. In the example above, we could put the convolution and ReLU into a single separate module and stack several copies of that module in our SimpleNet.

To do that, we first define a new module, as below:
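The original snippet is not reproduced here, but a minimal sketch of such a module could look like this (the 3 x 3 kernel with stride 1 and padding 1 is an assumption, chosen so that the convolution preserves the width and height):

import torch.nn as nn

class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()
        # 3 x 3 convolution with padding=1 keeps the width and height unchanged
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                              kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output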

Consider the above as a mini-network meant to form a part of our larger SimpleNet.

As you can see above, this Unit consists of convolution-batchnormalization-relu.

Unlike in the first example, here I included BatchNorm2d before the ReLU. Batch normalization essentially normalizes the incoming activations to have zero mean and unit variance, and it greatly boosts the accuracy of CNN models.

Having defined the unit above, we can now stack many of them together.
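The stacked network isn’t reproduced here either, but based on the description that follows, it would look roughly like this (the channel widths of 32 and 64 in the early stages are assumptions; the text only fixes the final 128 channels and the overall layout):

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.unit1 = Unit(in_channels=3, out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)
        self.pool1 = nn.MaxPool2d(kernel_size=2)  # 32 x 32 -> 16 x 16

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)
        self.pool2 = nn.MaxPool2d(kernel_size=2)  # 16 x 16 -> 8 x 8

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)
        self.pool3 = nn.MaxPool2d(kernel_size=2)  # 8 x 8 -> 4 x 4

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)
        self.avgpool = nn.AvgPool2d(kernel_size=4)  # 4 x 4 -> 1 x 1

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1,
                                 self.unit4, self.unit5, self.unit6, self.unit7,
                                 self.pool2, self.unit8, self.unit9, self.unit10,
                                 self.unit11, self.pool3, self.unit12, self.unit13,
                                 self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128)
        output = self.fc(output)
        return output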

That’s a whole 15-layer network, counting its 14 convolution layers and 1 linear layer; add the 14 ReLU layers, 14 batch normalization layers, and 4 pooling layers, and it comes to 47 layers in total! This was made possible through the use of sub-modules and the Sequential class.

The above code is made up of a stack of the units with pooling layers in between.

Notice how I made the code more compact by putting all the layers except the fully connected one into an nn.Sequential. This further simplifies the code in the forward function.

self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1,
                         self.unit4, self.unit5, self.unit6, self.unit7,
                         self.pool2, self.unit8, self.unit9, self.unit10,
                         self.unit11, self.pool3, self.unit12, self.unit13,
                         self.unit14, self.avgpool)

Also, the average pooling layer after the last unit computes the average of all activations in each channel. The output of the last unit has 128 channels, and after pooling three times, our 32 x 32 images have become 4 x 4 (32/2/2/2 = 4). We apply an AvgPool2d with kernel size 4, turning our feature map into 1 x 1 x 128.

self.avgpool = nn.AvgPool2d(kernel_size=4)

Consequently, the linear layer would have 1 x 1 x 128 = 128 input features.

self.fc = nn.Linear(in_features=128, out_features=num_classes)

We also flatten the output of the network to have 128 features.

output = output.view(-1,128)

Loading and Augmenting data

Data loading is very easy in PyTorch thanks to the torchvision package. To demonstrate this, I’ll be loading the CIFAR10 dataset that we’ll make use of in this tutorial.

First, we need three additional import statements:
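The originals aren’t shown here; plausibly they are the dataset, transform, and loader utilities used below:

from torchvision.datasets import CIFAR10
from torchvision import transforms
from torch.utils.data import DataLoader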

To load the dataset, we do the following:

1. Define the transformations to be applied to the images
2. Load the dataset using torchvision
3. Create an instance of the DataLoader to hold the images

We do this for the training set as below:
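The original code block isn’t reproduced here; below is a sketch consistent with the description that follows (the ./data root path and the crop padding of 4 are assumptions):

# Transformations applied to each training image
train_transformations = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load the training set, downloading it if necessary
train_set = CIFAR10(root="./data", train=True, transform=train_transformations, download=True)

# Serve batches of 32 shuffled images
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)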

First, we pass a list of transformations using transforms.Compose. RandomHorizontalFlip randomly flips the images horizontally, and RandomCrop randomly crops the images.

Lastly, the two most important: ToTensor converts the images into a format usable by PyTorch, and Normalize with the values given above makes all our pixel values range between -1 and +1. Note that when specifying the transformations, ToTensor and Normalize must come last, in that exact order. The primary reason is that the other transformations are applied to the input while it is still a PIL image, and it must be converted to a PyTorch tensor before normalization can be applied.

Data augmentation helps the model classify images correctly irrespective of the perspective from which they are presented.

Next, we load the training set using the CIFAR10 class, and finally we create a loader for the training set, specifying a batch size of 32 images.

This is repeated for the test set as below, except that the transformations only include ToTensor and Normalize. We do not apply other types of transformations on the test set.
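A matching sketch for the test set, under the same assumptions as the training code above:

# Only ToTensor and Normalize are applied to the test images
test_transformations = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

test_set = CIFAR10(root="./data", train=False, transform=test_transformations, download=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)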

The first time you run this code, the dataset, which is about 170 MB, will be downloaded to your system.