What Is Transfer Learning?

If not for transfer learning, machine learning would be a pretty tough thing for an absolute beginner to do. At the lowest level, machine learning involves computing a function that maps some inputs to their corresponding outputs. Though the function itself is just a bunch of addition and multiplication operations, when it is passed through a nonlinear activation function and a bunch of these layers are stacked together, the resulting function can learn literally anything, provided there's enough data to learn from and an enormous amount of computational power.

Welcome to Deep Learning.

Convolutional Neural Networks can learn extremely complex mapping functions when trained on enough data. We still don't fully understand how a convolutional net learns such complicated functions.

At a base level, the weights of a CNN (Convolutional Neural Network) consist of filters. Think of a filter as an (n, n) matrix of numbers. This filter is convolved (slide and multiply) across the provided image. Assume the input image is of size (10, 10) and the filter is of size (3, 3). First, the filter is multiplied element-wise with the 9 pixels at the top-left of the input image; this multiplication produces another (3, 3) matrix. The 9 values of this matrix are summed up, and the result becomes a single pixel value at the top-left of the next layer of the CNN. The filter then slides one step over and the process repeats at every position.
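To make the slide-multiply-sum idea concrete, here is a minimal NumPy sketch of that operation (stride 1, no padding, and no kernel flipping, which is how deep learning frameworks compute "convolution"). The function name and the random data are just illustrative:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise and sum to produce one output pixel."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # the 9 pixels under the filter
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

image = np.random.rand(10, 10)   # the (10, 10) input from the text
kernel = np.random.rand(3, 3)    # the (3, 3) filter
feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)         # (8, 8): one output pixel per 3x3 patch
```

Note that the output shrinks from (10, 10) to (8, 8), since a (3, 3) filter fits into a (10, 10) image in only 8 positions along each axis.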

*(Figure: representation of a convolutional network)*

Basically, training a CNN means finding the right values for each of the filters, so that an input image, when passed through the multiple layers, activates certain neurons of the last layer and the correct class is predicted.
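As a rough sketch of what that looks like in code, here is a tiny Keras CNN trained on random data; the layer sizes, the three classes, and the fake dataset are purely illustrative. Calling `model.fit` runs gradient descent, which nudges every filter value toward values that make the correct output neuron fire:

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(10, 10, 1)),
    layers.Conv2D(8, (3, 3), activation="relu"),   # 8 learnable (3, 3) filters
    layers.Conv2D(16, (3, 3), activation="relu"),  # 16 more filters on top
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),         # one neuron per (hypothetical) class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fake data just to show the training call; real training needs real images.
x = np.random.rand(32, 10, 10, 1).astype("float32")
y = np.random.randint(0, 3, size=(32,))
model.fit(x, y, epochs=2, verbose=0)
```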

Though training a CNN from scratch is possible for small projects, most applications require training very large CNNs, and this, as you guessed, takes huge amounts of labeled data and computational power. Neither is easy to come by.

That’s where transfer learning comes into play. In transfer learning, we take the pre-trained weights of an already trained model (one that has been trained on millions of images belonging to thousands of classes, on several high-power GPUs for several days) and use these already learned features to predict new classes.
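Here is a minimal sketch of that workflow in Keras, using VGG-16 as the pre-trained model; `NUM_CLASSES`, the head layers, and the input size are assumptions you would adapt to your own dataset:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 5  # assumption: set this to the number of classes in your dataset

# Load the convolutional base pre-trained on ImageNet,
# without its original 1000-class classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the already-learned filters fixed

# Attach a new head; its weights are the only ones that will be trained.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```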

The advantages of transfer learning are:

1: There is no need for an extremely large training dataset.

2: Not much computational power is required, as we are using pre-trained weights and only have to learn the weights of the last few layers (the quick check after this list shows why).
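To make the second advantage concrete, here is a quick check (a sketch using the same frozen VGG-16 base as above) showing that after freezing, the base contributes zero trainable weights, so gradient descent only has to update the small new head:

```python
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
print(f"parameters in the base: {base.count_params():,}")  # ~14.7 million

base.trainable = False
# After freezing, no weight tensors in the base are trainable.
print(f"trainable weight tensors after freezing: {len(base.trainable_weights)}")  # 0
```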

There are several models that have been trained on the ImageNet dataset and have been open-sourced.

For example, VGG-16, VGG-19, Inception-V3, etc. For more details about each of these models, read the official Keras documentation here.
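All of these are available through `keras.applications` with a consistent constructor, so swapping one architecture for another is roughly a one-line change (the import path below assumes the TensorFlow distribution of Keras):

```python
from tensorflow.keras.applications import VGG16, VGG19, InceptionV3

# Same pattern for every model; note that InceptionV3 expects larger
# inputs (299x299) than VGG (224x224) when include_top=True.
model = VGG19(weights="imagenet", include_top=True)
```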