Loading the MobileNet base model

Let’s first load the default model to understand what exactly is happening later when we apply transfer learning.

import tensorflow as tf

model = tf.keras.applications.mobilenet.MobileNet()
model.summary()

Running the previous snippet will yield a long list of all the layers in the model, from the lowest layer (InputLayer) to the highest layer (here called act_softmax).

Screenshot of the lowest and topmost layers of the MobileNet model.summary(). I omitted the middle layers for space reasons.

We can see that the last layer has an output shape of (None, 1000), which means that it will produce 1000 values. Let’s quickly verify by running:

dog_image_id = os.listdir('images/dog')[0]
dog_image = load_image(os.path.join('images/dog', dog_image_id))
print(f'shape: {dog_image.shape}')
print(f'type: {type(dog_image)}')
model.predict(dog_image)

This command returns a NumPy array of shape (1, 1000), i.e., 1000 values, one for each ImageNet class.

A random image of a dog fed to the default MobileNet configuration
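As a side check, we could translate those 1000 scores into human-readable labels with the decode_predictions helper that ships with the Keras applications. This is just a sketch; it assumes dog_image is the preprocessed image loaded above.

# Sketch: turn the 1000 ImageNet scores into human-readable labels.
# Assumes `dog_image` is the preprocessed image from the snippet above.
preds = model.predict(dog_image)
for _, label, prob in tf.keras.applications.mobilenet.decode_predictions(preds, top=5)[0]:
    print(f'{label}: {prob:.3f}')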

Loading the MobileNet model for transfer learning

model = tf.keras.applications.mobilenet.MobileNet(
    input_shape=(224, 224, 3),
    include_top=False,
    pooling='avg'
)

The main difference in how we load the model is the parameter include_top=False. When using it, we also specify input_shape and pooling, as described in the documentation. By setting include_top=False we instantiate the model without its topmost classification layer (i.e., the prediction of the 1000 ImageNet classes).

Running model.summary() gives us:

MobileNet model with include_top=False

We can now clearly see that the topmost layer of the model is a global average pooling layer and not the softmax layer we saw previously. Also, the total number of params has gone down significantly. To confirm the new output, we can run model.predict(dog_image).shape and see that we now get 1024 instead of 1000 values. Inspecting those values also shows that they are quite different from the previous predictions. The difference is because no classification has taken place yet; those 1024 values indicate the presence of abstract features.
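A quick sketch of that check, assuming dog_image is still the image we loaded earlier:

# The headless model now outputs a pooled feature vector instead of class scores.
features = model.predict(dog_image)
print(features.shape)  # (1, 1024)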

Adding additional layers

To predict cats and dogs and Elon Musk, we have to replace the prediction layer of the original model. To do so, we add three new layers:

Dropout: Removes nodes during training to prevent overfitting

Dense: Fully connected layer (i.e., each node in the layer is connected to every node in the previous layer)

Softmax: Function that specifies our output values. Softmax normalizes the outputs of all nodes in the layer (our three dense nodes) so that they sum to 1 and can be understood as probabilities

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dropout, Dense, Softmax

x = Dropout(rate=0.4)(model.output)
x = Dense(3)(x)
x = Softmax()(x)
model = Model(model.inputs, x)

Running model.summary() again will show the new layers at the very top of the model.
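If you only want to check the new head without scrolling through the whole summary, a quick sketch like this works as well:

# Print just the three layers we appended (dropout, dense, softmax).
for layer in model.layers[-3:]:
    print(layer.name, layer.output_shape)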

⑤ Training the model

Photo by Josh Riemer on Unsplash

This is starting to get exciting! We are almost there.

Specifying the layers to be trained

If we were to start from scratch, we would now train the entire net and the millions of parameters that come with it. But luckily enough, we don’t have to do that. All the lower layers have already been trained! So let’s make sure to train only the new layers. For a production model, you would typically also fine-tune the lower layers after an initial burn-in period in which you only train your new layers.

for layer in model.layers[:-3]:
    layer.trainable = False
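As a quick sanity check (just a sketch, not part of the original snippet), we can count how many layers remain trainable; only the three new ones at the top should show up:

# Only the dropout, dense, and softmax layers we added should still be trainable.
trainable = [layer.name for layer in model.layers if layer.trainable]
print(f'{len(trainable)} trainable layers: {trainable}')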

Compile the model

Let’s configure our model for training by running model.compile with an optimizer and a loss function. There are loads of different optimizers and loss functions out there, but Adam and categorical_crossentropy are good defaults. Read up on them if you are curious, but don’t get lost in the jungle.

from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy'
)

Build the data generator

Almost there! We just have to specify the training and validation data, and we are good to go.

We first build a data generator datagen and specify a couple of parameters that define the augmentations we want to apply to our images during training. We also specify a save_to_dir folder for training and validation and guarantee their existence beforehand. Doing so allows us to inspect the augmented pictures created during the training process. If you don’t want that, remove the save_to_dir line.
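Here is a minimal sketch of what that setup could look like. The augmentation parameters, the batch size, and the directory names ('images', 'augmented/train', 'augmented/validation') are assumptions for illustration, not the article’s exact values.

import os
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Guarantee the save_to_dir folders exist before training starts.
os.makedirs('augmented/train', exist_ok=True)
os.makedirs('augmented/validation', exist_ok=True)

datagen = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input,  # same preprocessing MobileNet expects
    rotation_range=20,        # random rotations
    zoom_range=0.2,           # random zoom
    horizontal_flip=True,     # random horizontal flips
    validation_split=0.2      # hold out 20% of the images for validation
)

train_generator = datagen.flow_from_directory(
    'images',                 # parent folder with one subfolder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='training',
    save_to_dir='augmented/train'   # remove if you don't want to inspect augmented images
)

validation_generator = datagen.flow_from_directory(
    'images',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='validation',
    save_to_dir='augmented/validation'
)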

Train the model

Alright, time to train the model! We use fit_generator because we have previously created two generators, one for the training data and one for the validation data. We also use a callback to indicate the progress of our training visually.
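A minimal sketch of that call, assuming the train_generator and validation_generator from the previous step; the epoch count is an assumption, and the progress callback is omitted here.

history = model.fit_generator(
    train_generator,
    validation_data=validation_generator,
    epochs=10
)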

Running this means we have to wait a little bit. On my MacBook Pro, it’s about 10 minutes. Your mileage might vary depending on your hardware. If you just want to go through the motions, you could also set epochs=1, which is much faster.

During training you should see something like this:

After training has finished, we can inspect the progress by running:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

Which will give us: