Overview

Have you ever wanted to try machine learning? If so, this is the tutorial for you. You’ll learn to build an image classifier and train it using Cloud GPUs. What is an image classifier? It is a machine learning model which can classify an image. You’ll show it an image, and it’ll tell you what it thinks it is. Although our model won’t be as accurate as state-of-the-art models, it is a good starting point.

Objective

Our objective is to build an image classifier then use it to make predictions. This process includes the following phases:

1. Building an image dataset (we’ve done this for you)
2. Loading the image dataset in a way that is efficient for training
3. Building the image classifier model
4. Training the model using Cloud GPUs
5. Validating the trained model

It is important to remember that the process above is not linear. Generally, each phase will need to be revisited multiple times to optimize the model’s accuracy.

We will also make predictions using the trained model and export it for use elsewhere.

Getting started

Follow these steps to try this code:

1. Open JupyterLab with TensorFlow 1.11 pre-installed.
2. Open a Terminal in JupyterLab.
3. Clone this project’s GitHub repository with:
   $ git clone https://github.com/PeterChauYEG/animal_classifier.git
4. Open Animal Classifier.ipynb inside the animal_classifier directory. This will open the Jupyter notebook.
5. Run the whole notebook by opening the Run menu and selecting Run All Cells.
6. Scroll down to see training in action. The first part of training occurs at cell 20.

Loading Dependencies

We need to import a number of Python dependencies.

# Allows division to return a float
from __future__ import division

# Allows access to the file system
import os

# Provides an API for scientific computing
import numpy as np

# Allows us to timestamp the training run
from datetime import datetime

# Allows us to render images and plot data
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import math
import matplotlib.pyplot as plt

# Machine learning framework that provides an abstract API on top of TensorFlow
import keras
from keras.callbacks import TensorBoard
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras import optimizers

Configurations

Because we are plotting in JupyterLab, we need to configure matplotlib to render plots inline.

# configure matplotlib to render images inline in JupyterLab

%matplotlib inline

Dataset Directories

The dataset holds all the images. A separate directory should be created for each of the following:

- Training Dataset: Images which are used during model training
- Validation Dataset: Images which are used during model validation

Each image dataset should be organized as one subdirectory per class, named after the class (e.g. cat) of the images it holds. It is important that no image in the training dataset also appears in the validation dataset.

# Paths to datasets to be used

train_dir = 'dataset/train'

validate_dir = 'dataset/validate'
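One way to check that no image appears in both datasets is to hash each file's contents and look for matches. The following is a minimal sketch and not part of the original notebook: `file_digest`, `index_images`, and `find_shared_images` are hypothetical helper names, and the check only catches byte-identical copies, not resized or re-encoded duplicates.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    sha = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so large images are never fully held in memory.
        for chunk in iter(lambda: f.read(65536), b''):
            sha.update(chunk)
    return sha.hexdigest()


def index_images(root_dir):
    """Map content digest -> file path for every file under root_dir."""
    digests = {}
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digests[file_digest(path)] = path
    return digests


def find_shared_images(train_dir, validate_dir):
    """Return (train_path, validate_path) pairs with identical contents."""
    train_index = index_images(train_dir)
    validate_index = index_images(validate_dir)
    shared = set(train_index) & set(validate_index)
    return sorted((train_index[d], validate_index[d]) for d in shared)
```

Calling `find_shared_images(train_dir, validate_dir)` should return an empty list when the two datasets are properly separated.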

Hyperparameters

Hyperparameters are used to tune the model and model training. They greatly influence the resulting metrics. Let’s examine them:

- Images will be resized to 200x200x3. This means each image will be 200 pixels by 200 pixels with 3 color channels (red, green, blue). More pixels tend to help the model because they preserve more detail of the image.
- The learning rate controls how large a step the optimizer takes each time it updates the model’s weights to reduce the loss.
- The batch size is the number of images that will be fed into the model in one iteration.
- The number of epochs is the number of times the model iterates over the entire training dataset while updating its weights. Beyond some number of epochs, the gains from further training approach 0, and it is possible to overtrain a model.
- It is often recommended to split the dataset between training and validation in an 80:20 ratio. This is a general rule that works reasonably well.

# number of images in the training dataset
n_train = 8000

# number of images in the validation dataset
n_validation = 2000

# the number of pixels for the width and height of the image
image_dim = 200

# the size of the image (h,w,c)
input_shape = (image_dim, image_dim, 3)

# the rate at which the model learns
learning_rate = 0.001

# size of each mini-batch
batch_size = 32

# number of training epochs
epochs = 10
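To make the relationship between these numbers concrete, the short sketch below works out the split ratio, the number of batches per epoch, and the total number of weight updates. It reuses the values above; the derived names (`validation_fraction`, `steps_per_epoch`, `validation_steps`, `total_updates`) are introduced here for illustration only.

```python
import math

n_train = 8000
n_validation = 2000
batch_size = 32
epochs = 10

# Fraction of all images held out for validation (the 80:20 split).
validation_fraction = n_validation / (n_train + n_validation)

# Batches needed to see every training image once.
steps_per_epoch = math.ceil(n_train / batch_size)

# Batches needed to see every validation image once; the last batch is partial.
validation_steps = math.ceil(n_validation / batch_size)

# Total number of weight updates over the whole training run.
total_updates = steps_per_epoch * epochs

print(validation_fraction, steps_per_epoch, validation_steps, total_updates)
```

With these settings, one epoch is 250 batches, so 10 epochs amount to 2,500 weight updates.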

Outputs

We will output 2 items:

- Training logs: These can be fed into TensorBoard for analysis
- Trained model: So it can be used elsewhere

We want to save the training logs to a directory with a timestamp of when training started, and some data about the hyperparameters used. We also want to give the trained model a name when we save it.

# directory which we will save training outputs to
# add a timestamp so that TensorBoard shows each training session as a different run
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
output_logs_dir = 'logs/' + timestamp + '-' + str(batch_size) + '-' + str(epochs)

# name to give the trained model when we save it
model_name = 'trained_model'
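The timestamp format above contains spaces and colons, which some filesystems (Windows, for example) do not accept in directory names. If that matters for your setup, a variant along the following lines may help. This is only a sketch: `make_run_name` is a hypothetical helper, not something from the original notebook.

```python
from datetime import datetime


def make_run_name(batch_size, epochs, now=None):
    """Build a log directory name such as '2019-01-31_14-05-09-32-10'."""
    now = now or datetime.now()
    # Use '_' and '-' instead of ' ' and ':' so the name is valid on
    # filesystems that reject colons.
    timestamp = now.strftime('%Y-%m-%d_%H-%M-%S')
    return '{}-{}-{}'.format(timestamp, batch_size, epochs)


output_logs_dir = 'logs/' + make_run_name(batch_size=32, epochs=10)
```

Each run still gets its own directory, so TensorBoard continues to show runs separately.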

Loading the image dataset in a way that is efficient for training

Image Data Generators

A naive approach to data loading is to load all the images and transform them up front. This would use a huge amount of RAM before training even starts. Your machine might not be able to handle this, which would result in crashing kernels. It can also take a very long time depending on the dataset.

Instead, we can load and transform images exactly when we need them: when feeding a batch of images to the model during training.

Keras provides an optimized way of doing this with the ImageDataGenerator class. It allows us to load images from a directory efficiently. These generators can also transform the dataset in many other ways to augment it. Explore these optional transformations to help make your model more general and improve its accuracy.

# define data generators
train_data_generator = ImageDataGenerator(rescale=1./255,
                                          fill_mode='nearest')
validation_data_generator = ImageDataGenerator(rescale=1./255,
                                               fill_mode='nearest')

# tell the data generators to use data from the train and validation directories
train_generator = train_data_generator.flow_from_directory(
    train_dir,
    target_size=(image_dim, image_dim),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = validation_data_generator.flow_from_directory(
    validate_dir,
    target_size=(image_dim, image_dim),
    batch_size=batch_size,
    class_mode='categorical')
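To illustrate the idea of loading images only when a batch needs them, here is a simplified pure-Python sketch. It is not how Keras implements its generators; `batch_generator` and `load_fn` are hypothetical names, and `load_fn` stands in for whatever reads and transforms one image.

```python
def batch_generator(image_paths, batch_size, load_fn):
    """Yield lists of loaded images, reading each file only when needed.

    Loops over the dataset forever, like a Keras generator, so the caller
    decides how many batches make up one epoch.
    """
    while True:
        for start in range(0, len(image_paths), batch_size):
            batch_paths = image_paths[start:start + batch_size]
            # Only this batch is held in memory at any one time.
            yield [load_fn(path) for path in batch_paths]
```

Nothing is loaded until the first call to `next()` on the generator, which is what keeps memory use low compared with loading the whole dataset up front.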

Get Class Names

It is useful to have a dictionary of image classes. We can use this dictionary to make our predictions more human-readable.

# get a dictionary of class names
classes_dictionary = train_generator.class_indices

# turn classes dictionary into a list
class_keys = list(classes_dictionary.keys())

# get the number of classes
n_classes = len(class_keys)
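The class dictionary maps each class name to an integer index. To turn a predicted index back into a name, the mapping can be inverted. The sketch below uses a made-up three-class dictionary as a stand-in for `train_generator.class_indices`; `index_to_class` is a name introduced here for illustration.

```python
# Hypothetical example of what class_indices might contain.
classes_dictionary = {'cat': 0, 'dog': 1, 'horse': 2}

# Invert the mapping so a predicted index can be turned into a name.
index_to_class = {index: name for name, index in classes_dictionary.items()}

predicted_index = 1
print(index_to_class[predicted_index])
```

Looking names up through the inverted dictionary avoids relying on the ordering of a list of keys.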

Load Image Paths of the Validation Dataset

Load the paths for all of the images in the validation dataset. These will be used later when we make predictions.

# Get the name of each directory in the root directory and store them as an array.
classes = get_class_labels(validate_dir)

# Get the paths of all the images in each class directory and store them as a 2d array.
image_paths = get_class_images(classes, validate_dir)
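The helpers `get_class_labels` and `get_class_images` are not shown in this section. As a rough guide, they might look something like the sketch below. This is an assumption about their behavior based on the comments above, not the notebook's actual code. Names are sorted so that the class order is alphabetical, which is also how Keras orders the classes it finds in a directory.

```python
import os


def get_class_labels(root_dir):
    """Return the sorted names of the class subdirectories in root_dir."""
    return sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )


def get_class_images(classes, root_dir):
    """Return one sorted list of image paths per class, in class order."""
    image_paths = []
    for class_name in classes:
        class_dir = os.path.join(root_dir, class_name)
        image_paths.append(sorted(
            os.path.join(class_dir, file_name)
            for file_name in os.listdir(class_dir)
        ))
    return image_paths
```

With this shape, `image_paths[0]` is the list of image paths for the first class.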

Building the image classifier model

Our model consists of many layers. Images are passed through the model, which outputs a set of numbers. Each number is the predicted probability that the image belongs to one of the classes. We take the class with the largest number as the most likely class.

We will use several types of layers and activations:

- Conv2D is a 2-dimensional convolutional layer. It applies filters over the input image. This helps the model learn about spatial relationships in the image.
- ReLU is a type of non-linear activation function. It outputs 0 for negative inputs and passes positive inputs through unchanged, so only neurons with a positive signal activate.
- MaxPooling2D downsamples its input. We use it to reduce the dimensionality of the input. This creates a more abstract form of the input.
- Flatten will turn a matrix into a row, like flattening a muffin into a pancake. We use it so that we can feed the output into dense layers.
- Dense is a densely-connected neural network layer.
- Softmax is an activation function. We use it to squash each output number into the range 0 to 1 and to make all the output numbers add up to 1. Each number can then be interpreted as the probability of a class.

Note that the last layer has the same number of neurons as classes. This means that this layer will output 10 numbers, each mapping to a class.
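The two activation functions are simple enough to write out by hand. The following pure-Python sketch is for illustration only (Keras uses its own optimized implementations), and the raw scores are made-up numbers for a three-class case.

```python
import math


def relu(x):
    """ReLU passes positive values through and replaces negatives with 0."""
    return max(0.0, x)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not
    # change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # made-up raw outputs for 3 classes
probabilities = softmax(scores)

# The predicted class is the index of the largest probability.
predicted_class = probabilities.index(max(probabilities))
```

Here the first score is the largest, so `predicted_class` is 0, and the three probabilities add up to 1.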

# define the model
# takes in images, convolves them, flattens them, classifies them
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=input_shape),
    Conv2D(16, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(n_classes, activation='softmax')
])

# define the optimizer and loss to use
model.compile(optimizer=optimizers.SGD(lr=learning_rate, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Examine the Model

We can generate a high-level overview of the model structure. Each row is a layer of the model.

# look at the defined model

model.summary()

Model Structure

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 200, 200, 16)      448
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 200, 200, 16)      2320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 100, 100, 16)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 100, 100, 32)      4640
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 100, 100, 32)      9248
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 50, 50, 32)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 50, 50, 64)        18496
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 50, 50, 64)        36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 25, 25, 64)        0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 25, 25, 128)       73856
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 25, 25, 128)       147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 128)       0
_________________________________________________________________
flatten_1 (Flatten)          (None, 18432)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               4718848
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570
=================================================================
Total params: 5,014,938
Trainable params: 5,014,938
Non-trainable params: 0
_________________________________________________________________
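The numbers in the summary can be reproduced with a little arithmetic, which is a useful way to check your understanding of each layer. The sketch below recomputes the flattened size and the total parameter count; `conv2d_params` and `dense_params` are helper names introduced here for illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


size, channels, total = 200, 3, 0
for filters in (16, 32, 64, 128):
    # Each block is two 3x3 'same' convolutions followed by 2x2 max pooling.
    total += conv2d_params(3, channels, filters)
    total += conv2d_params(3, filters, filters)
    channels = filters
    size = size // 2  # 'valid' pooling rounds down: 25 -> 12

flattened = size * size * channels
total += dense_params(flattened, 256)
total += dense_params(256, 10)

print(flattened, total)
```

This gives 18432 flattened values and 5014938 parameters, matching the summary. Note that the first dense layer alone accounts for most of the parameters.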

Examine Model Accuracy Before Training

Let’s examine how well the model performs before we train it. We will determine the model’s accuracy on 1 class. This will be done by making predictions with all the images of 1 class. Remember that this isn’t representative of the whole model as it is only 1 class of 10.

# label of the class we are making predictions on
single_class = class_keys[0]

# first class image paths
single_class_image_paths = image_paths[0]

# make predictions on the first class
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)

# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))
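The `predictions_accuracy` helper is not shown in this section. One plausible way to compute a single-class accuracy is sketched below. It assumes each prediction is a list of per-class probabilities in the same order as `class_keys`; both that input format and the name `class_accuracy` are assumptions made for illustration, not the notebook's actual code.

```python
def class_accuracy(class_keys, true_class, predictions):
    """Fraction of predictions whose most likely class is true_class.

    predictions: one list of per-class probabilities per image, in the
    same order as class_keys.
    """
    if not predictions:
        return 0.0
    correct = 0
    for probabilities in predictions:
        # Index of the largest probability is the predicted class.
        best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
        if class_keys[best_index] == true_class:
            correct += 1
    return correct / len(predictions)


# Made-up predictions for four images of the class 'cat'.
keys = ['cat', 'dog', 'horse']
example_predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
]
example_accuracy = class_accuracy(keys, 'cat', example_predictions)
```

For reference, a model that guessed uniformly at random among 10 classes would average about 10% accuracy, so an untrained model should not be expected to do much better.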

Training the model using Cloud GPUs

This model has over 5,000,000 trainable parameters, far too many to set manually. We need to train the model with the training dataset so that it can learn the optimal weights. These weights are the parameter values of the model.

# log information for use with TensorBoard
tensorboard = TensorBoard(log_dir=output_logs_dir)

# train the model using the training data generator
# validation_steps counts batches, not images
model.fit_generator(train_generator,
                    steps_per_epoch=math.floor(n_train/batch_size),
                    validation_data=validation_generator,
                    validation_steps=math.ceil(n_validation/batch_size),
                    epochs=epochs,
                    callbacks=[tensorboard])

Examine Model Accuracy After Some Training

Let’s examine how well the model performs now that we’ve trained it a bit. Again, we will determine the model’s accuracy on 1 class.

# make predictions on the first class
# the image paths come from the validation dataset, so size by n_validation
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)

# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))

Continue Training the Model

Let’s continue training the model.

# train the model using the training data generator
# validation_steps counts batches, not images
model.fit_generator(train_generator,
                    steps_per_epoch=math.floor(n_train/batch_size),
                    validation_data=validation_generator,
                    validation_steps=math.ceil(n_validation/batch_size),
                    epochs=epochs,
                    callbacks=[tensorboard])

Examine Model Accuracy After Training

Now that we’ve completed training the model, let’s examine its accuracy on 1 class.

# make predictions on the first class
# the image paths come from the validation dataset, so size by n_validation
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)

# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))

Understanding training metrics

Our goal is to maximize validation accuracy while minimizing validation loss. The validation dataset is never used for training. This allows us to measure how well the model performs on images it’s never seen before.

The training and validation accuracies should be similar at the end of training. If the training accuracy is much higher than the validation accuracy, this could be a sign of overfitting.

During training, you should see the training loss (loss) decrease and the training accuracy (acc) increase.

You should also see the validation loss (val_loss) decrease and the validation accuracy (val_acc) increase.
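These checks can also be done in code once you have the per-epoch metrics as lists. The sketch below uses made-up numbers purely for illustration (they are not results from this model), and `best_epoch` and `accuracy_gap` are hypothetical helper names. The dictionary keys mirror the metric names above.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return val_loss.index(min(val_loss)) + 1


def accuracy_gap(acc, val_acc):
    """Return training accuracy minus validation accuracy for each epoch."""
    return [round(a - v, 4) for a, v in zip(acc, val_acc)]


# Made-up per-epoch metrics, for illustration only (not real results).
history = {
    'acc':      [0.30, 0.45, 0.60, 0.75, 0.90],
    'val_acc':  [0.28, 0.42, 0.52, 0.55, 0.54],
    'val_loss': [1.90, 1.60, 1.40, 1.35, 1.50],
}

print(best_epoch(history['val_loss']))
print(accuracy_gap(history['acc'], history['val_acc']))
```

In this invented example the validation loss is lowest at epoch 4 and the gap between training and validation accuracy keeps widening afterwards, which is the pattern to watch for as a sign of overfitting.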

TensorBoard

TensorBoard is a suite of visualization tools: “You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it”. This is useful for understanding how models and hyperparameters compare.

In JupyterLab, you can use the commands tab to create a new TensorBoard.

1. Open the commands panel with CTRL + SHIFT + C.
2. Search for Create a new tensorboard.
3. Select this option and point it to animal_classifier/logs.

You’ll be able to visualize the accuracy of your model over epochs. Each training run creates a new set of logs, which appears in TensorBoard as a separately plotted line.

The following are screenshots of my training results plotted on TensorBoard. Your results should look similar.

[Screenshots: Training Loss, Training Accuracy, Validation Loss, Validation Accuracy]

Predict

It is useful to know which image predictions were correct and which were wrong. Let’s examine 10 predictions, 1 prediction per class.