You might have already heard of image or facial recognition or self-driving cars. These are real-life implementations of Convolutional Neural Networks (CNNs). In this blog post, you will learn and understand how to implement these deep, feed-forward artificial neural networks in Keras and also learn how to overcome overfitting with the regularization technique called "dropout".

More specifically, you'll tackle the following topics in today's tutorial:

Would you like to take a course on Keras and deep learning in Python? Consider taking DataCamp's Deep Learning in Python course!

Also, don't miss our Keras cheat sheet, which shows you the six steps that you need to go through to build neural networks in Python with code examples!

Convolutional Neural Network: Introduction

By now, you might already know about machine learning and deep learning, a computer science branch that studies the design of algorithms that can learn. Deep learning is a subfield of machine learning that is inspired by artificial neural networks, which in turn are inspired by biological neural networks.

A specific kind of such a deep neural network is the convolutional network, which is commonly referred to as CNN or ConvNet. It's a deep, feed-forward artificial neural network. Remember that feed-forward neural networks are also called multi-layer perceptrons(MLPs), which are the quintessential deep learning models. The models are called "feed-forward" because information fl�ows right through the model. There are no feedback connections in which outputs of the model are fed back into itself.

CNNs specifically are inspired by the biological visual cortex. The cortex has small regions of cells that are sensitive to the specific areas of the visual field. This idea was expanded by a captivating experiment done by Hubel and Wiesel in 1962 (if you want to know more, here's a video). In this experiment, the researchers showed that some individual neurons in the brain activated or fired only in the presence of edges of a particular orientation like vertical or horizontal edges. For example, some neurons fired when exposed to vertical sides and some when shown a horizontal edge. Hubel and Wiesel found that all of these neurons were well ordered in a columnar fashion and that together they were able to produce visual perception. This idea of specialized components inside of a system having specific tasks is one that machines use as well and one that you can also find back in CNNs.

Convolutional neural networks have been one of the most influential innovations in the field of computer vision. They have performed a lot better than traditional computer vision and have produced state-of-the-art results. These neural networks have proven to be successful in many different real-life case studies and applications, like:

Image classification, object detection, segmentation, face recognition;

Self driving cars that leverage CNN based vision systems;

Classification of crystal structure using a convolutional neural network;

And many more, of course!

To understand this success, you'll have to go back to 2012, the year in which Alex Krizhevsky used convolutional neural networks to win that year's ImageNet Competition, reducing the classification error from 26% to 15%.

Note that ImageNet Large Scale Visual Recognition Challenge (ILSVRC) began in the year 2010 is an annual competition where research teams assess their algorithms on the given data set and compete to achieve higher accuracy on several visual recognition tasks.

This was the time when neural networks regained prominence after quite some time. This is often called the "third wave of neural networks". The other two waves were in the 1940s until the 1960s and in the 1970s to 1980s.

Alright, you know that you'll be working with feed-forward networks that are inspired by the biological visual cortex, but what does that actually mean?

Take a look at the picture below:

Figure: Convolutional Neural Network from Wikimedia

The image shows you that you feed an image as an input to the network, which goes through multiple convolutions, subsampling, a fully connected layer and finally outputs something.

But what are all these concepts?

The convolution layer computes the output of neurons that are connected to local regions or receptive fields in the input, each computing a dot product between their weights and a small receptive field to which they are connected to in the input volume. Each computation leads to extraction of a feature map from the input image. In other words, imagine you have an image represented as a 5x5 matrix of values, and you take a 3x3 matrix and slide that 3x3 window or kernel around the image. At each position of that matrix, you multiply the values of your 3x3 window by the values in the image that are currently being covered by the window. As a result, you'll get a single number that represents all the values in that window of the images. You use this layer to filtering: as the window moves over the image, you check for patterns in that section of the image. This works because of filters, which are multiplied by the values outputted by the convolution. The objective of subsampling is to get an input representation by reducing its dimensions, which helps in reducing overfitting. One of the techniques of subsampling is max pooling. With this technique, you select the highest pixel value from a region depending on its size. In other words, max pooling takes the largest value from the window of the image currently covered by the kernel. For example, you can have a max-pooling layer of size 2 x 2 will select the maximum pixel intensity value from 2 x 2 region. You're right to think that the pooling layer then works a lot like the convolution layer! You also take a kernel or a window and move it over the image; The only difference is the function that is applied to the kernel and the image window isn't linear. Figure: Max-Pooling from Wikipedia

The objective of the fully connected layer is to flatten the high-level features that are learned by convolutional layers and combining all the features. It passes the flattened output to the output layer where you use a softmax classifier or a sigmoid to predict the input class label.

For more information, you can go here.

The Fashion-MNIST Data Set

Before you go ahead and load in the data, it's good to take a look at what you'll exactly be working with! The Fashion-MNIST dataset is a dataset of Zalando's article images, with 28x28 grayscale images of 70,000 fashion products from 10 categories, and 7,000 images per category. The training set has 60,000 images, and the test set has 10,000 images. You can double check this later when you have loaded in your data! ;)

Fashion-MNIST is similar to the MNIST dataset that you might already know, which you use to classify handwritten digits. That means that the image dimensions, training and test splits are similar to the MNIST dataset. Tip: if you want to learn how to implement an Multi-Layer Perceptron (MLP) for classification tasks with this latter dataset, go to this tutorial.

You can find the Fashion-MNIST dataset here, but you can also load it with the help of specific TensorFlow and Keras modules. You'll see how this works in the next section!

Load the Data

Keras comes with a library called datasets , which you can use to load datasets out of the box: you download the data from the server and speeds up the process since you no longer have to download the data to your computer. The train and test images along with the labels are loaded and stored in variables train_X , train_Y , test_X , test_Y , respectively.

from keras.datasets import fashion_mnist (train_X,train_Y), (test_X,test_Y) = fashion_mnist.load_data()

Using TensorFlow backend.

Great! That was pretty simple, wasn't it?

You have probably done this a million times by now, but it's always an essential step to get started. Now you're completely set to start analyzing, processing and modeling your data!

Analyze the Data

Let's now analyze how images in the dataset look like. Even though you know the dimension of the images by now, it's still worth the effort to analyze it programmatically: you might have to rescale the image pixels and resize the images.

import numpy as np from keras.utils import to_categorical import matplotlib.pyplot as plt %matplotlib inline print('Training data shape : ', train_X.shape, train_Y.shape) print('Testing data shape : ', test_X.shape, test_Y.shape)

('Training data shape : ', (60000, 28, 28), (60000,)) ('Testing data shape : ', (10000, 28, 28), (10000,))

From the above output, you can see that the training data has a shape of 60000 x 28 x 28 since there are 60,000 training samples each of 28 x 28 dimension. Similarly, the test data has a shape of 10000 x 28 x 28 since there are 10,000 testing samples.

# Find the unique numbers from the train labels classes = np.unique(train_Y) nClasses = len(classes) print('Total number of outputs : ', nClasses) print('Output classes : ', classes)

('Total number of outputs : ', 10) ('Output classes : ', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8))

There's also a total of ten output classes that range from 0 to 9.

Also, don't forget to take a look at what the images in your dataset:

plt.figure(figsize=[5,5]) # Display the first image in training data plt.subplot(121) plt.imshow(train_X[0,:,:], cmap='gray') plt.title("Ground Truth : {}".format(train_Y[0])) # Display the first image in testing data plt.subplot(122) plt.imshow(test_X[0,:,:], cmap='gray') plt.title("Ground Truth : {}".format(test_Y[0]))

Text(0.5,1,u'Ground Truth : 9')

The output of above two plots looks like an ankle boot, and this class is assigned a class label of 9. Similarly, other fashion products will have different labels, but similar products will have same labels. This means that all the 7,000 ankle boot images will have a class label of 9.

Data Preprocessing

As you could see in the above plot, the images are grayscale images have pixel values that range from 0 to 255. Also, these images have a dimension of 28 x 28. As a result, you'll need to preprocess the data before you feed it into the model.



As a first step, convert each 28 x 28 image of the train and test set into a matrix of size 28 x 28 x 1 which is fed into the network.

train_X = train_X.reshape(-1, 28,28, 1) test_X = test_X.reshape(-1, 28,28, 1) train_X.shape, test_X.shape

((60000, 28, 28, 1), (10000, 28, 28, 1))

The data right now is in an int8 format, so before you feed it into the network you need to convert its type to float32, and you also have to rescale the pixel values in range 0 - 1 inclusive. So let's do that!

train_X = train_X.astype('float32') test_X = test_X.astype('float32') train_X = train_X / 255. test_X = test_X / 255.

Now you need to convert the class labels into a one-hot encoding vector.

In one-hot encoding, you convert the categorical data into a vector of numbers. The reason why you convert the categorical data in one hot encoding is that machine learning algorithms cannot work with categorical data directly. You generate one boolean column for each category or class. Only one of these columns could take on the value 1 for each sample. Hence, the term one-hot encoding.

For your problem statement, the one hot encoding will be a row vector, and for each image, it will have a dimension of 1 x 10. The important thing to note here is that the vector consists of all zeros except for the class that it represents, and for that, it is 1. For example, the ankle boot image that you plotted above has a label of 9, so for all the ankle boot images, the one hot encoding vector would be [0 0 0 0 0 0 0 0 1 0] .

So let's convert the training and testing labels into one-hot encoding vectors:

# Change the labels from categorical to one-hot encoding train_Y_one_hot = to_categorical(train_Y) test_Y_one_hot = to_categorical(test_Y) # Display the change for category label using one-hot encoding print('Original label:', train_Y[0]) print('After conversion to one-hot:', train_Y_one_hot[0])

('Original label:', 9) ('After conversion to one-hot:', array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]))

That's pretty clear, right? Note that you can also print the train_Y_one_hot , which will display a matrix of size 60000 x 10 in which each row depicts one-hot encoding of an image.

This last step is a crucial one. In machine learning or any data specific task, you should partition the data correctly. For the model to generalize well, you split the training data into two parts, one designed for training and another one for validation. In this case, you will train the model on 80\% of the training data and validate it on 20\% of the remaining training data. This will also help to reduce overfitting since you will be validating the model on the data it would not have seen in training phase, which will help in boosting the test performance.

from sklearn.model_selection import train_test_split train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_hot, test_size=0.2, random_state=13)

For one last time let's check the shape of training and validation set.

train_X.shape,valid_X.shape,train_label.shape,valid_label.shape

((48000, 28, 28, 1), (12000, 28, 28, 1), (48000, 10), (12000, 10))

The Network

The images are of size 28 x 28. You convert the image matrix to an array, rescale it between 0 and 1, reshape it so that it's of size 28 x 28 x 1, and feed this as an input to the network.

You'll use three convolutional layers:

The first layer will have 32-3 x 3 filters,

The second layer will have 64-3 x 3 filters and

The third layer will have 128-3 x 3 filters.

In addition, there are three max-pooling layers each of size 2 x 2.

Figure: Architecture of the Model

Model the Data

First, let's import all the necessary modules required to train the model.

import keras from keras.models import Sequential,Input,Model from keras.layers import Dense, Dropout, Flatten from keras.layers import Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization from keras.layers.advanced_activations import LeakyReLU

You will use a batch size of 64 using a higher batch size of 128 or 256 is also preferable it all depends on the memory. It contributes massively to determining the learning parameters and affects the prediction accuracy. You will train the network for 20 epochs.

batch_size = 64 epochs = 20 num_classes = 10

Neural Network Architecture

In Keras, you can just stack up layers by adding the desired layer one by one. That's exactly what you'll do here: you'll first add a first convolutional layer with Conv2D() . Note that you use this function because you're working with images! Next, you add the Leaky ReLU activation function which helps the network learn non-linear decision boundaries. Since you have ten different classes, you'll need a non-linear decision boundary that could separate these ten classes which are not linearly separable.

More specifically, you add Leaky ReLUs because they attempt to fix the problem of dying Rectified Linear Units (ReLUs). The ReLU activation function is used a lot in neural network architectures and more specifically in convolutional networks, where it has proven to be more effective than the widely used logistic sigmoid function. As of 2017, this activation function is the most popular one for deep neural networks. The ReLU function allows the activation to be thresholded at zero. However, during the training, ReLU units can "die". This can happen when a large gradient flows through a ReLU neuron: it can cause the weights to update in such a way that the neuron will never activate on any data point again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. Leaky ReLUs attempt to solve this: the function will not be zero but will instead have a small negative slope.

Next, you'll add the max-pooling layer with MaxPooling2D() and so on. The last layer is a Dense layer that has a softmax activation function with 10 units, which is needed for this multi-class classification problem.

fashion_model = Sequential() fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',input_shape=(28,28,1),padding='same')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D((2, 2),padding='same')) fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same')) fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same')) fashion_model.add(Flatten()) fashion_model.add(Dense(128, activation='linear')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(Dense(num_classes, activation='softmax'))

Compile the Model

After the model is created, you compile it using the Adam optimizer, one of the most popular optimization algorithms. You can read more about this optimizer here. Additionally, you specify the loss type which is categorical cross entropy which is used for multi-class classification, you can also use binary cross-entropy as the loss function. Lastly, you specify the metrics as accuracy which you want to analyze while the model is training.

fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])

Let's visualize the layers that you created in the above step by using the summary function. This will show some parameters (weights and biases) in each layer and also the total parameters in your model.

fashion_model.summary()

_________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d_51 (Conv2D) (None, 28, 28, 32) 320 _________________________________________________________________ leaky_re_lu_57 (LeakyReLU) (None, 28, 28, 32) 0 _________________________________________________________________ max_pooling2d_49 (MaxPooling (None, 14, 14, 32) 0 _________________________________________________________________ conv2d_52 (Conv2D) (None, 14, 14, 64) 18496 _________________________________________________________________ leaky_re_lu_58 (LeakyReLU) (None, 14, 14, 64) 0 _________________________________________________________________ max_pooling2d_50 (MaxPooling (None, 7, 7, 64) 0 _________________________________________________________________ conv2d_53 (Conv2D) (None, 7, 7, 128) 73856 _________________________________________________________________ leaky_re_lu_59 (LeakyReLU) (None, 7, 7, 128) 0 _________________________________________________________________ max_pooling2d_51 (MaxPooling (None, 4, 4, 128) 0 _________________________________________________________________ flatten_17 (Flatten) (None, 2048) 0 _________________________________________________________________ dense_33 (Dense) (None, 128) 262272 _________________________________________________________________ leaky_re_lu_60 (LeakyReLU) (None, 128) 0 _________________________________________________________________ dense_34 (Dense) (None, 10) 1290 ================================================================= Total params: 356,234 Trainable params: 356,234 Non-trainable params: 0 _________________________________________________________________

Train the Model

It's finally time to train the model with Keras' fit() function! The model trains for 20 epochs. The fit() function will return a history object; By storying the result of this function in fashion_train , you can use it later to plot the accuracy and loss function plots between training and validation which will help you to analyze your model's performance visually.

fashion_train = fashion_model.fit(train_X, train_label, batch_size=batch_size,epochs=epochs,verbose=1,validation_data=(valid_X, valid_label))

Train on 48000 samples, validate on 12000 samples Epoch 1/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.4661 - acc: 0.8311 - val_loss: 0.3320 - val_acc: 0.8809 Epoch 2/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.2874 - acc: 0.8951 - val_loss: 0.2781 - val_acc: 0.8963 Epoch 3/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.2420 - acc: 0.9111 - val_loss: 0.2501 - val_acc: 0.9077 Epoch 4/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.2088 - acc: 0.9226 - val_loss: 0.2369 - val_acc: 0.9147 Epoch 5/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.1838 - acc: 0.9324 - val_loss: 0.2602 - val_acc: 0.9070 Epoch 6/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.1605 - acc: 0.9396 - val_loss: 0.2264 - val_acc: 0.9193 Epoch 7/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.1356 - acc: 0.9488 - val_loss: 0.2566 - val_acc: 0.9180 Epoch 8/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.1186 - acc: 0.9553 - val_loss: 0.2556 - val_acc: 0.9149 Epoch 9/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.0985 - acc: 0.9634 - val_loss: 0.2681 - val_acc: 0.9204 Epoch 10/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.0873 - acc: 0.9670 - val_loss: 0.2712 - val_acc: 0.9221 Epoch 11/20 48000/48000 [==============================] - 59s 1ms/step - loss: 0.0739 - acc: 0.9721 - val_loss: 0.2757 - val_acc: 0.9202 Epoch 12/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0628 - acc: 0.9767 - val_loss: 0.3126 - val_acc: 0.9132 Epoch 13/20 48000/48000 [==============================] - 61s 1ms/step - loss: 0.0569 - acc: 0.9789 - val_loss: 0.3556 - val_acc: 0.9081 Epoch 14/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0452 - acc: 0.9833 - val_loss: 0.3441 - val_acc: 0.9189 Epoch 15/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0421 - acc: 0.9847 - val_loss: 0.3400 - val_acc: 0.9165 Epoch 16/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0379 - acc: 0.9861 - val_loss: 0.3876 - val_acc: 0.9195 Epoch 17/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0405 - acc: 0.9855 - val_loss: 0.4112 - val_acc: 0.9164 Epoch 18/20 48000/48000 [==============================] - 60s 1ms/step - loss: 0.0285 - acc: 0.9897 - val_loss: 0.4150 - val_acc: 0.9181 Epoch 19/20 48000/48000 [==============================] - 61s 1ms/step - loss: 0.0322 - acc: 0.9877 - val_loss: 0.4584 - val_acc: 0.9196 Epoch 20/20 48000/48000 [==============================] - 61s 1ms/step - loss: 0.0262 - acc: 0.9906 - val_loss: 0.4396 - val_acc: 0.9205

Finally! You trained the model on fashion-MNIST for 20 epochs, and by observing the training accuracy and loss, you can say that the model did a good job since after 20 epochs the training accuracy is 99% and the training loss is quite low.

However, it looks like the model is overfitting, as the validation loss is 0.4396 and the validation accuracy is 92%. Overfitting gives an intuition that the network has memorized the training data very well but is not guaranteed to work on unseen data, and that is why there is a difference in the training and validation accuracy.

You probably need to handle this. In next sections, you'll learn how you can make your model perform much better by adding a Dropout layer into the network and keeping all the other layers unchanged.

But first, let's evaluate the performance of your model on the test set before you come on to a conclusion.

Model Evaluation on the Test Set

test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=0)

print('Test loss:', test_eval[0]) print('Test accuracy:', test_eval[1])

('Test loss:', 0.46366268818555401) ('Test accuracy:', 0.91839999999999999)

The test accuracy looks impressive. It turns out that your classifier does better than the benchmark that was reported here, which is an SVM classifier with mean accuracy of 0.897. Also, the model does well compared to some of the deep learning models mentioned on the GitHub profile of the creators of fashion-MNIST dataset.

However, you saw that the model looked like it was overfitting. Are these results really all that good?

Let's put your model evaluation into perspective and plot the accuracy and loss plots between training and validation data:

accuracy = fashion_train.history['acc'] val_accuracy = fashion_train.history['val_acc'] loss = fashion_train.history['loss'] val_loss = fashion_train.history['val_loss'] epochs = range(len(accuracy)) plt.plot(epochs, accuracy, 'bo', label='Training accuracy') plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy') plt.title('Training and validation accuracy') plt.legend() plt.figure() plt.plot(epochs, loss, 'bo', label='Training loss') plt.plot(epochs, val_loss, 'b', label='Validation loss') plt.title('Training and validation loss') plt.legend() plt.show()

From the above two plots, you can see that the validation accuracy almost became stagnant after 4-5 epochs and rarely increased at certain epochs. In the beginning, the validation accuracy was linearly increasing with loss, but then it did not increase much.

The validation loss shows that this is the sign of overfitting, similar to validation accuracy it linearly decreased but after 4-5 epochs, it started to increase. This means that the model tried to memorize the data and succeeded.

With this in mind, it's time to introduce some dropout into our model and see if it helps in reducing overfitting.

Adding Dropout into the Network

You can add a dropout layer to overcome the problem of overfitting to some extent. Dropout randomly turns off a fraction of neurons during the training process, reducing the dependency on the training set by some amount. How many fractions of neurons you want to turn off is decided by a hyperparameter, which can be tuned accordingly. This way, turning off some neurons will not allow the network to memorize the training data since not all the neurons will be active at the same time and the inactive neurons will not be able to learn anything.

So let's create, compile and train the network again but this time with dropout. And run it for 20 epochs with a batch size of 64.

batch_size = 64 epochs = 20 num_classes = 10

fashion_model = Sequential() fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',padding='same',input_shape=(28,28,1))) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D((2, 2),padding='same')) fashion_model.add(Dropout(0.25)) fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same')) fashion_model.add(Dropout(0.25)) fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same')) fashion_model.add(Dropout(0.4)) fashion_model.add(Flatten()) fashion_model.add(Dense(128, activation='linear')) fashion_model.add(LeakyReLU(alpha=0.1)) fashion_model.add(Dropout(0.3)) fashion_model.add(Dense(num_classes, activation='softmax'))

fashion_model.summary()

_________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d_54 (Conv2D) (None, 28, 28, 32) 320 _________________________________________________________________ leaky_re_lu_61 (LeakyReLU) (None, 28, 28, 32) 0 _________________________________________________________________ max_pooling2d_52 (MaxPooling (None, 14, 14, 32) 0 _________________________________________________________________ dropout_29 (Dropout) (None, 14, 14, 32) 0 _________________________________________________________________ conv2d_55 (Conv2D) (None, 14, 14, 64) 18496 _________________________________________________________________ leaky_re_lu_62 (LeakyReLU) (None, 14, 14, 64) 0 _________________________________________________________________ max_pooling2d_53 (MaxPooling (None, 7, 7, 64) 0 _________________________________________________________________ dropout_30 (Dropout) (None, 7, 7, 64) 0 _________________________________________________________________ conv2d_56 (Conv2D) (None, 7, 7, 128) 73856 _________________________________________________________________ leaky_re_lu_63 (LeakyReLU) (None, 7, 7, 128) 0 _________________________________________________________________ max_pooling2d_54 (MaxPooling (None, 4, 4, 128) 0 _________________________________________________________________ dropout_31 (Dropout) (None, 4, 4, 128) 0 _________________________________________________________________ flatten_18 (Flatten) (None, 2048) 0 _________________________________________________________________ dense_35 (Dense) (None, 128) 262272 _________________________________________________________________ leaky_re_lu_64 (LeakyReLU) (None, 128) 0 _________________________________________________________________ dropout_32 (Dropout) (None, 128) 0 _________________________________________________________________ dense_36 (Dense) (None, 10) 1290 ================================================================= Total params: 356,234 Trainable params: 356,234 Non-trainable params: 0 _________________________________________________________________

fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])

fashion_train_dropout = fashion_model.fit(train_X, train_label, batch_size=batch_size,epochs=epochs,verbose=1,validation_data=(valid_X, valid_label))

Train on 48000 samples, validate on 12000 samples Epoch 1/20 48000/48000 [==============================] - 66s 1ms/step - loss: 0.5954 - acc: 0.7789 - val_loss: 0.3788 - val_acc: 0.8586 Epoch 2/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.3797 - acc: 0.8591 - val_loss: 0.3150 - val_acc: 0.8832 Epoch 3/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.3302 - acc: 0.8787 - val_loss: 0.2836 - val_acc: 0.8961 Epoch 4/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.3034 - acc: 0.8868 - val_loss: 0.2663 - val_acc: 0.9002 Epoch 5/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.2843 - acc: 0.8936 - val_loss: 0.2481 - val_acc: 0.9083 Epoch 6/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.2699 - acc: 0.9002 - val_loss: 0.2469 - val_acc: 0.9032 Epoch 7/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2561 - acc: 0.9049 - val_loss: 0.2422 - val_acc: 0.9095 Epoch 8/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2503 - acc: 0.9068 - val_loss: 0.2429 - val_acc: 0.9098 Epoch 9/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2437 - acc: 0.9096 - val_loss: 0.2230 - val_acc: 0.9173 Epoch 10/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2307 - acc: 0.9126 - val_loss: 0.2170 - val_acc: 0.9187 Epoch 11/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2307 - acc: 0.9135 - val_loss: 0.2265 - val_acc: 0.9193 Epoch 12/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2229 - acc: 0.9160 - val_loss: 0.2136 - val_acc: 0.9229 Epoch 13/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2202 - acc: 0.9162 - val_loss: 0.2173 - val_acc: 0.9187 Epoch 14/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.2161 - acc: 0.9188 - val_loss: 0.2142 - val_acc: 0.9211 Epoch 15/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2119 - acc: 0.9196 - val_loss: 0.2133 - val_acc: 0.9233 Epoch 16/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2073 - acc: 0.9222 - val_loss: 0.2159 - val_acc: 0.9213 Epoch 17/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2050 - acc: 0.9231 - val_loss: 0.2123 - val_acc: 0.9233 Epoch 18/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.2016 - acc: 0.9238 - val_loss: 0.2191 - val_acc: 0.9235 Epoch 19/20 48000/48000 [==============================] - 65s 1ms/step - loss: 0.2001 - acc: 0.9244 - val_loss: 0.2110 - val_acc: 0.9258 Epoch 20/20 48000/48000 [==============================] - 64s 1ms/step - loss: 0.1972 - acc: 0.9255 - val_loss: 0.2092 - val_acc: 0.9269

Let's save the model so that you can directly load it and not have to train it again for 20 epochs. This way, you can load the model later on if you need it and modify the architecture; Alternatively, you can start the training process on this saved model. It is always a good idea to save the model -and even the model's weights!- because it saves you time. Note that you can also save the model after every epoch so that, if some issue occurs that stops the training at an epoch, you will not have to start the training from the beginning.

fashion_model.save("fashion_model_dropout.h5py")

Model Evaluation on the Test Set

Finally, let's also evaluate your new model and see how it performs!

test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=1)

10000/10000 [==============================] - 5s 461us/step

print('Test loss:', test_eval[0]) print('Test accuracy:', test_eval[1])

('Test loss:', 0.21460009642243386) ('Test accuracy:', 0.92300000000000004)

Wow! Looks like adding Dropout in our model worked, even though the test accuracy did not improve significantly but the test loss decreased compared to the previous results.



Now, let's plot the accuracy and loss plots between training and validation data for the one last time.

accuracy = fashion_train_dropout.history['acc'] val_accuracy = fashion_train_dropout.history['val_acc'] loss = fashion_train_dropout.history['loss'] val_loss = fashion_train_dropout.history['val_loss'] epochs = range(len(accuracy)) plt.plot(epochs, accuracy, 'bo', label='Training accuracy') plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy') plt.title('Training and validation accuracy') plt.legend() plt.figure() plt.plot(epochs, loss, 'bo', label='Training loss') plt.plot(epochs, val_loss, 'b', label='Validation loss') plt.title('Training and validation loss') plt.legend() plt.show()

Finally, you can see that the validation loss and validation accuracy both are in sync with the training loss and training accuracy. Even though the validation loss and accuracy line are not linear, but it shows that your model is not overfitting: the validation loss is decreasing and not increasing, and there is not much gap between training and validation accuracy.

Therefore, you can say that your model's generalization capability became much better since the loss on both test set and validation set was only slightly more compared to the training loss.

Predict Labels

predicted_classes = fashion_model.predict(test_X)

Since the predictions you get are floating point values, it will not be feasible to compare the predicted labels with true test labels. So, you will round off the output which will convert the float values into an integer. Further, you will use np.argmax() to select the index number which has a higher value in a row.

For example, let's assume a prediction for one test image to be 0 1 0 0 0 0 0 0 0 0 , the output for this should be a class label 1 .

predicted_classes = np.argmax(np.round(predicted_classes),axis=1)

predicted_classes.shape, test_Y.shape

((10000,), (10000,))

correct = np.where(predicted_classes==test_Y)[0] print "Found %d correct labels" % len(correct) for i, correct in enumerate(correct[:9]): plt.subplot(3,3,i+1) plt.imshow(test_X[correct].reshape(28,28), cmap='gray', interpolation='none') plt.title("Predicted {}, Class {}".format(predicted_classes[correct], test_Y[correct])) plt.tight_layout()

Found 9188 correct labels

incorrect = np.where(predicted_classes!=test_Y)[0] print "Found %d incorrect labels" % len(incorrect) for i, incorrect in enumerate(incorrect[:9]): plt.subplot(3,3,i+1) plt.imshow(test_X[incorrect].reshape(28,28), cmap='gray', interpolation='none') plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], test_Y[incorrect])) plt.tight_layout()

Found 812 incorrect labels

By looking at a few images, you cannot be sure as to why your model is not able to classify the above images correctly, but it seems like a variety of the similar patterns present on multiple classes affect the performance of the classifier although CNN is a robust architecture. For example, images 5 and 6 both belong to different classes but look kind of similar maybe a jacket or perhaps a long sleeve shirt.

Classification Report

Classification report will help us in identifying the misclassified classes in more detail. You will be able to observe for which class the model performed bad out of the given ten classes.

from sklearn.metrics import classification_report target_names = ["Class {}".format(i) for i in range(num_classes)] print(classification_report(test_Y, predicted_classes, target_names=target_names))

precision recall f1-score support Class 0 0.77 0.90 0.83 1000 Class 1 0.99 0.98 0.99 1000 Class 2 0.88 0.88 0.88 1000 Class 3 0.94 0.92 0.93 1000 Class 4 0.88 0.87 0.88 1000 Class 5 0.99 0.98 0.98 1000 Class 6 0.82 0.72 0.77 1000 Class 7 0.94 0.99 0.97 1000 Class 8 0.99 0.98 0.99 1000 Class 9 0.98 0.96 0.97 1000 avg / total 0.92 0.92 0.92 10000

You can see that the classifier is underperforming for class 6 regarding both precision and recall. For class 0 and class 2, the classifier is lacking precision. Also, for class 4, the classifier is slightly lacking both precision and recall.

Go Further!

This tutorial was good start to convolutional neural networks in Python with Keras. If you were able to follow along easily or even with little more efforts, well done! Try doing some experiments maybe with same model architecture but using different types of public datasets available.

There is still a lot to cover, so why not take DataCamp's Deep Learning in Python course? In the meantime, also make sure to check out the Keras documentation, if you haven't done so already. You will find more examples and information on all functions, arguments, more layers, etc. It will undoubtedly be an indispensable resource when you're learning how to work with neural networks in Python!

If you rather feel like reading a book that explains the fundamentals of deep learning (with Keras) together with how it's used in practice, you should definitely read François Chollet's Deep Learning in Python book.