Fitting the model:

Epoch 1/25

1000/1000 [==============================] - 913s 913ms/step - loss: 0.3476 - acc: 0.8502 - val_loss: 2.2280 - val_acc: 0.5000

Epoch 2/25

1000/1000 [==============================] - 907s 907ms/step - loss: 0.1354 - acc: 0.9564 - val_loss: 0.5738 - val_acc: 0.8629

Epoch 3/25

1000/1000 [==============================] - 904s 904ms/step - loss: 0.0675 - acc: 0.9825 - val_loss: 0.6880 - val_acc: 0.8710

Epoch 4/25

1000/1000 [==============================] - 910s 910ms/step - loss: 0.0170 - acc: 0.9956 - val_loss: 0.7560 - val_acc: 0.8710

Epoch 5/25

1000/1000 [==============================] - 952s 952ms/step - loss: 0.0454 - acc: 0.9893 - val_loss: 0.7865 - val_acc: 0.8710

Epoch 6/25

1000/1000 [==============================] - 908s 908ms/step - loss: 0.0158 - acc: 0.9959 - val_loss: 0.7694 - val_acc: 0.8952

Epoch 7/25

1000/1000 [==============================] - 908s 908ms/step - loss: 0.0833 - acc: 0.9851 - val_loss: 0.7052 - val_acc: 0.8790

Epoch 8/25

1000/1000 [==============================] - 914s 914ms/step - loss: 0.0103 - acc: 0.9977 - val_loss: 0.7506 - val_acc: 0.8952

Epoch 9/25

1000/1000 [==============================] - 909s 909ms/step - loss: 0.0043 - acc: 0.9989 - val_loss: 0.7203 - val_acc: 0.9032

Epoch 10/25

1000/1000 [==============================] - 905s 905ms/step - loss: 0.0035 - acc: 0.9992 - val_loss: 0.7409 - val_acc: 0.8952

Epoch 11/25

1000/1000 [==============================] - 934s 934ms/step - loss: 0.0050 - acc: 0.9992 - val_loss: 0.8968 - val_acc: 0.8952

Epoch 12/25

1000/1000 [==============================] - 1193s 1s/step - loss: 0.0017 - acc: 0.9998 - val_loss: 0.7880 - val_acc: 0.9032

Epoch 13/25

1000/1000 [==============================] - 1189s 1s/step - loss: 0.0017 - acc: 0.9996 - val_loss: 0.7822 - val_acc: 0.9113

Epoch 14/25

1000/1000 [==============================] - 1194s 1s/step - loss: 0.0014 - acc: 0.9996 - val_loss: 0.7832 - val_acc: 0.9032

Epoch 15/25

1000/1000 [==============================] - 1196s 1s/step - loss: 0.0011 - acc: 0.9998 - val_loss: 0.7775 - val_acc: 0.9032

Epoch 16/25

1000/1000 [==============================] - 1195s 1s/step - loss: 8.3008e-04 - acc: 0.9998 - val_loss: 0.8340 - val_acc: 0.9032

Epoch 17/25

1000/1000 [==============================] - 1198s 1s/step - loss: 0.0072 - acc: 0.9988 - val_loss: 0.7819 - val_acc: 0.8952

Epoch 18/25

1000/1000 [==============================] - 1201s 1s/step - loss: 0.0020 - acc: 0.9997 - val_loss: 0.7950 - val_acc: 0.9113

Epoch 19/25

1000/1000 [==============================] - 1202s 1s/step - loss: 0.0011 - acc: 0.9997 - val_loss: 0.7827 - val_acc: 0.9113

Epoch 20/25

1000/1000 [==============================] - 1170s 1s/step - loss: 0.0015 - acc: 0.9996 - val_loss: 0.8283 - val_acc: 0.9032

Epoch 21/25

1000/1000 [==============================] - 906s 906ms/step - loss: 0.0015 - acc: 0.9997 - val_loss: 0.8592 - val_acc: 0.8952

Epoch 22/25

1000/1000 [==============================] - 905s 905ms/step - loss: 0.0010 - acc: 0.9997 - val_loss: 0.8227 - val_acc: 0.9032

Epoch 23/25

1000/1000 [==============================] - 907s 907ms/step - loss: 8.1553e-04 - acc: 0.9997 - val_loss: 0.8221 - val_acc: 0.9113

Epoch 24/25

1000/1000 [==============================] - 934s 934ms/step - loss: 0.0010 - acc: 0.9998 - val_loss: 0.8540 - val_acc: 0.9032

Epoch 25/25

1000/1000 [==============================] - 1189s 1s/step - loss: 7.6795e-04 - acc: 0.9998 - val_loss: 0.8570 - val_acc: 0.9113

This code fits our model so it can classify images of Nicolas Cage. As you can see, by the end of the 25th epoch we achieved 99% training accuracy and 91% validation accuracy.

Since we are performing augmentations on our data during training, we use the classifier.fit_generator function. classifier.fit would not work here because it expects the full dataset as in-memory arrays rather than a generator.

Parameters:

training_set — the generator built from our training set ImageDataGenerator; it yields augmented training images during training

steps_per_epoch — our generator yields batches indefinitely, so we specify how many batches make up one epoch

epochs — defines the number of times that the learning algorithm will work through the entire training dataset

test_set — the generator built from our test set ImageDataGenerator; it supplies the validation data the model is evaluated on at the end of each epoch

val_steps — total number of steps (batches of samples) to draw from our test data generator when validating at the end of every epoch
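Putting these parameters together, the fitting call looks roughly like the sketch below. This is a minimal reconstruction, not the exact code: the directory paths, image size, batch size, and augmentation settings are assumptions, and classifier is assumed to be an already-compiled Keras model.

from keras.preprocessing.image import ImageDataGenerator

# Training generator: rescale pixel values and apply random augmentations
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

# Test generator: rescale only, so validation images stay unaugmented
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',  # hypothetical path
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',  # hypothetical path
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')

val_steps = len(test_set)  # one full pass over the test data per epoch

classifier.fit_generator(training_set,
                         steps_per_epoch=1000,  # matches the 1000/1000 in the log above
                         epochs=25,
                         validation_data=test_set,
                         validation_steps=val_steps)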

The “learning”:

Neural networks learn by a process called backpropagation. The Keras .fit functions perform this process automatically for us, so we don't have to write the code by hand, but what exactly is backpropagation? The videos that helped me the most with understanding it were Neural Networks Demystified and 3Blue1Brown's backpropagation video.

Forward Propagation:

Neural networks take in an input and perform a process called forward propagation. To understand backpropagation, we first need to understand forward propagation.

Each little line connecting one neuron to the next in a neural network is called a synapse, and each synapse holds a "weight" value. The diagram below gives a visual example of what a small network looks like.

This is obviously a very simple neural network, with only one hidden layer of 3 nodes, but forward propagation works the same way on larger and more complicated networks. Each layer feeds its values forward to the next: we take the matrix multiplication of the previous layer's outputs and the weights matrix, add the bias, and then apply an activation function to "squish" the values into the desired range. Before any training, forward propagation outputs terrible predictions; you can see an example of this in our training run, where validation accuracy after the first epoch was 50%.
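To make that concrete, here is a toy forward pass in plain NumPy. This is an illustration of the mechanics, not our actual Cage classifier: the two-input, three-hidden-node shape, the random weights, and the sigmoid activation are all assumptions for the sake of the example.

import numpy as np

def sigmoid(z):
    # activation function: "squishes" any value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8])    # input layer: 2 example input values
W1 = np.random.randn(3, 2)  # synapse weights into the 3-node hidden layer
b1 = np.zeros(3)            # hidden layer bias
W2 = np.random.randn(1, 3)  # synapse weights into the single output node
b2 = np.zeros(1)            # output bias

hidden = sigmoid(W1 @ x + b1)       # matrix multiplication plus bias, then squish
output = sigmoid(W2 @ hidden + b2)  # feed the hidden values forward the same way
print(output)  # with random weights this prediction is essentially a coin flip

Backpropagation is what will tweak the weight values in our network so that it actually learns.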

Backpropagation:

Neural networks learn by minimizing a 'cost' or 'loss' function. When a neural network outputs a prediction after forward propagation, we measure how wrong that prediction is with a cost function. Our cost function is a log loss function, which we minimize using gradient descent.
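For a binary task like ours (Cage or not Cage), log loss, also called binary cross-entropy, over m training examples takes the standard form

C = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]

where y_i is the true label (say, 1 for Cage and 0 for not) and \hat{y}_i is the network's predicted probability.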

During gradient descent, we take the derivative of the cost function at our current point on the loss curve. This derivative tells us the slope of the tangent line at that point. We want to move our point along the loss curve in the negative slope direction; in other words, we guide the point downhill, because we want to minimize the cost function.
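Here is the idea on a made-up one-dimensional loss curve. This is a toy sketch; our network's real cost surface has one dimension per weight and bias, but each step works the same way.

def loss(w):
    # a made-up loss curve with its minimum at w = 3
    return (w - 3) ** 2

def d_loss(w):
    # derivative of the loss: the slope of the tangent line at w
    return 2 * (w - 3)

w = 0.0   # starting point on the curve
lr = 0.1  # learning rate: the size of each nudge
for _ in range(50):
    w -= lr * d_loss(w)  # step in the negative slope direction
print(w, loss(w))  # w ends up very close to 3, the bottom of the curve

But how do we get this derivative?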

As explained in the amazing 3Blue1Brown backpropagation video, at a high level backpropagation is the process of determining which changes, or tiny "nudges", to the weights and biases in our network will cause the most rapid decrease in the cost function, based on a single training example. Backpropagation is recursive: the output layer's slope depends on the previous layer's slope, which depends on the layer before that, and so on back through the network. Let's look at what this means using calculus:
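Following the notation used in the 3Blue1Brown video for a network with one neuron per layer, write z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)} for the weighted input to layer L and a^{(L)} = \sigma(z^{(L)}) for its activation. The chain rule then gives the sensitivity of the cost C to a single weight:

\frac{\partial C}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}}

The recursion lives in the last factor: \partial C / \partial a^{(L-1)} is computed from layer L's terms in exactly the same way, which is how the slope information flows backward through the network.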

Putting it all together:

The equations above were simplified for a network with just one neuron in each layer. Putting it all together, this is the fuller equation for calculating the derivative in backpropagation.
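In the same notation, with j indexing a neuron in layer L and k indexing a neuron in layer L-1, the standard form is:

\frac{\partial C}{\partial w_{jk}^{(L)}} = \frac{\partial z_j^{(L)}}{\partial w_{jk}^{(L)}} \cdot \frac{\partial a_j^{(L)}}{\partial z_j^{(L)}} \cdot \frac{\partial C}{\partial a_j^{(L)}}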

This is essentially the same equation as before, just a little more advanced so it can handle a network with more than one neuron in each layer, which is where the j and k indexes come from.

Is it necessary to understand the gross-looking calculus to get our network to perform well? Thankfully not, since Keras does this process automatically for us, but I thought it was interesting to go down the math rabbit hole. Backpropagation is super confusing, and to be completely honest I still sometimes get lost when I look at all the calculus. Many times I have to go back, review it, and watch helpful YouTube videos to polish my understanding.