Introduction

The loss function is one of the most important components of a deep learning model: it quantifies how far the model's predictions are from the true targets. Cross-entropy is one of the many loss functions used in deep learning (another popular one being the SVM hinge loss).

Definition

Cross-entropy measures the performance of a classification model whose output is a probability distribution, with each value between 0 and 1. It quantifies the dissimilarity between the following two probability distributions:

The distribution $q$ generated by your model.

The true distribution $p$ (generally a one-hot vector encoding the true label of the input in a multi-class dataset).

Cross-entropy can mathematically be represented as:

$$H(p, q) = -\sum_{i} p_i \log q_i$$

where $p_i$ and $q_i$ are the probabilities assigned to class $i$ by the true and predicted distributions, respectively.
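To make the formula concrete, here is a minimal sketch of this computation in Python (the function name cross_entropy and the small eps clamp are illustrative choices, not part of the original text):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) between a true distribution p and a
    predicted distribution q, both given as 1-D NumPy arrays."""
    q = np.clip(q, eps, 1.0)       # clamp to avoid log(0)
    return -np.sum(p * np.log(q))
```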

Example

Suppose for a specific training instance, the label is B (out of the three possible labels A, B, and C). The one-hot distribution for this training instance is:

$$p = [0, 1, 0]$$

Now, suppose your machine learning model predicts the following probability distribution, say:

$$q = [0.1, 0.7, 0.2]$$

Cross-entropy for these values of the vectors $p$ and $q$ is:

$$H(p, q) = -(0 \cdot \log 0.1 + 1 \cdot \log 0.7 + 0 \cdot \log 0.2) = -\log 0.7 \approx 0.357$$
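This example can be checked numerically with the cross_entropy sketch from the Definition section:

```python
p = np.array([0.0, 1.0, 0.0])   # one-hot vector for label B
q = np.array([0.1, 0.7, 0.2])   # model's predicted distribution
print(cross_entropy(p, q))      # ~0.3567, i.e. -log(0.7)
```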

Gradient

When training a deep learning model, we need to compute the gradient of the loss function with respect to the input vector $z$ (the logits).

In this example, we use a softmax function to generate the probability distribution $q$ from the input vector $z$.

We know that

$$q_k = \frac{e^{z_k}}{\sum_{j} e^{z_j}}$$
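As a side note, a direct implementation of this formula can overflow for large logits; the usual fix is to subtract the maximum logit first. A minimal sketch (the shift is a standard numerical-stability trick, not part of the derivation):

```python
def softmax(z):
    """Softmax of a 1-D array of logits z."""
    shifted = z - np.max(z)    # shifting leaves the softmax unchanged but avoids overflow
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)
```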

Substituting the softmax into the cross-entropy, the loss becomes a function of $z$, and its gradient with respect to a component $z_k$ is:

$$\frac{\partial H}{\partial z_k} = -\sum_{i} p_i \frac{\partial \log q_i}{\partial z_k} = -\sum_{i} \frac{p_i}{q_i} \frac{\partial q_i}{\partial z_k}$$

The term $\frac{\partial q_i}{\partial z_k}$ can be broken down into two cases: when $i = k$ and when $i \neq k$.

For $i = k$:

$$\frac{\partial q_i}{\partial z_k} = \frac{e^{z_k} \sum_j e^{z_j} - e^{z_k} e^{z_k}}{\left(\sum_j e^{z_j}\right)^2} = q_k (1 - q_k)$$

For $i \neq k$:

$$\frac{\partial q_i}{\partial z_k} = \frac{-e^{z_i} e^{z_k}}{\left(\sum_j e^{z_j}\right)^2} = -q_i q_k$$

Taking both results together and using the fact that $\sum_i p_i = 1$ (by virtue of $p$ being a probability distribution vector):

$$\frac{\partial H}{\partial z_k} = -\frac{p_k}{q_k} q_k (1 - q_k) + \sum_{i \neq k} \frac{p_i}{q_i} q_i q_k = -p_k (1 - q_k) + q_k \sum_{i \neq k} p_i = -p_k + q_k \sum_i p_i = q_k - p_k$$

The above result is very useful because it is so easy to compute: the gradient of the cross-entropy loss with respect to the logits is simply the difference $q - p$ between the predicted and true distributions.
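The derivation can be sanity-checked by comparing the analytic gradient $q - p$ against a finite-difference approximation; here is a small sketch using the softmax and cross_entropy functions from above (the logits, label, and epsilon are arbitrary illustrative choices):

```python
def loss(z, p):
    """Cross-entropy of softmax(z) against the true distribution p."""
    return cross_entropy(p, softmax(z))

z = np.array([1.0, 2.0, 0.5])    # arbitrary logits
p = np.array([0.0, 1.0, 0.0])    # one-hot label B

analytic = softmax(z) - p        # the q - p result derived above

# Central finite differences, one component of z at a time.
eps = 1e-6
numeric = np.zeros_like(z)
for k in range(len(z)):
    dz = np.zeros_like(z)
    dz[k] = eps
    numeric[k] = (loss(z + dz, p) - loss(z - dz, p)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```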