Gradient Weighted Class Activation Mapping (Grad-CAM)

To address the above issues, Grad-CAM was proposed. It neither requires a global average pooling (GAP) layer nor restricts us to the final convolutional layer: heatmaps can be generated by visualizing any layer in the network. Let's explore how this is achieved.

We have seen above that in CAM we generate heatmaps by taking the weighted average of a layer's output channels, using the weights of the fully-connected layer that feeds the output class.
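That weighted average can be sketched in a few lines of numpy. The shapes and values below are hypothetical stand-ins for a real network's last convolutional feature map and the fully-connected weights tied to the target class:

```python
import numpy as np

# Hypothetical shapes: an 8x8 feature map with 4 channels, and one
# fully-connected weight per channel for the predicted class.
feature_map = np.random.rand(8, 8, 4)   # output of the last conv layer
fc_weights = np.random.rand(4)          # FC weights for the target class

# CAM: weighted sum of the channels, using the FC weights as importances
heatmap = np.tensordot(feature_map, fc_weights, axes=([2], [0]))  # shape (8, 8)
```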

In Grad-CAM, we do something similar; the only difference is how these weights are generated. We compute the gradient of the output class score with respect to each channel in the feature map of a chosen layer. This yields a gradient channel map, in which each channel represents the gradient of the corresponding channel of the feature map.

The gradient channel map is then global average pooled, and the resulting values serve as the importance weights of each channel in the feature map. The weighted feature map is then used as a heatmap, just like in CAM. Since gradients can be computed with respect to any layer, there's no restriction to the final layer; and because nothing in this procedure depends on the network architecture, any kind of architecture works.

Keras code to generate heatmap with Grad-CAM

import numpy as np
from keras import backend as K

# `model`, `predict` (model predictions), and `x_test` are assumed to exist.
i = 0  # index of the test image to explain
class_idx = np.argmax(predict[i], axis=-1)
class_output = model.output[:, class_idx]
last_conv_layer = model.layers[-44]  # index of the target conv layer in this model

img = x_test[i].copy()
x = [img]

# Gradient of the class score w.r.t. the conv layer's feature map
grads = K.gradients(class_output, last_conv_layer.output)[0]
# Global average pool the gradients: one importance weight per channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])

# Weight each channel of the feature map by its pooled gradient
for c in range(128):  # 128 = number of channels in this conv layer
    conv_layer_output_value[:, :, c] *= pooled_grads_value[c]

heatmap = np.mean(conv_layer_output_value, axis=-1)
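The raw heatmap can still contain negative values. A common post-processing step (not shown above) is to clip them with a ReLU and normalize to [0, 1] before resizing the map and overlaying it on the input image. A minimal numpy sketch, using a stand-in for the `heatmap` array:

```python
import numpy as np

# Stand-in for the raw heatmap produced above
heatmap = np.random.randn(8, 8)

# Keep only positive activations (ReLU), then scale to [0, 1]
heatmap = np.maximum(heatmap, 0)
heatmap /= (heatmap.max() + 1e-8)  # epsilon guards against an all-zero map
```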

The entire source code is available here:

Below are some of the results achieved by generating heatmaps with Grad-CAM. As we can see, the network focuses on the main object, which suggests it's performing well.

Below is a cool representation of what the network concentrates on as we move from its initial layers to its final layers. In this case, the target output is sunglasses. The initial layers don't concentrate exactly on the sunglasses, but as we reach the final layers, they're able to focus on them.

As we can see from the above, Grad-CAM outputs are quite appealing, and they can definitely help a human better understand a neural network and work alongside it.

There is one limitation with Grad-CAM, though. If there's more than one instance of a single class present in the image—for example, multiple cats in an image fed to a cats vs. dogs classifier—Grad-CAM won't be able to localize all the instances of the class. This is solved by Grad-CAM++.

The major update in Grad-CAM++ is that, while computing the GAP of the gradient channel map, only the positive values are considered; the negative values are ignored.

This is achieved by applying a ReLU to each value of the gradient channel map. This approach of ignoring negative gradients is also used in guided backpropagation.

The main reason for using only positive gradients is that we want to know which pixels have a positive impact on the class output—we don't care which pixels have a negative impact. Moreover, if we don't discard the negative gradients, the negative values cancel out some of the positive values in the gradient channel map during global average pooling. As a result, we lose valuable information, which neglecting the negative gradients avoids.
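Under this simplified view, the only change to the pooling step is a ReLU applied to the gradients before averaging. A numpy sketch, assuming a hypothetical gradient channel map of shape (height, width, channels):

```python
import numpy as np

# Hypothetical gradient channel map: gradients of the class score
# with respect to each spatial location of each channel
grad_map = np.random.randn(8, 8, 4)

# Grad-CAM pooling: plain global average over the spatial dimensions
weights_gradcam = grad_map.mean(axis=(0, 1))

# Positive-gradient pooling: ReLU the gradients first, so negative
# values cannot cancel positive ones during the average
weights_positive = np.maximum(grad_map, 0).mean(axis=(0, 1))
```

Note that `weights_positive` is always non-negative, whereas `weights_gradcam` can go negative when a channel's negative gradients dominate.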

GradCAM Source: https://arxiv.org/pdf/1610.02391.pdf

GradCAM++ Source: https://arxiv.org/pdf/1710.11063.pdf