Researchers trying to understand how neural networks work shouldn’t just focus on interpretable neurons, according to new research from DeepMind researchers.

AI systems are often described as black boxes. It’s difficult to understand how they work and reach particular outcomes, making people nervous about using them to make important decisions in areas such as healthcare or recruitment.

Making neural networks more interpretable is hot topic in research. It’s possible to look at the connections between different groups of neurons and visualise which ones correspond to a specific class.

If an image classification model is fed different types of pictures, say an image of a cat or dog, researchers can find the ‘cat neurons’ or a ‘dog neurons’.

These interpretable neurons are important as they are the ones that push the neural network to a particular answer, in this case it’s whether the animal in the image is a cat or dog.

A paper from DeepMind to be presented at the International Conference on Learning Representations (ICLR) next month in April shows that studying these interpretable neurons alone isn’t enough to understand how deep learning truly works.

“We measured the performance impact of damaging the network by deleting individual neurons as well as groups of neurons,” according to a blog post.

Deleting groups of neurons changes the strength of the connections between other neurons and can make the neural network performance drop. For example, if the cat neurons are deleted and the model is shown a picture of a cat, it might be more difficult to identify the animal correctly and its accuracy decreases.

But the results showed that these class-specific neurons weren’t all that important after all. After deleting these interpretable neurons, the performance levels didn’t change by much.

On one hand, it’s a little disheartening to find out that looking into the interpretable neurons isn’t enough to untangle the inner workings of a neural network. But it’s not all too surprising either, because it means that the models that were less affected after having their neurons deleted don’t rely on memorizing training data, and generalise better to new images. And that’s how neural networks should work, really.

DeepMind hope to “explain the role of all neurons, not just those which are easy-to-interpret”.

“We hope to better understand the inner workings of neural networks, and critically, to use this understanding to build more intelligent and general systems,” it concluded. ®