Explainability vs. Deep Learning

Explainability is like a ghost that has been haunting deep learning researchers for many years. From time to time, you hear people ask AI researchers to provide more explainability for their deep learning models. However, we human beings seem to forget that we often don’t know how to explain our own behaviors either, especially for system-1 tasks.

Let me explain system-1 tasks and system-2 tasks a bit. This concept was first introduced in the book “Thinking, Fast and Slow”. System-1 tasks are done fast and unconsciously, such as recognizing whether an animal is a dog or a cat, understanding a simple sentence, or driving a car on an empty road.

On the contrary, system-2 tasks are done slowly and consciously, for example, digging into your memory to recognize a sound, solving 17 × 24, or giving someone your phone number. So, when your brain performs a system-1 task, like identifying whether an object is a dog, it takes many raw inputs (pixels), feeds them into many layers of billions of neurons, and then draws a conclusion.

And if I ask you why you think it is a dog instead of a wolf, you will probably need quite a long time to explain why you think so. You might come up with some reasons (e.g. its muzzle is sharper, and so on), but you probably didn’t make your decision because of any of them.

And this kind of task (system 1) is exactly what deep learning models are good at for now. In case you are curious, deep learning models for object classification use a mechanism very similar to how human brains do this. There are also different layers of neurons; the deeper the layer, the more abstract the concepts it can represent, and the model makes its decision in the very last layer.
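As a toy illustration (not any particular model's architecture), here is a sketch of that forward pass: raw pixels flow through successive layers, each producing a more abstract representation, and the class decision is made at the very last one. The weights here are random placeholders, not a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

pixels = rng.random(64)             # a tiny 8x8 "image", flattened
W1 = rng.standard_normal((32, 64))  # layer 1: low-level features (edges)
W2 = rng.standard_normal((16, 32))  # layer 2: mid-level features (shapes)
W3 = rng.standard_normal((3, 16))   # last layer: class scores

h1 = relu(W1 @ pixels)              # each layer re-represents the input
h2 = relu(W2 @ h1)
probs = softmax(W3 @ h2)            # the decision happens here

labels = ["dog", "cat", "wolf"]
print(labels[int(np.argmax(probs))])
```

A real classifier would use trained convolutional layers instead of random matrices, but the shape of the computation is the same: many layers in, one decision out.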

Since we can hardly even explain ourselves, why do we keep asking machines to explain themselves? I actually spent some time thinking about this question, and I believe there are two different reasons.

1. Lack of Dark Knowledge

Even though deep learning models can outperform humans in many different tasks, they sometimes make unbelievable mistakes that humans would never make, for example classifying a dog as an airplane. If a human fails to classify a dog as a dog, they will probably guess it is a cat or a wolf, which is not too far from a dog in terms of common sense. We call this kind of common sense “dark knowledge”.

And whenever people see such a mistake from a model, they think it is crazy, call the model a black box, and ask for an explanation. But the major problem here is not explainability; it is how we can let machines learn such dark knowledge and how we can represent it in a machine-learnable way.
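To make the idea concrete, here is a hedged sketch of what output probabilities with dark knowledge might look like: even the probability mass a model puts on wrong answers should concentrate on semantically close classes (wolf, cat) rather than absurd ones (airplane). The logits below are made-up numbers, not from any real model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

classes = ["dog", "wolf", "cat", "airplane"]
# Hypothetical scores for a dog image: confident in "dog",
# residual mass on similar animals, almost none on "airplane".
logits = np.array([4.0, 2.5, 2.0, -3.0])
probs = softmax(logits)

for c, p in zip(classes, probs):
    print(f"{c:9s} {p:.3f}")
```

A model lacking this common sense could produce the same top-1 accuracy while spreading its errors onto arbitrary classes, which is exactly what makes its mistakes feel crazy to us.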

2. Lack of Consistency

Machine learning models trained on the same data can still make very different predictions on certain cases. To be concrete, 10 different models might reach very similar accuracies on the test data, yet be good and bad at very different kinds of cases. This means they see things very differently: a case that is very difficult for one model may be extremely easy for another. (Two models might agree more often if they use similar architectures; I have not done much research on this yet.)
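This kind of (dis)agreement can be measured directly. Below is a hypothetical sketch using mock predictions: two simulated models with nearly identical overall accuracy can still disagree on a sizable fraction of individual cases. The `mock_predictions` helper and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000
labels = rng.integers(0, 10, size=n)  # ground truth over 10 classes

def mock_predictions(labels, accuracy, rng):
    """Simulate a model by flipping a fraction of labels to wrong classes."""
    preds = labels.copy()
    wrong = rng.random(labels.size) > accuracy
    # Shift wrong cases by 1..9 (mod 10) so they are guaranteed incorrect.
    preds[wrong] = (preds[wrong] + rng.integers(1, 10, size=wrong.sum())) % 10
    return preds

preds_a = mock_predictions(labels, 0.90, rng)
preds_b = mock_predictions(labels, 0.90, rng)

acc_a = (preds_a == labels).mean()
acc_b = (preds_b == labels).mean()
agreement = (preds_a == preds_b).mean()  # how often the two models agree

print(f"model A accuracy: {acc_a:.2f}")
print(f"model B accuracy: {acc_b:.2f}")
print(f"agreement rate:   {agreement:.2f}")
```

Because each model errs on an independent subset here, the agreement rate ends up noticeably below either accuracy, which is the inconsistency described above.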

They are not very consistent, which is quite different from humans: the images that are hard for humans tend to share very similar attributes. That’s why we consider humans much more interpretable than deep learning models. In my opinion, the main problem here is not a lack of explainability; we simply function differently from models.

Solutions?

We can actually sum up the two reasons above in one concept: we want models to be like us. I believe this is related to our evolution. Humans prefer humans over any other species, and having a machine that acts like a human makes us feel safe. But if machines can already outperform us, why should we force them to be like us?

AI researchers typically take one of two angles on this. One is that we should not force any human biases onto models; they should learn everything from data or rewards by themselves. The best example of this is AlphaZero, which learned everything by playing against itself and eventually became the strongest Go and chess player of all time. It even came up with moves that humans had never thought of, and humans ended up learning a lot from it.

The other side of the argument is that we should introduce human knowledge to machines. I think this makes sense (to some degree), but the problem with this approach is that we have not yet found good ways to represent human knowledge and implement it. What most people currently do is use symbolic AI approaches (many handcrafted rules to guide machines). This makes models less flexible, because you need experts to handcraft rules for every different problem, and humans can also introduce wrong assumptions to machines.

(A discussion between Yann LeCun and Christopher Manning might help you understand these two views better.)

Encoding human knowledge into models in a more machine-learning-native way is a challenge AI researchers need to overcome in the next decade, and it might be the key to progressing our models from performing system-1 tasks to system-2 ones. This line of research has not yet been valued that much in academia. If you are interested in this topic, you will not want to miss the epic talk Yoshua Bengio gave at NeurIPS 2019.



I write articles about deep learning, AI, and natural language understanding, so follow me if you are interested.