Summary: This post – part of a series about Machine Learning (ML) and verification – discusses the hard problem of verifying ML-based systems, and how “Explainable AI” might help.

I can think of three interesting topics whose title contains both “ML” and “verification”:

Verifying ML-based systems

Using a verification infrastructure to train ML-based systems

Using ML to help verification (of anything)

These are all fun topics, and (because ML is such a fast-moving field) there have been several recent, relevant developments. However, this is too much for one post, so I plan to discuss these issues over a series of posts.

Specifically, here are some items I’d like to talk about (the current post only touches on the first two):

Verifying ML-based systems is hard – mainly because they are opaque (“They have a resume but no spec”). But CDV (Coverage Driven Verification) may help.

There is a new DARPA program for “Explainable AI”, which (if successful) will also help ML verification

New research in AI safety (e.g. the article “Concrete problems in AI safety”) is very relevant to ML verification

There are attempts to make ML-based systems “know what they know” – also quite relevant for verification

Using a verification environment to train ML-based systems may be a good idea. This (somewhat oddball) direction essentially says: If we already have a verification environment capable of producing rare corner cases, why not use it to actually train the ML-based system (since otherwise rare events will not appear in enough training data)?

Using ML to help verification (of anything) may be a good idea: ML can help in failure clustering, automatic debugging, coverage maximization (my favorite) and so on.

As always, keep in mind that I am coming from the verification (and not the ML) side of the house. Your comments are very welcome.

The current (sad) state of verifying ML-based systems

Verification of complex systems is hard but extremely important. [sub]systems created via ML are even harder to verify. I discussed this in a previous post, where I said:

The big issue is that an ML system is opaque (black-box). Yes, this is still all just SW, and you can certainly look at it, but it is very hard to interpret: For instance, the knowledge learned by NNs is encoded as numerical weights of various node-to-node connections. You can print them out and stare at them all day long, but it will not help you much. And yes, people have tried to convert those numbers into readable rules / decision trees (this is called “rule extraction”), but this yields approximate results and does not scale well.

Because ML systems are opaque, you cannot really reason about what they do. Also, you can’t do modular (as in module-by-module) verification. And you can never be sure what they’ll do about a situation never encountered before. Finally, when fixing a bug (e.g. by adding the buggy situation + correct output to the learning set), you can never be sure (without a lot of testing) that the system has fixed “the full bug” and not just some manifestations of it.

So, it is a hard problem, but it must be tackled, because ML-based systems are quickly gaining ground.

In the context of AVs (and other Intelligent Autonomous Systems), a big issue with ML-based systems is rare edge cases. Such cases (near-accidents, goat-on-the-road, combinations of low-probability occurrences) are rare in the training sets, and machine learning algorithms don’t learn well from small sets.

As discussed in the above-mentioned post, CDV-based verification might be a good solution for verifying ML systems:

The idea here is to use the usual CDV tool suite for ML systems testing: Automatic constraint-based test generation (enhanced with use cases / scenarios), automatic checking and coverage collection. CDV-style testing of ML systems has its own issues, but given the unexciting alternatives, I think it will eventually emerge as at least a strong contender.

However, if an ML-based system is completely opaque, one can only do black-box CDV. This works, but it means you have to do your checking, coverage collection and debugging based only on inputs and outputs.
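To make the black-box variant concrete, here is a minimal Python sketch of a CDV loop around an opaque model. Everything in it is a hypothetical stand-in (the scenario fields, the `opaque_model` with a deliberately planted goat bug, the requirement in `check`, the coverage buckets). The only point is that generation, checking and coverage all work from inputs and outputs, with no access to the model's internals.

```python
import random
from collections import Counter

# All names, ranges and thresholds below are hypothetical, for illustration only.

def generate_scenario(rng):
    """Constrained-random stimulus, biased toward rare near-miss cases."""
    near_miss = rng.random() < 0.3              # over-sample the rare corner cases
    distance = rng.uniform(1, 5) if near_miss else rng.uniform(5, 100)
    return {
        "distance": distance,                   # meters (assumed unit)
        "speed": rng.uniform(0, 30),
        "obstacle": rng.choice(["person", "goat", "car", "none"]),
    }

def opaque_model(s):
    """Stand-in for the ML-based system: we only see input -> output.
    A bug is planted on purpose: it never learned about goats."""
    if s["obstacle"] in ("person", "car") and s["distance"] < 10:
        return "brake"
    return "cruise"

def check(s, action):
    """Black-box checker: a requirement phrased only on inputs and outputs."""
    if s["obstacle"] != "none" and s["distance"] < 3:
        return action == "brake"
    return True

def coverage_item(s, action):
    """Coverage is likewise computed from inputs and outputs only."""
    return (s["obstacle"], "near" if s["distance"] < 5 else "far", action)

def run_black_box_cdv(n_tests=1000, seed=1):
    rng = random.Random(seed)
    coverage, failures = Counter(), []
    for _ in range(n_tests):
        s = generate_scenario(rng)
        action = opaque_model(s)
        coverage[coverage_item(s, action)] += 1
        if not check(s, action):
            failures.append(s)
    return coverage, failures

coverage, failures = run_black_box_cdv()
print(len(coverage), "coverage buckets hit,", len(failures), "failures")
```

All failures here come from the biased `near_miss` branch (the other branch never produces distances below 5), which is the CDV argument for steering generation toward rare edge cases rather than waiting for them to show up.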

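Gray-box CDV (where checking, coverage and debugging can also draw on some view of what happens inside the system) would be much better, which brings us to: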

Explainable AI

DARPA just came out with a new program – Explainable Artificial Intelligence. The full announcement is here (pdf – pretty interesting up to page 19). This is what they are aiming for:

Explainable AI—especially explainable machine learning—will be essential if future warfighters are to understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners. …

New machine-learning systems will have the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future. The strategy for achieving that goal is to develop new or modified machine-learning techniques that will produce more explainable models.

Explainable AI, if it can be achieved, would be a Very Good Thing. In addition to helping with issues of trust and management as described by DARPA above, it would also have huge usability, social and legal ramifications – see for instance “If a driverless car goes bad we may never know why”.

And (you knew this was coming) it would also help verification. Here is how: We could (hopefully) use those explanations as a partial replacement for the (missing) source code, thus counteracting (to some degree) the opaqueness that makes verifying ML systems so hard. For instance, we could use this “pseudo source code” for:

Monitoring and coverage extraction

Partial checking

Debugging what went wrong upon failure

Perhaps (if the explanations are modular) some modular verification
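Assuming, hypothetically, that each run returns a decision plus a set of explanation terms from a fixed vocabulary, the first two uses above might look roughly like the sketch below: coverage is collected over (decision, term) pairs, and a partial checker flags decisions whose explanation does not justify them. The vocabulary and the checking rule are invented for illustration.

```python
from collections import Counter

# Hypothetical: each run of the ML system yields (decision, explanation_terms).
# The vocabulary below is invented for illustration.
VOCABULARY = {"person_approaching", "goat_on_road", "car_ahead", "clear_road"}
HAZARDS = {"person_approaching", "goat_on_road", "car_ahead"}

def explanation_coverage(runs):
    """Coverage extraction: count (decision, term) pairs, treating explanation
    terms roughly like lines of the missing source code."""
    cov = Counter()
    for decision, terms in runs:
        for term in terms:
            cov[(decision, term)] += 1
    return cov

def coverage_holes(cov):
    """Terms from the vocabulary that no run has exercised yet."""
    return VOCABULARY - {term for (_, term) in cov}

def partial_check(decision, terms):
    """Partial checking: a 'brake' should be explained by at least one hazard."""
    return decision != "brake" or bool(HAZARDS & set(terms))

runs = [
    ("brake", {"person_approaching"}),
    ("cruise", {"clear_road"}),
    ("brake", {"clear_road"}),        # suspicious: braking on a clear road
]
cov = explanation_coverage(runs)
print(sorted(coverage_holes(cov)))                  # ['car_ahead', 'goat_on_road']
print([r for r in runs if not partial_check(*r)])   # only the third run
```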

All that hinges, of course, on Explainable AI actually happening. Will it? This is a very hard problem, and some of the initial tries have big issues (see below). On the other hand, the need is clearly there, DARPA has a pretty good track record of success in its long-term programs, and ML research keeps producing surprising solutions to almost-impossible problems. So maybe.

Why Explainable AI is hard

“Explainable ML” indeed sounds almost like a contradiction in terms. Much of the power of e.g. Deep Neural Networks stems from the fact that during the training process internal nodes come to represent various abstract features, which (usually) do not correspond directly to any terms in English – they are just the features that the learning process “discovered”.

Consider the paper “Generating Visual Explanations” (pdf), mentioned in the DARPA proposal as one possible early direction. In it, the authors describe how they augmented an ML classifier (for images of wild birds) with an explanation system, which can produce sentences like “This is a Laysan Albatross because this bird has a large wingspan, hooked yellow beak, and white belly”.

Pretty impressive, huh? However, if I understand correctly, those explanations are based on a separate ML system which was trained to produce descriptions of bird images, using a bunch of (image, description) pairs created by people. The explanation system then takes care to produce “relevant” explanations. E.g. it does not mention the yellow beak if it does not appear in that particular image, or if it does not distinguish well between bird species.

So, these explanations are impressive, but they may have nothing to do with how the wild-bird-classifier actually classified. For instance, it may have decided that this is an Albatross because it appears over water (but the description writers were explicitly told to ignore the bird’s surroundings). Or it may have made the decision because of some indescribable pattern of colors on the wing, combined with some specific angle of the wing, or whatever. And none of this would appear in the explanation, because those internal DNN features do not correspond to any of the terminology used by the description writers.

One (partial) solution for avoiding such “pseudo-explanations” is to split a big ML system into smaller ML modules which are allowed to communicate only via human-defined terminology. For instance, we could decide to replace our full-AV-control ML system with two ML modules – one for scene-recognition and one for acting upon the scene (e.g. turning the wheel), with the further constraint that the first module can only produce human-understood scene descriptions like “person_approaching, distance=2, speed=7, angle=30”.

This would make the full-AV-control system much more explainable, but (probably) much less capable. So people would normally not do that, unless quality of explanation were so important that they would be willing to sacrifice e.g. ML prediction power for it.
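A minimal sketch of that split, using the field names from the “person_approaching, distance=2, speed=7, angle=30” example (the units and the stand-in module logic are assumptions; real modules would be learned models). The only thing crossing the interface is a human-defined `SceneDescription`, so logging it tells you exactly what the acting module based its decision on.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneDescription:
    """Human-defined interface between the two modules.
    Field names follow the example in the text; units are assumptions."""
    person_approaching: bool
    distance: float   # e.g. meters (assumed)
    speed: float      # e.g. km/h (assumed)
    angle: float      # degrees (assumed)

def recognize_scene(frame: dict) -> SceneDescription:
    """Stand-in for the learned scene-recognition module. A real one would
    run a model on sensor data; here we just read a pre-labelled dict."""
    return SceneDescription(frame["person"], frame["dist"], frame["speed"], frame["angle"])

def act_on_scene(scene: SceneDescription) -> str:
    """Stand-in for the learned acting module. It sees ONLY the description,
    so whatever it decides can be explained in those terms."""
    if scene.person_approaching and scene.distance < 5:
        return "brake"
    if scene.person_approaching:
        return "slow_down"
    return "cruise"

def control(frame: dict):
    scene = recognize_scene(frame)
    # Return the interface value too: it is the acting module's entire input.
    return act_on_scene(scene), scene

action, why = control({"person": True, "dist": 2, "speed": 7, "angle": 30})
print(action, why)
```

The price is visible in the types: anything the acting module might have needed but `SceneDescription` cannot express is simply unavailable to it.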

BTW, another reason you might connect ML modules via human-understood terms is prototyping: If you are busy hooking up ML modules to achieve some effect, you really need visibility into the interfaces (at least initially), and in any case you may be reusing ML modules whose output was originally meant for human consumption.

Of course, if we wanted to preserve optimal performance, we could just have a separate scene-recognition module, working in parallel to the real system and producing those human-understood descriptions. This would again supply just “pseudo-explanations” for the real behavior of the full system.

But how bad are “pseudo explanations”?

They could be pretty bad if you are trying to understand why a system made a critical decision, and it is (in effect) lying to you.

However, if these explanations are really important, one can probably build a semi-manual process for improving them: For instance, one can build a system which measures how un-predictive these pseudo-explanations are, and (beyond a certain threshold) asks humans to guess words which would help narrow the difference (“Ah, looks like it needs a term like ‘the bird is over water’”).
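One possible (deliberately simple) way to measure how un-predictive pseudo-explanations are, sketched here as an assumption rather than an established method: fit the simplest predictor from explanation to the classifier's own output (the majority label per distinct explanation), measure how often it agrees with the classifier, and flag explanations that map to several labels, since that is where a missing term such as “over water” may be hiding. The threshold and the bird terms are made up. Note also that this measures agreement in-sample, on the same records the predictor was fitted to.

```python
from collections import Counter, defaultdict

def explanation_fidelity(records):
    """records: list of (explanation_terms, classifier_label) pairs.
    Predict the classifier's label from the explanation alone (majority label
    per distinct explanation) and return the fraction of agreement."""
    by_expl = defaultdict(Counter)
    for terms, label in records:
        by_expl[frozenset(terms)][label] += 1
    agree = sum(c.most_common(1)[0][1] for c in by_expl.values())
    return agree / len(records), by_expl

def needs_human_review(records, threshold=0.9):
    """Flag the explanation vocabulary for human attention when fidelity is
    low, listing explanations that map to more than one classifier label."""
    fidelity, by_expl = explanation_fidelity(records)
    ambiguous = [set(k) for k, labels in by_expl.items() if len(labels) > 1]
    return fidelity < threshold, fidelity, ambiguous

records = (
    [({"hooked_beak", "white_belly"}, "albatross")] * 3
    + [({"hooked_beak", "white_belly"}, "gull")] * 2
    + [({"red_crown"}, "crane")] * 5
)
flag, fidelity, ambiguous = needs_human_review(records)
print(flag, fidelity)   # True 0.8
print(ambiguous)        # the beak/belly explanation, which fits two labels
```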

Finally, the needs of verification are different from the need for “true explanation”. Verification could probably use “pseudo-explanations” for coverage collection, checking, debugging etc. – they won’t be ideal, but would still be helpful.

Other attempts at Explainable AI

People have tried other ways to explain ML systems, which only use the inputs in the explanation: For ML systems whose input is text, the explanation is the subset of the input words which most contributed to the decision, perhaps with weights saying how much they contributed. If the input is an image, the explanation would be the set of pixels which most contributed, and so on.

One example of this scheme is here (pdf): The ML-system-to-be-explained is a system which answers text questions about pictures (“What vegetable is on the plate?”), and so unsurprisingly its “explanations” can be either highlighted words (figure 2a in the paper) or highlighted pixels (figure 2b). They produce this via a kind of sensitivity analysis: E.g. to determine which pixels are important, they cover various pixels in the image and see how this influences the ML’s result.
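The occlusion idea can be sketched in a few lines of Python. This is a generic version under stated assumptions (square patches, a fixed fill value, a model that returns a single score for the answer of interest), not necessarily the exact procedure used in the paper. `toy_model` is a made-up stand-in that only looks at the top-left corner of the image.

```python
def occlusion_map(image, model, patch=2, fill=0.0):
    """Cover each patch x patch square with `fill` and record how much the
    model's score drops; a bigger drop means more important pixels."""
    h, w = len(image), len(image[0])
    base = model(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            covered = [row[:] for row in image]      # copy, leave input intact
            for r in rows:
                for c in cols:
                    covered[r][c] = fill
            drop = base - model(covered)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

def toy_model(img):
    """Made-up scorer that only 'looks at' the top-left 2x2 corner."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4

image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_map(image, toy_model):
    print(row)   # only the top-left 2x2 block shows a drop of 1.0
```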

A somewhat-similar scheme is used in this paper (pdf), also mentioned in the DARPA proposal, which produces model-agnostic explanations: In other words, it can explain the results of any ML system, looking at just its inputs and outputs, and thus can even help you evaluate (via the explanations) which of several competing systems is better.
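As a rough illustration of the model-agnostic idea, here is a local-surrogate sketch in the spirit of (but not identical to) such schemes: drop random subsets of the input words, query the black box on each perturbed input, and fit a weighted linear model whose coefficients serve as per-word contributions. The kernel width, sample count and small ridge term are my assumptions, and `black_box` is a made-up linear stand-in, chosen so that the recovered weights can be checked.

```python
import math
import random

def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def local_surrogate(words, predict, n_samples=500, width=2.0, ridge=1e-6, seed=0):
    """Perturb the input by dropping words, query the black box, and fit a
    locally weighted linear model: one coefficient per word."""
    rng = random.Random(seed)
    d = len(words)
    rows, ys, ws = [], [], []
    for i in range(n_samples):
        z = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept = [w for w, keep in zip(words, z) if keep]
        rows.append([1.0] + [float(v) for v in z])   # intercept + word indicators
        ys.append(predict(kept))
        removed = d - sum(z)
        ws.append(math.exp(-(removed ** 2) / width ** 2))  # closer samples count more
    p = d + 1
    # Weighted least squares via the normal equations (plus a tiny ridge).
    xtwx = [[sum(wt * r[i] * r[j] for r, wt in zip(rows, ws)) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xtwy = [sum(wt * r[i] * y for r, y, wt in zip(rows, ys, ws)) for i in range(p)]
    coef = _solve(xtwx, xtwy)
    return dict(zip(words, coef[1:]))

def black_box(tokens):
    """Made-up stand-in for the system being explained."""
    return 0.1 + 0.6 * ("vegetable" in tokens) + 0.2 * ("plate" in tokens)

question = "what vegetable is on the plate".split()
weights = local_surrogate(question, black_box)
for word, wt in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {wt:+.3f}")
```

Only `predict` is ever called, which is what makes this kind of explainer model-agnostic: the same function could be pointed at several competing systems.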

These input-only schemes have the advantage that they do not depend on (possibly-misleading) user-supplied higher-level concepts. Obviously, they have the disadvantage that no higher-level concepts are produced, making them less applicable to verification. Still, for the purpose of debugging / understanding they may be helpful.

What we (as verification people) would really like to have is not just explanations for specific ML outputs, but rather an explainable model, i.e. a written description (e.g. a set of rules) of how the ML system would behave in every case. This would also be very useful outside of verification, for people who would like to understand the behavior of the system.

In the general case, this is probably way too much to ask for. The last paper mentioned above does talk about producing a model, but what it means is “producing input-only explanations for N diverse-and-separate points”. It is the user who is supposed to integrate those N explanations into a coherent whole.
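The selection step can be sketched as a greedy cover: give each feature a global importance (here, the sum of its absolute weights across all explanations) and repeatedly pick the instance whose explanation adds the most not-yet-covered importance. This is a simplified assumption of how such a pick might work, not the paper's exact procedure. Integrating the picked explanations into a model is still left to the user.

```python
def pick_diverse(explanations, n):
    """explanations: {instance_id: {feature: weight}} from an input-only explainer.
    Greedily pick up to n instances whose explanations together cover the
    most globally important features."""
    importance = {}
    for expl in explanations.values():
        for feature, weight in expl.items():
            importance[feature] = importance.get(feature, 0.0) + abs(weight)

    def gain(instance, covered):
        return sum(importance[f] for f in explanations[instance] if f not in covered)

    chosen, covered = [], set()
    for _ in range(min(n, len(explanations))):
        remaining = [i for i in explanations if i not in chosen]
        best = max(remaining, key=lambda i: gain(i, covered))
        chosen.append(best)
        covered |= set(explanations[best])
    return chosen

explanations = {
    "img1": {"hooked_beak": 0.9, "over_water": 0.5},
    "img2": {"hooked_beak": 0.8},
    "img3": {"wing_pattern": 0.7},
}
print(pick_diverse(explanations, 2))   # ['img1', 'img3']
```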

However, it may be possible to build such an approximate model in special cases. For instance, if you know that your ML system sort-of behaves like a state machine, you can create a second ML system which learns that state machine (from input-output sequences, like in this paper), and then shows it.
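Here is a much simpler technique than the learning approach in that paper, swapped in only to make the idea concrete: assume the system's output depends only on its last `k` inputs, build the transition table from observed input/output traces, record any conflicts (which would mean the assumption is wrong), and print the resulting machine. `toy_system` is a made-up stand-in that brakes only after two obstacles in a row.

```python
def learn_machine(traces, k=2):
    """traces: list of [(input, output), ...] sequences.
    ASSUMPTION: the output depends only on the last k inputs, so a state is
    the tuple of the previous k-1 inputs (padded with None at the start)."""
    table, conflicts = {}, set()
    for trace in traces:
        state = (None,) * (k - 1)
        for x, y in trace:
            nxt = (state + (x,))[1:]             # slide the input window
            seen = table.setdefault((state, x), (y, nxt))
            if seen[0] != y:
                conflicts.add((state, x))        # same history, different output
            state = nxt
    return table, conflicts

def show(table):
    for (state, x), (y, nxt) in sorted(table.items(), key=str):
        print(f"{state} --{x}/{y}--> {nxt}")

def predict(table, inputs, k=2):
    """Run the learned machine; None marks a transition never observed."""
    state, outputs = (None,) * (k - 1), []
    for x in inputs:
        y, state = table.get((state, x), (None, None))
        outputs.append(y)
    return outputs

def toy_system(inputs):
    """Made-up stand-in for the ML system: brakes only after two obstacles in a row."""
    out, prev = [], None
    for x in inputs:
        out.append("brake" if prev == x == "obstacle" else "go")
        prev = x
    return out

runs = [["clear", "obstacle", "obstacle", "clear"],
        ["obstacle", "obstacle", "obstacle", "clear", "clear"]]
table, conflicts = learn_machine([list(zip(r, toy_system(r))) for r in runs])
show(table)
new_inputs = ["obstacle", "clear", "obstacle", "obstacle"]
print(predict(table, new_inputs) == toy_system(new_inputs), conflicts)   # True set()
```

If the real system needed a longer memory than `k`, the conflicts set would start filling up, which is a cheap signal that the “sort-of state machine” assumption does not hold.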

To summarize:

Explainable AI is hard to achieve, but achieving it would be useful for many things, including verification

This will take a while (the DARPA project is ~5 years), but we verification people can probably use intermediate results (like pseudo-explanations and special-case models)

In any case, this is just part of the ML-verification puzzle

Stay tuned for more ML-verification posts soon.

[Added 1-Sep-2016: Here is the next installment of the ML-and-verification series]

Notes

I’d like to thank Amiram Yehudai, Sandeep Desai, Eli Sennesh and Thomas (Blake) French for commenting on previous versions of this post.