It’s often assumed that as the complexity of an AI system increases, it invariably becomes less interpretable. But researchers have begun to challenge that notion with libraries like Facebook’s Captum, which explains decisions made by neural networks built with the deep learning framework PyTorch, as well as IBM’s AI Explainability 360 toolkit and Microsoft’s InterpretML. In a bid to render AI’s decision-making even more transparent, a team hailing from Google and Stanford recently explored a machine learning method called Automated Concept-based Explanation (ACE) that automatically extracts the “human-meaningful” visual concepts informing a model’s predictions.

As the researchers explain in a paper detailing their work, most machine learning explanation methods alter individual features (e.g., pixels, super-pixels, word-vectors) to approximate the importance of each to the target model. This is an imperfect approach, because it’s vulnerable to even the smallest shifts in the input.
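
To make the feature-level approach concrete, here is a minimal sketch of one common variant, occlusion: mask one patch of the input at a time and record how much the model's score drops. It is an illustration only, not code from the paper. The `occlusion_importance` helper, the patch size, and the `toy_model` stand-in for a classifier are all assumptions made for this example.

```python
import numpy as np

def occlusion_importance(model, image, patch=4, baseline=0.0):
    """Score each patch by how far the model's output drops when it is masked."""
    h, w = image.shape
    base_score = model(image)
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            perturbed = image.copy()
            perturbed[i:i + patch, j:j + patch] = baseline  # alter one feature group
            heatmap[i // patch, j // patch] = base_score - model(perturbed)
    return heatmap

# Hypothetical stand-in for a classifier: it only looks at the top-left 4x4 corner.
def toy_model(img):
    return float(img[:4, :4].mean())

image = np.ones((8, 8))
heatmap = occlusion_importance(toy_model, image)
print(heatmap)  # only the top-left patch gets a non-zero importance
```

The result is a per-patch score map tied to one specific input, which is the kind of explanation ACE tries to move beyond.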

By contrast, ACE identifies higher-level concepts. It takes a trained classifier and a set of images from a single class as input, extracts the concepts present in those images, and scores the importance of each. Specifically, ACE segments images at multiple resolutions to capture several levels of texture, object parts, and objects before grouping similar segments as examples of the same concept and returning the most important concepts.
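
The three steps described above (segment at several resolutions, group similar segments, rank the groups) can be sketched in a few lines. This is a heavily simplified illustration and not the researchers' implementation: square grid patches stand in for real image segmentation, a patch's mean color stands in for a learned embedding, a small k-means routine does the grouping, and importance is approximated by the average score drop when a concept is masked out. Every function name here, along with `toy_model` and the synthetic red-and-blue images, is hypothetical.

```python
import numpy as np

def grid_segments(image, sizes=(2, 4)):
    """Cut an image into square patches at several sizes (stand-in for segmentation)."""
    h, w = image.shape[:2]
    return [(i, j, s) for s in sizes
            for i in range(0, h - s + 1, s)
            for j in range(0, w - s + 1, s)]

def embed(image, seg):
    """Assumed embedding: the patch's mean color."""
    i, j, s = seg
    return image[i:i + s, j:j + s].reshape(-1, image.shape[2]).mean(axis=0)

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def concept_importance(model, images, segs, labels, k):
    """Assumed importance score: mean drop in model output when a concept is masked."""
    scores = np.zeros(k)
    for c in range(k):
        drops = []
        for n, img in enumerate(images):
            masked = img.copy()
            for (m, i, j, s), lab in zip(segs, labels):
                if m == n and lab == c:
                    masked[i:i + s, j:j + s] = 0.0
            drops.append(model(img) - model(masked))
        scores[c] = np.mean(drops)
    return scores

def make_image(vertical_split):
    """Synthetic 8x8 RGB image: half red, half blue."""
    img = np.zeros((8, 8, 3))
    if vertical_split:
        img[:, :4, 0], img[:, 4:, 2] = 1.0, 1.0
    else:
        img[:4, :, 0], img[4:, :, 2] = 1.0, 1.0
    return img

# Hypothetical classifier whose score depends only on how much red it sees.
def toy_model(img):
    return float(img[..., 0].mean())

images = [make_image(True), make_image(False)]
segs = [(n, i, j, s) for n, img in enumerate(images) for (i, j, s) in grid_segments(img)]
points = np.array([embed(images[n], (i, j, s)) for n, i, j, s in segs])
labels = kmeans(points, k=2)
scores = concept_importance(toy_model, images, segs, labels, k=2)
ranking = np.argsort(-scores)  # concept indices, most important first
print(scores, ranking)
```

In this toy setup the red patches form one concept and the blue patches another, and the red concept ranks first because the stand-in classifier responds only to red. A realistic version would swap in a proper segmentation algorithm, embeddings taken from the classifier itself, and a more principled importance score.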

To test ACE’s robustness, the team tapped Google’s Inception-V3 image classifier model trained on the popular ImageNet data set and applied ACE to a subset of 100 of the data set’s 1,000 classes. They note that the concepts flagged as important tended to follow human intuition. For instance, a law enforcement logo was more important for detecting a police van than the asphalt on the ground. This wasn’t always so, however. In a less obvious example, the most important concept for predicting basketball images turned out to be players’ jerseys rather than the basketball. And when it came to the classification of carousels, the rides’ lights had greater sway than their seats and poles.

The researchers concede that ACE is by no means perfect — it struggles to meaningfully extract exceptionally complex or difficult concepts. But they believe the insights it provides into models’ learned correlations might promote safer use of machine learning.

“We verified the meaningfulness and coherency through human experiments and further validated that they indeed carry salient signals for prediction. [Our] method … automatically groups input features into high-level concepts; meaningful concepts that appear as coherent examples and are important for correct prediction of the images they are present in,” wrote the researchers. “The discovered concepts reveal insights into potentially surprising correlations that the model has learned.”