A new data set reveals just how bad AI is at reasoning—and suggests that a new hybrid approach might be the best way forward.

Questions, questions: Known as CLEVRER, the data set consists of 20,000 short synthetic video clips and more than 300,000 question-and-answer pairs that require reasoning about the events in the videos. Each video shows a simple world of toy objects that collide with one another following simulated physics. In one, a red rubber ball hits a blue rubber cylinder, which continues on to hit a metal cylinder.

The questions fall into four categories: descriptive (e.g., “What shape is the object that collides with the cyan cylinder?”), explanatory (“What is responsible for the gray cylinder’s collision with the cube?”), predictive (“Which event will happen next?”), and counterfactual (“Without the gray object, which event will not happen?”). The questions mirror many of the concepts that children learn early on as they explore their surroundings. But the last three categories, which specifically require causal reasoning to answer, often stump deep-learning systems.

Fail: The data set, created by researchers at Harvard, DeepMind, and MIT-IBM Watson AI Lab, is meant to help evaluate how well AI systems can reason. When the researchers tested several state-of-the-art computer vision and natural language models with the data set, they found that all of them did well on the descriptive questions but poorly on the others.

Mixing the old and the new: The team then tried a new AI system that combines deep learning and symbolic logic. Symbolic systems used to be all the rage before they were eclipsed by machine learning in the late 1980s. But both approaches have their strengths: deep learning excels at scalability and pattern recognition; symbolic systems are better at abstraction and reasoning.

The composite system, known as a neuro-symbolic model, leverages both: it uses a neural network to recognize the colors, shapes, and materials of the objects and a symbolic system to understand the physics of their movements and the causal relationships between them. It outperformed existing models across all categories of questions.
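To make that division of labor concrete, here is a minimal Python sketch. It is not the researchers' system. The perception stage is replaced by a hand-written scene table (in a real model a neural network would produce it from video), and every name in it (`Obj`, `Collision`, `filter_objects`, `shape_of_colliders`, `earlier_linked_collisions`) is hypothetical. The scene follows the clip described above; the gray color of the metal cylinder and the frame numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    id: int
    color: str
    shape: str
    material: str


@dataclass(frozen=True)
class Collision:
    frame: int  # video frame at which the collision occurs (assumed values)
    a: int      # id of the first object involved
    b: int      # id of the second object involved


# Stand-in for the neural stage: attributes and events written by hand.
objects = [
    Obj(0, "red", "sphere", "rubber"),
    Obj(1, "blue", "cylinder", "rubber"),
    Obj(2, "gray", "cylinder", "metal"),  # color is an assumption
]
collisions = [Collision(40, 0, 1), Collision(85, 1, 2)]


def filter_objects(objs, **attrs):
    """Symbolic step: keep objects whose attributes match every keyword."""
    return [o for o in objs if all(getattr(o, k) == v for k, v in attrs.items())]


def shape_of_colliders(objs, events, **target_attrs):
    """Descriptive query: shapes of the objects that hit the target."""
    (target,) = filter_objects(objs, **target_attrs)  # expects exactly one match
    by_id = {o.id: o for o in objs}
    partners = [e.b if e.a == target.id else e.a
                for e in events if target.id in (e.a, e.b)]
    return [by_id[p].shape for p in partners]


def earlier_linked_collisions(event, events):
    """Crude explanatory heuristic: earlier collisions sharing an object."""
    return [e for e in events
            if e.frame < event.frame and {e.a, e.b} & {event.a, event.b}]


# "What shape is the object that collides with the metal cylinder?"
print(shape_of_colliders(objects, collisions, material="metal"))

# "What is responsible for the blue cylinder's collision with the metal cylinder?"
print(earlier_linked_collisions(collisions[1], collisions))
```

The point of the sketch is the split: once a scene has been reduced to symbols, a question becomes a small program run over them. Predictive and counterfactual questions would also need a model of the physics, which this toy leaves out.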

Why it matters: As children, we learn to observe the world around us, infer why things happened, and make predictions about what will happen next. These predictions help us make better decisions, navigate our environments, and stay safe. Replicating that kind of causal understanding in machines will similarly equip them to interact with the world in a more intelligent way.