Since the 1950s, artificial intelligence has repeatedly overpromised and underdelivered. While recent years have seen incredible leaps thanks to deep learning, AI today is still narrow: it’s fragile in the face of attacks, can’t generalize to adapt to changing environments, and is riddled with bias. All these challenges make the technology difficult to trust and limit its potential to benefit society.

On March 26 at MIT Technology Review’s annual EmTech Digital event, two prominent figures in AI took to the virtual stage to debate how the field might overcome these issues.

Gary Marcus, professor emeritus at NYU and the founder and CEO of Robust.AI, is a well-known critic of deep learning. In his book Rebooting AI, published last year, he argues that AI’s shortcomings are inherent to the technique. Researchers must therefore look beyond deep learning, he says, and combine it with classical, or symbolic, AI: systems that encode knowledge and are capable of reasoning.

Danny Lange, the vice president of AI and machine learning at Unity, sits squarely in the deep-learning camp. He built his career on the technique’s promise and potential, having served as the head of machine learning at Uber, the general manager of Amazon Machine Learning, and a product lead at Microsoft focused on large-scale machine learning. At Unity, he now helps labs like DeepMind and OpenAI construct virtual training environments that teach their algorithms a sense of the world.

During the event, each speaker gave a short presentation and then sat down for a panel discussion. The disagreements they expressed mirror many of the clashes within the field, highlighting how powerfully the technology has been shaped by a persistent battle of ideas and how little certainty there is about where it’s headed next.

Below, their panel discussion has been condensed and lightly edited for clarity.

Gary, you draw upon your expertise in neuroscience and psychology to figure out what’s currently missing in AI. What is it about classical AI that you think makes it the right system to combine with deep learning?

Gary Marcus: The first thing I’ll say is we may need hybrids that are more complicated than just deep learning plus classical AI. We need at least that. But there may be a whole bunch of things we haven’t even dreamed of yet. We need to be open-minded.

Why add classical AI to the mix? Well, we do all kinds of reasoning based on our knowledge in the world. Deep learning just doesn’t represent that. There’s no way in these systems to represent what a ball is or what a bottle is and what these things do to one another. So the results look great, but they’re typically not very generalizable.

Classical AI—that’s its wheelhouse. It can, for example, parse a sentence to its semantic representation, or have knowledge about what’s going on in the world and then make inferences about that. It has its own problems: it usually doesn’t have enough coverage, because too much of it is hand-written and so forth. But at least in principle, it’s the only way we know to make systems that can do things like logical inference and inductive inference over abstract knowledge. It still doesn’t mean it’s absolutely right, but it’s by far the best that we have.

And then there’s a lot of psychological evidence that people can do some level of symbolic representation. In my prior life as a cognitive development person, I did experiments with seven-month-old infants and showed that those infants could generalize symbolic knowledge. So if a seven-month-old infant can do it, then why are we holding our hands behind our back trying to build AI without mechanisms that infants have?

Have you seen any projects where they have successfully combined deep learning and symbolic AI in promising ways?

GM: In an article I wrote called “The Next Decade in AI,” I listed about 20 different recent projects that try to put together hybrid models that have some deep learning and some symbolic knowledge. One example that everybody knows is Google search. When you type in a search query, there’s some classical AI there that is trying to disambiguate words. Using the Google Knowledge Graph, it’s trying to figure out whether, when you talk about “Paris,” you mean Paris Hilton, Paris, Texas, or Paris, France. And then it uses deep learning to do some other stuff, for example to find synonyms using the BERT model. Of course, Google search is not the AI that we’re ultimately hoping to achieve, but it’s pretty solid proof that this is not an impossible dream.

Danny, do you agree that we should be looking at these hybrid models?

Danny Lange: No, I do not agree. The issue I have with symbolic AI is its attempt to try to mimic the human brain in a very deep sense. It reminds me a bit of, you know, in the 18th century if you wanted faster transportation, you would work on building a mechanical horse rather than inventing the combustion engine. So I’m very skeptical of trying to solve AI by trying to mimic the human brain.

Deep learning is not necessarily a silver bullet, but if you feed it enough data and you have the right neural-network architecture, it is able to learn abstractions that we as humans cannot interpret but that make the system very efficient at solving a wide range of tasks.

It sounds like the two of you fundamentally disagree about what the goal of AI is.

GM: I think there’s an irony. When I had a debate with Yoshua Bengio in December, Bengio said that the only commitment of deep learning was that it be neurologically based. So I’ve heard both opposite extremes from deep learning. That’s a bit strange, and I don’t think we should take those arguments seriously.

Instead we should say, “Can symbols help us?” And the answer is, overwhelmingly, yes. Almost all the world’s software is built on symbols. And then you have to say, “Empirically, does the deep-learning stuff do what we want it to do?” And the problem so far is it has been model free. Vicarious [an AI-powered industrial robotics startup] had a great demonstration of an Atari game learning system that DeepMind made very popular, where it learned to play Breakout at a superhuman level. But then Vicarious moved the paddle a few pixels and the whole thing fell apart, because the level of learning was much too shallow. It didn’t have a concept of a paddle, a ball, a set of bricks. A symbolic algorithm for Breakout would very easily be able to compensate for those things.

The reason to look at humans is because there are certain things that humans do much better than deep-learning systems. That doesn’t mean humans will ultimately be the right model. We want systems that have some properties of computers and some properties that have been borrowed from people. We don’t want our AI systems to have bad memory just because people do. But since people are the only model of a system that can develop a deep understanding of something—literally the only model we’ve got—we need to take that model seriously.

DL: Yeah, so the example that the world’s programming languages are symbolic based—that’s true because they’re designed for humans to implement their ideas and thoughts.

Deep learning is not a replication of the human brain. Maybe you can say it’s inspired by the neural world, but it’s a piece of software. We haven’t really gone to great depth with deep learning yet. We’ve had a limited amount of training data so far. We’ve had limited structures with limited compute power. But the key point is that deep learning learns the concept, it learns the features. It’s not a human-engineered thing. I think the big difference between Gary’s approach and my approach is whether the human engineers give intelligence to the system or whether the system learns intelligence itself.

Danny, you mentioned that we haven’t really seen the potential of deep learning in full because of limitations in data and compute. Shouldn’t we be developing new techniques, given that deep learning is so inefficient? We’ve had to drastically increase compute in order to unlock new deep-learning abilities.

DL: One of the problems with deep learning is that it has really been based so far on a sort of classical approach: you generate a big training data set and then you feed it in. One thing that could really improve deep learning is to have an active learning process where the network is being trained to optimize the training data. You don’t have to just feed in a mind-numbing amount of data to improve the learning process. You can constantly tailor your training data to target a specific area.
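
As an illustration of the active-learning idea Lange describes, here is a minimal sketch of uncertainty sampling, one common form of active learning. It is not Lange's or Unity's method. Everything in it is a hypothetical toy: the `oracle` labeler, the hidden threshold of 0.62, and the one-dimensional threshold "model" are assumptions chosen only to show the loop of fitting, picking the most informative unlabeled example, and requesting its label.

```python
def oracle(x, true_threshold=0.62):
    """Hypothetical labeler: 1 if x is at or above a hidden threshold, else 0."""
    return 1 if x >= true_threshold else 0


def fit_threshold(labeled):
    """Toy 1-D model: place the boundary midway between the two classes."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2


def active_learning(pool, n_queries):
    """Label only the points the current model is least sure about."""
    pool = sorted(pool)
    # Assumption of this toy setup: the two extremes belong to different
    # classes, so seeding with them gives the model one example of each.
    labeled = [(pool[0], oracle(pool[0])), (pool[-1], oracle(pool[-1]))]
    unlabeled = pool[1:-1]
    for _ in range(n_queries):
        if not unlabeled:
            break
        boundary = fit_threshold(labeled)
        # Uncertainty sampling: the point nearest the boundary is the one
        # whose label the model is least certain of.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), len(labeled)


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1001)]
    estimate, labels_used = active_learning(pool, n_queries=15)
    print(f"estimated boundary {estimate:.4f} using {labels_used} labels")
```

In this toy setting the loop needs labels for only 17 of the 1,001 candidate points to locate the boundary closely, because each query targets the region where the model is still uncertain rather than sampling the pool uniformly.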

Gary, you point out deep learning’s vulnerabilities to bias and to adversarial attacks. Danny, you mentioned that synthetic data is a solution to this because “there is no bias,” and you can run millions of simulations that presumably get rid of adversarial vulnerabilities. How does each of you respond to that?

GM: Data alone is not a solution yet. Synthetic data is not going to help with things like biases in loans or biases in job interviews. The real problem there is that these systems have a tendency to perpetuate biases that were there for historical reasons. It’s not obvious that synthetic data is the solution, as opposed to building systems that are sophisticated enough to understand the cultural biases that we’re trying to replace.

Adversarial attacks are a different kind of thing. Data might help with some of them, but so far we haven’t really eliminated the many different kinds of adversarial attacks. I showed you the baseball with foam on it being described as espresso. If somebody thinks in advance to make baseballs with espresso in simulation and label them carefully, fine. But there are always going to be some cases that nobody’s thought of. A system that’s purely data-driven is going to continue to be vulnerable.

DL: Real-world data is very biased, no matter what you do. You collect data in a certain environment, say for self-driving vehicles, and you have a representation of maybe 90% adults and 10% children in the streets. That’s the normal distribution. But a machine-learning system needs to train on even amounts of adults and children to safely avoid hitting either of them. So with synthetic data you’re basically able to balance out and avoid the bias if you’re careful. That doesn’t mean you can’t create new biases. You have to watch out for that. Certainly you solve privacy issues, because there are no real humans or real children in any of your training data.
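
The rebalancing Lange describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any real pipeline: the `make_synthetic` generator is a hypothetical stand-in for a simulator that can render a new labeled example on demand, and the 90/10 split simply mirrors the figures in his example.

```python
from collections import Counter


def balance_with_synthetic(samples, make_synthetic):
    """Top up every under-represented class with synthetic examples.

    samples: list of (example, label) pairs collected from the real world.
    make_synthetic: hypothetical callable that returns one new
        (example, label) pair for the requested label.
    """
    counts = Counter(label for _, label in samples)
    target = max(counts.values())  # size of the largest class
    balanced = list(samples)
    for label, count in counts.items():
        # Generate only the shortfall for each class.
        balanced.extend(make_synthetic(label) for _ in range(target - count))
    return balanced


if __name__ == "__main__":
    # Assumed toy data: 90 adults and 10 children, as in the example above.
    real = [(f"frame_{i}", "adult") for i in range(90)]
    real += [(f"frame_{i}", "child") for i in range(90, 100)]

    def make_synthetic(label):
        # Placeholder for a simulator; a real one would render a new scene.
        return ("synthetic_frame", label)

    balanced = balance_with_synthetic(real, make_synthetic)
    print(Counter(label for _, label in balanced))
```

Equalizing class counts addresses only the imbalance that is measured. As Lange notes, the generator itself can introduce new biases, for instance if every synthetic child looks alike.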

As for adversarial examples, the problem with a lot of them is that they’re basically being developed against weak computer-vision models: models that have been trained on 10 or 20 million images, say, from ImageNet. That’s far from enough data to actually generalize a model. We need a large number of data sets with incredible amounts of domain randomization to generalize these computer-vision models so they actually don’t get fooled.

What is the thing you’re most excited about for the future of AI?

GM: There’s been a real movement toward hybrid models in the last year. People are exploring new things that they haven’t before, and that’s exciting.

DL: I think it’s really multi-model systems: systems composed of many different models for perception and behavior, working together to solve real, complex tasks.