How researchers are teaching AI to learn like a child

It's a Saturday morning in February, and Chloe, a curious 3-year-old in a striped shirt and leggings, is exploring the possibilities of a new toy. Her father, Gary Marcus, a developmental cognitive scientist at New York University (NYU) in New York City, has brought home some strips of tape designed to adhere Lego bricks to surfaces. Chloe, well-versed in Lego, is intrigued. But she has always built upward. Could she use the tape to build sideways or upside down? Marcus suggests building out from the side of a table. Ten minutes later, Chloe starts sticking the tape to the wall. "We better do it before Mama comes back," Marcus says in a singsong voice. "She won't be happy." (Spoiler: The wall paint suffers.)

Implicit in Marcus's endeavor is an experiment. Could Chloe apply what she had learned about an activity to a new context? Within minutes, she has a Lego sculpture sticking out from the wall. "Papa, I did it!" she exclaims. In her adaptability, Chloe is demonstrating common sense, a kind of intelligence that, so far, computer scientists have struggled to reproduce. Marcus believes the field of artificial intelligence (AI) would do well to learn lessons from young thinkers like her.

Researchers in machine learning argue that computers trained on mountains of data can learn just about anything—including common sense—with few, if any, programmed rules. These experts "have a blind spot, in my opinion," Marcus says. "It's a sociological thing, a form of physics envy, where people think that simpler is better." He says computer scientists are ignoring decades of work in the cognitive sciences and developmental psychology showing that humans have innate abilities—programmed instincts that appear at birth or in early childhood—that help us think abstractly and flexibly, like Chloe. He believes AI researchers ought to include such instincts in their programs.

Yet many computer scientists, riding high on the successes of machine learning, are eagerly exploring the limits of what a naïve AI can do. "Most machine learning people, I think, have a methodological bias against putting in large amounts of background knowledge because in some sense we view that as a failure," says Thomas Dietterich, a computer scientist at Oregon State University in Corvallis. He adds that computer scientists also appreciate simplicity and have an aversion to debugging complex code. Big companies such as Facebook and Google are another factor pushing AI in that direction, says Josh Tenenbaum, a psychologist at the Massachusetts Institute of Technology (MIT) in Cambridge. Those companies are most interested in narrowly defined, near-term problems, such as web search and facial recognition, in which blank-slate AI systems can be trained on vast data sets and work remarkably well.

But in the longer term, computer scientists expect AIs to take on much tougher tasks that require flexibility and common sense. They want to create chatbots that explain the news, autonomous taxis that can handle chaotic city traffic, and robots that nurse the elderly. "If we want to build robots that can actually interact in the full human world like C-3PO," Tenenbaum says, "we're going to need to solve all of these problems in much more general settings."

Some computer scientists are already trying. In February, MIT launched Intelligence Quest, a research initiative now raising hundreds of millions of dollars to understand human intelligence in engineering terms. Such efforts, researchers hope, will result in AIs that sit somewhere between pure machine learning and pure instinct. They will boot up following some embedded rules, but will also learn as they go. "In some sense this is like the age-old nature-nurture debate, now translated into engineering terms," Tenenbaum says.

Part of the quest will be to discover what babies know and when—lessons that can then be applied to machines. That will take time, says Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington. AI2 recently announced a $125 million effort to develop and test common sense in AI. "We would love to build on the representational structure innate in the human brain," Etzioni says, "but we don't understand how the brain processes language, reasoning, and knowledge."

Different minds
Over time, artificial intelligence (AI) has shifted from algorithms that rely on programmed rules and logic (instincts) to machine learning, where algorithms contain few rules and ingest training data to learn by trial and error. Human minds sit somewhere in the middle.

On the spectrum from INSTINCT to LEARNING:
Rule-based AI: IBM's Deep Blue, which bested chess champion Garry Kasparov in 1997, relied on rules and logic.
Humans: Babies learn by trial and error. But developmental cognitive scientists say we also begin life with basic instincts that help us gain a flexible common sense.
Machine learning AI: With few programmed rules, DeepMind's AlphaZero today can beat IBM's Deep Blue at chess. But it doesn't generalize.

Ultimately, Tenenbaum says, "We're trying to take one of the oldest dreams of AI seriously: that you could build a machine that grows into intelligence the way a human does—that starts like a baby and learns like a child."

In the past few years, AI has shown that it can translate speech, diagnose cancer, and beat humans at poker. But for every win, there is a blunder. Image recognition algorithms can now distinguish dog breeds better than you can, yet they sometimes mistake a chihuahua for a blueberry muffin. AIs can play classic Atari video games such as Space Invaders with superhuman skill, but when you remove all the aliens but one, the AI falters inexplicably.

Machine learning—one type of AI—is responsible for those successes and failures. Broadly, AI has moved from software that relies on many programmed rules (also known as Good Old-Fashioned AI, or GOFAI) to systems that learn through trial and error. Machine learning has taken off thanks to powerful computers, big data, and advances in algorithms called neural networks. Those networks are collections of simple computing elements, loosely modeled on neurons in the brain, that create stronger or weaker links as they ingest training data.

With its Alpha programs, Google's DeepMind has pushed deep learning to its apotheosis. Each time rules were subtracted, the software seemed to improve. In 2016, AlphaGo beat a human champion at Go, a classic Chinese strategy game. The next year, AlphaGo Zero easily beat AlphaGo with far fewer guidelines. Months later, an even simpler system called AlphaZero beat AlphaGo Zero—and also mastered chess. In 1997, a classic, rule-based AI, IBM's Deep Blue, had defeated chess champion Garry Kasparov. But it turns out that true chess virtuosity lies in knowing the exceptions to the exceptions to the exceptions—information best gleaned through experience. AlphaZero, which learns by playing itself over and over, can beat Deep Blue, today's best chess programs, and every human champion.

Yet systems such as Alpha clearly are not extracting the lessons that lead to common sense. To play Go on a 21-by-21 board instead of the standard 19-by-19 board, the AI would have to learn the game anew. In the late 1990s, Marcus trained a network to take an input number and spit it back out—about the simplest task imaginable. But he trained it only on even numbers. When tested with odd numbers, the network floundered. It couldn't apply learning from one domain to another, the way Chloe had when she began to build her Lego sideways.
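The even-number experiment can be imitated with a toy model. The sketch below is not Marcus's original network. It assumes numbers are encoded as 4-bit binary vectors and uses one hypothetical logistic unit per output bit, trained by plain gradient descent to copy its input. Because every even number has a lowest bit of 0, that bit never carries a training signal.

```python
import math

N_BITS = 4  # assumption: numbers are encoded as 4-bit binary vectors


def to_bits(n):
    """Encode n as a list of bits, least significant bit first."""
    return [(n >> i) & 1 for i in range(N_BITS)]


def from_bits(bits):
    """Decode a least-significant-first bit list back into an integer."""
    return sum(b << i for i, b in enumerate(bits))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def activation(weights_j, bias_j, x):
    """Output of one logistic unit for input bit vector x."""
    return sigmoid(bias_j + sum(w * xi for w, xi in zip(weights_j, x)))


def train(numbers, epochs=2000, lr=0.5):
    """Train one logistic unit per output bit to reproduce the input."""
    weights = [[0.0] * N_BITS for _ in range(N_BITS)]
    biases = [0.0] * N_BITS
    for _ in range(epochs):
        for n in numbers:
            x = to_bits(n)
            for j in range(N_BITS):
                err = x[j] - activation(weights[j], biases[j], x)  # target = input
                biases[j] += lr * err
                for i in range(N_BITS):
                    # A weight only changes when its input bit is 1, so weights
                    # attached to the lowest bit never move on even-only data.
                    weights[j][i] += lr * err * x[i]
    return weights, biases


def predict(model, n):
    """Run the trained units on n and threshold each output at 0.5."""
    weights, biases = model
    x = to_bits(n)
    out = [1 if activation(weights[j], biases[j], x) > 0.5 else 0
           for j in range(N_BITS)]
    return from_bits(out)


model = train(range(0, 16, 2))  # even numbers only
evens_ok = all(predict(model, n) == n for n in range(0, 16, 2))
odd_results = {n: predict(model, n) for n in range(1, 16, 2)}
print("evens copied correctly:", evens_ok)
print("odd inputs -> outputs:", odd_results)
```

In this toy, the even numbers are copied correctly, while each odd input comes back as the even number just below it: the model has learned "the last bit is 0" rather than the abstract rule "output equals input."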

The answer is not to go back to rule-based GOFAIs. A child does not recognize a dog with explicit rules such as "if number of legs=4, and tail=true, and size>cat." Recognition is more nuanced—a chihuahua with three legs won't slip past a 3-year-old. Humans are not blank slates, nor are we hardwired. Instead, the evidence suggests we have predispositions that help us learn and reason about the world. Nature doesn't give us a library of skills, just the scaffolding to build one.
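To see why such an explicit rule is brittle, here is the dog rule quoted above written out literally. The field names and the cat-size threshold are hypothetical, chosen only for illustration.

```python
CAT_SIZE_CM = 46  # hypothetical stand-in for "size of a cat"


def is_dog_by_rule(animal):
    """Literal version of: legs = 4 and tail = true and size > cat."""
    return (
        animal["legs"] == 4
        and animal["tail"]
        and animal["size_cm"] > CAT_SIZE_CM
    )


labrador = {"legs": 4, "tail": True, "size_cm": 90}
three_legged_chihuahua = {"legs": 3, "tail": True, "size_cm": 20}

print(is_dog_by_rule(labrador))                # True
print(is_dog_by_rule(three_legged_chihuahua))  # False
```

The rule rejects the three-legged chihuahua that a 3-year-old would recognize at once, and patching it means piling exceptions on exceptions.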

Harvard University psychologist Elizabeth Spelke has argued that we have at least four "core knowledge" systems giving us a head start on understanding objects, actions, numbers, and space. We are intuitive physicists, for example, quick to understand objects and their interactions. According to one study, infants just 3 days old interpret the two ends of a partially hidden rod as parts of one entity—a sign that our brains might be predisposed to perceive cohesive objects. We're also intuitive psychologists. In a 2017 Science study, Shari Liu, a graduate student in Spelke's lab, found that 10-month-old infants could infer that when an animated character climbs a bigger hill to reach one shape than to reach another, the character must prefer the former. Marcus has shown that 7-month-old infants can learn rules; they show surprise when three-word sentences ("wo fe fe") break the grammatical pattern of previously heard sentences ("ga ti ga"). According to later research, day-old newborns showed similar behavior.

Marcus has composed a minimum list of 10 human instincts that he believes should be baked into AIs, including notions of causality, cost-benefit analysis, and types versus instances (dog versus my dog). Last October at NYU, he argued for his list in a debate on whether AI needs "more innate machinery," facing Yann LeCun, an NYU computer scientist and Facebook's chief AI scientist. To demonstrate his case for instinct, Marcus showed a slide of baby ibexes descending a cliff. "They don't get to do million-trial learning," he said. "If they make a mistake, it's a problem."

LeCun, disagreeing with many developmental psychologists, argued that babies might be learning such abilities within days, and if so, machine learning algorithms could, too. His faith comes from experience. He works on image recognition, and in the 1980s he began arguing that hand-coded algorithms to identify features in pictures would become unnecessary. Thirty years later, he was proved right. Critics asked him: "Why learn it when you can build it?" His reply: Building is hard, and if you don't fully understand how something works, the rules you devise are likely to be wrong.

But Marcus pointed out that LeCun himself had embedded one of the 10 key instincts into his image-recognition algorithms: translational invariance, the ability to recognize an object no matter where it appears in the visual field. Translational invariance is the principle behind convolutional neural networks, or convnets, LeCun's greatest claim to fame. In the past 5 years they've become central to image recognition and other AI applications, kicking off the current craze for deep learning.
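The built-in assumption can be illustrated in one dimension. The sketch below is a simplified stand-in, not LeCun's architecture: it slides a single hypothetical feature detector (the kernel) across a signal, the way a convolutional layer reuses the same weights at every position, then takes the maximum response, a crude form of pooling.

```python
def correlate(signal, kernel):
    """Slide the kernel across the signal and return the response at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def place(pattern, offset, length=10):
    """Return a signal of zeros with the pattern written in at the given offset."""
    signal = [0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


kernel = [1, -1, 1]   # hypothetical detector tuned to the pattern below
pattern = [1, 0, 1]

left = correlate(place(pattern, 1), kernel)
right = correlate(place(pattern, 6), kernel)

print(left)    # [-1, 2, -1, 1, 0, 0, 0, 0]
print(right)   # [0, 0, 0, 0, 1, -1, 2, -1]
print(max(left) == max(right))  # True
```

Moving the pattern moves the peak response by the same amount, and the peak value itself is unchanged. The detector never had to learn the pattern separately at each location, because sharing weights across positions builds that knowledge in.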

LeCun tells Science that translational invariance, too, could eventually emerge on its own with better general learning mechanisms. "A lot of those items will kind of spontaneously pop up as a consequence of learning how the world works," he says. Geoffrey Hinton, a pioneer of deep learning at the University of Toronto in Canada, agrees. "Most of the people who believe in strong innate knowledge have an unfounded belief that it's hard to learn billions of parameters from scratch," he says. "I think recent progress in deep learning has shown that it is actually surprisingly easy."

The debate over where to situate an AI on a spectrum between pure learning and pure instinct will continue. But that issue overlooks a more practical concern: how to design and code such a blended machine. How to combine machine learning—and its billions of neural network parameters—with rules and logic isn't clear. Neither is how to identify the most important instincts and encode them flexibly. But that hasn't stopped some researchers and companies from trying.

A robotics laboratory at The University of New South Wales in Sydney, Australia, is dressed to look like a living room and kitchen—complete with bottles of James Boag's Premium Lager in the fridge. Computer scientist Michael Thielscher explains that the lab is a testbed for a domestic robot. His team is trying to endow a Toyota Human Support Robot (HSR), which has one arm and a screen for a face, with two humanlike instincts. First, they hope to program the HSR to decompose challenges into smaller, easier problems, just as a person would parse a recipe into several steps. Second, they want to give the robot the ability to reason about beliefs and goals, the way humans instinctively think about the minds of others. How would the HSR respond if a person asked it to fetch a red cup, and it found only a blue cup and a red plate?

So far, their software shows some humanlike abilities, including the good sense to fetch the blue cup rather than the red plate. But more rules are programmed into the system than Thielscher would like. His team has had to tell their AI that cup is usually more important than red. Ideally, a robot would have the social instincts to quickly learn people's preferences on its own.

Other researchers are working to inject their AIs with the same intuitive physics that babies seem to be born with. Computer scientists at DeepMind in London have developed what they call interaction networks. They incorporate an assumption about the physical world: that discrete objects exist and have distinctive interactions. Just as infants quickly parse the world into interacting entities, those systems readily learn objects' properties and relationships. Their results suggest that interaction networks can predict the behavior of falling strings and balls bouncing in a box far more accurately than a generic neural network.

Vicarious, a robotics software company in San Francisco, California, is taking the idea further with what it calls schema networks. Those systems, too, assume the existence of objects and interactions, but they also try to infer the causality that connects them. By learning over time, the company's software can plan backward from desired outcomes, as people do. (I want my nose to stop itching; scratching it will probably help.) The researchers compared their method with a state-of-the-art neural network on the Atari game Breakout, in which the player slides a paddle to deflect a ball and knock out bricks. Because the schema network could learn about causal relationships—such as the fact that the ball knocks out bricks on contact no matter its velocity—it didn't need extra training when the game was altered. You could move the target bricks or make the player juggle three balls, and the schema network still aced the game. The other network flailed.

Besides our innate abilities, humans also benefit from something most AIs don't have: a body. To help software reason about the world, Vicarious is "embodying" it so it can explore virtual environments, just as a baby might learn something about gravity by toppling a set of blocks. In February, Vicarious presented a system that looked for bounded regions in 2D scenes by essentially having a tiny virtual character traverse the terrain. As it explored, the system learned the concept of containment, which helps it make sense of new scenes faster than a standard image-recognition convnet that passively surveyed each scene in full. Concepts—knowledge that applies to many situations—are crucial for common sense. "In robotics it's extremely important that the robot be able to reason about new situations," says Dileep George, a co-founder of Vicarious. Later this year, the company will pilot test its software in warehouses and factories, where it will help robots pick up, assemble, and paint objects before packaging and shipping them.

One of the most challenging tasks is to code instincts flexibly, so that AIs can cope with a chaotic world that does not always follow the rules. Autonomous cars, for example, cannot count on other drivers to obey traffic laws. To deal with that unpredictability, Noah Goodman, a psychologist and computer scientist at Stanford University in Palo Alto, California, helps develop probabilistic programming languages (PPLs). He describes them as combining the rigid structures of computer code with the mathematics of probability, echoing the way people can follow logic but also allow for uncertainty: If the grass is wet it probably rained—but maybe someone turned on a sprinkler. Crucially, a PPL can be combined with deep learning networks to incorporate extensive learning. While working at Uber, Goodman and others invented such a "deep PPL," called Pyro. The ride-share company is exploring uses for Pyro such as dispatching drivers and adaptively planning routes amid road construction and game days. Goodman says PPLs can reason not only about physics and logistics, but also about how people communicate, coping with tricky forms of expression such as hyperbole, irony, and sarcasm.
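The wet-grass example can be worked through in a few lines. This is not Pyro and not a full PPL. It is a minimal sketch that writes the model as ordinary code and answers the query by enumerating every possible world, and all of the probabilities are made-up assumptions for illustration.

```python
from itertools import product

# Hypothetical prior probabilities (assumptions, not from any data set).
P_RAIN = 0.3
P_SPRINKLER = 0.1

# Hypothetical chance that the grass is wet given (rain, sprinkler).
P_WET = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.01,
}


def joint(rain, sprinkler, wet):
    """Probability of one complete world under the model."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def posterior(query, wet=True):
    """P(query is true | grass wetness) by summing over all worlds."""
    total = 0.0
    matching = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        p = joint(rain, sprinkler, wet)
        total += p
        if {"rain": rain, "sprinkler": sprinkler}[query]:
            matching += p
    return matching / total


p_rain = posterior("rain")
p_sprinkler = posterior("sprinkler")
print(f"P(rain | wet grass)      = {p_rain:.2f}")
print(f"P(sprinkler | wet grass) = {p_sprinkler:.2f}")
```

With these assumed numbers, rain comes out at roughly 0.81 and the sprinkler at roughly 0.26: it probably rained, but the sprinkler remains a live possibility. A real PPL automates this kind of inference for models far too large to enumerate.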

Chloe might not master sarcasm until her teen years, but her inborn knack for language is already clear. At one point in Marcus's apartment, she holds out a pair of stuck Lego bricks. "Papa, can you untach this for me?" Her father obliges without correcting her coinage. Words and ideas are like Lego pieces, their parts readily mixed and matched, and eagerly tested in the world.

After Chloe tires of building on the wall, an older, slightly more seasoned intelligent system gets a chance to try it: her brother Alexander, age 5. He quickly constructs a Lego building that protrudes farther. "You can see the roots of what he's doing in what she did," Marcus says. When asked, Alexander estimates how far the structure might stick out before it would fall. "He's pretty well-calibrated," Marcus observes. "He hasn't had 10 million trials of glued-on-the-wall Lego things in order to assess the structural integrity. He's taking what he knows about physics, and so forth, and making some inferences."

Marcus is obviously proud, not only of his offspring's capabilities, but also that they uphold his theories of how we learn about the world—and how AIs should be learning, too. Done with their Lego buildings, Chloe and Alexander leap into their father's arms. They squeal with delight as he spins them around and around, offering them another chance to fine-tune their intuitive senses of physics—and fun.