You probably can’t remember what it feels like to play Super Mario Bros. for the very first time, but try to picture it. An 8-bit game world blinks into being: baby blue sky, tessellated stone ground, and in between, a squat, red-suited man standing still—waiting. He’s facing rightward; you nudge him farther in that direction. A few more steps reveal a row of bricks hovering overhead and what looks like an angry, ambulatory mushroom. Another twitch of the game controls makes the man spring up, his four-pixel fist pointed skyward. What now? Maybe try combining nudge-rightward and spring-skyward? Done. Then, a surprise: The little man bumps his head against one of the hovering bricks, which flexes upward and then snaps back down as if spring-loaded, propelling the man earthward onto the approaching angry mushroom and flattening it instantly. Mario bounces off the squished remains with a gentle hop. Above, copper-colored boxes with glowing “?” symbols seem to ask: What now?

This scene will sound familiar to anyone who grew up in the 1980s, but you can watch a much younger player on Pulkit Agrawal’s YouTube channel. Agrawal, a computer science researcher at the University of California, Berkeley, is studying how innate curiosity can make learning an unfamiliar task—like playing Super Mario Bros. for the very first time—more efficient. The catch is that the novice player in Agrawal’s video isn’t human, or even alive. Like Mario, it’s just software. But this software comes equipped with experimental machine-learning algorithms designed by Agrawal and his colleagues Deepak Pathak, Alexei A. Efros, and Trevor Darrell at the Berkeley Artificial Intelligence Research Lab for a surprising purpose: to make a machine curious.

A computer agent imbued with curiosity teaches itself how to play Super Mario Bros. (Video: pathak22/noreward-rl)

“You can think of curiosity as a kind of reward which the agent generates internally on its own, so that it can go explore more about its world,” Agrawal said. This internally generated reward signal is known in cognitive psychology as “intrinsic motivation.” The feeling you may have vicariously experienced while reading the game-play description above—an urge to reveal more of whatever’s waiting just out of sight, or just beyond your reach, just to see what happens—that’s intrinsic motivation.

Humans also respond to extrinsic motivations, which originate in the environment. Examples of these include everything from the salary you receive at work to a demand delivered at gunpoint. Computer scientists apply a similar approach called reinforcement learning to train their algorithms: The software gets “points” when it performs a desired task, while penalties follow unwanted behavior.

But this carrot-and-stick approach to machine learning has its limits, and artificial intelligence researchers are starting to view intrinsic motivation as an important component of software agents that can learn efficiently and flexibly—that is, less like brittle machines and more like humans and animals. Approaches to using intrinsic motivation in AI have taken inspiration from psychology and neurobiology—not to mention decades-old AI research itself, now newly relevant. (“Nothing is really new in machine learning,” said Rein Houthooft, a research scientist at OpenAI, an independent artificial intelligence research organization.)

Such agents may be trained on video games now, but the impact of developing meaningfully “curious” AI would transcend any novelty appeal. “Pick your favorite application area and I’ll give you an example,” said Darrell, co-director of the Berkeley Artificial Intelligence Research Lab. “At home, we want to automate cleaning up and organizing objects. In logistics, we want inventory to be moved around and manipulated. We want vehicles that can navigate complicated environments and rescue robots that can explore a building and find people who need rescuing. In all of these cases, we are trying to figure out this really hard problem: How do you make a machine that can figure its own task out?”

The Problem With Points

Reinforcement learning is a big part of what helped Google’s AlphaGo software beat the world’s best human player at Go, an ancient and intuitive game long considered invulnerable to machine learning. The details of successfully using reinforcement learning in a particular domain are complex, but the general idea is simple: Give a learning algorithm, or “agent,” a reward function, a mathematically defined signal to seek out and maximize. Then set it loose in an environment, which could be any real or virtual world. As the agent operates in the environment, actions that increase the value of the reward function get reinforced. With enough repetition—and if there’s anything that computers are better at than people, it’s repetition—the agent learns patterns of action, or policies, that maximize its reward function. Ideally, these policies will result in the agent reaching some desirable end state (like “win at Go”), without a programmer or engineer having to hand-code every step the agent needs to take along the way.
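To make the loop described above concrete, here is a minimal sketch in Python. It is an illustration only, not the method behind AlphaGo or the Berkeley team's agent: it assumes a made-up six-square corridor as the environment, a reward function that pays one point for reaching the last square, and tabular Q-learning (one standard reinforcement learning algorithm) as the learner. Every name in it (`reward_function`, `step`, `train`, `greedy_policy`) is hypothetical.

```python
import random

N_STATES = 6        # corridor squares 0..5; square 5 is the goal (assumed toy world)
ACTIONS = (-1, +1)  # step left or step right


def reward_function(state):
    """Hypothetical reward: one point for reaching the goal, nothing otherwise."""
    return 1.0 if state == N_STATES - 1 else 0.0


def step(state, action):
    """Toy environment: move along the corridor, stopping at the walls."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, reward_function(next_state), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: actions that lead toward reward get reinforced."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally (or when undecided); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """The learned policy: the preferred action in each non-goal square."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody tells the agent to walk right. It stumbles onto the goal by chance, and repetition does the rest: the printed policy lists the action the agent has come to prefer in each square. The same sketch also hints at the weakness of points alone, since the agent learns nothing at all until the first reward happens to arrive.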