Tom Murphy graduated from Carnegie Mellon University with a PhD in computer science. Then he built software that learned to play Nintendo games.

In some cases, the system works well. Playing Super Mario, for instance, it learns to exploit a bug in the game, stomping on enemy Goombas even when floating below them. It can rack up points by attacking the game with a reckless abandon you and I would never try. But in other cases, it fizzles. It scores fewer points in Tetris than it would by merely placing blocks at random. And when it's on the verge of losing, it pauses the game—permanently. Like Joshua, the artificial intelligence in the 1983 sci-fi classic WarGames, Murphy's system appears to realize that sometimes the only winning move is not to play.

Murphy's software is far from the state of the art. But it pretty much sums up the progress of modern artificial intelligence. It handles some tasks well. It's useless at others. And even at this early stage, it's learning to do stuff we humans would never do. You can see much the same thing in AlphaGo, the Google system that beat a grandmaster at the ancient game of Go. You even see it in simpler systems, like the image recognition inside Google Photos. These systems are becoming extremely powerful even as they remain extremely flawed, and as a result they're at least a little scary as they start to make unexpected decisions on their own.

At the moment, these decisions are largely harmless—but not always. Remember when Google's image recognition service started labeling black people as gorillas? And as these technologies find their way into medical applications, robotics, and self-driving cars, AI has the potential to do real physical harm. "We're starting to get into gray areas. We don't always know which inputs yield which outputs," says Alexander Reben, a roboticist and artist in Berkeley, California, whose work aims to dramatize these concerns. "We're unable to understand what the machine is doing."

That's why some of the most prominent names in AI are now working to develop ways of dealing with what might go wrong. Today, along with researchers from Stanford University, UC Berkeley, and the Elon Musk-led startup OpenAI, a team of Google AI specialists proposed a framework for addressing AI safety risks. "Most previous discussion has been very hypothetical and speculative," Google researcher Chris Olah wrote in a blog post about their proposal. "We believe it's essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably."

In their paper, Olah and his colleagues look at the example of a robot that learns to clean. Their more pressing worries aren't the apocalyptic ones: that humans won't be able to shut the machine down, or that it will somehow destroy us all. The researchers are more concerned that this cleaning robot will learn to do stuff that just doesn't make sense, kinda like Murphy's bot learning to permanently pause a game of Tetris. What if the robot learns to knock over a vase because that lets it clean faster? What if it games the system by covering over messes instead of cleaning them? How do you prevent the machine from doing stupid, harmful stuff like sticking a wet mop in an outlet? How do you tell it that lessons learned in the home may not apply to the office?

Olah and his collaborators lay out several concrete principles for AI researchers, from "avoiding negative side effects" (not knocking over the vase) to "safe exploration" (not sticking the mop in the outlet). The concerns are practical; the uncertainty lies in how to address them. Still, that's kind of the point: Because no one has good answers, it's time to start looking for them. AI is advancing too fast not to.
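To make the vase problem concrete, here is a minimal Python sketch of how a carelessly chosen reward can favor the destructive shortcut, and how one simple fix (an explicit penalty for side effects) changes the outcome. Everything in it is an assumption made for illustration: the `Outcome` fields, the two plans, the numbers, and the penalty weight are hypothetical and do not come from the researchers' paper, which discusses the problem in far more general terms.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Result of one hypothetical cleaning plan (all fields are illustrative)."""
    name: str
    mess_cleaned: int    # units of mess actually removed
    seconds_taken: int   # how long the plan takes
    vases_broken: int    # stand-in for "negative side effects"


def naive_reward(o: Outcome) -> float:
    # Rewards cleaning and speed only; breaking things costs nothing.
    return o.mess_cleaned - 0.1 * o.seconds_taken


def penalized_reward(o: Outcome, side_effect_weight: float = 50.0) -> float:
    # Same objective, minus an assumed penalty for each broken vase.
    return naive_reward(o) - side_effect_weight * o.vases_broken


plans = [
    Outcome("go around the vase", mess_cleaned=10, seconds_taken=60, vases_broken=0),
    Outcome("knock over the vase", mess_cleaned=10, seconds_taken=40, vases_broken=1),
]

# The plan each reward function would pick.
best_naive = max(plans, key=naive_reward)
best_penalized = max(plans, key=penalized_reward)

print("naive reward prefers:", best_naive.name)
print("penalized reward prefers:", best_penalized.name)
```

With these made-up numbers, the naive reward prefers the faster plan that breaks the vase, while the penalized reward prefers going around it. The catch, and part of why this is an open research problem, is that a designer cannot list every possible side effect in advance the way this toy lists vases.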

Moral Machines

A system like AlphaGo learns by analyzing vast amounts of data. But it also learns by operating on its own. Through a technique called reinforcement learning, it plays game after game against itself, carefully tracking which moves bring the most territory on the board. In this way, AlphaGo learns to make moves no human has ever made—for better or for worse. Now, Google is using similar techniques to train not only popular online services like its search engine but robots and self-driving cars. And these machines will behave in their own unpredictable ways.
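The self-play idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not AlphaGo's method: AlphaGo pairs deep neural networks with tree search, while this sketch uses a plain lookup table and swaps Go for a tiny stone-taking game (players alternately remove one to three stones, and whoever takes the last stone wins). The function names, the game, and the training settings are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # a player may remove 1, 2, or 3 stones per turn


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def choose(q, stones, epsilon, rng):
    """Pick a move: usually the best-known one, sometimes a random one."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)  # explore
    return max(moves, key=lambda m: q[(stones, m)])  # exploit


def train(episodes=20000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # average result seen after each (stones, move)
    counts = defaultdict(int)  # how many times each pair has been tried
    for _ in range(episodes):
        stones = rng.randint(1, 10)
        history = []  # (stones, move) pairs, alternating between the two players
        while stones > 0:
            move = choose(q, stones, epsilon, rng)
            history.append((stones, move))
            stones -= move
        # Whoever made the final move took the last stone and won.
        outcome = 1.0
        for key in reversed(history):
            counts[key] += 1
            q[key] += (outcome - q[key]) / counts[key]  # running average
            outcome = -outcome  # the previous move belonged to the other player
    return q


q = train()
# Greedy policy after training: the preferred move for each pile size.
policy = {s: choose(q, s, epsilon=0.0, rng=random.Random(0)) for s in range(1, 11)}
print(policy)
```

Nobody tells the program the strategy. It simply plays itself thousands of times and keeps score of which moves tended to end in a win. For this particular game the known winning approach is to leave the opponent a multiple of four stones, and the learned policy tends to rediscover it from piles where that is possible.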

"You can build robotics that does some of this stuff now," Reben says of such unexpected behaviors. Reben recently built a robot that decides—all on its own—whether or not to prick your finger. This shows, he explains, why we must tackle safety concerns now, not later.

And indeed, Olah and his crew are not the only ones working on these problems. DeepMind, the Google-owned lab responsible for AlphaGo, is exploring the possibility of an AI "kill switch" that would prevent machines from spinning beyond human control. If an AI learns to override what humans tell it to do, a kill switch would still let people shut it down.

Machines can't make the hard calls themselves yet, because they don't understand morality. But Ken Forbus, an AI researcher at Northwestern, is trying to fix that. Using a "Structure Mapping Engine," he and his colleagues are feeding simple stories—morality plays—into machines in the hope that they will grasp the implicit moral lessons. It'd be a kind of synthetic conscience. "You can use stories to beef up the machines' reasoning," Forbus says. "You can—in theory—teach it to behave more like people would."

In theory. Creating a truly moral machine is a long way off, if it's possible at all. After all, if we humans can't agree on what is moral, how can we program morality into machines? While humans quibble, machines get smarter, whether or not they know right from wrong. The question isn't whether machines will ever be able to beat Tetris without cheating. It's whether they'll ever learn that they shouldn't cheat.