This computer program can beat humans at Go—with no human instruction

The artificial intelligence (AI) program that last year smacked down the best human player in the ancient board game Go has gotten even better. AlphaGo bested South Korean Go master Lee Sedol in part by learning from a vast catalog of example moves by humans. Now, the latest version of the program, AlphaGo Zero, has mastered the game entirely on its own, researchers at DeepMind, the company that developed the program, announced in a press briefing Monday in London. The novel self-teaching techniques used by the new program might also find uses in other domains, such as traffic planning or drug discovery.

“The previous version of AlphaGo was also an amazing achievement, but in some ways, this now feels complete,” says Martin Mueller, a computer scientist at the University of Alberta, in Edmonton, Canada, who also studies Go programs.

In Go, opponents take turns placing black and white stones on a 19-by-19 grid, trying to surround each other’s pieces and claim territory. There are more potential arrangements of the pieces than atoms in the known universe, making it impossible for a computer to play the game by exhaustively simulating all moves and outcomes. So the original AlphaGo evaluated each potential move in two more sophisticated ways.
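A rough count shows why brute force is off the table. The sketch below treats each of the 361 points as empty, black, or white, which gives an upper bound that also includes illegal positions. The atom figure is a commonly cited order-of-magnitude estimate, assumed here for illustration and not taken from the article.

```python
# Each of the 361 points is empty, black, or white, so 3**361 is an
# upper bound on board arrangements (it also counts illegal positions).
BOARD_SIZE = 19
POINTS = BOARD_SIZE * BOARD_SIZE

upper_bound = 3 ** POINTS

# Assumed order-of-magnitude estimate for atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80


def order_of_magnitude(n: int) -> int:
    """Return the exponent e such that 10**e <= n < 10**(e + 1)."""
    return len(str(n)) - 1


print(order_of_magnitude(upper_bound))  # exponent of the upper bound
print(upper_bound > ATOMS_ESTIMATE)     # True if arrangements outnumber atoms
```

Even this count covers only static positions. The number of possible games, which is what an exhaustive search would have to walk through, is far larger.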

First, it used a so-called search tree to determine how many times a move would lead to a win in a set of quickly simulated games—a process called rollout. Second, it used neural networks, programs that can learn to detect patterns, to predict in a given situation whether a move will lead to a win. That required training one network to predict human play, based on an online database of nearly 30 million moves. To further train its move-selection network, it then played itself more than a million times. Using the results of those games, it then taught a separate game-prediction network to predict whether a given move would lead to a win. That network’s prediction was averaged with that of the rollout when evaluating moves.
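The two-part evaluation can be illustrated on a far smaller game. The sketch below is not DeepMind's code: it uses a toy stone-taking game, a hand-written `toy_value` function as a hypothetical stand-in for the trained prediction network, and an equal-weight average, which is one reading of "averaged" above.

```python
import random


def legal_moves(stones):
    """Toy game: remove 1 to 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def rollout(stones, to_move, rng):
    """Play random moves to the end and return the winner (0 or 1)."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move  # this player took the last stone
        to_move = 1 - to_move


def rollout_win_rate(stones, move, player, n_games, rng):
    """Fraction of quick random games that `player` wins after `move`."""
    remaining = stones - move
    if remaining == 0:
        return 1.0  # the move itself wins the game
    wins = sum(rollout(remaining, 1 - player, rng) == player
               for _ in range(n_games))
    return wins / n_games


def toy_value(stones, move):
    """Hypothetical stand-in for a trained value network."""
    return 1.0 if (stones - move) % 4 == 0 else 0.0


def evaluate_move(stones, move, player, rng, n_games=200):
    """Average the rollout estimate with the learned prediction."""
    fast = rollout_win_rate(stones, move, player, n_games, rng)
    learned = toy_value(stones, move)
    return 0.5 * fast + 0.5 * learned


rng = random.Random(0)
scores = {m: evaluate_move(10, m, player=0, rng=rng) for m in legal_moves(10)}
best_move = max(scores, key=scores.get)
print(scores, best_move)
```

In this toy game the stand-in value function happens to be exact, so it dominates the noisy rollout estimate. In Go neither signal is exact, which is presumably why the original program blended the two.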

The new AlphaGo Zero works more simply. First, it combines the move-picking network and the game-predicting network, making the program more efficient and flexible. Second, the combined neural network uses a new architecture that allows for many more layers of tunable artificial neurons than those in the first AlphaGo. Third, during training, the network and search tree work more closely to improve each other. With these changes, the program could skip the step of learning from human games. It also skipped rollout, which had relied on hand-crafted tactical guidelines.
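One common way to merge two such networks is a shared body feeding two output "heads," one for picking moves and one for predicting the winner. The sketch below is a minimal illustration under that assumption; the class name `TwoHeadedNet`, the layer sizes, and the random untrained weights are all hypothetical and bear no relation to the real program's scale.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoHeadedNet:
    """Shared trunk with a move-picking head and a game-predicting head."""

    def __init__(self, n_points, n_hidden, seed=0):
        rng = random.Random(seed)
        self.trunk = random_layer(n_hidden, n_points, rng)
        self.policy_head = random_layer(n_points, n_hidden, rng)
        self.value_head = random_layer(1, n_hidden, rng)

    def forward(self, board):
        # Shared features, computed once and used by both heads.
        hidden = [max(0.0, h) for h in dense(board, *self.trunk)]
        move_probs = softmax(dense(hidden, *self.policy_head))
        value = math.tanh(dense(hidden, *self.value_head)[0])
        return move_probs, value


# A 3-by-3 board: 1 for own stone, -1 for opponent, 0 for empty.
net = TwoHeadedNet(n_points=9, n_hidden=16)
move_probs, value = net.forward([0, 1, -1, 0, 0, 1, 0, -1, 0])
print(len(move_probs), round(sum(move_probs), 6), -1.0 <= value <= 1.0)
```

Because both heads read the same shared features, one pass through the network yields both a move distribution and a win estimate, which is one way the combined design can be more efficient.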

Led by computer scientist David Silver, the DeepMind team tested AlphaGo Zero against other computer programs to establish its strength on a rating scale called Elo. The version that defeated Lee trained for months and reached an Elo rating of 3739. AlphaGo Zero surpassed that level in just 36 hours and eventually reached a rating of 5185, the researchers report today in Nature. AlphaGo Zero also trounced the older program 100 games to zero, even when it ran on just four processors, compared with the older AI’s 48.
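To get a feel for a gap of 1446 rating points, the sketch below applies the standard Elo expected-score formula with its usual 400-point scale. That the reported ratings follow this exact convention is an assumption; the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, loss = 0) for player A against player B
    under the standard Elo formula with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


ZERO_RATING = 5185   # AlphaGo Zero's final rating, from the article
OLDER_RATING = 3739  # the version that played Lee Sedol, from the article

expected = elo_expected_score(ZERO_RATING, OLDER_RATING)
print(f"{expected:.4f}")
```

Under that assumption the stronger program's expected score per game comes out above 0.999, so a 100-to-zero result is in line with the ratings.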

When the researchers did train AlphaGo Zero on human games, it learned more quickly, but performed more poorly in the long run. Left on its own, they suggest, it learned differently from humans, mastering known moves in a different order and discovering a previously unknown sequence for playing in corners. “It’s a great advance,” says Tristan Cazenave, a computer scientist at Paris Dauphine University. “It shows that in a very difficult domain you can discover new knowledge that took humans thousands of years to discover.”

A self-teaching algorithm could have other applications, such as searching through possible arrangements of atoms to find materials with new properties. “Maybe there is a room-temperature superconductor out there,” said Demis Hassabis, DeepMind’s co-founder and CEO, during the briefing. However, Mueller notes, whereas Go has clear rules and limited moves, the real world is messy and uncertain. So, he says, it remains to be seen how well AlphaGo Zero’s techniques can work in less structured domains.