Demis Hassabis, the founder and CEO of DeepMind, announced at the Neural Information Processing Systems conference (NIPS 2017) last week that DeepMind’s new AlphaZero program achieved a superhuman level of play in chess within 24 hours.

The program started from random play, given no domain knowledge except the game rules, according to an arXiv paper by DeepMind researchers published Dec. 5.

“It doesn’t play like a human, and it doesn’t play like a program,” said Hassabis, an expert chess player himself. “It plays in a third, almost alien, way. It’s like chess from another dimension.”

AlphaZero also mastered both shogi (Japanese chess) and Go within 24 hours, defeating a world-champion program in all three cases. The original AlphaGo mastered Go by learning thousands of example games and then practicing against another version of itself.

“AlphaZero was not ‘taught’ the game in the traditional sense,” explains Chess.com. “That means no opening book, no endgame tables, and apparently no complicated algorithms dissecting minute differences between center pawns and side pawns. This would be akin to a robot being given access to thousands of metal bits and parts, but no knowledge of a combustion engine, then it experiments numerous times with every combination possible until it builds a Ferrari. … The program had four hours to play itself many, many times, thereby becoming its own teacher.”

“What’s also remarkable, though, Hassabis explained, is that it sometimes makes seemingly crazy sacrifices, like offering up a bishop and queen to exploit a positional advantage that led to victory,” MIT Technology Review notes. “Such sacrifices of high-value pieces are normally rare. In another case the program moved its queen to the corner of the board, a very bizarre trick with a surprising positional value.”

Abstract of Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.