One year after the bombshell announcement that DeepMind’s AlphaZero needed only the rules of chess and four hours of self-play to beat Stockfish in a match, the long-awaited full paper has now been published in the academic journal Science. We have new games – Matthew Sadler has produced videos about five of them – and what seems to be conclusive evidence of AlphaZero’s superiority. It won a new 1,000-game match 574.5:425.5, despite Stockfish running in a powerful configuration and managing its own time. AlphaZero also won when given just 1/10th of the time to think.

A generic game-beater

AlphaZero would be extraordinary even if it had only reached “human” levels of attainment. It began as AlphaGo, which learned from human games to become the world’s best Go player, then developed into AlphaGo Zero, which managed to surpass AlphaGo merely by playing against itself with no human input. AlphaZero is the new generalised version of that “reinforcement and search algorithm”, which the DeepMind team have shown can master multiple games – chess, shogi and Go – knowing only the rules. In the case of chess, AlphaZero needed 300,000 of the 700,000 “steps” it took while training – just 4 hours (of 9 in total) – to reach a level at which it was beating Stockfish.

During the World Championship match we featured content from 2-time British Champion Matthew Sadler and WIM Natasha Regan, who are co-authoring Game Changer. They appear in this short video looking at AlphaZero:

Today the full paper on AlphaZero was published in Science, and you can check it out here (as well as lots more on the DeepMind website).

An end to the Stockfish controversy?

Ever since the first announcement last year there have been computer chess enthusiasts who, while not doubting the scientific achievement, were concerned that Stockfish had been unfairly treated. The original match, announced as a 64:36 win for AlphaZero (28 wins to 0), was criticised for crippling Stockfish with too little hash memory, an unusual number of cores, and a fixed thinking time of 1 minute per move that didn’t allow Stockfish to manage its own time. The implication was that in fair conditions Stockfish might still have won.

In the full paper, however, the matches are much more rigorous. The main match was played from the starting position over 1,000 games, with 3 hours per player plus a 15-second increment per move. The Stockfish 8 configuration was the same as one used in the TCEC World Championship, but AlphaZero scored 155 wins to only 6 for Stockfish, with the remaining games drawn. That’s not all: the paper explains that AlphaZero still won when Stockfish was given an opening book, or when the latest version of the program at the time of submission (Stockfish 9) was used. And the clincher: AlphaZero also won when it was given just 1/10th of the time to think (at 1/30th of the time, Stockfish finally came out on top).
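
As a quick sanity check on the headline figures, here is a minimal Python sketch that turns a win/draw/loss tally into a match score. The draw count is not quoted in the text; it is inferred as 1,000 games minus the 155 wins and 6 losses. The Elo conversion uses the standard logistic formula and is our own back-of-the-envelope addition, not a figure from the paper, and the function names are purely illustrative.

```python
import math

def match_score(wins, draws, losses):
    """Return (points for, points against), counting a draw as half a point."""
    return wins + draws / 2, losses + draws / 2

def elo_gap(points_for, points_against):
    """Approximate rating gap implied by a match score (standard logistic model)."""
    share = points_for / (points_for + points_against)
    return 400 * math.log10(share / (1 - share))

# Main match: 155 wins and 6 losses in 1,000 games, so 839 draws (inferred).
az, sf = match_score(wins=155, draws=1000 - 155 - 6, losses=6)
print(f"{az}:{sf}")  # 574.5:425.5
print(round(elo_gap(az, sf)))  # rough Elo gap implied by that score
```

The same function reproduces the December 2017 result: `match_score(28, 72, 0)` gives 64:36. On this rough model the new match score corresponds to a gap of about 52 Elo points.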

The authors point out that AlphaZero normally searches 1,000 times fewer positions per second than Stockfish (60,000 compared to 60 million), which means that with 1/10th of the thinking time it reached better decisions while searching 10,000 times fewer positions. One of the curiosities of the paper is that the authors aren’t sure exactly why!
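
The arithmetic behind that 10,000-fold figure can be spelled out in a few lines of Python. The speeds are the rounded figures quoted above; the names are ours, and we assume the comparison refers to the match in which AlphaZero had 1/10th of Stockfish's thinking time.

```python
from fractions import Fraction

ALPHAZERO_POSITIONS_PER_SECOND = 60_000
STOCKFISH_POSITIONS_PER_SECOND = 60_000_000

def search_ratio(time_fraction):
    """Positions Stockfish examines for each one AlphaZero examines,
    when AlphaZero gets `time_fraction` of Stockfish's thinking time."""
    speed_ratio = Fraction(STOCKFISH_POSITIONS_PER_SECOND,
                           ALPHAZERO_POSITIONS_PER_SECOND)
    return speed_ratio / Fraction(time_fraction)

print(search_ratio(1))                # 1000
print(search_ratio(Fraction(1, 10)))  # 10000
```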

AlphaZero may compensate for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.

That “may” means the self-trained monster remains something of a black box even to its creators. What remains for AlphaZero doubters? Well, it’s possible the recently-released Stockfish 10 could do better, while there’s also the question of hardware. AlphaZero ran on a single machine with 4 first-generation TPUs, compared to Stockfish running on a conventional computer with 44 CPU cores. As the paper notes, however, “Each program was run on the hardware for which it was designed”.
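
How might a neural network let a program get away with looking at so few positions? The paper describes a tree search guided by the network's move probabilities, and the Python sketch below shows a PUCT-style selection rule of that general kind. It is an illustration only, not DeepMind's code: the `Move` class, the constant `c_puct` and all the numbers in the example are invented for the purpose.

```python
import math
from dataclasses import dataclass

@dataclass
class Move:
    name: str
    prior: float            # probability the network assigns to the move (made up here)
    visits: int = 0         # how often the search has tried the move so far
    value_sum: float = 0.0  # sum of the evaluations from those visits

def select_move(moves, c_puct=1.5):
    """Pick the move with the best average value plus prior-weighted exploration bonus."""
    total_visits = sum(m.visits for m in moves)

    def score(m):
        average_value = m.value_sum / m.visits if m.visits else 0.0
        bonus = c_puct * m.prior * math.sqrt(total_visits) / (1 + m.visits)
        return average_value + bonus

    return max(moves, key=score)

# Hypothetical search statistics for three candidate moves.
moves = [
    Move("Nf3", prior=0.60, visits=10, value_sum=5.5),
    Move("h4", prior=0.01, visits=0),
    Move("d4", prior=0.30, visits=2, value_sum=1.0),
]
print(select_move(moves).name)  # d4
```

With these invented numbers the search turns next to d4, a plausible move it has barely examined, while the unlikely h4 gets almost no attention even though it has never been tried.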

But what about the games?

There’s far more technical detail in the full paper, but it’s time to get to the games! Once again, that’s a treat for chess fans, and we’ve added the games to our system (click a game below to open it with some much lower-powered computer analysis):

Those in Round 1 are the 10 games from December 2017, while Round 2 features 110 new games from the main match played from the normal starting position. Round 3 features 100 games from a match in which the opening moves were made for the programs according to the TCEC Computer Chess Championship opening book. If you switch to notation view (not chat), you’ll see which moves were “book” before our combatants started to think.

How can we make sense of it all? Well, the first 10 games in Rounds 2 and 3 are selected by Matthew Sadler as his favourites – and that’s not all. He’s also produced five videos, which have something for everyone. Enjoy!

1. All-in Defence: Stockfish 1/2-1/2 AlphaZero (replay the game)

For the first time now we’re seeing AlphaZero drawing and even losing some games, but draws like this one are stunning! A true Najdorf brawl:

2. Bold Sir Lancelot: AlphaZero 1-0 Stockfish (replay the game)

A white knight hops around at will in a positional masterclass:

3. Endgame Class: Stockfish 0-1 AlphaZero (replay the game)

One of the most memorable images from the Science paper is the following, which shows the 6-ply positions (3 moves by each player) that featured most often when AlphaZero was playing itself during its 700,000 steps of training:

Yes, Vladimir Kramnik seems to have stumbled on the Holy Grail of chess when preparing the Berlin Defence to play against Garry Kasparov in London in 2000. Both after 4 hours, when it could already beat Stockfish, and after the full 9 hours of training, the most popular position in the AlphaZero training games was the Berlin: 1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6. Here Matthew Sadler looks at how AlphaZero went about winning a memorable game with the opening:

4. Exactly How to Attack: AlphaZero 1-0 Stockfish (replay the game)

Matthew calls this “one of my favourite games” and counts 7 pawn sacrifices from AlphaZero in total in “an absolutely fantastic attack”. He quotes DeepMind co-founder Demis Hassabis as describing this game as, “like chess from another planet”.

5. Long-term Sacrifice: Stockfish 0-1 AlphaZero (replay the game)

It wasn’t AlphaZero’s choice to play the Leningrad Dutch, but it made Matthew very jealous! The game features a cascade of sacrifices, with Sadler noting of one of them, “I don’t think I’ve ever seen a tactic like this”.

That’s just scratching the surface of the AlphaZero games now published, so it looks as though they’re going to keep us busy for a while yet.

Before that, though, there’s a little chess event starting next week in the DeepMind offices in London. On Monday there’s the London Chess Classic ProBiz Cup, while on Tuesday the Grand Chess Tour Playoff begins with semi-final matches between Fabiano Caruana and Hikaru Nakamura, and Levon Aronian and Maxime Vachier-Lagrave. You can watch all the action live here on chess24.
