A few weeks ago, a group of researchers from Google’s artificial-intelligence subsidiary, DeepMind, published a paper in the journal Science that described an A.I. for playing games. While their system is general-purpose enough to work for many two-person games, the researchers had adapted it specifically for Go, chess, and shogi (“Japanese chess”); it was given no knowledge beyond the rules of each game. At first it made random moves. Then it started learning through self-play. Over the course of nine hours, the chess version of the program played forty-four million games against itself on a massive cluster of specialized Google hardware. After two hours, it began performing better than human players; after four, it was beating the best chess engine in the world.

The program, called AlphaZero, descends from AlphaGo, an A.I. that became known for defeating Lee Sedol, the world’s best Go player, in March of 2016. Sedol’s defeat was a stunning upset. In “AlphaGo,” a documentary released earlier this year on Netflix, the filmmakers follow both the team that developed the A.I. and its human opponents, who have devoted their lives to the game. We watch as these humans experience the stages of a new kind of grief. At first, they don’t see how they can lose to a machine: “I believe that human intuition is still too advanced for A.I. to have caught up,” Sedol says, the day before his five-game match with AlphaGo. Then, when the machine starts winning, a kind of panic sets in. In one particularly poignant moment, Sedol, under pressure after having lost his first game, gets up from the table and, leaving his clock running, walks outside for a cigarette. He looks out over the rooftops of Seoul. (On the Internet, more than fifty million people were watching the match.) Meanwhile, the A.I., unaware that its opponent has gone anywhere, plays a move that commentators called creative, surprising, and beautiful. In the end, Sedol lost, 1-4. Before there could be acceptance, there was depression. “I want to apologize for being so powerless,” he said in a press conference. Eventually, Sedol, along with the rest of the Go community, came to appreciate the machine. “I think this will bring a new paradigm to Go,” he said. Fan Hui, the European champion, agreed. “Maybe it can show humans something we’ve never discovered. Maybe it’s beautiful.”

AlphaGo was a triumph for its creators, but still unsatisfying, because it depended so much on human Go expertise. The A.I. learned which moves it should make, in part, by trying to mimic world-class players. It also used a set of hand-coded heuristics to avoid the worst blunders when looking ahead in games. To the researchers building AlphaGo, this knowledge felt like a crutch. They set out to build a new version of the A.I. that learned on its own, as a “tabula rasa.”

The result, AlphaGo Zero, detailed in a paper published in October, 2017, was so called because it had zero knowledge of Go beyond the rules. This new program was much less well-known; perhaps you can ask for the world’s attention only so many times. But in a way it was the more remarkable achievement, one that no longer had much to do with Go at all. In fact, less than two months later, DeepMind published a preprint of a third paper, showing that the algorithm behind AlphaGo Zero could be generalized to any two-person, zero-sum game of perfect information (that is, a game in which there are no hidden elements, such as face-down cards in poker). DeepMind dropped the “Go” from the name and christened its new system AlphaZero. At its core was an algorithm so powerful that you could give it the rules of humanity’s richest and most studied games and, later that day, it would become the best player there has ever been. Perhaps more surprising, this iteration of the system was also by far the simplest.

A typical chess engine is a hodgepodge of tweaks and shims made over decades of trial and error. The best engine in the world, Stockfish, is open source, and it gets better by a kind of Darwinian selection: someone suggests an idea; tens of thousands of games are played between the version with the idea and the version without it; the best version wins. As a result, it is not a particularly elegant program, and it can be hard for coders to understand. Many of the changes programmers make to Stockfish are best formulated in terms of chess, not computer science, and concern how to evaluate a given situation on the board: Should a knight be worth 2.1 points or 2.2? What if it’s on the third rank, and the opponent has an opposite-colored bishop? To illustrate this point, David Silver, the head of research at DeepMind, once listed the moving parts in Stockfish. There are more than fifty of them, each requiring a significant amount of code, each a bit of hard-won chess arcana: the Counter Move Heuristic; databases of known endgames; evaluation modules for Doubled Pawns, Trapped Pieces, Rooks on (Semi) Open Files, and so on; strategies for searching the tree of possible moves, like “aspiration windows” and “iterative deepening.”

AlphaZero, by contrast, has only two parts: a neural network and an algorithm called Monte Carlo Tree Search. (In a nod to the gaming mecca, mathematicians refer to approaches that involve some randomness as “Monte Carlo methods.”) The idea behind M.C.T.S., as it’s often known, is that a game like chess is really a tree of possibilities. If I move my rook to d8, you could capture it or let it be, at which point I could push a pawn or move my bishop or protect my queen. . . . The trouble is that this tree gets incredibly large incredibly quickly. No amount of computing power would be enough to search it exhaustively. An expert human player is an expert precisely because her mind automatically identifies the essential parts of the tree and focusses its attention there. Computers, if they are to compete, must somehow do the same.

Chess commentators have praised AlphaZero, declaring that the engine “plays like a human on fire.” Photograph Courtesy DeepMind Technologies

This is where the neural network comes in. AlphaZero’s neural network receives, as input, the layout of the board for the last few moves of the game. As output, it estimates how likely the current player is to win and predicts which of the currently available moves are likely to work best. The M.C.T.S. algorithm uses these predictions to decide where to focus in the tree. If the network guesses that ‘knight-takes-bishop’ is likely to be a good move, for example, then the M.C.T.S. will devote more of its time to exploring the consequences of that move. But it balances this “exploitation” of promising moves with a little “exploration”: it sometimes picks moves it thinks are unlikely to bear fruit, just in case they do.

At first, the neural network guiding this search is fairly stupid: it makes its predictions more or less at random. As a result, the Monte Carlo Tree Search starts out doing a pretty bad job of focussing on the important parts of the tree. But the genius of AlphaZero is in how it learns. It takes these two half-working parts and has them hone each other. Even when a dumb neural network does a bad job of predicting which moves will work, it’s still useful to look ahead in the game tree: toward the end of the game, for instance, the M.C.T.S. can still learn which positions actually lead to victory, at least some of the time. This knowledge can then be used to improve the neural network. When a game is done, and you know the outcome, you look at what the neural network predicted for each position (say, that there’s an 80.2 per cent chance that castling is the best move) and compare that to what actually happened (say, that the percentage is more like 60.5); you can then “correct” your neural network by tuning its synaptic connections until it prefers winning moves. In essence, all of the M.C.T.S.’s searching is distilled into new weights for the neural network.