AlphaZero is a generic algorithm that can learn to play essentially any perfect information board game.

Board games are interesting for robots because they help us understand whether robots can reason in complex environments where the rules are known. If robots ever take over, they will probably fund their campaign by playing board games online for money.

There are tons of new board games published every year, which is great because different board games test vastly different skills:

Chess is about reasoning through a deep tree of actions.

Go is about depth and spatial understanding.

Poker is about probabilities and reading other players' intentions.

Diplomacy is about negotiation and backstabbing.

Chess and Go are examples of perfect (or full) information games, where the entire history of each match and the current state of the board are fully available to both players at all times.

Poker and Diplomacy are examples of hidden information games, where part of the actual state (a player's cards or intentions) can't be directly observed by the other players but can be inferred.
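To make the distinction concrete, here is a toy Python sketch of what each player gets to see. The state classes, the `"startpos"` board string and the card names are made up for illustration; the only point is that a perfect information observation is the whole state, while a hidden information observation leaves out the opponent's private cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChessLikeState:
    board: str                  # the full position (hypothetical encoding, e.g. "startpos")
    history: tuple[str, ...]    # every move played so far
    to_move: int                # 0 or 1


@dataclass(frozen=True)
class PokerLikeState:
    hands: tuple[tuple[str, ...], tuple[str, ...]]  # each player's private cards
    public_cards: tuple[str, ...]
    to_move: int


def observe_perfect(state: ChessLikeState, player: int) -> ChessLikeState:
    # Perfect information: both players observe exactly the same thing, the full state.
    return state


def observe_hidden(state: PokerLikeState, player: int) -> dict:
    # Hidden information: a player sees only their own hand plus the public cards;
    # the opponent's hand is part of the true state but has to be inferred.
    return {
        "my_hand": state.hands[player],
        "public_cards": state.public_cards,
        "to_move": state.to_move,
    }
```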

Games that look like Chess or Go, such as Reversi or Mancala, can all be tackled with the AlphaZero algorithm.
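Because the whole state is visible, a perfect information game only has to expose a handful of operations for a search algorithm to work with it. The sketch below assumes a hypothetical `Game` interface (`initial_state`, `legal_moves`, `next_state`, `winner`; these names are illustrative, not from any library) and implements it for a tiny Nim variant as a stand-in for Reversi or Mancala.

```python
import random
from typing import Optional, Protocol


class Game(Protocol):
    # Hypothetical minimal interface for a two-player perfect information game.
    def initial_state(self): ...
    def legal_moves(self, state) -> list: ...
    def next_state(self, state, move): ...
    def winner(self, state) -> Optional[int]: ...  # None while the game is still going


class Nim:
    """One pile; players alternately take 1-3 stones; whoever takes the last stone wins.
    A state is (stones_left, player_to_move)."""

    def __init__(self, stones: int = 10):
        self.stones = stones

    def initial_state(self):
        return (self.stones, 0)

    def legal_moves(self, state):
        stones, _ = state
        return [k for k in (1, 2, 3) if k <= stones]

    def next_state(self, state, move):
        stones, player = state
        return (stones - move, 1 - player)

    def winner(self, state):
        stones, player = state
        # With no stones left, the player who just moved (1 - player) took the last one.
        return 1 - player if stones == 0 else None


def random_playout(game: Game, state, rng: random.Random) -> int:
    # Works unchanged for any class that implements the four methods above.
    while game.winner(state) is None:
        state = game.next_state(state, rng.choice(game.legal_moves(state)))
    return game.winner(state)
```

Supporting another game would mean writing another class with the same four methods; `random_playout`, and any search built on the same interface, would stay as is.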

Games that look like Poker or Diplomacy, such as Bridge or Hanabi, are sometimes solved by newer research involving techniques like counterfactual regret minimization, but not without a lot of caveats.

This post will show you how to solve board games that look like Chess or Go using AlphaZero.

AlphaZero Architecture

AlphaZero doesn't use human games as data; it's trained entirely by playing against itself. Its precursor, AlphaGo, used both human game data and all sorts of hand-crafted features, which were removed in AlphaZero. This is a big deal because it means that chess amateurs can build cutting-edge chess engines, whereas in the past you needed to pair up programmers with chess grandmasters.
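A rough sketch of how self-play turns into training data, assuming a toy Nim game (take 1 to 3 stones, taking the last stone wins) instead of chess. `search_policy` is a hypothetical placeholder that returns uniform move probabilities; in AlphaZero that role is played by the tree search, whose visit counts become the policy target. Each recorded position is then labeled with the final result from the point of view of the player to move, which serves as the value target.

```python
import random

# Toy game (assumption): one pile, players take 1-3 stones, whoever takes the last stone wins.
# A state is (stones_left, player_to_move).
def legal_moves(state):
    return [k for k in (1, 2, 3) if k <= state[0]]


def next_state(state, move):
    return (state[0] - move, 1 - state[1])


def search_policy(state):
    # Hypothetical placeholder: uniform probabilities over legal moves.
    # AlphaZero would instead use the normalized MCTS visit counts for this position.
    moves = legal_moves(state)
    return {m: 1 / len(moves) for m in moves}


def self_play_game(start_stones, rng):
    state, history = (start_stones, 0), []
    while state[0] > 0:
        pi = search_policy(state)
        history.append((state, pi))
        move = rng.choices(list(pi), weights=list(pi.values()))[0]
        state = next_state(state, move)
    winner = 1 - state[1]  # the player who just moved took the last stone
    # Training examples: (position, policy target, outcome for the player to move).
    return [(s, p, 1 if s[1] == winner else -1) for s, p in history]
```

No human games appear anywhere: the only inputs are the rules and the agent's own play.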

TL;DR: The general idea of AlphaZero is to build a search tree that keeps track of which moves are good or bad in each position by simulating games starting from those positions. Because the trees for Chess and Go are so massive, we use a neural network to guide the search: it suggests which moves are worth exploring and estimates how good a position is, so the search doesn't have to play every simulation out to the end at random.
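Here is a minimal sketch of that search loop in Python, again on a toy Nim game. It follows the AlphaZero structure (select with a PUCT score, expand a leaf, evaluate it, back the value up the tree), but `rollout_evaluator` is an assumption standing in for the neural network: it returns uniform move priors and scores a position with a single random playout, which is closer to classic Monte Carlo Tree Search. The constant `c_puct=1.5` is an arbitrary choice.

```python
import math
import random

# Toy game (assumption): one pile, players take 1-3 stones, whoever takes the last stone wins.
# A state is (stones_left, player_to_move).
def legal_moves(state):
    return [k for k in (1, 2, 3) if k <= state[0]]


def next_state(state, move):
    return (state[0] - move, 1 - state[1])


class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a): how promising the evaluator thinks this move is
        self.visits = 0          # N(s, a)
        self.value_sum = 0.0     # total value, from the view of the player who made this move
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def rollout_evaluator(state, rng):
    """Stand-in for the network: returns (priors, value), value from the player to move's view."""
    moves = legal_moves(state)
    priors = {m: 1 / len(moves) for m in moves}
    s = state
    while s[0] > 0:                      # one random playout to the end of the game
        s = next_state(s, rng.choice(legal_moves(s)))
    winner = 1 - s[1]
    return priors, (1.0 if winner == state[1] else -1.0)


def puct(parent, child, c_puct=1.5):
    # Exploit (Q) plus explore (prior, shrinking as the child gets visited more).
    return child.q() + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)


def mcts(root_state, n_simulations, evaluator, rng):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, state, path = root, root_state, [root]
        # 1. Selection: follow the highest PUCT score until reaching an unexpanded node.
        while node.children:
            parent = node
            move, node = max(parent.children.items(), key=lambda kv: puct(parent, kv[1]))
            state = next_state(state, move)
            path.append(node)
        # 2. Expansion and evaluation of the leaf.
        if state[0] == 0:
            value = -1.0                 # no stones left: the player to move has already lost
        else:
            priors, value = evaluator(state, rng)
            node.children = {m: Node(p) for m, p in priors.items()}
        # 3. Backup: flip the sign at each level, since the players alternate.
        for n in reversed(path):
            value = -value
            n.visits += 1
            n.value_sum += value
    return {m: child.visits for m, child in root.children.items()}
```

The returned visit counts are what you would turn into a move (pick the most visited) or into a policy target for training. Swapping `rollout_evaluator` for a network that outputs `(priors, value)` is the only change needed to make the search network-guided in this sketch.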

Game Tree

When you annotate a chess game, the typical representation is a list of moves, shown to the right in the screenshot below.

Now it's Black's turn, and Black has many options. Some reasonable ones (not necessarily the best) are dxe4, e6, etc. We can draw out each new state the board could be in as a tree.
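The two views can be sketched side by side. Chess would need a full move generator, so this assumes tic-tac-toe as a stand-in: the board is a 9-character string and moves are square indices 0 to 8 rather than chess notation, and the game record `[4, 0, 8]` is hypothetical. `replay` treats a game as a list of moves (the annotation view), while `TreeNode.expand` branches on every legal move from a position (the tree view).

```python
from dataclasses import dataclass, field

# Toy stand-in for chess (assumption): tic-tac-toe on a 9-character board string, "." = empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def play(board: str, square: int) -> str:
    player = "X" if board.count("X") == board.count("O") else "O"  # X always moves first
    return board[:square] + player + board[square + 1:]


def is_over(board: str) -> bool:
    won = any(board[a] != "." and board[a] == board[b] == board[c] for a, b, c in LINES)
    return won or "." not in board


def replay(moves: list[int]) -> str:
    # List view: a game record is just the sequence of moves, applied in order.
    board = "." * 9
    for square in moves:
        board = play(board, square)
    return board


@dataclass
class TreeNode:
    board: str
    children: dict[int, "TreeNode"] = field(default_factory=dict)

    def expand(self, depth: int) -> None:
        # Tree view: branch on every legal move, not only the one that was actually played.
        if depth == 0 or is_over(self.board):
            return
        for square, cell in enumerate(self.board):
            if cell == ".":
                child = TreeNode(play(self.board, square))
                child.expand(depth - 1)
                self.children[square] = child
```

Even for tic-tac-toe the tree grows quickly with depth, which is why exhaustively expanding it is hopeless for Chess or Go.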