Investigating the variance of the World Chess Championship final

A statistical analysis of Carlsen’s final match approach

2018 World Chess Championship logo showing 5 overlapping arms above chessboard holding or moving chess pieces.

While the FIDE World Chess Championship was going on, I found the following problem:

Say one of the players is better than his opponent to the degree that he wins 20 percent of all games, loses 15 percent of games and that 65 percent of games are drawn. Wins at this match are worth 1 point, draws a half-point for each player, and losses 0 points. In a 12-game match, the first player to 6.5 points wins. What are the chances the better player wins a 12-game match? How many games would a match have to be in order to give the better player a 75 percent chance of winning the match outright? A 90 percent chance? A 99 percent chance?

The match ended a little under a month ago, and after all the analysis of the 12 hard-fought draws and Carlsen’s 3–0 sweep of the tie-breakers, I find these questions quite relevant; after all, the players drew all 12 regular games. Was Carlsen’s strategy of drawing the much longer regular games (game 1 lasted 7 hours, game 12 lasted 3 hours) in order to overwhelm his opponent in the much shorter tie-breakers, where he is known to be unmatched, a safer alternative than playing to win the regular games?

In this article, I implement a simple Python script that simulates how many games a match needs to contain for the better player to win nearly every time.

I propose to solve this problem with something that I like: sampling.

I will assume two tireless agents, A and B, that play 10,000 n-game matches against each other, where n is the number of games in each match. As in standard chess scoring, a win gives 1 point, a draw 0.5 points and a loss 0 points.

A player wins an n-game match if they reach (n/2)+0.5 points. For example, in a 12-game match, a player wins if they reach 6.5 points; if neither player does, the match is drawn. Repeating this independently 10,000 times gives the percentage of matches won by agent A, the percentage won by agent B and the percentage drawn.
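The win condition can be written down directly. This is a minimal sketch, assuming each game result is encoded from agent A's point of view as 1 (win), 0 (draw) or -1 (loss); the function name `match_result` is illustrative.

```python
def match_result(results):
    """Return 'A', 'B' or 'draw' for a list of game results.

    Each entry is 1 (A wins the game), 0 (draw) or -1 (B wins the game).
    """
    n = len(results)
    # Standard scoring: 1 point for a win, 0.5 for a draw, 0 for a loss.
    points_a = sum(1.0 if r == 1 else 0.5 if r == 0 else 0.0 for r in results)
    points_b = n - points_a  # every game hands out exactly one point in total
    target = n / 2 + 0.5     # e.g. 6.5 points in a 12-game match
    if points_a >= target:
        return "A"
    if points_b >= target:
        return "B"
    return "draw"


print(match_result([0] * 12))        # prints: draw
print(match_result([1] + [0] * 11))  # prints: A
```

Because each game distributes exactly one point, reaching the target is equivalent to finishing with more wins than losses, so the check does not depend on the order in which the games were played.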

The function sample(n) returns the results of n games, where 1 indicates that agent A won, -1 indicates that agent B won and 0 indicates a draw. We assume that agent A is the better agent: it wins 20% of the games, loses 15% of the games to B, and draws the remaining 65%.
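A minimal sketch of such a function, assuming Python's standard `random` module. The keyword arguments `p_win`, `p_loss` and `rng` are illustrative additions that make the probabilities and the random source easy to swap.

```python
import random


def sample(n, p_win=0.20, p_loss=0.15, rng=random):
    """Return a list of n game results from agent A's perspective.

    1 means A won the game, -1 means B won, 0 means the game was drawn.
    """
    p_draw = 1.0 - p_win - p_loss  # 65% with the default values
    return rng.choices([1, -1, 0], weights=[p_win, p_loss, p_draw], k=n)


# Example: one simulated 12-game match and A's net score (wins minus losses).
games = sample(12)
print(games, sum(games))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.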

The following script tries different match sizes (n) and prints the percentage of matches won by the better player, the percentage won by the worse player, and the percentage of drawn matches.
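A sketch of what such a script might look like. The grid of match sizes (12 doubled repeatedly up to 768) and the names `simulate` and `MATCH_SIZES` are assumptions made for illustration.

```python
import random

P_WIN, P_LOSS = 0.20, 0.15      # agent A's per-game win and loss probabilities
P_DRAW = 1.0 - P_WIN - P_LOSS   # the remaining 65% of games are drawn
MATCH_SIZES = [12, 24, 48, 96, 192, 384, 768]  # assumed grid: 12 doubled repeatedly


def sample(n, rng=random):
    """Results of n games from A's perspective: 1 win, -1 loss, 0 draw."""
    return rng.choices([1, -1, 0], weights=[P_WIN, P_LOSS, P_DRAW], k=n)


def simulate(n, trials=10_000, rng=random):
    """Return the percentages of n-game matches won by A, won by B, and drawn."""
    a_wins = b_wins = 0
    for _ in range(trials):
        net = sum(sample(n, rng))  # A's wins minus A's losses
        if net > 0:                # A finished with at least n/2 + 0.5 points
            a_wins += 1
        elif net < 0:              # B finished with at least n/2 + 0.5 points
            b_wins += 1
    draws = trials - a_wins - b_wins
    return tuple(100.0 * count / trials for count in (a_wins, b_wins, draws))


if __name__ == "__main__":
    print(f"{'games':>6} {'A wins %':>9} {'B wins %':>9} {'drawn %':>8}")
    for n in MATCH_SIZES:
        a, b, d = simulate(n)
        print(f"{n:>6} {a:>9.2f} {b:>9.2f} {d:>8.2f}")
```

Since these are sampled estimates, the percentages vary slightly from run to run; increasing `trials` narrows that spread at the cost of a longer run.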

The output:

Percentage of times the better agent wins, loses and draws.

As we can see, a match would need about 768 games for the better agent to have a 99% chance of taking it. For a 75% chance of winning, a little fewer than 96 games would be enough.

In the current 12-game format, agent A wins only about 51.66% of matches even though the odds favour it. Why? As in the final between Carlsen and Caruana, closely matched players/agents tend to draw a lot, whether on purpose or not, so 12 games are too few for A’s small per-game edge to show reliably.
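The simulated percentages can be cross-checked without sampling. The sketch below uses a different technique from the one in this article: it tracks the exact probability distribution of agent A's net score (wins minus losses) game by game. The function names and the `max_games` cap are hypothetical.

```python
from collections import defaultdict


def step(dist, p_win, p_loss):
    """Advance the distribution of A's net score (wins minus losses) by one game."""
    p_draw = 1.0 - p_win - p_loss
    nxt = defaultdict(float)
    for d, p in dist.items():
        nxt[d + 1] += p * p_win   # A wins the game
        nxt[d - 1] += p * p_loss  # A loses the game
        nxt[d] += p * p_draw      # the game is drawn
    return nxt


def outcome_probs(dist):
    """Probabilities that A wins the match, B wins the match, and the match is tied."""
    a = sum(p for d, p in dist.items() if d > 0)
    b = sum(p for d, p in dist.items() if d < 0)
    return a, b, dist.get(0, 0.0)


def match_probs(n, p_win=0.20, p_loss=0.15):
    """Exact outcome probabilities for an n-game match."""
    dist = {0: 1.0}
    for _ in range(n):
        dist = step(dist, p_win, p_loss)
    return outcome_probs(dist)


def smallest_match_size(target, p_win=0.20, p_loss=0.15, max_games=2000):
    """Smallest n (up to max_games) at which A wins with probability >= target."""
    dist = {0: 1.0}
    for n in range(1, max_games + 1):
        dist = step(dist, p_win, p_loss)
        if outcome_probs(dist)[0] >= target:
            return n
    return None  # target not reached within max_games


if __name__ == "__main__":
    a, b, tie = match_probs(12)
    print(f"12 games: A wins {a:.4f}, B wins {b:.4f}, tied {tie:.4f}")
    for target in (0.75, 0.90, 0.99):
        print(f"{target:.0%} chance needs {smallest_match_size(target)} games")
```

This scans every match length rather than a coarse grid, so it can report thresholds that fall between the sizes tried by a simulation.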

To evaluate Carlsen’s strategy, we need to know what the size of a match should be in order to ensure that the better player between Carlsen and Caruana wins.

In chess, a player’s relative skill is measured by their ‘Elo rating’. Going into the match, Carlsen was rated 2835 and Caruana 2832. Based on these ratings and according to this website, Carlsen has a 19.2812559% chance of winning a game, Caruana has an 18.4434925% chance of winning a game, and there is a 62.2752516% chance of a draw.
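For context, the standard Elo model converts a rating gap into an expected score (win = 1, draw = 0.5), not into separate win, draw and loss probabilities; that split requires an additional draw model, which the website presumably supplies. The sketch below compares the standard expected-score formula with the expected score implied by the quoted figures. The function name is illustrative.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5) of player A under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


carlsen, caruana = 2835, 2832
print(f"Elo expected score for Carlsen: {elo_expected_score(carlsen, caruana):.4f}")

# Expected score implied by the win and draw probabilities quoted above.
p_win, p_draw = 0.192812559, 0.622752516
print(f"Implied by quoted probabilities: {p_win + 0.5 * p_draw:.4f}")
```

Either way, a three-point rating gap gives Carlsen an expected score only slightly above 50%.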

By updating the sample(n) function to reflect these values and running the script again, we get the following output:
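A sketch of the updated function, with the per-game probabilities quoted above hard-coded from Carlsen's perspective. The constant names and the `rng` argument are illustrative.

```python
import random

# Per-game probabilities quoted above, from Carlsen's perspective.
P_CARLSEN_WIN = 0.192812559
P_CARUANA_WIN = 0.184434925
P_DRAW = 0.622752516


def sample(n, rng=random):
    """Results of n games: 1 Carlsen wins, -1 Caruana wins, 0 draw."""
    weights = [P_CARLSEN_WIN, P_CARUANA_WIN, P_DRAW]
    return rng.choices([1, -1, 0], weights=weights, k=n)


# Carlsen's per-game edge: win probability minus loss probability.
edge = P_CARLSEN_WIN - P_CARUANA_WIN
print(f"Carlsen's per-game edge: {edge:.4%}")
```

The per-game edge here is under one percentage point, compared with five points (20% versus 15%) in the original problem, which is why far more games are needed to separate the two players.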

A match would need about 10,000 games to give Carlsen a 90% chance of winning. As such, I believe his strategy of playing for draws in order to reach the much shorter tie-breakers (where he is known to be the deadliest) was a good one, especially since it worked!

Acknowledgements

Mark Askew for reading the article before publication.