The cumulative conformance section is partitioned into four subsections: correlation with the outcome of a game (4.2.1), conformance of play in World Championships (4.2.2), conformance of play during a whole career (4.2.3) and predicting the results of World Championships matches (4.2.4).

4.2.1 Correlation between cumulative conformance and the outcome of one game

In section 3.2 I defined several possible indicators of the conformance of moves. Below, I correlate these indicators with the outcome of games, again using Pearson’s ρ.

Fig.3

First, it is interesting to have an idea of the distribution of the conformance for all the positions evaluated during this study. We only keep positions after game turn 10 and positions where the move to play is not forced. This leaves around 1,600,000 positions (respectively 1,350,000 for Guid and Bratko, who eliminate positions with an evaluation lower than –2.00 or higher than 2.00). The conformance is equal to 0 for 980,000 moves (respectively 842,000), which is a large majority. In Fig. 3 the number of positions for each conformance, up to 1.99, is plotted (conformance is measured with a resolution of one centipawn, so it starts at 0.01 and goes up to 1.99 in steps of 0.01). The class after 1.99, which is not plotted, contains all positions with a conformance greater than 2.00; there are around 53,000 such positions.

For each game, three different kinds of conformance (as defined in section 3.2) are computed. We briefly summarize them below.

1 Raw conformance $\delta = v_b - v_p$ is just the raw difference between the evaluation $v_b$ of the best move and the evaluation $v_p$ of the move made by the player.

2 Guid and Bratko conformance is defined in a similar way, but the positions with an evaluation higher than +2 or lower than –2 are not considered.

3 Ponderated conformance is defined by $\delta' = \delta/(1 + v_b/k_1)$ for $v_b > 0$ and $\delta' = \delta/(1 + v_b/k_2)$ for $v_b < 0$, where $k_1$ and $k_2$ are suitable constants. In subsection 3.2.3, after a statistical analysis of the distribution of errors, $k_1 = 1.44$ and $k_2 = -3.53$ are chosen.
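The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, evaluations are taken in pawns, the Guid and Bratko cut-off is applied to the evaluation of the best move, and the case $v_b = 0$ (not covered by the formulas) is assumed to leave δ unchanged.

```python
from typing import Optional

K1 = 1.44   # constant used when v_b > 0 (value quoted from subsection 3.2.3)
K2 = -3.53  # constant used when v_b < 0 (value quoted from subsection 3.2.3)


def raw_conformance(v_best: float, v_played: float) -> float:
    """Raw conformance: delta = v_b - v_p, in pawns."""
    return v_best - v_played


def gb_conformance(v_best: float, v_played: float,
                   cutoff: float = 2.0) -> Optional[float]:
    """Guid and Bratko conformance: the raw difference, or None when the
    position is discarded because its evaluation lies outside
    [-cutoff, +cutoff]. Using v_best as 'the evaluation' is an assumption."""
    if abs(v_best) > cutoff:
        return None
    return v_best - v_played


def ponderated_conformance(v_best: float, v_played: float,
                           k1: float = K1, k2: float = K2) -> float:
    """Ponderated conformance: delta scaled by 1 + v_b/k1 or 1 + v_b/k2."""
    delta = v_best - v_played
    if v_best > 0:
        return delta / (1 + v_best / k1)
    if v_best < 0:
        return delta / (1 + v_best / k2)
    return delta  # v_b == 0: assumed to leave delta unchanged
```

With the default constants, losing one pawn from an evaluation of +1.44 counts for about half a pawn, since the scaling factor is then 1 + 1.44/1.44 = 2.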

In the rest of this section, each time the word “conformance” is used, it can represent any of these three meanings, except when explicitly stated otherwise. We are interested in the cumulative conformance for White (respectively Black) during one game, defined by $p_w(x)$ (respectively $p_b(x)$):

$$p_w(x) = \frac{\text{nb\_moves\_white}(\delta \le x)}{\text{total\_moves\_white}} \qquad p_b(x) = \frac{\text{nb\_moves\_black}(\delta \le x)}{\text{total\_moves\_black}}$$

Here $\text{total\_moves\_white}$ (respectively Black) is the total number of white moves considered in the game: this value is simply the number of white moves in this game minus the opening moves and minus the moves which are forced (there is only one move possible)21. $\text{nb\_moves\_white}(\delta \le x)$ (respectively Black) is the number of moves with a conformance less than or equal to $x$, taken only among the moves defined above.
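A minimal sketch of the cumulative conformance of one game follows. The function names are hypothetical, and conformances are stored as integer centipawns (an assumption made here to avoid floating-point comparisons at the threshold); the input lists are assumed to already exclude opening moves and forced moves.

```python
from typing import Sequence


def cumulative_conformance(deltas_cp: Sequence[int], x_cp: int) -> float:
    """Fraction of the retained moves whose conformance is <= x_cp.

    deltas_cp holds one conformance value (in centipawns) per retained move
    of a single player in a single game.
    """
    if not deltas_cp:
        raise ValueError("no retained moves for this player")
    return sum(1 for d in deltas_cp if d <= x_cp) / len(deltas_cp)


def conformance_difference(white_cp: Sequence[int],
                           black_cp: Sequence[int], x_cp: int) -> float:
    """White's cumulative conformance minus Black's for the same threshold."""
    return (cumulative_conformance(white_cp, x_cp)
            - cumulative_conformance(black_cp, x_cp))


# Toy data (invented for illustration, not taken from the study).
white = [0, 0, 10, 30, 120]
black = [0, 5, 40, 40, 250]
difference_at_30cp = conformance_difference(white, black, 30)
```

In this toy game four of White's five moves and two of Black's five moves fall at or below 30 centipawns, so the difference is 0.8 - 0.4.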

Then $p(x) = p_w(x) - p_b(x)$ is the difference between White’s and Black’s conformance for a given game. There are around 26,000 games, and thus 26,000 values of $p(x)$ for each $x$. Now, we wish to know for which value of $x$ the difference $p(x)$ has the best correlation with the outcome of the game. Thus, for each $x$ we compute Pearson’s ρ by correlating the 26,000 values of $p(x)$ with the outcomes of the 26,000 corresponding games (+1 if White wins, 0 for a draw and -1 if White loses). For the ponderated conformance, an optimization was also quickly performed using a Nelder and Mead (1965) simplex algorithm22 to find the best correlation possible, and the optimal values found are $k_1 = 0.75$ and $k_2 = -3.30$.
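The scan over thresholds can be sketched as follows. This is an illustrative outline, not the code used in the study: the function names and the toy games are invented, conformances are in integer centipawns, and the outer simplex search over the two constants is not shown.

```python
import math
from typing import List, Sequence, Tuple

Game = Tuple[Sequence[int], Sequence[int], int]  # (white cp, black cp, outcome)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient; returns 0.0 if a series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def fraction_at_most(deltas_cp: Sequence[int], x_cp: int) -> float:
    """Cumulative conformance of one player in one game."""
    return sum(1 for d in deltas_cp if d <= x_cp) / len(deltas_cp)


def best_threshold(games: List[Game],
                   thresholds_cp: Sequence[int]) -> Tuple[int, float]:
    """Return the threshold whose p(x) correlates best with the outcomes."""
    outcomes = [outcome for _, _, outcome in games]
    best = None
    for x in thresholds_cp:
        p = [fraction_at_most(w, x) - fraction_at_most(b, x)
             for w, b, _ in games]
        rho = pearson(p, outcomes)
        if best is None or rho > best[1]:
            best = (x, rho)
    return best


# Toy games, invented for illustration only.
games = [
    ([0, 0, 10, 20], [0, 30, 50, 80], +1),
    ([0, 40, 60, 90], [0, 0, 10, 10], -1),
    ([0, 0, 20, 50], [0, 0, 20, 60], 0),
    ([0, 10, 10, 30], [0, 50, 70, 200], +1),
]
thresholds = [0, 10, 20, 30, 50]
best_x, best_rho = best_threshold(games, thresholds)
```

Treating a constant series as having zero correlation is a design choice of this sketch; it keeps the scan from failing at thresholds where every game yields the same difference.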

Fig.4

Figure 4 represents the correlations of the accumulated conformance indicators starting at conformance 0. The best correlation is found for δ ≤ 0.3 for the raw and ponderated conformances, and for δ ≤ 0.2 for the G&B conformance. It is interesting to notice that the values $k_1 = 1.44$ and $k_2 = -3.53$ chosen in subsection 3.2.3 work remarkably well when compared to the optimal curve ($k_1 = 0.75$ and $k_2 = -3.30$). The decision to use two different slopes depending on the sign of the evaluation function is also validated when we compare the previous curves to the curves defined by $k_1 = -k_2 = 1.25$ and $k_1 = -k_2 = 3.00$.

It is important to try to understand why there is a “bump” in the curve representing correlation (i.e., why the optimal correlation is reached around δ ≤ 0.30 and not somewhere else). My interpretation is the following: having a better conformance for “perfect” (δ = 0) moves is of course extremely important because the “perfect” moves class is by far the largest and overshadows the others. However, having a better conformance here does not tell us anything about the distribution of the other moves, and even if there are fewer moves in the other classes, there are still some of them, especially in the classes closest to 0. Thus “adding” those classes to the conformance indicator gives more information about the distribution of the moves and “captures” important information. However, after a point, adding new classes which contain a small number of moves adds less meaningful information, and the correlation decreases.

There is still another point to discuss: how is the outcome of the game correlated to the mistakes made? In other words, what happens when we correlate the outcome of the game to $p'(x)$ defined by

$$p'_w(x) = \frac{\text{nb\_moves\_white}(\delta \ge x)}{\text{total\_moves\_white}} \qquad p'_b(x) = \frac{\text{nb\_moves\_black}(\delta \ge x)}{\text{total\_moves\_black}} \qquad p'(x) = p'_w(x) - p'_b(x)$$

First, let us notice that $\text{nb\_moves\_white}(\delta \ge x) + \text{nb\_moves\_white}(\delta \le x) = \text{total\_moves\_white}$. So:

$$p'_w(x) = \frac{\text{nb\_moves\_white}(\delta \ge x)}{\text{total\_moves\_white}} = \frac{\text{total\_moves\_white} - \text{nb\_moves\_white}(\delta \le x)}{\text{total\_moves\_white}} = 1 - \frac{\text{nb\_moves\_white}(\delta \le x)}{\text{total\_moves\_white}} = 1 - p_w(x)$$

Thus Pearson’s ρ for $p'(x)$ is $-\rho(p(x))$23. Thus the curve representing the correlation of $p'(x)$ will be exactly the opposite of the one of $p(x)$, with the same extrema at the same positions.
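The sign flip can be checked in one line. The step below assumes that the two classes are exactly complementary (that is, that the complement of δ ≤ x is read as δ > x), which is an assumption made for this sketch; the same identity is then applied to Black:

$$p'(x) = p'_w(x) - p'_b(x) = \bigl(1 - p_w(x)\bigr) - \bigl(1 - p_b(x)\bigr) = -\bigl(p_w(x) - p_b(x)\bigr) = -p(x)$$

Negating one variable negates its covariance with the outcome and leaves both standard deviations unchanged, so $\rho(-p(x), r) = -\rho(p(x), r)$, where $r$ denotes here the outcome of the game.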

This result might seem paradoxical. Intuitively, we might think that making big errors should be quite strongly correlated to the result of the game. This is of course true: in Fig. 13 in subsection 4.4.1 we will see that the result of the game is very strongly correlated to the highest evaluation reached in the game. But here the accumulated conformance indicator is not measuring this kind of correlation. Accumulated conformance is in fact measuring the combination of two things at the same time: on the one hand, it has to take into account how often a player loses a game when he24 makes a (big) mistake, but on the other hand it also depends on the probability of making big mistakes. A player who always loses when making a 50cp mistake, but only makes such mistakes in one game out of one hundred, will lose less often than a player who never loses games when he makes a 50cp error, and loses them only when he makes a 100cp error, but makes such mistakes in one game out of fifty.
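The comparison can be made concrete with a small worked computation. It assumes, for illustration only, that each player loses exclusively through the mistakes described and that the second player always loses after a 100cp error. Calling the first player A and the second player B:

$$P(\text{loss}_A) = \frac{1}{100} \times 1 = 1\%, \qquad P(\text{loss}_B) = \frac{1}{50} \times 1 = 2\%$$

Under these assumptions A loses half as often as B, even though A is the player who cannot survive a 50cp mistake.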

It is important to remember that I have only been maximizing the correlation of the difference of the accumulated conformance indicator with the result of the game, which is not the same thing as “fitting” the value of the difference of the conformance between two players with the result of the game. As Pearson’s ρ is invariant under linear scaling, it is possible, using a classical least squares method, to find α and β such that r = βd + α is the best approximation of the actual result of the game (here d stands for the difference of the conformance indicators of the two players). This will of course not change Pearson’s ρ, so this computation can be done independently of the optimization of $k_1$ and $k_2$, and we can compute α and β for all possible values of x in δ ≤ x. We expect25 α to be rather close to 0, while β should increase with x.
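The least squares step can be sketched with the closed-form solution for a straight line. The function name and the toy data are hypothetical; the sketch only illustrates how α and β would be obtained from the per-game differences d and the game results.

```python
from typing import Sequence, Tuple


def fit_line(d: Sequence[float], r: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of r = beta * d + alpha.

    d: per-game difference of the conformance indicators (White - Black).
    r: per-game result (+1, 0 or -1 from White's point of view).
    Returns (alpha, beta).
    """
    n = len(d)
    mean_d = sum(d) / n
    mean_r = sum(r) / n
    s_dd = sum((a - mean_d) ** 2 for a in d)
    if s_dd == 0:
        raise ValueError("all differences are identical; slope is undefined")
    s_dr = sum((a - mean_d) * (b - mean_r) for a, b in zip(d, r))
    beta = s_dr / s_dd
    alpha = mean_r - beta * mean_d
    return alpha, beta


# Toy data lying exactly on a line (invented for illustration).
d_values = [-0.2, 0.0, 0.1, 0.3]
r_values = [2.0 * v + 0.05 for v in d_values]
alpha, beta = fit_line(d_values, r_values)
```

On these toy points the fit recovers a slope close to 2 and an intercept close to 0.05, up to floating-point rounding.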

Fig.5

In Fig. 5 we have plotted the values of β and α as a function of x. Let us remember that the optimal value of x is 0.3 for ponderated and raw conformance, and 0.2 for Guid and Bratko conformance; the optimal values of (α, β) are: Raw ($\alpha = 4.3 \times 10^{-2}$, β = 4.00), Guid and Bratko ($\alpha = 6.7 \times 10^{-2}$, β = 3.37), and Ponderated ($\alpha = -7.0 \times 10^{-3}$, β = 3.64). The values of α show that there is a small positive bias regarding raw conformance (and Guid and Bratko conformance). The correlation has always been computed by subtracting Black’s value from White’s value, so this shows that, for identical raw values of the conformance indicator, White wins more often than Black26. A quick statistical analysis of the 26,000 games shows that the average score of a game is 0.12 (White is winning 56% of the points). It is common knowledge that, in chess, White wins slightly more often than Black, and the usual explanation is that White’s positions are usually “better” as White plays first. This explanation is of course correct27, but there might be another factor.
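The link between the average score and the share of points can be checked directly. With outcomes coded +1, 0 and -1, the average score is $s = P(\text{win}) - P(\text{loss})$ for White. Assuming the usual chess scoring of 1 point for a win and ½ point for a draw (an assumption about what “points” means here):

$$\text{share}_{\text{White}} = P(\text{win}) + \tfrac{1}{2} P(\text{draw}) = P(\text{win}) + \tfrac{1}{2}\bigl(1 - P(\text{win}) - P(\text{loss})\bigr) = \frac{1 + s}{2} = \frac{1 + 0.12}{2} = 0.56$$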

Fig.6

When plotting the difference of the raw accumulated conformance indicator for White and for Black, it is always positive (see left part of Fig. 6). White is playing 61.1% perfect moves (x = 0), while Black is only playing 60.2% perfect moves. The difference even rises for larger x and is maximal around x = 0.25, where it reaches almost 2%. So, Black is, in a way, making more mistakes than White. Why this is so is more difficult to interpret. We have already seen (subsection 3.2.3) that players are making more serious mistakes when they are in unfavorable positions; as Black is usually starting with a slight disadvantage, the same kind of psychological bias might encourage them to take more risks, and thus to make more mistakes. On the right side of Fig. 6, we see that the distributions of White’s and Black’s conformance are different. White is performing better at 0 and slightly above, while Black is better below 0. This figure also confirms that while the level of play remains consistent when the evaluation of the position is positive, it degrades fast when the evaluation is negative. We also understand why ponderated conformance corrects the bias: it “stretches” the positive and the negative sides of the curve differently because it uses two different constants to “bend” the distributions. The fact that the difference between White and Black is maximal around x = 0.25 might be another reason why the accumulated conformance indicator has the best correlation around this value.

In conclusion, the advantage of the accumulated conformance indicator is that it is a scalar, and it is thus easy to consider it as a ranking. The player with the best indicator is just supposed to be the best player. However, this discussion should remind us that cumulative conformance is not a beast which is easily tamed, and it is much more difficult to interpret than it might seem at first glance. A second important thing to remember is that we have “fitted” the model to the data using only games played by world class champions; it is quite possible that results and parameters would be different for club players, as the distribution of their moves is very different; thus some classes with high δ which are marginal here could have a much higher importance.