Are We Actually Getting Better at Chess?

How do you tell if we’re getting better at a sport or a game, when you can never pit players from different eras against each other? For example, could Tom Brady carve up Pittsburgh’s Steel Curtain defense of the 1970s? Who would win one-on-one, Michael Jordan (in his prime) or LeBron James? Could Justin Verlander strike out Babe Ruth? Outside of video games, we’ll never know.

The obvious exceptions are track and field events, where accomplishments are measured in time and distance. In those cases, we actually have been consistently chipping away at records, running faster, jumping higher and so on, though the recent use of performance-enhancing drugs has certainly tainted that “progress.”

But what about chess? Players are judged by a sophisticated rating system, though some suspect that ratings have been inflated in recent years. A pair of academics has set out to address this by comparing the quality of play over the years. In their recent paper, Kenneth W. Regan, a computer science professor at the University at Buffalo, and Guy Haworth, an engineering professor at the University of Reading, examine the quality of players’ moves, rather than win-or-lose outcomes. Their conclusion is that yes, we are getting better at chess.
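For readers who haven't met that rating system, here is a minimal sketch of the standard Elo expected-score formula, which predicts results from the gap between two ratings. This is the ordinary results-based Elo calculation, not the move-quality model Regan and Haworth develop in the paper. The function names and the K-factor of 10 are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B
    under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 10.0) -> float:
    """New rating after one game. `actual` is 1 for a win, 0.5 for a draw,
    0 for a loss. The K-factor of 10 is an illustrative assumption."""
    return rating + k * (actual - expected)


# Two equally rated players each expect to score half a point per game.
even = expected_score(2700, 2700)

# A 100-point rating edge gives the stronger player an expected score of about 0.64.
favorite = expected_score(2700, 2600)

# Winning a game you were already favored to win earns only a few points.
after_win = updated_rating(2700, favorite, 1.0)
```

Because a rating only measures results against other rated players, the same number could in principle mean different things in different decades, which is exactly the inflation question the paper takes on.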

Read the full study here. The abstract is below:

This paper develops and tests formulas for representing playing strength at chess by the quality of moves played, rather than by the results of games. Intrinsic quality is estimated via evaluations given by computer chess programs run to high depth, ideally so that their playing strength is sufficiently far ahead of the best human players as to be a ‘relatively omniscient’ guide. Several formulas, each having intrinsic skill parameters s for sensitivity and c for consistency, are argued theoretically and tested by regression on large sets of tournament games played by humans of varying strength as measured by the internationally standard Elo rating system. This establishes a correspondence between Elo rating and the parameters. A smooth correspondence is shown between statistical results and the century points on the Elo scale, and ratings are shown to have stayed quite constant over time. That is, there has been little or no ‘rating inflation’. The theory and empirical results are transferable to other rational-choice settings in which the alternatives have well-defined utilities, but in which complexity and bounded information constrain the perception of the utility values.

At the outset, Regan and Haworth pose four questions, the two most fundamental of which are:

1. Has there been ‘inflation’—or deflation—in the chess Elo rating system over the past forty years?

2. Were the top players of earlier times as strong as the top players of today?

And here’s their conclusion:

…there has been little or no ‘inflation’ in ratings over time—if anything there has been deflation. This runs counter to conventional wisdom, but is predicted by population models on which rating systems have been based [Gli99]. The results also support a ‘no’ answer to question 2. In the 1970’s there were only two players with ratings over 2700, namely Bobby Fischer and Anatoly Karpov, and there were periods as late as 1981 when no one had a rating over 2700 (see [Wee00]). In the past decade, however, there have usually been thirty or more players with such ratings. Thus the lack of inflation implies that those players are better than all but Fischer and Karpov. Extrapolated backwards, this would be consistent with the findings of [DHMG07], which (like some recent competitions to improve on the Elo system) are based only on the results of games, not on intrinsic decision-making.

[HT: Tyler Cowen]