Chess Knightmare and Turing’s Dream



Plus answers to Sunday’s test on Turing



By permission of Mike Magnan, artist.



Viswanathan Anand retained his chess world championship title yesterday, by defeating challenger Boris Gelfand in a rapid-chess playoff after the twelve regulation games ended in a 6-6 tie. Yet the match, the most important in chess, was by most accounts not very exciting. Ten of the twelve games were draws, seven of them agreed before Move 30, which many tournaments disallow. Many of the players’ moves had been prepared using computers, often up to or beyond Move 20.

Today I ask why computers playing among themselves have produced livelier games than recent matches of humans equipped with computer preparation.

Anand’s third straight match win since gaining the world title in 2007 is impressive. His play can often be exciting, though his results last year were lackluster. Gelfand earned the right to challenge by winning last year’s Candidates’ Match-Tournament in Kazan, Russia. In that event 27 of 30 regulation games were drawn, and most matches were decided in similar fast-paced tiebreaks, which some deem akin to a penalty-kick shootout in soccer.

Fears of chess becoming “played out” go back even before the 1927 world championship match between Alexander Alekhine and José Raúl Capablanca, in which 32 of 34 games featured the same opening. What hypes the fear now is that we may have the technology to actually play much of chess out. This feeling was revealed by the number of people taken in by a hoax last month that the King’s Gambit had been exhaustively analyzed to a draw.

However, the computers themselves play few draws against each other, and show a degree of human-appreciable inventiveness that Alan Turing could only have dreamed about, unless he had lived to see 85, let alone 100.

What Price High Standards?

According to my statistical model of player move choice mentioned here, this match had the highest standard in chess history. Based on computer analysis of the twelve regulation games, my model computes an “Intrinsic Performance Rating” (IPR) for Anand of 3002, and 2920 for Gelfand. Each is about 200 points higher than their current Elo ratings of 2791 and 2727, respectively. My analysis eliminates moves 1–8, moves in repeating sequences, and moves where one side is judged to have a clearly winning advantage, the equivalent of being over three pawns ahead.
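The three exclusions just described can be sketched as a simple filter. This is only an illustration, not the model's actual code: the `MoveRecord` type, its field names, and the `keep_for_analysis` helper are hypothetical, and the sketch assumes that "over three pawns ahead" means an engine evaluation whose absolute value exceeds 3.0.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:
    """One analyzed move (hypothetical record layout, for illustration only)."""
    move_number: int     # full-move number within the game
    eval_pawns: float    # engine evaluation in pawns, positive favors White
    in_repetition: bool  # True if the move belongs to a repeating sequence


def keep_for_analysis(m: MoveRecord,
                      first_move: int = 9,
                      win_threshold: float = 3.0) -> bool:
    """Apply the three cutoffs described in the text."""
    if m.move_number < first_move:          # drop moves 1-8
        return False
    if m.in_repetition:                     # drop moves in repeating sequences
        return False
    if abs(m.eval_pawns) > win_threshold:   # drop clearly decided positions
        return False
    return True


sample = [
    MoveRecord(5, 0.2, False),    # too early
    MoveRecord(12, 0.4, False),   # kept
    MoveRecord(20, 0.0, True),    # repetition
    MoveRecord(33, -3.5, False),  # one side clearly winning
]
analyzed = [m for m in sample if keep_for_analysis(m)]
print(len(analyzed), "of", len(sample), "moves pass the cutoffs")
```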

To be sure, I ascribe most of this difference to their use of computer-prepared moves, in short games with relatively few “original moves.” Not only do the programs’ ratings run above 3200, but home analysis can run them longer and stronger than the game-time settings used to compile those ratings. Still, the players had those moves in their heads as they sat down computer-free at the board, and if what matters is the quality of the moves made by their hands regardless of where they came from, then this was history’s human chess pinnacle.

My work also hints that the Elo rating of perfect play may be as low as 3600. This is not far-fetched: if Anand could manage to draw a measly two games in a hundred against any perfect player, for a 1% score, the mathematics of the rating system ensures that the latter’s rating would never rise above 3500, and if Gelfand could do it, 3400. Perfect play on both sides is almost universally believed to produce a draw, even after a few small slips. All this raises a question:
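The arithmetic behind such a ceiling can be sketched as follows. Two draws in a hundred games is a score of 1%, and a rating system maps an expected score to a rating gap. Which curve is intended here is not stated, so this sketch tries two common choices as assumptions: Elo's original normal model with a per-player standard deviation of 200, and the logistic curve with scale 400. The function names are illustrative.

```python
import math
from statistics import NormalDist


def gap_normal(score: float, per_player_sd: float = 200.0) -> float:
    """Rating gap at which the weaker side expects `score`, normal model.

    The difference of two performances has SD = per_player_sd * sqrt(2).
    """
    sd_of_difference = per_player_sd * math.sqrt(2.0)
    return -NormalDist(0.0, sd_of_difference).inv_cdf(score)


def gap_logistic(score: float, scale: float = 400.0) -> float:
    """Same gap under the logistic curve E = 1 / (1 + 10**(gap / scale))."""
    return scale * math.log10((1.0 - score) / score)


score = 2 * 0.5 / 100  # two draws in a hundred games
for name, rating in [("Anand", 2791), ("Gelfand", 2727)]:
    print(name,
          "normal ceiling:", round(rating + gap_normal(score)),
          "logistic ceiling:", round(rating + gap_logistic(score)))
```

Under the normal model the gap is roughly 660 points, which keeps both ceilings under the figures quoted above. The logistic curve gives a wider gap of roughly 800 points, so the exact bound depends on which curve one adopts.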

Does the higher draw tendency of recent top-level matches owe inevitably to their coming within a few hundred intrinsic rating points of perfection?

The fairest ones to ask are the computers, for they have now played far more games at this level than have we humans. And perhaps surprisingly, their answer seems to be No. The recent 19th World Computer Chess Championship received an IPR over 3000 from my model, yet 22 of 36 games ended in victory. The 2010 WCCC had 35 wins from 45 games, while the 2009 WCCC had 33 from 45, and this month’s International CSVN Tournament had only 7 draws in 28 games. Why?
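To put the figures quoted in this post side by side, here is a small sketch that converts them into draw rates. Every count is taken from the text above; for the three WCCC events the number of draws is inferred as games minus wins, and the event labels are shortened for display.

```python
def draw_rate(games: int, draws: int) -> float:
    """Fraction of games that ended in a draw."""
    return draws / games


# (label, games played, games drawn), all counts as quoted in the post
events = [
    ("Anand-Gelfand, regulation (humans)", 12, 10),
    ("Kazan Candidates, regulation (humans)", 30, 27),
    ("19th WCCC (computers)", 36, 36 - 22),
    ("2010 WCCC (computers)", 45, 45 - 35),
    ("2009 WCCC (computers)", 45, 45 - 33),
    ("CSVN tournament (computers)", 28, 7),
]

for label, games, draws in events:
    print(f"{label:40s} {draw_rate(games, draws):6.1%}")
```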

Contempt and Contemplation

The reason may literally be that the computers have greater contempt for each other. The contempt factor is a term in a program’s evaluation function that makes it pretend to be a couple of tenths of a pawn better off than it is, in situations where a drawing or drawish continuation is available.
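A minimal sketch of the idea, assuming one common way to realize it: rather than inflating its own evaluation, the program marks down any drawish continuation by the contempt value, which has the same effect on its choice. The `choose_line` function, the tuple layout, and the 0.2 default are illustrative assumptions, not the code of any particular engine.

```python
def choose_line(lines, contempt=0.2):
    """Pick the preferred continuation.

    lines: list of (name, eval_pawns, is_drawish), evaluated from the
    program's own point of view. Drawish lines are penalized by
    `contempt` pawns before the comparison.
    """
    def adjusted(line):
        _name, eval_pawns, is_drawish = line
        return eval_pawns - contempt if is_drawish else eval_pawns

    return max(lines, key=adjusted)[0]


candidates = [
    ("repeat moves", 0.0, True),       # a safe draw is on offer
    ("keep queens on", -0.1, False),   # slightly worse, but keeps play alive
]

print(choose_line(candidates, contempt=0.0))  # without contempt
print(choose_line(candidates, contempt=0.2))  # with contempt
```

With zero contempt the program takes the repetition; with a contempt of 0.2 it prefers to play on from a nominally worse position, which is the behavior the paragraph describes.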

The computers also have no sense of high stakes, which in humans puts “staying in the game” ahead of maximizing one’s chance of winning. Observers called out this tendency several times as the games were played, most notably in the 12th game, when Anand preferred to trade queens and be a safe though sickly pawn ahead, rather than play his queen to be a thorn in Gelfand’s position at the risk of allowing Gelfand’s queen to gobble two of Anand’s pawns with check. Here are comments made by the now-retired Garry Kasparov and three players currently rated in the top 5:

Kasparov (after Game 6): “Hopefully [the] next few days will provide more ‘fire on board’.”

Vladimir Kramnik (after Anand’s draw offer in Game 12): “What is this? Really confusing… I can only have one explanation: [Anand] just couldn’t stand the pressure of the last game… It is one of the strangest decisions I ever saw in the World Championship matches.”

Levon Aronian (on Twitter after Game 12): “Anand-Gelfand g 12 was brilliant. Anand found a great pawn sac at home, and Gelfand answered with 2 pawn sacs! Wow, can’t wait till tiebreaks!”

Hikaru Nakamura (same time on Twitter): “I must be a very bad chess player since I keep liking Anand’s positions and he keeps offering draws instead of trying to win.”

Anand himself said, “The problem with such a tight match is that every mistake has a much higher value,” while Aronian noted about his surprise early elimination from the qualifier in tiebreaks after four regulation draws, “Perhaps I didn’t quite cope with the pressure.”

Thus humans get “tight”; computers don’t. How might we level the situation?

From My Corner Square

One way is to take away time. The 4 games of the Rapid playoff, in which the players had roughly one-fourth the thinking time of a standard game, were by all accounts more exciting. They were also longer, and provided 377 moves making my analysis cutoffs, compared to only 495 for the 12 regulation games. Still they hit a combined 2710, Anand 2701 and Gelfand 2720, in my model’s judgment of their intrinsic quality. This is about equal to both players’ performances in standard games over the past year. For comparison, the tiebreak in the 2006 match by which Kramnik defeated Veselin Topalov hit 2663, with Kramnik’s 2789 marginally better than his IPR for the regulation games, while Topalov’s 2530 perhaps explains his stated desire to avoid a tiebreaker with Anand in 2010.

But almost no one wishes to see Rapid chess become the standard. That the time to make 40 moves has shrunk from 150 minutes when I was active to 120 and now 90 minutes (plus 20 in increments) is considered shortening enough. Is there another way to dial humans up a notch?

Dick and I have also talked about the idea of an “aggressiveness parameter” in other applications besides chess. Of course robots in a virtual battle can be more aggressive. Can such a parameter be used in web search, say to promote pages that have many recent updates and other signs of dynamism and risk? Even in something pure like a SAT-solving engine or theorem-prover one can implement a degree of taking risks by undoing part of present progress. I am also improving my model to formulate “challenge produced” as a measure of skill, over the present emphasis on accuracy. (This is the place to note that the boldfaced IPR’s in this post are subject to change.)

My own opinion specific to chess is that by mid-century, the game will need to be tweaked to promote a longer “mixing time” before the players can simplify by trading central pawns and heavy pieces. The huge opening books and enumerated simplifying lines are what prompted Bobby Fischer to promote “FischerRandom Chess,” which is today better known as Chess960 for the 960 possible symmetrical starting configurations, from which a random choice is made. I favor combining Fischer’s ingenious generalized castling rule with an older non-random, non-symmetrical format originally proposed by David Bronstein, the former Soviet champion whose name channeled Kronsteen for James Bond. This is described here on the great Chess Variants website originated by Hans Bodlaender, whom we all know in computer science theory.

To be sure, the game of Go has virtually no “draw problem” and humans still beat computers handily at it.

§

On a separate note, Kasparov is an invited speaker for Manchester’s Alan Turing Centenary Conference. In the second half of his talk he will describe a “Turing Test” in which he was challenged to find which of five games was played by a computer. As first related here twelve years ago, he distinguished it quickly by the other games having a tangible frequency of short-term errors. My joint work with Guy Haworth and Giuseppe DiFatta on player modeling, however, has regressed this frequency against Elo rating, and can now be used to generate artificial games between players of any desired target rating that have such “human” errors. Would an expert be able to distinguish those now?

Turing Test Answers

Here are our answers to last Sunday’s “Turing Test” post.

1. (i) Mathison
2. (v): note that (i) and (ii) are equivalent upon considering negated formulas.
3. (ii) Marian Rejewski; let us not forget the Poles…
4. (v) in the 1970’s.
5. (v) Last month.
6. (v) DEUCE.
7. (iv) Kurt Gödel himself told us he never met Turing.
8. (i), according to Turing’s formal definition of “circling,” though today we read “halting” in place of its formal antonym.
9. (ii) Rosser.
10. (iii) Church was Turing’s PhD advisor…
11. (iii) …at Princeton.
12. (i) Turing never won it; we meant the answer to be (v) none, but mis-worded (i); for (iv) note the Knuth Prize.
13. (i) Everything down to one tape.
14. (i) Doubting it, as mentioned in Andrew Odlyzko’s talk here.
15. (iii) Voice encryption for phones.
16. (ii) Frances McDormand. (We cheated with Wikipedia here.)
17. Correction: (v) since the current number is 1, with Max Newman in the JSL, 1942. So much for what we thought we knew (while at Princeton), though in light of question 5, perhaps more will emerge e.g. with I.J. Good…
18. (ii) Distance runner.
19. (v) All named for Turing.
20. (v) See the last few items on this page.

Commenters are welcome to post answers to their supplementary questions, for which we thank them.

Open Problems

How can we induce the top human players to play more like computers, so that their games will be more exciting?

Can computer chess engines be programmed to explain their analysis in a vocabulary of chess strategy prose? Would they then be said to pass a “Turing Test” at chess?

[fixed answer to question 17, gratia commenters, included co-authors; fixed ChessVariants link]