Normally, covering computer science articles is a bit of a strain, but two things about a recent one had a strong personal appeal: I'm addicted to the Civilization series of games, and I rarely bother to read the users' manual. These don't necessarily sound like issues that could be tackled via computer science, but some researchers have decided to let a computer teach itself how to play Freeciv and, in the process, teach itself to interpret the game's manual. Simply by determining whether the moves it made were ultimately successful, the researchers' software not only got better at playing the game, but it figured out a lot of the owner's manual as well.

Civilization isn't the first game to catch the attention of computer scientists. The new paper's authors, based at MIT and University College London, cite past literature in which computers were able to teach themselves Go, poker, Scrabble, multi-player card games, and real-time strategy games. The method used for all of these is called a Monte Carlo search framework. At each possible move, the software runs a series of simulated games, which it uses to evaluate the possible utility of various moves. It uses these to update a utility function that estimates the value of a given move for a specific state of the game. After multiple iterations, the utility function should get better at identifying the best move, although the algorithm will sporadically insert a random move, just to continue to sample new possibilities.
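For the curious, here is a minimal sketch of that loop in Python. It is not the authors' code: the function names, the three actions, and the payoff numbers are all made up for illustration, the "simulated game" is just a noisy score, and the toy ignores game state entirely. It only shows the general shape of the idea, which is to average the results of many simulated games per move, mostly pick the move that currently looks best, and occasionally try a random one.

```python
import random
from collections import defaultdict

def monte_carlo_choose(actions, simulate, n_rollouts=200, epsilon=0.1, seed=0):
    """Pick an action by averaging the scores of simulated games."""
    rng = random.Random(seed)
    totals = defaultdict(float)  # summed rollout scores per action
    counts = defaultdict(int)    # number of rollouts per action

    def utility(action):
        # Running average score: the (very crude) utility estimate.
        return totals[action] / counts[action]

    def rollout(action):
        totals[action] += simulate(action, rng)
        counts[action] += 1

    # Try every action once so each has an initial estimate.
    for action in actions:
        rollout(action)

    for _ in range(n_rollouts):
        if rng.random() < epsilon:
            action = rng.choice(actions)        # occasional random move
        else:
            action = max(actions, key=utility)  # current best estimate
        rollout(action)

    return max(actions, key=utility)

# Hypothetical stand-in for "play the game forward and read the score".
TOY_PAYOFF = {"build_city": 0.7, "explore": 0.5, "fortify": 0.4}

def toy_simulate(action, rng):
    return TOY_PAYOFF[action] + rng.uniform(-0.1, 0.1)

best = monte_carlo_choose(list(TOY_PAYOFF), toy_simulate)
print(best)  # build_city
```

In the real system the utility estimate depends on the game state as well as the move, and the simulation is an actual stretch of Freeciv play rather than a lookup table with noise.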

This all sounds simple enough, but the computational challenges are considerable. The authors estimate that an average player will typically have 18 units in play, and each of those can take any one of 15 actions. That creates what they term an "action space" of about 10^21 possible moves. To gauge the utility of any one of these, they ran things out 20 moves and then checked the game score (or determined whether they won or lost before then). They performed this 200 times in order to generate their performance numbers.
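The size of that action space follows from simple counting: if each of 18 units independently picks one of 15 actions, the number of joint moves is 15 multiplied by itself 18 times. A quick check in Python (the function name is mine, and treating the units as fully independent is an assumption):

```python
def joint_action_space(units, actions_per_unit):
    """Number of joint moves if every unit chooses independently."""
    return actions_per_unit ** units

size = joint_action_space(units=18, actions_per_unit=15)
print(f"{size:.2e}")  # 1.48e+21
```

That is roughly 1.5 × 10^21, in line with the authors' figure.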

For their testing, the Monte Carlo search was set to play Freeciv's built-in AI in a one-on-one match on a grid of 1,000 tiles. A single 100-move game took about 1.5 hours to complete on a Core i7, so all this simulation time wasn't trivial. But, in general, the algorithm performed fairly well, being able to achieve victory in that short time frame about 17 percent of the time (left to play a game to completion, the Monte Carlo search won just under half the time).

Still, the authors wondered whether the algorithm might arrive at better decisions more consistently if it had access to the owner's manual, which contains various bits of advice about the strengths and weaknesses of various units, as well as some general guidance about how to build an empire (stick early cities near a river, for example). So, they decided to get their program to RTFM.

The "reading" took place using a neural network that takes the game state, a proposed move, and the owner's manual as input. One set of neurons in the network analyzed the manual to look for state/action pairs. These pairs combine states, like "active unit" or "completed road," with actions, like "improve terrain" or "fortify unit." A separate neural network then figured out whether any of the items identified in the first applied to the current situation. These were then combined to find relevant advice in the manual, which was then incorporated into the utility function.

The key thing about this process is that the neural network doesn't even know whether it's correctly identifying state/action pairs when it starts—it doesn't know how to "read"—much less whether it has correctly interpreted the advice they convey (do you build near a river, or should you never build by a river?). All it has to go on is what impact its interpretation has on the outcome of the game. In short, it has to figure out how to read the owner's manual simply by trying different interpretations and seeing whether they improve its play.
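To make the "learn to read from game outcomes alone" idea concrete, here is a deliberately simplified sketch. It swaps the paper's neural network for a plain linear model, and every name, feature, and number in it is hypothetical. The model starts with no idea what any word means; it pairs each word in a sentence with the proposed action, predicts a score, and then nudges its weights toward whatever score the game actually produced.

```python
from collections import defaultdict

def featurize(state, action, sentence):
    """Hypothetical features: the action, plus word/action and state/action pairs."""
    feats = {("action", action): 1.0}
    for word in sentence.lower().split():
        feats[("word+action", word, action)] = 1.0
    for fact in state:
        feats[("state+action", fact, action)] = 1.0
    return feats

def predict(weights, feats):
    """Estimated utility: a weighted sum of the active features."""
    return sum(weights[key] * value for key, value in feats.items())

def update(weights, feats, observed_score, lr=0.1):
    """Nudge the weights toward the score the game actually produced."""
    error = observed_score - predict(weights, feats)
    for key, value in feats.items():
        weights[key] += lr * error * value
    return error

weights = defaultdict(float)  # all zero: the model cannot "read" yet
feats = featurize({"river_nearby"}, "build_city", "Build cities near a river")

# Pretend that building by the river kept leading to a good score (1.0).
for _ in range(20):
    update(weights, feats, observed_score=1.0)

print(round(predict(weights, feats), 3))  # 1.0
```

Nothing in the update step checks whether the sentence was understood correctly. The only feedback is the game score, which is the point the article makes about the real system.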

Despite the challenges, it works. When the full-text analysis was included, the success of the authors' software shot up; it now won over half its games within 100 moves, and beat the game's AI almost 80 percent of the time when games were played to completion.

To test how well the software picked out relevant text, the authors fed it a mix of sentences from the owner's manual and sentences culled from the pages of The Wall Street Journal. The software correctly used sentences from the manual over 90 percent of the time during the early game. However, as play progressed, the manual became less useful as a guide, and the software's ability to pick out manual sentences dropped to about 60 percent for the rest of the game. In parallel, the software started relying less on the manual and more on its game experience.

That doesn't mean the Journal was useless, however. Feeding the full software package random text instead of an owner's manual also raised the algorithm's winning percentage, to 40 percent in 100-move games. That's not as good as the 54 percent obtained with the manual, but it is quite a bit better than the 17 percent win rate of the algorithm alone.

What's going on here? The paper doesn't say, but the key thing to note is that the neural network is only attempting to identify rules that work (e.g., build near a river). It doesn't actually care how those rules are conveyed; it simply associates text with a random action and determines whether the results are any good. If it's lucky, it can end up associating a useful rule with a random bit of text. It has a better chance of doing so with nonrandom bits of text like the owner's manual, but it can still provide useful guidance no matter what it's given to work with.

(I've asked the authors for their explanation of this result but, as of publication, they haven't gotten back to me.)

The authors conclude that their software successfully learned to leverage the rich language present in the game's manual to perform better, learning to interpret the language as it went along. This is clearly true; the software would perform better when it was given the owner's manual than when it was fed random text, and the difference was statistically significant. But simply giving it any text resulted in a larger relative boost. That implies that it's better to have some rules to work with, no matter how they're derived, than no guidance at all.
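The "larger relative boost" can be checked directly from the 100-move win rates quoted earlier (17 percent for the search alone, 40 percent with random text, 54 percent with the manual). The helper name below is mine; the numbers are the ones reported in the article.

```python
def relative_gain(before, after):
    """Fractional improvement going from one win rate to another."""
    return (after - before) / before

search_only, random_text, manual = 17, 40, 54  # percent of 100-move games won

any_text_gain = relative_gain(search_only, random_text)
manual_gain = relative_gain(random_text, manual)

print(f"{any_text_gain:.0%}")  # 135%
print(f"{manual_gain:.0%}")    # 35%
```

Going from no text to random text more than doubles the win rate, while swapping random text for the real manual adds about a third on top of that.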