One way of predicting the future is to study data about events in the past and build a statistical model that generates the same pattern of data. Statisticians can then use this model to make predictions about the future.

This is easier said than done. The statistical models often have many parameters that need to be fitted to the data before they can reproduce it. Indeed, there are often large numbers of combinations of parameters that produce a fit.

By repeating this fitting process many times, statisticians can build up a probability distribution of predictions showing how likely it is that certain things will happen. This process is known as “fitting the model” and statisticians use it for everything from predicting asteroid impacts to finding downed aircraft.

Today, Richard Vale at the University of Canterbury in New Zealand has taken this art to new heights by predicting the content of as-yet-unpublished novels in the A Song of Ice and Fire series by George R R Martin. As any fan will know, these novels are the basis of the hit TV series Game of Thrones.

The series currently contains five books but fans are eagerly awaiting two more. Each chapter in the existing books is told from the point of view of one of the characters. So far, 24 characters have starred in this way. The approach that Vale has taken is to use the distribution of characters in chapters in the first five books to predict the distribution in the forthcoming novels.

He begins with a single table of data which summarises the number of chapters that each character has starred in so far. For example, the character Jon Snow starred in nine chapters in the first book, eight in the second, 12 in the third, none in the fourth and 13 in the fifth. The character Brienne starred in eight chapters in the fourth book but in none of the others. And so on.

The question that Vale sets out to answer is what can be predicted about future books based only on this data from the existing ones. And his approach is entirely statistical so it does not include common sense assumptions such as the idea that a character killed off in the past is unlikely to star in the future.

Of course, Vale has to make a number of assumptions about the statistical nature of the data. For example, he assumes that the number of chapters in which a character stars follows a Poisson distribution, which is one of the simplest to handle mathematically. It is based on the idea that events in a given time interval occur independently, like the number of decay events per second from a radioactive source, and are not related by some deeper connection.
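To make the Poisson assumption concrete, here is a minimal Python sketch. It is not Vale's model: it simply treats Jon Snow's five chapter counts, quoted above, as draws from a single Poisson distribution whose rate is assumed to equal their average. The names `poisson_pmf`, `rate` and `pmf` are illustrative only.

```python
import math

# Jon Snow's chapter counts in books 1 to 5, as quoted in the article.
jon_snow = [9, 8, 12, 0, 13]


def poisson_pmf(k, rate):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Illustrative assumption: estimate the rate as the mean count per book.
rate = sum(jon_snow) / len(jon_snow)

# Probability of each possible chapter count from 0 to 20 in a new book.
pmf = [poisson_pmf(k, rate) for k in range(21)]
p_zero = pmf[0]

print(f"rate = {rate:.1f} chapters per book")
print(f"P(0 chapters) = {p_zero:.5f}")
print(f"most likely count = {pmf.index(max(pmf))}")
```

Under this fixed-rate assumption, a book with no Jon Snow chapters comes out as very unlikely (well under 1 per cent), even though the fourth book is exactly such a case. That gap hints at why a bare Poisson fit is only a starting point.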

Having created a model, Vale then runs a computer program to find the parameters in the model that best fit the data. And having found the best fits, he then uses the model to find the probability distribution of the number of chapters in which each character will star in book 6 and book 7. (He points out that book 7 is less interesting because the probabilities can be sharpened after the publication of book 6.)
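The step from fitted parameters to a probability distribution over future chapter counts can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the model in Vale's paper: it swaps in the textbook Gamma-Poisson update, in which a Gamma prior on a character's Poisson rate leads to a negative binomial prediction for the next book. The prior values (shape 1, rate 1) and the function name `predictive_pmf` are assumptions chosen purely for illustration.

```python
import math

# Brienne's chapter counts in books 1 to 5, as quoted in the article.
brienne = [0, 0, 0, 8, 0]


def predictive_pmf(k, counts, prior_shape=1.0, prior_rate=1.0):
    """P(next count == k) after a Gamma-Poisson update (a negative binomial)."""
    shape = prior_shape + sum(counts)  # posterior shape of the Gamma
    rate = prior_rate + len(counts)    # posterior rate of the Gamma
    # Work in log space to keep the gamma functions numerically stable.
    log_p = (
        math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
        + shape * math.log(rate / (rate + 1.0))
        - k * math.log(rate + 1.0)
    )
    return math.exp(log_p)


# Predictive distribution over 0 to 30 chapters in the next book.
book6 = [predictive_pmf(k, brienne) for k in range(31)]
p_none = book6[0]
p_five_or_more = 1.0 - sum(book6[:5])

print(f"P(no chapters) = {p_none:.3f}")
print(f"P(five or more chapters) = {p_five_or_more:.3f}")
```

Swapping in another character's counts gives that character's distribution. Because the prediction averages over every plausible rate rather than a single best fit, it is wider than a plain Poisson with the same mean.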

The results make clear predictions. For example, they show that certain characters are unlikely to star in any chapters. They also indicate whether one particular character is likely to be dead or not, following an ambiguous chapter in the fifth novel.

Vale is refreshingly honest about the limitations of his model, pointing out that readers should not be impressed by some of the predictions that his model makes and explaining why. “Given that we are interested in whether the model works for its intended purpose rather than in advertising it, we should not shy away from identifying and criticising its flaws,” he says.

He explains, for example, that the model does not deal with the possibility of new characters being introduced, and that the entire thing rests on a relatively small amount of data. There are other problems too. “There is little to support the choice of the Poisson distribution in [the model] other than that it has the smallest possible number of parameters,” he admits.

Nevertheless, this is a fascinating exercise in statistical modelling that will do more to introduce the process to a wider range of people than any number of textbooks or Wikipedia entries.

Indeed, it is not hard to see how this kind of approach could be used to explore potential futures of all kinds of creative endeavours. Literature is filled with great unfinished works, such as the Aubrey-Maturin novels by Patrick O’Brian, Tolkien’s Unfinished Tales, Stieg Larsson’s novels, Jane Austen’s, and so on.

We’ll look forward to finding out more about the statistical future of these unfinished stories, provided Vale and his colleagues have some spare time on their hands.

Ref: arxiv.org/abs/1409.5830 : Bayesian Prediction For The Winds Of Winter