Predicting the Future Is Easier Than It Looks

Before Billy Beane and the sabermetric revolution upended baseball and ushered in a new era of statistically driven analysis, old-timers insisted that the young eggheads and their spreadsheets were no match for a time-worn scout. Experience, gut feeling, and a sense of the intangible qualities that make up a quality prospect: these were the things that the old guard argued could never be captured by an Excel spreadsheet, let alone a statistical model.

By and large, they were wrong, and Billy Beane’s scrappy Oakland Athletics squads showed that the so-called eggheads could see further into the future and with greater clarity than had previously been thought possible.

The same statistical revolution that changed baseball has now entered American politics, and no one has been more successful in popularizing a statistical approach to political analysis than New York Times blogger Nate Silver, who of course cut his teeth as a young sabermetrician. And on Nov. 6, after Silver had faced a torrent of criticism from old-school political pundits (Washington’s rough equivalent of statistically illiterate, tobacco-chewing baseball scouts), the results of the presidential election vindicated his approach, which correctly predicted the electoral outcome in all 50 states.

That kind of nay-saying, epitomized by Joe Scarborough’s comment that anyone who thinks the presidential race was anything but a toss-up is a joke, is exactly what the baseball establishment said to Billy Beane before this approach became popular in professional athletics, where it has spread far beyond a few losing baseball teams to a wide range of sports. And it is exactly what some parts of the political world said to the poll aggregators even during the most recent presidential election.

But the statistical revolution in presidential politics raises the question: Why can’t the same predictive tools be applied to world politics? The amount of data that exists on world politics is enormous. Perhaps it is time to focus on making more sense of it.

The first real attempt to ask whether conflicts could be predicted can be found in Quincy Wright’s 1942 magnum opus A Study of War. Interestingly, he believed short-term forecasting should be based on public opinion rather than economic and political indices. But if forecasts were to be based on indices, Wright wrote, they "should have a value for statesmen, similar to that of weather maps for farmers or of business indices for businessmen … Such indices could be used not only for studying the probability of war between particular pairs of states but also for ascertaining the changes in the general tension level within a state or throughout the world."

But on Nov. 8, Jay Ulfelder, himself a forecaster, tried to throw a bucket of cold water on this idea, arguing in Foreign Policy that it is impossible to have a "Nate Silver" in world politics. (The fact that Nate Silver’s name can now be used as a metaphor for a perfect predictive model is one indication of his enormous success and popularity.)

First, Ulfelder tells us that election forecasting is the leading edge of statistical forecasting and that the things that foreign policymakers want to know are far from that edge. Perhaps, but this is far from clear. Today, there are several dozen ongoing, public projects that aim, in one way or another, to forecast the kinds of things foreign policymakers desperately want to be able to predict: various forms of state failure, famines, mass atrocities, coups d’état, interstate and civil war, and ethnic and religious conflict. So while U.S. elections might occupy the front page of the New York Times, the ability to predict instances of extreme violence and upheaval represents the holy grail of statistical forecasting, and researchers are now getting close to doing just that. In 2010, scholars from the Political Instability Task Force published a report that demonstrated the ability to correctly predict onsets of instability two years in advance in 18 of 21 instances (about 86%), including instability in Iran in 2004 and Côte d’Ivoire in 2002. None other than Jay Ulfelder was a co-author of this study, so he may just suffer from excessive modesty.

Just within the environmental conflict realm, a recent report by the Army Environmental Policy Institute lists no fewer than twelve ongoing projects that touch on some aspect of forecasting. These include USAID’s Famine Early Warning System, which tracks and predicts food insecurity around the world, as well as the Climate Change and African Political Stability project, housed at the Robert S. Strauss Center at the University of Texas at Austin. Outside of the environmental arena, there are more.

Forecasting models need reliable measures of "things that are usefully predictive," Ulfelder notes. Well, sure. Does this mean that reliability is at issue? Or that we are using data that are not "usefully" predictive? This is a curious claim, especially in light of the controversial nature of polls. Indeed, there exist five decades’ worth of literature that grapples with exactly those issues in public opinion. Take the recent U.S. election as an example. In 2012 there were two types of models: one type based on fundamentals such as economic growth and unemployment and another based on public opinion surveys. Proponents of the former contend that the fundamentals present a more salient picture of the election’s underlying dynamics and that polls are largely epiphenomenal. Proponents of the latter argue that public opinion polling reflects the real-time beliefs and future actions of voters.

As it turned out, in this month’s election public opinion polls were considerably more precise than the fundamentals. The fundamentals did not always provide bad predictions, but there is no getting around the fact that the poll-averaging models performed better. Admittedly, many of the polls were updated on the night before the election, though Drew Linzer’s prescient votamatic.org posted predictions last June that held up this November. To assess the strength of poll aggregation, we might ask how the trajectory of Silver’s predictions over time compares with the results, and there are other quibbles to raise for sure. But better is better.

When it comes to the world, we have a lot of data on things that are important and usefully predictive, such as event data on conflicts and collaborations among different political groups within countries. Are they as reliable as poll data? Yes, about as reliable, but no more so. Would we like to have more precise data and be able to have real-time fMRIs of all political actors? Sure, but it is increasingly difficult to convincingly argue that we don’t have enough data.

Let’s consider a case in which Ulfelder argues there is insufficient data to render a prediction: North Korea. There is no official data on North Korean GDP, so what can we do? It turns out that the same data science approaches that were used to aggregate polls have other uses as well. One is the imputation of missing data. Yes, even when it is all missing. The basic idea is to use the general correlations among the data that we do have to estimate the information that we don’t have. We know enough about how other things are related to GDP that we can produce reasonable estimates of the range it falls into in places where it is not observed. The CIA estimates North Korean GDP per capita at $1,800 for 2011, based on extrapolations, growth-rate estimates, and inflation. Our Duke imputations are based on lots of other data, but no data whatsoever for North Korean GDP, and we have it at about $1,700, perhaps close enough for government work.
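The imputation idea can be sketched in a few lines of Python. This is a toy illustration only, not the procedure behind the estimates quoted above: it assumes a single hypothetical predictor (here called `lights`), a simple least-squares line, and made-up values for four fictional countries. Real imputation models draw on many variables at once and report a range rather than a single number.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def impute(records, target, predictor):
    """Fill missing `target` values using the relationship between
    `target` and `predictor` learned from the rows where both are observed."""
    observed = [r for r in records if r[target] is not None]
    intercept, slope = fit_line(
        [r[predictor] for r in observed],
        [r[target] for r in observed],
    )
    filled = []
    for r in records:
        row = dict(r)  # copy so the input records are left untouched
        if row[target] is None:
            row[target] = intercept + slope * row[predictor]
        filled.append(row)
    return filled


# Made-up values for fictional countries; "lights" and "log_gdp_pc" are
# hypothetical variable names chosen for this illustration.
records = [
    {"country": "A", "lights": 1.0, "log_gdp_pc": 7.0},
    {"country": "B", "lights": 2.0, "log_gdp_pc": 8.0},
    {"country": "C", "lights": 3.0, "log_gdp_pc": 9.0},
    {"country": "D", "lights": 1.5, "log_gdp_pc": None},  # never observed
]

filled = impute(records, target="log_gdp_pc", predictor="lights")
print(round(filled[3]["log_gdp_pc"], 2))  # 7.5
```

Country D never reports the target variable, yet it still receives an estimate, because the relationship is learned entirely from countries A through C.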

Our point is that collecting data in new and exciting ways has changed the nature of political forecasting. While the Twitterverse might have been agog over the daily release of the Gallup tracking poll, the real story of the election was playing out elsewhere. The firm Latino Decisions did not start frequent polling until 2010 (it was founded in 2007) but conducted more than 60,000 questionnaires during this year’s election and found that the Latino vote was going to be overwhelmingly in favor of the Democrats. One need only glance at this year’s exit polling to know that the Latino vote was crucial to President Obama’s reelection. So by focusing on Gallup — whose results were wildly off the mark anyway — many mainstream pundits ignored a wealth of other, perhaps more important information.

Ulfelder and others argue that the world needs a large amount of data from a large "swath" of history to effectively develop good models, but this kind of thinking often fails to reflect the mechanics of a statistical forecast. We’ve had elections in this country for more than two centuries at last count, but very few of the models that were applied to predict the 2012 election employed data on each of them. Some structural models only used data on a few years, and the polling data obviously doesn’t go back very far.

There is a tradition in world politics of going back either to the Congress of Vienna (when there were fewer than two dozen independent countries) or to the early 1950s, after the end of the Second World War. But in reality, there is no need to do this for most studies. Typically, data is needed only for the current political era, which might date to 1989 or even cover just the current century. Don’t be misled by claims that there "won’t be enough cases to analyze" if you use these "shortened" time frames. This assumes that you have to analyze annual data, an assumption that is just blatantly false, even if it has been standard operating practice in quantitative world politics for decades. Not only are there techniques available for analyzing data on a much smaller time scale (days, weeks, or months), but if we use them, we are likely to get closer to those elusive variables that the policymakers lust after. And generally, the data have to be tortured before they surrender to annual formats. Consider, for instance, a coup on January 1, 2010, and another on December 31, 2010. These two coups occurred in the same calendar year, but did they actually occur at the same instant?
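The cost of forcing events into annual formats can be seen in a small Python sketch. It uses the hypothetical pair of coups from the example above, assigned to a fictional "Country X"; the function names are our own and not part of any existing dataset or library.

```python
from collections import Counter
from datetime import date

# The two hypothetical coups from the example, in a fictional country.
events = [
    ("Country X", date(2010, 1, 1)),
    ("Country X", date(2010, 12, 31)),
]


def count_by_year(events):
    """Aggregate events into country-year cells."""
    return Counter((country, day.year) for country, day in events)


def count_by_month(events):
    """Aggregate events into country-month cells."""
    return Counter((country, day.year, day.month) for country, day in events)


annual = count_by_year(events)
monthly = count_by_month(events)

# A typical annual coding keeps only "did a coup occur this year?"
annual_indicator = {cell: 1 for cell in annual}

print(len(annual_indicator))  # 1
print(len(monthly))           # 2
print((events[1][1] - events[0][1]).days)  # 364
```

At annual resolution the two coups collapse into a single country-year, while at monthly resolution they remain two distinct observations almost a full year apart.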

Ulfelder tells us that "when it comes to predicting major political crises like wars, coups, and popular uprisings, there are many plausible predictors for which we don’t have any data at all, and much of what we do have is too sparse or too noisy to incorporate into carefully designed forecasting models." But this is true only for the old style of models based on annual data for countries. If we are willing to face data that are collected in rhythm with the phenomena we are studying, this is not the case. For example, Thailand became considerably more democratic in July 2011 as a result of Yingluck Shinawatra winning a landslide election and successfully forming a coalition government to replace a government established by a coup d’état. There is no need to assume this change to a more democratic form of government applies to the entire year of 2011, since we know it didn’t really characterize Thailand during the first half of the year. We have data and techniques that can deal with monthly or even daily data. Whether the data are too noisy to make use of is an empirical question. Our hunch is that clever data scientists will find a way to make these data useful.

Consider thyroid cancer. According to the National Cancer Institute, thyroid cancer has an incidence of about 6 in 100,000. It is a rare event, but we know quite a lot about it, even to the level of making preventive prescriptions. The rareness does not prevent us from learning about its occurrence, how to treat it, and even how to best avoid this cancer.

Don’t get us wrong: Better data is always better. But we actually have a lot of data. We don’t want to argue that forecasts about world politics are as precise as those we observed for the 2012 presidential election, and we agree with Ulfelder that they are not. But we are not so pessimistic as to think that clever investigators, combining statistical approaches and data, cannot in principle forecast certain kinds of events. David Rothschild, David Pennock, and a team of about 30 others (then at Yahoo; now mainly at Microsoft Research) predicted 303 electoral votes for Obama on February 15, 2012, not by building a model of the election, but as they put it, by being "the mother of all prediction engines, period." There is considerable effort underway, beyond Microsoft Research, to build predictive models that relate to world politics.

Better, of course, will always be better. But there is room for seeing the water in the glass, not just how big the glass is.