No calculations are necessary to see that we missed badly in our forecast of the U.K. election.

Our final forecast was for the Conservatives to win an expected 278 seats (or somewhere in the range of 252-305 seats), Labour to win 267 (240-293), the Scottish National Party 53 (47-57), and the Liberal Democrats 27 (21-33). The actual final results are 330 seats for the Conservatives, 232 for Labour, 56 for the SNP and just eight for the Lib Dems. Even though we took (or at least tried to take) into account the scale of historical poll misses in the U.K., our prediction intervals fell short of including the result for all of these parties except the SNP.
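The coverage claim can be checked directly from the numbers in the paragraph above. The short Python sketch below is only an illustration (the names `forecast`, `actual` and `interval_covers` are ours for this example, not part of the forecasting model); it tests whether each actual seat total falls inside the corresponding prediction interval.

```python
# Final forecast from the paragraph above: (expected seats, lower bound, upper bound).
forecast = {
    "Conservatives": (278, 252, 305),
    "Labour": (267, 240, 293),
    "SNP": (53, 47, 57),
    "Liberal Democrats": (27, 21, 33),
}

# Actual seat totals as reported in the paragraph above.
actual = {"Conservatives": 330, "Labour": 232, "SNP": 56, "Liberal Democrats": 8}


def interval_covers(party):
    """Return True if the actual result lies inside the prediction interval."""
    _, low, high = forecast[party]
    return low <= actual[party] <= high


# Parties whose actual result fell outside the interval.
missed = [party for party in forecast if not interval_covers(party)]
print(missed)
```

Only the SNP result lands inside its interval; the other three parties are reported as misses.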

The only thing we can say in our defense is that, in comparative terms, our forecast was middle of the pack, as no one had a good pre-election forecast. Of course, the national exit poll, while not as close to the target as in 2010, was far better than any pre-election forecast.

Steve Fisher at ElectionsEtc came closest to the seat result: his 95 percent prediction intervals nearly included the Conservative seat total, although his Liberal Democrat interval missed the result substantially, just as ours did. Several other forecasts were further away than we were.

The most obvious problem for all forecasters was that the polling average had Labour and the Conservatives even on the night before the election. This was not just the average of the polls; it was the consensus. Nearly every pollster’s final poll placed the two parties within 1 percentage point of each other. With the polling average level, we predicted the Conservatives to win by 1.6 percentage points, on the basis of the historical tendency of polls to overstate changes from the last election. This kind of adjustment helps explain how the 2010 result deviated from the national polls on election day, as well as the infamous 1992 U.K. polling disaster, when the polls had the two parties even before the election and the Tories won by 7.5 percentage points. The Conservative margin over Labour will be smaller than that when the 2015 totals are finalized, but not a lot smaller (currently it is 6.4 percentage points with all but one constituency declared). So our adjustment was in the right direction, but it was not nearly large enough. Part of the reason Fisher did better is that he applied a similar adjustment but made it party-specific, leading to a larger swingback for the Tories than for other parties because of that 1992 result.

Before the election, we used the uncertainty in our forecast to calculate expectations for three measures of performance: absolute seat error, individual seat error and Brier score. (More about how those three errors are defined can be found in this article.) We have now calculated each of these quantities for our forecasts, given the final results. We did not do as well as we had expected to do by any of these measures. Our absolute seat error was 105. We incorrectly predicted 63 individual seats (out of 632 in England, Wales and Scotland). Our Brier score was 96 (the best possible score would have been 0, and the worst 632). Not good.
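To make the three measures concrete, here is a minimal Python sketch on made-up data. The definitions are assumptions for illustration, since the exact ones live in the linked article: absolute seat error is taken as the sum over parties of the gap between expected and actual seats, individual seat error as the number of seats where the most likely winner did not win, and the Brier score as half the squared distance between each seat's probabilities and the outcome, summed over seats (so that each seat contributes between 0 and 1). The three toy seats and their probabilities are hypothetical, not real forecast output.

```python
# Hypothetical win probabilities for three toy seats (NOT real forecast data).
forecast = [
    {"Con": 0.7, "Lab": 0.2, "LD": 0.1},
    {"Con": 0.4, "Lab": 0.5, "LD": 0.1},
    {"Con": 0.1, "Lab": 0.3, "LD": 0.6},
]
winners = ["Con", "Con", "LD"]  # hypothetical actual winners, one per seat
parties = ["Con", "Lab", "LD"]


def absolute_seat_error(forecast, winners, parties):
    """Assumed definition: sum over parties of |expected seats - actual seats|."""
    total = 0.0
    for party in parties:
        expected = sum(seat[party] for seat in forecast)
        won = sum(1 for w in winners if w == party)
        total += abs(expected - won)
    return total


def individual_seat_error(forecast, winners):
    """Assumed definition: seats where the most probable party did not win."""
    return sum(
        1 for seat, w in zip(forecast, winners) if max(seat, key=seat.get) != w
    )


def brier_score(forecast, winners):
    """Assumed definition: per seat, half the squared error against the outcome."""
    total = 0.0
    for seat, w in zip(forecast, winners):
        total += 0.5 * sum(
            (p - (1.0 if party == w else 0.0)) ** 2 for party, p in seat.items()
        )
    return total


print(absolute_seat_error(forecast, winners, parties))
print(individual_seat_error(forecast, winners))
print(brier_score(forecast, winners))
```

Under these assumed definitions, a forecast that put all its probability on the right party in every seat would score 0 on all three measures, and the Brier score could not exceed the number of seats.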

A lot of this is because our forecast was wrong on the national vote shares. If you look at the set of plots showing the results of the election as a function of our forecasts, you can see that we did pretty well at capturing the relative performance of Labour, the Conservatives and (more surprisingly) UKIP across different seats. We did OK with the SNP and less well with the Liberal Democrats.

Ideally, all the points would be on the diagonal line: the results would be the same as the forecasts. A little scatter around the line does not necessarily imply a problem, as positive and negative prediction errors tend to cancel out. The fundamental problem for the forecast performance was that, on average, the Conservatives did better, Labour did worse, and the Liberal Democrats did much worse than we expected.

We did try to capture the possibility of a national poll miss. We ran calibrations on historical polling to see how badly we might expect the polling average to miss the national vote shares. Clearly we need to look very carefully at how we did this. The 2015 poll miss was somewhat smaller than the 1992 miss, and we did have 1992 in our data. But perhaps this failed to introduce sufficient uncertainty because of the complexity of having multiple parties, each of which could be above or below its polling level.

In addition to failing to fully take into account the possibility of national polling error, we made one bad choice regarding how to use constituency-level polling data. The bulk of the constituency-level polls were fielded by Lord Michael Ashcroft. His constituency-level polls consistently showed substantial differences in the relative support for the Liberal Democrats depending on how he worded the question.

Ashcroft asked two voting-intention questions in all his constituency polls. The first was the “generic” question that is widely used in U.K. polls: “If there was a general election tomorrow, which party would you vote for?” This was followed up with a more “specific” question: “Thinking specifically about your own parliamentary constituency at the next general election and the candidates who are likely to stand for election to Westminster there, which party’s candidate do you think you will vote for in your own constituency?” The Liberal Democrats did far better in the latter question, particularly where they were incumbents. We went with the specific question.

Here’s what we wrote when we introduced our model at FiveThirtyEight:

A major concern is that Ashcroft’s constituency-level polls reveal substantial differences in the relative support for the parties depending on how the questions are asked, and we have little evidence to indicate which of these questions is more predictive. … We think the specific question is likely to be more accurate based on indirect evidence from the last election cycle, but we won’t really know until the election occurs because it hasn’t been deployed so widely before. … There is a lot more data to work with this election, which ought to help, but it could also just provide more rope for us to hang ourselves with. Incorporating new data sources is difficult because without a historical record, we can’t be certain of their accuracy.

We think it is great that Ashcroft ran both these questions in all his constituency polls, but it turns out that we may well have made the wrong choice about which one to use in our model. This choice alone explains the entirety of our over-prediction of the Liberal Democrat seat totals.

Post-election, we re-ran our pre-election model, but replacing all the specific-question Ashcroft data we used with the generic-question data from the same constituency surveys. Had we simply made this one modeling choice differently, our seat forecast would have looked like this:

Conservatives 292 (267-321)

Labour 274 (244-299)

SNP 54 (50-58)

Lib Dems 6 (2-10)

This would have been a more accurate forecast for the Lib Dems and a better forecast for the Conservatives than the one we actually made, but it still would have left the Tories too low and Labour too high. The prediction intervals do not quite cover the seats won by Conservatives and Labour, but they are much closer to doing so. Had we fixed this and also done a better job of calibrating the historical national vote share uncertainty to allow for 1992-level errors, we would have had a more satisfying forecast. We still would have been off on the central prediction, but we would have had a forecast that included the election result in the 90 percent prediction intervals for every party.
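The comparison between the two forecasts can be reproduced from the published numbers alone. The Python sketch below is illustrative only (the names `published`, `rerun`, `point_errors` and `covered` are ours for this example); it computes the gap between each central seat estimate and the result, and lists which intervals contain the result.

```python
# Actual seat totals reported earlier in this article.
actual = {"Conservatives": 330, "Labour": 232, "SNP": 56, "Liberal Democrats": 8}

# (expected seats, lower bound, upper bound) for the forecast we published ...
published = {
    "Conservatives": (278, 252, 305),
    "Labour": (267, 240, 293),
    "SNP": (53, 47, 57),
    "Liberal Democrats": (27, 21, 33),
}
# ... and for the rerun using the generic-question constituency data.
rerun = {
    "Conservatives": (292, 267, 321),
    "Labour": (274, 244, 299),
    "SNP": (54, 50, 58),
    "Liberal Democrats": (6, 2, 10),
}


def point_errors(forecast):
    """Absolute gap between the central estimate and the result, per party."""
    return {party: abs(mean - actual[party]) for party, (mean, _, _) in forecast.items()}


def covered(forecast):
    """Parties whose actual seat total lies inside the prediction interval."""
    return [
        party
        for party, (_, low, high) in forecast.items()
        if low <= actual[party] <= high
    ]


print(point_errors(published), covered(published))
print(point_errors(rerun), covered(rerun))
```

On these figures, the rerun cuts the central error for the Conservatives from 52 seats to 38 and for the Liberal Democrats from 19 to 2, while the Labour error grows from 35 to 42. The intervals cover the SNP and Liberal Democrat results, but not those of the two largest parties.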

All of this is easy to say now. It’s our job as forecasters to report predictions with accurate characterizations of uncertainty, and we failed to achieve that in this election. We are not trying to make excuses here; we are trying to understand what went wrong. If we can find a few clear methodological culprits, that enables us to do better next time.

Here’s what we take from all this:

We need to include even more uncertainty about the national vote on election day relative to the polls.

Constituency polls may work better when based on the standard generic voting-intention question.

We will surely learn more as we dig further into the results. The pollsters are also trying to figure out what went wrong, so we’re not the only ones with our work cut out for us before the next election! Whenever that election arrives, we’ll aim to be ready with better forecasts than we had in 2015.