Over thousands of iterations, the model gradually learns the relationships between demographics and political behaviour. We can use those relationships to predict the voting habits of each demographic group in each state (per the ACS). The aforementioned female Floridian seniors are predicted to vote for Donald Trump over Hillary Clinton by five percentage points, for example. We compute the same for each of the tens of thousands of demographic groups in our data.
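The prediction step can be sketched in a few lines of Python. This is a minimal illustration, assuming the multilevel logistic model has already been fitted: it adds the group-level effects for a cell on the log-odds scale and converts the total into a probability of voting Clinton rather than Trump. The effect sizes, the handful of categories and the helper names `inv_logit` and `predict_cell` are all hypothetical. The numbers were chosen only so that female Floridian seniors come out about five points more pro-Trump, echoing the example above.

```python
import math
from itertools import product


def inv_logit(x: float) -> float:
    """Convert log-odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fitted effects on the log-odds of voting Clinton rather than
# Trump. Invented for illustration; NOT estimates from the Economist's model.
INTERCEPT = 0.10
STATE_EFFECT = {"FL": -0.05, "CA": 0.45}
AGE_EFFECT = {"18-29": 0.35, "65+": -0.23}
SEX_EFFECT = {"female": 0.08, "male": -0.08}


def predict_cell(state: str, age: str, sex: str) -> float:
    """Predicted probability that someone in this demographic cell votes Clinton."""
    log_odds = INTERCEPT + STATE_EFFECT[state] + AGE_EFFECT[age] + SEX_EFFECT[sex]
    return inv_logit(log_odds)


# One prediction per combination of state, age and sex (8 cells here).
cells = {
    combo: predict_cell(*combo)
    for combo in product(STATE_EFFECT, AGE_EFFECT, SEX_EFFECT)
}

p = predict_cell("FL", "65+", "female")
trump_margin = (1 - p) - p  # two-party margin, third parties excluded
print(f"Clinton {p:.3f}, Trump margin {trump_margin:+.3f}")
# prints: Clinton 0.475, Trump margin +0.050
```

Working on the log-odds scale and applying the inverse-logit keeps every prediction between 0 and 1. A real model has many more grouping variables, which is how the count of cells reaches the tens of thousands rather than eight.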

Once that has finished, all that’s left is to calculate the estimates for each state. This is done by adding up (or “post-stratifying”, the “P” of “Mr P”) the predicted number of Clinton voters in each group in each state. We obtain her vote share in each state by dividing the number of eligible voters favouring Mrs Clinton by the total number of adult citizens who live there. The same is done for Mr Trump. Third parties have been excluded from this analysis for computational reasons (in testing, this made little difference), so we are concerned only with votes for Mrs Clinton and Mr Trump; each state’s electoral votes are therefore allocated to whichever candidate is projected to win more than 50% of the vote there. Probabilities of victory are derived by simulating each state’s outcome thousands of times, incorporating the errors of a similar model we built to predict, after the fact, the actual results of the 2016 presidential election.
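The post-stratification, allocation and simulation steps can be sketched in Python as follows. This is an illustrative sketch, not the Economist’s code. The cell counts and predicted probabilities are invented. The normally distributed error with a standard deviation of 2.5 points is an assumption standing in for the backtested errors. States are simulated independently, and the names `poststratify`, `allocate` and `win_probability` are hypothetical. Only the 2016 electoral-vote counts for Florida (29) and Ohio (18) are real.

```python
import random
from collections import defaultdict

# Hypothetical demographic cells: (state, adult citizens in the cell,
# predicted probability of voting Clinton). Invented numbers for illustration.
CELLS = [
    ("FL", 2_000_000, 0.475),
    ("FL", 3_000_000, 0.520),
    ("OH", 2_500_000, 0.440),
    ("OH", 1_500_000, 0.500),
]
ELECTORAL_VOTES = {"FL": 29, "OH": 18}  # 2016 electoral-vote counts


def poststratify(cells):
    """Clinton's share per state: predicted Clinton voters / adult citizens."""
    clinton_voters = defaultdict(float)
    adult_citizens = defaultdict(float)
    for state, n, p_clinton in cells:
        clinton_voters[state] += n * p_clinton
        adult_citizens[state] += n
    return {s: clinton_voters[s] / adult_citizens[s] for s in adult_citizens}


def allocate(shares, electoral_votes):
    """Award each state's electoral votes to the candidate above 50%.

    With third parties excluded, Trump's share is 1 - Clinton's share.
    An exact 50-50 tie goes to Trump here, an arbitrary choice.
    """
    totals = {"Clinton": 0, "Trump": 0}
    for state, clinton_share in shares.items():
        winner = "Clinton" if clinton_share > 0.5 else "Trump"
        totals[winner] += electoral_votes[state]
    return totals


def win_probability(clinton_share, error_sd=0.025, n_sims=10_000, seed=0):
    """Fraction of simulated outcomes in which Clinton clears 50%.

    Assumption: the state-level error is normal with standard deviation
    error_sd, a stand-in for the errors of the ex post facto model.
    """
    rng = random.Random(seed)
    wins = sum(rng.gauss(clinton_share, error_sd) > 0.5 for _ in range(n_sims))
    return wins / n_sims


shares = poststratify(CELLS)
print(allocate(shares, ELECTORAL_VOTES))  # {'Clinton': 29, 'Trump': 18}
for state, share in shares.items():
    print(state, round(share, 4), win_probability(share))
```

Weighting each cell’s prediction by its number of adult citizens is what lets a national poll speak for a particular state’s population mix. The simulation then asks how often Clinton would still clear 50% if the estimate were off by a typical amount. A state projected at 50.2% is close to a coin flip, while one at 46.3% is not.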

The results are presented in this week’s Graphic Detail piece:

The main chart from the print article

First, second and …n principles

From start to finish, our approach was not an easy one. Although the process resembled that of a typical social-science research article, the time frame was much more compressed: journalistic demands meant the work had to be completed within roughly a month and a half. Should anyone want to repeat the method I have described above, they might want to keep a few things in mind.

First, familiarity with concepts like Bayesian statistics was important in our approach (several of our Data Team members, myself included, are sticklers for uncertainty), but it is not strictly necessary. Other R packages can accomplish nearly identical tasks; in fact, we ended up using one of them, “lme4”, to compute the final estimates because it generated the same point predictions as our Bayesian model. Either way, an understanding of subjects like public opinion polling, survey weights and American voting behaviour is crucial. Had we not completed similar projects before, this one would have taken even longer.

Second, MRP is an effective tool for extracting reliable estimates of state-level opinion from national polling, but it is not perfect. Even with a validated record of who voted in 2016, the model cannot predict the election precisely; the average absolute error in our predictions of Hillary Clinton’s state-level vote share was just under 2 percentage points. Predictions made before the election, without knowing who actually voted, could have had larger errors. The quality of the national survey is key; you cannot weight your way out of unrepresentative data.
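The “average absolute error” is the mean of the absolute gaps between predicted and actual vote shares across states. A minimal Python sketch follows. The helper name `mean_absolute_error` and the three states `A`, `B` and `C`, with their shares, are hypothetical; the numbers are invented and are not our 2016 results.

```python
def mean_absolute_error(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Average absolute gap between predicted and actual vote shares, over states."""
    if predicted.keys() != actual.keys():
        raise ValueError("predicted and actual must cover the same states")
    return sum(abs(predicted[s] - actual[s]) for s in actual) / len(actual)


# Hypothetical Clinton vote shares in percentage points (invented numbers).
predicted = {"A": 48.0, "B": 42.5, "C": 50.0}
actual = {"A": 47.0, "B": 44.0, "C": 48.5}
print(mean_absolute_error(predicted, actual))  # about 1.33 points
```

Averaging absolute rather than signed errors stops misses in opposite directions from cancelling out, so the figure measures how far off a typical state estimate is.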

Finally, a complex approach has its uses, but a parsimonious one that accomplishes the same task with as few bells and whistles as possible makes things much easier to explain to the reader. Because that is our ultimate goal at The Economist, I avoided extracting probabilities from posterior predictive distributions, including random-effects terms with varying slopes, and other such fanciness that a reader would likely dismiss as sociological gobbledygook, if it were communicated at all. That being said, this is not an endeavour for Ockhamites; there is danger in being too simplistic.

What you may be asking yourself after reading all of this text

Story time

In the end, the madness was worth it. Our team produced a phenomenal story. The finished product is a highly detailed answer to the question of how America’s political landscape would change if every adult citizen had been required to vote in its most recent presidential election.

We have quantified for the reader just how left-leaning America’s non-voters are. We have shown how an increase in voter turnout would produce varying political swings in states with different populations of whites and non-whites, holders of college degrees and high-school diplomas, millennials and baby boomers, and so on. And although the numbers didn’t make it onto the page (we had fewer than 300 words to work with in this week’s chart-filled Graphic Detail), we were also able to show the persistence of a built-in electoral advantage for working-class whites in America, a frequently covered topic in this newspaper. Finally, we have provided a data-driven answer to a quintessential Economist “What If?” question, something that was rarer before this newspaper had a data team.

G. Elliott Morris is a data journalist at The Economist. You can follow The Economist’s Data Team on Twitter.