It is a year since the 2017 general election. I am sure lots of people will be writing a lot of articles looking back at the election itself and the year since, but I thought I’d write something about the 2017 polling error, something that has gone largely unexamined compared to the 2015 error. The polling companies themselves have all carried out their own internal examinations and reported to the BPC, and the BPC will be putting out a report based on that in due course. In the meantime, here are my own personal thoughts about the wider error across the industry.

The error in 2017 wasn’t the same as 2015.

Most casual observers of polls will probably have lumped the errors of 2015 and 2017 in together, and seen 2017 as just “the polls getting it wrong again”. In fact the nature of the error in 2017 was completely different to that in 2015. It would be wrong to say they are unconnected – the cause of the 2017 errors was often pollsters trying to correct the error of 2015 – but the way the polls were wrong was completely different.

To understand the difference between the errors in 2015 and the errors in 2017 it helps to think of polling methodology as being divided into two bits. The first is the sample – the way pollsters try to get respondents who are representative of the public, be that through their sampling itself or the weights they apply afterwards. The second is the adjustments they make to turn that sample into a measure of how people would actually vote: how they model things like turnout and how they account for people who say don’t know or refuse to answer.

In 2015, the polling industry got the first of those wrong, and the second right (or at least, the second of those wasn’t the cause of the error). The Sturgis Inquiry into the 2015 polling error looked at every possible cause of error, and decided that the polls had samples that were not representative. While they didn’t think the way pollsters predicted turnout was based on strong enough evidence and recommended improvements there too, they ruled it out as a cause of the 2015 error.

In 2017 it was the opposite situation. The polling samples themselves had pretty much the correct result to start with, showing only a small Tory lead. More traditional approaches towards modelling turnout (which typically made only small differences) would have resulted in polls that only marginally overstated the Tory lead. The large errors we saw in 2017 were down to the more elaborate adjustments that pollsters had introduced. If you had stripped away all the attempts aimed at modelling turnout, don’t knows and suchlike (as in the table below) then the underlying samples the pollsters were working with would have got the Conservative lead over Labour about right:

What did pollsters do that was wrong?

The actual things that pollsters did to make their figures wrong varied from pollster to pollster. So for ICM, ComRes and Ipsos MORI, it looks as if new turnout models inflated the Tory lead, for BMG it was their new adjustment for electoral registration, for YouGov it was reallocating don’t knows. The actual details were different in each case, but the thing they had in common was that pollsters had introduced post-fieldwork adjustments that had larger impacts than at past elections, and which ended up over-adjusting in favour of the Tories.

In working out how pollsters came to make this error we need to have a closer look at the diagnosis of what went wrong in 2015. Saying that samples were “wrong” is easy; if you are going to solve it you need to identify how they were wrong. After 2015 the broad consensus among the industry was that the samples had contained too many politically engaged young people who went out to vote Labour and not enough uninterested young people who stayed at home. Polling companies took a mixture of two different approaches towards dealing with this, though most companies did a bit of both.

One approach was to try and treat the cause of the error by improving the samples themselves, trying to increase the proportion of respondents who had less interest in politics. Companies started adding quotas or weights that had a more direct relationship with political interest, things like education (YouGov, Survation & Ipsos MORI), newspaper readership (Ipsos MORI) or straight out interest in politics (YouGov & ICM). Pollsters who primarily took this approach ended up with smaller Tory leads.

The other was to try and treat the symptom of the problem by introducing new approaches to turnout that assumed lower rates of turnout among respondents from demographic groups who had not traditionally turned out to vote in the past, and where pollsters felt samples had too many people who were likely to vote. The most notable examples were the decision by some pollsters to replace turnout models based on self-assessment, with turnout models based on demographics – downweighting groups like the young or working class who have traditionally had lower turnouts. Typically these changes produced polls with substantially larger Conservative leads.
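To make the difference between the two kinds of turnout model concrete, here is a minimal sketch. It is not any pollster's actual method: the respondents, the 0-10 likelihood scores and the turnout rates by age group are all made-up numbers, and the function names are hypothetical. It simply shows how the same sample can yield a larger Conservative lead once turnout is assigned by demographic group rather than by what respondents say about themselves.

```python
# Hypothetical respondents: (age group, vote intention, self-reported
# likelihood to vote on a 0-10 scale). Illustrative data only.
respondents = [
    ("18-24", "Lab", 10),
    ("18-24", "Lab", 9),
    ("18-24", "Con", 6),
    ("25-64", "Con", 10),
    ("25-64", "Lab", 8),
    ("25-64", "Con", 9),
    ("65+", "Con", 10),
    ("65+", "Con", 10),
    ("65+", "Lab", 9),
]

# Assumed turnout probabilities by age group (not real estimates).
DEMOGRAPHIC_TURNOUT = {"18-24": 0.4, "25-64": 0.65, "65+": 0.8}


def to_shares(weighted_votes):
    """Convert weighted vote totals into percentage shares."""
    total = sum(weighted_votes.values())
    return {party: 100 * v / total for party, v in weighted_votes.items()}


def self_assessed_shares(rows):
    """Weight each respondent by their own stated likelihood to vote."""
    votes = {}
    for _, party, likelihood in rows:
        votes[party] = votes.get(party, 0.0) + likelihood / 10
    return to_shares(votes)


def demographic_shares(rows, turnout_by_age):
    """Weight each respondent by the assumed turnout of their age group."""
    votes = {}
    for age, party, _ in rows:
        votes[party] = votes.get(party, 0.0) + turnout_by_age[age]
    return to_shares(votes)


def con_lead(shares):
    return shares["Con"] - shares["Lab"]


lead_self = con_lead(self_assessed_shares(respondents))
lead_demo = con_lead(demographic_shares(respondents, DEMOGRAPHIC_TURNOUT))
print(f"Con lead, self-assessed turnout: {lead_self:.1f}")
print(f"Con lead, demographic turnout:   {lead_demo:.1f}")
```

In this toy sample the young respondents lean Labour and say they are very likely to vote, so replacing their own answers with a low group-level turnout rate widens the Conservative lead.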

So was it just to do with pollsters getting youth turnout wrong?

This explanation chimes nicely with the idea that the polling error was down to polling companies getting youth turnout wrong, that young people actually turned out at an unusually high level, but that polling companies fixed youth turnout at an artificially low level, thereby missing this surge in young voting. This is an attractive theory at first glance, but as is so often the case, it’s actually a bit more complicated than that.

The first problem with the theory is that it’s far from clear whether there was a surge in youth turnout. The British Election Study has cast doubt upon whether or not youth turnout really did rise that much. That’s not a debate I’m planning on getting into here, but suffice to say, if there wasn’t really that much of a leap in youth turnout, then it cannot explain some of the large polling misses in 2017.

The second problem with the hypothesis is that there isn’t really that much relationship between those polling companies who had about the right proportion of young people in their samples and those who got it right.

The chart below shows the proportion of voters aged under 25 in each polling company’s final polling figures. The blue bar is the proportion in the sample as a whole, the red bar the proportion in the final voting figures, once pollsters had factored in turnout, dealt with don’t knows and so on. As you would expect, everyone had roughly the same proportion of under 25s in their weighted sample (in line with the actual proportion of 18-24 year olds in the population), but among their sample of actual voters it differs radically. At one end, less than 4% of BMG’s final voting intention figures were based on people aged under 25. At the other end, almost 11% of Survation’s final voting figures were based on under 25s.

According to the British Election Study, the closest we have to authoritative figures, the correct figure should have been about 7%. That implies Survation got it right despite having far too many young people. ComRes had too many young people, yet had one of the worst understatements of Labour support. MORI had close to the correct proportion of young people, yet still got it wrong. There isn’t the neat relationship we’d expect if this was all about getting the correct proportion of young voters. Clearly the explanation must be rather more complicated than that.

So what exactly did go wrong?

Without a nice, neat explanation like youth turnout, the best overarching explanation for the 2017 error is that polling companies seeking to solve the overstatement of Labour in 2015 simply went too far and ended up understating them in 2017. The actual details of this differed from company to company, but it’s fair to say that the more elaborate the adjustments that polling companies made for things like turnout and don’t knows, the worse they performed. Essentially, polling companies over-did it.

Weighting down young people was part of this, but it was certainly not the whole explanation and some pollsters came unstuck for different reasons. This is not an attempt to look in detail at each pollster, as they may also have had individual factors at play (in BMG’s report, for example, they’ve also highlighted the impact of doing late fieldwork during the daytime), but there is a clear pattern of over-enthusiastic post-fieldwork adjustments turning essentially decent samples into final figures that were too Conservative:

BMG’s weighted sample would have shown the parties neck-and-neck. With just traditional turnout weighting they would have given the Tories around a four point lead. However, combining this with an additional down-weighting by past non-voting and the likelihood of different age/tenure groups to be registered to vote changed this into a 13 point Tory lead.

ICM’s weighted sample would have shown a five point Tory lead. Adding demographic likelihood to vote weights that largely downweighted the young increased this to a 12 point Tory lead.

Ipsos MORI’s weighted sample would have shown the parties neck-and-neck, and MORI’s traditional 10/10 turnout filter looks as if it would have produced an almost spot-on 2 point Tory lead. An additional turnout filter based on demographics changed this to an 8 point Tory lead.

YouGov’s weighted sample had a 3 point Tory lead, which would’ve been unchanged by their traditional turnout weights (and which also exactly matched their MRP model). Reallocating don’t knows changed this to a 7 point Tory lead.

ComRes’s weighted sample had a 1 point Conservative lead, and by my calculations their old turnout model would have shown much the same. Their new demographic turnout model did not actually understate the proportion of young people, but did weight down working class voters, producing a 12 point Tory lead.
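The size of each company's post-fieldwork adjustment can be read straight off the figures quoted above. The short sketch below just does that subtraction. The leads are the ones given in the text, with "neck-and-neck" treated as a lead of zero, which is my simplifying assumption. The variable and function names are hypothetical.

```python
# Conservative lead over Labour in points, as quoted in the text:
# (weighted sample before adjustments, final published figure).
# "Neck-and-neck" is approximated as 0 here (an assumption).
LEADS = {
    "BMG": (0, 13),
    "ICM": (5, 12),
    "Ipsos MORI": (0, 8),
    "YouGov": (3, 7),
    "ComRes": (1, 12),
}


def adjustment_effect(leads):
    """Points added to the Tory lead by post-fieldwork adjustments."""
    return {pollster: final - raw for pollster, (raw, final) in leads.items()}


effects = adjustment_effect(LEADS)
for pollster, shift in sorted(effects.items(), key=lambda kv: -kv[1]):
    raw, final = LEADS[pollster]
    print(f"{pollster:<11} sample lead {raw:>2}  final lead {final:>2}  shift +{shift}")
```

On these figures every adjustment moved the lead in the Conservative direction, by between 4 and 13 points.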

Does this mean modelling turnout by demographics is dead?

No. Or at least, it shouldn’t do. The pollsters who got it most conspicuously wrong in 2017 were indeed those who relied on demographic turnout models, but this may have been down to the way they did it.

Normally weights are applied to a sample all at the same time using “rim weighting” (this is an iterative process that lets you weight by multiple items without them throwing each other off). What happened with the demographic turnout modelling in 2017 is that companies effectively did two lots of weights. First they weighted the demographics and past vote of the data so it matched the British population. Then they effectively added separate weights by things like age, gender and tenure so that the demographics of those people included in their final voting figures matched the people who actually voted in 2015. The problem is this may well have thrown out the past vote figures, so the 2015 voters in their samples matched the demographics of 2015 voters, but didn’t match the politics of 2015 voters.
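A small sketch may help show why a second, separate round of weights can undo the first. The code below is a bare-bones version of rim weighting (also called raking, or iterative proportional fitting) followed by a separate turnout adjustment by age. Everything in it is an assumption for illustration: the sample counts, the population targets, the turnout rates and the function names. The second stage is also simplified to a single multiplier per age group rather than a full re-weight to the demographics of 2015 voters.

```python
def margin(rows, weights, key):
    """Weighted share of each category of `key` in the sample."""
    totals = {}
    for row, w in zip(rows, weights):
        totals[row[key]] = totals.get(row[key], 0.0) + w
    grand = sum(totals.values())
    return {category: t / grand for category, t in totals.items()}


def rim_weight(rows, targets, iterations=50):
    """Adjust weights one variable at a time, repeatedly, until every
    variable in `targets` matches its target shares simultaneously."""
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for key, target in targets.items():
            current = margin(rows, weights, key)
            weights = [w * target[row[key]] / current[row[key]]
                       for row, w in zip(rows, weights)]
    return weights


# Hypothetical sample: respondent counts by (age, 2015 vote).
cells = {("young", "Lab"): 30, ("young", "Con"): 10,
         ("old", "Lab"): 20, ("old", "Con"): 40}
rows = [{"age": age, "past_vote": vote}
        for (age, vote), n in cells.items() for _ in range(n)]

# Hypothetical population targets, applied together in stage one.
targets = {"age": {"young": 0.3, "old": 0.7},
           "past_vote": {"Con": 0.53, "Lab": 0.47}}
stage_one = rim_weight(rows, targets)
before = margin(rows, stage_one, "past_vote")

# Stage two, applied separately: an assumed turnout rate per age group.
turnout = {"young": 0.5, "old": 0.9}
stage_two = [w * turnout[row["age"]] for row, w in zip(rows, stage_one)]
after = margin(rows, stage_two, "past_vote")

print("past vote after stage one:", before)
print("past vote after stage two:", after)
```

In this toy data the past vote matches its target after the first stage, then drifts towards the Conservatives after the second, because the age groups that are weighted down were disproportionately Labour in 2015. Doing everything in one rim-weighting pass would keep both sets of targets in line.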

It’s worth noting that some companies used demographic-based turnout modelling and were far more successful. Kantar’s polling used a hybrid turnout model based upon both demographics and self-reporting, and was one of the most accurate polls. SurveyMonkey’s turnout modelling was based on the demographics of people who voted in 2015, and produced only a 4 point Tory lead. YouGov’s MRP model used demographics to predict respondents’ likelihood to vote and was extremely accurate. There were companies who made a success of it, and it may be more of a question about how to do it well, rather than whether one does it at all.

What have polling companies done to correct the 2017 problems, and should I trust them?

For individual polling companies the errors of 2017 are far more straightforward to address than in 2015. For most polling companies it has been a simple matter of dropping the adjustments that went wrong. All the causes of error I listed above have simply been reversed – for example, ICM have dropped their demographic turnout model and gone back to asking people how likely they are to vote, ComRes have done the same. MORI have stopped factoring demographics into their turnout, YouGov aren’t reallocating don’t knows, BMG aren’t currently weighting down groups with lower registration.

If you are worried that the specific type of polling error we saw in 2017 could be happening now, you shouldn’t be – all the methods that caused the error have been removed. A simplistic view that the polls understated Labour in 2017 and, therefore, Labour are actually doing better than the polls suggest is obviously fallacious.

However, that is obviously not a guarantee that polls couldn’t be wrong in other ways.

But what about the polling error of 2015?

This is a much more pertinent question. The methodology changes that were introduced in 2017 were intended to correct the problems of 2015. So if the changes are reversed, does that mean the errors of 2015 will re-emerge? Will polls risk *overstating* Labour support again?

The difficult situation the polling companies find themselves in is that the methods used in 2017 would have got 2015 correct, but got 2017 wrong. The methods used in 2015 would have got 2017 correct, but got 2015 wrong. The question we face is what approach would have got both 2015 and 2017 right?

One answer may be for polling companies to use more moderate versions of the changes they introduced in 2017. Another may be to concentrate more on improving samples, rather than post-fieldwork adjustments to turnout. As we saw earlier in the article, polling companies took a mixture of two approaches to solving the problem of 2015. The approach of “treating the symptom” by changing turnout models and similar ended up backfiring, but what about the first approach – what became of the attempts to improve the samples themselves?

As we saw above, the actual samples the polls used were broadly accurate. They tended to have smaller parties too high, but the balance between Labour and Conservative was pretty accurate. For one reason or another, the sampling problem from 2015 appears to have completely disappeared by 2017. 2015 samples were skewed towards Labour, but in 2017 they were not. I can think of three possible explanations for this.

The post-2015 changes made by the polling companies corrected the problem. This seems unlikely to be the sole reason, as polling samples were better across the board, with those companies who had done little to improve their samples performing in line with those who had made extensive efforts.

Weighting and sampling by the EU ref made samples better. There is one sampling/weighting change that nearly everyone made – they started sampling/weighting by recalled EU ref vote, something that was an important driver of how people voted in 2017. It may just be that providence has provided the polling companies with a useful new weighting variable that meant samples were far better at predicting vote shares.

Or perhaps the causes of the problems in 2015 just weren’t an issue in 2017. A sample being wrong doesn’t necessarily mean the result will be wrong. For example, if I had too many people with ginger hair in my sample, the results would probably still be correct (unless there is some hitherto unknown relationship between voting and hair colour). It’s possible that – once you’ve controlled for other factors – in 2015 people with low political engagement voted differently to engaged people, but that in 2017 they voted in much the same way. In other words, it’s possible that the sampling shortcomings of 2015 didn’t go away, they just ceased to matter.

It is difficult to come to a firm answer with the data available, but whichever mix of these is the case, polling companies shouldn’t be complacent. Some of them have made substantial attempts to improve their samples from 2015, but if the problems of 2015 disappeared because of the impact of weighting by Brexit or because political engagement mattered less in 2017, then we cannot really tell how successful they were. And it stores up potential problems for the future – weighting by a referendum that happened in 2016 will only be workable for so long, and if political engagement didn’t matter this time, it doesn’t mean it won’t in 2022.

Will MRP save the day?

One of the few conspicuous successes in the election polling was the YouGov MRP model (that is, multi-level regression and post-stratification). I expect come the next election there will be many other attempts to do the same. I will urge one note of caution – MRP is not a panacea to polling’s problems. It can go wrong, and it still relies on the decisions people make in designing the model it runs upon.

MRP is primarily a method of modelling opinion at lower geographical areas from a big overall dataset. Hence in 2017 YouGov used it to model the share of the vote in the 632 constituencies in Great Britain. In that sense, it’s a genuinely important step forward in election polling, because it properly models actual seat numbers and, from there, who will win the election and will be in a position to form a government. Previously polls could only predict shares of the vote, which others could use to project into a result using the rather blunt tool of uniform national swing. MRP produces figures at the seat level, so can be used to predict the actual result.
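For comparison, the "rather blunt tool" of uniform national swing can be sketched in a few lines: take the change in each party's national share since the last election and add it to that party's previous share in every seat. The seats, shares and function names below are all hypothetical and chosen only to show the mechanics.

```python
def uniform_national_swing(previous_seats, previous_national, poll_national):
    """Apply the national change in each party's share to every seat."""
    swing = {party: poll_national[party] - previous_national[party]
             for party in poll_national}
    projected = {}
    for seat, shares in previous_seats.items():
        projected[seat] = {party: max(0.0, share + swing.get(party, 0.0))
                           for party, share in shares.items()}
    return projected


def winners(seats):
    """Party with the highest projected share in each seat."""
    return {seat: max(shares, key=shares.get) for seat, shares in seats.items()}


# Made-up results from the previous election, in percentage shares.
previous_seats = {
    "Seat A": {"Con": 45, "Lab": 40, "LD": 15},
    "Seat B": {"Con": 38, "Lab": 50, "LD": 12},
    "Seat C": {"Con": 42, "Lab": 41, "LD": 17},
}
previous_national = {"Con": 38, "Lab": 31, "LD": 8}
poll_national = {"Con": 43, "Lab": 40, "LD": 7}  # hypothetical poll

projection = uniform_national_swing(previous_seats, previous_national, poll_national)
print(winners(projection))
```

Every seat gets exactly the same movement, which is why the method is blunt: it cannot pick up seats where the local change differs from the national one, which is the gap a seat-level model such as MRP is meant to fill.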

Of course, if you’ve got shares of the vote for each seat then you’ll also be able to use it to get national shares of the vote. However, at that level it really shouldn’t be that different from what you’d get from a traditional poll that weighted its sample using the same variables and the same targets (indeed, the YouGov MRP and traditional polls showed much the same figures for much of the campaign – the differences came down to turnout adjustments and don’t knows). Its level of accuracy will still depend on the quality of the data, the quality of the modelling and whether the people behind it have made the right decisions about the variables used in the model and on how they model things like turnout… in other words, all the same things that determine if an opinion poll gets it right or not.
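The step from seat-level estimates to national shares is just a weighted average, as the sketch below shows. It covers only that final aggregation, not the multilevel regression that produces the seat estimates. The seat shares, the expected numbers of votes cast and the function name are hypothetical.

```python
def national_shares(seat_shares, seat_votes):
    """Average seat-level shares, weighting each seat by its expected votes cast."""
    total_votes = sum(seat_votes.values())
    parties = {party for shares in seat_shares.values() for party in shares}
    return {
        party: sum(seat_shares[seat].get(party, 0.0) * seat_votes[seat]
                   for seat in seat_shares) / total_votes
        for party in parties
    }


# Made-up seat-level estimates (percentage shares) and expected votes cast.
seat_shares = {
    "Seat A": {"Con": 50, "Lab": 40, "Oth": 10},
    "Seat B": {"Con": 30, "Lab": 60, "Oth": 10},
}
seat_votes = {"Seat A": 40_000, "Seat B": 60_000}

national = national_shares(seat_shares, seat_votes)
print(national)
```

Note that the expected votes cast in each seat are themselves an output of whatever turnout assumptions the model makes, so the national figure inherits those decisions.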

In short, I do hope the YouGov MRP model works as well in 2022 as it did in 2017, but MRP as a technique is not infallible. Lord Ashcroft also did an MRP model in 2017, and that was showing a Tory majority of 60.

TLDR:

