With the EU referendum just 2 months away, polls are pouring in. But, in view of recent performance, one can’t be blamed for being a bit anxious about reading too much into them. Are the pre-referendum polls another case of quantity trumping quality, or was the 2015 general election just an odd event, when the entire universe colluded against the British polling industry? We’ll only know the answer to that on the morning of June 24th 2016, but in the meantime, we can always speculate because what else are we going to do? Just wait around for 2 full months for reality to unfold while we discuss substantive policy issues? Of course not, as we have numbers, and numbers are fun!

First question: are we conducting too many polls? Yes, yes we are.

There are 2 polls released every 5 days.

Since September 2015, 90 polls have been conducted, 76 of them online and the rest by telephone. In just over 7 months, 181 827 people were asked the question: Should the United Kingdom remain a member of the European Union or leave the European Union? By the time I finish writing this, they will probably have interviewed the entire population of York. There are so many polls, in fact, that I absolutely cannot keep up with all of them, and at one point I decided to just go ahead with this analysis even though there are at least 4 other polls out there at the time of writing that I will unabashedly ignore.
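For anyone who wants to check the arithmetic behind these figures, here is a rough sketch. The poll and respondent counts come from the text above; the exact start and end dates are my assumption (the text only says "since September 2015" and "just over 7 months"), so treat the result as approximate.

```python
from datetime import date

# Figures quoted in the post
POLLS = 90
ONLINE_POLLS = 76
RESPONDENTS = 181_827

# Assumed window: the post only says "since September 2015"
# and "just over 7 months", so these two dates are illustrative.
start = date(2015, 9, 1)
end = date(2016, 4, 20)

days = (end - start).days
polls_per_5_days = POLLS / days * 5   # release rate per 5-day window
avg_sample = RESPONDENTS / POLLS      # mean respondents per poll
phone_polls = POLLS - ONLINE_POLLS    # "the rest telephone"

print(f"Window: {days} days")
print(f"Polls per 5 days: {polls_per_5_days:.2f}")
print(f"Average sample size: {avg_sample:.0f}")
print(f"Telephone polls: {phone_polls}")
```

Under those assumed dates the rate comes out at roughly 2 polls every 5 days, with an average sample of about 2,000 respondents per poll.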

Is there any indication the quality is improving? No, nothing stands out.

We’ve seen before, from quite a few sources (John Curtice said it, the polling inquiry people said it, even I said it), that giving harder-to-reach respondents time to complete the survey produces more representative results. Because people will gladly spend a few minutes answering questions about topics they enjoy, polls tend to over-sample people with an interest in politics, who also tend to behave differently from the voting population. So it pays off to leave polls open for a few more days, just to get those apathetic (and actually more average than the rest of us) respondents. Any progress on that?

The average poll conducted in 2016 is in field for 3.6 days. Only a fifth of polls conducted in 2016 (11 out of 50) are done in 5 days or more. If willingness to chase harder to reach respondents will play any role in the accuracy of this round of polls, we’re all in for a rough day on June 24th.

So how is sampling coming along?

All pollsters make their tables public, with a decent number of cross-breaks (meaning data is reported among different groups of respondents, e.g. young and old, female or male, 2015 vote). This allows us to see how the samples are made up in terms of key demographics and past vote, and how they are then processed to resemble the British population more closely, i.e. how different sub-groups are weighted. Most polls report weighted and unweighted samples for some socio-demographic groups (age, gender, social class) and for 2015 general election vote. You’ll notice a very wide range of age groups, and that’s because, as much as I want them to, pollsters just don’t like to be all alike. They crave a little bit of individuality, and using different age groups is where that shines through.

It seems there’s a systematic bias to how samples are constructed. Gender is balanced, and class slightly biased towards ABC1s. When it comes to age, though, polls struggle to get enough young people in their sample. Knowing that younger voters tend to be more pro-EU than older voters, the failure to correctly sample might mean we are seeing a tight race that isn’t in line with real vote intentions. And while younger people are also less likely to turn up to vote, I would still argue that if you’re trying to get a representative view of the population, a representative sample is an important stepping stone.

Secondly, and as I’ve been saying for a while now, polls are failing to address non-voters. Unfortunately, very few polls show the number of past non-voters in their samples. From the ones that do have that information, the picture that emerges is not all that rosy. While the samples are representative of the major parties’ electorates, they fail to get enough non-voters. This is an important issue, seeing as a third of the electorate did not actually vote in 2015. And that would be bad enough, but it turns out that even after weighting, non-voters make up only 20% of the sample on average. It’s not that pollsters have to over-weight non-voters (which, as it turned out last time, doesn’t even fix the problem); it’s that they’re not even weighting them enough!
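To get a feel for the size of that gap, here is a back-of-the-envelope sketch. It uses only the two shares quoted above (non-voters are about a third of the electorate but 20% of the average weighted sample); the function name and the two-group split are mine, for illustration, and this is not any pollster's actual weighting scheme.

```python
def rescaling_factors(observed, target):
    """Ratio of target share to observed share for each group.

    A factor above 1 means the group is under-represented in the
    weighted sample; below 1 means it is over-represented.
    """
    return {group: target[group] / observed[group] for group in observed}


# Shares quoted in the post (rounded): 20% of the average weighted
# sample are 2015 non-voters, against roughly a third of the electorate.
observed = {"non_voters": 0.20, "voters": 0.80}
target = {"non_voters": 1 / 3, "voters": 2 / 3}

factors = rescaling_factors(observed, target)
for group, factor in factors.items():
    print(f"{group}: x{factor:.2f}")
```

In other words, non-voters would need to count for about two thirds more than they currently do just to match their share of the electorate. As noted above, weighting alone did not fix the problem last time, so this only measures the shortfall; it is not a remedy.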

All in all, I see few reasons to be optimistic about pollsters’ fate come June 23. But far be it from me to make predictions. It may well turn out that turnout models are very effective and that polls in the final days before the referendum are spot on. Consider this more of a state-of-affairs report.

P.S. Many thanks to What UK Thinks for aggregating all the polls and links to original tables!