By Sean Trende - October 20, 2014

I’ve been getting a lot of e-mails and tweets either asking me if the polls might be skewed toward Republicans this year, or forwarding me articles suggesting that they might be. This isn’t unique: Around this time in 2012, many Republicans were adamant that the polls were undercounting their voters and that Romney was in far better shape than the polls suggested.

Today, the suggestion is that the polls may be undercounting some combination of nonwhite/young/cellphone-only voters in such a way that the polls are materially disfavoring Democrats. Georgia and especially Colorado are usually emphasized, with the emphasis on the latter focusing on pretty substantial poll misses there in 2010 and 2012.

My answer today is the same as it was in 2012: We really don’t know, but it isn’t smart to bet on a particular type of miss. The only thing we can say with any sort of certainty is that in some years the polls have had a slight pro-Republican skew, in other years a slight pro-Democratic skew, and in still other years no skew at all. Moreover, even though the skew might be material, it is typically also small: enough to flip a 51-49 race but not enough to save a candidate who is down, say, 50-44.

You can -- and should -- read Nate Silver and Mark Blumenthal on this. Much of what follows is meant mostly to build on and perhaps crystallize their observations as to why the previous instances of poll skew aren’t that useful.

The bottom line, I think, is that it is difficult to translate these observations into a prediction. It is one thing to say “there may have been skew in the previous two cycles.” It is quite another to say “on the basis of this, we can predict what will happen in the following cycle.” There are three points relevant here:

1) We cannot predict what years will have a Republican or Democratic skew. This is a crucially important point made by Silver that bears repeating and begs for further development. After all, the claim here is not simply that the polls may be skewed. The claim is that the polls may be skewed in a Democratic direction in this year.

In other words, to take this seriously, you have to take it as a prediction. The problem arises when you ask the question: How can we make this prediction reliably, i.e., with some sort of methodology and based upon actual evidence?

One way would be to try to use prior years. But as Silver notes, this doesn’t work with the data we have; things seem to bounce around more or less randomly. If we had applied this approach in past years, we’d have predicted significant pro-Democratic skew in 1996 -- and we’d have been wrong. We’d have predicted growing pro-Republican skew in 2000, and we’d have been wrong. In 2004 we’d have predicted a pro-Republican skew -- numerous smart people did just that -- and we’d have been wrong.

We might conclude from this that skew is random. More likely, skew appears random, but it actually is the result of pollsters not being stuck in time. When polls go too far in a Republican direction, pollsters alter their methodology to account for this, by weighting, using Internet panels, and so forth. If polls go too far in a Democratic direction, they do the opposite. This could explain why we haven’t seen more than two elections in a row with a substantial skew, although given the small number of observations, even that supposed trend is tenuous.

We could also try to make a prediction based upon unique things we know to be true, or expect to be true, about given elections. For example, we might explain 2012 with the theory that the Obama campaign’s get-out-the-vote efforts were missed by the pollsters. Democrats have invested heavily in their Bannock Street Project, with the goal of increasing minority turnout in off-year elections. So perhaps we should expect an even bigger miss this year.

This is intellectually satisfying at first blush. But in reality, there are a number of deep-seated problems with the approach. The biggest problem is that we wouldn’t be able to predict the past very well. Let’s try to deprive ourselves of the benefit of hindsight and ask ourselves, “Given conditions on the ground, what would our prediction have been previously, and how would it have held up?”

Well, we know what the prediction would have been in 2004 because a great many people were suggesting that there was a hidden Democratic vote that the pollsters weren’t picking up on, for any number of familiar-sounding reasons: cellphones, growing youth population, the growing pro-Republican skew in presidential elections in 1996 and 2000. They were wrong -- the polls were spot-on in 2004.

Or consider 2008, when a flood of first-time voters threatened to flummox pollsters. This is, in my opinion, the year with the single strongest case to be made that there would be pro-Republican skew in the polls. Yet there was actually a minuscule Democratic skew.

What about 2012? Most analysts whom I spoke with didn’t buy the unskewers’ arguments, but admitted that the partisan composition of some of the polls was eyebrow-raising. After all, there was a potential causal mechanism there: Pollsters had been shifting their methodologies to try to compensate for declining response rates and the growing cellphone-only population. Maybe they really had gone too far. No one I was aware of predicted a pro-Republican skew. But of course, that is exactly what we got.

It’s easy to come up with post-hoc explanations for why the polls did what they did in any given year. The trick is coming up with satisfying reasons that can explain why polls will go wrong ahead of time. We just don’t seem to have that yet.

2) We cannot predict which races will have a Republican or Democratic skew.

Even if we somehow could say “on average, the polls will be skewed toward Republicans or Democrats” in a given year, for that to be useful we’d need reason to suspect that the skew applies more or less evenly across races. This doesn’t appear to be the case.

For example, Blumenthal -- who, again, doesn’t endorse this approach -- presents data from 2010 showing a skew of about 3.1 points toward Republicans in the close races. If you move the data 3.1 points toward Democrats, you’d have avoided missed calls in Nevada and Colorado, but you’d have gotten Illinois wrong. In 2006, you would have predicted sizable losses for George Allen and Conrad Burns by correcting for the overall skew of the polls. Yet those two races went against the grain and were razor close, and they were almost the difference makers in Senate control that year.
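To make the arithmetic concrete, here is a minimal Python sketch of what "unskewing" by a uniform amount does to individual calls. The race names, poll margins, and results below are invented for illustration and are not the actual 2010 numbers; only the 3.1-point shift comes from the discussion above. Margins are Democratic minus Republican, in points.

```python
def call(margin):
    """Return the predicted winner from a Democratic-minus-Republican margin."""
    return "D" if margin > 0 else "R"


def apply_uniform_shift(polls, shift):
    """Move every race's poll margin toward Democrats by the same amount."""
    return {race: margin + shift for race, margin in polls.items()}


def missed_calls(polls, results):
    """List the races where the poll-based call differs from the actual winner."""
    return [race for race in polls if call(polls[race]) != call(results[race])]


# Hypothetical final poll averages and actual results (made-up numbers).
polls = {"Race A": -2.0, "Race B": -1.0, "Race C": -2.5}
results = {"Race A": 1.5, "Race B": 3.0, "Race C": -1.0}

raw_misses = missed_calls(polls, results)
shifted_misses = missed_calls(apply_uniform_shift(polls, 3.1), results)

print("Misses using raw polls:", raw_misses)
print("Misses after a 3.1-point shift:", shifted_misses)
```

In this toy setup the uniform correction rescues two calls and breaks a third. That is the same shape as the problem described above: an average skew tells you little about which particular races it will show up in.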

3) We cannot adequately resist the temptations of our own biases.

The biggest problem with this sort of data -- very little variance, small number of observations -- is that it invites introduction of our own biases. I don’t mean bias in the crudest sense of the term, although I don’t think it is accidental that the people discussing poll skew in 2012 tended to be conservatives, and vice-versa this year.

I mean it in a more general sense. Humans are remarkably adept at discovering and using patterns. We don’t like chaos, and this is part of what has allowed us to advance as a species. Yet our minds aren’t precisely fine-tuned to patterns; we’re overly sensitive, and so we see dragons in clouds, a man’s face on the moon, and images of Mary in a grilled cheese sandwich. If I gave you a page with 15 dots and challenged you to fill in the gaps with what you saw, you’d probably come back with a picture of a Dimetrodon (or at least, that’s what I’d be inclined to draw) or some such; you wouldn’t likely return the page and tell me it is just random noise.

We do the same thing with data. We do it in very obviously bad ways -- there was a cottage industry of predicting presidential elections based on the winner of the final Redskins football game from 1932 to 2004 (there’s actually a statistically significant correlation between the margins of those games and the margin of presidential elections during this time).

But where it’s most dangerous is when we have good reason for believing that there has to be a pattern. I thought of this when reading Sam Wang’s most recent post where, in an update, he writes that “[m]aybe Silver and I have both missed the true pattern: midterms vs. Presidential years.” There’s something of a quiet assumption there: That there is a pattern to be found in the first place. Now Wang is known as a careful empiricist in his discipline of neuroscience, and he has done important work in an area near and dear to my heart: autism research. So I don’t think he means this statement to be taken as literally as it sounds -- of all people, he’s the most aware of these neurological tendencies in our species.

But it is nevertheless a useful illustration. If you only have a dozen or so data points and go looking for a pattern, sooner or later you will find something that explains those data points well. The problem is that we don’t have a great basis for sorting out the good theory from the bad, at least until the theory has survived a few trial runs.
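A small simulation shows how easily this happens. The Python sketch below is only an illustration under stated assumptions: the "skew" in each of 12 past cycles is a pure coin flip, and each of 200 candidate "theories" is another unrelated coin-flip sequence. The function name and the parameter choices (12 cycles, 200 theories) are hypothetical, and nothing here uses real polling data.

```python
import random


def best_spurious_fit(n_points=12, n_theories=200, seed=0):
    """Return the best hit count achieved by purely random 'theories'."""
    rng = random.Random(seed)
    # Direction of the skew in each past cycle: +1 or -1, coin flips by assumption.
    skew = [rng.choice([-1, 1]) for _ in range(n_points)]
    best = 0
    for _ in range(n_theories):
        # A 'theory' is just another coin-flip sequence, unrelated to the skew.
        theory = [rng.choice([-1, 1]) for _ in range(n_points)]
        hits = sum(t == s for t, s in zip(theory, skew))
        # We would accept the theory or its mirror image, whichever fits better.
        best = max(best, hits, n_points - hits)
    return best


print("Best fit out of 12 cycles:", best_spurious_fit())
```

For a sense of scale, a single coin-flip theory matches at least 10 of 12 coin-flip outcomes with probability 79/4096, or roughly 2 percent. Search through a couple of hundred such theories and a fit that good becomes quite likely, even though by construction none of them has any predictive value.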

To see the potential problem here, when I look at close races only in midterms, the pattern that jumps out at me is that pollsters understate the “victorious” party. 1994 and 2002 were good Republican years, and there was a pro-Democratic bias. 1998 and 2006 were good Democratic years, and there was a pro-Republican bias. This might suggest that there will be a pro-Democratic skew this year.

I can even justify this on the basis of theory: either pollsters miss a late break toward the fundamentals in these years (this is consistent with what John Sides suggests at the Washington Post) or they look at their results and think “this can’t be right; Jim Sasser isn’t going to lose,” and sit on the poll. Obviously, 2010 would need to be explained away, but if I took this theory seriously I could do so, either on the basis that Republican voters made up their minds extremely early, so there was an inevitable Democratic break at the end, or that the indicators were so strong that pollsters were actually surprised when Democrats like Harry Reid and Michael Bennet were hanging on.

The bottom line is that we have neither the data nor well-tested theories to explain what sort of skew we should expect this cycle. For my money, there are two races where I really take charges of poll skew seriously: Alaska, where the polls in all of the last seven races have understated Republican strength (by seven points on average), and Colorado, where the introduction of mail voting probably does make the electorate difficult to model. Beyond that, I would not be surprised if there were a Republican skew, but I likewise would not be surprised if there were a Democratic skew. The possibilities basically cancel out, and I’m left with the simple poll averages as the best guidance for this election.