In today’s news, the circus continues: Sarah Palin is still looking for the real killer. Meanwhile, in real news: universal health care? Alaska Senate update: As I predicted from polls, Begich is pulling ahead. Andrew Sullivan is still clinging to the idea that turnout is suspiciously down from 2004, which I have pointed out is probably not true. I’ll stick with my prediction of a Begich win by 2-7%, and normal turnout. In the face of hard polling data, a straightforward interpretation without conspiracies is most likely to be right.

Now, to my real topic today, an oddity of polling. Although average poll margins can predict the eventual winner of a race, they usually underreport the final margin for whoever wins. How can this be?

A few days ago I showed you a plot. Here it is, this time with a fit line.

If polls were accurate, the slope of the green line would be 1 and its intercept 0. It is not. Indeed, for every 1% of actual margin, only 0.84+/-0.03% is captured in polls*. This shortfall implies a hidden bonus for whoever is ahead. The intercept is essentially zero, i.e. the line passes through the origin, so there's no overall bias toward either candidate.
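A minimal sketch of this kind of fit, using synthetic (actual, poll) margin pairs built to capture about 84% of the true margin plus a little noise; these are illustrative numbers, not the Senate data behind the plot:

```python
import numpy as np

# Hypothetical (actual margin %, poll margin %) pairs -- made-up numbers
# for illustration only, not the real dataset.
actual = np.array([-20.0, -10.0, -4.0, 2.0, 8.0, 15.0, 25.0])
poll = 0.84 * actual + np.array([0.5, -0.8, 0.3, -0.4, 0.6, -0.5, 0.2])

# Unweighted least-squares fit of poll margin vs. actual margin.
slope, intercept = np.polyfit(actual, poll, 1)

print(f"slope = {slope:.2f}")          # well below 1: polls capture only
print(f"intercept = {intercept:.2f}")  # part of the margin; intercept ~0
```

A slope below 1 with a near-zero intercept is exactly the signature described above: no net partisan bias, but a systematic shrinkage of whoever's lead.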

This result can be understood better by restating it in terms of pollsters’ methods. Pollsters identify the people in their sample that they consider more likely to vote. Then they re-weight according to likely voting as well as other factors (e.g. age and sex). In light of this, one or both of the following is probably true:

1. Within a state, the leading candidate’s supporters are relatively more likely to vote than predicted by likely-voter screens.
2. Within a state, the trailing candidate’s supporters are relatively less likely to vote than predicted by likely-voter screens.

Both scenarios are consistent with the fact that the direction of the effect depends on who’s leading locally.

So likely-voter screens are evidently missing some aspect of voter behavior on Election Day. What’s missing, and why does it show up differently from state to state?

One possibility is that many voters are aware of which way their state is likely to go, and modify their behavior slightly. For example, a small fraction might stay home (or at work) because it’s not fun to vote for a known loser. If the probability of voting differed by a net 8% between McCain and Obama “likely voters,” this could then enlarge the size of a win.

Another possibility is that pollsters’ estimates of the likelihood of voting are not sufficiently quantitative. It’s plausible that they can figure out that one person is more likely to vote than another. But can they do it accurately? For instance, if I am more enthusiastic than you about voting for my candidate, am I 5% more likely to vote than you, or 15%? Getting that number wrong can easily lead to bad weighting.
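To see how a mis-calibrated turnout estimate shrinks a polled margin, here is a toy calculation with made-up numbers: the leader's supporters actually turn out at a higher rate, but the screen assigns both sides the same likelihood of voting.

```python
# Hypothetical illustration of a mis-calibrated likely-voter screen.
# All numbers are invented for the example.

respondents = {"A": 52, "B": 48}          # raw sample preference
true_turnout = {"A": 0.90, "B": 0.80}     # leader's backers really vote more
assumed_turnout = {"A": 0.85, "B": 0.85}  # screen treats both sides alike

def margin(prefs, turnout):
    """Projected A-minus-B margin (%) among those who actually vote."""
    votes = {c: n * turnout[c] for c, n in prefs.items()}
    total = sum(votes.values())
    return 100 * (votes["A"] - votes["B"]) / total

print(f"poll margin:   {margin(respondents, assumed_turnout):+.1f}%")  # +4.0%
print(f"actual margin: {margin(respondents, true_turnout):+.1f}%")    # +9.9%
```

Even a modest turnout gap that the screen misses more than doubles the margin in this example, always in the direction of whoever is ahead.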

Why has this flaw been allowed to persist? Basically, there’s no penalty. Fixing the error would not lead to improved prediction of winners. A repair would only give a more accurate estimate of the winner’s eventual margin. Yet it is rare to hear anyone ask why a pollster’s projection was too conservative. So there’s no pressure to do better.

Note: Comments and some private correspondence indicate that a few of you have gotten the direction of this discrepancy backwards. To reiterate, polls underreport the eventual margins, which are about 19% larger than the last pre-election surveys would indicate. The effect is in the opposite direction from a hypothetical regression to the mean (itself not reliable, as I point out in comment #20).



*The fit was an unweighted regression of poll margin vs. actual margin. The plotted slope is the inverse, 1/0.84=1.19. A weighted fit gives a slope of 0.81+/-0.03 poll % per actual % of margin.
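The footnote's inversion, and the "about 19% larger" figure in the note above, are the same one-line calculation:

```python
# If polls capture only 0.84% per 1% of actual margin, then actual
# margins run 1/0.84 times the polled margin.
poll_per_actual = 0.84
actual_per_poll = 1 / poll_per_actual

print(f"{actual_per_poll:.2f}")  # 1.19: margins ~19% larger than polls show
```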