Tuesday 5/24, 8:20am: in the comments, an interesting discussion here.

Every election is different from every other! Every poll must be reported as a unique game-changing event!! Or… https://t.co/xFSUXpn5mR — Sam Wang (@SamWangPhD) May 22, 2016

Some media types are going around with their hair on fire over two unfavorable polls for Hillary Clinton in which she lags Donald Trump. In response in the NYT, Norm Ornstein and Alan Abramowitz are trying to convince you that these polls mean nothing. Nothing, I tell you! Don’t Panic!!!

In a deep sense, they’re right. As I wrote the other day, opinion can move a lot between now and Election Day. And it is inappropriate to trumpet a single poll showing an exceptional result, which is what the news channels do.

However, do not throw out the baby with the bathwater. In fact, we can learn quite a lot from polls by extracting as much value as possible from them. This can be tricky because right around now, national polls are the least informative they are going to be in 2016. To put it another way, polls will be more informative one month from now – and they were also more informative a month ago. How can this be, and what do we really know about the Clinton/Trump November win probability?

Elections scholar Christopher Wlezien very kindly sent me the data that he and Robert Erikson used to construct the graphs in The Timeline Of Presidential Elections: 1952-2008. Adding in 2012 data, I took time series from 16 Presidential campaigns and calculated the standard deviation of the total movement as a function of time. This is a measure of uncertainty about November based on polls for a given day. This graph shows the ±1 standard deviation interval in red:

(Note that in my previous post I plotted the standard deviation in the Democratic vote share. However, the appropriate standard deviation to use is the standard deviation of the Democratic-Republican margin, which is twice as large. This is why I had to revise the win probability. PEC regrets the error.)
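A minimal sketch of how such an uncertainty curve could be computed, assuming "total movement" means the final November margin minus the poll margin on a given day, and assuming a sample standard deviation across campaigns. The function name, the data layout, and the numbers are hypothetical illustrations, not the Wlezien-Erikson data.

```python
from statistics import stdev

def movement_sd(poll_margins, final_margins):
    """Standard deviation, across campaigns, of (final margin - poll margin) for each day.

    poll_margins:  one list per campaign, each holding the D-R margin (in points) by day.
    final_margins: the final D-R margin for each campaign, in the same order.
    """
    n_days = len(poll_margins[0])
    sds = []
    for day in range(n_days):
        # How far each campaign still had to move between this day and the election.
        movement = [final - polls[day]
                    for polls, final in zip(poll_margins, final_margins)]
        sds.append(stdev(movement))
    return sds

# Made-up numbers: three campaigns, two polling days each (assumption, not real data).
example_polls = [[4.0, 6.0], [-2.0, 1.0], [10.0, 3.0]]
example_finals = [5.0, 0.0, 4.0]
example_sd = movement_sd(example_polls, example_finals)
```

With real data, each inner list would hold one campaign's daily margin, and the result would be the half-width of the red band for each day.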

This year, January 1st was 312 days before the election. At earlier dates, the standard deviation is between 14 and 22 percentage points. You can see the variation across 16 Presidential campaigns in the gray traces. So polls before the new year really are quite uninformative.

Now look at later dates: the gray curves converge. Consequently, the standard deviation declines, and reaches a local minimum at 270 days before the election, in mid-February, close to the start of primary season. So before the primaries start, February is a time when national polls tell us a fair amount about the final outcome.

But wait! After that, the standard deviation creeps upward. The election is 169 days from now, and in about a week the standard deviation hits its maximum value for 2016. Truly, now is the single worst time to be paying attention to fresh polling data. I don’t know why this is. It could be because typically, one or both parties are still going through an active nomination contest – as Hillary Clinton and Bernie Sanders are doing now.

Amusingly, national polls won’t reach their February levels of accuracy until August. The Clinton-Trump margin in February was Clinton +5.0%. So how about we just use that until after the conventions? Can you wait?

No? Okay, let’s do something else. There are currently 88 national polls for 2016. We can weight these to create the best possible estimate for the November Clinton-Trump margin. For the weight, use 1/sigma for the corresponding date on the graph above. For independent observations, this weighted sum is optimal. Applied to past elections, it favors the November winner in 14 out of 16 elections (missing Reagan in 1980 and Bush in 2000), an accuracy rate of 87.5%. This year, it gives us a weighted-average margin of Clinton +6.5%.
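The weighting step can be sketched as follows, assuming each poll comes with a margin and the sigma read off the curve for its date. The function name, the `power` parameter, and the sample numbers are hypothetical. The default follows the 1/sigma weights described above; `power=2` gives the textbook inverse-variance weights for comparison.

```python
def weighted_margin(margins, sigmas, power=1):
    """Weighted average of poll margins, with weight 1 / sigma**power for each poll.

    margins: Clinton-minus-Trump margins in percentage points, one per poll.
    sigmas:  standard deviation (in points) for each poll's date.
    power:   1 reproduces the 1/sigma weighting in the text; 2 is inverse-variance.
    """
    weights = [1.0 / s ** power for s in sigmas]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)

# Two made-up polls (assumption): the poll taken when sigma is smaller counts for more.
example = weighted_margin([2.0, 6.0], [10.0, 5.0])
```

In this toy case the weights are 0.1 and 0.2, so the estimate sits closer to the second poll than a simple average would.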

In short, we have a situation in which today’s snapshot (Clinton +2.7%) shows a close race with a definitive Clinton lead (93% probability according to HuffPollster), and the November outlook shows a larger average expected lead (Clinton +6.5%), but a lower win probability* of 70% – the same as what I wrote the other day.

>>>

Here are some caveats and consequences that come to mind:

1) My analysis today implies that the current movement in polls is transient. If uncertainty is larger now, this suggests that there is some natural set point for the Clinton-Trump contest – one where we had a clearer picture a few months ago than we would by watching today’s news.

My general sense of the current state of the race is that Democrats are still in the midst of their nomination process, while Republicans are coming together around their nominee. Either of these dynamics would be enough for polls to become less accurate – and to favor the candidate whose nomination is settled. If true, then we might expect numbers to move back toward Clinton after the June 7th primaries. Also possible, though less likely, is continued movement toward Trump.

2) It seems to me that during periods of increasing uncertainty, it is best to incorporate older polls, on the grounds that these data points add information and decrease uncertainty. Conversely, starting at 160 days before the election (early June), I should switch to a rolling time window, since at this point polls are becoming increasingly predictive.

3) Now is a time to pay attention to non-poll-based methods. As longtime readers know, I am generally against mixing up polls and “fundamentals”-based models. But it is a good time to consider the possibility of looking at them.

However, there are surprisingly few models worth looking at. Models are subject to conceptual and technical errors. And very few fundamentals-based models have well-understood error properties. One exception: Lauderdale and Linzer did a particularly good job in 2012. At that time, they estimated that the national vote share in their model had a 95% confidence interval of +/-7%. In the units I plotted in the red curve above (+/- 1 sigma in two-candidate margin), this probably corresponds to about +/-7%: halving the 95% interval gives roughly one sigma of +/-3.5% in vote share, and doubling to convert vote share to margin brings it back to about +/-7%. If true, that approach would be better than polls from now through August. However, to my knowledge, Linzer (who now does analysis at Daily Kos Elections) has not come out with a public calculation this year. And so I wait.

4) (added Tuesday May 24th, 9:00am): In comments, Amit Lath points out that the 2000-2012 campaigns were less volatile than 1952-1996, so maybe that should be the baseline. As it turns out, that does not affect the November prediction very much. See my response here.

*To calculate a probability, note that the weighted-average value of sigma during the time period of January 1 to now is 11.1%. The probability is calculated in MATLAB as prob=tcdf(clinton_trump_margin/11.1,3). In Excel: =1-TDIST(clinton_trump_margin/11.1,3,1).
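For readers without MATLAB or Excel, here is a minimal Python sketch of the same calculation using only the standard library. It relies on the closed-form CDF of Student's t distribution with 3 degrees of freedom; the function names are hypothetical, and the default sigma of 11.1 is the value quoted above.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi

def win_probability(margin, sigma=11.1):
    """Probability that the November margin is positive, given an expected margin.

    margin: expected Clinton-minus-Trump margin in percentage points.
    sigma:  uncertainty in the same units (11.1 is the value used in the footnote).
    """
    return t_cdf_3df(margin / sigma)

# The weighted-average margin from the text, Clinton +6.5%.
p = win_probability(6.5)
```

For a margin of 6.5 this comes out at about 0.70, matching the 70% quoted in the post; a margin of zero gives exactly 0.5.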