The State Of The Polls, 2019

Polls just had one of their best election cycles ever, but challenges abound in the industry

Much maligned for their performance in the 2016 general election — and somewhat unfairly so, since the overall accuracy of the polls was only slightly below average that year by historical standards — American election polls have been quite accurate since then. Their performance was very strong in the 2018 midterms, despite the challenge of having to poll dozens of diverse congressional districts around the country, many of which had not had a competitive election in years. Polls have also generally been accurate in the various special elections and off-year gubernatorial elections that have occurred since 2016, even though those are also often difficult races to poll.


Does that mean everything is looking up in the industry? Well, no. We’ll introduce some complications in a moment. But I do want to re-emphasize that opening takeaway, since the media is just flatly wrong when it asserts that the polls can’t be trusted. In fact, American election polls are about as accurate as they’ve always been. That doesn’t mean polls will always identify the right winner, especially in close elections. (As a simple rule of thumb, we’ve found polls “call” the right winner 80 percent of the time, meaning they fail to do so the other 20 percent of the time — although upsets are more likely to occur in some circumstances than others.) But the rate of upsets hasn’t changed much over time.

Before we go any further, I want to direct you to the latest version of FiveThirtyEight’s pollster ratings, which we’ve updated for the first time since May 2018. They include all polls in the three weeks leading up to every U.S. House, U.S. Senate and gubernatorial general election since then, including special elections, plus a handful of polls from past years that were missing from previous versions of our database. You can find much more detail on the pollster ratings here, including all the polls used in the ratings calculation. Our presidential approval ratings, generic congressional ballot and impeachment trackers have also been updated to reflect these new ratings, although they make little difference to the topline numbers.

Now then, for those complications: The main one is simply that response rates to traditional telephone polls continue to decline. In large part because of caller-ID and call-blocking technologies, it’s simply harder than it used to be to get people to answer phone calls from people they don’t know. In addition to potentially making polls less accurate, that also makes them more expensive, since a pollster has to spend more time making calls for every completed response that it gets. As a result, the overall number of polls has begun to slightly decline. Our pollster ratings database, which covers polls in the 21 days before an election, contains 532 polls associated with the elections on Nov. 6, 2018, down from 558 polls for Election Day 2014 and 692 polls for Election Day 2010.






So why not turn to online polls or other new technologies? Well, the problem is that in recent elections, polls that use live interviewers to call both landlines and cellphones continue to outperform other methods, such as online and automated (IVR) polls. Moreover, online and IVR polls are generally more prone toward herding — that is, making methodological choices, or picking and choosing which results they publish, in ways that make their polls match other, more traditional polls. So not only are online and automated polls somewhat less accurate than live-caller polls, but they’d probably suffer a further decline in accuracy if they didn’t have live polls to herd toward.

Still, online polling is undoubtedly a large part of polling’s future — and some online polling firms are more accurate than others. Among the most prolific online pollsters, for example, YouGov stands out for being more accurate than others such as Zogby, SurveyMonkey, and Harris Insights & Analytics. And many former IVR pollsters are now migrating to hybrid methods that combine automated phone polling with internet panels. In the 2018 elections, this produced better results in some cases (e.g., SurveyUSA) than in others (e.g., Rasmussen Reports).

Polls have been quite accurate — and unbiased — in post-2016 elections

Each time we update our pollster ratings, we publish a few charts that depict the overall health of the industry — so let’s go ahead and run the numbers again. The first chart is the one we consider to be the most important: the average error of polls broken down by the type of election. A few quick methodological notes:

By average error, I mean the difference between the margin projected by the poll and the actual election result. For instance, if the poll shows the Democrat up by 1 percentage point and the Republican wins by 2 points, that would be a 3-point error.

To avoid giving any one polling firm too much influence, the values in the chart are weighted by the square root of the number of polls a particular pollster conducted for that particular type of election in that particular cycle.

Polls that are banned by FiveThirtyEight because we know or suspect that they faked data are excluded from the analysis.

Note that I’ve included the handful of elections that have occurred so far in 2019 with the 2017-18 election cycle, even though we’ll classify them later as part of the 2019-20 cycle instead.
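The error definition and the weighting described in these notes can be sketched in a few lines of Python. This is a minimal illustration, not FiveThirtyEight's actual code: the function names and the tuple layout are assumptions, margins are expressed as Democrat minus Republican, and giving each pollster a total weight equal to the square root of its poll count is one plausible reading of the weighting note.

```python
from collections import defaultdict
from math import sqrt


def poll_error(poll_margin, actual_margin):
    """Absolute miss on the margin, in percentage points."""
    return abs(poll_margin - actual_margin)


def weighted_average_error(polls):
    """Average error with each pollster weighted by sqrt(number of its polls).

    `polls` is a list of (pollster, poll_margin, actual_margin) tuples
    (a hypothetical layout chosen for this sketch).
    """
    errors_by_pollster = defaultdict(list)
    for pollster, poll_margin, actual_margin in polls:
        errors_by_pollster[pollster].append(poll_error(poll_margin, actual_margin))

    weighted_sum = 0.0
    total_weight = 0.0
    for errors in errors_by_pollster.values():
        weight = sqrt(len(errors))  # prolific pollsters count more, but not linearly
        weighted_sum += weight * (sum(errors) / len(errors))
        total_weight += weight
    return weighted_sum / total_weight


# Example from the text: Democrat +1 in the poll, Republican wins by 2.
print(poll_error(1, -2))
```

With this scheme, a firm that releases four polls carries twice the weight of a firm that releases one, rather than four times the weight.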

OK, here’s the data:

Post-2016 polls have been accurate by historical standards
Weighted-average error of polls in final 21 days before the election, among polls in FiveThirtyEight’s Pollster Ratings database

Cycle | Governor | U.S. Senate | U.S. House | Pres. General | Pres. Primary | Combined
1998 | 8.2 | 7.4 | 6.8 | - | - | 7.6
1999-2000 | 4.9 | 6.1 | 4.4 | 4.4 | 7.6 | 5.5
2001-02 | 5.2 | 4.9 | 5.4 | - | - | 5.2
2003-04 | 6.0 | 5.6 | 5.4 | 3.2 | 7.1 | 4.8
2005-06 | 5.0 | 4.2 | 6.5 | - | - | 5.3
2007-08 | 4.1 | 4.7 | 5.7 | 3.6 | 7.4 | 5.4
2009-10 | 4.9 | 4.8 | 6.9 | - | - | 5.7
2011-12 | 4.9 | 4.7 | 5.1 | 3.6 | 8.9 | 5.2
2013-14 | 4.6 | 5.5 | 6.5 | - | - | 5.4
2015-16 | 5.4 | 5.0 | 5.5 | 4.8 | 10.1 | 6.7
2017-19 | 5.3 | 4.3 | 5.0 | - | - | 5.0
All years | 5.4 | 5.3 | 6.1 | 4.0 | 8.7 | 5.8

Averages are weighted by the square root of the number of polls that a particular pollster conducted for that particular type of election in that particular cycle. Polls that are banned by FiveThirtyEight because we know or suspect they faked data are excluded from the analysis.

As I said, the 2017-19 cycle was one of the most accurate on record for polling. The average error of 5.0 points in polls of U.S. House elections is the second-best in our database, trailing only 1999-2000. The 4.3-point error associated with U.S. Senate elections is also the second-best, slightly trailing 2005-06. And gubernatorial polls had an average error of 5.3 points, which is about average by historical standards.

Combining all different types of elections together, we find that polls from 2017 onward have been associated with an average error of 5.0 points, which is considerably better than the 6.7-point average for 2015-16, and the best in any election cycle since 2003-04.

But note that there’s just not much of an overall trajectory — upward or downward — in polling accuracy. Relatively strong cycles for the polls can be followed by relatively weak ones, and vice versa.

One more key reminder now that the Iowa caucuses are only three months away: Some types of elections are associated with considerably larger polling errors than others. In particular, presidential primaries feature polling that is often volatile at best, and downright inaccurate at worst. Overall, presidential primary polls in our database mispredict the final margin between the top two candidates by an average of 8.7 points. And the error was even worse, 10.1 points, in the 2016 primary cycle. Leads of 10 points, 15 points or sometimes more are not necessarily safe in the primaries.

We can also look at polling accuracy by simply counting up how often the candidate leading in the poll wins his or her race. This isn’t our preferred method, as it’s a bit simplistic — if a poll had the Republican ahead by 1 point and the Democrat won by 1 point, that’s a much more accurate result than if the Republican had won by 20, even though it would have incorrectly identified the winner. But across all polls in our database, the winner was “called” 79 percent of the time.

Polls “call” the winner right 79 percent of the time
Weighted-average share of polls that correctly identified the winner in final 21 days before the election, among polls in FiveThirtyEight’s Pollster Ratings database

Cycle | Governor | U.S. Senate | U.S. House | Pres. General | Pres. Primary | Combined
1998 | 86% | 86% | 57% | - | - | 78%
1999-2000 | 80% | 80% | 56% | 68% | 95% | 76%
2001-2002 | 87% | 87% | 77% | - | - | 82%
2003-2004 | 76% | 76% | 69% | 78% | 94% | 79%
2005-2006 | 89% | 89% | 71% | - | - | 83%
2007-2008 | 95% | 95% | 83% | 94% | 80% | 88%
2009-2010 | 85% | 85% | 75% | - | - | 82%
2011-2012 | 90% | 90% | 70% | 81% | 63% | 77%
2013-2014 | 80% | 80% | 76% | - | - | 77%
2015-2016 | 68% | 68% | 57% | 71% | 86% | 77%
2017-2019 | 77% | 77% | 78% | - | - | 76%
All years | 82% | 82% | 72% | 79% | 83% | 79%

Pollsters get half-credit if they show a tie for the lead and one of the leading candidates wins. Averages are weighted by the square root of the number of polls that a particular pollster conducted for that particular type of election in that particular cycle. Polls that are banned by FiveThirtyEight because we know or suspect they faked data are excluded from the analysis.
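The "call" scoring, including the half-credit rule in the table note, can be sketched as a small function. This is an illustrative simplification that assumes a two-way race with margins expressed as Democrat minus Republican and no exactly tied election result; the function name is hypothetical.

```python
def call_score(poll_margin, actual_margin):
    """Credit for 'calling' the winner: 1.0 right, 0.0 wrong, 0.5 for a tied poll.

    Assumes actual_margin is never exactly zero (no tied elections).
    """
    if poll_margin == 0:
        return 0.5  # tie for the lead earns half-credit
    poll_leader_is_dem = poll_margin > 0
    winner_is_dem = actual_margin > 0
    return 1.0 if poll_leader_is_dem == winner_is_dem else 0.0


# Republican +1 in the poll, Democrat wins by 1: a 2-point miss, but a wrong call.
print(call_score(-1, 1))
# Republican +1 in the poll, Republican wins by 20: a 19-point miss, but a right call.
print(call_score(-1, -20))
```

The two examples mirror the point made above: the wrong call is by far the more accurate poll, which is why this measure is simplistic.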

In recent elections, the winning percentage has been slightly below the long-term average: it was 76 percent in 2017-19. But this reflects both the recent uptick in close elections and the tendency of resource-constrained pollsters to poll those close races more heavily.

As basic as this analysis is, it’s essential to remember that polls are much more likely to misidentify the winner when they show a close race. Polls in our database that showed a lead of 3 percentage points or less identified the winner only 58 percent of the time, a bit better than random chance, but not much better. But polls showing a 3- to 6-point lead were right 72 percent of the time, and those with a 6- to 10-point lead were right 86 percent of the time. (Errors in races showing double-digit leads are quite rare in general elections, although they occur with some frequency in primaries. And errors in races where one candidate leads by 20 or more points are once-in-a-blue-moon types of events, regardless of the type of election.)

Polls often misidentify the winner in a close race
Share of polls that correctly identified the winner in final 21 days before the election, among polls in FiveThirtyEight’s Pollster Ratings database

Leading candidate’s margin | Share of polls correctly identifying winner
0-3 points | 58%
3-6 points | 72%
6-10 points | 86%
10-15 points | 94%
15-20 points | 98%
≥20 points | >99%

Polls that are banned by FiveThirtyEight because we know or suspect they faked data are excluded from the analysis.
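A breakdown like this one amounts to grouping polls by the size of the lead they showed and averaging the share of correct calls within each group. The sketch below is a hedged illustration: the bucket edges follow the table, but treating the lower edge as inclusive and the upper edge as exclusive is an assumption, as are the function names and the (poll margin, actual margin) input format.

```python
from collections import defaultdict

# Bucket edges in percentage points; low edge inclusive, high edge exclusive
# (the boundary handling is an assumption of this sketch).
BUCKETS = [(0, 3), (3, 6), (6, 10), (10, 15), (15, 20), (20, float("inf"))]


def bucket_for(lead):
    """Return the (low, high) bucket containing the absolute size of the lead."""
    lead = abs(lead)
    for low, high in BUCKETS:
        if low <= lead < high:
            return (low, high)
    raise ValueError("lead must be a finite number")


def hit_rate_by_bucket(polls):
    """Share of correct calls per bucket for a list of (poll_margin, actual_margin)."""
    credits = defaultdict(list)
    for poll_margin, actual_margin in polls:
        if poll_margin == 0:
            credit = 0.5  # tied poll: half-credit
        else:
            credit = 1.0 if (poll_margin > 0) == (actual_margin > 0) else 0.0
        credits[bucket_for(poll_margin)].append(credit)
    return {bucket: sum(values) / len(values) for bucket, values in credits.items()}


# Made-up polls, purely to show the mechanics.
print(hit_rate_by_bucket([(2, -1), (1, 4), (8, 3), (-25, -30)]))
```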

Another essential measure of polling accuracy is statistical bias, that is, whether the polls tend to miss in the same direction. We’re particularly interested in understanding whether polls systematically favor Democrats or Republicans. Take the polls in 2016, for instance. Although they weren’t that bad from an accuracy standpoint, the majority underestimated President Trump and Republicans running for Congress and governor, leading them to underestimate how well Trump would do in the Electoral College. Overall in the 2015-16 cycle, polls had a Democratic bias (meaning they overestimated Democrats and underestimated Republicans) of 3.0 percentage points. And that came after a 2013-14 cycle when polls also had a Democratic bias (of 2.7 percentage points).

Polling bias is not very consistent from cycle to cycle
Weighted-average statistical bias of polls in final 21 days of the election, among polls in FiveThirtyEight’s Pollster Ratings database

Cycle  Governor  U.S. Senate  U.S. House  Pres. General  Combined
1998  R+5.7  R+4.8  R+1.5  R+4.2
1999-2000  D+0.6  R+2.9  D+0.9  R+2.6  R+1.8
2001-2002  D+3.0  D+1.4  D+1.3  D+2.2
2003-2004  R+4.2  D+1.7  D+2.5  D+1.1  D+0.9
2005-2006  D+0.3  R+1.3  D+0.2  R+0.1
2007-2008  D+0.5  D+0.8  D+1.0  D+1.1  D+1.0
2009-2010  R+0.7  D+1.7  D+0.6
2011-2012  R+1.3  R+3.3  R+2.6  R+2.5  R+2.6
2013-2014  D+2.3  D+2.5  D+3.7  D+2.7
2015-2016  D+3.3  D+2.8  D+3.7  D+3.1  D+3.0
2017-2019  R+0.9  D+0.1  R+0.3  R+0.3
All years  D+0.3  D+0.1  D+0.7  D+0.2  D+0.3

Bias is calculated only for elections where the top two finishers were a Republican and Democrat. Therefore, it is not calculated for presidential primaries. Averages are weighted by the square root of the number of polls that a particular pollster conducted for that particular type of election in that particular cycle. Polls that are banned by FiveThirtyEight because we know or suspect they faked data are excluded from the analysis.
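The difference between bias and average error is that bias keeps the sign of each miss, so misses in opposite directions cancel. A minimal sketch, assuming margins are expressed as Democrat minus Republican and using a plain unweighted mean rather than the square-root weighting described in the table note (the function names are hypothetical):

```python
def signed_error(poll_margin, actual_margin):
    """Positive values mean the poll overstated the Democrat."""
    return poll_margin - actual_margin


def mean_bias(polls):
    """Unweighted mean signed error for a list of (poll_margin, actual_margin)."""
    errors = [signed_error(p, a) for p, a in polls]
    return sum(errors) / len(errors)


def mean_absolute_error(polls):
    """Unweighted mean absolute error for the same list."""
    errors = [abs(signed_error(p, a)) for p, a in polls]
    return sum(errors) / len(errors)


def format_bias(value):
    """Render a signed bias in the D+x.x / R+x.x style used in the table."""
    party = "D" if value >= 0 else "R"
    return f"{party}+{abs(value):.1f}"


# Two made-up polls that miss by 4 points in opposite directions:
polls = [(5, 1), (-3, 1)]
print(format_bias(mean_bias(polls)), mean_absolute_error(polls))
```

In this toy case the polls are off by 4 points on average yet show no bias at all, which is why the two measures are reported separately.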

In 2017-19, however, polls had essentially no partisan bias, and to the extent there was one, it was a very slight bias toward Republicans (0.3 percentage points). And that’s been the long-term pattern: Whatever bias there is in one batch of election polls doesn’t tend to persist from one cycle to the next. The Republican bias in the polls in 2011-12, for instance, which tended to underestimate then-President Obama’s re-election margins, was followed by two cycles of Democratic bias in 2013-14 and 2015-16, as previously mentioned. There is simply not much point in trying to guess the direction of poll bias ahead of time; if anything, it often seems to go against what the conventional wisdom expects. Instead, you should always be prepared for the possibility of systematic polling errors of several percentage points in either direction.

Which pollsters have been most accurate in recent elections?

Although it can be dangerous to put too much stock in the performance of a pollster in a single election cycle — it takes dozens of polls to reliably assess a pollster’s accuracy — it’s nonetheless worth briefly remarking on the recent performance of some of the more prolific ones. Below, you’ll find the average error, statistical bias and a calculation we call Advanced Plus-Minus (basically, how the pollster’s average error compares to other pollsters’ in the same election), for pollsters with at least five polls in our database for the 2017-19 cycle. Note that negative Advanced Plus-Minus scores are good; they indicate that a firm’s polls were more accurate than others in the same races.

How prolific pollsters have fared in recent elections
Advanced Plus-Minus scores and other metrics for pollsters who conducted at least five surveys for the 2017-19 cycle, in FiveThirtyEight’s Pollster Ratings database

Pollster | Methodology | No. of Polls | Avg. Error | Bias | Adv. Plus-Minus
ABC News/Washington Post | Live | 5 | 1.7 | R+0.9 | -4.1
Cygnal | IVR/Online/Live | 9 | 2.5 | D+1.9 | -3.7
Mason-Dixon Polling & Research Inc. | Live | 7 | 2.8 | R+1.0 | -3.0
Monmouth University | Live | 9 | 3.1 | R+1.7 | -2.9
Suffolk University | Live | 7 | 2.7 | R+1.3 | -2.7
Research Co. | Online | 20 | 3.8 | R+1.1 | -2.3
Mitchell Research & Communications | IVR/Online | 6 | 2.5 | R+0.9 | -2.0
Siena College/New York Times Upshot | Live | 47 | 3.6 | R+1.3 | -1.7
Emerson College | IVR/Online | 66 | 4.2 | R+0.5 | -1.5
Marist College | Live | 13 | 4.4 | D+2.7 | -1.1
Landmark Communications | IVR/Online/Live | 5 | 4.1 | D+3.9 | -1.0
YouGov | Online | 12 | 3.1 | R+1.7 | -1.0
SurveyUSA | IVR/Online/Live | 13 | 4.1 | R+0.7 | -1.0
Gravis Marketing | IVR/Online/Live | 25 | 3.8 | D+0.6 | -0.8
Harris Insights & Analytics | Online | 34 | 3.7 | R+0.2 | -0.2
Vox Populi Polling | IVR/Online | 7 | 4.5 | D+3.6 | +0.0
St. Pete Polls | IVR | 10 | 2.3 | D+1.7 | +0.0
Fox News/Anderson Robbins Research/Shaw & Co. Research | Live | 10 | 4.7 | D+2.7 | +0.0
Remington Research Group | IVR/Live | 5 | 4.1 | D+3.1 | +0.3
Change Research | Online | 57 | 5.5 | D+1.5 | +0.6
Quinnipiac University | Live | 13 | 4.3 | D+2.7 | +0.7
JMC Analytics/Bold Blue Campaigns | Live | 5 | 6.7 | R+5.5 | +0.9
SSRS | Live | 11 | 5.2 | D+4.3 | +0.9
Optimus | IVR/Online/Live/Text | 5 | 6.8 | R+6.8 | +0.9
Strategic Research Associates | Live | 5 | 5.0 | D+1.9 | +1.0
Susquehanna Polling & Research Inc. | IVR/Live | 6 | 8.6 | D+8.0 | +1.4
Trafalgar Group | IVR/Online/Live | 21 | 4.6 | R+1.9 | +1.6
Ipsos | Online | 10 | 5.3 | R+3.0 | +2.2
Rasmussen Reports/Pulse Opinion Research | IVR/Online | 5 | 6.1 | R+5.8 | +3.2
Carroll Strategies | IVR | 5 | 9.9 | R+9.9 | +3.4
Dixie Strategies | IVR/Live | 5 | 8.4 | R+5.9 | +3.8
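The idea behind a plus-minus score, comparing a pollster's error with the error of other pollsters in the same race, can be illustrated with a deliberately simplified calculation. This is not the Advanced Plus-Minus formula, which the article describes only in rough terms; the function name, the input layout and the plain averaging are all assumptions made for the sketch.

```python
from collections import defaultdict


def simple_plus_minus(polls, pollster):
    """Average of (own error - mean error of other pollsters in the same race).

    `polls` is a list of (pollster, race, error) tuples (hypothetical layout).
    Negative results mean the pollster beat its peers. Returns None when the
    pollster never shares a race with anyone else.
    """
    by_race = defaultdict(list)
    for name, race, error in polls:
        by_race[race].append((name, error))

    differences = []
    for entries in by_race.values():
        own = [error for name, error in entries if name == pollster]
        others = [error for name, error in entries if name != pollster]
        if not own or not others:
            continue  # no peer comparison possible in this race
        baseline = sum(others) / len(others)
        differences.extend(error - baseline for error in own)

    return sum(differences) / len(differences) if differences else None


# Made-up errors, purely to show the mechanics.
sample = [
    ("A", "race1", 2.0), ("B", "race1", 5.0), ("C", "race1", 7.0),
    ("A", "race2", 4.0), ("B", "race2", 4.0),
]
print(simple_plus_minus(sample, "A"))
```

Comparing within the same race is what keeps a pollster from being punished for surveying contests that were hard for everyone.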

Four of the top five and six of the top 10 pollsters according to this metric were exclusively live-caller telephone pollsters. In exciting news for fans of innovative polling, the list includes polls from our friends at The New York Times’s Upshot, who launched an extremely successful and accurate polling collaboration with Siena College in 2016. (It also includes ABC News, FiveThirtyEight’s corporate parent, which usually conducts its polls jointly with The Washington Post.)

Conversely, five of the six worst-performing pollsters, including firms such as Carroll Strategies, Dixie Strategies, and Rasmussen Reports/Pulse Opinion Research, were IVR pollsters (sometimes in conjunction with other methods), several of which had strong Republican leans in 2017-19. Some IVR pollsters did perform reasonably well in 2015-16, a cycle where most pollsters underestimated Republicans. In retrospect, though, that may have been a case of two wrongs making a right; IVR polls tend to be Republican-leaning, so they’ll look good in years where Republicans beat their polls, but they’ll often be among the worst polls otherwise.

Indeed, aggregating the pollsters by methodology confirms that live-caller polls continue to be the most accurate. Below are the aggregate scores for the three major categories of polls (live caller, online, and IVR) by our Advanced Plus-Minus metric, average error and statistical bias.

Live-caller polls have been most accurate in recent elections
Advanced Plus-Minus scores and other metrics for pollsters who conducted at least five surveys for the 2017-19 cycle, in FiveThirtyEight’s Pollster Ratings database

Methodology | No. of Polls | Avg. Error | Bias | Adv. Plus-Minus
Live caller w/cell | 356 | 4.9 | R+0.5 | -0.3
Live caller w/cell only | 210 | 4.4 | R+0.2 | -0.8
Live caller w/cell hybrid | 146 | 5.5 | R+0.9 | +0.4
IVR | 239 | 5.2 | R+1.0 | +0.3
IVR only | 19 | 6.9 | R+5.4 | +2.4
IVR hybrid | 220 | 5.0 | R+0.4 | +0.1
Online or text | 358 | 5.0 | R+0.4 | +0.2
Online or text only | 154 | 5.0 | D+0.4 | +0.5
Online or text hybrid | 204 | 5.0 | R+0.8 | +0.1
All polls | 628 | 5.0 | R+0.3 | +0.0

Averages are weighted by the square root of the number of polls that a particular pollster conducted for that particular type of election in that particular cycle. Polls that are banned by FiveThirtyEight because we know or suspect they faked data are excluded from the analysis.

The differences are clearest when looking at pollsters that exclusively used one method. Polls that exclusively used live callers (including calling cellphones) had an average error of 4.4 percentage points in the 2017-19 cycle, as compared to 5.0 points for polls exclusively conducted online or via text message, and 6.9 points for polls that exclusively used IVR. (Pure IVR polls, however, are now quite rare. Polls that used a hybrid of IVR and other methods did better, with an average error of 5.0 percentage points.)

Polling firms that are members of professional polling organizations that push for transparency and other best practices also continue to outperform those that aren’t. In particular, our pollster ratings give credit to firms that support the American Association for Public Opinion Research (AAPOR) Transparency Initiative, belong to the National Council on Public Polls (NCPP), or contribute data to the Roper Center archive. Pollsters that are part of one or more of these initiatives had an average error of 4.3 percentage points in the 2017-19 cycle, as compared to 5.4 percentage points for those that aren’t.

Another way to detect herding

Our pollster ratings have also long included an adjustment to account for the fact that online and automated polls tend to perform better when there are high-quality polls in the field. We’ve confirmed that this still applies. For instance, polls that are conducted online or via IVR are about 0.4 percentage points more accurate based on our Advanced Plus-Minus metric when their polls are preceded by “gold standard” polls in the same race. (“Gold standard” is the term we use for pollsters that are exclusively live caller with cellphones and are also AAPOR/NCPP/Roper members.) Live-caller polls do not exhibit the same pattern, however; their Advanced Plus-Minus score is unaffected by the existence of an earlier “gold standard” poll in the field. This is probably the result of herding; some of the lower-quality pollsters may be doing the equivalent of peeking at their more studious classmate’s answers in a math test. In fact, these differences are especially strong in recent elections, suggesting that herding has become more of a problem.

There is also a second, more direct method to detect herding, which we’re also now applying in our pollster ratings. Namely, as described in this story, there is a minimum distance that a poll should be from the average of previous polls based on sampling error alone. For instance, even if you knew that a candidate was ahead 48-41 in a particular race (a 7-point lead), you’d miss that margin by an average of about 5 percentage points in a 600-person poll, because sampling only 600 people rather than the entire population introduces sampling error. That is, because of sampling error, some polls would inevitably show a 12-point lead and some would show a 2-point lead, instead of all the polls being bunched together at a 6- or 7- or 8-point lead exactly. If the polls are very tightly bunched together, this is not a good thing: you should be suspicious of herding, which can sometimes yield embarrassing outcomes where every poll gets the answer wrong.
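The sampling-error argument can be checked with a small simulation. The sketch below assumes simple random sampling from an electorate split 48-41, with the remainder undecided or backing other candidates; the function name, trial count and seed are illustrative choices, not anything taken from FiveThirtyEight's methodology. Under these idealized assumptions the average miss comes out near 3 points. Real polls also involve weighting and design effects that this sketch ignores and that widen the spread, so figures quoted for actual polls can be larger.

```python
import random


def simulate_average_miss(share_a=0.48, share_b=0.41, sample_size=600,
                          trials=2000, seed=538):
    """Average absolute miss on the margin (in points) from sampling error alone."""
    rng = random.Random(seed)
    true_margin = (share_a - share_b) * 100
    outcomes = ["A", "B", "other"]
    weights = [share_a, share_b, 1 - share_a - share_b]

    total_miss = 0.0
    for _ in range(trials):
        # Draw one simulated poll of `sample_size` respondents.
        sample = rng.choices(outcomes, weights=weights, k=sample_size)
        margin = (sample.count("A") - sample.count("B")) / sample_size * 100
        total_miss += abs(margin - true_margin)
    return total_miss / trials


print(round(simulate_average_miss(), 2))
```

The point of the exercise is that even with perfect methodology, honest polls of the same race should scatter by several points; a cluster much tighter than that is the warning sign.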

Of course, there are other complications in the real world. There’s no guarantee that the race will have been static since other pollsters surveyed the race; one candidate may be losing or gaining ground. And pollsters have healthy methodological disagreements from one another, so the same race may look different depending on what assumptions they make about turnout and so forth. But these should tend to increase the degree to which polls differ from each other, and not produce herding.

But our herding penalty only applies if pollsters show too little variation from the average of previous polls of the race based on sampling error alone. If a pollster is publishing all its data without being influenced by other pollsters — including its supposed outliers — it should be fairly easy to avoid this penalty over the long run.

Many polls are closer to the average of previous polls than they “should” be, however. Unlike the previous type of herding I described, which is concentrated among lower-quality pollsters who are essentially trying to draft off their neighbors to get better results, this tendency appears among some higher-quality pollsters as well. In some cases, we suspect, this is because, late in the race, a pollster doesn’t want to deal with the media firestorm that would inevitably ensue if it published a poll that appears to be an outlier. In other cases, frankly, we suspect that pollsters rather explicitly look at the FiveThirtyEight or RealClearPolitics polling average and attempt to match it.

In any event, our formula now detects this type of herding, and it results in a lower pollster rating when we catch it. Our pollster ratings spreadsheet now calculates each pollster’s Average Distance from Polling Average, or ADPA, which is how much the pollster’s average poll differs from the average of previous polls of that race. Among pollsters with at least 15 polls, the largest herding penalties are as follows:

Which pollsters show the clearest signs of herding?

Pollster | Herding Penalty
Research Co. | 1.17
Muhlenberg College | 0.84
Angus Reid Global | 0.82
Grove Insight | 0.71
NBC News/Wall Street Journal | 0.53

The list is limited to pollsters with at least 15 polls for which an average of previous polls can be computed.
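An ADPA-style calculation can be sketched as follows. The article defines ADPA only in words, so several details here are assumptions: polls are supplied in chronological order, the "average of previous polls" is a simple mean that includes every earlier poll of the race, and the penalty is modeled as the shortfall between an expected minimum distance and the observed ADPA. The function names and the `expected_distance` input are hypothetical.

```python
def average_distance_from_polling_average(polls, pollster):
    """Mean |poll margin - mean of earlier polls in the same race| for one pollster.

    `polls` is a chronological list of (race, pollster, margin) tuples.
    Polls with no earlier poll in their race are skipped.
    """
    earlier_by_race = {}
    distances = []
    for race, name, margin in polls:
        earlier = earlier_by_race.setdefault(race, [])
        if name == pollster and earlier:
            distances.append(abs(margin - sum(earlier) / len(earlier)))
        earlier.append(margin)
    return sum(distances) / len(distances) if distances else None


def herding_shortfall(adpa, expected_distance):
    """Hypothetical penalty: how far ADPA falls below the expected minimum distance."""
    return max(0.0, expected_distance - adpa)


# Made-up margins, purely to show the mechanics.
sample = [
    ("race1", "X", 6.0), ("race1", "Y", 8.0), ("race1", "Z", 7.0),
    ("race2", "X", -2.0), ("race2", "Z", -1.0),
]
adpa = average_distance_from_polling_average(sample, "Z")
print(adpa, herding_shortfall(adpa, expected_distance=3.0))
```

A pollster whose results vary at least as much as sampling error implies would have a shortfall of zero under this toy rule, consistent with the statement that the penalty is easy to avoid for firms publishing all their data.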

Other methodological changes

Unless you’re really into details — or you’re a pollster! — you probably aren’t going to care about these … but there are a few other methodological changes we’ve made to our pollster ratings this year.

Previously, pollsters got a bonus if they exclusively conducted their polls via live callers with cellphones, since these have been the most accurate polls over time. But this year, if a pollster uses live-caller-with-cellphone polls in combination with other methodologies, we now give them partial credit for the live-caller bonus. Even though these hybrid polls did not have a particularly good performance in 2017-19, they’ve been reasonably strong in the long run; also, we’re bowing to the reality that many formerly live pollsters are increasingly incorporating online or other methods into their repertoire.

In determining whether a poll’s result fell into or outside the margin of error, a calculation that’s available in our spreadsheet, we now use a more sophisticated margin of error formula that accounts for the percentages of the top two candidates and not just the distance between them. The margin of error is smaller in lopsided races, e.g., when one candidate leads 70-20.
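One standard textbook formula with this property treats the two candidates' shares as draws from the same sample and computes the margin of error on the lead directly. Whether this is the exact formula in the spreadsheet is not stated here, so treat the sketch below as an assumption; it also assumes simple random sampling and a 95 percent confidence level, and the function name is hypothetical.

```python
from math import sqrt


def margin_of_error_on_lead(share_a, share_b, sample_size, z=1.96):
    """Margin of error (percentage points) on the lead, share_a minus share_b.

    Uses Var(lead) = (a + b - (a - b)**2) / n for shares from one sample.
    """
    variance = (share_a + share_b - (share_a - share_b) ** 2) / sample_size
    return z * sqrt(variance) * 100


close_race = margin_of_error_on_lead(0.48, 0.41, 600)
lopsided_race = margin_of_error_on_lead(0.70, 0.20, 600)
print(round(close_race, 1), round(lopsided_race, 1))
```

With the same 600-person sample, the 70-20 race yields a smaller margin of error on the lead than the 48-41 race, matching the direction described above.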

Our Predictive Plus-Minus scores and pollster letter grades are based on a combination of a pollster’s empirical performance (how accurate it has been in the past) and its methodological characteristics. The more polls a firm has conducted, the more the formula weights its performance rather than its methodological prior. In assigning the weights, our formula now considers how recent a particular firm’s polls were. In other words, if a pollster has conducted a lot of surveys recently, its empirical accuracy will be more heavily weighted. But if most of its polling is in the distant past, its pollster rating will gradually revert toward the mean based on its methodology.
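The general shape of such a blend, where more and more recent polls shift weight from the methodological prior toward the empirical record, can be sketched as below. Everything specific in this sketch is hypothetical: the exponential recency discount, the `half_life_years` and `prior_strength` parameters and the function name are illustrative choices, not the actual Predictive Plus-Minus formula.

```python
def blended_rating(empirical_score, prior_score, poll_ages_years,
                   half_life_years=8.0, prior_strength=20.0):
    """Blend an empirical score with a methodology-based prior.

    Each poll counts as 0.5 ** (age / half_life_years) effective polls, so
    older polls count less. The empirical weight is
    effective_polls / (effective_polls + prior_strength).
    All parameter values are hypothetical.
    """
    effective_polls = sum(0.5 ** (age / half_life_years) for age in poll_ages_years)
    weight = effective_polls / (effective_polls + prior_strength)
    return weight * empirical_score + (1 - weight) * prior_score


# Same 40-poll record (-1.0 is good), recent versus 16 years old,
# against a mediocre methodological prior of +0.5:
print(blended_rating(-1.0, 0.5, [0] * 40))
print(blended_rating(-1.0, 0.5, [16] * 40))
```

The older record ends up closer to the prior, which is the reversion toward the methodological mean described above.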

For pollsters with a relatively small sample of polling, we now show a provisional rating rather than a precise letter grade. (An “A/B” provisional rating means that the pollster has shown strong initial results, a “B/C” rating means it has average initial results, and a “C/D” rating means below-average initial results.) It now takes roughly 20 recent polls (or a larger number of older polls) for a pollster to get a precise pollster rating.

That’s all for now! Once again, you can find an interactive version of the pollster ratings here, and a link with further detail on them here. And if you have questions about the pollster ratings, you can always reach us here. Good luck to pollsters on having a strong performance in the primaries.