As FiveThirtyEight has evolved over the past 10 years, we’ve taken an increasingly “macro” view of polling. By that, I mean: We’re more interested in how the polls are doing overall — and in broad trends within the polling industry — and less in how individual polls or pollsters are performing. As we described in an article earlier this week, overall the polls are doing … all right. Contrary to the narrative about the polls, polling accuracy has been fairly constant over the past couple of decades in the U.S. and other democratic countries.

Still, in election coverage, the “micro” matters too, and our newly updated pollster ratings — in which we evaluate the performance of individual polling firms based on their methodology and past accuracy — are still a foundational part of FiveThirtyEight. They figure into the algorithms that we design to measure President Trump’s approval ratings and to forecast elections (higher-rated pollsters get more weight in the projections). And sometimes those pollster ratings can reveal broad trends too: For example, after a reasonably strong 2012, online polls were fairly inaccurate in 2016.

The ratings also allow us to measure pollster performance over a large sample of elections — rather than placing a disproportionate amount of emphasis on one or two high-profile races. For instance, Rasmussen Reports deserves a lot of credit for its final, national poll of the 2016 presidential election, which had Hillary Clinton ahead by 2 percentage points, almost her exact margin of victory in the popular vote. But Rasmussen Reports polls are conducted by a Rasmussen spinoff called Pulse Opinion Research LLC, and state polls conducted by Rasmussen and Pulse Opinion Research over the past year or two have generally been mediocre.

So which pollsters have been most accurate in recent elections? Because some races are easier to poll than others, we created a statistic called Advanced Plus-Minus to evaluate pollster performance. It compares a poll’s accuracy to other polls of the same races and the same types of election. Advanced Plus-Minus also adjusts for a poll’s sample size and when the poll was conducted. (For a complete description, see here; we haven’t made any changes to our methodology this year.) Negative plus-minus scores are good and indicate that the pollster has had less error than other pollsters in similar types of races.
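
To make the core idea concrete, here is a minimal sketch of a plus-minus style comparison. It is a deliberate simplification, not FiveThirtyEight's actual Advanced Plus-Minus calculation: it only compares each poll's error with the average error of other firms' polls of the same race, and it leaves out the adjustments for sample size, timing and type of election. The function name, the field names and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def simple_plus_minus(polls, pollster):
    """Average of (own error - mean error of other firms' polls in the same race).

    Simplified illustration only; the real Advanced Plus-Minus also adjusts
    for sample size, poll timing and election type.
    """
    by_race = defaultdict(list)
    for p in polls:
        by_race[p["race"]].append(p)

    diffs = []
    for p in polls:
        if p["pollster"] != pollster:
            continue
        others = [q["error"] for q in by_race[p["race"]] if q["pollster"] != pollster]
        if not others:
            continue  # no competing polls in this race, so nothing to compare against
        diffs.append(p["error"] - mean(others))
    return mean(diffs) if diffs else None

# Hypothetical data: absolute error on the margin, in percentage points.
sample_polls = [
    {"pollster": "A", "race": "X", "error": 2.0},
    {"pollster": "B", "race": "X", "error": 4.0},
    {"pollster": "C", "race": "X", "error": 6.0},
    {"pollster": "A", "race": "Y", "error": 5.0},
    {"pollster": "B", "race": "Y", "error": 3.0},
]

score_a = simple_plus_minus(sample_polls, "A")
score_b = simple_plus_minus(sample_polls, "B")
```

In this toy data, firm B ends up with the lower (better) score because it matched or beat the other firms in both races, while firm A beat the field in one race and trailed it in the other.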

The table below contains Advanced Plus-Minus scores for the most prolific pollsters — those for whom we have at least 10 polls in our database for elections from Nov. 8, 2016 onward. These polls cover the 2016 general election along with any polling in special elections or gubernatorial elections since 2016.

How prolific pollsters have fared in recent elections
Advanced Plus-Minus scores for pollsters’ surveys conducted for elections on Nov. 8, 2016, and later

| Pollster | Methodology | No. of Polls | Avg. Error | Advanced Plus-Minus | Bias |
|---|---|---|---|---|---|
| Monmouth University | Live | 24 | 4.8 | -1.5 | D+3.9 |
| Emerson College | IVR | 51 | 4.1 | -1.0 | D+1.2 |
| Siena College | Live | 18 | 4.0 | -0.9 | D+1.5 |
| Landmark Communications | IVR/online | 14 | 4.4 | -0.6 | D+4.3 |
| Marist College | Live | 17 | 3.7 | -0.6 | D+1.5 |
| Lucid | Online | 14 | 2.6 | -0.4 | D+2.4 |
| SurveyUSA | IVR/online/live | 18 | 4.5 | -0.2 | D+1.0 |
| Trafalgar Group | IVR/online/live | 15 | 4.0 | -0.1 | R+0.8 |
| YouGov | Online | 33 | 4.3 | +0.0 | D+2.8 |
| Opinion Savvy | IVR/online | 11 | 4.3 | +0.1 | D+2.8 |
| Quinnipiac University | Live | 26 | 4.4 | +0.1 | D+4.2 |
| Rasmussen Reports/Pulse Opinion Research | IVR/online | 55 | 5.1 | +0.4 | D+3.6 |
| CNN/Opinion Research Corp. | Live | 10 | 4.3 | +0.6 | D+1.4 |
| Gravis Marketing | IVR/online | 53 | 4.6 | +0.7 | D+2.5 |
| Remington Research Group | IVR/live | 32 | 4.9 | +0.8 | D+2.1 |
| Public Policy Polling | IVR/online | 28 | 5.2 | +1.0 | D+5.2 |
| SurveyMonkey | Online | 195 | 7.3 | +2.3 | D+5.6 |
| University of New Hampshire | Live | 19 | 8.9 | +3.4 | D+8.9 |
| Google Surveys | Online | 12 | 8.4 | +5.0 | D+1.8 |

Negative plus-minus scores are good and indicate that the pollster has had less error than other pollsters in similar types of races. The “average error” is the difference between the polled result and the actual result for the margin separating the top two finishers in the race. “Bias” is a pollster’s average statistical bias toward Democratic or Republican candidates.
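
As a rough illustration of the “average error” and “bias” definitions in the note above, the sketch below computes both from pairs of polled and actual margins. It assumes two-party races with margins expressed as the Democratic share minus the Republican share, which is a convention chosen here for simplicity. The function names and the sample margins are hypothetical, not drawn from our database.

```python
from statistics import mean

def average_error(results):
    """Mean absolute miss on the margin, in percentage points."""
    return mean(abs(polled - actual) for polled, actual in results)

def bias(results):
    """Mean signed miss on the margin; positive means the polls leaned Democratic."""
    return mean(polled - actual for polled, actual in results)

def format_bias(value):
    """Render a signed bias in the 'D+3.9' / 'R+0.8' style used in the table."""
    party = "D" if value >= 0 else "R"
    return f"{party}+{abs(value):.1f}"

# Hypothetical (polled margin, actual margin) pairs, Democratic minus Republican.
sample_results = [(6.0, 2.0), (1.0, 3.0), (-2.0, -5.0)]

avg_err = average_error(sample_results)
signed_bias = bias(sample_results)
label = format_bias(signed_bias)
```

Note that the two statistics can diverge: misses in opposite directions cancel out in the bias but not in the average error, so a pollster can have a large average error and a small bias.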

The best of these pollsters over this period has been Monmouth University, which has an Advanced Plus-Minus score of -1.5. That’s not a huge surprise — Monmouth was already one of our highest-rated pollsters. After that, the list is somewhat eclectic, including traditional, live-caller pollsters such as Siena College and Marist College, as well as automated pollsters such as Emerson College and Landmark Communications. Polling institutes run by colleges and universities are somewhat overrepresented among the high performers on the list and have generally become a crucial source of polling as other high-quality pollsters have fallen by the wayside.

The lowest-performing pollsters in this group are the University of New Hampshire’s Survey Center, Google Surveys and SurveyMonkey. UNH uses traditional telephone interviewing, but its polls were simply way off the mark in 2016, overestimating Democrats’ performance by an average of almost 9 percentage points in the polls it conducted of New Hampshire and Maine.

Google Surveys and SurveyMonkey are newer and more experimental online-based pollsters. Google Surveys has an unusual methodology in which it shows people a poll in lieu of an advertisement and then infers respondents’ demographics based on their web browsing habits. While national polls that used the Google Surveys platform got fairly good results both in 2012 and 2016, state polls that used this technology have generally been highly inaccurate. Some Google Surveys polls also have a highly do-it-yourself feel to them, in that members of the public can use the Google Surveys platform to create and run their own surveys. We at FiveThirtyEight are going to have to do some thinking about whether to include these types of do-it-yourself polls in our averages and forecasts.

SurveyMonkey, which sometimes partners with FiveThirtyEight on non-election-related polling projects, conducted polling in all 50 states in 2016, asking about both the presidential election and races for governor and the U.S. Senate. Unlike some other attempts to poll all 50 states, SurveyMonkey took steps to ensure that each state was weighted individually and that respondents to the poll were located within the correct state. Thus, FiveThirtyEight treated these polls as we did any other state poll. Unfortunately, the results just weren’t good, with an average error of 7.3 percentage points and an Advanced Plus-Minus score of +2.3.

It wasn’t just Google Surveys or SurveyMonkey, however. Overall, online polls (with some exceptions such as YouGov and Lucid) have been fairly unreliable in recent elections. So has the increasing number of polls that use hybrid or mixed methodologies, such as those that mostly poll using automated calls (also sometimes called IVR or interactive voice response) but supplement these results using an online panel.

In the table below, I’ve calculated Advanced Plus-Minus scores and other statistics based on the technologies the polls used. An increasing number of polling firms no longer fall cleanly into one category and instead routinely use more than one mode of data collection within the same survey or switch back and forth from one methodology to the next from poll to poll. Therefore, I’ve distinguished polls that use one methodology exclusively from those that employ mixed methods.

Online polls have been less accurate in recent elections
Advanced Plus-Minus scores for pollsters’ surveys conducted for elections on Nov. 8, 2016, and later

| Poll type | No. of Polls | Average Error | Adv. Plus-Minus | Bias |
|---|---|---|---|---|
| Live caller | 77 | 4.9 | +0.1 | D+2.2 |
| Live caller only | 62 | 4.8 | -0.1 | D+2.5 |
| Live caller hybrid | 15 | 5.2 | +0.7 | D+1.2 |
| IVR | 35 | 4.6 | -0.0 | D+2.0 |
| IVR only | 13 | 4.5 | -0.7 | D+0.8 |
| IVR hybrid | 17 | 4.7 | +0.4 | D+2.6 |
| Online | 32 | 5.3 | +1.1 | D+3.0 |
| Online only | 15 | 5.4 | +1.6 | D+3.3 |
| Online hybrid | 17 | 5.1 | +0.7 | D+2.8 |
| All pollsters | 119 | 4.9 | +0.3 | D+2.4 |

Negative plus-minus scores are good and indicate that the pollster has had less error than other pollsters in similar types of races. Averages are weighted based on the square root of the number of polls that each firm conducted. Pollsters that are banned by FiveThirtyEight because we know or suspect that they faked their data are not included in the averages.
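
The square-root weighting mentioned in the note above can be sketched as follows. This is an illustrative reading of that one sentence, not our production code: the function name and the sample firms are hypothetical, and the only thing taken from the note is that each firm's weight is the square root of its poll count.

```python
from math import sqrt

def sqrt_weighted_mean(firms):
    """Average firm-level scores, weighting each firm by sqrt(number of polls).

    `firms` is a list of (n_polls, score) pairs.
    """
    if not firms:
        raise ValueError("need at least one firm")
    weights = [sqrt(n_polls) for n_polls, _ in firms]
    weighted_sum = sum(w * score for w, (_, score) in zip(weights, firms))
    return weighted_sum / sum(weights)

# Hypothetical firms: (number of polls, plus-minus score).
sample_firms = [(4, 1.0), (16, -0.5), (100, 2.0)]

category_score = sqrt_weighted_mean(sample_firms)

# For comparison, weighting directly by poll count lets the largest firm dominate.
poll_weighted = sum(n * s for n, s in sample_firms) / sum(n for n, _ in sample_firms)
```

With these made-up numbers, the square-root weights are 2, 4 and 10, so the 100-poll firm counts five times as much as the 4-poll firm rather than 25 times as much. That keeps one very prolific pollster from swamping the category average while still giving it more say than a firm with only a handful of polls.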

The clearest trends are that telephone polls — including both live caller and IVR polls — have outperformed online polls in recent elections and that polls using mixed or hybrid methods haven’t performed that well.

The relatively strong performance of IVR polls is surprising, considering that automated polls are not supposed to call cellphones and that more than half of U.S. households are now cellphone-only. It ought to be difficult to conduct a representative survey given that constraint.

We’ve sometimes seen the claim that IVR polls are more accurate because people are more honest about expressing support for “politically incorrect” candidates such as Trump when there isn’t another human being on the other end of the phone. This feeling of greater anonymity would presumably also apply to online polls, however, and online polls have not been very accurate lately (and they tended to underestimate Trump in 2016).

Another explanation may be that the IVR polls were more lucky than good in 2016. In general, online polls tend to show more Democratic-leaning results, IVR polls tend to show more Republican-leaning results, and live-caller polls are somewhere in between. Thus, in years such as 2012 when Democratic candidates beat the polling averages, online polls tend to look good, and in years when Republicans outperform their polls, IVR polls look good. If undecided voters largely broke to Trump in 2016, polls that initially had too many Republicans in their samples would wind up performing well.

Over the long run, the highest-performing pollsters have been those that:

- Exclusively use live-caller interviews, including calls placed to cellphones, and
- Participate in professional initiatives that encourage transparency and disclosure.

FiveThirtyEight’s pollster ratings will continue to award a modest bonus to pollsters that meet one or both of these standards and apply a modest penalty to those that don’t. Thus, the letter grades you see associated with polling firms are based on a combination of their historical accuracy and their methodological standards. Polling firms with non-standard methodologies can sometimes have individual races or even entire election cycles in which they perform quite well. But they don’t always sustain their performance over the long run.

As for online polls, we don’t want to discourage experimentation or to draw too many conclusions from just one cycle’s worth of polling. But we at FiveThirtyEight are becoming skeptical of what you might call bulk or “big data” approaches to polling using online platforms. The polling firms that get the best results tend to be those that poll no more than about six to eight states and put a lot of thought and effort into every poll. Online firms may want to do less national polling and fewer 50-state experiments and concentrate more on polling in electorally important states and congressional districts. Results in these contests will go a long way toward determining whether online polling is an adequate substitute for telephone polling.