An Evaluation of 2016 Election Polls in the U.S.

Ad Hoc Committee on 2016 Election Polling



Courtney Kennedy, Pew Research Center

Mark Blumenthal, SurveyMonkey

Scott Clement, Washington Post

Joshua D. Clinton, Vanderbilt University

Claire Durand, University of Montreal

Charles Franklin, Marquette University

Kyley McGeeney, Pew Research Center[1]

Lee Miringoff, Marist College

Kristen Olson, University of Nebraska-Lincoln

Doug Rivers, Stanford University, YouGov

Lydia Saad, Gallup

Evans Witt, Princeton Survey Research Associates

Chris Wlezien, University of Texas at Austin





The Committee was supported by the following researchers:

Junjie Chen, Andrew Engelhardt, Arnold Lau, Marc Trussler, Luis Patricio Pena Ibarra



EXECUTIVE SUMMARY

INTRODUCTION

PERFORMANCE OF POLLS IN 2016 RELATIVE TO PRIOR ELECTIONS

EVIDENCE FOR THEORIES ABOUT WHY POLLS UNDER-ESTIMATED TRUMP'S SUPPORT

POLLING AND PROBABILISTIC FORECASTING

CONCLUSIONS

REFERENCES

APPENDIX



EXECUTIVE SUMMARY



The 2016 presidential election was a jarring event for polling in the United States. Pre-election polls fueled high-profile predictions that Hillary Clinton’s likelihood of winning the presidency was about 90 percent, with estimates ranging from 71 to over 99 percent. When Donald Trump was declared the winner of the presidency in the early hours of November 9th, it came as a shock even to his own pollsters (Jacobs and House 2016). There was (and continues to be) widespread consensus that the polls failed.



But did the polls fail? And if so, why? Those are the central questions addressed in this report, which was commissioned by the American Association for Public Opinion Research (AAPOR). This report is the product of a committee convened in the spring of 2016 with a threefold goal: evaluate the accuracy of 2016 pre-election polling for both the primaries and the general election, review variation by different survey methodologies, and identify significant differences between election surveys in 2016 and polling in prior election years. The committee is composed of scholars of public opinion and survey methodology as well as election polling practitioners. Our main findings are as follows:



National polls were generally correct and accurate by historical standards. National polls were among the most accurate in estimating the popular vote since 1936. Collectively, they indicated that Clinton had about a 3 percentage point lead, and they were basically correct; she ultimately won the popular vote by 2 percentage points. Furthermore, the strong performance of national polls did not, as some have suggested, result from two large errors canceling (under-estimation of Trump support in heavily working class white states and over-estimation of his support in liberal-leaning states with sizable Hispanic populations).



State-level polls showed a competitive, uncertain contest… In the contest that actually mattered, the Electoral College, state-level polls showed a competitive race in which Clinton appeared to have a slim advantage. Eight states with more than a third of the electoral votes needed to win the presidency had polls showing a lead of three points or less (Trende 2016).[2] As Sean Trende noted, “The final RealClearPolitics Poll Averages in the battleground states had Clinton leading by the slimmest of margins in the Electoral College, 272-266.” The polls on average indicated that Trump was one state away from winning the election.



…but clearly under-estimated Trump’s support in the Upper Midwest. Polls showed Hillary Clinton leading, if narrowly, in Pennsylvania, Michigan and Wisconsin, which had voted Democratic for president six elections running. Those leads fed predictions that the Democratic Blue Wall would hold. Come Election Day, however, Trump edged out victories in all three.



There are a number of reasons why polls under-estimated support for Trump. The explanations for which we found the most evidence are:

Real change in vote preference during the final week or so of the campaign. About 13 percent of voters in Wisconsin, Florida and Pennsylvania decided on their presidential vote choice in the final week, according to the best available data. These voters broke for Trump by nearly 30 points in Wisconsin and by 17 points in Florida and Pennsylvania.

Adjusting for over-representation of college graduates was critical, but many polls did not do it. In 2016 there was a strong correlation between education and presidential vote in key states. Voters with higher education levels were more likely to support Clinton. Furthermore, recent studies are clear that people with more formal education are significantly more likely to participate in surveys than those with less education. Many polls – especially at the state level – did not adjust their weights to correct for the over-representation of college graduates in their surveys, and the result was over-estimation of support for Clinton.
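The mechanics of that adjustment can be illustrated with a toy calculation. The sketch below is not drawn from any poll analyzed in this report; the sample composition, electorate benchmark and candidate support figures are hypothetical and serve only to show how re-weighting an over-educated sample changes a topline estimate.

```python
# Hypothetical illustration of weighting by education (not any pollster's data).

sample = {"college": 0.55, "non_college": 0.45}      # share of completed interviews
electorate = {"college": 0.40, "non_college": 0.60}  # assumed benchmark for voters
support = {"college": 0.60, "non_college": 0.42}     # hypothetical Clinton support

# Post-stratification weight for each group: benchmark share / sample share.
weights = {g: electorate[g] / sample[g] for g in sample}

unweighted = sum(sample[g] * support[g] for g in sample)
# Weighted estimate; the weighted shares already sum to 1, so no renormalization needed.
weighted = sum(sample[g] * weights[g] * support[g] for g in sample)

print(round(unweighted * 100, 1), round(weighted * 100, 1))  # 51.9 vs. 49.2
```

Under these hypothetical numbers, leaving the sample unweighted overstates Clinton's support by almost 3 points, which is the kind of gap the education adjustment is meant to remove.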

Some Trump voters who participated in pre-election polls did not reveal themselves as Trump voters until after the election, and they outnumbered late-revealing Clinton voters. This finding could be attributable to either late deciding or misreporting (the so-called Shy Trump effect) in the pre-election polls. A number of other tests for the Shy Trump theory yielded no evidence to support it.

Change in turnout between 2012 and 2016 is also a likely culprit, but the best data sources for examining that have not yet been released. Nationwide, turnout grew more between 2012 and 2016 in heavily Republican counties than in heavily Democratic counties. A number of polls were adjusted to align with turnout patterns from 2012. Based on what happened in 2016, this adjustment may have over-estimated turnout among, for example, African Americans, and under-estimated turnout among rural whites. Unfortunately, the best sources for a demographic profile of 2016 voters have either not been released or not been released in full. While we think this could have contributed to some polling errors, the analysis that we were able to conduct examining the impact of likely voter modeling shows generally small and inconsistent effects.

Ballot order effects may have played a role in some state contests, but they do not go far in explaining the polling errors. State election rules led to Trump’s name appearing above Clinton’s on all ballots in several key states that Trump won narrowly (Michigan, Wisconsin and Florida). Being listed first can advantage a Presidential candidate by roughly one-third of one percentage point. Given that pollsters tend to randomize the order of candidate names across respondents rather than replicate how they are presented in the respondent’s state, this could explain a small fraction of the under-estimation of support for Trump, but ballot order represents at best only a minor reason for polling problems.

The patterns in early voting in key states were described in numerous, high-profile news stories as favorable for Clinton, particularly in Florida and North Carolina (Silver 2017). Trump won both states.

In the days leading up to November 8, several election forecasts from highly trained academics and data journalists declared that Clinton’s likelihood of winning was about 90 percent, with estimates ranging from 71 to over 99 percent (Katz 2016).

Polling data from the Upper Midwest showed Clinton leading, if narrowly, in Pennsylvania, Michigan and Wisconsin – states that had voted Democratic for president six elections running. This third deeply flawed set of data helped confirm the assumption that these states were Clinton’s Blue Wall (e.g., Goldmacher 2016; Donovan 2016). On Election Day, Trump eked out victories in all three. More than 13.8 million voted for president in those states, and Trump’s combined margin of victory was 77,744 votes (0.56%) (Wasserman 2017).

Was the accuracy of polling in 2016 noticeably different from past elections?

How well did the polls measure vote preference in the 2016 general election?

How well did the polls measure vote preference in the 2016 primaries and caucuses?

Did the accuracy of polls clearly vary by how they were designed?

Did polls, in general, under-estimate support for Trump and, if so, why?

National general election polls were among the most accurate in estimating the popular vote margin in U.S. elections since 1936, with an average absolute error of 2.2 percentage points and an average signed error of 1.3 percentage points. They correctly projected that Clinton would win the national popular vote by a small but perceptible margin.

State-level general election poll errors were much larger, with an average absolute error of 5.1 points and an average signed error of 3.0 points. The problem was compounded by the fact that state polls indicated the wrong winner in Pennsylvania, Michigan and Wisconsin – states that collectively were enough to tip the outcome of the Electoral College.
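For readers who want these two error measures made concrete, the following minimal sketch (ours, not part of the committee's analysis) scores a hypothetical poll against the approximate certified national popular vote. The poll figures are invented; the vote shares are approximate 2016 totals.

```python
# Illustrative sketch of the two error measures used throughout this report.

def margin_errors(poll_clinton, poll_trump, vote_clinton, vote_trump):
    """Return (signed_error, absolute_error) on the Clinton-Trump margin.

    Positive signed error means the poll over-stated Clinton's margin,
    i.e., under-estimated Trump's relative support.
    """
    poll_margin = poll_clinton - poll_trump
    vote_margin = vote_clinton - vote_trump
    signed_error = poll_margin - vote_margin
    return signed_error, abs(signed_error)

# Hypothetical poll showing Clinton +3, scored against a roughly +2.1 popular vote.
signed, absolute = margin_errors(46.0, 43.0, 48.2, 46.1)
print(round(signed, 1), round(absolute, 1))  # 0.9 0.9 -> slight over-statement of Clinton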

In 2016, national and state-level polls, on average, tended to under-estimate support for Trump, the Republican nominee. In 2000 and 2012, however, general election polls clearly tended to under-estimate support for the Democratic presidential candidates. So, while it is common for polling errors to be somewhat correlated in any given election, there is no consistent partisan bias in the direction of poll errors from election to election.

The 2016 presidential primary polls generally performed on par with past elections. The 2016 pre-election estimates in the Republican and Democratic primaries were not perfect, but the misses were fairly normal in scope and magnitude. The vast majority of primary polls predicted the right winner, with the predictions widely off the mark in only a few states.

Table 1. Performance of Presidential Primary Polls by Year

                              2000    2004    2008    2012    2016
% Polls Predicting Winner      99%    100%     79%     64%     86%
Average Absolute Error         7.7     7.0     7.6     8.3     9.3
Number of Polls                172     129     555     195     457

Survey mode had little effect. The differences in the absolute error of surveys employing different interviewing modes were not statistically significant. While polls using IVR and online methods are associated with slightly larger average absolute errors than polls with live interviewers, all else equal, the differences are small (0.21 and 0.08 larger, respectively, than a telephone poll) and not statistically distinguishable from zero.

Caucuses were problematic. Caucuses were associated with much bigger poll errors than primaries. The average absolute error was nearly 10 points greater in caucuses – a statistically significant difference.

Type of primary did not matter. There was no statistically significant difference in the accuracy of polls conducted in open vs. closed primaries.

Size of the electorate (population) mattered. Larger contests were associated with smaller polling errors. For every 1% increase in the size of the electorate, the average absolute error decreased by 2.5%, all else being equal.

Certain states were harder to poll in than others. After holding all other factors constant, polls in Utah, South Carolina, Oregon, Michigan and Kansas were still off by a significantly greater margin than polls in other states. While it is impossible to diagnose the exact reasons for these systematic errors, controlling for them in the analysis is important because it removes the impact of these state-specific errors from the estimated effects of other factors.

The winner’s lead mattered. There was important variation in average poll performance depending on whether the election was a blowout or not. Errors tended to be larger in uncompetitive contests.

FiveThirtyEight 538 Polling Average (simple weighted average of the polls)

538 Polls Only (primarily based on polls, with limited adjustments)

538 Polls Plus (combines polls with an economic index; makes certain adjustments for historical election patterns)

Huffington Post Pollster (poll-based time series model)

RealClearPolitics (simple unweighted average of polls)
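These methods range from a simple unweighted average to weighted, model-based estimates. The sketch below illustrates that basic distinction with hypothetical poll margins and an arbitrary recency weight; it is not a reconstruction of any aggregator's actual algorithm.

```python
# Minimal sketch of two aggregation styles; margins and weights are hypothetical.

polls = [  # (Clinton - Trump margin in points, days before the election)
    (4.0, 12), (2.0, 9), (5.0, 6), (1.0, 3), (3.0, 1),
]

# Simple unweighted average, in the spirit of RealClearPolitics.
simple_avg = sum(margin for margin, _ in polls) / len(polls)

# A recency-weighted average, loosely in the spirit of model-based aggregators,
# using an arbitrary 7-day half-life.
weights = [0.5 ** (days / 7) for _, days in polls]
weighted_avg = sum(m * w for (m, _), w in zip(polls, weights)) / sum(weights)

print(round(simple_avg, 1), round(weighted_avg, 1))  # 3.0 vs. roughly 2.8
```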

Table 2. Time of Decision and Presidential Vote in Key States Won by Trump

               % voters who    Vote choice among     Vote choice among    Estimated Trump    Election result
               decided in      final-week deciders   earlier deciders     gain from late     (%Trump-%Clinton)
               final week       Trump    Clinton      Trump    Clinton    deciders
Florida            11%           55%       38%         48%       49%          2.0%                1.2%
Michigan           13%           50%       39%         48%       48%          1.4%                0.2%
Pennsylvania       15%           54%       37%         50%       48%          2.3%                1.2%
Wisconsin          14%           59%       30%         47%       49%          4.3%                0.8%
National           13%           45%       42%         46%       49%          0.8%               -2.1%

Note – Analysis from Aaron Blake (2016) using NEP exit poll data.
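The "Estimated Trump gain from late deciders" column is consistent with a simple decomposition: the share of late deciders multiplied by the gap between their margin and the margin among earlier deciders. The sketch below reproduces the Wisconsin row under that reading; the formula is our interpretation of the table, not a calculation described by Blake (2016).

```python
# A plausible reading of Table 2's "Estimated Trump gain" column (our arithmetic).

def late_decider_gain(share_late, late_trump, late_clinton,
                      early_trump, early_clinton):
    late_margin = late_trump - late_clinton      # margin among final-week deciders
    early_margin = early_trump - early_clinton   # margin among earlier deciders
    return share_late * (late_margin - early_margin)

# Wisconsin row: 14% deciding in the final week, breaking 59-30 for Trump,
# versus 47-49 among earlier deciders.
print(round(late_decider_gain(0.14, 59, 30, 47, 49), 1))  # ~4.3 points
```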

Less compelling evidence points to other factors that may have contributed to under-estimating Trump's support.

In 2016 national and state-level polls tended to under-estimate support for Trump, the Republican nominee. In 2000 and 2012, however, general election polls clearly tended to under-estimate support for the Democratic presidential candidates. The trend lines for both national polls and state-level polls show that – for any given election – whether the polls tend to miss in the Republican direction or the Democratic direction is tantamount to a coin flip.

However well-intentioned these predictions may have been, they helped crystallize the belief that Clinton was a shoo-in for president, with unknown consequences for turnout. While a similar criticism can be leveled against polls – i.e., they can indicate an election is uncompetitive, perhaps reducing some people's motivation to vote – polls and forecasting models are not one and the same. As the late pollster Andrew Kohut once noted (2006), "I'm not a handicapper, I'm a measurer. There's a difference." Pollsters and astute poll reporters are often careful to describe their findings as a snapshot in time, measuring public opinion when they are fielded (e.g., Agiesta 2016; Easley 2016a; Forsberg 2016; Jacobson 2016; McCormick 2016; Narea 2016; Shashkevich 2016; Zukin 2015). Forecasting models do something different – they attempt to predict a future event. As the 2016 election proved, that can be a fraught exercise, and the net benefit to the country is unclear.

The polling in the Republican and Democratic primaries was not perfect, but the misses were fairly normal in scope and magnitude. When polls did badly miss the mark, it tended to be in contests where Clinton or Trump finished runner-up. Errors were smaller when they finished first. This suggests that primary polls had a difficult time identifying wins by candidates other than the frontrunner.

The performance of election polls is not a good indicator of the quality of surveys in general, for several reasons. Election polls are unique among surveys in that they not only have to field a representative sample of the public but also have to correctly identify likely voters. The second task presents substantial challenges that most other surveys simply do not confront. A typical non-election poll has the luxury of being adjusted to very accurate benchmarks for the demographic profile of the U.S. population. Election polls, by contrast, require educated estimates about the profile of the voting electorate. It is, therefore, a mistake to observe errors in an election such as 2016, which featured late movement and a somewhat unusual turnout pattern, and conclude that all polls are broken. Well-designed and rigorously executed surveys are still able to produce valuable, accurate information about the attitudes and experiences of the U.S. public.

As this report documents, the national polls in 2016 were quite accurate, while polls in key battleground states showed some large, problematic errors. It is a persistent frustration within polling and the larger survey research community that the profession is judged based on how these often under-budgeted state polls perform relative to the election outcome. The industry cannot realistically change how it is judged, but it can make an improvement to the polling landscape, at least in theory.
AAPOR does not have the resources to finance a series of high quality state-level polls in presidential elections, but it might consider attempting to organize financing for such an effort. Errors in state polls like those observed in 2016 are not uncommon. With shrinking budgets at news outlets to finance polling, there is no reason to believe that this problem is going to fix itself. Collectively, well-resourced survey organizations might have enough common interest in financing some high quality state-level polls so as to reduce the likelihood of another black eye for the profession.

Donald Trump's victory in the 2016 presidential election came as a shock to pollsters, political analysts, reporters and pundits, including those inside Trump's own campaign (Jacobs and House 2016). In the vast majority of U.S. presidential elections (95%), the winner of the national popular vote had also been the Electoral College winner (Gore 2016). That was not the case in 2016, when a divided result cast a critical spotlight on the polls' performance.

The national polls in 2016 indicated that Hillary Clinton would win the popular vote by about 3.2 percentage points. Taken together, those national polls were essentially accurate; the Democratic nominee ultimately won the popular vote by 2.1 points. In most presidential election years, one could reasonably conclude from this information alone that Trump's winning the presidency was unlikely and that the polls accurately measured Americans' vote preferences.

In the 51 contests that decided the presidency, Trump won 306 electoral votes and Clinton 232.[3] Looking at the polls at the state level, most states seemed firmly in the Republican camp or in the Democratic one. Pundits cited up to 13 battleground states where the campaigns suggested the race could be close. The polls in that group of states showed competitive races, with Clinton apparently holding a consistent advantage in most. But the advantage was slim: eight states with a combined 107 electoral votes had average poll margins (%Trump-%Clinton) of three points or less (Trende 2016).

On the eve of the election, however, three types of information widely discussed in the news media pointed to a Clinton victory. All three turned out to be either misleading or wrong.

The day after the election, there was a palpable mix of surprise and outrage directed towards the polling community, as many felt that the industry had seriously misled the country about who would win (e.g., Byers 2016; Cillizza 2016; Easley 2016b; Shepard 2016).

The 2016 U.S. presidential election poses questions that political scientists, sociologists and survey researchers will be studying for years, if not decades. This report, commissioned by the American Association for Public Opinion Research (AAPOR), seeks to address only the performance of polls in 2016. Readers looking for an explanation of why Donald Trump won should consult other sources. The reasons Trump won and the reasons the polls missed may be partially overlapping, but this report only attempts to address the latter.

This report is the product of a committee convened well before the election, in the spring of 2016, with the goal of summarizing the accuracy of 2016 pre-election polling (for both primaries and the general election), reviewing variation by different poll methodologies, and identifying differences from prior election years. That was an ambitious task before November 8.
In the early morning hours of November 9, the task became substantially more complex and larger in scope, as the committee felt obligated to also investigate why polls, particularly in the Upper Midwest, failed to adequately measure support for Trump.

The committee is composed of scholars of public opinion and survey methodology as well as election polling practitioners. While a number of members were active pollsters during the election, a good share of the academic members were not. This mix was designed to staff the committee both with professionals having access to large volumes of poll data they knew inside and out, and with independent scholars bringing perspectives free from apparent conflicts of interest. The report addresses the following questions:

Many different types of data were brought to bear on these issues. This information includes poll-level datasets in the public domain that summarize the difference between the poll estimates and the election outcomes and provide a few pieces of design information (pollster name, field dates, sample size, target population, and mode). For 2016 polls conducted close to Election Day, the committee supplemented those datasets with information about weighting, sample source (e.g., random digit dial [RDD] versus voter registration-based sample [RBS] for telephone surveys) and the share of interviews conducted with landlines versus cell phones, where applicable. Adding these design variables was done manually through searches of individual press releases, news stories, methodology reports and pollster websites. In many cases, design information about a poll was missing or unclear, in which case the committee contacted individual pollsters to obtain the information.

In all, the committee reached out to 46 different polling organizations. Half (23) responded to our requests. Those who did respond were generous with their time and information. Not surprisingly, none of the organizations that did not respond are members of AAPOR's Transparency Initiative and most do not have staff who are active in AAPOR.

Generally, noncooperation with the committee's requests did not have a noticeable impact on our work, with one notable exception. Surveys conducted using interactive voice response (IVR), sometimes called robopolls, were scarce at the national level (just three pollsters used IVR), but they represented a large share of polling at the state level, particularly in Wisconsin and Michigan. Those IVR pollsters did not respond to our requests for microdata. Thus, the committee was unable to analyze that microdata along with data from firms using other methods, which could have been informative about polling errors in those states.

Given the large number of pollsters active during the election, the volume of polling and the reality that all pollsters structure their microdatasets differently, the committee was selective in asking pollsters for microdata. Since provision of microdata is not required by the AAPOR Transparency Initiative, we are particularly grateful to ABC News, CNN, Michigan State University, Monmouth University, and University of Southern California/Los Angeles Times for joining in the scientific spirit of this investigation and providing microdata.
We also thank the employers of committee members (Pew Research Center, Marquette University, SurveyMonkey, The Washington Post, and YouGov) for demonstrating this same commitment.[4]

In the sections of the report that follow, the sets of polls analyzed may differ by section (e.g., national versus state-level; final two weeks versus full campaign). While this may be distracting, each section features what, in our judgment, was the best data available to answer each specific research question. At the top of each section, we describe the data and provide the rationale for our choices.

There are several different metrics quantifying error in election poll estimates, but this report focuses on two simple measures that are easily compared to past elections. The first error measure is absolute error on the projected vote margin (or "absolute error"), which is computed as the absolute value of the margin (%Clinton-%Trump) in the poll minus the same margin (%Clinton-%Trump) in the certified vote. For example, if a poll showed Clinton leading Trump by 1 point and she won by 3 points, the absolute error would be ABS(1 – 3) = 2. This statistic is always positive, providing a sense of how much polls differed from the final vote margin but not indicating whether they missed more toward one candidate or another.

The other key metric is the signed error on the projected vote margin (or "signed error"), which is computed in the exact same manner as the absolute error but without taking the absolute value. This statistic can be positive or negative, with positive values indicating over-estimation of Clinton's support and negative values indicating over-estimation of Trump's support. In the example above, if Clinton led by 1 point in a poll and won by 3 points, the signed error would be -2 points. When averaging absolute error and signed error across multiple polls, the signed error is always lower than (or equal to) the absolute error since positive and negative values are averaged together. Neither measure should be confused with whether polls were within the margin of sampling error, a statistic that applies to individual candidate support estimates but not the vote margin.

Since Election Day, dozens of theories have been put forward by politicians, pundits, pollsters and many others as to why the polls missed in 2016. Many such theories have fallen by the wayside since the final official vote totals were tallied, showing Clinton with a narrow lead. In the end, the final vote came close to what the national polls found, at least in aggregate.

As we discuss later, many polls did a reasonable job at the national level in the general election and at the state level in the presidential primaries. But many did not. Much of our analytical focus is on assessing errors in the general election polls, but some of the possible sources of the errors also apply potentially to polls in the primaries.

Here is a summary of the major types of potential errors that we investigate in this report.

Both Trump and Clinton had historically poor favorability ratings. One possibility is that these negative evaluations made it difficult for some voters to decide whether to vote and, then, difficult to decide for whom to vote. Unhappy with their options, many voters may have waited until the final week or so before deciding, a set of last-minute changes that polls completed a week out from the election would not have detected.
Perhaps this included those who broke late for Trump, as well as potential Clinton voters who decided not to vote because they concluded she was going to win.

During the primaries and the general election, political observers speculated that voters who were supporting Trump were less likely to admit this stance to pollsters than those supporting Clinton. Trump's controversial statements could have made it uncomfortable for some respondents to disclose their support for him to an interviewer. Thus, Trump voters would be less likely to express their true intentions.

Response rates in telephone polls with live interviewers continue to decline, and response rates are even lower for other methodologies. Thus, there is a substantial potential that nonresponse bias could have kept a given poll from accurately matching the election results.

Generally, decisions about responding to a poll are not strongly related to partisanship (Pew Research Center 2012). Studies have also shown, however, that adults with lower educational levels (Battaglia, Frankel and Link 2008; Chang and Krosnick 2009; Link et al. 2008) and anti-government views (U.S. Census Bureau 2015) are less likely to take part in surveys. Given the anti-elite themes of the Trump campaign, Trump voters may have been more likely than other voters to refuse survey requests.

Many pollsters adjust their raw results to population benchmarks because of variations in how willing various subgroups in the population are to participate in polls. Younger people are quite difficult to find and interview, as are those with lower levels of education. Adjusting or weighting the raw data to take into account these differences is often required. But some pollsters did not weight their data by education in 2016.

Another possible source of error is in the different likely voter models or screens used by pollsters. If these models do not accurately reflect who votes, it is unlikely that the poll results will match the election results. Generally, a poll result based on likely voters tends to be more Republican than the same result based on all registered voters (e.g., Perry 1973; Pew Research Center 2009; Silver 2014). But that was not always the case in 2016, suggesting likely voter models may not have been working correctly.

Likewise, Nate Cohn (2016b) and others have argued that the voting electorate was never as diverse or educated as shown in exit poll data. Current Population Survey data and voter file analysis show a whiter, less-educated electorate than the exit polls. Thus, polls weighting to past exit poll parameters may have missed the mark in 2016.

Political methodologists have documented a small but non-trivial bias in favor of candidates listed first on election ballots (e.g., Ho and Imai 2008; Miller and Krosnick 1998; Pasek et al. 2014). This bias is a version of a primacy effect, which is the tendency for people to select options presented near the top of a list when the list is presented visually, as on a ballot. To cancel out this effect, pollsters typically rotate the order of the candidate names presented to respondents. Most state boards of elections, however, do not rotate the order of candidate names, but list the presidential candidates in the same order in every county and every precinct.
In states like Michigan, Wisconsin and Florida, where Trump was listed first on the ballot state-wide, this order effect could have slightly boosted his support in the election relative to the polls (BBC News 2017; Pasek 2016; Gelman 2017).

In the aftermath of the general election, many declared 2016 a historically bad year for polling. A comprehensive, dispassionate analysis shows that while that was true of some state-level polling, it was not true of national polls nor was it true of primary season polls. Key findings with respect to the performance of polls in 2016 relative to prior elections are as follows:

National presidential polls in the 2016 general election were highly accurate by historical standards, resulting in small errors and correctly indicating Clinton had a national popular vote lead close to her 2.1 percentage-point margin in the certified vote tallies. In terms of the average of absolute value differences between each poll's Clinton-Trump margin and the certified national popular vote margin, the final national 2016 polls' average error was 2.2 percentage points off the actual vote margin. As shown in Figure 1, the 2016 national polls tended to be more accurate than 2012 national polls (2.9 points average absolute error) and roughly similar to polling in 2008 (1.8 points) and 2004 (2.1 points). The level of error in 2016 was less than half the average error in national polls since the advent of modern polling in 1936 (4.4 points), and also lower than the average in elections since 1992 (2.7 points).

Note – The 2016 figures are based on polls completed within 13 days of the election. Figures for prior years are from the National Council for Public Polls analysis of final poll estimates, some occurring before the 13-day period. Figures for 1936 to 1960 are based only on Gallup.

Examination of the average signed error in 2016 (1.3 percentage points) confirms that national polls in 2016 tended to under-estimate Trump's support more than Clinton's. The size and direction of error contrasts with 2012, when polls under-estimated Barack Obama's margin against Republican nominee Mitt Romney by 2.4 points. The average signed error in 2016 national polls was far lower than the typical level of signed error in either party's direction in presidential elections since 1936 (3.8 points), and is also lower than the 2.0-point average signed error in polls since 1992.

In recent elections, national polls have not consistently favored Republican or Democratic candidates. In 2016, national and state-level polls tended to under-estimate support for Trump, the Republican nominee. In 2000 and 2012, however, general election polls clearly tended to under-estimate support for the Democratic presidential candidates. Elections from 1936 to 1980 tended to show larger systematic errors and variation from election to election, in part, due to the small number of national polling firms.

Several media outlets combined national polls using varying methodologies to produce their own estimate of national support for Clinton and Trump, though none produced a more accurate estimate than the average of final national polls. RealClearPolitics estimated Clinton held a 3.2-point lead using a simple average of some final surveys, while FiveThirtyEight estimated Clinton held a 3.6-point margin in its "Polls-Only forecast" using a more complex method accounting for systematic differences between pollsters and their historical accuracy.
The Huffington Post estimated Clinton's lead at 4.9 percentage points nationally.

The trend line for state-level polls is similar to the trend line for national polls in one respect and very different in another. Unlike national polls, state-level polls in 2016 did have a historically bad year, at least within the recent history of the past four elections. Analysis of 423 state polls completed within the final 13 days before the 2016 election shows an average absolute error of 5.1 percentage points and a signed error of 3.0 percentage points in the direction of over-estimating support for Clinton. In the four prior presidential elections, the average absolute error in state polls ranged from 3.2 to 4.6 points.

Source – Figures for 2000 to 2012 computed from data made public by FiveThirtyEight.com.

The trend line for state polls is, however, similar to that for national polls in that there is no partisan bias. For a given election, whether the polls tend to miss in the Republican direction or the Democratic direction appears random. In 2016, the average signed error in state polls was 3 points, showing an over-estimation of support for the Democratic nominee. In 2000 and 2012, the average signed error in state polls was approximately 2 points, both times showing an over-estimation of support for the Republican nominee. While U.S. pollsters may be guilty of pointing to the wrong winner on occasion, as a group their work does not reveal any partisan leanings.[6]

Both absolute errors and signed errors were smaller in battleground states, the 13 states that were decided by five points or fewer in the 2012 or 2016 presidential elections, than in non-battleground states (Appendix Table A.1). The average absolute error for the 207 battleground state polls was 3.6 points, compared with 6.4 points for the 206 polls in non-battleground states. The polls in non-battleground states under-estimated Trump's vote margin against Clinton by 3.3 points on average (signed error); the under-estimation of Trump's standing was 2.3 points in battleground states.

While the absolute errors tended to be lower in the more competitive states, under-estimation of support for Trump was substantial and problematic in several consequential states. Wisconsin polls exhibited the largest average signed error (6.5 points), with polls there showing Clinton ahead by between 2 and 12 points in the final two weeks before she narrowly lost the state (47.2 percent to 46.5 percent). Ohio polls also under-estimated Trump's margin by a substantial 5.2 points on average, indicating he had a small lead, though he went on to win the state by eight points. Polls in Minnesota, Pennsylvania and North Carolina each under-estimated Trump's margin against Clinton by an average of four to five percentage points, while polls in Michigan and New Hampshire under-estimated his standing by 3.5 percentage points on average. Under-estimation of support for Trump was smaller in Florida, Arizona and Georgia, while polls in Colorado and Nevada tended to over-estimate his support, and polls in Virginia exhibited little error.

The 2016 presidential primary polls generally performed on par with past elections. The vast majority of primary polls predicted the right winner, with the predictions widely off the mark in only a few states. In short, the primary polls held their own in 2016.
They improved in some important ways over previous years while retaining some weaknesses that the polling industry needs to note.

The committee based its analysis on all publicly released state-level candidate preference polls conducted in the final two weeks before each state's Republican and Democratic primaries. This totaled 457 state primary polls, including 212 polls in the Republican primaries and 245 polls in the Democratic primaries. Overall, there was at least one poll conducted in the last two weeks before the primary election in 78 of the contests. Additionally, the committee looked at the accuracy of the polling aggregator predictions made by three organizations: FiveThirtyEight, Huffington Post and RealClearPolitics.

Examining the polling averages in each state, the polls correctly pointed to the winner in 86% of the 78 primaries. This included correct predictions in 83% of the Democratic contests and 88% of the Republican contests. The misses were in three Republican primaries (Idaho, Kansas, Oklahoma) and in six Democratic primaries (Indiana, Kansas, Michigan, Oklahoma, Oregon, Rhode Island).

The average absolute error across all 457 state primary polls reviewed was 9.3 points,[7] not dramatically different from the performance of primary polls in other recent elections. While the average absolute error was higher in 2016 than in the four prior elections, a higher percentage of primary polls predicted the winning candidate in 2016 than was the case in 2008 and 2012. Analysis of the distribution of the size of primary poll errors in these recent elections (Appendix Figure A.1) shows a fairly stable pattern, with errors in 2016 polls looking similar to those in polls from other years.

A hallmark of the current election polling era is the tremendous variation in how polls are designed and conducted. Design variation is highly relevant to an examination of poll performance because survey researchers have long recognized that some approaches for constructing election polls are more accurate than others (Mosteller et al. 1949).

Many pollsters continue to use live telephone interviewing with random digit dial (RDD) samples of all the landlines and cell phones in the U.S. An even larger group conducts their surveys online, typically using opt-in samples of internet users. A third common approach is interactive voice response (IVR), either alone or in combination with an online opt-in sample. That combination is popular because IVR is only legal when dialing landline numbers, and so pollsters pair that with an opt-in internet sample in order to reach individuals who do not have a landline.

Nearly all IVR samples and an increasing number of live telephone samples are being drawn not from the RDD frames of all telephone numbers but instead from state-based voter registration files ("registration-based sampling," or RBS). While campaign pollsters have been using RBS for some time, the widespread use of RBS is a fairly recent development in public polls (Cohn 2014).

We examined two main design features for their effects on accuracy: mode of administration (e.g., live phone, internet or IVR) and sample source (e.g., RDD, RBS or opt-in internet users). We coded these variables for all national pre-election surveys and battleground state surveys conducted in the final 13 days of the general election. The data are summarized in Figure 3. While this typology does not encompass every final poll in 2016,[8] over 95 percent of the polls conducted in the final two weeks fall into one of these categories.
Most IVR samples were selected using RBS, but in some cases the sample source was ambiguous. This is why the figures in this section do not attempt to make that distinction.

Notes – The Franklin Pierce and Data Orbital polls, which were conducted by live telephone and had ambiguous statements about sample source that suggested RDD (but were not totally clear), are coded as live phone (RDD).

Several differences between national and battleground state polls are worth mentioning. In terms of mode, national polls were twice as likely to be conducted by live telephone as battleground state polls (36% versus 18%, respectively). Battleground state polls were about twice as likely to be conducted using some form of IVR as national polls (40% versus 18%, respectively). The share of polls conducted using the internet was basically the same for national and state-level polling.

Figure 4 gets to the central question of whether polls with certain types of designs were more accurate than others. Sample sizes for this analysis are small, and the effects from mode and sample source are to some extent confounded with house effects, such as differences in the likely voter model used. Still, IVR polls tended to exhibit somewhat less error in the 2016 general election than live telephone or internet polls. Battleground state polls that just used IVR had an average absolute error of 2.7 percentage points. By contrast, battleground state polls conducted using RDD with live phone and online opt-in had average errors of 3.8 and 3.9 points, respectively. Among national polls, none was conducted using just IVR. The national polls conducted by IVR and supplemented with an online sample had an average absolute error of 1.2 points, as compared with 1.6 for live telephone and 1.5 for online opt-in polls.

Notes – Figures based on polls conducted during the final 13 days. Sample sizes for this analysis are small, and the effects from mode and sample source are to some extent confounded with house effects. National poll averages are based on 7 polls (IVR+internet), 14 polls (live phone RDD) and 15 polls (internet opt-in). Battleground state poll averages are based on 30 polls (IVR), 25 polls (live phone RBS), 34 polls (IVR+internet), 20 polls (IVR+live phone), 25 polls (live phone RDD) and 78 polls (internet opt-in).

In one respect, the fact that IVR-only polls did relatively well is surprising because federal laws dictate that IVR can only be used with landline numbers and about half of adults do not have landlines (Blumberg and Luke 2016). This half of the population would not have any chance of selection in an IVR sample assuming that cell phone numbers were flagged and purged before the IVR dialing began. Such substantial noncoverage usually increases the risk of bias.[9]

On the other hand, adults who have dropped their landline in favor of a cell phone or never had a landline to begin with tend to be younger and more racially and ethnically diverse than adults accessible by landline. These cell-only adults are more likely to be Democratic. In the 2016 election, in which turnout among African Americans and younger voters was not particularly high, under-coverage of cell phone-only voters appears not to have been a major problem and may help explain why IVR-only polls performed relatively well.
In fact, when IVR polls were supplemented with an online component to capture cell phone-only voters, they did slightly worse.

Analysis of national polling errors by mode in recent elections (Figure 5) shows that IVR polls did not do particularly well in 2008 and were only nominally better in 2012 – elections in which Democratic turnout was relatively high. In fact, internet polls fared the best in both 2008 and 2012, with live phone polls in the middle. This indicates that the IVR results in 2016 are likely an election-specific phenomenon related to the particular turnout patterns that year.

Note – In 2016 there were no national polls conducted using only IVR.

While the bivariate analysis presented in Figure 4 provides a high-level look at how accuracy varied by mode and sample source, it has a number of limitations. The number of polls and pollsters using each design during the final 13 days is modest at best, and assignment to a given feature is not at all random. For example, polls using IVR were more likely than other types of polls to be conducted by partisan pollsters, especially Republican-affiliated pollsters. This raises the possibility that the relatively good performance of IVR polls in 2016 may have been due in part to some Republican pollsters making turnout assumptions slightly more favorable to Republicans. How much of the accuracy should actually be attributable to the IVR methodology per se is unclear.

The varying difficulty in predicting battleground state outcomes and the fact that some polls were fielded closer to Election Day than others can affect bivariate comparisons of accuracy. To better isolate the impact of methodological features in the polls, two ordinary least squares regression analyses examined the association of absolute error with mode and sample source, controlling for the geography in which the poll was conducted and the number of days between the election and the middle date of its field period. The results are reported in Appendix Table A.2.

The first regression model, testing the association between mode and accuracy, found that after taking geography and the number of days from the election into account, use of IVR alone was associated with roughly a 1-point lower absolute error than live-interviewer surveys, while internet, IVR/cell and IVR/internet polls did not have significantly larger or smaller errors than those conducted with live telephone interviewers. Use of other, less common modes was associated with greater errors.

The second model, focusing on sample source, found no significant association of different sample sources with absolute error in vote margin estimates compared with RDD samples, when state and timing are taken into account. Both regression analyses confirmed that battleground state polls exhibited greater errors than national polls, particularly in Wisconsin, New Hampshire, North Carolina, Minnesota, Ohio, Pennsylvania and Michigan.
Taken together with the bivariate results, including those from past elections, it appears the accuracy of IVR polls may be a 2016-specific phenomenon: live telephone and internet polls did better in the recent past and may surpass IVR once again in a future election with different turnout patterns.

Turning to the primary polls, regression analysis was also used to evaluate the effects of different design features on the accuracy of these polls. The model, presented in full in the Appendix, yielded the following main findings:[10]

Another major presence in the polling scene during the primaries was the aggregators. The committee examined a total of five different estimation methods produced by three polling aggregators:

There were no significant differences among the aggregators in their prediction accuracy in primary elections. Since RealClearPolitics uses a simple unweighted average of the polls, and there was no statistically significant difference in accuracy between this method and the others, this means that additional modeling did not greatly increase the accuracy of the predictions.

The average signed error in the margin across all 230 aggregator predictions was -4.7, indicating that the predictions under-estimated the margin by 4.7 percentage points. The absolute error was much greater. The average error across all of the aggregators was 8.3, indicating that the average difference between the margin calculated by the aggregators and the actual margin for the winner was 8.3 percentage points. Although there appear to be significant differences across the aggregators in the absolute error overall, this is explained by different aggregators making predictions in different races. When only the same set of states are examined, there are no significant differences across the aggregators in the average absolute error (an average of 7.3 percentage points). There was no significant difference in the signed or absolute error between the Democratic and Republican contests for the aggregators, either overall or for the more commonly-polled contests.

One of the central hypotheses about why polls tended to under-estimate support for Trump is late deciding. Substantial shares of voters disliked both major party candidates (Collins 2016; Yourish 2016) and may have waited until the final days before deciding. If voters who told pollsters in September or October that they were undecided or considering a third party candidate ultimately voted for Trump by a large margin, that would explain at least some of the discrepancy between the polls and the election outcome. There is evidence that this happened, not so much at the national level, but in key battleground states, particularly in the Upper Midwest.

As reported by Blake (2016), the National Election Pool (NEP) exit poll conducted by Edison Research showed substantial movement toward Trump in the final week of the campaign – particularly in the four states Clinton lost by the smallest margins. In Michigan, Wisconsin, Pennsylvania, and Florida, 11 to 15 percent of voters said that they finally decided for whom to vote in the presidential election in the last week. According to the exit poll, these voters broke for Trump by nearly 30 points in Wisconsin, by 17 points in Pennsylvania and Florida, and by 11 points in Michigan.
If late deciders had split evenly in these states, the exit poll data suggest Clinton may have won both Florida and Wisconsin, although probably not Michigan or Pennsylvania, where Trump either won or tied among those deciding before the final week. This pattern was not nearly as strong nationally.

Overall, these exit poll data suggest that voter preferences moved noticeably, particularly in these four decisive states. This can be seen as good news for the polling industry. It suggests that many polls were probably fairly accurate at the time they were conducted. Clinton may very well have been tied, if not ahead, in at least three of these states (MI, WI, FL) roughly a week to two weeks out from Election Day. In that event, what was wrong with the polls was projection error (their ability to predict what would happen days or weeks later on November 8), not some fundamental problem with their ability to measure public opinion.

The notion that pre-election polls fielded closer to Election Day tend to be more predictive of the election outcome than equally rigorous polls conducted farther out is not only intuitive, it has also been well documented for some time (e.g., Crespi 1988; Traugott 2001). The effect of late changes in voters' decisions can be particularly large in elections with major campaign-related events very close to Election Day (AAPOR 2009). The 2016 general election featured a number of high profile campaign-related stories, as summarized in Table 3. Perhaps the most controversial single event was the FBI director's announcement on October 28 that the agency would review new evidence in the email probe focused on Clinton. The Clinton campaign claimed that the event was decisive in dooming her electoral chances (Chozick 2016).

There were other major events that also could have changed a substantial number of voters' minds. The Access Hollywood video tape released October 7 seemed to noticeably affect the race (Bradner 2016; Salvanto 2016), but that occurred too far out from Election Day to explain the errors observed in polls conducted during the final week or two of the campaign. Other events, such as the circulation of fake news stories (e.g., see Kang 2016) and Russian interference in the election (Director of National Intelligence 2017), could have influenced voters' decisions but seemed to emerge over time rather than at a clearly-defined point in the election.

Table 3. Major Events in the 2016 Presidential Campaign Following the Conventions

Aug. 1   Trump criticizes gold star family
Aug. 10  Judicial Watch releases State Department emails related to Clinton Foundation
Sep. 9   Clinton makes "basket of deplorables" comment
Sep. 11  Clinton leaves 9/11 ceremony early due to illness
Sep. 26  First presidential debate
Oct. 1   NYT reports Trump's 1995 tax record suggests no federal taxes for years
Oct. 3   NY attorney general sends cease and desist letter to Trump Foundation
Oct. 4   Vice presidential debate
Oct. 7   Release of video of Trump discussing groping women
Oct. 7   WikiLeaks releases emails hacked from Clinton campaign chairman John Podesta
Oct. 9   Second presidential debate preceded by surprise Trump press conference
Oct. 12  Multiple women accuse Trump of touching them inappropriately
Oct. 19  Third presidential debate
Oct. 25  Announcement that Obamacare premiums will increase 25% on average
Oct. 28  FBI Director James Comey announces review of new evidence in Clinton email probe
Nov. 6   FBI Director James Comey announces emails warrant no new action against Clinton

Nov. 8   Election Day

Table 4. Comparing Individuals' Pre- and Post-election Responses to Presidential Vote

                                                   Reported vote
Pre-election vote preference    Voted for    Voted for    Voted for other    DK or
                                Clinton      Trump        candidate          Refused
Clinton/Lean Clinton              44.2%        0.4%           1.2%             0.6%
Trump/Lean Trump                   0.3%       38.2%           0.3%             1.1%
Other candidate                    1.6%        2.6%           6.3%             0.2%
DK-Refused to Lean                 0.7%        1.4%           0.4%             0.6%

Cells sum to 100% of respondents.

Source: Pew Research Center 2016 Election Callback Study. Based on 1,254 completed re-interviews with survey respondents who said that they voted in the general election. Estimates are unweighted.

Table 5. Pre-election Poll Responses by the Candidate Ultimately Supported

                                                   Reported vote
Pre-election vote preference    Voted for    Voted for    Voted for other    DK or
                                Clinton      Trump        candidate          Refused
Clinton/Lean Clinton               94            1            15               26
Trump/Lean Trump                    1           90             4               45
Johnson/Lean Johnson                2            4            41                0
Stein/Lean Stein                    1            1            25                3
Other candidate                     0            1            11                3
DK-Refused to Lean                  2            3             5               23
Total                             100%         100%          100%             100%
Interviews                       (587)        (533)         (103)             (31)

Source: Pew Research Center 2016 Election Callback Study. Based on 1,254 completed re-interviews with survey respondents who said that they voted in the general election. Estimates are unweighted.

With one-time events, one might reasonably interpret a subsequent change in the horserace as an effect from that event. With ongoing, diffuse news stories, by contrast, it is not clear how one could measure the impact with polling data alone. Even under the cleanest of circumstances (i.e., a one-time event with no major competing news stories), the absence of a counterfactual makes investigations into the effect of particular campaign events a fraught exercise. In a hypothetical scenario in which the event of interest did not occur, a change in the horserace might still have been observed, just for different reasons.

Still, given the volume of claims that the FBI announcement of October 28 tipped the race in Trump's favor, we felt it worthwhile to investigate whether there was support for that claim in the public polls. We examined the five national tracking polls conducted during the final three weeks of the campaign. The margins (%Clinton-%Trump) for these polls are plotted in Figure 6. Unlike other sections of this report (which focus on polling error), the goal of this particular analysis was to track voter sentiment as accurately as possible. To that end, Figure 6 presents the best available estimates for each tracking poll, which in the case of two polls meant using estimates produced with revised weights that better adjusted for sample imbalances (Cho et al. 2016; Tedeschi 2016).

The trend lines of the tracking polls in the figure are not very consistent with one another. For example, the ABC News/Washington Post poll (blue line) shows Clinton's support dropping precipitously in late October then rebounding before Election Day. The IBD/TIPP poll (yellow line) suggests a contradictory pattern, in which support for Clinton increased modestly in late October then tapered off in November. To try to detect a signal among these five somewhat unharmonious tracking polls, we computed the average margin giving each poll equal influence. It is interesting to note that this average shows the exact result of the popular vote (Clinton +2), which provides some confidence that collectively these polls were doing a reasonable job tracking voter preferences during this final stretch.

The evidence for a meaningful effect on the election from the FBI letter is mixed at best. Based on Figure 6, it appears that Clinton's support started to drop on October 24 or 25. October 28 falls at roughly the midpoint (not the start) of the slide in Clinton's support. What's more, the lag between when interviewing was conducted and when tracking poll results are released means that the slide in Clinton's support probably began earlier than estimates in Figure 6 suggest. For example, the ABC News/Washington Post estimate of a tied race on October 31 was based on interviews conducted October 28-31. The IBD/TIPP estimates are based on interviews conducted during the six days prior to the date shown. Factoring in this lag, it is reasonable to speculate that Clinton's slide began as early as October 22 or 23. There were no notable campaign events on either of those days, though the announcement that Obamacare premiums would increase occurred roughly around that time (October 25).

While Figure 6 indicates that Clinton's lead was eroding before October 28, it is possible that the FBI letter news story made that erosion more severe than it otherwise would have been.
Based on all of the data examined here, we would conclude that there is at best mixed evidence that the FBI announcement tipped the scales of the race. Pairing this analysis with the preceding one on NEP data for late deciders, it remains unclear exactly why late-deciding voters broke for Trump in the Upper Midwest. Anecdotal reporting offered a number of other suggestions (e.g., Republicans skeptical of Trump finally "coming home," Clinton's campaign, believing the Upper Midwest was locked up, allocating time and money elsewhere, and Democrats lukewarm on Clinton deciding to stay home), but ultimately the data available do not offer a definitive answer to this question.

If substantial shares of voters made up their minds about their presidential vote very late in the campaign, one tool that should capture those late changes is a callback study. In a callback study, the same people are interviewed before and after the election. Late change would manifest as discrepancies between pre- and post-election responses. It is also possible that Shy Trump responses would manifest the same way. Some poll respondents might have been inclined to censor their support for Trump before the election but, in light of his victory, decided to be forthcoming about their vote for him in the post-election interview. So if poll respondents said in October that they were undecided and then said in November that they voted for Trump, the explanation could be either that they truly were undecided in October or that they intentionally misreported being undecided. For some voters, the truth may fall somewhere in between.

While callback data cannot necessarily distinguish between real late change and intentional misreporting, they can help to disentangle measurement error (which includes both Shy Trump answering and late switching) from other error sources. Specifically, if a callback study shows that some respondents did not report being Trump supporters before the election but nonetheless said they voted for him in the re-interview, that would indicate that measurement error was at least partially to blame for the poll's error, rather than nonresponse bias (e.g., not enough Trump voters being in the study to begin with).

To test this, we examined data from the Pew Research Center's callback study. The study re-contacted registered voters from Pew's August and October national cross-sectional dual frame RDD surveys. The re-interview was conducted by Princeton Survey Research Associates International November 10-14, 2016. Only respondents who self-reported having voted were eligible to complete the post-election re-interview (n=1,254). The crosstabulation of their pre-election and post-election responses is shown in Table 4. Cases on the left-to-right diagonal represent respondents who answered the presidential vote question the same way before and after the election.
About nine-in-ten respondents (89 percent) answered consistently, while 11 percent reported something different at the ballot box than what they told the pollster before the election. In the context of recent elections, that 11 percent is quite typical. Pew Research Center has been conducting callback studies since 2000. Over the past five cycles, 12 percent of respondents, on average, were inconsistent in their pre- and post-election responses (i.e., were in an off-diagonal cell). The highest level of inconsistent responding recorded by Pew's callback studies was 18 percent in 2000, and the lowest was 7 percent in 2012.

What is notable about the 2016 data is not how many inconsistent respondents there were; it is how the inconsistent responders voted. Figure 7 shows the presidential vote margin among respondents who gave inconsistent pre- versus post-election responses in each callback study since 2000. Typically, those who admit changing their minds more or less wash out, breaking about evenly between the Republican and Democratic candidates. In 2016 something very different happened: inconsistent responders in the Pew study voted for Trump by a 16-point margin. That is more than double the second largest margin observed in this time series for inconsistent responders (+7 points for George W. Bush in 2000).

Note: Data are from Pew Research Center RDD callback studies.

Another way to evaluate this is with the crosstabular data in Table 5. Those data show that 10 percent of all the callback study respondents who ultimately voted for Trump said something different in the pre-election poll. The plurality of inconsistent responders who voted for Trump had described themselves in the pre-election poll as Gary Johnson supporters, about a third had described themselves as undecided or had refused to answer, and the remainder had described themselves as supporting some other candidate. Clinton, by contrast, picked up only about half as many late-revealing voters as Trump in this study.
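A minimal sketch of this kind of callback tabulation is shown below. The handful of records and category labels are invented for illustration; they are not the Pew microdata.

    # Sketch of the callback-study tabulation: cross pre-election preference with the
    # reported vote, compute the share of inconsistent (off-diagonal) responders, and
    # look at how those inconsistent responders split. Records here are invented.
    import pandas as pd

    df = pd.DataFrame({
        "pre":  ["Clinton", "Trump", "Johnson", "Undecided", "Trump", "Clinton", "Undecided"],
        "post": ["Clinton", "Trump", "Trump",   "Trump",     "Trump", "Clinton", "Clinton"],
    })

    print(pd.crosstab(df["pre"], df["post"]))   # pre/post crosstab (structure behind Tables 4 and 5)

    inconsistent = df[df["pre"] != df["post"]]
    print(f"Inconsistent responders: {len(inconsistent) / len(df):.0%}")

    # Trump-minus-Clinton margin among the inconsistent responders (Figure 7 analogue)
    margin = ((inconsistent["post"] == "Trump").mean()
              - (inconsistent["post"] == "Clinton").mean())
    print(f"Margin among inconsistent responders: {margin:+.0%}")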

If a Shy Trump effect did in fact contribute to polling errors, two patterns should be observable in the data.

First, estimates of Trump's support should be lower in live-interviewer telephone polls than in self-administered polls (online and IVR).

Second, there should be a relationship between estimates of support for Trump in the polls and the proportion of non-disclosers (undecideds plus refusals); no such relationship should exist for the other candidates.

Table 7. Trump's Over-performance of Polls Relative to Republican Senate Candidates in Battleground States

                       Average over-performance (vote margin - poll margin)
Type of poll           Senate Rep. candidate    President (Rep. candidate)    Ave. difference (Pres. error - Sen. error)    Polls
Live phone                     1.3%                       1.4%                               0.0%                             24
Online                         4.5%                       3.2%                              -1.3%                             17
IVR, IVR+Online                2.7%                       1.8%                              -0.9%                             22
Other                          7.7%                       3.9%                              -3.8%                              3

As discussed above, to describe the inconsistency as "misreporting" would not necessarily be correct, because undecided or leaning to Gary Johnson may have been an accurate answer at the time of the pre-election poll. Regardless, the net effect on an election projection based on such a pre-election poll would be an error of roughly two percentage points in under-estimating support for Trump. Clinton's estimated national popular vote lead based on the responses people in this study gave before the election was 6 percentage points, and her national lead based on those same individuals' post-election responses was 4 points.[11] In addition, a small percentage of those screened for the post-election callback survey reported not voting (about 8 percent, n=104). Clinton led Trump 44 percent to 27 percent among those who reported not voting. Thus, nonvoting hurt Clinton slightly more than it hurt Trump in this small sample.

Another widely discussed hypothesis about polling errors in 2016 is the Shy Trump effect. The Shy Trump hypothesis is a variation on what is generally called the Shy Conservative hypothesis in other countries (such as the U.K.). In most election polling misses, the conservative side has been under-estimated more often than the more progressive/liberal side (Jennings and Wlezien 2016). However, historically this has generally not been the case in the United States (see section 2.1). The Shy Trump/Conservative hypothesis has its roots in Elisabeth Noelle-Neumann's famous Spiral of Silence hypothesis, which states that "under the pressure of a hostile opinion climate (national, local, or group level) individuals are reluctant to voice their opinions on morally loaded issues" (Bodor 2012). However, research has generally failed to validate the existence of a spiral of silence, except in some very specific contexts (Bodor 2012).

If Trump supporters refrained from revealing their vote more than supporters of other candidates did, they may have tended either a) not to reveal any preference or b) to reveal a preference considered more socially acceptable. This reaction should be more pronounced in interviewer-administered than in self-administered surveys, because the former involve revealing preferences to another person. Therefore, if a Shy Trump effect did in fact contribute to polling errors, the patterns listed above should be observable in the data.

We examined polls to see whether interviewer-administered polls elicited lower estimates of Trump support than self-administered polls. For this analysis, we use the dataset of 208 battleground and 39 national polls conducted during the final 13 days of the campaign (section 2.4). The analysis showed that interviewer-administered polls did not under-estimate Trump's support more than self-administered IVR and online surveys, a finding that is inconsistent with the Shy Trump theory. Battleground state polls with live interviewers were actually among the least likely to under-estimate Trump's support (average signed error of 1.6 points), higher than IVR surveys (0.9) but lower than polls using IVR + Internet administration (2.3) or Internet-only administration (3.2). At the national level, live interviewer polls exhibited little systematic error under-estimating Trump's vote margin (0.4), under-estimation was slightly higher for Internet modes (1.1), and IVR/Internet surveys over-estimated Trump's support slightly (-0.7 signed error). This pattern is mirrored by results from the regression analysis of mode and other factors on absolute error, which found that only one self-administered mode (IVR) was associated with lower errors than live phone interviewing.
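A minimal sketch of the signed-error-by-mode comparison is shown below, using a few invented poll-level records with hypothetical column names. The sign convention follows the text: margins are expressed as Trump minus Clinton, so a positive signed error means the poll under-estimated Trump.

    import pandas as pd

    # Hypothetical poll-level records (margins in percentage points, Trump minus Clinton)
    polls = pd.DataFrame({
        "mode":          ["Live phone", "Live phone", "IVR", "IVR", "Online", "Online"],
        "poll_margin":   [-3.0, -1.0, -2.0, 0.0, -4.0, -5.0],
        "actual_margin": [-2.0,  1.0, -1.0, 0.5, -1.0, -2.0],
    })

    # Positive signed error = Trump's margin under-estimated by the poll
    polls["signed_error"] = polls["actual_margin"] - polls["poll_margin"]

    # Average signed error and poll counts by mode of administration
    print(polls.groupby("mode")["signed_error"].agg(["mean", "count"]))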
If the Shy Trump effect were real, however, there is no reason to expect that it would have been confined to polls conducted very late in the campaign. Presumably, any hesitation about disclosing support for Trump would have been just as pronounced (if not more so) in September and early October. Thus, we also tested for this mode-of-administration difference using published polls conducted from September 1st to Election Day. With this larger set of polling data, we were also able to apply more sophisticated statistical tests.

Figure 8 shows the national trend in voting intentions for Trump, by mode, using local regression estimation. It illustrates that estimates produced by live telephone polls were similar to those produced by self-administered Web polls. The mode that stands out somewhat is IVR + Internet, which tended to show Trump garnering about 50 percent of the major party vote. Estimates of Trump support from the two other modes tended to be about 2.5 percentage points lower. However, these aggregate differences may be due to features of the polls other than mode of administration, hence the need for more refined statistical testing. To better isolate an effect of mode, we conducted a regression analysis that controls for length of field period, tracking poll versus non-tracking poll, likely voter (LV) versus registered voter (RV) estimate, and change over time (Appendix A.E). The results were highly consistent with the analysis using only polls from the final 13 days: self-administered online polls and interviewer-administered phone polls both recorded lower levels of support for Trump than IVR polls.

Note: Each point represents a poll estimate positioned at the midpoint of the field period. Lines represent Loess estimates of change over time using Epanechnikov .65 estimation. © C. Durand, 2016.

The finding that live telephone surveys did not consistently under-estimate Trump's support more than self-administered online polls is informative, though not conclusive, evidence against the Shy Trump hypothesis. Live telephone polls and self-administered polls differ on too many important factors (e.g., sample source, weighting) for this type of analysis to cleanly isolate the effect of interviewer presence, even with statistical modeling. That said, the results are inconsistent with the expectations of the Shy Trump theory.

One possibility is that Trump supporters were more likely than other respondents either to report being undecided or to refuse to reveal their preference. In that case, we would expect to observe a relationship between the proportion of nondisclosers and the proportion of Trump supporters in the polls, and no such relationship for Clinton. However, the proportion of nondisclosers is related to the methodological characteristics of the polls. The average rate of nondisclosure was highest for online polls (8.5 percent) and lower for IVR + Internet (5.6 percent) and live phone (4.3 percent).[12] Appendix Table A.9 shows that the proportion of non-disclosers in polls is not related to the proportion of support for Trump, all else being equal. However, if we consider the estimates for all the candidates, we see that polls with larger shares of nondisclosers showed more support for both Trump and Clinton and less support for third-party candidates. The main takeaway is that there is no evidence that higher rates of undecideds or refusals (that is, nondisclosure) are associated with the level of Trump support, which fails to yield evidence supporting the Shy Trump hypothesis.
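A minimal sketch of this nondisclosure check is shown below, run on synthetic poll-level data with hypothetical variable names; it illustrates the form of the test rather than the analysis behind Appendix Table A.9.

    # Sketch: is a poll's share of non-disclosers (undecided plus refused) related to
    # its candidate estimates, holding poll characteristics constant? Synthetic data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 60
    polls = pd.DataFrame({
        "nondiscloser_share": rng.uniform(2, 12, n),   # percent undecided/refused
        "days_to_election":   rng.integers(1, 70, n),
        "mode":               rng.choice(["live_phone", "online", "ivr"], n),
    })
    # Candidate shares simulated with no built-in relationship to nondisclosure
    polls["trump_share"] = 42 + rng.normal(0, 2, n)
    polls["clinton_share"] = 45 + rng.normal(0, 2, n)

    for outcome in ["trump_share", "clinton_share"]:
        fit = smf.ols(f"{outcome} ~ nondiscloser_share + days_to_election + C(mode)",
                      data=polls).fit()
        print(outcome, round(fit.params["nondiscloser_share"], 2),
              round(fit.pvalues["nondiscloser_share"], 3))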
In 2016, one polling organization, Morning Consult, conducted two experiments designed to isolate the effect of self- versus interviewer administration on support for Trump (Dropp 2016). While the first experiment was conducted in the run-up to the primaries and the second during the general election, they used the same basic design. A group of likely voters was recruited from an online opt-in sample source and asked a set of background questions. They were then randomly assigned to complete the remainder of the interview either by proceeding with an online survey or by dialing into a call center and answering questions from a live interviewer. The general election edition of the experiment yielded a mode difference in the expected direction (Clinton +5 points in the live phone condition versus +3 points in the web condition), but the result was not statistically significant. Dropp did report a statistically significant mode effect in the expected direction (more Trump support in the online condition than in the live telephone condition) among well-educated and higher-income voters.

More recently, Pew Research Center (2017) conducted an experiment that randomized mode of interview on the Center's American Trends Panel, which is recruited from national landline and cell phone RDD surveys. Half of the panelists were assigned to take the survey online and the other half via a live phone interview. That study, conducted February 28-March 12, 2017, found little evidence that poll participants were censoring support for Trump when speaking to an interviewer. There was no significant difference by mode of interview on any of four questions asking directly about Trump (e.g., presidential job approval, personal favorability). Questions asking about major policy priorities of the Trump administration also showed no mode effect, except on treatment of undocumented immigrants, which showed 8 percentage points more support for the conservative position online than on the phone.

As with the other analyses presented in this report, the experiments have their limitations. While Dropp's results may generalize to other polls conducted with online opt-in samples, it is not clear how well they generalize to polls with samples drawn from voter files or RDD samples. It is also not clear whether differential nonresponse to the latter part of the interview posed a threat to the mode comparison. It seems likely that breakoff was higher in the phone condition than in the web condition, but how well that could have been corrected through statistical modeling is not clear. For its part, the Pew study speaks more directly to polls conducted since Trump took office than it does to 2016 pre-election polls. As noted in the report, the timing of that study (conducted more than one month after Trump took office) and the fact that it was not focused on presidential vote mean that it only indirectly speaks to the possibility of a Shy Trump phenomenon in 2016.
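Both experiments come down to comparing candidate support across randomly assigned arms. A simplified sketch of such a comparison is shown below; the counts are placeholders, not the Morning Consult or Pew data.

    # Simplified sketch of testing a mode-of-administration difference from a
    # randomized experiment: compare Trump's share across the two arms.
    # The counts below are placeholders.
    from statsmodels.stats.proportion import proportions_ztest

    trump_count = [430, 460]   # Trump supporters in [live-phone arm, web arm]
    arm_size = [1000, 1000]    # completed interviews in each arm

    z_stat, p_value = proportions_ztest(trump_count, arm_size)
    print(f"z = {z_stat:.2f}, p = {p_value:.3f}")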
As discussed above, polls generally under-estimated Trump's support in Pennsylvania and Michigan, but there was one exception. Trafalgar Group, a Republican-affiliated IVR firm, was the only pollster to correctly project Trump victories in both states. In fact, in each of the six battleground states they polled, they over-estimated support for Trump. In states like Michigan, Pennsylvania and North Carolina, Trafalgar's pro-Trump tilt yielded impressive results. But in Colorado and Florida, the over-estimation of Trump support led to larger absolute errors (3.9 and 2.8 points, respectively), albeit with numbers that projected the correct winner. So while Trafalgar did forecast Trump wins in both Michigan and Pennsylvania, they were not necessarily the most accurate pollster, or even the most accurate IVR pollster, in 2016.

Two distinctive design decisions seem to explain why Trafalgar's results were consistently more favorable to Trump. First, they took a novel approach to producing final vote preference estimates. According to their methods report, "the final published ballot test is a combination of survey respondents to both a standard ballot test and a ballot test gauging where respondent's neighbors stand. This addresses the underlying bias of traditional polling, wherein respondents are not wholly truthful about their position regarding highly controversial candidates." (emphasis added) The general idea is that even if people will not admit that they personally would vote for Trump, they will admit that their neighbors would. Second, as Stinson (2016) reported, Trafalgar selected their samples from voter files using a more-inclusive-than-normal approach that brought in registered voters who had not voted for years; some had not voted since 2006. According to Trafalgar CEO Robert Cahaly, other pollsters tend not to sample such records.

It is not clear what the relative contributions of these two factors were to the overall performance of the poll. Also, while the Trafalgar methods statement asserts that the incorporation of the neighbor vote intention question is effective because it corrects for Shy Trump-type responding (and that may have been the case), it also seems possible that in states like Michigan and Pennsylvania it was correcting some other error (e.g., over-representation of Democratic-leaning college graduates). The methods report suggests that Trafalgar, like a number of other IVR pollsters, did not measure respondent education, so this may remain something of a mystery. Regardless, it is informative that these two unusual methodological levers were pulled and that they had the effect of overcoming the general pro-Clinton error that seemed to plague most pollsters to varying degrees in 2016. On its face, the practice of using a more inclusive voter file sample that brings in dormant voters seems like something other pollsters may want to evaluate. The other idea, integrating reports about neighbors' vote choice with self-reported vote choice, also warrants experimentation in a broad array of contests so as to better understand the properties of that measurement approach.
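Because the methods report does not say how the two ballot tests were combined, any reconstruction is speculative. Purely as an illustration, the sketch below assumes a simple 50/50 blend of the self-reported and neighbor-reported shares; the numbers are placeholders, not Trafalgar's procedure or data.

    # Purely illustrative blend of a standard ballot test and a "neighbors" ballot
    # test. The 50/50 weighting is an assumption, not Trafalgar's documented method.
    self_report = {"Trump": 44.0, "Clinton": 46.0}       # "Whom will you vote for?"
    neighbor_report = {"Trump": 49.0, "Clinton": 44.0}   # "Where do your neighbors stand?"

    blended = {cand: 0.5 * self_report[cand] + 0.5 * neighbor_report[cand]
               for cand in self_report}
    print(blended)   # e.g., {'Trump': 46.5, 'Clinton': 45.0}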
A different way to test whether polling errors were attributable, at least in part, to misreporting is to compare Trump's performance in state-level polls to the performance of Republican Senate candidates in those same polls. Presumably, respondents who felt pressure to censor their support for Trump did not feel similar pressure to censor their support for the Republican Senate candidate. If such differential censoring did occur, then we would expect to see, at the individual poll level, that Trump outperformed his poll number by a larger margin than the Republican Senate candidate did. Such a result, while not definitive, would suggest that part of the error in the presidential race estimates was attributable to misreporting.

To examine this, we used battleground state polls conducted entirely within the final two weeks of the election. To be included, each poll needed to measure both Senate and presidential vote preference. There were 34 Senate contests in 2016, eight of which were held in states where the presidential vote margin was less than five percentage points (AZ, CO, FL, NH, NV, NC, PA, WI). We examined the final polls for these eight states and, for each state, used only the last poll conducted by each firm. This yielded an analytic dataset with 66 polls, 24 of which were conducted entirely with live telephone interviewing.

Here we defined "over-performance" as the signed difference between the final vote margin and the poll margin, where the margin is the Republican vote minus the Democratic vote.[13] Table 6 provides an illustration of how these computations were done, using one state, Wisconsin. The final Marquette Law School Poll had the Senate margin at -1 (44% for Johnson, the Republican, and 45% for Feingold, the Democrat) and the presidential margin at -6 (38% for Trump and 44% for Clinton). The actual election in Wisconsin went +3.4 for Johnson and +0.7 for Trump. In this analysis, Johnson over-performed the Marquette poll by 3.4 - (-1) = +4.4 points, and Trump over-performed by 0.7 - (-6) = +6.7 points. Comparatively speaking, Trump over-performed the poll by 6.7 - 4.4 = 2.3 points more than the Republican Senate candidate did.[14] This difference in differences, the dependent variable in the analysis, is shown in the far right column of Table 6.

Most Senate races featured just one or two late live telephone polls. Rather than attempting this analysis separately at the state level (where the data are too sparse), we use a combined dataset with results for the 66 polls from the eight states. A number of findings in the summary statistics (Table 7) merit discussion. The central question is whether Trump tended to out-perform his poll numbers more than the Republican Senate candidate in the same poll, particularly in live telephone polls. As shown in the first row of Table 7, we find no support for that idea. In the 24 live telephone polls analyzed, Trump beat his poll estimate by 1.4 percentage points on average, and the Republican Senate candidate beat his or her poll estimate by a nearly identical 1.3 percentage points on average. An independent, very similar analysis by Harry Enten (2016) reached the same general conclusion.
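Returning to the Wisconsin illustration, the over-performance computation can be written out directly; the figures below are the Marquette Law School Poll and vote numbers cited above.

    # Worked version of the Wisconsin example (Marquette Law School Poll).
    # Margins are Republican minus Democratic, in percentage points.
    def over_performance(actual_margin, poll_margin):
        """How much the Republican beat the poll margin (vote margin minus poll margin)."""
        return actual_margin - poll_margin

    senate_over = over_performance(actual_margin=3.4, poll_margin=-1.0)  # Johnson vs. Feingold
    pres_over = over_performance(actual_margin=0.7, poll_margin=-6.0)    # Trump vs. Clinton

    # Difference in differences: how much more Trump over-performed than the
    # Republican Senate candidate did (the dependent variable in the analysis)
    print(senate_over, pres_over, pres_over - senate_over)   # 4.4, 6.7, ~2.3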

Table 8. Trump Margin by Region

                                    Liberal and Relatively    Other      Competitive, White
                                    Hispanic States           States     Working Class States
Actual vote margin (T-C)                   -23%                  5%              2%

Pew Research Center
  Poll margin (T-C)                        -30%                  1%              2%
  Difference from vote                      -7%                 -4%              0%
  N                                         489               1,284            347

CNN/ORC
  Poll margin (T-C)                        -23%                  5%            -13%
  Difference from vote                       0%                  0%            -15%
  N                                         181                 475            123

ABC News/Washington Post
  Poll margin (T-C)                        -21%                  4%             -2%
  Difference from vote                       2%                 -1%             -4%
  N                                         761               1,957            493

SurveyMonkey
  Poll margin (T-C)                        -23%                  0%             -1%
  Difference from vote                       0%                 -5%             -3%
  N                                      10,150              51,648         12,388

Sources: ABC News/Washington Post RDD tracking poll interviews from November 1-7, 2016; Pew Research Center RDD survey fielded October 20-25, 2016; CNN/ORC RDD survey fielded October 20-23, 2016; SurveyMonkey interviews fielded November 1-7, 2016.
Note: Some differences do not sum due to rounding. States coded as "Liberal and Relatively Hispanic" were CA, NY, NV, IL and WA. States coded as "Competitive White Working Class" were PA, MI, MN, OH and WI.

Table 9. Estimates of the Share of U.S. Adults Living in Staunchly Pro-Trump Counties

                                                            Share of the U.S. population living in those areas
Three definitions of                  Number of    Census        CNN/ORC poll               Pew Research poll
staunchly pro-Trump areas             counties     benchmark     Weighted    Unweighted     Weighted    Unweighted
Counties Trump won by 40+ points        1,486        13%            16%          16%            13%         13%
Counties Trump won by 60+ points          524         3%             4%           4%             3%          3%
Rural counties (< 50 people/mi2)        1,657         9%            12%          12%             9%         10%

Note: The Census figures are 2015 population estimates and are based on people of all ages. CNN/ORC estimates are based on 1,017 interviews conducted October 20-23, 2016. Pew data are based on a cumulated file with all 15,812 interviews conducted in routine dual frame RDD surveys in 2016. The CNN/ORC and Pew figures are based on people age 18 or older.

Table 10. Share of Pollsters That Adjusted on Education in Weighting

Type of poll              Share of polls that weighted for education    Number of final polls
Michigan polls                             18%                                   11
Wisconsin polls                            27%                                   11
North Carolina polls                       29%                                   14
Florida polls                              31%                                   16
Pennsylvania polls                         33%                                   18
Ohio polls                                 36%                                   11
National polls                             52%                                   21

Note: Figures reflect only polls fielded in the final two weeks and only a given pollster's final poll. The requisite weighting information was missing for 23 polls, which were all imputed as not weighting on education, based on information among similar polls that did disclose their weighting variables.

Also, as election observers will recall, not only did Trump out-perform poll estimates, so did most Republican candidates in competitive Senate races. This pattern is evident in the fact that all of the values in the first column of Table 7 are positive. This finding is suggestive of systematic under-estimation not just of support for Trump but of support for Republican candidates more generally. Indeed, Republican candidates for the U.S. House of Representatives also tended to outperform their poll numbers. Nationally, the actual congressional vote was +1.1 for Republicans, whereas the final RealClearPolitics polling average was +0.6 for Democrats. The fact that polls tended to under-estimate support for Republican candidates writ large in 2016, not just support for Trump, undermines the notion that polling errors were caused by socially desirable reporting.

Another indirect test for socially desirable reporting is to look at whether responses to the vote preference question varied by potentially discernible interviewer characteristics, such as gender and race. For example, if poll respondents interviewed by white males were significantly more likely to report intending to vote for Trump than those interviewed by female and/or non-white interviewers, that would suggest misreporting was a problem. It is possible that some respondents who knew they were Trump voters were reluctant to say so even to white male interviewers, so this is an imperfect test.

Two microdatasets made available to the committee contained variables for interviewer race and gender: the ABC News/Washington Post poll and Pew Research Center's October poll. A simple bivariate analysis seems to suggest some effect from interviewer characteristics (in the ABC News/Washington Post poll, the margin was Clinton +2 among interviews completed by non-white interviewers versus Clinton -1 among interviews completed by white interviewers), but no meaningful effects were detected once appropriate controls were applied. Because interviewers are not randomly assigned to respondents, statistical models are required to estimate the effects of interviewer race and sex on respondent vote preferences. In multivariate modeling that controls for basic respondent demographics (gender, race/ethnicity, education), any effect from interviewer race or interviewer gender disappeared (i.e., became nonsignificant). The lack of any evidence for an effect of interviewer race or gender on how respondents answered the presidential vote question is not conclusive evidence against the Shy Trump hypothesis. The result is, however, inconsistent with the expectations of the Shy Trump theory, and it suggests that factors other than socially desirable reporting were responsible for the bulk of the error in general election polls.
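A minimal sketch of that kind of interviewer-effects model is shown below, run on synthetic respondent-level data with hypothetical variable names; it illustrates the approach rather than the committee's actual model.

    # Sketch of the interviewer-effects check: model two-party vote preference on
    # interviewer characteristics while controlling for basic respondent demographics.
    # Synthetic data with no interviewer effect built in.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 500
    resp = pd.DataFrame({
        "interviewer_race":   rng.choice(["white", "nonwhite"], n),
        "interviewer_gender": rng.choice(["male", "female"], n),
        "resp_race":          rng.choice(["white", "nonwhite"], n),
        "resp_education":     rng.choice(["hs", "some_college", "college_plus"], n),
    })
    resp["trump"] = rng.integers(0, 2, n)   # 1 = prefers Trump, 0 = prefers Clinton

    model = smf.logit(
        "trump ~ C(interviewer_race) + C(interviewer_gender)"
        " + C(resp_race) + C(resp_education)",
        data=resp,
    ).fit(disp=False)
    print(model.summary())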
One alarming possibility raised by the direction of polling errors was that, broadly speaking, some segment of Trump's support base was not participating in polls. Participation in polls is quite low across the ideological spectrum and has been for some time (Pew Research Center 2012), and even the most rigorous polls in 2016 had single-digit response rates. So it is not the case that most Clinton supporters were taking polls while most Trump supporters were not; rather, the pattern would have been much more subtle. Differential nonresponse, if it was a real problem, would have manifested as Trump supporters being somewhat less willing to participate in surveys, on average, than Clinton supporters.

While national polls clearly performed better than state-level polls on average, at least one set of commentators suggested that the strong performance of national polls was a mirage. Cohn, Katz and Quealy (2016) observed that Trump out-performed his poll numbers in states with a large number of white voters without a college degree and under-performed his poll numbers in large, liberal states with sizable Hispanic populations. Overall, they noted, "the two types of misses nearly canceled out in national polls." If true, then the conclusion reached here and elsewhere that the national polls were generally accurate while many state polls were not would be discredited. If the low error of national polls was simply a fortuitous outcome of two large errors canceling, then it would be more accurate to conclude that neither state-level nor national-level polls did a good job of capturing the voting electorate in the 2016 general election.

To test whether national polls appeared to perform well simply because large errors canceled, we used final microdatasets from three RDD polls (ABC News/Washington Post, CNN/ORC, Pew Research Center) and one online opt-in poll (SurveyMonkey). If the assertion were true, we would expect to find that these national polls noticeably under-estimated Trump's support in key working class white states (PA, MI, MN, WI, OH) while simultaneously over-estimating his support in liberal, relatively Hispanic states (CA, NY, NV, IL, WA). The results are presented in Table 8. It must be noted that of these four surveys, only one (SurveyMonkey) was designed for state-level inference and released state-level vote estimates. In addition, two of the polls (CNN/ORC and Pew Research Center) were conducted at least two weeks before Election Day. While it is therefore unrealistic to expect the subnational estimates for all of these polls to align perfectly with the vote, we felt it was reasonable to check the data for the general pattern in question.

Overall, the data are not consistent with the claim that the relatively accurate results in national polls in 2016 resulted from two large errors canceling each other out (over-statement of Trump support in liberal, heavily Hispanic states and under-statement in working class white states). For any given poll, that narrative gets only about half the story correct. The CNN/ORC poll accurately estimated Trump support in the predominantly liberal and Hispanic states; all of the projection error was in the Upper Midwest. For the Pew Research Center poll, the opposite was true: the projection error was predominantly in the liberal, Hispanic states. For the ABC/Washington Post and SurveyMonkey polls, the estimates of Trump support were too low in the Upper Midwest (relative to the outcome), but Trump's margin of defeat in the liberal, Hispanic states was the same as or smaller than in the actual vote.
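A minimal sketch of the regional comparison in Table 8 is shown below; the respondent records are invented, and the state groupings follow the definitions in the table note.

    import pandas as pd

    LIBERAL_HISPANIC = {"CA", "NY", "NV", "IL", "WA"}
    WHITE_WORKING_CLASS = {"PA", "MI", "MN", "OH", "WI"}

    def region(state):
        if state in LIBERAL_HISPANIC:
            return "Liberal and Relatively Hispanic"
        if state in WHITE_WORKING_CLASS:
            return "Competitive, White Working Class"
        return "Other"

    # Hypothetical respondent-level records from a national poll
    resp = pd.DataFrame({
        "state":     ["CA", "NY", "PA", "WI", "TX", "GA", "MI", "IL"],
        "vote_pref": ["Clinton", "Clinton", "Trump", "Clinton",
                      "Trump", "Trump", "Trump", "Clinton"],
        "weight":    [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9],
    })
    resp["region"] = resp["state"].map(region)

    # Weighted Trump-minus-Clinton margin within each state grouping
    for reg, grp in resp.groupby("region"):
        trump = grp.loc[grp["vote_pref"] == "Trump", "weight"].sum()
        clinton = grp.loc[grp["vote_pref"] == "Clinton", "weight"].sum()
        print(reg, round(100 * (trump - clinton) / grp["weight"].sum(), 1))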
In fairness, if one looked at how the state-level polls performed and assumed that national polls are conducted in basically the same way, this theory of canceling errors seemed very plausible. It overlooks one key point, however: state-level polls and national polls are not conducted the same way. As discussed in section 2.4, live telephone interviewing represents a much larger share of national polls (36%) than of state-level polls (18%). While an IVR or online opt-in poll may cost in the neighborhood of $5,000 to $15,000, live telephone polls with professional interviewers cost closer to $100,000 (Cassino 2016). This means that the resources going into a typical state poll can be dwarfed by those that go into a national poll. In addition, national pollsters are nearly twice as likely to adjust for education in weighting as state-level pollsters.

One way to test for differential partisan nonresponse is to leverage information about which parts of the country were staunchly pro-Trump and how many people live in those areas versus the rest of the country. If polls systematically failed to interview people in staunchly pro-Trump areas, we would expect to find residents of such counties under-represented in polls. For example, if the Census shows that 13% of Americans live in staunchly pro-Trump areas, but polls estimate that only 9% of Americans live in those same areas, that would be evidence that polls were indeed systematically missing Trump supporters. Somewhat surprisingly (given the polling errors), we found no evidence to that effect. The results are presented in Table 9.

Since there was no obvious, definitive way to define a "staunchly pro-Trump" area, we tested three definitions. The definition used in the first row of the table identifies counties in which Trump won by at least a 40-point margin. The definition used in the second row identifies counties in which Trump won by at least a 60-point margin. Finally, the third row simply identifies rural counties, defined as those with a population density of fewer than 50 people per square mile. The rural definition was motivated by the fact that Trump, like most Republican presidential candidates, generally had much stronger support in rural areas than in metropolitan areas. Census estimates for the share of the population living in areas identified by each of these three definitions come from the 2015 Census population estimates. Poll estimates come from two microdatasets that contained the requisite county-level information: the mid-October CNN/ORC poll (n=1,017) and a cumulative dataset with all 15,812 telephone interviews Pew Research Center conducted in its 2016 political polling.[15]

If the polls were systematically missing people in staunchly pro-Trump areas, then the figures in the unweighted estimate columns would be noticeably lower than the Census benchmarks shown in Table 9. If such a pattern were not fixed by the weighting, then the estimates in the weighted estimate columns would also be noticeably lower than the Census benchmarks. Neither of those patterns is present in the data. If anything, people living in the most pro-Trump parts of the country are slightly over-represented.

These findings do not rule out the possibility that differential nonresponse was a factor in polling errors in 2016. For example, it is possible that the people interviewed in these pro-Trump areas were not representative with respect to their vote choice. It is also important to note that this analysis, based on telephone RDD polling data, may not generalize to online opt-in polls or IVR polls. Even with these caveats, it is informative that this particular test, which we expected might detect under-representation of pro-Trump areas, does not show evidence of bias.
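A minimal sketch of this representation check is shown below, assuming respondent-level microdata with a county flag and a survey weight; the records and the benchmark value are placeholders.

    import pandas as pd

    # Hypothetical respondent records: a flag for living in a staunchly pro-Trump
    # county (by whichever definition is in use) and a survey weight.
    resp = pd.DataFrame({
        "pro_trump_county": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
        "weight":           [1.3, 0.9, 1.0, 1.4, 0.8, 1.0, 1.1, 1.2, 0.7, 1.0],
    })

    unweighted_share = resp["pro_trump_county"].mean()
    weighted_share = (resp["pro_trump_county"] * resp["weight"]).sum() / resp["weight"].sum()
    census_benchmark = 0.13   # placeholder for the Census-based share

    print(f"Unweighted: {unweighted_share:.0%}  Weighted: {weighted_share:.0%}  "
          f"Benchmark: {census_benchmark:.0%}")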
One hypothesis about 2016 polling errors is that pollsters did not interview enough white voters without a college degree (Silver 2016b). Indeed, many pollsters would likely acknowledge that contemporary polls almost never interview enough voters without a college degree. Numerous studies have shown that adults with less formal education tend to be under-represented in surveys on an unweighted basis (Battaglia, Frankel and Link 2008; Chang and Krosnick 2009; Link et al. 2008; Pew Research Center 2012). A seasoned pollster would be quick to emphasize, however, that this well-established education skew need not bias their estimates. Many pollsters adjust their samples to population benchmarks for education to address this very issue. As long as the pollster accounts for the under-representation of less educated adults in the weighting, and as long as the less educated adults who were interviewed are representative of those who were not, this skew does not lead to bias.

In the weeks following the 2016 general election, however, one intriguing fact started to emerge: not all pollsters, particularly those polling at the state level, adjusted their weighting for education (see Table 10). Why would that have undermined polls in 2016 but not in previous elections? The answer is that in 2016 the presidential vote was strongly and fairly linearly related to education; the more formal education a voter had, the more likely they were to vote for Clinton (see the right-hand panel of Figure 9). Historically, that has not been the case. In most modern U.S. elections, presidential vote (defined here as support for the Democratic candidate) exhibited a U-shaped or "curvilinear" pattern with respect to education. For example, as shown in the left-hand panel of Figure 9, in 2012 both the least educated and the most educated voters broke heavily for Barack Obama, while those in the middle (with some college or a bachelor's degree) split roughly evenly between Mitt Romney and Barack Obama.

Source: NEP national exit poll, 2012 and 2016.

To understand why pollsters could perhaps get by without weighting on education in a U-shaped election like 2012 but not in a linear election like 2016, consider the post-graduate results. In a U-shaped election, the post-graduate voters who are likely to be over-represented in polls not adjusted for education vote in much the same way as the low-education voters that such polls under-represent. In 2016, that relationship completely fell apart: highly educated voters were terrible proxies for voters at the lowest education level. At least that was the case nationally and in the pivotal states of the Upper Midwest.
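To illustrate why the adjustment matters when vote choice is linearly related to education, the sketch below post-stratifies a toy sample on education; the education distribution and vote margins are made-up placeholders, not estimates from any actual poll.

    # Toy post-stratification on education. The sample over-represents graduates;
    # vote margins rise linearly with education, as in 2016.
    import pandas as pd

    sample = pd.DataFrame({
        "education":      ["hs_or_less", "some_college", "college", "postgrad"],
        "sample_share":   [0.15, 0.25, 0.35, 0.25],     # shares among respondents
        "clinton_margin": [-10.0, -5.0, 5.0, 20.0],     # Clinton minus Trump, in points
    })
    population_share = {"hs_or_less": 0.30, "some_college": 0.30,
                        "college": 0.25, "postgrad": 0.15}

    # Weight each education group up or down to its population share
    sample["weight"] = sample["education"].map(population_share) / sample["sample_share"]

    unweighted = (sample["clinton_margin"] * sample["sample_share"]).sum()
    weighted = (sample["clinton_margin"] * sample["sample_share"] * sample["weight"]).sum()

    # With these made-up inputs, the education-weighted margin moves several points toward Trump.
    print(f"Unweighted margin: {unweighted:+.1f}  Education-weighted: {weighted:+.1f}")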
Following the election, two different state-level pollsters acknowledged that they had not adjusted for education and conducted their own post-hoc analyses to examine what difference doing so would have made in their estimates. Both found that adjusting for education would have meaningfully improved their poll's accuracy by reducing the over-statement of Clinton support.

The final University of New Hampshire (UNH) poll had Clinton leading in the Granite State by 11 points. She ultimately won by a razor-thin 0.4-point margin. The UNH poll director, Andrew Smith (2016), reported that the released poll adjusted for age, gender and region but not education, a protocol that had served the Granite State Poll just fine for numerous election cycles. According to Smith (in email correspondence), "We have not weighted by level of education in our election polling in the past and we have consistently been the most accurate poll in NH (it hasn't made any difference and I prefer to use as few weights as possible), but we think it was a major factor this year. When we include a weight for level of education, our predictions match the final number." Indeed, as shown in Figure 10, had the UNH poll adjusted for education in 2016, that single modification would have removed essentially all of the error: the education-adjusted estimates showed a tied race.

Source: University of New Hampshire poll conducted November 3-6, 2016 with 707 likely voters; Michigan State University poll conducted September 1 - October 30, 2016 with 743 likely voters.

The story is similar, though less dramatic, for Michigan State University's (MSU) State of the State Poll. That poll, which like the UNH poll was conducted via live phone with a dual frame RDD sample, showed Clinton leading Trump in Michigan by 17 points.[16] She ultimately lost that contest by another slim margin (0.2 points). The MSU poll did not adjust for education, but if it had, Clinton's estimated lead would have been 10 points instead of 17. One other noteworthy feature of the MSU poll is that, unlike the UNH poll, it was fielded relatively early, with most interviews completed before mid-October. This means the MSU poll largely missed what appears to be a significant late shift in support toward Trump. As discussed in Section 3.1, the national exit poll indicates that about 13 percent of Michigan voters made their presidential vote choice in the final week of the campaign, and that group went for Trump by about an 11-point margin.

It was not just RDD pollsters who, in hindsight, wished they had handled education differently in their weighting. SurveyMonkey's Head of Election Polling (and report co-author), Mark Blumenthal (2016), reported that their online opt-in poll weighting did adjust for education but used three categories that were quite broad (high school or less, some college, and college graduate). According to Blumenthal, "If we had separated out those with advanced degrees from those with undergraduate degrees in our education weighting parameters, we would have reduced Clinton's margin in our final week's tracking poll by 0.5 percentage points to +5.5 (47.0% Clinton to 41.5% Trump)."

Despite this, it is not clear that adjusting to a more detailed education variable would have universally improved polls in 2016. Analysis of the effect of weighting by five education categories rather than three in four national polls (Appendix A.H) yielded an average change of less than 0.4 percentage points in the vote estimates and no systematic improvement. In sum, the difference between weighting on education or not (as with UNH and MSU) is much more dramatic than the difference between weighting to more or less detailed education categories (as with SurveyMonkey).

Given the evidence that not adjusting on education led to an unintended pro-Clinton bias in several polls – might this exp