On 20 June 2017, the Twitter account of the morning show “Fox & Friends” tweeted a significant-sounding bit of news:

As many as 5.7 million illegal immigrants might have voted in the 2008 election, report finds.

A more accurate tweet, if it could fit, might be:

The Washington Times is reporting that a web site named “JustFacts.com” has concluded that a widely-discredited 2014 study arguing up to 2.8 million non-citizens voted in the 2008 presidential election (based on the extrapolation of 38 survey responses from people who may have voted as non-citizens) has been unfairly debunked by “liberal fact checkers” and that, in reality, the number could be as high as 5.7 million.

The Original Study

That 2014 study, published in the journal Electoral Studies and authored by Jesse Richman, Gulshan Chattha, and David Earnest at Old Dominion University, made waves when the researchers first described their results in a Washington Post column that inspired three different rebuttals — and one additional rebuttal to those rebuttals, as well as a disclaimer about the disputed nature of the research paper itself.

In this study, the authors used data collected by Internet polling firms for a Harvard University initiative known as the Cooperative Congressional Election Studies, or CCES:

The 2008 and 2010 Cooperative Congressional Election Studies (CCES) were conducted by YouGov/Polimetrix of Palo Alto, CA as an internet-based survey using a sample selected to mirror the demographic characteristics of the U.S. population. In both years survey data was collected in two waves: pre-election in October, and then post-election in November. The questionnaire asked more than 100 questions regarding electoral participation, issue preferences, and candidate choices.

The thrust of their work was to demonstrate that some people checked off both that they were non-citizens and that they voted, in some cases going so far as to describe the candidate they voted for. As a check of their work, they used information provided by CCES from a research firm named Catalyst to verify that people who said they voted actually voted:

Validation of registration and voting was performed by the CCES research team in collaboration with the firm Catalyst. Of 339 non-citizens identified in the 2008 survey, Catalyst matched 140 to a commercial (e.g. credit card) and/or voter database.

Out of the 38 cases from 2008 in which non-citizens claimed to have voted (or had a vote validated they didn’t admit to in the survey), the authors found five (as in, the number after four) cases of survey responses from non-citizens who both said they had voted and that Catalyst could verify as having voted.

Using this data, some modeling, and error analysis, the authors concluded that between 7.9 percent and 14.7 percent of non-citizens voted in the 2008 elections. They then simply applied this to the entire non-citizen population in the United States. The findings are as crude as they are controversial:

Since the adult noncitizen population of the United States was roughly 19.4 million, the number of non-citizen voters […] could range from just over 38,000 at the very minimum to nearly 2.8 million at the maximum.

These numbers rest on the assumption that a subset of 38 (possible) non-citizen votes out of 339 non-citizens can be used to extrapolate countrywide voting behavior.

The Rebuttal

If extrapolating from the Internet survey responses of a pool of 339 non-citizens to a figure in the millions sounds problematic to you, you are not alone. Brian Schaffner is a professor of political science at the University of Massachusetts, Amherst and the co-principal investigator of the Harvard CCES from which Richman got his data. He told us via e-mail:

I don’t know any serious survey researchers who would have tried to extrapolate 100 or so respondents from a large survey like this to produce a range that large without tracking back to think about the dubiousness of that projection. […] It is totally worthless as a range of anything.

Schaffner was an author on a challenge to the Richman paper (“The Perils of Cherry Picking Low Frequency Events in Large Sample Surveys”), also published in Electoral Studies in 2015. Schaffner’s paper makes the argument that even a nearly non-existent amount of misreporting from the non-citizen group would create deeply flawed results if one tried to use that data to extrapolate. In that paper, they offer the following mental exercise:

Suppose a survey question is asked of 20,000 respondents, and that, of these persons, 19,500 have a given characteristic (e.g., are citizens) and 500 do not. Suppose that 99.9 percent of the time the survey question identifies correctly whether people have a given characteristic, and 0.1 percent of the time respondents who have a given characteristic incorrectly state that they do not have that characteristic. (That is, they check the wrong box by mistake.) That means, 99.9 percent of the time the question correctly classifies an individual as having a characteristic — such as being a citizen of the United States — and 0.1 percent of the time it classifies someone as not having a characteristic, when in fact they do. […] It implies, however, that one expects 19 people out of 20,000 to be incorrectly classified as not having a given characteristic, when in fact they do. Suppose that 70 percent of those with a given characteristic (e.g., citizens) engage in a behavior (e.g., voting). Suppose, further, that none of the people without the characteristic (e.g., non-citizens) are allowed to engage in the behavior in question (e.g., vote in federal elections). Based on these suppositions, of the 19 misclassified people, we expect 13 (70%) to be incorrectly determined to be non-citizen voters while 0 correctly classified non-citizens would be voters. Hence, a 0.1 percent rate of misclassification […] would lead researchers to expect to observe that 13 of 519 (2.8 percent) people classified as non citizens voted in the election, when those results are due entirely to measurement error, and no non-citizens actually voted.
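The arithmetic of that thought experiment can be sketched in a few lines of Python. This is only an illustration using the figures in the quoted passage; the function and variable names are our own, and truncating to whole people is an assumption made to match the counts (19 and 13) given in the quote.

```python
def misclassification_artifact(citizens, non_citizens, error_rate, citizen_turnout):
    """Return (misclassified, phantom_voters, apparent_non_citizens, apparent_rate).

    Assumes, as the thought experiment does, that no actual non-citizen votes.
    """
    # int() truncates to whole people, as the quoted passage does (19.5 -> 19, 13.3 -> 13).
    misclassified = int(citizens * error_rate)
    phantom_voters = int(misclassified * citizen_turnout)
    # Everyone who ticked "non-citizen", whether correctly or by mistake.
    apparent_non_citizens = non_citizens + misclassified
    apparent_rate = phantom_voters / apparent_non_citizens
    return misclassified, phantom_voters, apparent_non_citizens, apparent_rate


misclassified, phantom_voters, apparent_non_citizens, apparent_rate = (
    misclassification_artifact(19_500, 500, 0.001, 0.70)
)
print(f"{misclassified} citizens misclassified as non-citizens")
print(f"{phantom_voters} of {apparent_non_citizens} apparent non-citizens appear to have voted")
print(f"apparent non-citizen voting rate: {apparent_rate:.1%}")
```

Dividing 13 by 519 directly gives roughly 2.5 percent, a little under the 2.8 percent printed in the quoted passage. Either way, the entire apparent non-citizen voting rate in this scenario is produced by the assumed click errors, since no real non-citizen votes are in the model.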

To further raise the possibility that this kind of error could have happened and could be significant, Schaffner and his colleagues went back and re-interviewed people in the survey using data from 2010, telling us:

In 2012, we re-interviewed 19,000 people who had been respondents for the 2010 CCES. We asked them the same question about citizenship status as we had asked them in 2010. Of these 19,000, 121 had claimed to be non-citizens in 2010. In 2012, 36 of the 121 had changed their response to “citizen.” Additionally, 20 people who had clicked on the “citizen” option in 2010 changed to “non-citizen” in 2012. Thus, it is clearly the case that a small share of respondents were mis-clicking on response options to that question in at least one of the two surveys (about .3 %).
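For readers who want to check the “about .3 %” figure, here is a minimal sketch using only the counts Schaffner gave us. The variable names are ours, and the second ratio (the share of 2010 “non-citizen” answers that later changed) is our own derived figure, not one Schaffner cited.

```python
# Counts as reported by Schaffner for the 2010 -> 2012 re-interview.
respondents = 19_000
said_non_citizen_2010 = 121
non_citizen_to_citizen = 36   # answered "non-citizen" in 2010, "citizen" in 2012
citizen_to_non_citizen = 20   # answered "citizen" in 2010, "non-citizen" in 2012

# Everyone whose citizenship answer differed between the two surveys.
changed = non_citizen_to_citizen + citizen_to_non_citizen
share_changed = changed / respondents

# Share of the 2010 "non-citizen" group that later answered "citizen".
share_of_non_citizens_changed = non_citizen_to_citizen / said_non_citizen_2010

print(f"{changed} of {respondents} respondents changed their answer ({share_changed:.2%})")
print(f"{share_of_non_citizens_changed:.0%} of 2010 'non-citizen' answers changed by 2012")
```

A change rate of about 0.3 percent across the whole sample sounds negligible, but because the self-reported non-citizen group is so small, it corresponds to roughly three in ten of that group's answers changing between surveys.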

The existence of even the possibility of misreporting, especially when you consider that only five (5) of the non-citizen voters identified in 2008 were actually verified as voting, is problematic, as articulated by University of California, Irvine political scientist Michael Tesler in his Washington Post rebuttal to the Richman study:

With the authors’ extrapolations of the non-citizen voting population based on a small number of validated votes from self-reported non-citizens (N = 5), this high frequency of response error in non-citizenship status raises important doubts about their conclusions.

The Washington Times / “Just Facts” Take

One surefire way to make something sound authoritative without actually understanding any aspect of the topic you are covering is to describe the process, as the Washington Times did in the story linked by “Fox & Friends”, as “a series of complicated calculations”.

Outside of the fact that these calculations are found in the 1,010th footnote of the JustFacts.com report, the calculations (shown below) don’t involve much more complicated mathematics than multiplication, subtraction, and addition (no division, thankfully). What Just Facts did was take the United States Census Bureau estimate of the number of non-citizen adults in the United States (19,805,000) and multiply it by, in essence, high-end and low-end estimates of the percentage of people in that group who vote in elections based on data from the Richman study — but with their own estimates of error:

19,805,000 non-citizen adults × ((8% self-declared voting – 5% margin of error) + (8% undeclared voting – 8% margin of error)) = 594,150

19,805,000 non-citizen adults × ((8% self-declared voting + 5% margin of error) + (8% undeclared voting + 8% margin of error)) = 5,743,450

The “8% self-declared voting” number comes from the 27 non-citizens out of 339 in the Richman study who said “I definitely voted”. The “8% undeclared voting” also comes from that same study, and is calculated as the 11 non-citizens identified by the Catalyst system as voting (out of the total 140 verified non-citizens matched to records in the Catalyst database). Any conclusion about sweeping waves of millions of non-citizen votes is tied to these undeniably small numbers.
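The Just Facts arithmetic is simple enough to reproduce in a short sketch. The inputs below are the figures quoted in this article (the Census estimate, the 27-of-339 and 11-of-140 counts, and Just Facts’ margins of error); the variable names are ours, and whole-number percentages are used only to keep the multiplication exact.

```python
NON_CITIZEN_ADULTS = 19_805_000  # Census Bureau estimate used by Just Facts

# Where the two "8%" figures come from.
self_declared_rate = 27 / 339    # said "I definitely voted"
undeclared_rate = 11 / 140       # matched to a voting record
self_declared_pct = round(self_declared_rate * 100)
undeclared_pct = round(undeclared_rate * 100)

# Margins of error as given in the Just Facts formula, in percentage points.
SELF_DECLARED_MOE = 5
UNDECLARED_MOE = 8

low_pct = (self_declared_pct - SELF_DECLARED_MOE) + (undeclared_pct - UNDECLARED_MOE)
high_pct = (self_declared_pct + SELF_DECLARED_MOE) + (undeclared_pct + UNDECLARED_MOE)

# Integer arithmetic: multiply first, then divide by 100.
low_estimate = NON_CITIZEN_ADULTS * low_pct // 100
high_estimate = NON_CITIZEN_ADULTS * high_pct // 100

print(f"self-declared: {self_declared_rate:.2%}, undeclared: {undeclared_rate:.2%}")
print(f"low estimate:  {low_estimate:,} ({low_pct}% of non-citizen adults)")
print(f"high estimate: {high_estimate:,} ({high_pct}% of non-citizen adults)")
```

Run as written, the sketch reproduces the 594,150 and 5,743,450 figures, and makes plain that the headline number is 29 percent of the Census estimate, built from 27 survey responses and 11 database matches.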

In a 15 December 2016 post, JustFacts.com’s president James Agresti provided his justification for taking the results of the Richman study seriously. This post, however, serves mainly as an effort to debunk the claim made by Schaffner and his colleagues in their 2015 paper that “zero” non-citizen votes were cast in the 2008 presidential election.

For his part, Schaffner told us:

What we are saying […] is that once you account for measurement, the best estimate of the number of non-citizen voters is zero. That doesn’t mean we actually think there are zero non-citizen voters.

The JustFacts.com post also does very little to address the fact that the Richman study’s non-citizen dataset was so limited:

The critics make a legitimate point that random errors by survey respondents will overcount non-citizens. This is because far more citizens were sampled in this survey. For instance, if a survey sampled 100,000 citizens and 100 non-citizens, and 1% of them misidentified themselves, this would mean 1,000 citizens called themselves non-citizens, but only one non-citizen said he was a citizen. Such logic makes sense in a vacuum where all other evidence is ignored, but the reality is that misidentification of citizenship is not just a random phenomenon. This is because illegal immigrants often claim they are citizens in order to conceal the fact that they are in the U.S. illegally.

Agresti supports the latter part of this statement by providing evidence that “certain groups of illegal immigrants” frequently use fraudulent Social Security numbers and “misrepresent themselves as citizens”. He brushes off the former part of the statement by echoing claims made by Richman in a working paper (not peer-reviewed) that other demographic data in the CCES, as well as their own investigation of voter registration data, prove that people were not misreporting their citizenship status after all.

People can debate the virtues of those arguments as much as they want, but they certainly do not prove that zero people misreported their citizenship status, or that millions of non-citizen votes occurred in the 2008 election. The arguments also do not change the fact that the conclusion of “5.7 million noncitizen votes in 2008” is based on applying broad estimates of behavior from an exceedingly small subpopulation. The problems with this approach are evidenced by the absurdly large possible range Richman and later Agresti collectively came up with from the same data (38,000 to 5.7 million illegal votes).

We asked Richman how he felt about Agresti’s analysis of his work, and his response concedes the point that there is a lot of room to play around with this kind of data:

Ultimately there are a variety of assumptions one can make when interpreting the survey data, so I am not surprised that a different analyst approaching the numbers with a different set of assumptions might come to a distinctly different set of conclusions. My impression is that this is what Mr. Agresti has done. And while those numbers are not the ones I came to, ultimately it comes down to which set of assumptions one thinks are most plausible.

Straight-faced claims that there is material evidence for up to 5.7 million non-citizen votes in 2008 are remarkable, given that the study commonly cited as the basis for this claim has provided material evidence for five (not even six!) non-citizen votes in that year.

Any argument built on Richman’s study or Agresti’s analysis must square itself with the reality that both are based on numbers generated from just these facts:

1) In a group of 339 self-reported non-citizens, 27 claim to have voted; and

2) In a group of 140 verified non-citizens, 11 may have voted.

In defense of “Fox & Friends”, we acknowledge that this additional information would make for a far less flashy tweet.