How often do people conducting surveys simply fabricate some or all of the data? Several high-profile cases of fraud over the past few years have shone a spotlight on that question, but the full scope of the problem has remained unknown. Yesterday, at a meeting in Washington, D.C., a pair of well-known researchers, Michael Robbins and Noble Kuriakose, presented a statistical test for detecting fabricated data in survey answers. When they applied it to more than 1000 public data sets from international surveys, a worrying picture emerged: About one in five of the surveys failed, indicating a high likelihood of fabricated data.

But that claim is being hotly disputed by the Pew Research Center, one of the major funders of such surveys. The organization has gone so far as to ask the researchers to desist from publishing their work. Pew’s actions are "pretty disappointing," says Kuriakose, a research scientist at SurveyMonkey in Palo Alto, California. "This problem isn't going to just go away."

Ironically, Robbins and Kuriakose met at Pew, which is based in Washington, D.C., when they were both researchers there. "Michael was doing methodology work on Pew's international surveys and we connected about data quality," Kuriakose says. They then devised an early version of their fake data test.

The basis of the test is the likelihood, by chance alone, that two respondents will give highly similar answers to questions on a survey. How similar is too similar? After running a simulation of data fabrication scenarios, they settled on 85% as the cutoff. In a 100-question survey of 100 people, for example, fewer than five people would be expected to have identical answers on 85 of the questions.
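
The comparison at the heart of the test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' own code: the function names `max_percent_match` and `flag_high_matches` are hypothetical, the toy responses are invented for the example, and counting a match of "at least 85%" as too similar is an assumption about how the cutoff is applied.

```python
from itertools import combinations
from typing import List, Sequence


def max_percent_match(responses: Sequence[Sequence[int]]) -> List[float]:
    """For each respondent, return the largest share of questions on which
    their answers are identical to those of any other respondent."""
    best = [0.0] * len(responses)
    # Compare every pair of respondents once.
    for i, j in combinations(range(len(responses)), 2):
        a, b = responses[i], responses[j]
        share = sum(x == y for x, y in zip(a, b)) / len(a)
        best[i] = max(best[i], share)
        best[j] = max(best[j], share)
    return best


def flag_high_matches(responses: Sequence[Sequence[int]],
                      cutoff: float = 0.85) -> List[bool]:
    """Mark respondents whose closest match meets the cutoff
    (assumed here to mean 'at least 85% identical answers')."""
    return [m >= cutoff for m in max_percent_match(responses)]


# Invented toy data: 3 respondents, 20 questions each.
respondent_a = list(range(20))
respondent_b = [-1, -1] + list(range(2, 20))   # differs from A on 2 questions
respondent_c = [x + 100 for x in range(20)]    # shares no answers with A or B
toy_survey = [respondent_a, respondent_b, respondent_c]

print(max_percent_match(toy_survey))   # [0.9, 0.9, 0.0]
print(flag_high_matches(toy_survey))   # [True, True, False]
```

Comparing every pair of respondents grows quadratically with sample size, so a survey of several thousand people would call for a vectorized or otherwise optimized version of the same idea.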

The 85% rule isn’t appropriate for all kinds of surveys, Kuriakose points out. "For example, in customer satisfaction surveys where each question measures characteristics of the same product or company, the questions aren't really independent or even meaningfully different." Nor does it work for surveys of health outcomes, in which healthy people all have similar responses. But for the large-scale opinion surveys typically carried out in the developing world, which pose many questions on broad topics designed to identify the differences between communities, "this is exactly the appropriate method for detecting fabrication," Kuriakose says.

After he left Pew, Robbins became director of Arab Barometer, a project that surveys opinion across the Arab world, and applied the test to his own data. Surveying communities in the developing world often requires face-to-face interviews, going house-by-house in dangerous environments. So one of the inevitable problems, Robbins says, is "curbstoning," in which an interviewer sits on the curb and invents survey responses, often duplicating answers, to avoid risk or save time.

The test he and Kuriakose developed helped Robbins identify data that subsequent detective work proved to be fabricated. And it made him wonder: How big is this problem? So he teamed up again with Kuriakose to refine the technique and apply it to publicly available data sets from international surveys. With few exceptions, they limited their analysis to studies that asked more than 1000 people at least 75 questions on a range of topics. And to be conservative, they forgave studies for which at least 95% of the data passed the test.
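
The survey-level rule described above amounts to a simple share calculation. The sketch below assumes each respondent has already been marked as a high match or not by some similarity check; the function name `survey_fails`, the parameter `tolerated_share`, and the respondent counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def survey_fails(high_match_flags: Sequence[bool],
                 tolerated_share: float = 0.05) -> bool:
    """Return True when more than 5% of respondents are flagged,
    i.e. when fewer than 95% of the data passed the test."""
    flagged_share = sum(high_match_flags) / len(high_match_flags)
    return flagged_share > tolerated_share


# Invented examples: 1000 respondents with 40 or 60 flagged as high matches.
mostly_clean = [True] * 40 + [False] * 960
suspicious = [True] * 60 + [False] * 940

print(survey_fails(mostly_clean))  # False (4% flagged, so the survey is forgiven)
print(survey_fails(suspicious))    # True (6% flagged)
```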

That made the results all the more worrying: Among 1008 surveys, their test flagged 17% as likely to contain a significant portion of fabricated data. For surveys conducted in wealthy westernized nations, that figure drops to 5%, whereas for those done in the developing world it shoots up to 26%.

Robbins and Kuriakose began presenting their study at meetings last year, drawing Pew’s attention. "We found out about this study and were very alarmed," says Courtney Kennedy, director of survey research for the Pew Research Center, which has undertaken hundreds of international surveys.

Kennedy says that Pew used the newly developed test on its own data. "And yes, a certain share exceeded the threshold," she says. "So we dug deeper, and by the time we finished our process there were [just] literally a handful of surveys where we had serious questions." On that basis, Kennedy says, "clearly this method is prone to false positives."

In November 2015, Kennedy and other top officials at Pew sent an email to Kuriakose and Robbins, which Science has obtained. By then, the pair had submitted their method and findings as a paper to the peer-reviewed Statistical Journal of the IAOS, one of the field's leading publications. "We strongly suggest that you retract the paper," the email states, "as we believe the analysis is severely underspecified and will give both survey vendors and contractors a false metric for identifying fraud." Kennedy calls the letter "appropriate" because "our organization's reputation is on the line. You can't make cavalier claims like that."

Kuriakose and Robbins did not withdraw their paper. It was accepted in December 2015 and is in press.

Although yesterday's meeting, dubbed Datafab2016 by attendees, was intended as a collegial gathering to improve best practices in survey research, neither side budged. Kuriakose and Robbins went a step further than their accepted paper by showing the results of their test on 309 of Pew's international studies, including its Global Attitudes surveys and several high-profile surveys of religious beliefs: Thirty percent failed.

During her turn on the stage, Kennedy mounted an attack on the test's methodology. For example, she points out, it does not account for the number of questions on a survey, the number of respondents, or other factors that can skew the results. She also takes exception to the 85% similarity threshold. "I would choose a different threshold depending on the population and the survey," she says. By putting a number on the extent of data fabrication across all surveys, "they took it too far," Kennedy says. Pew's rebuttal is now online.

Some at the meeting saw merit in both sides of the fight. Rather than overestimating data fabrication, the method of Kuriakose and Robbins "very likely underestimates the true extent of the problem," says Michael Spagat, an economist at Royal Holloway, University of London, who has investigated high-profile cases of possible data fabrication in war zones. Yet Kennedy’s response impressed him, too. "I think the Pew paper is interesting and made some good points," he says. "Specifically, there isn’t a hard and fast cutoff beyond which you know there is fabrication." Overall, however, Spagat remains very concerned about data fabrication in surveys. "Robbins and Kuriakose have uncovered a massive problem and the Pew paper doesn’t change that."

Nothing was settled by the end, says the meeting's co-organizer Steven Koczela, president of the MassINC Polling Group in Boston and a previous survey research leader for the U.S. State Department. The case laid out by Kuriakose and Robbins "seems unassailable to me," he says, "but [Pew] are giving it their level best."