A political scientist on Tuesday said he was retracting a paper he’d co-authored — one with wide influence on how campaigns can change public opinion — when faced with evidence that the paper’s central finding was based on polling that probably never happened.

The article, published last December in the journal Science by UCLA graduate student Michael J. LaCour and Columbia University political scientist Donald P. Green, appeared to show that an in-person conversation with an openly gay person made voters feel much more positively about same-sex marriage, an effect that persisted and even spread to the people those voters lived with, who weren’t part of the conversation. The purported effect seemed to affirm the power of human contact to overcome disagreement.

By describing personal contact as a powerful political tool, the paper influenced many campaigns and activists to shift their approach to emphasize the power of the personal story. The study was featured by Bloomberg, on “This American Life” and in activists’ playbooks, including those used by backers of an Irish constitutional referendum up for a vote Friday that would legalize same-sex marriage.

“How to convince anyone to change their mind on a divisive issue in just 22 minutes — with science,” was one catchy headline on a Business Insider story about the study. (The article was updated Wednesday with news of the retraction.)

Now that the underlying data appears to be fabricated, and Green has asked to retract the study (in a letter to Science and in his online CV), the study reveals different lessons. It shows how easily a scientist can invent data to show a desired result. It also shows how other scientists looking to replicate the result, with access to the original data, can quickly expose bad research. It took the authors of a study debunking the Science paper just two days to write up their findings after they first noticed anomalies in the research. Their findings were published Tuesday night in a 27-page report complete with programming code and charts.

We can’t yet definitively say what went wrong, and what it means for the science of political persuasion. LaCour tweeted that he will “provide a single comprehensive response” at his “earliest opportunity.” He didn’t respond to emails requesting an interview.

Green, who told Retraction Watch he had no access to the primary data because Columbia’s institutional review board hadn’t approved the study (approval came from UCLA’s board), said in an email that when he last spoke to LaCour, on Tuesday by phone, LaCour said he couldn’t find files containing the survey data and indicated that he would write a retraction; when he didn’t, Green sent off his. Science hasn’t yet ruled on whether it will retract the original paper. The debunkers haven’t been able to dig into a similar LaCour study of how talking to a woman who has had an abortion affects people’s attitudes toward abortion, because he hasn’t yet published the underlying data.

The significance of the retraction was hotly debated Wednesday in political science circles and among other scientists. Some saw a positive story about how academia polices itself.

“This episode demonstrates how science is self-correcting,” the researchers who wrote the paper debunking the Science article wrote in a joint emailed response to questions. “We hope that this incident will further spur the open-science and data transparency movements.”

To others, though, the episode was far more troubling. The debunkers could do their debunking only because of a bit of luck: Data they needed happened to be available not from its original source, but through another researcher who had posted it to meet a journal’s open-data policies. And they weren’t specifically trying to replicate the Science study to see whether it held up. They were trying to extend the study, and grew suspicious when their early results didn’t line up.

To James Newburg, a political science graduate student, the episode shows that not enough data is shared in the field, and not enough people try to replicate each other’s results. “I think the best response the field can have would be to increase the incentives for practicing open science,” he said in an email.

Sanjay Srivastava, associate professor of psychology at the University of Oregon, sees skewed incentives as part of the problem. “A paper like this, with its huge impact, can make somebody’s career at an early stage, and can bring in huge rewards at later ones,” he said in an email. “I think all of us in science need to think about how to change those incentives, so we reward people for asking good questions and doing good science to answer those questions — and less on whether they happen to produce spectacular results.”

The retracted study, begun in 2013, was designed to test a theory activists were developing: that hearing someone’s personal story could shift political attitudes. Here’s what LaCour and Green wrote in Science about what they did: They recruited respondents for an online survey in Los Angeles neighborhoods where relatively large percentages of voters supported a 2008 ballot measure banning same-sex marriage in California. That gave them a baseline of attitudes toward gay people and other issues. Then they sent canvassers to the homes of registered voters. Some of the canvassers said they were gay and wanted to have the right to get married. Others described themselves as straight and said the ban on gay marriage restricted the rights of a child, friend or relative. Still others talked about recycling instead of marriage rights — to establish a control group. Follow-up online surveys of the visited voters gauged how the conversations affected their views.

A few days after their conversation with the study participants, the canvassers’ stated sexual identity hadn’t affected their persuasiveness: People visited by canvassers who spoke about the personal importance of same-sex marriage all became more supportive of gay marriage, by about the same amount — a whopping 0.5, on a five-point scale, relative to members of the control group who talked about recycling — in follow-up online polls. But by a month later, 90 percent of the shift in attitudes among people visited by straight canvassers had dissipated. Not so for those who talked to gay canvassers. Their opinion about same-sex marriage had shifted positively by even more: 0.77 on the five-point scale more than the control group since the survey before the canvassing visit. That’s the difference, the authors noted, between attitudes of residents of Georgia and Massachusetts in an earlier national survey of attitudes toward gay people. That comparison was a catchy one, also cited in the Science editor’s summary of the paper.

To repeat: This was a big effect. “It was the kind of effect scientists are trained to doubt,” Sasha Issenberg wrote in Bloomberg last October, previewing the forthcoming Science study. “‘Unless this is replicated,’ Green told LaCour, ‘no one will ever believe it.’” LaCour duly ran a follow-up study, conducted in the same way, that found the same result, and included it in the Science paper.

There remained good reasons to doubt the findings. Earlier work on the politics of persuasion showed that personal contact and attempts at persuasion could shift opinion, but the effect was far smaller and less persistent. This new result seemed too good to be true to some anonymous posters at Political Science Rumors who suggested soon after Science published the study that it was a hoax.

Columbia statistician Andrew Gelman was equally wowed by the effect size when he covered the study last December on the Monkey Cage blog at the Washington Post. He called it “amazing” and asked, “How could this happen?” Instead of challenging the study, though, he offered the theory that the experiment pushed people in a direction they were ready to move.

Enter David Broockman and Joshua Kalla, University of California, Berkeley, researchers intrigued enough by the Science study to try to extend it. Instead, they found they couldn’t match one of its basic features: its high response rate. They contacted the survey firm that they believed LaCour had worked with, but the firm said it wasn’t involved and probably couldn’t do the work as described.

Then they started digging into the data. What they found is that if LaCour really had fabricated the data, he’d left lots of clues. Broockman, Kalla, and Yale political scientist Peter Aronow outlined eight reasons they doubted the data was based on an actual poll. Instead, it looked as if an earlier survey had been repurposed to show the desired result.

One of their reasons for thinking so is called heaping. When asked to quantify their feelings on a scale from 0 to 100 — a feeling thermometer — lots of people choose round numbers such as 50. And that shows up in the data from the first wave of the LaCour-Green survey. But in later surveys, they no longer see a heap of responses at 50 — instead they see what you’d see if someone fiddled with the data to make it look like overall opinion had shifted by a certain amount, along with some random statistical noise. Real survey respondents, though, would still choose 50 a disproportionate amount of the time — more often than 49 or 51. They might be different people than the ones who’d originally given an answer of 50. But you would be unlikely to see a change as dramatic as the fall-off in the Green-LaCour data — from 19 percent of responses being 50 in the first survey to just 2 percent or 3 percent in later waves.
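To make the heaping argument concrete, here is a minimal Python sketch on simulated data. It is not the debunkers’ code or data. The 19 percent heap at 50 comes from the figure above; the sample size, the size of the shift and the amount of noise are hypothetical values chosen only for illustration.

```python
import random

def share_at(values, target=50):
    """Fraction of responses exactly equal to `target`."""
    return sum(1 for v in values if v == target) / len(values)

def simulate_wave1(n, rng, heap_prob=0.19):
    """Hypothetical baseline wave: roughly heap_prob of answers sit exactly at 50,
    and the rest are spread evenly over the 0-100 thermometer."""
    return [50 if rng.random() < heap_prob else rng.randint(0, 100) for _ in range(n)]

def shift_with_noise(values, rng, shift=5.0, sd=3.0):
    """Hypothetical fabrication: add a fixed shift plus Gaussian noise to each
    baseline answer, then round and clip to the 0-100 scale."""
    return [min(100, max(0, round(v + shift + rng.gauss(0, sd)))) for v in values]

rng = random.Random(0)           # fixed seed so the run is repeatable
wave1 = simulate_wave1(10_000, rng)
wave2 = shift_with_noise(wave1, rng)

heap1 = share_at(wave1)
heap2 = share_at(wave2)
print(f"share of answers at 50, first wave: {heap1:.3f}")
print(f"share of answers at 50, later wave: {heap2:.3f}")
```

In this toy setup the spike at 50 is smeared across neighboring values once the shift and noise are applied, which is the pattern the debunkers flagged. Real respondents answering a fresh survey would be expected to pile up on round numbers again.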

Another example is the remarkable stability of respondents’ opinions. When pollsters ask the same respondents the same question at a later date, they typically see a lot of big shifts, which more likely reflect random measurement error than real swings in individual people’s opinions. But that wasn’t the case with LaCour-Green. There was nearly perfect correlation between people’s responses on the first two waves of the survey. The debunkers took the evidence to Green, who took it to LaCour and his supervisor. Green then decided to retract the paper. After he sent Broockman and the others a copy of his retraction letter, they published their findings along with his.
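The stability clue can be illustrated the same way. The sketch below is a simulation under stated assumptions, not the debunkers’ analysis: each person has a latent attitude, a genuine survey answer is that attitude plus independent measurement error, and a hypothetical fabricated second wave is just the first wave plus a little noise. All the standard deviations are made-up illustration values.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def clip(v):
    """Keep a response on the 0-100 thermometer scale."""
    return min(100.0, max(0.0, v))

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5_000

# Hypothetical latent attitudes and two genuine, noisy measurements of them.
latent = [rng.gauss(50, 20) for _ in range(n)]
real_wave1 = [clip(a + rng.gauss(0, 15)) for a in latent]
real_wave2 = [clip(a + rng.gauss(0, 15)) for a in latent]

# Hypothetical fabricated second wave: the first wave plus a little noise.
fake_wave2 = [clip(v + rng.gauss(0, 3)) for v in real_wave1]

r_real = pearson(real_wave1, real_wave2)
r_fake = pearson(real_wave1, fake_wave2)
print(f"correlation, genuine re-interview:  {r_real:.2f}")
print(f"correlation, copied-and-jittered:   {r_fake:.2f}")
```

Under these assumptions the genuine re-interview correlates only moderately with the first wave, because each answer carries its own measurement error. The copied-and-jittered wave correlates almost perfectly, which is the kind of signal that stood out in the LaCour-Green data.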

“The study’s findings had huge implications for people who were trying to advance the cause of equality and have changed how advocates were doing ongoing work,” Broockman and co-authors wrote in the emailed response to questions. “Every minute we knew about the irregularities in the data and did not disclose felt like a lie by omission to the advocates out there who we talk to every day.”

Green said he appreciated the team’s work. “They are very thoughtful and technically gifted scholars, and I’m indebted to them for fitting the pieces of a complex puzzle together,” Green wrote me in an email.

If there was fabrication, it appears to have been isolated to the survey itself. “I am quite convinced that the canvassers went out,” Green said.

David Fleischer helped with the study’s canvassing work and is director of the Leadership Lab of the Los Angeles LGBT Center, which uses the method of personal-story persuasion and says it has seen good effect. He told me in an email, “I have confidence we’re on the right track. The simple truth appears to be that Mike LaCour never actually measured our work. We will continue to seek out independent measurement of our work. That’s the way we’ll know for sure whether we are having an impact, and if so the magnitude and duration of our impact.”

The revelation that LaCour may have fabricated the data was “a punch in the gut,” Fleischer said in a telephone interview Wednesday. He said his canvassers had spent tens of thousands of hours on training and other work for the project, all, apparently, for naught. “Of course we feel betrayed,” he said. The news sets back his group’s effort to learn the magnitude and duration of the effect of their work. He said he hopes Broockman and Kalla continue their planned follow-up.

Fleischer said that in 2013, he watched as LaCour and his research assistants saw what looked like survey data trickle in after canvassing. Fleischer now thinks LaCour made it look like people were answering surveys and data was coming in. “Do you know how much … effort that must have been?”

Fleischer said he last spoke to LaCour on Tuesday morning, when LaCour told him he was hiring a lawyer.

On Wednesday, many people who had relied on the validity of the Science paper walked it back. Science tweeted that it was assessing the retraction request and would in the meantime publish “an Editorial Expression of Concern.” Vox, Buzzfeed and Bloomberg updated their earlier reports on the study. “This American Life” published a blog post about the problems with the data. Gelman wrote that “the message, I suppose, is to be aware of the possibility that someone’s faking their data, next time I see an effect that’s stunningly large.”

LaCour removed from his website the line, “As of July 2015 I will be an Assistant Professor of Politics and Public Affairs in the Woodrow Wilson School and the Department of Politics at Princeton University.” A Princeton spokesman said in an email, “at this time the individual is not a Princeton University employee. We will review all available information and determine next steps.”

If other researchers hadn’t tried to extend the original study, would this ever have been caught? Researchers I spoke to were doubtful. They pointed out recent episodes of academic fraud that took time to come to light. Researchers, scientific journals, the polling industry and the press — including FiveThirtyEight, where we’ve run election forecasts using polls from firms that we later determined to have fabricated polls — are all vulnerable to made-up data because none routinely vet others’ work at the level that Broockman did. Even when the data is vetted, the signs of potential problems aren’t always as clear as they were in this case.

As Peter Winker, an economist at the University of Giessen in Germany who co-edited a book about poll fraud, put it in an email: “If the faker would have been a bit cleverer in this procedure, I doubt that the fraud would have been found this way.”