“I don’t know that I have anything particularly deep or profound to say,” Donald P. Green tells me when I get him on the phone. “Maybe because I’m just very close to this. I’m sort of in a state of bewilderment.” It’s an understandable reaction given what has happened to the Columbia University political scientist since the weekend.

Green co-authored an extremely impressive Science study released in December showing not only that a short conversation with a gay canvasser appeared to significantly nudge California voters in a pro-gay-marriage direction, but that the effects were contagious within those voters’ households and lasted at least nine months — the final point at which the researchers checked in with the study participants via online surveys.

Within the world of psychological and political science research on attitude change, which more commonly involves small-scale interventions that occur in labs rather than in real-life settings, effects of this size and durability are almost unprecedented. As a result of the study’s exciting findings, a wave of publicity followed — the New York Times and countless other outlets, including Science of Us, covered the study, and it garnered an entire “This American Life” segment.

As it turned out, the study’s too-good-to-be-true results were exactly that. While the canvassing did occur, there may not have been any survey data collected from California voters at all. Following an investigation by David Broockman and Joshua Kalla, two graduate students at UC-Berkeley looking to extend the findings, it became clear that Michael LaCour, a graduate student at UCLA and the study’s first author, had simply faked the data — when Broockman and Kalla noticed discrepancies and contacted the survey firm LaCour pointed them to, the firm told them it had never worked with LaCour at all. (Broockman and Kalla have posted their full accounting here.)

The grad students notified Green, who quickly became convinced that something was seriously amiss with the paper. Just days later he sent Science a letter asking that it be retracted. “Last week at this time, life was relatively normal,” he said. “We had an interesting hypothesis and a sound research design and robust findings, and now we have the first two but not the third.”

“The whole research agenda now pivots because things we thought were true — things I thought were true — are backing down,” he went on. “So I just need to kind of readjust my thinking.” In an interview, Green explained what this academic scandal means for the broader practice of social science, why he’s more confused about LaCour than angry at him, and why so many people fell for LaCour’s data-collection fable.

In the immediate wake of the news that the data were faked, I’ve seen a lot of people cite this study as an example of problems with the peer-review system or even the broader scientific method. What do you think about this argument?

The question of what does this mean about the integrity of science, or the integrity of scientific vetting procedures — I suppose you could look at it one of two ways. The negative way to look at it is here was a failure of the review process, or a failure of the vetting process, or a failure on my part as the senior author. We posted our data, we did all kinds of checks, and still fraud slipped through. That’s one way to think about it.

Another way to think about it is, it was because of the posting of replication data sets and because of the meticulous way in which the study was described that others who sought to do a study like the ones Michael LaCour had done tried and failed, asked questions, got answers from the data, and recognized that things were out of sync with what had been reported in the article. And so from that standpoint, it’s a positive story about the self-correcting nature of science.

But you can understand why a lot of people aren’t interpreting it that way, that they think that the fact that he was able to fool so many people is a pretty big indictment of the process itself.

I think that it’s absolutely correct to say that in the short run, the process has its vulnerabilities, and one of the things I’m certainly reflecting on now is how can something positive come out of this in terms of the way in which we structure our procedures in our research group. Maybe the answer is that we need to have at least two people at all times gathering primary data. Maybe that was the source of this problem.

Was the personnel structure here, where he was the only guy doing the primary data collection, at all unusual?

No, it’s not that unusual.

On the one hand, there’s obviously the potential for abuse in that sort of situation. On the other hand, in any professional setting, if we didn’t have certain baseline assumptions that our colleagues are acting honestly, or not making stuff up, everything would grind to a halt. There’s no way to not have some degree of trust baked into the research process, right?

I agree. I think that one wants to be skeptical and build in checks, but without some degree of trust one would have to build in so many checks and so much redundancy into the system that nothing would be feasible except at very high cost. So there’s a cost of ratcheting up the level of mistrust.

Which this study is probably going to do.

It’s hard for me to say. It remains to be seen. But certainly, as I go forward, I want to think about ways in which checks are put into place. The reason you want to have people do primary data collection alone is that it’s the most efficient way to gather a lot of data. You’re not having duplication of effort. But if we think fraud is a very real possibility, then duplication is just a necessary cost.

What was the span of time between when irregularities were first brought to your attention and when you realized that basically the whole study had fallen apart?

Broockman and Kalla brought the concerns they had to my attention last weekend, and they brought aboard Peter Aronow to investigate further. During the weekend they showed me a preliminary draft of their report, and it was pretty convincing. Not just pretty convincing — I should say quite convincing. I brought it to the attention of LaCour’s advisor on Sunday, and we basically set in motion a series of investigative steps first thing on Monday morning. We had a list of five things we wanted to cover, though we really only got to the first two because it was over then.

So it sounds like within basically four days, you went from first finding out about this to sending the retraction request to Science.

Yeah. On Tuesday, I talked to Michael LaCour, trying to get him to admit that the data were fabricated, but he resisted that, and as far as I know he’s still maintaining the data are real. He indicated to me that he was considering writing a retraction. I waited for it on Tuesday, and when it didn’t arrive by Tuesday night, I just sent out my retraction.

In terms of your feelings toward him, has any sympathy or pity seeped in, or is it too soon?

Naturally, I’m quite embarrassed by the whole situation, embarrassed to have any role in the situation. It’s not my idea of fun or recreation to answer journalists’ questions hours after hours after hours, day after day. Naturally, I resent being put in this awkward position for no reason.

I obviously have gotten along very nicely with Michael, and we have been friendly. But my puzzlement now is, if he fabricated the data, surely he must have known that when people tried to replicate his study, they would fail to do so and the truth would come out. And so why not reason backward and say, let’s do the study properly? I guess maybe another source of puzzlement I have is at some level, I don’t really care how the study comes out, I just want to know how the experiment comes out. It comes out the way it comes out. I just want it to come out the same way twice, however it comes out, so that other people will find the same thing I’m finding. Then they can do replications and extensions and new directions.

I guess there was this view that maybe you had to make the findings especially spicy for people to sit up and take notice, but I don’t think I — I hope I never conveyed that view to him. That’s one of the things that’s a real head-scratcher now for me.

It’s interesting hearing you ask those questions about how he could have not understood the risks of doing this, because people asked the same questions about Stephen Glass at The New Republic and Jonah Lehrer at The New Yorker. This is pop-psych speculation, but it’s like there’s a personality type that just can’t help it or something.

Yeah, I don’t know. This is where I’m so temperamentally on the other side of the continuum. I guess maybe there’s a kind of thrill that people get from doing things that they know are over the edge, but I’m on the other end of the continuum. I’d just be happy to do my own work. My inclination after doing an experiment is rather than talk about the experiment, let’s do another experiment, because I want to confirm it and be really sure. And that’s ironically what I told Michael when he first showed me his results: “I think you should do it again.”

And that was a good bit of advice. It just had one critical flaw: It’s right to try to replicate scientific findings and see if the results from the first remarkable experiment were a fluke. The only problem with the fact that they came out the same both times is that it’s subject to two interpretations: One is that he’d really discovered something; the other is that the researcher was the same in both studies.

I couldn’t help but think about confirmation bias, as I went back to look at my own reporting on the study. Because the stuff the study said — that you’ll have more luck appealing to people on an emotional level, tying their own values to the issue in question — most of those basic findings aren’t really in question. That’s what political science and political psychology have known for a while, right?

Yes, I think that what’s at issue in this experiment is not whether it’s possible to use those elemental psychological forces to change minds, but rather whether you can do so in the context of a conversation like this one, with messengers like these, and have the effects endure and ramify throughout the household. That’s what makes the study interesting. Everybody knows that there’s some degree of truth in these propositions, and the reason you do an experiment is you want to measure the quantity.

Is there any chance somebody’s actually going to do this experiment again, now, or is —

Yeah. Yeah, I’m quite confident that people are going to do this experiment. I want to do this experiment. In some ways, Jesse, the irony of this whole thing is that the experiment was done, there really was an experiment. Dave Fleischer and his canvassing team, they really did bust their hump to do these interventions, to give treatment messages, placebo messages, with gay canvassers, with straight canvassers — that’s all true, and it happened not once but in two separate studies. They did the experiment, but the outcomes were never measured, so now we just need to do it with real survey data …

The right thing to do here is set in motion a new study, and do it properly.

And you would want to take part in that study?

Absolutely. I’m the first to volunteer for either doing work directly on that study or being an advisor to those who want to do that study. I’d be very glad to contribute in any way.

This transcript has been condensed and lightly edited.