There are a bunch of things about this story that just don’t make a lot of sense to me.

For those who haven’t been following the blog recently, here’s the quick backstory: Brian Wansink is a Cornell University business school professor and self-described “world-renowned eating behavior expert for over 25 years.” It’s come out that four of his recent papers—all of them derived from a single experiment which Wansink himself described as a “failed study which had null results”—were hopelessly flawed. An outside research team (Tim van der Zee, Jordan Anaya, and Nicholas Brown) looked at the papers and found over 150 errors. Earlier, I’d looked at the papers and found that they sliced and diced their data in different ways to come up with statistical significance. The data were all from the same experiment but different analyses used different data-exclusion rules and controlled for different variables.

All this led me to disagree with Wansink’s assertion that publishing that sort of work was a better use of one’s time than watching Game of Thrones.

Since then I’ve noticed a few weird things in this case:

1. Some people seem to be upset that Wansink isn’t sharing his data. If he doesn’t want to share the data, there’s no rule that he has to, right? It seems pretty simple to me: Wansink has no obligation whatsoever to share his data, and we have no obligation to believe anything in his papers. No data, no problem, right?

2. Wansink’s easygoing reactions seem to me to be dissociated from the seriousness of the problems that people have found with his work. A bunch of commenters on his blog have pointed out the obvious problems with his research methods, but he has just responded blandly in an in-one-ear-and-out-the-other kind of way.

Here’s a representative example. Anthony St. John writes:

“With field studies, hypotheses usually don’t ‘come out’ on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t.” [quoting Wansink] I suggest you read this xkcd comic carefully: https://xkcd.com/882/ It provides a great example of learning from a “deep dive”.

Brian Wansink replies:

Hi Anthony, I like it. Thanks for the link. (Makes me grateful I’m more of a purple jelly bean guy). Best, Brian

Anyone who looks at that famous xkcd jelly-bean cartoon will immediately realize that it’s slamming the “deep dive and look for statistical significance” approach to research. But Wansink follows the link and . . . doesn’t get the point? Doesn’t realize that St. John, like most of the other commenters on the blog, is saying he’s doing everything exactly wrong?

And there are lots more exchanges on that post that have the same flavor, people commenting that the work is “salami slicing null-results . . . worthless, p-hacked publications . . . junk science,” and Wansink giving mild, agreeable responses like, “I understand the good points you make.” Just a complete disconnect. The guy really does seem to be a living embodiment of that jelly bean cartoon.
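The jelly-bean logic is easy to check for yourself with a quick simulation (a sketch of my own, not anything from Wansink’s papers): run 20 “jelly bean color” subgroup tests on pure noise and see how often at least one comes out statistically significant.

```python
import math
import random

random.seed(42)

def one_null_test(n=30, t_crit=2.045):
    """One 'jelly bean color' subgroup test on pure noise (no true effect)."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    return abs(t) > t_crit  # "significant" at roughly p < 0.05 (df = 29)

n_sims, n_colors = 2000, 20
hits = sum(any(one_null_test() for _ in range(n_colors)) for _ in range(n_sims))
print(f"Share of studies with at least one 'significant' color: {hits / n_sims:.0%}")
# Theory says roughly 1 - 0.95**20, i.e. about 64%
```

Test enough null subgroups and a “significant” result is nearly guaranteed; that’s the cartoon’s whole point.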

3. But the weirdest thing of all is Wansink’s reaction to the three outside researchers finding 150 errors in his papers. Who has 150 errors in four papers? When does that ever happen?

Of course Wansink doesn’t want to share his data—that much is obvious. Van der Zee et al. found those errors without even seeing the original data—these were just inconsistencies in the published tables. It’s hard to imagine what could’ve happened to get that many errors out of a single dataset, but whatever did occur must be a bit embarrassing to the people concerned.

What stuns me is Wansink’s attitude! When you publish four papers from a “failed study,” and the statistical methods in those papers are criticized by experts, and when an outside team finds 150 errors in the papers, the appropriate response is not to say you’re gonna go fix some little things and “correct some of these oversights.” No. The appropriate response is to consider that maybe, just maybe, the data in those papers don’t support your claims.

Let me put it this way. At some point, there must be some threshold where even Brian Wansink might think that a published paper of his might be in error—by which I mean wrong, really wrong, not science, data not providing evidence for the conclusions. What I want to know is, what is this threshold? We already know that it’s not enough to have 15 or 20 comments on Wansink’s own blog slamming him for using bad methods, and that it’s not enough when a careful outside research team finds 150 errors in the papers. So what would it take? 50 negative blog comments? An outside team finding 300 errors? What about 400? Would that be enough? If the outsiders had found 400 errors in Wansink’s papers, then would he think that maybe he’d made some serious errors?

The whole thing just baffles me. On one hand, Wansink seems so naive about statistics and research methods. But on the other hand, who could be so clueless as to not suspect a problem when hundreds of errors have been found in these papers? Most scientists I know would get concerned if someone found one error. Even from a purely strategic standpoint, if you’re only concerned about your reputation, wouldn’t it make sense to cut your losses and accept that these particular papers are hopeless messes?

And what can Wansink possibly mean when he writes, “We’ve always been pleased to be a group that’s accurate to the 3rd decimal point”? That makes no sense given the incredible density of errors in those four papers.

As I said, the whole thing just seems weird to me. I just can’t understand Wansink’s serene response. If you publish empirical work and someone finds 150 errors in your papers, that’s a concern, no?

To paraphrase the famous spiritualist Arthur Conan Doyle:

Detective: “Is there any other point to which you would wish to draw my attention?”

Blogger: “To the curious incident of the researcher in response to people pointing out 150 errors in four of his papers.”

Detective: “The researcher did almost nothing in response to people pointing out 150 errors in four of his papers.”

Blogger: “That was the curious incident.”

I perhaps thought of this because Wansink has been called “the Sherlock Holmes of food” by the American Psychological Association.

Who cares?

The question naturally arises, why keep writing about this dude? (I think this is my fourth post on the topic.) Busy prof runs a science factory, whips out zillions of papers each year, very little quality control, students and postdocs in the lab are under huge pressure to come up with publications, they use every trick they can think of to come up with statistically significant results, some people take a careful look and find inconsistencies and errors in some of the papers, this is awkward because prof has described the student who did the work as a “hero,” prof tries to affably sweep the whole event away, questions are raised by bloggers and journalists who’d never even heard of the prof until this controversy, etc.

Same old same old, the kind of thing we hear about in Retraction Watch every day. The only new thing about it is the 150 errors—when does that ever happen?—but, still, maybe that’s not enough to make the incident worthy of four separate posts.

I continue writing about this story because of the insight it gives into the inner workings of the famous self-correcting nature of science. The process of self-correction is much more involved than people seem to realize. Sometimes people demand retractions, but as I’ve written before, I don’t see retraction as a serious solution for reforming poor research and publication practices, or as a way of cleaning up the public record. The numbers just don’t add up: there are far too many hopelessly flawed papers, and retraction happens far too rarely.

It’s impossible to solve problems such as Wansink’s “deep data dives” (actually I assume these dives were done by his student based on the encouragement and incentives provided by Wansink) on a case-by-case basis. There are just too many cases.

Paradoxically, this motivates me to examine certain individual cases, like this one, in detail, to look at how people at different stages of their careers react to the realization that they’ve been doing junk science. This can help the many thousands of researchers out there who aren’t personally and professionally invested in discredited work, and who want to use scientific methods to learn about the world.

Not many of us have published multiple papers based on a “failed study which had null results,” and not many of us have had our names attached to papers with 150 errors, but we’ve all had research setbacks, ideas that didn’t pan out, and the excitement of major discovery—followed by the realization that we did something stupid and didn’t make that discovery after all. How to handle these disappointments? That’s not something covered in the usual course on research methods.

In judo, before you learn the cool moves, you first have to learn how to fall. Maybe we should be training researchers, journalists, and public relations professionals the same way. First learn about Judith Miller and Thomas Friedman, and only when you get that lesson down do you get to learn about Woodward and Bernstein.

P.S. Lots of great comments here. I just want to point out this one from Mark Palko:

When this guy finds an effect, he by-god finds an effect: http://www.cbsnews.com/news/slim-by-design-author-brian-wansink-gives-tips-on-avoiding-bad-food/

In a new book, “Slim by Design: Mindless Eating Solutions for Everyday Life,” food psychologist and director of the Cornell University Food and Brand Lab Brian Wansink says you don’t need willpower to shed the pounds but to change your surroundings instead. “You have a messy kitchen, a cluttered desk, you end up eating 44 percent more snacks than if the same kitchen is clear,” Wansink said on “CBS This Morning.”

In fact, people who leave cereal boxes on the counter are more likely to be heavier. “Mainly women,” he added. “About 21 pounds heavier than the neighbor next door that doesn’t have any cereal visible at all.” Those findings are based off of observational studies that Wansink performed. He investigated 230 homes in Syracuse, New York, measured the women’s weight and took pictures of their kitchens. …

“If you’re serving white rice on a white plate, you don’t really see the difference, so you tend to put about 18 percent more on,” Wansink said. “If you put that on a darker plate or a colored plate, you automatically serve less and eat less.” …

“We’ve analyzed lots of orders and restaurants. What we find is that if you sit near a window, you’re about 80 percent more likely to order salad; you sit in that dark corner booth, you’re about 80 percent more likely to order dessert,” Wansink said.

I’d say this is disgraceful and counter to all principles of quantitative science—except that this sort of ridiculous hype is standard operating procedure among celebrated and leading researchers in psychology and economics. So, although I can blame Wansink for publishing papers with 150 errors and then not seeming to really catch that this might be a problem, I can hardly single him out for publishing and publicizing ludicrously high effect size estimates that, to a trained eye, are the obvious product of the statistical significance filter: Take a small sample size, add noise, pick at the data until you find a pattern that fits your story and is more than 2 standard errors away from zero, then publish the paper and go on TV advertising your stunning results. 80 percent more likely, indeed. I believe that about as much as I believe that early childhood intervention increases people’s wages by 40% when they grow up. Or that there’s a province in China where the life expectancy would’ve been 96 in the absence of indoor coal heating.

All these things could be possible—ok, not the 96-year life expectancy, but all the others—but I have no reason to believe them, as they’re super-biased estimates. I don’t usually make a practice of scaling my estimates up by a factor of 10 or whatever, just for the hell of it, but that’s what these researchers are doing when they report a selection of raw and noisy estimates that happen to be at least 2 standard errors away from zero. Type M error is not just a slogan. It’s a way of life for much of the research community. And CBS News, NPR, etc. fall for it, every time.
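That factor-of-10 claim about the significance filter is easy to demonstrate with a simulation (the numbers below are hypothetical, chosen only to illustrate Type M error, not taken from any of Wansink’s studies): give each study a small true effect measured with a large standard error, keep only the estimates that clear the 2-standard-error bar, and see how inflated the survivors are.

```python
import random

random.seed(7)

# Hypothetical numbers for illustration: a small true effect, measured noisily.
TRUE_EFFECT = 0.1   # the real (tiny) effect
SE = 0.5            # standard error of each study's estimate

# Each "study" reports a noisy estimate of the true effect.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(100_000)]

# The significance filter: only estimates > 1.96 SE from zero get published.
significant = [est for est in estimates if abs(est) > 1.96 * SE]

power = len(significant) / len(estimates)
exaggeration = sum(abs(e) for e in significant) / len(significant) / TRUE_EFFECT
print(f"Share of studies reaching significance: {power:.1%}")
print(f"Average 'significant' estimate: about {exaggeration:.0f}x the true effect")
```

With these inputs only a few percent of studies reach significance, and the ones that do overstate the effect by roughly an order of magnitude—exactly the kind of inflation that turns a modest tendency into “80 percent more likely to order salad.”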