
Psychologist Nicolas Guéguen publishes studies that create irresistible headlines. His research investigating the effects of wearing high heels made it into Time: "Science Proves It: Men Really Do Find High Heels Sexier." The Atlantic has cited his finding that men consider women wearing red to be more attractive. Even The New York Times has covered his work.

Guéguen's large body of research is the kind of social psychology that demonstrates, and likely fuels, the Mars vs. Venus model of gender interactions. But it seems that at least some of his conclusions rest on shaky ground. Since 2015, a pair of scientists, James Heathers and Nick Brown, has been looking closely at the results in Guéguen's work. What they've found raises a litany of statistical and ethical questions. In some cases, the data is either too perfectly regular or riddled with oddities, making it hard to see how it could have been produced by the experiment Guéguen describes.

Heathers and Brown have contacted the French Psychological Society (SFP) with the details of their concerns. The SFP informed Heathers and Brown about French regulations and offered to mediate with Guéguen. But, after nearly two years of receiving unsatisfactory responses from the researcher, the organization announced it had done all it could as a mediator.

Rather than go through more achingly lengthy official procedures, Heathers and Brown have opted to make the scientific community aware of the issues. They will be publishing the nitty-gritty details of their critique over numerous blog posts, and they shared an overview of the findings with Ars.

Strangely regular data

Social media is where it all kicked off, when Nick Brown saw a tweet about a paper claiming that men were less likely to help a woman who had her hair tied up in a ponytail or a bun. “That evening,” Brown told Ars, “I was talking to James about [something else entirely]” and mentioned the paper in passing. “And he kind of fell about laughing.”

When they looked more closely at the paper, something odd jumped out at them: the numbers in the paper looked strangely regular.

The study had tested whether women’s hairstyles influenced people’s inclination to be helpful. On a busy city street, a female collaborator wore her hair loose, in a ponytail, or in a bun, and dropped her glove while walking. The bystanders were given a score that indicated how helpful they had been—if they returned the glove, they got three points; if they warned the woman that she’d dropped it, that was two points; and if they did nothing, one point.

The paper reports that men were more likely to help her if her hair was loose, with an average helpfulness score of 2.8. Hair in a ponytail or bun, however, both had a helpfulness score of 1.8 from men. For women, it made no difference: hair in a ponytail or bun had a score of 1.6, and loose hair was slightly higher at 1.8. The difference wasn’t statistically significant.

The reported results looked a little odd to Brown and Heathers. To see why, take a look at the patterns that pop up in averages like these. Imagine you have three kids, and you want to see how many cookies they ate this week: Monica ate three; Mitchell ate six; and Ted ate three. You’d add up the total number of cookies (12) and divide that by the number of kids (3) to reach an average of four cookies per kid this week.

But if your total wasn’t such a nice, neat number—if Mitchell ate seven cookies instead, giving you a total of 13 cookies eaten—you’d get an average of 4.33 instead. Or if Mitchell ate eight cookies, you’d get an average of 4.67 (rounded up from 4.666…). When you’re dividing by three, the decimal part will always follow this pattern: .000, .333, or .666 repeating. If you divide by 30, the pattern just moves down a decimal place: from the second decimal onward, the repeating digit is always 0, 3, or 6, so a mean reported to two decimal places must end in 0, 3, or 7.
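To see the pattern concretely, here is a quick check (in Python, purely illustrative) of which second-decimal digits a mean of 30 integer scores can actually take:

```python
# Every mean of 30 integer scores between 1 and 3 is a multiple of 1/30
# (the sum s runs from 30 to 90), so only a few digits can appear in
# the second decimal place of the reported mean.
second_decimals = sorted({f"{s / 30:.2f}"[-1] for s in range(30, 91)})
print(second_decimals)  # → ['0', '3', '7']
```

The repeating digits 0, 3, and 6 become 0, 3, and 7 once the mean is rounded to two places for publication.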

In this study, every average score was a sum divided by 30, because each group (male-ponytail, male-loose, female-bun, and so on) had 30 people in it. Yet every reported mean was perfectly round: 1.80, 2.80, 1.60. That’s … unlikely. “The chance of all six means ending in zero this way is 0.0014,” write Heathers and Brown in their critique.
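The 0.0014 figure is easy to reproduce. A mean over 30 scores lands on a clean multiple of 0.1 only when the underlying sum is divisible by 3, which happens one time in three if every remainder is equally likely (an assumption, as noted below). A sketch:

```python
# Probability that one mean of 30 integer scores is a round multiple
# of 0.1: the sum must be divisible by 3, which is roughly 1 in 3
# under a uniform-remainder assumption. For six independent means:
p = (1 / 3) ** 6
print(round(p, 4))  # → 0.0014
```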

Initially, the duo thought these figures might be the result of a mistake: if you rounded all your decimals to one place and later expanded them to two, you could end up with numbers like this. But in the same table, other figures were reported with two decimal places, not all ending in zero. For Heathers and Brown, that probably rules out error as an explanation.

So they next assumed that Guéguen's means were correct but strange and decided to look more carefully. “We sat and worked for about an hour together,” Brown told Ars. “And we realized that we could reconstruct the entire data set.”

Their technique for this was simple: they entered combinations of scores into Excel, changing them one number at a time until they produced the means and standard deviations reported in the paper. This technique was the genesis of two statistics-checking tools, GRIM and SPRITE, which were later used in the similarly peer-driven investigation of Brian Wansink’s headline-grabbing "Mindless Eating" work.
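The core idea behind GRIM reduces to a simple consistency check: for a given sample size, only certain rounded means are reachable from integer data. The sketch below is my own illustration of that idea, not the authors' actual tool; the function name and the narrow candidate search are assumptions.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM-style check: can any integer total of n integer
    scores round to the reported mean?"""
    target = round(reported_mean, decimals)
    est = int(reported_mean * n)
    # only integer sums adjacent to mean * n could possibly round back
    return any(round(s / n, decimals) == target
               for s in range(est - 1, est + 2))

print(grim_consistent(1.80, 30))  # → True  (a sum of 54 works)
print(grim_consistent(1.85, 30))  # → False (no integer sum fits)
```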

What this turned up was even stranger. There was only one combination of scores that worked: for every condition, each score appeared 6, 12, 18, or 24 times. For example, women in the bun condition had 12 scores of 1, and 18 scores of 2. “The chances of this happening randomly for all six combinations of participant sex and hairstyle are [one in 170 million],” write Brown and Heathers.
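A brute-force reconstruction along these lines is short to sketch. Assuming the reported mean of 1.60 for the women/bun condition and an SD of 0.50 (that SD is back-computed here from the counts the article quotes; the paper's own SD table isn't reproduced in this story), an exhaustive search over count combinations recovers exactly the twelve 1s and eighteen 2s:

```python
from itertools import product
from statistics import mean, stdev

def reconstruct(n, scores, target_mean, target_sd):
    """Find every way to distribute n participants across the possible
    scores so that the mean and sample SD round to the reported values."""
    hits = []
    for counts in product(range(n + 1), repeat=len(scores) - 1):
        last = n - sum(counts)  # the final count is forced by the total
        if last < 0:
            continue
        counts = counts + (last,)
        data = [s for s, c in zip(scores, counts) for _ in range(c)]
        if (round(mean(data), 2) == target_mean
                and round(stdev(data), 2) == target_sd):
            hits.append(counts)
    return hits

# counts of (1s, 2s, 3s) among 30 participants
print(reconstruct(30, [1, 2, 3], 1.60, 0.50))  # → [(12, 18, 0)]
```

With both the mean and the SD pinned down, only one combination survives, which is what made the spreadsheet reconstruction possible.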

The probability calculation makes certain statistical assumptions that might not be justified, but regardless of the precise odds, it’s certainly a surprising set of scores. In November 2015, they contacted Guéguen asking for this data. He sent it as a spreadsheet, and they found that their reconstruction was perfectly on the money.

Prolific publishing

When Brown and Heathers looked at other papers of Guéguen’s, they were struck by how prolific his publication record is: he publishes a huge number of articles per year. On many of them, he is listed as the sole author. For research that involves collecting data from hundreds of participants, this is an eye-wateringly high rate of publication, one most researchers might only dream about.

Guéguen did have help. His research routinely involves the use of experimenters gathering data in the field as well as “confederates”—research assistants who often act as members of the public or other research participants by disguising their role in the study.

However, many of the studies that involve fieldworkers and confederates don’t name these individuals as authors or acknowledge them in any way. It’s possible that this is nothing more than varying cultural norms, but it struck Brown and Heathers as unusual.

Their curiosity piqued, they started reading other Guéguen papers in more detail. They focused on papers Guéguen had written himself and published recently (within the last five years), and then narrowed their analysis down to 10 papers that had a variety of statistical and other problems.