Sometimes it just jumps out at you (Image: Darren Greenwood/Design Pics/REX)

Diederik Stapel, the infamous “lying Dutchman” who in 2011 admitted to inventing the data in dozens of psychology research papers, unwittingly signalled his deceit through the language he used. As well as inflating the certainty surrounding his results, Stapel included more science-related terms to describe his methods when writing up his fraudulent “findings” than when describing genuine results.

Researchers who have analysed Stapel’s papers say they can separate his genuine research from the fictional with about 70 per cent accuracy. Now they are studying a larger sample of papers from many different scientific fraudsters, to see if the detection method works more generally.

Jeff Hancock’s team at Cornell University in Ithaca, New York, has previously studied the language used by liars in situations including politics and online dating. When US presidents make false statements, for instance, they tend to use negative words such as “fear” or “doom” more frequently.


Leaking language

“Lying is a very stressful act,” says David Markowitz, a member of the team. “This anxiety sometimes leaks through into people’s language.”

Context matters: when presidents lie on the subject of war, they use fewer personal pronouns like “I” and “me”. But people who write deceitful online dating profiles actually use these pronouns more than those who tell the truth.

Markowitz and Hancock suspected that there may be specific linguistic tics that signal deceit in science. Stapel’s outrageous fraud provided the ideal testing ground. “He produced a tremendous amount of writing,” says Markowitz. “And the fact that he was investigated so closely provided us with a unique opportunity.”

So the pair selected 24 of Stapel’s papers now known to be fraudulent, and a further 25 that have withstood official scrutiny. They chose only papers on which Stapel was listed as first author – a sign that he actually wrote them.

Stapel, who worked at Tilburg University in the Netherlands, used more “amplifiers” – words like “profoundly” and “extreme” – in his fraudulent papers, and fewer “diminishers” – like “merely” and “somewhat”.
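That kind of word counting is straightforward to mechanise. The short Python sketch below illustrates the idea: measure how often amplifier and diminisher terms occur per word of a text. The word lists here are illustrative stand-ins, not the dictionaries the researchers actually used.

```python
# Illustrative sketch: per-word rates of "amplifier" and "diminisher"
# terms in a text. The word lists are examples, not the real lexicons.
import re

AMPLIFIERS = {"profoundly", "extreme", "extremely", "vastly"}
DIMINISHERS = {"merely", "somewhat", "hardly", "slightly"}

def intensity_rates(text):
    """Return (amplifier rate, diminisher rate) per word of text."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty input
    amp = sum(w in AMPLIFIERS for w in words)
    dim = sum(w in DIMINISHERS for w in words)
    return amp / n, dim / n
```

Comparing these rates across a set of papers would reveal the kind of skew seen in Stapel’s writing: a higher amplifier rate and a lower diminisher rate in the fabricated work.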

“He tried to overvalue the fraudulent research,” suggests Markowitz, who is now investigating whether this pattern holds true for other scientists who have been forced to retract fraudulent papers.

Screened by machine

If it does work more widely, it might be useful for policing the scientific literature. It couldn’t provide firm evidence of fraud, but might help flag research labs turning out large numbers of suspicious papers, prompting closer investigation.

Still, an error rate of around 30 per cent means that the method would throw up many false leads.

“It’s not really good enough as a screening tool,” says Harold Garner of Virginia Tech in Blacksburg, who has developed software to screen published papers for examples of plagiarism.

However, Markowitz hopes that it will be possible to improve accuracy by employing machine learning – using examples of fraudulent and genuine scientific papers to train algorithms to detect subtle differences in the way that they are written.
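As a rough illustration of that idea, the sketch below trains a toy nearest-centroid classifier on stylistic features of papers already labelled fraudulent or genuine. It is a deliberately simplified stand-in for the kind of machine-learning pipeline Markowitz describes, with made-up word lists and only two features.

```python
# Toy classifier sketch: learn the average stylistic "fingerprint" of
# fraudulent and genuine texts, then label new text by nearest centroid.
# Word lists and features are illustrative, not the team's actual ones.
import re

AMPLIFIERS = {"profoundly", "extreme", "extremely", "vastly"}
DIMINISHERS = {"merely", "somewhat", "hardly", "slightly"}

def features(text):
    """Two stylistic features: amplifier and diminisher rates per word."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words) or 1
    return (sum(w in AMPLIFIERS for w in words) / n,
            sum(w in DIMINISHERS for w in words) / n)

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    k = len(vectors)
    return tuple(sum(v[i] for v in vectors) / k
                 for i in range(len(vectors[0])))

def train(labelled):
    """labelled: list of (text, label) pairs -> label -> centroid."""
    groups = {}
    for text, label in labelled:
        groups.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in groups.items()}

def classify(model, text):
    """Return the label whose centroid is closest to the text's features."""
    f = features(text)
    return min(model, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(model[lab], f)))
```

A real system would use far richer features (pronouns, certainty terms, methods vocabulary) and a trained statistical model rather than raw centroids, but the workflow – extract linguistic features from labelled papers, fit a model, score new papers – is the same.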

Journal reference: PLoS ONE, DOI: 10.1371/journal.pone.0105937