Abstract

The rise of scientific fraud has drawn significant attention to research misconduct across disciplines. Documented cases of fraud provide an opportunity to examine whether scientists write differently when reporting on fraudulent research. In an analysis of over two million words, we evaluated 253 publications retracted for fraudulent data and compared the linguistic style of each paper to a corpus of 253 unretracted publications and 62 publications retracted for reasons other than fraud (e.g., ethics violations). Fraudulent papers were written with significantly higher levels of linguistic obfuscation, including lower readability and higher rates of jargon, than both unretracted papers and papers retracted for reasons other than fraud. We also observed a positive association between obfuscation and the number of references per paper, suggesting that authors of fraudulent papers obfuscate their reports to mask their deception by making them more costly to analyze and evaluate. This is the first large-scale analysis of fraudulent papers across authors and disciplines to reveal how changes in writing style are related to fraudulent data reporting.