Tie together an algorithm, an exchange-traded fund and an academic study finding an anomaly in the markets, and voilà! You have a formula for making money. Trouble is, it turns out that most of the supposed anomalies academics have identified don’t exist, or are too small to matter.

A new study making waves in quantitative finance tested 447 anomalies identified by academics and found more than eight out of 10 vanish when rigorous tests are applied. Among those failing to reach statistical significance: one anomaly recently set out by the godfathers of quantitative finance, Nobel-winning economist Eugene Fama and his colleague Kenneth French.

The study, “Replicating Anomalies,” published this week by Kewei Hou and Lu Zhang at Ohio State University and Chen Xue at the University of Cincinnati, is the biggest test of examples of inefficient markets carried out so far. The trio applied consistent analysis to the supposed anomalies, used the same database of stocks and set higher standards for statistical significance. Simply weighting stocks by market capitalization, which curbs the influence of the plethora of rarely traded penny stocks (just 3% of market value, but 60% of all listings), left more than half of past findings no longer significant.
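The weighting point can be seen in a toy calculation. The sketch below uses entirely hypothetical numbers chosen to mirror the proportions quoted above (penny stocks as 3% of market value but 60% of listings); it is not data from the study, only an illustration of why equal weighting lets tiny stocks drive a portfolio's average return while cap weighting does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-section: 40 large caps holding 97% of market value,
# 60 penny stocks holding the remaining 3% (caps in $millions, illustrative).
caps = np.array([2_425.0] * 40 + [50.0] * 60)

# One month of illustrative returns: penny stocks are far noisier.
rets = np.concatenate([rng.normal(0.01, 0.05, 40),
                       rng.normal(0.05, 0.30, 60)])

# Equal weighting gives every listing the same vote, so the 60 penny
# stocks supply 60% of the portfolio return.
ew_return = rets.mean()

# Cap weighting scales each stock by its share of total market value,
# so the penny stocks (3% of value) barely move the result.
vw_return = np.average(rets, weights=caps)

print(f"equal-weighted return: {ew_return:.4f}")
print(f"value-weighted return: {vw_return:.4f}")
print(f"penny-stock share of listings: {60 / 100:.0%}, "
      f"of market value: {caps[40:].sum() / caps.sum():.0%}")
```

An anomaly that lives mainly in the noisy penny-stock tail will show up strongly in the equal-weighted average and largely wash out of the cap-weighted one, which is the mechanism behind the study's result.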

Messrs. Hou, Xue and Zhang warn that academics have been fiddling the statistics to come up with interesting findings, known to statisticians as data mining or p-hacking. “The anomalies literature is infested with widespread p-hacking,” they write.
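The mechanics of p-hacking are easy to demonstrate with a simulation. The sketch below is my own illustration, not the authors' procedure: it generates 447 "strategies" that are pure noise with zero true return, then counts how many look statistically significant anyway. The thresholds shown (1.96 and 3.0) are conventional and illustrative, not the cutoffs used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

n_strategies = 447   # same count of candidate anomalies as in the study
n_months = 360       # 30 hypothetical years of monthly returns

# Null universe: every strategy's true average return is exactly zero.
returns = rng.normal(0.0, 0.05, size=(n_strategies, n_months))

# t-statistic of the mean monthly return for each strategy.
t_stats = returns.mean(axis=1) / (returns.std(axis=1, ddof=1)
                                  / np.sqrt(n_months))

conventional = np.abs(t_stats) > 1.96  # usual single-test 5% cutoff
stricter = np.abs(t_stats) > 3.0       # a higher hurdle for mass testing

print(f"spurious 'anomalies' at |t| > 1.96: {conventional.sum()}")
print(f"still 'significant' at |t| > 3.0:  {stricter.sum()}")
```

With hundreds of null strategies, the ordinary 5% cutoff reliably flags a couple of dozen phantom anomalies by chance alone; raising the bar, as the study does, removes most of them. That is the rationale for demanding higher standards of statistical significance when so many candidates have been tried.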

It isn’t all bad news for investors and those trying to make a living flogging what have become known as “factors.” The research confirmed that the most popular factors have indeed outperformed the market over long periods even when faced with rigorous tests, but found much smaller returns than previous studies estimated.