Nobel laureate Richard Feynman once asked his Caltech students to calculate the probability that, if he walked outside the classroom, the first car in the parking lot would have a specific license plate, say 6ZNA74. Assuming each number and letter is equally likely and chosen independently, the students estimated the probability to be less than 1 in 17 million. When the students finished their calculations, Feynman revealed that the correct probability was 1: He had seen this license plate on his way into class. Something extremely unlikely is not unlikely at all if it has already happened.
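The students' arithmetic is easy to reconstruct. A plate like 6ZNA74 has one digit, three letters, and two more digits; assuming each character is equally likely and independent, the number of possible plates is:

```python
# Count the possible plates of the form digit-letter-letter-letter-digit-digit,
# assuming each character is equally likely and chosen independently.
plates = 10 * 26**3 * 10**2   # one digit, three letters, two digits
prob = 1 / plates

print(plates)   # 17576000
print(prob)     # about 5.7e-08, i.e. less than 1 in 17 million
```

With 17,576,000 equally likely plates, the chance of any one specific plate is indeed a bit less than 1 in 17 million.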

The Feynman trap—ransacking data for patterns without any preconceived idea of what one is looking for—is the Achilles' heel of studies based on data mining. Finding something unusual or surprising after it has already occurred is neither unusual nor surprising. Patterns are sure to be found, and are likely to be misleading, absurd, or worse.

In his best-selling 2001 book Good to Great, Jim Collins compared 11 companies that had outperformed the overall stock market over the previous 40 years to 11 companies that hadn’t. He identified five distinguishing traits that the successful companies had in common. "We did not begin this project with a theory to test or prove," Collins boasted. "We sought to build a theory from the ground up, derived directly from the evidence."

He stepped into the Feynman trap. When we look back in time at any group of companies, the best or the worst, we can always find some common characteristics, so finding them proves nothing at all. Following the publication of Good to Great, the performance of Collins’ magnificent 11 stocks has been distinctly mediocre: Five stocks have done better than the overall stock market, while six have done worse.

In 2011, Google created an artificial intelligence program called Google Flu that used search queries to predict flu outbreaks. Google’s data-mining program looked at 50 million search queries and identified the 45 that were the most closely correlated with the incidence of flu. It's yet another example of the data-mining trap: A valid study would specify the keywords in advance. After issuing its report, Google Flu overestimated the number of flu cases for 100 of the next 108 weeks, by an average of nearly 100 percent. Google Flu no longer makes flu predictions.
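The flaw in screening millions of queries can be demonstrated with a toy simulation (hypothetical sizes, far smaller than Google's 50 million queries): generate a "flu" series and thousands of unrelated noise series, then keep the series most correlated with flu. The winner will look impressively correlated even though nothing is related to anything.

```python
# Sketch of the screening trap: among thousands of unrelated predictors,
# the best-correlated one looks strong purely by chance.
import random
import statistics

random.seed(0)
weeks, n_queries = 52, 10_000
flu = [random.gauss(0, 1) for _ in range(weeks)]   # pretend flu incidence: pure noise

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Correlate every fake "query" series with flu and keep the strongest.
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(weeks)], flu))
    for _ in range(n_queries)
)
print(round(best, 2))   # a sizable correlation, though no series has any link to flu
```

Specifying the keywords in advance, as a valid study would, removes this selection effect: a single prespecified noise series would show a correlation near zero.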

An internet marketing company thought it could boost its revenue by changing its traditional blue webpage color to a different color. After several weeks of tests, the company found a statistically significant result: apparently England loves teal. By trying several alternative colors in a hundred or so countries, the company guaranteed that it would find a revenue increase for some color in some country, but it had no idea ahead of time whether teal would sell more in England. As it turned out, when England's webpage color was changed to teal, revenue fell.
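The guarantee is simple arithmetic: at a 5 percent significance level, about 1 test in 20 comes out "significant" even when nothing is going on. A quick simulation with hypothetical numbers (1,000 color-country tests, identical revenue distributions for old and new colors) shows how many spurious winners that produces:

```python
# Run many A/B tests where the color truly has no effect on revenue,
# and count how many nevertheless look "statistically significant".
import random
import statistics

random.seed(1)
n_tests, n = 1000, 500   # hypothetical: color-country pairs, visitors per arm

false_positives = 0
for _ in range(n_tests):
    old = [random.gauss(1.0, 1.0) for _ in range(n)]   # revenue per visitor, old color
    new = [random.gauss(1.0, 1.0) for _ in range(n)]   # same distribution: no true effect
    se = (statistics.variance(old) / n + statistics.variance(new) / n) ** 0.5
    z = (statistics.fmean(new) - statistics.fmean(old)) / se
    if abs(z) > 1.96:                                  # "significant at the 5% level"
        false_positives += 1

print(false_positives)   # roughly 5% of the 1,000 tests: guaranteed "wins" by chance
```

Roughly 50 color-country pairs look like winners, and the apparent best of them—teal in England, here—is exactly the kind of fluke that evaporates when the change is actually made.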

A standard neuroscience experiment involves showing a volunteer in an MRI machine various images and asking questions about the images. The measurements are noisy, picking up magnetic signals from the environment and from variations in the density of fatty tissue in different parts of the brain. Sometimes they miss brain activity; sometimes they suggest activity where there is none.

A Dartmouth graduate student used an MRI machine to study the brain activity of a salmon as it was shown photographs and asked questions. The most interesting thing about the study was not that a salmon was studied, but that the salmon was dead. Yep, a dead salmon purchased at a local market was put into the MRI machine, and some patterns were discovered. There were inevitably patterns—and they were invariably meaningless.

In 2018, a Yale economics professor and a graduate student calculated correlations between daily changes in Bitcoin prices and hundreds of other financial variables. They found that Bitcoin prices were positively correlated with stock returns in the consumer goods and health care industries, and that they were negatively correlated with stock returns in the fabricated products and metal mining industries. “We don’t give explanations,” the professor said, “we just document this behavior.” In other words, they may as well have looked at correlations of Bitcoin prices with hundreds of lists of telephone numbers and reported the highest correlations.