As “Deflategate” rattles the National Football League in the run-up to this year’s Super Bowl, data analysts have swooped in, among them Warren Sharp, one of many self-styled football analysts who blog about the topic. In a Slate article, he analyzes the fumbling rate of the New England Patriots, the team accused of purposely underinflating footballs to gain an advantage. The headline of his analysis calls the Patriots’ fumble rate, compared with the rest of the league, “nearly impossible.”

Sharp, you might think, found the smoking gun: a statistic that proves the Patriots cheated. Only a patient reader who persists to the last paragraph will see that Sharp ultimately admits New England’s spectacular performance on the metric could be explained in any number of ways, including legitimate ones like perfecting ball-security techniques or practicing prevention.

In short, the data say the Patriots are excellent at preventing fumbles. They say nothing about why.

This distinction points to one of the most under-appreciated problems in big data analysis: reasoning about reverse causation. In reverse causation problems, we know the result and work backward to understand its causes.

Reverse causation investigations have the opposite structure from A/B tests, in which we vary known causes and observe how the variations affect an outcome. If visitors randomly shown a new image on your Facebook page click through to your website more often than visitors shown the old image, you conclude that the new photo is the reason for the traffic surge. (Note: Good A/B test construction can help you identify the most likely causes; bad A/B test construction creates its own set of problems.)
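In the forward direction, the analysis is comparatively simple. Below is a minimal Python sketch, assuming visitors were randomly assigned to see either the old or the new image: the two click-through rates are compared with a standard two-proportion z-test. The function name and the visitor counts in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare rates in arm A (old image) and arm B (new image).

    Returns (z, two-sided p-value) for the difference p_b - p_a, using the
    pooled normal approximation. The comparison only supports a causal reading
    when visitors were randomly assigned to the arms, so that the image is the
    only systematic difference between the groups.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both arms share one rate; estimate it by pooling.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 500 of 10,000 visitors clicked through with the old
# image, 600 of 10,000 with the new one.
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Because the cause was varied deliberately and at random, a difference this unlikely under chance alone can reasonably be attributed to the image. Reverse causation problems offer no such design to lean on.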

By contrast, the biggest obstacle to solving reverse causation problems is the practically unlimited number of possible causes that might influence the known outcome. This is compounded by our desire to assign a cause. So when we pluck from a large data set a few numbers that fit a narrative we have already constructed, it’s very tempting to assign causation where none exists.

Most of the time, though, the data offer hints, but no proof. Sharp’s article on the Patriots is one such case. When reading this style of data journalism, pay attention to the structure of the statistical argument. Here is how I summarize Sharp’s:

1. New England is an outlier in the plays-per-fumbles-lost metric, performing far better than any other team (1.8x above the NFL team average).

2. Different ways of visualizing and reformulating the metric yield the same conclusion: New England is the outlier.

3. There is a “dome effect.” Teams whose home stadiums are indoors typically suffer 10 fewer fumbles than outdoor teams. New England, a non-dome team, surpasses most dome teams on plays per total fumbles. If dome teams are removed from the analysis, New England is a statistical outlier.

4. Assuming that the distribution of the metric across teams is a bell curve, the chance that New England could have achieved such an extraordinary number of plays per fumble lost is extremely remote.

5. Therefore, it is “nearly impossible” for any team to possess such an ability to prevent fumbles … unless the team is cheating.
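The tail probability behind point 4 can be sketched in a few lines of Python, assuming, as the argument does, that the metric follows a bell curve across teams. The z value is a hypothetical input, not a figure from Sharp’s article. The second function adds a check the list leaves out: with 32 teams in the league, the chance that some team lands far out in the tail is much larger than the chance for one team named in advance.

```python
import math


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal Z: the chance that one average team,
    chosen in advance, lands at least z standard deviations above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def chance_some_team_reaches(z, n_teams=32):
    """Chance that at least one of n_teams independent average teams lands at
    least z standard deviations above the mean (bell-curve assumption)."""
    p_one = upper_tail_probability(z)
    return 1 - (1 - p_one) ** n_teams


# Hypothetical z-score, for illustration only.
z = 3.0
print(f"one team named in advance: {upper_tail_probability(z):.4f}")
print(f"some team among 32:        {chance_some_team_reaches(z):.4f}")
```

Both numbers answer the same narrow question: how unusual the result would be for an average team. Neither says why a particular team is unusual, and tail probabilities like these are very sensitive to the bell-curve assumption itself.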

Points 1 to 4 are essentially slightly different restatements of the known outcome. Only in point 5 is a connection drawn between that outcome and its cause(s), and the causal link is tenuous at best. However suggestive, the data do not prove intent or guilt. They simply describe a statistical phenomenon.

Indeed, a closer look at the Patriots data suggests that they may not be much of an outlier. In the “dome” analysis, Sharp switched from fumbles lost to total fumbles (which include recovered fumbles). Other football data analysts have concluded that fumble recovery is mostly random, so plays per total fumbles is the more useful metric: whether a loose ball ends up with the offense or the defense says little about how well a team protects the ball.
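Why recovery randomness matters can be seen in a small simulation. The Python sketch below gives every simulated team exactly the same underlying fumble rate and treats each recovery as a coin flip unrelated to skill, which is the assumption behind preferring total fumbles. All parameter values (1,000 plays per season, a 2 percent fumble rate per play, a 50 percent recovery chance) are hypothetical. Because the teams are identical, any spread between them is pure noise.

```python
import random
import statistics


def simulate_season(fumble_rate, plays, recovery_prob, rng):
    """One simulated team-season. Each play fumbles with probability
    fumble_rate; each fumble is recovered by the offense with probability
    recovery_prob, independent of the team (the "recovery is random" assumption).
    Returns (total_fumbles, fumbles_lost)."""
    total = sum(1 for _ in range(plays) if rng.random() < fumble_rate)
    lost = sum(1 for _ in range(total) if rng.random() >= recovery_prob)
    return total, lost


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, so counts of different size compare."""
    return statistics.pstdev(values) / statistics.mean(values)


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so the run is repeatable
    seasons = [simulate_season(0.02, 1000, 0.5, rng) for _ in range(2000)]
    totals = [total for total, _ in seasons]
    losses = [lost for _, lost in seasons]
    print(f"relative spread, total fumbles: {coefficient_of_variation(totals):.3f}")
    print(f"relative spread, fumbles lost:  {coefficient_of_variation(losses):.3f}")
```

With these settings the relative spread in fumbles lost comes out noticeably larger than in total fumbles, so rankings by fumbles lost exaggerate differences between teams that are, by construction, equally good at holding onto the ball.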

On this measure, the Patriots are not an outlier; they rank second to the Atlanta Falcons in fumble performance. Only by removing all dome teams (the Falcons among them) could Sharp argue that the Patriots were an outlier.

Sharp showed that it is almost impossible for an average team to attain such a low fumble rate, but we have no data proving that the Patriots, or any particular team, couldn’t achieve it legally. In fact, the dome analysis suggests there are legitimate ways to perform as well as or slightly better than the Patriots did: just look at the Falcons, unless you want to allege that the Falcons also tampered with footballs. (Others have since refuted this fumbles-prove-malicious-behavior narrative and corrected what seems to be a major flaw in Sharp’s approach: eliminating dome teams from the analysis instead of dome games. When that change is made, the Patriots perform well, but not strangely well; they are not even the best.)
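The difference between the two filters is easy to blur in prose. Below is a minimal Python sketch using a hypothetical game-level record (the field names and the two toy teams are invented for illustration, not taken from any real data set). Dropping dome teams discards a dome team’s outdoor road games but keeps a non-dome team’s indoor road games, so some of the indoor advantage leaks back into the supposedly outdoor sample. Dropping dome games removes indoor play for everyone.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    team: str
    home_is_dome: bool  # does this team play its home games indoors?
    game_in_dome: bool  # was this particular game played indoors?
    plays: int
    fumbles: int


def plays_per_fumble(games):
    plays = sum(g.plays for g in games)
    fumbles = sum(g.fumbles for g in games)
    return plays / fumbles if fumbles else float("inf")


def drop_dome_teams(games):
    """Remove every record for a dome team, wherever the game was played."""
    return [g for g in games if not g.home_is_dome]


def drop_dome_games(games):
    """Keep every team, but remove the games actually played indoors."""
    return [g for g in games if not g.game_in_dome]


def rate_by_team(games):
    teams = sorted({g.team for g in games})
    return {t: plays_per_fumble([g for g in games if g.team == t]) for t in teams}


# Toy records: each row is one team's side of one game.
games = [
    GameRecord("Outdoor FC", home_is_dome=False, game_in_dome=False, plays=60, fumbles=1),
    GameRecord("Outdoor FC", home_is_dome=False, game_in_dome=True, plays=65, fumbles=0),
    GameRecord("Dome City", home_is_dome=True, game_in_dome=True, plays=62, fumbles=1),
    GameRecord("Dome City", home_is_dome=True, game_in_dome=False, plays=58, fumbles=2),
]
print(rate_by_team(drop_dome_teams(games)))  # Outdoor FC keeps its indoor road game
print(rate_by_team(drop_dome_games(games)))  # both teams, outdoor games only
```

In this toy example, the outdoor team’s rate more than doubles under the team filter only because it keeps a fumble-free indoor game.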

To his credit, Sharp did not argue point 5. Nevertheless, many readers and incurious reporters made this causal leap, and Sharp helped them along by using the loaded phrase “nearly impossible” to sell the story.

And that’s the reverse causation problem we face. Big data is exposing all kinds of outliers and trends we hadn’t seen before, and we’re assigning causes somewhat recklessly because it makes a good story or helps confirm our biases. You see this all the time in your Twitter stream: “7 Charts that Explain This,” or “The One Chart that Tells You Why Something Is Happening.” We’re getting better and better at analyzing and visualizing big data to spot coincidences, outliers and trends. At the same time, it’s getting easier and easier to convince ourselves of specific narratives without any real data to support them.

Most good statistical analysis will be narratively unsatisfying, loaded down with “we don’t know,” “it depends,” and “the data can’t prove that.”

You can see how this can become a big problem for companies wanting to exploit the big data they’re amassing. Many practical data problems concern reverse causation. Sales of a particular product suddenly plunged; what caused it? The number of measles cases spiked in a neighborhood; how did it happen? People with a certain brand of phone tend to shop at certain stores; why is that? In cases like these, we know the outcome but often don’t know the cause.

The possibility of any number of causes tempts us to retrofit a narrative, but we must resist. The astute analyst is one who figures out how to bring a manageable structure to this work. See this post by statistician Andrew Gelman for further thoughts.

In the meantime, maintain a healthy skepticism the next time someone suggests they’ve found causation in reverse. Their claims may be overblown.

(Editor’s Note: This article is an edited version of a post that originally appeared on the author’s blog.)