Does Correlation Imply Causation After All?

Gregory Hill, Monash University

Extended Abstract

One of the most persistent untested maxims in the statistical sciences is "correlation does not imply causation". This adage is taught widely to students and practitioners to guard against the logical fallacy of cum hoc ergo propter hoc ("with this, therefore because of this"). In short, it holds that a simple statistical correlation between factors is not sufficient to infer that a causal relationship exists. On this view, such a result may be interesting, but at best it indicates that further research is warranted before causality can be claimed. As Edward Tufte puts it, "Correlation is not causation but it sure is a hint."

This study examines this maxim through a rigorous meta-analysis of scientific articles published in Science, Nature and other leading journals from 1970 to 2005 (see Appendix A). Articles were included in the meta-analysis if they reported both an estimate of correlation (in the form of the Pearson product-moment correlation coefficient) and a finding on causation (as determined by the original researchers). Thresholds for statistical significance levels (alphas) and p-values were not used, as publication in these prestigious journals is sufficient to ensure the quality of results.

From the 7,234 articles in the catchment, 655 were selected for this study. Each article was classified as C (causation was found) or N (causation not found). The articles were then "binned" into twenty uniform intervals based on their Pearson's rho statistic: 0.00-0.05, 0.05-0.10, ..., 0.95-1.00. The following results were obtained, where rho is the independent variable and the proportion of articles finding causation, C/(C+N), is the dependent variable.