“Researcher degrees of freedom” is an umbrella term for all the data-processing and analytical choices researchers make after seeing the data. It’s something I and many others are increasingly worried about. Insofar as your analytical choices–which data to include, which hypotheses to test, which predictor variables to include, etc.–depend on how the data happened to come out, you run a serious risk of compromising the severity of your statistical inferences (e.g., by inflating your Type I error rate). Not always, of course. For instance, it’s legit to check whether your residuals conform to the assumptions of the analysis and, if they don’t, to transform the data or otherwise modify the analysis to improve the residuals (not to “improve” the P value!). But increasing evidence suggests that researcher degrees of freedom compromise our analyses more often and more seriously than many researchers care to admit.
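To make the Type I error point concrete, here’s a minimal Python sketch (standard library only) under some loudly simplifying assumptions of my own: two groups drawn from the same normal distribution, so the null hypothesis is true; a z-test with known standard deviation; and a hypothetical “site type” split that just divides each group in half. The pre-specified analysis tests the full data. The flexible analyst also tests each subgroup separately and counts a “result” if any of the three tests is significant. All names and numbers are illustrative, not from any real study.

```python
import math
import random


def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD (sigma)."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = (sum(x) / len(x) - sum(y) / len(y)) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=10000, n=40, alpha=0.05, seed=1):
    """Return (pre-specified Type I error rate, flexible Type I error rate)."""
    rng = random.Random(seed)
    prespecified = 0
    flexible = 0
    half = n // 2
    for _ in range(n_sims):
        # The null hypothesis is true: both groups come from N(0, 1).
        ctrl = [rng.gauss(0, 1) for _ in range(n)]
        trt = [rng.gauss(0, 1) for _ in range(n)]
        p_values = [
            z_test_p(ctrl, trt),                # the pre-specified analysis
            z_test_p(ctrl[:half], trt[:half]),  # hypothetical subgroup A only
            z_test_p(ctrl[half:], trt[half:]),  # hypothetical subgroup B only
        ]
        prespecified += p_values[0] < alpha
        # Flexible analyst: counts a "result" if any of the three tests is significant.
        flexible += min(p_values) < alpha
    return prespecified / n_sims, flexible / n_sims


if __name__ == "__main__":
    print(simulate())
```

The pre-specified rate should come out close to the nominal 0.05, while the flexible rate should come out noticeably higher, even though each individual test is perfectly valid on its own. The inflation comes from choosing among analyses, not from any one analysis being wrong.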

Recently, I had a rather scary thought: what if peer review is part of the problem here?

Think about it. What do reviewers often do, after having had a look at your results? They suggest alternative ways to process and analyze your data. They ask you to include or exclude certain data, because those desert sites are “obviously” going to be different or that data point is “clearly” an outlier or whatever. They suggest that you test your hypothesis using different analyses. They suggest that you analyze each subgroup of your data separately because there might be heterogeneity among subgroups. They question whether your result is mainly due to one or a few influential data points and so ask you to redo the analysis with those points excluded. They notice an apparent pattern in your data that you didn’t discuss, and ask you to test whether it’s significant. They ask you to include an additional predictor variable in your analysis, because from eyeballing Fig. X it looks like that variable might matter. Etc. etc. Probably most people who’ve written an ecology paper have gotten comments like this and not seen anything problematic about them. I mean, you might or might not agree with the comment, but you probably don’t consider it problematic to get this sort of comment. I certainly haven’t considered such comments problematic, until recently. And probably many of you have made such comments when acting as reviewers; I certainly have.

But aren’t these sorts of comments statistically problematic? Insofar as analytical decisions that aren’t pre-specified compromise your analyses, it doesn’t matter whether those non-pre-specified decisions are made by you or a reviewer. And crucially, reviewers don’t ordinarily ask that you respond to their analytical suggestions by collecting new data, planning in advance to analyze that new data as they’ve suggested. Rather, they suggest that you implement their analytical suggestions on the data you already have–the very same data that often inspired their suggestions in the first place.
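The problem with reusing the data that inspired a suggestion can also be sketched in Python, again under simplifying assumptions that are mine rather than anything from a real study: four hypothetical subgroups, no true treatment effect in any of them, and a known-SD z-test. A hypothetical “reviewer” looks at the data, picks the subgroup with the largest apparent effect, and asks for it to be tested. The sketch compares testing that subgroup on the same data against testing it on newly collected data.

```python
import math
import random


def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD (sigma)."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = (sum(x) / len(x) - sum(y) / len(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def draw_subgroups(rng, k, n):
    """k hypothetical subgroups, each with n control and n treatment values; null is true."""
    return [([rng.gauss(0, 1) for _ in range(n)],
             [rng.gauss(0, 1) for _ in range(n)]) for _ in range(k)]


def simulate(n_sims=10000, k=4, n=20, alpha=0.05, seed=2):
    """Return (Type I error on the same data, Type I error on fresh data)."""
    rng = random.Random(seed)
    same_data_hits = 0
    new_data_hits = 0
    for _ in range(n_sims):
        data = draw_subgroups(rng, k, n)
        # The "reviewer" eyeballs the figure and picks the subgroup with the
        # largest apparent treatment effect.
        diffs = [abs(sum(t) / n - sum(c) / n) for c, t in data]
        pick = diffs.index(max(diffs))
        # Option 1: test that subgroup on the same data that suggested it.
        c, t = data[pick]
        same_data_hits += z_test_p(c, t) < alpha
        # Option 2: test that subgroup on freshly collected data.
        c_new, t_new = draw_subgroups(rng, 1, n)[0]
        new_data_hits += z_test_p(c_new, t_new) < alpha
    return same_data_hits / n_sims, new_data_hits / n_sims


if __name__ == "__main__":
    print(simulate())
```

With four independent, equal-sized subgroups, the same-data rate should land near 1 − 0.95^4 ≈ 0.19, while the fresh-data rate should stay near the nominal 0.05. Nothing about the reviewer’s suggestion is unreasonable in itself; what inflates the error rate is testing it on the data that prompted it.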

I emphasize that I am not trying to concern troll here. I’m a big fan of pre-publication peer review; it continues to be a great thing for my own papers and for science as a whole. Seriously, this post is NOT an attack on pre-publication peer review! I also emphasize that it’s perfectly natural for reviewers to think about alternative analyses and suggest them to the author. That’s what you do after seeing someone’s results and thinking about them. I’m not questioning the statistical competence of reviewers who make these sorts of analytical suggestions (as I said, I’ve made such suggestions myself as a reviewer). Finally, I emphasize that some statistical suggestions reviewers make do not compromise statistical validity if followed by the authors. I am not saying that reviewers can’t ever legitimately question anything about the authors’ statistics after having read the paper! For instance, if the author made a flat-out statistical mistake, like treating a nested design as a factorial design, it’s totally legit for a reviewer to point that out, and for the author to redo the analysis correctly. If the residuals aren’t distributed in anything like the way assumed by the analysis, it’s totally legit for a reviewer to point that out and for the author to redo the analysis so as to fix the residuals. Etc. And there might even be times when it’s worth somewhat compromising statistical rigor at the request of a reviewer for the sake of some larger scientific goal. All I’m saying is that, if making data-dependent analytical choices can compromise one’s statistics–and it often does–then why should it matter if those data-dependent analytical choices are made by reviewers as opposed to authors?

If these sorts of reviewer comments are statistically problematic, I think that’s another argument for disclosure requirements: requiring researchers to disclose all the data processing and all the analyses they did, including analyses that got “left on the cutting room floor” (e.g., exploratory analyses), and including all analyses performed at the request of reviewers.

I guess another option would be for authors to respond to reviewer requests for alternative analyses by saying something like “The reviewer makes an interesting suggestion. However, because the suggestion was not pre-specified, we are unable to pursue it in a statistically rigorous way using the data reported in the ms.” That response would probably be best justified if the authors had pre-registered their study design and planned statistical analyses. Indeed, if I understand correctly (please correct me if I’m wrong), that’s more or less how the authors of drug trials would be entitled to respond if a reviewer asked them to deviate from their pre-specified analytical plans (see here for background). I believe subatomic physics (e.g., the LHC folks who discovered the Higgs boson) is another field where data processing and analytical decisions are entirely pre-specified and aren’t ordinarily changed at the request of reviewers (again, please correct me if I’m wrong on this).

What do you think? Is peer review an important source of “researcher degrees of freedom”? If so, what if anything should be done about it?