We previously conducted two trials [15, 16] that found partially positive results from adding statistical reviewers and RGs to the peer review process. The first, the 2007 “Improve Quality” (IQ) study [15], randomly allocated 129 suitable manuscripts to 4 intervention groups (Fig. 1a). Unfortunately, after peer review, 16 manuscripts were rejected and 14 were lost to follow-up; those losses introduced unpredictable attrition bias [22, 23] and may have affected the estimates.

Fig. 1 Scheme of the allocation of interventions in the IQ and ET studies. Groups not included in the main analysis are shaded. R = reference; C = checklist; S = statistician; SC = both checklist and statistician

The second trial was the 2011 “Enhance Transparency” (ET) study [16], in which we randomized 92 manuscripts either to both a statistical review and RGs or to neither (Fig. 1b). In both the IQ and ET studies, the main outcome was an assessed rather than a measured endpoint. Because masked evaluators were able to guess the intervention arm more often than chance would allow, this partial unblinding could have introduced detection bias in both studies [8].

Due to these limitations, and in order to assess the long-term impact of those interventions, we adopted a new main outcome: the number of citations that each paper received in the Web of Science (WoS) from publication up to December 31, 2016. Our hypothesis was that greater transparency and more comprehensible reporting may facilitate an increase in citations.

The IQ study divided the papers into 4 groups by combining two interventions in a 2 × 2 factorial design: a suggestion to the reviewers to employ an abridged checklist for the evaluation of basic biomedical research papers (C) [24]; and the addition of a statistician (S) from the reviewer panel list. The 4 groups were therefore: papers receiving the standard review process (reference); papers receiving a review using the local checklist (C); papers receiving the standard review plus a statistician’s review (S); and papers receiving a review with both the checklist and a statistician’s review (SC). The reference intervention followed the usual journal process based on 1–3 reviewers. To combine these results with those of the ET study, only the 24 papers allocated to the group with both interventions (SC) and the 27 allocated to the reference group (neither C nor S) were included in the main analysis.
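The 2 × 2 factorial allocation described above can be sketched as follows. This is an illustrative reconstruction, not the study's actual randomization procedure: the function name, seed, and equal 50/50 allocation probabilities are assumptions for the sketch.

```python
import random

def allocate(manuscript_ids, seed=42):
    """Hypothetical 2 x 2 factorial allocation: each manuscript is
    independently randomized to the checklist (C) and statistician (S)
    interventions, yielding the four arms reference, C, S, and SC."""
    rng = random.Random(seed)
    arms = {"reference": [], "C": [], "S": [], "SC": []}
    for m in manuscript_ids:
        use_checklist = rng.random() < 0.5    # C intervention
        add_statistician = rng.random() < 0.5  # S intervention
        if use_checklist and add_statistician:
            arm = "SC"
        elif use_checklist:
            arm = "C"
        elif add_statistician:
            arm = "S"
        else:
            arm = "reference"
        arms[arm].append(m)
    return arms

# 129 manuscripts, as in the IQ study
groups = allocate(range(129))
```

Under this design, the main analysis described above compares only the two "diagonal" cells of the factorial: `groups["SC"]` versus `groups["reference"]`.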

The ET study modified this design in 3 ways: first, it relied on a single senior methodological expert rather than choosing a statistical reviewer from an expert list; second, it combined both interventions, with the senior methodological reviewer proposing specific changes based on relevant international reporting guidelines; and third, it avoided attrition by delaying the intervention until the decision on whether or not to publish had been made.

Masked to the intervention group, one of us (MV) collected from WoS the number of citations that the ET and IQ articles had received. The search used the website’s search tab and 3 criteria: (1) the publication name, “Medicina Clinica (Barcelona)”; (2) the publication year (either 2004 to 2005 or 2009 to 2010); and (3) either the article’s title or a topic search, the latter to account for changes to the title between the submitted and finally published versions. Baseline MQAI and study group were obtained from the data of the ET and IQ studies.

We aimed to estimate the ratio of the average citations per year between intervention arms (referred to in this paper as the “mean citation ratio”). As the data did not fit the distributional assumptions of the Poisson model we had prespecified while masked, our main analysis relies on the more robust jackknife method, which provides wider, more conservative intervals. As sensitivity analyses, we also report alternative analyses such as the aforementioned Poisson model (Sections 2 to 4 of SM).
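A delete-one jackknife for the mean citation ratio can be sketched as below. This is a minimal illustration of the general technique, not the authors' code: the citations-per-year values are hypothetical, and the normal-approximation confidence interval is one common choice for turning the jackknife standard error into an interval.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def citation_ratio(intervention, reference):
    """Ratio of average citations per year: intervention arm / reference arm."""
    return mean(intervention) / mean(reference)

def jackknife_ci(intervention, reference, z=1.96):
    """Delete-one jackknife: recompute the ratio leaving out one paper
    at a time, then use the spread of those leave-one-out estimates
    to build a normal-approximation confidence interval."""
    theta_hat = citation_ratio(intervention, reference)
    pseudo = []
    for i in range(len(intervention)):
        loo = intervention[:i] + intervention[i + 1:]
        pseudo.append(citation_ratio(loo, reference))
    for j in range(len(reference)):
        loo = reference[:j] + reference[j + 1:]
        pseudo.append(citation_ratio(intervention, loo))
    n = len(intervention) + len(reference)
    theta_bar = mean(pseudo)
    se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in pseudo))
    return theta_hat, (theta_hat - z * se, theta_hat + z * se)

# Hypothetical citations-per-year values per paper in each arm:
arm_sc = [2.1, 0.8, 3.4, 1.2, 0.5, 2.7]
arm_ref = [1.0, 0.9, 2.2, 0.4, 1.5, 0.7]
ratio, (lo, hi) = jackknife_ci(arm_sc, arm_ref)
```

Because the jackknife makes no distributional assumption about the citation counts, its intervals are typically wider than those of a Poisson model, which is the conservative behavior noted above.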

Additional collected variables are described in Section 1 of SM. Section 6 of SM and the master’s thesis of the first author [25] show the results of other exploratory data analyses that were previously performed with shorter follow-up.

Analyses were performed using R software version 3.2.1.

Availability of data and materials

The dataset supporting the conclusions of this article is available at https://www-eio.upc.edu/redir/NumberCitations, where researchers can: (1) reproduce the results of our analysis; (2) check our data at the Web of Science [19] as of December 2016; and (3) update the number of citations in order to replicate our results with a longer follow-up. Critical scientists can thus try to reproduce both our outcome measurements and our analyses.