The Pilot

In November 2014, five Elsevier journals agreed to take part in the Publication of Peer Review reports as articles (hereafter, PPR) pilot. During the pilot, these five journals openly published typeset peer review reports, each with a separate DOI, fully citable and linked to the published article on ScienceDirect. Review reports were made freely available regardless of the journal’s subscription model (two of these journals were open access, while three were published under the subscription-based model). For each accepted article, the review reports from all revision rounds were concatenated under the first round for each referee, with all content published as a single review report. Different sections were used in cases of multiple revision rounds. For the sake of simplicity, once they had agreed to review, referees were not given any opt-out choice and were asked to give their consent to reveal their identity. In agreement with all journal editors, a text was added to the invitation letter to inform referees about the PPR pilot and their options. At the same time, authors themselves were fully informed about the PPR pilot when they submitted their manuscripts. Note that while one of these journals had started the pilot earlier, in 2012, the pilot ended in 2017 for all journals (further details in the SI).

Figure 1 shows the overall submission trend in these five journals during the period considered in this study. We found a general upward trend in the number of submissions, although this probably did not reflect specific trends due to the pilot (see details in the SI file).

Fig. 1 Number of monthly submissions in the pilot journals

Following previous studies (ref. 18), and in order to increase the coherence of our analysis, we only considered the first round of review, i.e., 85% of observations in our dataset. By observation, we mean any relevant event or activity recorded in the journal database, e.g., the day a referee responded to the invitation or the recommendation he/she provided (see Methods).

Willingness to review

We found that only 22,488 (35.8%) of invited referees eventually agreed to review, with a noticeable difference between the periods before and after the beginning of the pilot (43.6% vs. 30.9%). However, it is worth noting that while the acceptance rate varied significantly among journals, there was an overall declining trend, possibly starting before the beginning of the pilot (Fig. 2).

Fig. 2 Proportion of referees who accepted the editors’ invitation by journal. Thicker curves show smoothed fitting of the data (Loess) for each journal. The last 6 months were removed from the figure due to few observations

Descriptive statistics also highlighted certain changes in referee profile. Senior academics (professors) agreed to review less often during the pilot, whereas younger scholars, with or without a Ph.D. degree, were keener to review. We did not find any relevant gender effect (Fig. 3).

Fig. 3 Gender and status distribution of referees by review condition. Error bars represent 95% CI obtained via bootstrap (1000 samples)
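As an illustration of how bootstrap error bars of this kind can be obtained, the following is a minimal sketch of a percentile bootstrap for a proportion with 1000 resamples. The percentile method, the function name `bootstrap_ci` and the example data are assumptions made for illustration; the text does not state which bootstrap variant was used.

```python
import random


def bootstrap_ci(outcomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 outcomes (a proportion).

    Assumption: a plain percentile bootstrap; other variants exist.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_boot):
        # Resample n observations with replacement and record the proportion.
        resample = rng.choices(outcomes, k=n)
        stats.append(sum(resample) / n)
    stats.sort()
    lower_index = int(round((alpha / 2) * n_boot))
    upper_index = int(round((1 - alpha / 2) * n_boot)) - 1
    return stats[lower_index], stats[upper_index]


# Hypothetical data: 1 = referee accepted the invitation, 0 = declined.
accepted = [1] * 36 + [0] * 64
low, high = bootstrap_ci(accepted)
print(f"proportion = {sum(accepted) / len(accepted):.2f}, "
      f"95% CI = [{low:.2f}, {high:.2f}]")
```

The fixed seed only makes the sketch reproducible; in practice the interval varies slightly from run to run.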

At first sight, fewer of the invited referees agreed to review during the pilot. However, considering that the number of review invitations increased over time, this may have simply reflected the larger number of editorial requests. To control for these possible confounding factors, we estimated a mixed-effects logistic model with referees’ acceptance of the editors’ invitation as the outcome. To account for repeated observations on the same paper and the across-journal nature of the dataset, we also included random effects for both the individual submission and the journal. Besides the open review dummy, we estimated fixed effects for the year (with the start date of the dataset coded as zero and each subsequent year as increasing integers), the referee’s declared status (with “professor”, “doctor” and “other” as levels), and the referee’s gender (with three levels: “female”, “male” and “uncertain”, the last for cases in which our text mining algorithm did not assign a specific gender). The year variable allowed us to control for any underlying trend in the data, such as the increased number of submissions and reviews, or the increased referee pool. Furthermore, to check whether the open review condition had a different effect on specific sub-groups of referees, we estimated fixed effects for the interaction between this variable and the status and gender of referees (Table 1).
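One way to write down the model described above is sketched here. The notation is ours and introduced only for illustration: $y_{isj}$ equals 1 if referee $i$ accepted the invitation for submission $s$ in journal $j$; $\mathbf{Status}_i$ and $\mathbf{Gender}_i$ are dummy-coded vectors; and the random intercepts $u_s$ and $v_j$ are assumed to be normally distributed, which is the usual default but is not stated in the text.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(y_{isj} = 1) ={}& \beta_0
  + \beta_1\,\mathrm{Open}_{isj}
  + \beta_2\,\mathrm{Year}_{isj}
  + \boldsymbol{\beta}_3^{\top}\mathbf{Status}_{i}
  + \boldsymbol{\beta}_4^{\top}\mathbf{Gender}_{i} \\
 &+ \boldsymbol{\beta}_5^{\top}\bigl(\mathrm{Open}_{isj}\times\mathbf{Status}_{i}\bigr)
  + \boldsymbol{\beta}_6^{\top}\bigl(\mathrm{Open}_{isj}\times\mathbf{Gender}_{i}\bigr)
  + u_{s} + v_{j}, \\
u_{s} \sim{}& \mathcal{N}(0, \sigma_u^2), \qquad v_{j} \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

In this form, $\beta_1$ captures the pure effect of the open review condition, $\beta_2$ the underlying time trend, and $\boldsymbol{\beta}_5$ and $\boldsymbol{\beta}_6$ the sub-group differences in that effect.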

Table 1 Mixed-effects logistic model on the acceptance of editors’ invitation by referees

Results suggest that the apparent decline in the acceptance of review invitations simply reflected a time trend, which was independent of the open review condition and probably due to the increasing number of submissions and requests. The pure effect of the open review condition was not statistically significant. Furthermore, although several referee characteristics had an effect on the willingness to review, only the interaction effect with the “other” status was significant. Referees without a professorial title or a doctoral degree, and so probably younger or non-academic, were actually keener to review during the pilot. In addition, by comparing the pilot journals with a sample of five comparable Elsevier journals, we found that the decline in willingness to review was neither journal-specific nor trial-induced, i.e., influenced by open peer review (see Supplementary Tables 1–3 and Supplementary Figure 1).

Recommendations

The distribution of recommendations changed slightly during the pilot, with more frequent rejections and major revisions (Fig. 4). On the other hand, the distribution of recommendations by referees who agreed to have their names published with the report was noticeably different, with many more positive recommendations. Given that revealing identity was a decision made by referees themselves after completing their review, it is probable that these differences in recommendations reflect a self-selection process: referees who wrote more positive reviews were keener to reveal their identity later as a reputational signal to authors and the community. However, it is worth noting that only a small minority of referees (about 8.1%) agreed to have their names published together with their report.

Fig. 4 Proportion of recommendations by review condition and name disclosure. Error bars represent 95% CI obtained via bootstrap (1000 samples)

In order to control for time trends and journal characteristics, we estimated another model, including the open review dummy and all relevant interaction effects. As the outcome was an ordinal variable with four levels (reject, major revisions, minor revisions, accept), we estimated a mixed-effects cumulative-link model including the same random and fixed effects as the previous model. Table 2 shows that the pilot did not bias recommendations. Among the various referee characteristics, only referee status had a significant interaction effect, with younger and non-academic referees (i.e., the “other” group) submitting, on average, more positive recommendations. Note that these results were confirmed by our robustness check with five comparable Elsevier journals not involved in the pilot (Supplementary Table 2).
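To make the structure of a cumulative-link model concrete, the sketch below turns a linear predictor into probabilities for the four recommendation levels. It assumes a logit link, which the text does not specify, and the thresholds and predictor values are hypothetical numbers chosen for illustration, not estimates from Table 2.

```python
import math

LEVELS = ["reject", "major revisions", "minor revisions", "accept"]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """Category probabilities under a logit cumulative-link model.

    Uses P(Y <= k) = logistic(theta_k - eta), where thresholds must be
    increasing and eta is the linear predictor (fixed plus random effects).
    """
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        # P(Y = k) is the difference between adjacent cumulative probabilities.
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical thresholds and linear predictors (illustration only).
thresholds = [-1.0, 0.5, 2.0]
baseline = category_probabilities(0.0, thresholds)
shifted = category_probabilities(0.4, thresholds)

for level, p_base, p_shift in zip(LEVELS, baseline, shifted):
    print(f"{level:16s} {p_base:.3f} {p_shift:.3f}")
```

With this parameterisation, a positive coefficient raises the linear predictor and moves probability mass towards the more favourable recommendations, which is how a "more positive" effect should be read.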

Table 2 Mixed-effects cumulative-link model on referee recommendations

Review time

We analysed the number of days referees took to submit their report before and after the beginning of the pilot. Previous research suggests that open peer review could increase review time, as referees could be inclined to write their reports in more structured and correct language, given that they are eventually published (ref. 8). The average time referees took to complete their reports increased from 28.2 ± 4.6 days before the pilot to 30.4 ± 4.4 days during it. However, after estimating models that considered the increasing number of observations over time, we did not find any significant effect on turnaround time (see Table 3). When considering interaction effects, we only found that referees with a doctoral degree tended to take more time to complete their report, but differences were minimal. Note that these results were further confirmed by analysing five comparable Elsevier journals not involved in the pilot (Supplementary Table 3).

Table 3 Mixed-effects linear model on the time (days) used by the referees to complete the review

Review reports

In order to examine whether the linguistic style of reports changed during the pilot, we performed a sentiment analysis on the text of reports by considering polarity—i.e., whether the tone of the report was mainly negative or positive (varying in the [−1, 1] interval, with larger numbers indicating a more positive tone)—and subjectivity—i.e., whether the style used in the reports was predominantly objective ([0, 1] interval, higher numbers indicating more subjective reports). A graphical analysis showed only minimal differences before and during the pilot, with reviews only slightly more severe and objective in the open peer review condition (Fig. 5).
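The two indexes can be illustrated with a deliberately simple lexicon-based scorer. This is a toy sketch, not the tool used in the study: the lexicon, its scores and the function name `score_report` are hypothetical, and real sentiment analysers use much larger lexicons and handle negation and intensifiers.

```python
import re

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "excellent": (0.9, 1.0),
    "interesting": (0.5, 0.6),
    "clear": (0.4, 0.5),
    "unclear": (-0.4, 0.5),
    "weak": (-0.5, 0.7),
    "flawed": (-0.7, 0.8),
}


def score_report(text):
    """Return (polarity, subjectivity) averaged over lexicon words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[word] for word in words if word in LEXICON]
    if not hits:
        # No scored words: treat the text as neutral and fully objective.
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity


polarity, subjectivity = score_report(
    "The design is flawed and the discussion is unclear."
)
print(f"polarity = {polarity:.2f}, subjectivity = {subjectivity:.2f}")
```

Because both scores are averages of bounded word-level values, polarity stays within [−1, 1] and subjectivity within [0, 1], matching the intervals described above.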

Fig. 5 Distribution of polarity and subjectivity in the report text before and during the pilot. Note that for polarity, the interval was [−1, 1], larger numbers indicating a more positive tone, while for subjectivity the interval was [0, 1], higher numbers indicating more subjective reports

Two mixed-effects models were estimated using the polarity and subjectivity indexes as outcomes. The pilot dummy, the recommendation, the (log) number of characters of the report, the year, and the gender and status of the referees (plus interactions) were included as fixed effects. As before, the submission and journal IDs were used as random effects. Table 4 shows that the pure effect of open review was not significant. However, we found a positive and significant interaction effect with gender. Indeed, male referees tended to write more positive reports under the open review condition, although this effect was statistically significant only at the 5% level. Considering the large number of observations in our dataset, any inference about open peer review effects from such a significance level should be made cautiously (ref. 19).

Table 4 Mixed-effects linear model on the polarity of review reports

When testing a similar model on subjectivity, we only found that younger and non-academic referees were more objective, whereas no significant effect was found for other categories (Table 5).