Controlled experiments commonly find that the performance of women is scored more negatively than that of men when no actual difference exists. However, the extent to which such gender biases influence editors and peer reviewers remains uncertain. Despite a few high‐profile examples, most studies find no gender difference in the outcomes of peer review at academic journals, though there are some notable exceptions. In our study, we find that papers with female first authors are equally likely to be sent for peer review as are papers with male first authors, but they obtain slightly lower peer‐review scores and are less likely to have a positive outcome after peer review, though the magnitude of this gender difference varied among journals. The gender differences in both peer‐review scores and editorial decisions appear to be partly due to gender differences in authorial roles. Papers for which the first author deferred corresponding authorship to a coauthor obtained (on average) substantially lower peer‐review scores and were less likely to have positive outcomes. Gender differences in corresponding authorship explained some of the gender differences in peer‐review scores and the frequency of positive editorial decisions. After publication, we also find that published papers with female first, last, or single authors are cited less often than those with male authors.

4.1 Gender differences in peer‐review outcomes

Our analyses uncover differences in editorial and peer‐review outcomes between papers authored by men and those authored by women. Though many of our individual analyses found no significant gender differences, the effects are consistently in the same direction: Papers with female authors obtain lower peer‐review scores and have lower probabilities of positive editorial decisions than do papers with male authors. Effect sizes varied across stages of the process and across journals but, cumulatively from submission through the editorial decision, papers with female authors were, on average, 4%–9% less likely (depending on author position; Table 1) to be invited for revision and/or resubmission than were papers with male authors (female:male success ratios of 0.96 to 0.91, averaged across journals and years).

Table 1. The cumulative disparity in relative success rates for papers authored by women compared to men. Values are female:male relative probabilities of a positive outcome, cumulative through the entire review process.

                          Revision invited    Revision or resubmission invited
  First author            0.925 ± 0.045       0.958 ± 0.020
  Senior author           0.948 ± 0.033       0.905 ± 0.026
  Corresponding author    0.914 ± 0.039       0.963 ± 0.026
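The cumulative ratios in Table 1 combine outcomes across sequential stages of review. A minimal sketch of the underlying arithmetic, using hypothetical per-stage values (the table reports only cumulative ratios, so the stage values below are illustrative assumptions, not the study's estimates):

```python
# Cumulative female:male success ratio as the product of per-stage
# relative probabilities. Stage values here are hypothetical, chosen
# only to illustrate the arithmetic.

def cumulative_ratio(stage_ratios):
    """Multiply per-stage female:male relative probabilities together."""
    result = 1.0
    for ratio in stage_ratios:
        result *= ratio
    return result

# E.g., equal odds of being sent for review (1.00), followed by a 7.5%
# lower relative probability of a positive decision after review (0.925):
print(cumulative_ratio([1.00, 0.925]))  # 0.925
```

This multiplicative view is why a modest disadvantage at individual stages can compound into cumulative ratios in the 0.91–0.96 range reported above.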

Our conclusion, that papers authored by women are less likely to have positive outcomes, contrasts with the conclusions of many previous studies of peer review at academic journals, albeit with some exceptions (summarized in the Introduction, above). Though most studies conclude that men and women have equal success rates at journals, many of these studies observe trends toward papers with male authors being more likely to be accepted for publication (e.g., 7%–12% more likely in Heckenberg & Druml, 2010; Primack et al., 2009), consistent with the pattern we report here. Indeed, a previous study of the journal Functional Ecology (Fox et al., 2016), one of the journals included in the current study, observed trends similar to those reported here, though none were statistically significant.

We draw two important conclusions from this variation among research studies. First, the presence and magnitude of gender differences almost certainly vary among disciplines and journals. Second, very large sample sizes are necessary to detect small but meaningful (e.g., 5%–10%) gender differences in peer‐review outcomes, because of the substantial background variation due to heterogeneity in manuscript quality and in editor and reviewer populations. The large sample size of the current study, >23,000 papers submitted to six journals, provides the statistical power necessary to detect gender differences in the range of 5%–10%. It is notable that the previous studies providing the most compelling evidence of gender differences in peer review are of similarly large scale. For example, of >23,000 papers submitted to eLife, those authored by women were ~12% less likely to be accepted for publication than those authored by men (Murray et al., 2018). Similarly, of >8,500 manuscripts submitted to three Frontiers journals (Walker et al., 2015), papers authored by women obtained lower peer‐review scores than papers authored by men. However, at least one large study found the opposite; in an analysis of >22,000 papers submitted to journals of the American Geophysical Union, Lerback and Hanson (2017) found that papers authored by women had ~7% higher acceptance rates. It is thus clear that gender discrepancies vary considerably among journals, both within and among studies, and that large sample sizes are necessary to detect these differences when they exist.

What explains the discrepancy in success rates between men and women in our study? One possibility is that reviewers and/or editors discriminate against papers by female authors during their assessments of manuscript quality, novelty, or significance. Biases in which the performance or products of women are evaluated less positively than those of men have been demonstrated in a wide variety of contexts (discussed above). Unfortunately, our data do not allow us to directly test for unconscious or conscious biases because we have no independent metrics of manuscript quality and significance. Explanations other than gender discrimination could contribute to the gender disparities observed here. For example, women defer submission of their manuscripts to collaborators more often than do men and might use different criteria for evaluating the journals to which they send their papers (Regazzi & Aytac, 2008), such that submitted papers are, on average, slightly different between male and female authors. Though we cannot test these hypotheses, the importance of considering alternatives to gender discrimination is highlighted by Ledin et al. (2007). They observed that gender differences in success rate at obtaining fellowships from the European Molecular Biology Organization persisted when committees were blinded to applicant gender. Though not directly comparable to our study, in part because fellowship review assesses applicant productivity rather than manuscript quality, the results of Ledin et al. (2007) highlight that gender differences in success rates can arise from factors other than discrimination (but see Witteman, Hendricks, Straus, & Tannenbaum, 2019 for a counterexample). Our results are highly suggestive of a problem, but hypotheses to explain the gender discrepancies observed here can only be tested with controlled experiments.
In particular, we argue that a controlled experiment, in which real journal submissions are randomly assigned to blind versus nonblind peer review, should be performed by one or more ecology journals to test for gender discrimination (and other potential biases) in editorial and peer review. Such an experiment has recently been announced by the journal Functional Ecology, one of the journals considered in our study (Fox et al., 2019). Similar experiments have been performed by nonecological journals, but few (Blank, 1991; Carlsson et al., 2012; Ross et al., 2006) have tested for evidence of gender discrimination in nonblinded manuscripts.

One striking result of our analyses is that papers for which the first author is also the corresponding author perform much better throughout all stages of the manuscript review process. Such papers were 18% more likely to be sent for peer review, obtained higher scores from reviewers, and were 10% more likely to be invited for revision or resubmission after review, with a cumulative 30% higher probability of a positive outcome across the entire review process. This is a strikingly large effect that warrants further investigation. We think it unlikely that biases against authors who defer corresponding authorship can explain an effect this large. Instead, we suspect the low success of papers whose corresponding author is someone other than the first author is because either: (a) These papers are being written, at least in part, by someone less familiar with (or less committed to) the research being described in the manuscript, such as a research mentor or a colleague more fluent in English; or (b) first authors are more willing to defer corresponding authorship when a paper is of lower significance and/or reports less robust research. Regardless of the explanation, this difference may be important for understanding gender differences in publishing success because women defer corresponding authorship more often than do men (Edwards et al., 2018; Fox et al., 2018), possibly because they are more likely than men to leave science (Adamo, 2013). Our results suggest that the gender difference in corresponding authorship contributes to the gender difference in peer‐review outcomes; including corresponding authorship in our statistical models causes first author gender differences to become statistically nonsignificant (cumulative through the entire process).
However, the degree to which considering corresponding authorship changes estimated female:male success ratios is small, suggesting that gender differences in corresponding authorship, although possibly a contributing factor, are not enough to account for all of the observed gender differences in peer‐review outcomes.
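As a rough consistency check on the figures above, the per-stage advantages for papers whose first author is also corresponding author compound approximately multiplicatively (an illustrative assumption; the reviewer-score advantage is here folded into the revision-invitation stage):

```python
# Back-of-the-envelope check: per-stage relative advantages for papers
# whose first author is also the corresponding author, combined
# multiplicatively (an assumption made for illustration only).
sent_for_review = 1.18    # 18% more likely to be sent for peer review
revision_invited = 1.10   # 10% more likely to be invited to revise/resubmit
cumulative = sent_for_review * revision_invited
print(round(cumulative, 2))  # 1.3 — consistent with the ~30% cumulative advantage
```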

Our analysis of peer‐review outcomes is limited to just six journals for which we have detailed data on all submissions. To better understand potential gender biases across the entire ecology literature, we tested for gender differences in a dataset collected via an author survey of manuscript publication histories. In this survey, we asked authors to provide the complete submission history for their published paper—to which journal each manuscript had previously been submitted and the outcomes of each separate submission. Interestingly, we observe no evidence of a gender difference in the author‐reported outcomes of editorial review in this survey data; papers by female authors were no more likely to report having been rejected by one or more journals before eventual publication. The effect size, averaged across journals, was very close to 0 (Figure 6), with the sign of the effect opposite that in our peer‐review dataset; that is, not even suggestive of bias against papers authored by women. One possible explanation for the difference in conclusion between these two datasets is that the papers about which we survey authors include only the subset of papers that are eventually published, and thus represent a biased sample of all papers that are reviewed; papers that are rejected from one journal and never published anywhere are unknown to us, and thus not included in our sample. If women are less likely than men to resubmit their paper (to another journal) following rejection, the rejection rate observed in our survey data could be biased against detecting rejections of papers with female authors. Some evidence suggests that women respond differently to social and peer rejection than do men (Stroud, Salovey, & Epel, 2002; Vanderhasselt, Raedt, Nasso, Puttevils, & Mueller, 2018), though it is unclear if this occurs in the academic publishing context.
Also, because women leave science more often than do men (Adamo, 2013), they may be less able or willing to resubmit papers following rejection. Alternatively, men and women may respond differently to the survey itself. Estimated rejection rates from survey responses underestimate rejection rates obtained directly from individual journals (Paine & Fox, 2018). This suggests that our survey is either missing a population of papers that were submitted, rejected, and never eventually published, or that authors who had more positive experiences with their manuscripts are more likely to reply (survey response rates were 21.3% for papers with male first authors vs. 17.5% for those with female first authors). Or, possibly, the difference in conclusions reached from these two datasets (our submitted‐papers dataset including just six journals vs. the survey dataset including all ecology journals) may indicate that the gender difference observed at these six journals does not extend to the ecology literature more broadly, though we think this unlikely.
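The survivorship mechanism described above can be made concrete with a toy calculation. All numbers below are hypothetical (the rejection and resubmission probabilities are invented for illustration); the point is only that identical true rejection rates can yield different observed rejection rates among eventually published papers when resubmission behavior differs:

```python
# Toy illustration of survivorship bias in a survey of published papers.
# Suppose two groups are rejected at a first journal at the same rate,
# but one group resubmits rejected papers less often. Among papers that
# are *eventually published* (the only papers a survey of published
# work can see), the group that resubmits less shows a lower apparent
# rejection rate.

def observed_rejection_rate(p_reject, p_resubmit):
    """Share of eventually-published papers that report >=1 rejection.

    Published papers are those accepted on the first try (prob 1 - p_reject)
    plus rejected papers that were resubmitted and then accepted (for
    simplicity, assume one resubmission that always succeeds).
    """
    accepted_first_try = 1 - p_reject
    rejected_then_published = p_reject * p_resubmit
    return rejected_then_published / (accepted_first_try + rejected_then_published)

# Same true first-journal rejection rate (50%), different resubmission rates:
print(round(observed_rejection_rate(0.5, 0.9), 2))  # 0.47 (resubmits often)
print(round(observed_rejection_rate(0.5, 0.5), 2))  # 0.33 (resubmits less)
```

Under these invented numbers, the group that resubmits less often appears to be rejected less often, even though the two groups face identical rejection rates, which is the direction of bias hypothesized in the text.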