Evidence suggests that women in academia are hindered by conscious and unconscious biases, and often feel excluded from formal and informal opportunities for research collaboration. In addition to ensuring fairness and helping to redress gender imbalance in the academic workforce, increasing women’s access to collaboration could help scientific progress by drawing on more of the available human capital. Here, we test whether researchers tend to collaborate with same-gendered colleagues, using more stringent methods and a larger dataset than in past work. Our results reaffirm that researchers co-publish with colleagues of the same gender more often than expected by chance, and show that this ‘gender homophily’ is slightly stronger today than it was 10 years ago. Contrary to our expectations, we found no evidence that homophily is driven mostly by senior academics, and no evidence that homophily is stronger in fields where women are in the minority. Interestingly, journals with a high impact factor for their discipline tended to have comparatively low homophily, as predicted if mixed-gender teams produce better research. We discuss some potential causes of gender homophily in science.

Data Availability: The raw input data from Holman et al. [5] are archived at https://osf.io/bt9ya/, and all the derived data are contained within the paper and its Supporting Information files. The R scripts used to produce all results, figures and tables are freely available at https://github.com/lukeholman/genderHomophily, and a report explaining this R code and its outputs can be viewed online at https://lukeholman.github.io/genderHomophily/.

In the present study, we test whether life sciences researchers tend to co-publish with same-gendered colleagues, while controlling for the Wahlund effect as strictly as we are able. We use a recently published dataset describing the gender of 35.5 million authors from 9.15 million articles indexed on PubMed [5]. Holman et al. [5] reported large differences in the gender ratio of authors across research disciplines, journals, countries, and across the years 2002–2016. We therefore tested for gender homophily while restricting our analysis to particular journals (a proxy for research specialties), time periods, and countries. We quantified gender assortment using a metric called α′ [60], which is positive when same-gender authors publish together more often than expected (gender homophily), negative when opposite-gender authors publish together more often than expected (heterophily), and equal to zero when authors assort randomly with respect to gender (see Methods).
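As a rough illustration of the kind of quantity that α′ adjusts, the sketch below computes an uncorrected coefficient as the proportion of men's coauthors who are men minus the proportion of women's coauthors who are men. This definition is an assumption, inferred from the Discussion's description of α′ as the absolute difference in coauthor gender ratio between men and women; the exact estimator and the correction that yields α′ are given in the Methods and in [60] and are not reproduced here. The function name and the input format (one string of 'M'/'F' characters per paper) are hypothetical.

```python
def alpha_uncorrected(papers):
    """Illustrative, uncorrected assortment coefficient (not the paper's alpha-prime).

    `papers` is an iterable of strings such as "MFM", one character per author,
    containing only 'M' and 'F' (papers with unknown genders are assumed to have
    been removed beforehand).
    """
    # counts[g] = [male coauthors, all coauthors] summed over focal authors of gender g
    counts = {"M": [0, 0], "F": [0, 0]}
    for authors in papers:
        n_total = len(authors)
        if n_total < 2:
            continue  # single-author papers carry no coauthorship information
        n_male = authors.count("M")
        for gender in authors:
            # Coauthors exclude the focal author
            male_coauthors = n_male - (1 if gender == "M" else 0)
            counts[gender][0] += male_coauthors
            counts[gender][1] += n_total - 1
    if counts["M"][1] == 0 or counts["F"][1] == 0:
        raise ValueError("need coauthored papers with authors of both genders")
    p = counts["M"][0] / counts["M"][1]  # share of men's coauthors who are men
    q = counts["F"][0] / counts["F"][1]  # share of women's coauthors who are men
    return p - q
```

Under this definition, a dataset made only of single-gender papers gives a value of 1, a dataset made only of one-man-one-woman papers gives −1, and an even mix of the four possible two-author combinations gives 0.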

Here, coloured circles represent male and female authors, and coauthors are linked with lines. Across the whole set of ten papers, there is an apparent excess of same-gender collaborations: there are six same-gender papers and only four mixed-gender papers, which is fewer than the 10 × 2 × 0.5 × 0.5 = 5 mixed-gender papers expected under the null hypothesis that authors assort randomly. However, within each subset, there is no evidence that authors prefer to publish with same-gendered individuals (if anything, this small dataset suggests gender heterophily). The Wahlund effect will tend to inflate the frequency of same-gender coauthorships whenever the data are composed of two or more disconnected subsets of literature with different author gender ratios; these subsets could be research disciplines, older versus newer papers, or papers from authors based in different countries. The example countries and disciplines were selected based on data in [5].
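The null expectation quoted in this caption can be reproduced with a few lines of arithmetic. The sketch below assumes that every paper has exactly two authors and that each author's gender is drawn independently with the same probability, which is what the product 10 × 2 × 0.5 × 0.5 implies; the function name is hypothetical.

```python
def expected_two_author_counts(n_papers, prop_female):
    """Expected numbers of two-author papers of each type under random assortment."""
    f = prop_female
    m = 1.0 - f
    return {
        "FF": n_papers * f * f,         # both authors women
        "MM": n_papers * m * m,         # both authors men
        "mixed": n_papers * 2 * f * m,  # one woman and one man, in either order
    }


counts = expected_two_author_counts(10, 0.5)
```

With ten papers and an even gender ratio, the expected number of mixed-gender papers is 5, against the four observed across the pooled set.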

We believe that most or all earlier studies of gender homophily were hindered by a largely unacknowledged statistical issue that we will refer to as the Wahlund effect ( Fig 1 ), by analogy with the conceptually similar Wahlund effect in population genetics [ 59 ]. The Wahlund effect makes it deceptively difficult to test for gender-based co-author choice simply by counting the relative number of same- and mixed-gender coauthorships. Essentially, the Wahlund effect means that whenever coauthorship data are sampled from two or more discrete sets of literature, which vary in the author gender ratio and which are largely unconnected by collaboration, the number of same-gendered coauthors will be inflated. This can give the impression that authors preferentially publish with same-gendered colleagues even if no gender preferences exist, or if the true preference is for opposite-gendered colleagues (‘gender heterophily’). For example, a sample of literature containing a mixture of bioinformatics and cell biology papers will probably contain an excess of mostly-male and mostly-female author lists, simply because researchers usually collaborate within their own discipline, and because the author gender ratio is more male-biased in bioinformatics than in cell biology [ 5 ].
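The inflation described above can be made concrete with a small calculation. The sketch below assumes two-author papers, random pairing with respect to gender within each field, no collaboration between fields, and large samples; the field sizes and gender ratios are hypothetical and are not taken from [5]. It returns the expected difference between the share of men's coauthors who are men and the share of women's coauthors who are men, once papers from all fields are pooled.

```python
def pooled_alpha(fields):
    """Expected pooled assortment when authors pair at random within each field.

    `fields` is a list of (n_papers, prop_male) tuples (hypothetical inputs).
    The pooled proportion of men must lie strictly between 0 and 1.
    """
    total = sum(n for n, _ in fields)
    mean_male = sum(n * m for n, m in fields) / total    # pooled share of male authors
    p_both_male = sum(n * m * m for n, m in fields) / total
    # P(coauthor is male | focal author is male)
    p = p_both_male / mean_male
    # P(coauthor is male | focal author is female)
    q = (mean_male - p_both_male) / (1.0 - mean_male)
    return p - q


one_field = pooled_alpha([(100, 0.7)])
two_fields = pooled_alpha([(100, 0.8), (100, 0.4)])
```

A single field gives a value of zero whatever its gender ratio, whereas pooling two equally sized fields with 80% and 40% male authors gives a positive value (1/6) even though no author in either field has any gender preference.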

A high, steadily increasing proportion of research papers is written by more than one author [ 3 ], making collaboration a key predictor of publication output, and thus of career prospects [ 40 , 41 ]. Additionally, empirical studies imply that mixed-gender or otherwise diverse teams produce better outputs on collaborative tasks than less diverse teams [ 42 – 48 ]. For reasons such as these, multiple studies have examined the author lists of published research articles in order to test for gender differences in collaboration frequency or pattern. To our knowledge, most or all such studies imply that men co-publish with men, and women with women, more often than expected if collaborators assort randomly with respect to gender [ 49 – 58 ]. This non-random assortment is often termed ‘gender homophily’.

Writing papers, networking, and collaboration are all instrumental to research productivity and academic career advancement [ 22 – 25 ], and dozens of studies have tested for gender differences in these areas [ 5 , 26 – 29 ]. For example, studies have concluded that women tend to be less involved in international collaboration [ 19 , 28 , 30 – 32 ], collaborate less within their own university departments [ 31 ], have less prestigious collaborations [ 33 ], and fewer collaborations in total [ 34 ]. These gender differences in collaboration presumably have multiple causes, which might include implicit and explicit gender bias [ 20 ], differential family obligations [ 33 , 35 , 36 ], gender differences in confidence or self-esteem [ 37 ], concerns relating to sexual harassment [ 38 ], and unequal access to conferences [ 39 ] or travel funds [ 32 ].

Lastly, we note that if there is a gender gap between career stages and coauthorships between early-career and established researchers comprise >50% of the total, then the baseline expectation for α is actually less than zero (blue areas in Fig 5 ). Therefore, it is possible that researchers preferentially assort with same-gendered collaborators even more strongly than implied by our results, at least for certain journals or research disciplines.

Despite this overlap, Fig 5 suggests that our main conclusions (and those of other studies of gender homophily) are probably robust to this career stage issue. We only expect strongly positive α when A) the gender ratio is highly skewed across career stages (e.g. a 5-fold difference), and B) collaborations between early and established researchers are very rare (e.g. <10% of the total). Both of these conditions seem unlikely to be true for most fields: the gender gap across career stages is generally less pronounced [1, 5], and it is very common for early-career researchers to co-publish with an established mentor [61]. However, one can get α > 0 for realistic combinations of parameters, e.g. a moderate shortage of women in senior positions coupled with a moderate excess of within-career stage collaboration, suggesting this effect might contribute to some of the homophily observed by this and previous studies.

Specifically, if most collaborations occur between career stages, there will be an excess of mixed-gender collaborations (α < 0, blue areas), while if most collaborator pairs comprise two people at the same career stage, there will be an excess of same-gender collaborations (α > 0, red areas). However, the conditions required for strong gender homophily (i.e. the red areas) are quite restrictive, making it unlikely that this issue can fully explain the homophily observed in our study. Additionally, in research disciplines where between-career stage collaboration is common and there is a shortage of women among established researchers (i.e. the blue areas), our study will underestimate the strength of gender homophily. Contour lines mark increments of 0.1.

Given that we cannot identify individual researchers or their career stages, we used a simple model to derive the theoretical expectations for α when the gender ratio differs between career stages (see Methods ). As shown in Fig 5 , we predict that α is expected to be non-zero, even if collaborators are randomly selected with respect to gender, provided that there is a gender gap between career stages. The extent to which α deviates from zero depends on the relative frequencies of collaboration within and between career stages (rows and columns in Fig 5 ), and the size of the gender gap between stages (x- and y-axes in Fig 5 ). When >50% of coauthor pairs comprise one early-career and one established researcher, we expect gender heterophily (α < 0) whenever the gender ratio differs between career stages. Conversely, when >50% of collaborations are between people at the same career stage, we expect gender homophily (α > 0). In a few parameter spaces (shown in red; Fig 5 ), α was quite high, and overlapped with the values that we estimated ( Fig 2 ).
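The qualitative behaviour described in this paragraph can be reproduced with a minimal two-stage calculation. This is an independent sketch and not the model from the Methods: it assumes coauthor pairs only, two career stages, and gender drawn at random within each stage; all parameter names are hypothetical.

```python
def expected_alpha_career_stages(m_early, m_senior, w_early, w_senior, between):
    """Expected alpha for coauthor pairs when gender is random within career stage.

    m_early, m_senior: proportion of men among early-career / established researchers.
    w_early, w_senior, between: proportions of pairs that are early-early,
    established-established and early-established (must sum to 1).
    """
    if abs(w_early + w_senior + between - 1.0) > 1e-9:
        raise ValueError("pair-type proportions must sum to 1")
    # Probability that both members of a random pair are men
    p_both_male = (w_early * m_early ** 2
                   + w_senior * m_senior ** 2
                   + between * m_early * m_senior)
    # Probability that a randomly chosen member of a random pair is a man
    p_male = (w_early * m_early
              + w_senior * m_senior
              + between * (m_early + m_senior) / 2.0)
    p = p_both_male / p_male                     # P(coauthor male | focal male)
    q = (p_male - p_both_male) / (1.0 - p_male)  # P(coauthor male | focal female)
    return p - q


mostly_between = expected_alpha_career_stages(0.5, 0.8, 0.1, 0.1, 0.8)
mostly_within = expected_alpha_career_stages(0.5, 0.8, 0.45, 0.45, 0.1)
```

In this sketch, when the two within-stage pair types are equally common, the expected value is negative if more than half of pairs are between stages, positive if fewer than half are, and exactly zero at 50% or whenever the two stages share the same gender ratio.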

When we restricted the analysis by country, we observed statistically significant homophily for 72 of the 325 journal-country combinations tested (64 unique journals and 18 unique countries), and no significant heterophily ( S5 and S6 Figs). Additionally, the values of α′ calculated for each journal-country combination were only very slightly lower than the α′ values calculated for the journal as a whole (i.e. when pooling papers from different countries, as was done to make Fig 2 ): on average, the difference in α′ was only 0.002 ( S7 Fig ). These results suggest that our findings of widespread homophily in the main analysis were not driven solely by a Wahlund effect resulting from gender differences between countries.

We observed a noisy but statistically significant linear relationship between standardised journal impact factor and α′, such that journals with a high impact factor for their discipline had weaker gender homophily than did journals with a low impact factor for their discipline (Fig 4; linear regression: R² = 0.043, t₁₄₁₅ = −8.0, p < 0.0001). The slope of the regression was −0.012 ± 0.0015, indicating that increasing the discipline-standardised impact factor by one standard deviation is associated with a reduction in α′ of 0.012. The Spearman correlation coefficient was −0.19 (p < 0.0001).
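The reported statistics can be checked against one another with a short calculation. The sketch below assumes a simple (single-predictor) linear regression with 1415 residual degrees of freedom, and assumes that ±0.0015 is the standard error of the slope; neither assumption is stated explicitly in the text.

```python
import math


def t_from_r_squared(r_squared, df):
    """Magnitude of the slope's t statistic in a simple linear regression."""
    return math.sqrt(r_squared * df / (1.0 - r_squared))


t_from_fit = t_from_r_squared(0.043, 1415)  # from R^2 and degrees of freedom
t_from_slope = 0.012 / 0.0015               # slope divided by its assumed standard error
```

Both routes give a t statistic with a magnitude of approximately 8, in line with the reported value.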

We next tested whether researchers are more or less likely to publish with same-gendered colleagues in strongly gender-biased disciplines (e.g. Surgery or Nursing), relative to disciplines with a comparatively gender-balanced workforce (e.g. Psychiatry). We found a positive, non-linear relationship between the gender ratio across all the authors publishing in a particular journal [ 5 ], and the estimated value of α′ for all authors and for first authors, but not last authors ( S4 Fig ). Journals with a balanced or female-biased author gender ratio tended to have higher α′ (i.e. stronger homophily) than journals with a male-biased author gender ratio (GAM smooth term p = 0.0002 for all-author homophily, p < 0.0001 for first-author homophily, and p = 0.13 for last-author homophily).

Papers with two authors had significantly lower (but still positive) α′ values relative to papers with more than two authors, while papers with 3, 4 or ≥ 5 authors had essentially identical average α′ values (Fig 3). Specifically, the posterior estimate of mean α′ was 0.014 (95% CI: 0.002–0.026) for 2-author papers and 0.065 (95% CI: 0.056–0.074) for 3-author papers (and roughly the same for 4- and ≥ 5-author papers; Fig 3). One possible explanation for this finding is that 2-author papers are more likely to have an author list that is evenly split between career stages (e.g. a postgraduate student and their supervisor), increasing the chance that the authors are mixed gender (see section ‘Theoretical expectations for α when the gender ratio differs between career stages’). The result also suggests that the causal mechanisms responsible for gender homophily are similar in small (e.g. 3-author) and larger (≥ 5-author) collaborations (and across disciplines where small versus large collaborations are the norm).

Nevertheless, when we calculated α across all non-single-author papers in our entire 15-year PubMed dataset (as before, excluding papers where at least one author’s gender was unknown; more than 3 million papers and more than 16 million authors), we found that α was 0.126. This figure is almost double the median value of α′ for individual journals (Fig 2; α′ = 0.070 for ‘All authors’), suggesting that lumping together papers from different fields and different time periods can indeed produce spurious evidence for gender homophily as outlined in Fig 1.

There was no indication that journals publishing on a wide range of topics have higher α′ values than more specialised journals due to the Wahlund effect ( Fig 1 ). For example, the journal category ‘Multidisciplinary’—which includes general interest journals like PLOS ONE, Nature, Science, and PNAS—did not have markedly elevated α′ ( Fig 2 ). This result suggests that our estimates of homophily, and estimates from some of the earlier studies of homophily listed in the Introduction, are probably not markedly inflated by the presence of disparate research topics (with variable author gender ratios) being published within individual journals.

Fig 2 illustrates the variance in journal homophily values (α′) across scientific disciplines. All disciplines had positive mean α′ (averaged over journals), although homophily appeared somewhat stronger in some disciplines than others (e.g. mean α′ was 0.12±0.02 for Urology journals and 0.03±0.01 for Veterinary Medicine journals; Fig 2 , S4 Data ). However, there was no formal evidence for consistent differences in α′ between disciplines: the random factor ‘Discipline’ explained around 1% of the variance in α′ in the two linear mixed models described in the previous section (see Fig 2 and mixed models in Online Supplementary Material). Thus, the causal mechanisms underlying the observed positive α′ values appear to be similarly strong in all the disciplines we examined.

When comparing pairs of α′ values estimated for the first and last authors for the same journals, we found that α′ tended to be higher for first authors than for last authors (S3 Fig; effect of the fixed factor ‘Authorship position’ in a linear mixed model: Cohen’s d = 0.065 ± 0.02, t₂₀₂₄ = 4.28, p < 0.0001). This suggests that the gender of the first author was a slightly stronger predictor of the remaining authors’ genders than the gender of the last author, i.e. the opposite of what is predicted if senior scientists are causally responsible for homophily.

In the stacked density plot, the white area shows the number of journals for which homophily was significantly stronger than expected under the null hypothesis (corrected p < 0.05), while the blue area shows all the remainder. Patterns were similar whether α′ was calculated for all authors, for first authors only, or for last authors only. Points in the right panel show α′ for individual journals.

Fig 2 shows the distribution of α′ estimates in 2015-16 across all journals for which we recovered sufficient data, when α′ was calculated for all authors, first authors only, or last authors only. Most journals had positive values of α′ (77-92%, depending on time period and author type; S1 Data), and for many of these the false discovery rate (FDR)-corrected p-values suggested that α′ was significantly greater than zero (1469/2077 journals were significant in 2015-16, and 404/1192 in 2005-6; S1 Data). Only 2/2077 journals had statistically significant heterophily (i.e. α′ < 0) in 2015-16, and 1/1192 in 2005-6 (S2 Table). The remaining 606 and 787 journals (in 2015-16 and 2005-6, respectively) had a value of α′ not significantly different from zero, consistent with the null hypothesis of random assortment with respect to gender. We also confirmed that in most journals (S2 Data) and most research disciplines (S3 Data, S1 Fig), the majority of papers had multiple authors.
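For readers unfamiliar with FDR correction, the sketch below shows one widely used procedure, the Benjamini-Hochberg adjustment. The text states only that an FDR correction was applied, so the choice of this particular procedure is an assumption, and the p-values in the example are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])  # hypothetical p-values
```

A journal would then be counted as showing significant homophily or heterophily when its adjusted p-value falls below 0.05.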

Discussion

We found evidence that researchers work with same-gendered coauthors more often than expected under the null model, even after implementing stringent controls for Wahlund effects of the kind illustrated in Fig 1. Our study therefore reaffirms earlier studies’ conclusions [49–57, 62] using stricter methodology, and generalises their results across the life sciences. Relatively few journals had α′ values below zero, and almost no journals showed statistically significant gender heterophily after controlling for multiple testing. The excess of same-gender coauthorships was quite large: many journals had α′ > 0.1, indicating that the gender ratio of men’s and women’s coauthors differs by >10% in absolute terms. In relative terms, our findings are even more striking: for example, if men have 20% female coauthors and women have 30% (i.e. α′ = 0.1 in a field with a typical gender ratio [5]), then women publish with women 50% more often than men do.
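The conversion from an absolute to a relative difference in the last sentence depends on the baseline share of female coauthors, as the following short sketch shows. The function name is hypothetical, and the second baseline (40%) is an invented comparison rather than a figure from the study.

```python
def relative_excess(prop_female_for_men, alpha):
    """Relative amount by which women's share of female coauthors exceeds men's."""
    prop_female_for_women = prop_female_for_men + alpha
    return prop_female_for_women / prop_female_for_men - 1.0


male_biased_field = relative_excess(0.20, 0.10)  # 20% versus 30% female coauthors
balanced_field = relative_excess(0.40, 0.10)     # 40% versus 50% female coauthors
```

The same absolute difference of 0.1 corresponds to a 50% relative excess when men have 20% female coauthors, but only a 25% relative excess when men have 40% female coauthors.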

An important limitation of our study is that we cannot reliably determine the cause(s) of the observed excess of same-gender coauthorships. As well as the obvious interpretation—conscious or unconscious selection of same-gendered collaborators by men, women, or both genders—our results could be partly explained by uncontrolled Wahlund effects. However, we suspect the contribution of these uncontrolled artefacts to be minor, for four reasons: we found positive α′ after controlling for three obvious sources of Wahlund effect; there was no inflation of α′ in highly multidisciplinary journals relative to specialised journals; restricting the data by country yielded similar estimates of α′; and our modelling work suggested that differences in gender ratio between career stages are unlikely to fully explain our results. On balance, we believe the data suggest that it is likely that some researchers preferentially select same-gendered collaborators, although it is difficult to ascertain what proportion of people show such a preference, or how much the strength of the preference varies between individual researchers. We also note that even in a world in which everyone selected their collaborators at random with respect to gender, a high proportion of individual researchers would have entirely same-gendered collaborators by chance alone (especially in gender-biased disciplines); thus, individuals who only have same-gendered co-authors are not necessarily doing anything differently from people with gender-balanced co-authors.
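The final point can be illustrated with a simple probability calculation. The sketch below assumes that each coauthor is drawn independently from a pool in which a fixed fraction shares the focal researcher's gender; the fractions and coauthor counts used are hypothetical.

```python
def prob_all_coauthors_same_gender(prop_same_gender, n_coauthors):
    """Chance that every coauthor shares the focal researcher's gender
    when coauthors are chosen at random with respect to gender."""
    return prop_same_gender ** n_coauthors


majority_case = prob_all_coauthors_same_gender(0.8, 5)  # e.g. a man in an 80% male field
balanced_case = prob_all_coauthors_same_gender(0.5, 5)  # gender-balanced field
```

With five coauthors, roughly a third of men in an 80% male field would have exclusively male coauthors by chance alone, compared with about 3% of researchers in a gender-balanced field.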

We hypothesised that disciplines with a strongly skewed gender ratio might show the strongest gender homophily, e.g. because being in the minority might increase one’s motivation to seek out same-gendered colleagues. Contrary to this hypothesis, we found no evidence that gender homophily is restricted to particular disciplines: α′ was similarly high across the board (Fig 2). Interestingly, gender homophily was weakest for journals with a male-biased author gender ratio, and strongest in journals with a female-biased author gender ratio. One possible reason is that men are more likely to preferentially seek out male collaborators in fields where men are a minority, relative to the homophily displayed by women in fields where women are a minority. However, this latter result only has tentative statistical support since our sample contains few journals in which most authors are women (S4 Fig).

We also found that gender homophily was marginally stronger in 2015-2016 relative to 2005-2006. Although this trend might reflect a change in the gender preferences of researchers seeking collaborators, there are alternative (and perhaps more likely) explanations. For example, this trend might result from the increasing number of women working in senior positions in STEMM over the past decade [63–65]. As shown in Fig 5, if enough coauthorships are between junior and senior researchers, a large gender gap between career stages can give the appearance of heterophily. As this gender gap between career stages lessens, the observed values of α′ may increase.

Regarding our finding of weaker homophily among 2-author papers, we suspect that many 2-author teams comprise a student/postdoc and a senior staff member, making these teams especially likely to be mixed-gender, due to the greater shortage of women among senior researchers [1, 5]. Assuming this interpretation is correct, this result suggests that our reported α′ values may underestimate the strength of people’s preferences for same-gendered collaborators; essentially, women seeking a senior collaborator could be constrained to work mostly with men, meaning that people’s ideal and realised gender preferences would be mismatched. On a related note, Ghiasi et al. [51] argue that women in engineering are “compliant [in reproducing] male-dominated scientific structures” because they do not collaborate often enough with other women (for reference, Figure 7 in [51] implies that coauthorships involving two women are c. 30% more frequent than expected under random assortment). By contrast, we feel that it may be counter-productive to recommend that women collaborate primarily with other women, e.g. because this constrains women’s options (particularly in fields like engineering, the field studied by Ghiasi et al., where 90% of professors are men [1]). Instead, we suggest that researchers of both genders can help to close the gender gap in STEMM. In the context of collaboration, one way to do this is to undertake self-examination to ensure that one is not inadvertently overlooking or excluding women among potential students and colleagues. One should also take care to treat male and female collaborators equally, e.g. in terms of training and mentoring, allocation of work, and how one describes the collaboration to other people (e.g. in conference presentations, on the lab website, or in the ‘Author contributions’ section of a paper). Experimental work suggests that unconscious bias causes people to undervalue women’s research achievements [20], and a study of author contribution statements found observational evidence that menial or under-valued tasks are more often assigned to women while more prestigious tasks are assigned to men [61].

Our study raises two questions: what causes gender homophily in science, and are our results cause for concern? We believe that the answers to these questions are closely related. For example, some of the homophily we observed might be caused by women seeking to avoid harassment or sexism from men [38], which would clearly be very concerning. Additionally, Sheltzer and Smith [66] concluded that ‘elite’ male academics (defined as recipients of major honours) have a higher proportion of male students and postdocs than non-elite male academics. This finding could contribute to the homophily we observed, and is cause for concern since the results might reflect discrimination against women during hiring [20], or avoidance by women of elite research groups (e.g. due to gender differences in confidence, or a perception that some groups are sexist). We also found some weak evidence that gender homophily is detrimental to research quality, in that high-impact journals tended to have weaker homophily (though the relationship was very noisy). Assuming that papers published in high-impact journals are of higher average quality (which is contentious [67]), our results provide non-experimental support for the hypothesis that mixed-gender teams produce better research than single-gender teams [42–48]. Another issue is that if many collaborations are between established researchers, there will be an excess of male-male collaborations in fields where women in senior positions are rare; some of the observed homophily might therefore reflect the elevated gender gap among senior researchers.

On the other hand, homophily might have more benign causes. Collaboration is often most enjoyable and productive when working with like-minded people, who might tend to be same-gendered more often than not. We also suppose that some people consciously choose to preferentially collaborate with women in order to help close the gender gap in the workforce; this would create homophily if women adopt this strategy more often than men. In support of this interpretation, there is some evidence that women are more likely than men to promote the work of female colleagues by inviting them to give talks [68, 69]. Given that many collaborative research projects unfortunately involve a gendered division of labour [61], working with a same-gendered colleague may provide exposure to new parts of the research process.