Literature search

We searched ISI Web of Science and Scopus on 9 June 2017 for peer-reviewed, English-language studies that manipulated the presence or strength of sexual selection using experimental evolution, and then measured some proxy of population fitness. A detailed list of search terms is given in the Supplementary Information (Supplementary Methods).

After removing duplicates, we read the titles and abstracts of the remaining 1015 papers and removed those that did not fit our inclusion criteria (typically because they did not present primary experimental evolution data). This left 130 papers, for which we read the full text and applied the inclusion criteria outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram (Fig. 4). Briefly, we included studies that (1) were conducted in a dioecious animal, (2) experimentally manipulated the strength of sexual selection (e.g. via experimentally enforced random monogamy or an altered sex ratio) for at least one generation and (3) measured a trait that we judged to be a potential correlate of population fitness. This third criterion is the most subjective, because there is rarely enough data to determine whether a particular trait is (or is not) correlated with population fitness. We therefore relied on our best judgement when deciding what outcomes were correlated with population fitness. We categorised the fitness outcomes into three categories: ambiguous, indirect and direct (detailed in Supplementary Table 1). Briefly, ambiguous measures of fitness were those that are reported to have an unclear or variable association with fitness (e.g. body size, mating duration, early fecundity and male reproductive success). Indirect fitness components were those that are often used as a proxy of fitness, but do not directly measure aspects of success in reproduction or population viability (e.g. lifespan, mating success and ejaculate quality/production). Finally, direct measures of fitness (female/mixed sex reproductive success, offspring viability and extinction rate) are those that measure fitness through components of reproduction or long-term viability. The Supplementary Methods describe why each of the 130 papers was included or excluded (Supplementary Table 2).

Fig. 4 PRISMA diagram. Flow of inclusion and exclusion of studies identified during the literature search, presented as a PRISMA diagram with number of published papers in brackets.

Of these 130 papers, 62 were excluded based on the PRISMA criteria (Fig. 4). Additionally, three papers presented insufficient information to calculate effect size. In these cases, we contacted the authors and attempted to obtain the missing data, with partial success. The final meta-analysis included data from 65 papers.

Data extraction

From each paper, we first attempted to extract the arithmetic means, standard deviations and sample sizes of each treatment group, which facilitate calculation of effect size (see below). Typically, there were two or three treatments, which varied in the strength of sexual selection on males through manipulations of the adult sex ratio; in these cases we considered the treatment with the greater male-to-female ratio to be the high sexual selection group. For some papers, summary statistics were not reported in the text but were presented in a figure such as a bar chart; in these cases, we extracted the data using WebPlotDigitizer v.3.12 (ref. 79). If the treatment means were not reported (and the raw data were unavailable), we instead estimated effect size from test statistics comparing treatment means (e.g. F, t, z or χ² values) using several conversion formulae (see below).

Where possible, we extracted data for each independent replicate or experimental evolution line within a study; otherwise, we used pooled treatment means. For studies that repeatedly measured the same population across multiple generations, we only extracted data for the last reported generation.

In addition to the data used to calculate effect size, we collected a set of moderator variables for each paper (see the Source Data file and associated Supplementary Information). The moderators were selected due to their ready availability, and because we hypothesised that they might explain some of the observed heterogeneity in effect size. A key moderator was whether the environmental conditions that a population evolved under were stressful (e.g. elevated mutation load, novel/sub-optimal food source, increased sub-lethal temperatures). Additionally, for each effect size we recorded: sex (male, female or a mixed sample of both), taxon (flies, beetles, mice, nematodes, mites, crickets and guppies), the presence/absence of blind methodology and the number of generations for which a treatment group underwent experimental evolution. In the interests of creating a useful data resource, we also recorded details about each experiment that were not formally analysed due to a shortage of data, such as the type of sexual selection that was manipulated (pre-copulatory, post-copulatory or both) and the male-to-female ratio; these are included in the Source Data file.

Effect size calculation

For each measurement of each pair of treatments, we estimated the standardised effect size Hedges’ g80. Similar to Cohen’s d, Hedges’ g expresses the difference in means in terms of standard deviations (making it dimensionless), but it is more robust to unequal sampling and small sample sizes81. For comparisons of extracted treatment means, we calculated Hedges’ g using the mes function in the compute.es R package82. To calculate Hedges’ g from test statistics, we used the fes, chies and tes functions in the compute.es package (for F, χ² and t statistics, respectively). The propes function was used to calculate effect size from a difference in proportions; in two cases83,84, a proportion was equal to one (producing infinite effect sizes), and so we subtracted one from the numerator when estimating Hedges’ g. In all cases, we selected a direction for the effect size calculation such that in our meta-analysis, negative effect sizes indicate that the removal of sexual selection was associated with higher fitness trait values, and positive effect sizes indicate higher fitness when sexual selection was elevated or left intact. We also inverted the sign of effect sizes pertaining to measurements that are expected to be negatively related to population fitness (e.g. parasite load, mutation load, extinction risk/rate, mating latency (males) and rate of senescence). Because many of our 65 papers measured multiple fitness outcomes, studied multiple replicate populations or had three or more sexual selection treatments, we calculated a total of 459 effect sizes.
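
As a rough illustration of the calculations described above, the following Python sketch computes Hedges’ g from group summaries and from a t statistic, and applies the sign convention. It is not the authors’ code (the analysis used the compute.es R package); the function names are hypothetical, and the variance formula is one common large-sample approximation that may differ in detail from the package’s output.

```python
import math


def hedges_g(m_high, sd_high, n_high, m_low, sd_low, n_low):
    """Hedges' g and its sampling variance from group summaries.

    Positive g means the high sexual selection group has the larger mean.
    """
    df = n_high + n_low - 2
    # Pooled standard deviation across the two treatment groups.
    sd_pooled = math.sqrt(
        ((n_high - 1) * sd_high**2 + (n_low - 1) * sd_low**2) / df
    )
    d = (m_high - m_low) / sd_pooled  # Cohen's d
    j = 1 - 3 / (4 * df - 1)          # small-sample correction factor
    # Common large-sample approximation to the variance of d (assumption).
    var_d = (n_high + n_low) / (n_high * n_low) + d**2 / (2 * (n_high + n_low))
    return j * d, j**2 * var_d


def hedges_g_from_t(t, n_high, n_low):
    """Hedges' g from an independent-samples t statistic."""
    d = t * math.sqrt((n_high + n_low) / (n_high * n_low))
    j = 1 - 3 / (4 * (n_high + n_low - 2) - 1)
    return j * d


def orient(g, higher_is_fitter=True):
    """Flip the sign for traits expected to decrease with fitness."""
    return g if higher_is_fitter else -g


# Hypothetical summaries: high vs low sexual selection treatment.
g, var_g = hedges_g(10.0, 2.0, 20, 9.0, 2.0, 20)
print(round(g, 3), round(var_g, 3))
```

For a trait such as parasite load, `orient(g, higher_is_fitter=False)` would reverse the sign so that positive values still indicate higher fitness under sexual selection.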

Additionally, using studies that presented means, standard deviations and sample sizes (n = 352) we were able to calculate an alternative measure of effect size: the lnRR85,86. The lnRR was used as a supplement to Hedges’ g because it relaxes the assumption of equal variances between control and treatment groups (homoscedasticity).
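
A minimal sketch of the log response ratio, assuming the standard delta-method sampling variance and strictly positive group means. The function name is hypothetical and this is an illustration, not the authors’ implementation.

```python
import math


def ln_rr(m_high, sd_high, n_high, m_low, sd_low, n_low):
    """Log response ratio (high / low sexual selection) and its variance."""
    if m_high <= 0 or m_low <= 0:
        raise ValueError("lnRR requires positive means in both groups")
    effect = math.log(m_high / m_low)
    # Delta-method approximation; each group contributes its own variance
    # term, so equal variances across groups are not assumed.
    variance = (sd_high**2 / (n_high * m_high**2)
                + sd_low**2 / (n_low * m_low**2))
    return effect, variance


# Hypothetical summaries for one pair of treatments.
effect, variance = ln_rr(12.0, 3.0, 10, 10.0, 2.0, 10)
print(round(effect, 4), round(variance, 5))
```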

For the meta-analysis testing whether sexual selection affects phenotypic variance (as opposed to the mean), we estimated the difference in variance between each pair of treatments using the natural logarithm of the ratio between the coefficients of variation of the two groups (termed lnCVR)34: $\ln(\mathrm{CV}_{\text{fitness, SS high}} / \mathrm{CV}_{\text{fitness, SS low}})$. The use of lnCVR allows us to determine the effects of sexual selection on phenotypic variance, with the coefficient of variation implicitly controlling for the mean-variance relationship seen in the dataset (Supplementary Fig. 5). As a supplement, we also calculated the natural logarithm of the ratio between the absolute variation of the two groups (lnVR) in order to assess the impact of sexual selection on trait variance, irrespective of the magnitude of the trait means34. The calculation of lnCVR and lnVR relies on the availability of arithmetic means, standard deviations and sample sizes for the two treatment groups34,87, and so we were only able to calculate lnCVR and lnVR for 354 of 459 comparisons.
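
The two variance-based effect sizes can be sketched as follows. This is a simplified illustration with hypothetical function names: it computes the plain log ratios given in the text and omits the small-sample bias corrections and sampling variances that published estimators usually include.

```python
import math


def ln_cvr(m_high, sd_high, m_low, sd_low):
    """Log ratio of coefficients of variation (high / low sexual selection)."""
    if m_high <= 0 or m_low <= 0 or sd_high <= 0 or sd_low <= 0:
        raise ValueError("means and standard deviations must be positive")
    cv_high = sd_high / m_high
    cv_low = sd_low / m_low
    return math.log(cv_high / cv_low)


def ln_vr(sd_high, sd_low):
    """Log ratio of standard deviations, ignoring the group means."""
    if sd_high <= 0 or sd_low <= 0:
        raise ValueError("standard deviations must be positive")
    return math.log(sd_high / sd_low)


# Hypothetical case: identical SDs, but the low group has twice the mean.
print(ln_cvr(10.0, 2.0, 20.0, 2.0), ln_vr(2.0, 2.0))
```

In this hypothetical case lnVR is zero while lnCVR is positive, which shows how the two measures diverge when the group means differ.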

Mixed-effects meta-analysis

First, we obtained a weighted mean effect size (Hedges’ g) for the entire dataset, using both Bayesian and REML approaches for completeness. The weighted mean was obtained by fitting a model with no moderator variables (i.e. fixed effects), but fitness component (e.g. body size, female reproductive success), study ID and taxon as random/group-level effects. That is, we separately model correlations between different effect sizes sourced from the same study, taxon or pertaining to the same fitness component, and account for these interdependencies when estimating the overall effect. Given the small number of phylogenetically diverse species, we did not utilise phylogenetic corrections within the models. In our meta-analyses, we report Bayes factors (BF), giving the likelihood ratio that the focal effect size differs from zero (BF>0).
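
To make the idea of a weighted mean effect size concrete, the sketch below uses the simpler DerSimonian-Laird random-effects estimator with a single between-effect variance. This is a deliberately simplified stand-in, not the multilevel REML or Bayesian models described above: it treats every effect size as independent and ignores the study, taxon and fitness-component groupings. The function name is hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """Inverse-variance weighted mean under a DerSimonian-Laird model.

    Returns (pooled mean, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-effect variance estimate
    # Re-weight with the between-effect variance added to each effect.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2


# Hypothetical effect sizes (Hedges' g) and their sampling variances.
mean, se, tau2 = random_effects_mean([0.1, 0.4, 0.3], [0.02, 0.05, 0.03])
print(round(mean, 3), round(se, 3), round(tau2, 3))
```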

Second, we fitted relationship-to-fitness class (Ambiguous, Indirect or Direct) as a moderator variable in Bayesian and REML models (whilst maintaining study and taxon as group-level effects) to derive predictions for effect size within each of the three classes, using the relevant predict functions of the R packages used (see below). This meta-analysis was then supplemented by another model in which we fitted fitness component as a moderator variable (e.g. immunity, lifespan, offspring viability and female reproductive success); predictions from this model for the 22 fitness components were derived as above. In addition, to assess the impact of sexual selection on each fitness component independently of the others, we conducted separate meta-analyses (n = 18), one for each fitness component with more than three effect sizes. These were intercept-only REML models with study and taxon as group-level effects. Further details on model parameters can be found in the R code.

Third, we measured the impact of environment, sex and their interaction on the effect size (Hedges’ g, lnRR, lnCVR and lnVR) associated with the manipulation of sexual selection, by fitting these predictors as moderators in a pair of separate mixed-effects meta-analyses. These meta-analyses were restricted to effect sizes calculated from unambiguous outcomes (i.e. those scored as being directly or indirectly related to population fitness), as well as those where we were able to define the environmental conditions as either stressful or benign (Hedges’ g: n = 330; lnRR, lnCVR and lnVR: n = 269). We again fit study ID, fitness component and taxon as random/group-level effects. Models investigating other moderators such as number of generations and blinding are presented in Supplementary Table 9.

For our meta-analyses investigating the effects of environment and sex on the magnitude and variance of fitness-related traits, we provide estimates of the heterogeneity present in the dataset. We use the statistic I² as an estimate of the proportion of variance in effect size that is due to differences between levels of a random effect (e.g. studies)88. I² is preferred over other statistics as it is independent of sample size, is easily interpretable and can be partitioned between random effects32. Within ecology and evolution, heterogeneity in datasets is often high, with the mean I² from 86 studies above 90% (ref. 33).
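
One way I² can be partitioned between random effects is sketched below, assuming the widely used approach of dividing each variance component by the sum of all components plus a "typical" sampling variance. The function names and the variance components in the example are hypothetical; in practice the components would be taken from the fitted multilevel model.

```python
def typical_sampling_variance(variances):
    """Typical within-study sampling variance for a set of effect sizes."""
    w = [1.0 / v for v in variances]
    k = len(w)
    return (k - 1) * sum(w) / (sum(w) ** 2 - sum(wi**2 for wi in w))


def partition_i2(components, variances):
    """Share of total variance attributable to each random effect.

    components: dict mapping a random-effect name to its variance estimate.
    variances: sampling variances of the individual effect sizes.
    """
    s2 = typical_sampling_variance(variances)
    total = sum(components.values()) + s2
    return {name: value / total for name, value in components.items()}


# Hypothetical variance components and sampling variances.
shares = partition_i2({"study": 0.2, "taxon": 0.1}, [0.1, 0.1, 0.1, 0.1])
print(shares)
```

The sum of the returned shares gives the total I²; the remainder is the proportion attributed to sampling error.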

Meta-analyses fit by REML were implemented in the metafor R package89, while their Bayesian equivalents used the R package brms to run models in Stan90.

Publication bias

We tested for publication bias via funnel plots, using Egger’s test to quantify plot asymmetry91. We also tested for time-lag bias36, in which effect size magnitudes decline over time as more data are collected. Finally, we assessed a further potential source of publication bias by testing for a correlation between effect size and journal impact factor37, which can arise if null or countervailing results are more difficult to publish in high-impact journals (impact factors were from InCites Journal Citation Reports).
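
The classical form of Egger’s regression can be sketched as follows: each effect size is divided by its standard error and regressed on precision (1 / standard error), and an intercept far from zero suggests funnel-plot asymmetry. This is a simplified illustration that assumes independent effect sizes; the source does not state the exact variant used, and the function name is hypothetical.

```python
import math


def egger_regression(effects, standard_errors):
    """Ordinary least squares fit of (effect / SE) on (1 / SE).

    Returns (intercept, slope, standard error of the intercept).
    Needs at least three effect sizes with differing standard errors.
    """
    n = len(effects)
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardised effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + x_bar**2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical data in which small studies report larger effects.
intercept, slope, se_int = egger_regression(
    [0.9, 0.6, 0.4, 0.3], [0.40, 0.25, 0.15, 0.10]
)
print(round(intercept, 3), round(slope, 3), round(se_int, 3))
```

Dividing the intercept by its standard error gives a t statistic with n - 2 degrees of freedom.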

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.