Abstract Symbionts that distort their host's sex ratio by favouring the production and survival of females are common in arthropods. Their presence produces intense Fisherian selection to return the sex ratio to parity, typified by the rapid spread of host ‘suppressor’ loci that restore male survival/development. In this study, we investigated the genomic impact of a selective event of this kind in the butterfly Hypolimnas bolina. Through linkage mapping, we first identified a genomic region that was necessary for males to survive Wolbachia-induced male-killing. We then investigated the genomic impact of the rapid spread of suppression, which converted the Samoan population of this butterfly from a 100∶1 female-biased sex ratio in 2001 to a 1∶1 sex ratio by 2006. Models of this process revealed the potential for a chromosome-wide effect. To measure the impact of this episode of selection directly, the pattern of genetic variation before and after the spread of suppression was compared. Changes in allele frequencies were observed over a 25 cM region surrounding the suppressor locus, with a reduction in overall diversity observed at loci that co-segregate with the suppressor. These changes exceeded those expected from drift and occurred alongside the generation of linkage disequilibrium. The presence of novel allelic variants in 2006 suggests that the suppressor was likely to have been introduced via immigration rather than through de novo mutation. In addition, further sampling in 2010 indicated that many of the introduced variants were lost or had declined in frequency since 2006. We hypothesize that this loss may have resulted from a period of purifying selection, removing deleterious material that introgressed during the initial sweep. Our observations of the impact of suppression of sex ratio distorting activity reveal a very wide genomic imprint, reflecting its status as one of the strongest selective forces in nature.

Author Summary The sex ratio of the offspring produced by an individual can be an evolutionary battleground. In many arthropod species, maternally inherited microbes selectively kill male hosts, and the host may in turn evolve strategies to restore the production or survival of males. When males are rare, the intensity of selection on the host may be extreme. We recently observed one such episode, in which the population sex ratio of the butterfly Hypolimnas bolina shifted from 100 females per male to near parity, through the evolution of a suppressor gene. In our current study, we investigate the hypothesis that the strength of selection in this case was so strong that the genomic impact would go well beyond the suppressor gene itself. After mapping the location of the suppressor within the genome of H. bolina, we examined changes in genetic variation at sites on the same chromosome as the suppressor. We show that a broad region of the genome was affected by the spread of the suppressor. Our data also suggest that the selection may have been sufficiently strong to introduce deleterious material into the population, which was later purged by selection.

Citation: Hornett EA, Moran B, Reynolds LA, Charlat S, Tazzyman S, Wedell N, et al. (2014) The Evolution of Sex Ratio Distorter Suppression Affects a 25 cM Genomic Region in the Butterfly Hypolimnas bolina. PLoS Genet 10(12): e1004822. https://doi.org/10.1371/journal.pgen.1004822 Editor: Harmit S. Malik, Fred Hutchinson Cancer Research Center, United States of America Received: April 24, 2014; Accepted: October 15, 2014; Published: December 4, 2014 Copyright: © 2014 Hornett et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files, or are available from the Genbank database (accession numbers KM463601-KM463757) and the NCBI SRA database (accession number SRP045735). Funding: This work was supported by grants from the Natural Environment Research Council (NERC) (NE/E003095/1) to GDDH, CDJ & NW (http://www.nerc.ac.uk), and a NERC NBAF grant (MGF/219 NBAF-L) to GDDH, CDJ, EAH and NW (http://nbaf.nerc.ac.uk). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction In 1930, Fisher noted that the strength of selection on the sex ratio was frequency dependent, echoing earlier findings of Düsing [1], [2]. As a well-mixed outbreeding population progressively deviates from a 1∶1 sex ratio, selection on individuals to restore the sex ratio to parity becomes stronger. In natural animal populations, a common cause of population sex ratio skew is the presence of sex ratio distorting elements, in the form of either sex chromosome meiotic drive [3], or cytoplasmic symbionts [4]. In some cases, these elements can reach very high prevalence, distorting population sex ratios to as much as 100 females per male [5], and producing intense selection for restoration of the individual sex ratio to 1 female per male. The most common consequence of this selection pressure is the evolution of systems of suppression – host genetic variants that prevent the sex ratio distorting activity from occurring. Suppressor factors are known for a wide range of cytoplasmic symbionts and meiotic drive elements [3], [6], [7]. The evolution of suppression of Wolbachia induced male-killing activity in the butterfly Hypolimnas bolina represents a compelling observation of intense natural selection in the wild. Female H. bolina can carry a maternally inherited Wolbachia symbiont, wBol1, which kills male hosts as embryos [8]. The species also carries an uncharacterised dominant, zygotically acting suppression system that allows males to survive infection [6]. Written records and analysis of museum specimens indicate this symbiont was historically present, and active as a male-killer, across much of the species range, from Hong Kong and Borneo through to Fiji, Samoa and parts of French Polynesia [9]. Evidence from museum specimens also indicates that host suppression of male-killing had a very restricted incidence in the late 19th century, with infected male hosts (the hallmark of suppression) being found in the Philippines but not in other localities tested. By the late 20th century, suppression of male-killing was found throughout SE Asia, but not in Polynesian populations where the male-killing phenotype remained active [10]. The most extreme population was that of Samoa, where 99% of female H. bolina were infected with male-killing Wolbachia, resulting in a sex ratio of around 100 females per male within the population [5]. However, following over 100 years of stasis on Samoa, the rapid spread of suppression of male-killing activity of the bacterium was finally observed between 2001 and 2006, restoring both individual and population sex ratio to parity [11]. When strong selection occurs at a locus, it is expected to leave a genomic imprint beyond the target of selection, as a result of genetic hitch-hiking. A neutral (or even deleterious) variant that is initially present in the haplotype in which the favoured allele arose (i.e. is linked to the site of selection), will also increase in frequency [12]. When selection is very strong, the frequency of linked variants may increase across a broad genomic region [13]. Importantly, the extent of the chromosome over which this effect will occur depends on the selection pressure in the first few generations; before recombination has broken down associations between the target of selection and linked variants. Where sex ratio distorters are common, the selection pressure in these first generations may be very strong indeed (before the sex ratio becomes less biased through spread of the suppressor). It is thus likely that selection on the sex ratio will influence linked material over a broader genomic region compared to many other selective regimes. That is, the episode of selection is likely to have a very wide genomic impact. In this paper, we first mapped a genomic region in SE Asian butterflies that was required for male survival in the presence of Wolbachia. We then investigated the impact of the recent spread of the suppressor in Samoa on the pattern of variation around this region. To this end, we initially developed theory to predict the impact of suppressor spread on linked genetic variation. We then directly observed changes in the frequency of genetic variants surrounding the suppressor locus by comparing the pattern of genetic variation in H. bolina specimens collected in Samoa before (2001) and after the selective sweep (2006 and 2010). By examining post-sweep samples at two time points we were additionally able to track allele frequency changes following the initial sweep. The data revealed changes in the pattern of genetic variation over a 25 cM region surrounding the suppressor locus. We further suggest that the suppressor was probably derived by immigration, and that the sweep may have introduced deleterious material that was subsequently subject to purifying selection.

Discussion Our data identified a 10 cM genomic region on chromosome 25 of SE Asian H. bolina that was necessary for a male butterfly to survive Wolbachia induced male-killing. This region was also a focus of selection during the spread of suppression of male-killing between 2001 and 2006 in Samoa. During this episode, patterns of allelic variation were observed to be altered over a 25 cM region of chromosome 25, with increases in frequency of one allele at each locus creating the vast majority of heterogeneity between time points. The largest magnitude of change occurred in markers that co-segregated with suppression in SE Asia, and in this region the overall genetic diversity (as measured by A E and π) declined - the classical signature of a selective sweep. Three further features implicate the role of selection in altering allele frequency across this 25 cM region. First, the changes in allele frequency are too large to be accounted for by drift, even under conservative assumptions for population size and generation time. Second, LD is generated across this region, as predicted under a model of strong selection. Third, 9 markers unlinked to the suppressor linkage group showed no evidence of changes in the frequency of allele variants between 2001 and 2006, implying that demographic factors were not the major force driving changes in allele frequency. While we observed changes consistent with the operation of selection over a very broad genomic area, the degree of change was less than that predicted from our model. This is true both of the magnitude of allele frequency change at loci located near the suppressor locus, and the breadth of the region of chromosome over which changes in allele frequency occurred. Our model, which presumes a panmictic model and no cost to carrying the suppressor, predicts the suppressor should fix (and take alleles within 5 cM distance to frequency in excess of 87%), and that allele frequency changes should be observed chromosome-wide. In contrast, the swept allele at locus D (which lies within 5 cM of the target of selection) attains a frequency of just 0.67 (n = 172, CI 0.59–0.74) in 2006 and 2010 samples. Further, we observed only very small changes in allele frequency at the most distant locus from the region containing the suppressor, locus L. We suggest there are three non-mutually exclusive explanations for this lack of fit with the model. First, the suppressor mutation in natural populations diffuses spatially following its initial arrival, and each generation of spatial diffusion is associated with a narrower local sweep. The principle impact of spatial diffusion will be to narrow the genomic region that is affected by selection compared to that predicted in a panmictic model, and to reduce the magnitude of change at loci far from the target of selection. For a locus 25 cM distant from the suppressor, association with the suppressor allele may last just one or two generations, such that changes in allele frequency occur only near the point of origin, and are diluted by absence of any selection on these loci in the majority of the species range. However, spatial diffusion represents a poor explanation for the lower than expected frequency of variants at tightly linked loci post-sweep, which are expected to maintain strong association during the spread of the suppressor across the island, as this occurs in about 10 generations. A second possibility is the involvement of other loci in the genome, as enhancers of suppressor action. Our data indicate that the genomic region in question is necessary for male survival, but do not rule out involvement of other loci in enhancing suppression. If other loci are involved, either as required elements or enhancers of male survival, this would slow suppressor spread, and might account for the narrowness of the sweep observed compared to model predictions. However, the requirement of the genomic region for male survival in the presence of Wolbachia again makes this a poor explanation for the failure of tightly linked loci to reach high frequency. Because it is necessary, it should become fixed in the population, and closely associated allele variants should in consequence attain very high frequency. A third possibility is that there is a cost to being homozygous for the suppressor mutation, either in both sexes, or in female hosts only. A cost such as this could prevent fixation of the suppressor allele, and thus also help account for the decreased magnitude of effect at loci tightly linked to the suppressor. If the suppressor allele reaches >0.75 frequency, then males lacking the suppressor would be sufficiently rare that the population sex ratio would be near parity. Biologically, costs to suppressor carriage may be directly associated with the suppression system itself. Modification of a sex determination gene, for instance, might rescue males but be deleterious in females, or when homozygous. Alternatively, costs may be associated with linked mutations. The presence of deleterious loci in linkage with the suppressor is supported by our observation that some material that had been initially swept into the population was lost between 2006 and 2010. Finer-scale investigation of this linkage group, especially within the region identified as required for male survival, is necessary to illuminate the precise dynamics that occurred during this episode of selection. In our data, we observed concordance between the position of the suppressor ascertained in SE Asian butterflies, and the genomic region subject to selection during spread of suppression through the Samoan population of the butterfly. This observation has two possible interpretations. First, the suppressor mutation may have been introduced into Samoa by migration. Given that the suppressor is absent in the nearest island groups, American Samoa and Fiji, suppressor introduction would be associated with a long distant migrant. Second, the genomic region identified here may represent a hotspot for suppressor mutation, derived independently in Samoa by de novo mutation. This may be an identical mutation to that found in SE Asia, or an alternative mutation in the same gene, which still confers suppression. Alternatively, there may be a suppression-conferring mutation in a different gene within the region identified as containing the suppressor. The presence of novel swept alleles at loci linked to the suppressor indicates that migration is the most parsimonious explanation for suppressor origin. Variants not present in the 2001 sample were observed to be the main ‘swept’ allele at 4 of the 11 loci at which significant change was detected (indicated with green arrows in Fig. 3). At two of these loci (A & I), the invading allele was defined by a single nucleotide polymorphism (SNP) being absent from the 2001 sample, whereas the other two alleles represented different combinations of existing SNPs. The four loci were in three genomic locations spaced over 17 cM and showed no evidence of linkage disequilbrium in the 2001 pre-sweep sample, and thus they can be treated as independent from each other (Fig. 5). They therefore support (but do not definitely prove) a migratory origin. None of the loci tested in this study are likely to be the suppressor locus itself (markers were selected that spanned chromosome 25 and had conserved exon sequence – with several being housekeeping genes). Future research should aim to establish the actual nature of the suppressor mutation in both Samoa and SE Asia through fine-scale genetic mapping. Such a project will allow the source of suppression on Samoa (migration or in situ mutation) to be clarified. Beyond this, it will reveal the actual target of selection in this system. It has been widely conjectured that the evolution of sex determination systems might occur in response to the presence of sex ratio distorting microbes [20]. It is notable that a strong candidate gene – doublesex – resides within the equivalent genomic area in Bombyx mori, and with conservation of synteny being profound in Lepidoptera, is likely to reside in this area in Hypolimnas. Doublesex represents a tempting candidate as it is known that splicing of this gene is altered in the presence of male-killing Wolbachia in another lepidopteran, the moth Ostrinia scapulalis [21].

Materials and Methods Developing a genome-wide marker set for H. bolina We utilized high-throughput sequencing of the transcriptome of H. bolina to obtain coding sequence from multiple loci across the genome. Following total RNA extraction from 1 male and 1 female adult H. bolina, mRNA library construction and sequencing using the Roche 454 sequencing platform (http://www.454.com), 450 bp reads were de novo assembled into contigs using the Newbler assembler to create the first set of Expressed Sequence Tags (EST) for H. bolina. The trimmed reads have been deposited as one male, and one female, partial transcriptome datasets in the NCBI SRA database, accession SRP045735. In the absence of any annotated genome or transcriptome for H. bolina, the moth Bombyx mori was used as a proxy reference genome, this being the only available resource for Lepidoptera at the time of the study. There is a high level of synteny of gene location in the Lepidoptera [22] allowing a targeted gene approach, in which several genes could be selected from each chromosome across the genome. Coding sequence of highly conserved genes such as ribosomal proteins and housekeeping genes from B. mori were initially targeted and then retrieved from NCBI (http://www.ncbi.nlm.nih.gov). To determine putative H. bolina orthologs a local tBLASTx was then performed against the H. bolina EST set. Only genes that returned a single tBLASTx hit were included, reducing the likelihood of the inclusion of paralogs in our marker set. The orthologous H. bolina contigs were then translated into amino acid sequences using the ExPASY online tool (http://web.expasy.org/translate), with the sequence lacking mid-sequence stop codons chosen as the most likely translation. In a final test for paralogs, a reciprocal BLAST was performed of coding sequence from the orthologous H. bolina contigs as queries against the B. mori genome using the INPARANOID8 search tool ([23]; http://inparanoid.sbc.su.se/). Where present, intronic regions were targeted for marker development, as they are likely to have a higher degree of nucleotide diversity. Again, conservation of synteny in Lepidoptera genome organisation allowed the intron/exon boundaries in H. bolina genes to be inferred using the B. mori genome. Through tBLASTx analysis of the B. mori coding sequence of the targeted gene against the B. mori WGS (Whole Genome Shotgun contigs) database in NCBI, exonic regions were identified (as only these regions will align). The translated orthologous H. bolina contig and the corresponding B. mori amino acid sequence were aligned using ClustalW [24] and the position of the intron/exon boundaries subsequently located. Once intron/exon boundaries were identified in B. mori genes, and extrapolated to the H. bolina orthologous sequences, primers were designed for H. bolina that spanned introns of size 500–1000 bp (Bombyx size approximation). This size range was chosen to enable successful amplification of the intronic region during PCR. Marker optimisation was performed using three test H. bolina samples and successful PCR products were sequenced using Sanger technology. Mapping the suppressor linkage group In order to investigate the genetic architecture of male-killing suppression in H. bolina and determine markers in linkage with the suppressor locus, we crossed females of a butterfly population (the Philippines) that were Wolbachia-infected and homozygous for the male-killing suppressor allele (SS) to males from a Wolbachia-infected population (Moorea, French Polynesia) that lacked the suppressor (ss), to create suppressor-heterozygous Wolbachia-infected offspring (Ss) (S1 Figure). Recombination does not occur during female meiosis in the Lepidoptera [25], permitting the progeny of Ss females to be used to identify the linkage group (SLG, Suppressor Linkage group) in which the dominant suppressor allele was carried. To this end, Ss females were crossed with ss males to produce the female-informative families. For inclusion in the SLG, markers linked to the suppressor locus are characterized by being present in all surviving sons of the Ss heterozygous mother, rather than the 50% expectation from Mendelian segregation with random survival. Initially each marker was sequenced in the F1 parents (Ss female×ss male). In each case, SNPs were chosen that were heterozygous in the female and homozygous in the male – following the presumed pattern of the suppressor. These same SNPs were then scored in 16 male and 8 female F2 progeny. Once a marker had been found that was present in half of the daughters (following Mendelian inheritance) but all of the sons (for a son to survive it must have at least one copy of the suppressor, and hence linked marker allele), further markers were developed for that same chromosome based on synteny with B. mori. A final suite of 12 markers that produced clean sequence and that spanned the suppressor-associated chromosome were developed to form the SLG. Recombination does occur in male H. bolina, and thus crosses of Ss males to ss females (the male-informative families) allow a) mapping of genetic markers within a chromosome relative to each other and b) mapping of the suppressor within the linkage group, in terms of a region of the chromosome that is always present in surviving sons. To this end, the 12 linked markers were sequenced in the female F2 (n = 307) from one male informative cross (Ss male×ss female) and a linkage map created using JoinMap (version 3.0; Haldane mapping function) [26]. To place the suppressor locus within the map F2 males (n = 60) from this cross were analysed using the same 12 markers. Absence of recombinants in a core subset of markers, flanked by markers with an increasing numbers of recombinants, indicated the position of the suppressor locus (Fig. 1). Assessing the effect of suppressor spread on genomic variation A population sample of butterflies from three time points (2001: n = 48, 2006: n = 48, 2010: n = 46) were collected from the Samoan island of Upolu. For each individual, DNA was extracted using the Qiagen DNeasy kit (www.qiagen.com), and the suite of 12 suppressor-linked markers amplified using PCR. Following Sanger sequencing of the amplicons through both strands, the resultant marker sequences were alignment in Codoncode (www.codoncode.com/). SNPs present within and between the population samples were then identified and scored for each individual butterfly. Using the SNP data (given in S7 Table), the alleles present at each marker in each population sample were estimated using the haplotype reconstruction software PHASE (version 2.1 [27], [28]) with 1000 iterations, a thinning interval of 100 and 1000 burn-in iterations. Allele frequencies at each marker for each time group could then be calculated and compared. Output was also examined by eye, with alleles identified first where there was no ambiguity (either homozygous, or a SNP separating into two defined alleles). Thereafter, alleles were assumed identical to those already identified where possible. The low allele diversity meant this visual analysis produced very similar result to PHASE output, which can thus be considered robust. Patterns of genetic differentiation were estimated using GENEPOP [29]. First, heterogeneity of allele frequency distributions between pairs of time points was estimated using a G test based on allele frequency distribution. Where allele distributions were heterogeneous, we ascertained the allele whose frequency change made the greatest contribution to heterogeneity as that with the largest standardized residual within the heterogeneity test [18]. This allele was then removed (it was an allele increasing in frequency in each case), and the data retested to ascertain if the population samples were then homogeneous, or whether there was evidence for a second allele that changed in frequency (a second allele was identified in three cases). We additionally used F ST standardized population genetic differentiation to quantify the magnitude of change between allele frequency distributions between the two samples. In each case, the rare individuals where sequence could not be obtained for particular alleles, or not inferred accurately, were coded as missing information. DNA polymorphism statistics and estimates of nucleotide diversity (number of segregating sites, number of haplotypes, pi, theta, the average number of nucleotide sequences (k), Tajima's D, haplotype diversity (Hd)) for each marker for each time point were conducted in DnaSP (version 5) [30]. These statistics were estimated using sequence data excluding gaps i.e. indel mutations were not used (present in 8 of 9 unlinked markers). Nine unlinked markers, from 8 different chromosomes, were also sequenced for the 2001 and 2006 population samples to investigate the degree to which changes were observed in the wider genome and as a control for demographic effects. These were tested for the presence of heterogeneity between time points using a G test based on allele frequency distributions, for differentiation using the F ST statistic, and several polymorphism statistics as described above for the SLG markers. We additionally analysed evidence for alteration in the pattern of linkage disequilibrium, again using GENEPOP. The significance of LD between all possible combinations of loci was tested in the 2001 and 2006 samples separately. We do not report the magnitude of LD, as this is not a standardized measure, being dependent on the allele frequency distribution at each locus.

Acknowledgments We would like to thank Luana von Reiche for continued access to her land in Samoa, and the Government of Samoa, Ministry of Natural Resources and Environment, for permission to study and collect in Samoa. We also wish to thank Deborah Charlesworth, Gilean McVean, Pascal Campagne and the Evolution and Ecology of Infectious Disease group at the University of Liverpool, UK, for comments on the manuscript. Finally, we would like to thank three anonymous reviewers for their patient and scholarly comments that greatly improved the quality of analysis and clarity of our argument.

Author Contributions Conceived and designed the experiments: GDDH EAH NW SC CDJ. Performed the experiments: EAH BM LAR ST. Analyzed the data: EAH GH CDJ ST. Contributed reagents/materials/analysis tools: EAH GH SC CDJ ST. Wrote the paper: EAH BM LAR SC ST NW CDJ GDDH.