Abstract Hybrid incompatibilities play a critical role in the evolution and maintenance of species. We have discovered a simple genetic incompatibility that causes lethality in hybrids between two closely related species of yellow monkeyflower (Mimulus guttatus and M. nasutus). This hybrid incompatibility, which causes one sixteenth of F 2 hybrid seedlings to lack chlorophyll and die shortly after germination, occurs between sympatric populations that are connected by ongoing interspecific gene flow. Using complimentary genetic mapping and gene expression analyses, we show that lethality occurs in hybrids that lack a functional copy of the critical photosynthetic gene pTAC14. In M. guttatus, this gene was duplicated, but the ancestral copy is no longer expressed. In M. nasutus, the duplication is missing altogether. As a result, hybrids die when they are homozygous for the nonfunctional M. guttatus copy and missing the duplicate from M. nasutus, apparently due to misregulated transcription of key photosynthetic genes. Our study indicates that neutral evolutionary processes may play an important role in the evolution of hybrid incompatibilities and opens the door to direct investigations of their contribution to reproductive isolation among naturally hybridizing species.

Author summary The evolution of hybrid incompatibilities (gene interactions that cause hybrids to be sterile or inviable) is a common outcome of genomic divergence between lineages. However, evaluating the importance of hybrid incompatibilities for speciation requires that we identify the causal genes and evolutionary forces in recently diverged, wild species. We discovered that hybrid seedling lethality between two closely related sister species of yellow monkeyflower is caused by duplicate copies of a gene critical for chloroplast development. Because each lineage carries its one functional gene copy in a distinct genomic location, some hybrids inherit only inactive (or missing) alleles. We infer that hybrid lethality in this young species pair has arisen through divergent resolution of gene duplicates by degenerative mutations and (likely) genetic drift. These findings are an important step toward understanding the evolutionary dynamics of hybrid incompatibility genes in nature, as well as the role of such genes in species divergence.

Citation: Zuellig MP, Sweigart AL (2018) Gene duplicates cause hybrid lethality between sympatric species of Mimulus. PLoS Genet 14(4): e1007130. https://doi.org/10.1371/journal.pgen.1007130 Editor: Harmit S. Malik, Fred Hutchinson Cancer Research Center, UNITED STATES Received: October 10, 2017; Accepted: November 27, 2017; Published: April 12, 2018 Copyright: © 2018 Zuellig, Sweigart. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The genomic and transcript sequences used in our bulked segregant and RNAseq experiments are available under the GenBank accession SRP127135. Funding: This work was supported by an NIH Training Grant Fellowship T32 GM007103 to MPZ and a National Science Foundation (NSF) grant DEB-1350935 to ALS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Across diverse taxa, hybrid incompatibilities arise as a byproduct of genetic divergence among incipient species. The basic genetic underpinnings of this process are well understood: two or more mutational differences between species interact epistatically to cause hybrid inviability or sterility [1–3]. However, what is less clear, and often very challenging to uncover, is the nature of the molecular changes and evolutionary forces that lead to hybrid incompatibilities. What sort of mutations are perfectly functional within species but cause reproductive failure or death in hybrids? Do such mutations accumulate within species by neutral processes or are they positively selected, perhaps providing an ecological advantage or resolving an intragenomic conflict? Addressing the first of these questions is most straightforward in systems with well-developed genetic tools that facilitate positional cloning, which explains why most progress has been made in traditional models like Drosophila, Arabidopsis, and rice. However, insight into the evolutionary forces acting on hybrid incompatibilities during the speciation process requires a focus on young species pairs with natural populations. Over the past two decades, genetic dissection of diverse incompatibilities has provided some hints about their evolutionary origins (reviewed in [4–7]). Often, hybrid incompatibility genes show molecular signatures of positive selection [8–13], and there is suggestive evidence that incompatibility alleles can arise through ecological adaptation [14–16] or recurrent bouts of intragenomic conflict [11, 17–19]. On the other hand, there is also evidence from a handful of cases, all involving gene duplicates, that the evolution of hybrid dysfunction need not involve natural selection [20–22]. The idea that gene duplication might play a key role in hybrid incompatibilities was initially proposed by Muller as a variant of his original model [3]. He explained how gene duplication, followed by degenerative mutations and divergent copy loss, could lead to a difference in gene position between species with missing (or inactive) copies acting as recessive incompatibility alleles [3]. This same scenario was emphasized later as an explanation for defects in pollen development between subspecies of Asian cultivated rice, Oryza sativa [23], and more recently, as a general mechanism of hybrid breakdown via neutral processes [24, 25]. There have now been three empirical demonstrations of gene duplication/transposition giving rise to interspecific hybrid male sterility; one case involves Drosophila melanogaster-D. simulans hybrids [26] and the other two arise from crosses between O. sativa and wild species [21, 22]. Gene duplication also causes lethal and sterile combinations that segregate within Arabidopsis thaliana [27, 28] and O. sativa [20]. However, it is not yet clear whether divergent resolution of gene duplicates contributes to hybrid incompatibilities between wild species in the early stages of divergence. Only by identifying examples in young species pairs, particularly those with sympatric populations and still connected by some degree of gene flow, will it be possible to evaluate the contribution of such loci to speciation. In this study we investigate the molecular genetic basis of hybrid seedling lethality between two closely related sister species of yellow monkeyflower, Mimulus guttatus and M. nasutus. These recently diverged species (200-500kya; [29]) co-occur throughout much of their shared range in western North America, where reproductive isolation between sympatric populations occurs through a number of prezygotic [30–33] and postzygotic barriers [34–40]. Despite substantial reproductive isolation, patterns of shared variation across their genomes indicate historical and ongoing gene flow between the two species [29, 31, 41]. Here we focus on sympatric populations of M. guttatus and M. nasutus located at Don Pedro Reservoir (DPR) in central California, where both species coexist within centimeters of one another. Species at DPR are strongly isolated by divergence in flowering time and mating system [33]; nevertheless, studies have shown low levels of hybridization [33] and a clear signal of introgression [29]. Using high-resolution genetic mapping and genome-wide expression analyses we identify a duplicate gene pair as the cause of Mimulus hybrid lethality. As the first case of hybrid incompatibility genes identified between naturally hybridizing species, this study opens the door to direct investigations of their evolutionary dynamics and contribution to reproductive isolation.

Discussion Identifying the molecular genetic basis of hybrid incompatibilities between recently diverged, wild species is a critical first step toward understanding their evolutionary origins and role in speciation. We have shown that duplicate copies of Mimulus pTAC14, a gene critical for chloroplast development in A. thaliana [42], causes hybrid lethality between sympatric M. guttatus and M. nasutus at the DPR site. We fine-mapped hybrid lethality to hl13 and hl14, two small nuclear genomic regions on chromosomes 13 and 14. In DPR102-gutt (M. guttatus), pTAC14 is present in each of these genomic intervals, but only the hl14 copy is expressed. In DPR104-nas (M. nasutus), pTAC14 is present only in the hl13 interval, consistent with either of two possibilities: the hl13 copy is ancestral and this line lacks the duplication, or a large deletion has removed all trace of the gene from hl14. As a consequence of divergent resolution of these duplicate genes, F2 hybrids that are homozygous for DPR102-gutt alleles at hl13 and homozygous for DPRG102-nas alleles at hl14 contain no functional copy of Mimulus pTAC14. These hybrids fail to produce chlorophyll and die in the cotyledon stage of development, remarkably similar to what is observed in pTAC14 knockouts in A. thaliana [42]. To our knowledge, this is the first pair of hybrid incompatibility genes identified between naturally hybridizing species. Using complementary genetic mapping and functional genomics approaches, our study provides strong evidence that nonfunctional Mimulus pTAC14 is the cause of hybrid lethality between DPR M. guttatus and M.nasutus. But what causes the lack of pTAC14 expression in white hybrid seedlings? In M. nasutus (DPR104-nas), because pTAC14 is missing entirely from the hl14 interval, the hl13 copy (Mn.pTAC14_1) is the only one expressed. In M. guttatus (DPR102-gutt), the situation is less clear. Although both copies (Mg.pTAC14_1 at hl13 and Mg.pTAC14_2 at hl14) are present and highly similar in exons (S3 Fig), our qPCR and RNAseq experiments demonstrate that only one of them–Mg.pTAC14_2 –is expressed. Further work will be required to determine the molecular nature of this change in gene expression. The most obvious possibility is that non-sense mediated decay has efficiently targeted Mg.pTAC14_1, which carries a series of premature stop codons. Another possibility, is that a cis-regulatory mutation disrupts Mg.pTAC14_1 transcription in DPR102-gutt. Alternatively, expression might be prevented by the epigenetic silencing of one duplicate by the other, as was recently shown for sterile and lethal combinations segregating within A. thaliana [28, 51]. Whatever its cause, disrupted expression is not the only problem with DPR102-gutt Mg.pTAC14_1; this gene copy also carries a 1-bp insertion that, if transcribed, would result in a truncated, and potentially nonfunctional, protein. We do not yet know which of these two functional changes to DPR102-gutt Mg.pTAC14_1 arose first. The evolution of hybrid lethality in this system thus appears entirely consistent with a scenario of duplication and neutral non-functionalization within M. guttatus. Given the ubiquity of gene duplications in plant and animal genomes, divergent resolution of paralogs due to degenerative mutation and genetic drift has been proposed as a major source of hybrid incompatibilities [24, 25, 52]. Although initially redundant duplicate genes might sometimes evolve new or partial functions favored by selection [53], our study and others suggest that duplicates involved in hybrid incompatibilities are more often subject to mutations that disable function in one copy. Within A. thaliana and between closely related Oryza species, divergent resolution of duplicates has occurred through nonsense mutations [20, 22] and disruptions to expression [21, 27, 51]. In a more distantly related species pair of Drosophila, hybrid sterility is caused by a gene transposition, with degenerative mutations having presumably removed any remnant of the duplication that likely preceded its evolution [26]. Remarkably, then, to explain the evolution of hybrid dysfunction in Mimulus and several other diverse systems, there is no need to invoke processes beyond mutation and genetic drift. In addition to showing that Mimulus hybrid lethality is due to nonfunctional pTAC14, our analyses have begun to provide some insight into the duplication history of this gene. As might be expected, within M. guttatus (DPR102-gutt and IM62), pTAC14 copies on chromosome 13 are most related and pTAC14 copies on chromosome 14 are most related (Fig 4). However, somewhat counterintuitively, pTAC14 from DPR104-nas, which is located on chromosome 13, is most closely related to the M. guttatus copies on chromosome 14. We interpret this finding, along with the fact that we find no trace of pTAC14_2 at hl14 in DPR104-nas, as evidence that Mimulus pTAC14_1 on chromosome 13 is the ancestral copy. Under this scenario, both the duplicate copy on chromosome 14 (Mg.pTAC14_2) and the M. nasutus copy on chromosome 13 (Mn.pTAC14_1) would have arisen from a similar genetic variant (Fig 7). Standing genetic variation within and between populations of M. guttatus is high [29, 54–57], so it is likely this ancestral populations carried multiple variants of pTAC14_1. Both the duplicated copy in M. guttatus (Mg.pTAC14_2) and the ancestral copy in the selfing M. nasutus (Mn.pTAC14_1) would be expected to carry only a small subset of ancestral variation. Unfortunately, we have not yet been able to assess molecular patterns of Mimulus pTAC14 variation in a wider sample of M. guttatus and M. nasutus. Although whole genome resequence data are available from a number of lines [29, 54, 57], short-read sequences of Mimulus pTAC14 align equally well to both annotated copies in the IM62 reference genome (Migut.M02023 and Migut.O00467). Once Mimulus pTAC14 is sequenced from a broader sample of individuals, we speculate that Mn.pTAC14_1 from M. nasutus and Mg.pTAC14_2 from M. guttatus will be included in distinct monophyletic groups nested within the greater diversity of sequences present at the ancestral Mg.pTAC14_1 from M. guttatus. Interestingly, white seedlings are often observed segregating at low frequencies within M. guttatus populations, which manifest as epistatic inbreeding depression [58, 59] that may be due to divergent resolution of duplicate genes similar to the ones characterized here. However, future studies investigating natural variation for functional and non-functional pTAC14 alleles within species will be required to determine whether the hl13-hl14 incompatibility plays a role in such phenotypes outside of the DPR population. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Fig 7. Hypothetical model for the evolution of pTAC14. Prior to gene duplication the ancestral M. guttatus-like population harbored genetic variation for pTAC14 at hl13 (different colored boxes represent genetic variation at pTAC14). During speciation, M. nasutus acquired a copy of pTAC14 harboring ‘blue’ variation (left). Similarly, the M. guttatus-specific duplication involved a closely related ‘blue’ pTAC14 variant. Contemporary variation within M. nasutus and pTAC14 variants located at hl14 resemble one another genetically due to common ancestry, while pTAC14 variants at hl13 in M. guttatus likely continue to harbor considerable genetic variation, including variants that contain functional (IM62) and non-functional (DPR102-gutt) variants. https://doi.org/10.1371/journal.pgen.1007130.g007 Our study provides the first detailed study of hybrid incompatibility genes from naturally hybridizing species and contributes to a growing body of literature that suggests hybrid incompatibilities might result from neutral evolutionary change within species. Going forward, it will be important to address whether these barriers can persist in the face of ongoing gene flow. Theoretical treatments of this question have consistently concluded that the maintenance of hybrid incompatibility alleles between hybridizing populations relies heavily on a selective advantage within species [60–64]. If so, neutrally evolving hybrid incompatibility alleles might be precluded from affecting reproductive isolation in any more than a transient fashion, with gene flow temporarily constrained until the hybrid incompatibility degrades with time. Nevertheless, other factors such as strong linkage to selected alleles (e.g., [16]) and constraints on gene dosage (e.g., [65]) may play an important role in such incompatibilities. By showing that the duplication of pTAC14 underlies hybrid lethality among sympatric Mimulus species, we now have a natural system in place to test broader questions regarding the evolutionary significance of hybrid incompatibilities in speciation.

Materials and methods Mimulus lines and genetic crosses In this study, we used inbred lines of M. guttatus and M. nasutus derived from wild individuals collected from Don Pedro Reservoir (DPR) in central California [33]. Wild-collected seed was sown on moist Fafard 3-B potting soil in 2.5” pots, cold-stratified in the dark at 4C for two weeks, and moved to the UGA greenhouses to germinate under 16 hour days at 23°C (growth conditions were constant across all experiments with the exception of plants used in RNAseq, which were grown in a growth chamber). Because M. nasutus is a selfer, the DPR104-nas line used in this study was naturally inbred. To generate the DPR102-gutt inbred line, we self-fertilized M. guttatus for three generations, which is expected to result in a genome that is 87.5% homozygous. The DPR102-gutt and DPR104-nas inbred lines were intercrossed to generate reciprocal F1 and F2 hybrids (maternal parent always listed first in crosses). To ensure that both inbred lines were homozygous at hl13 and hl14, we analyzed white:green seedling ratios in 32 F2 families (generated from distinct F1 hybrids, 16 from each cross direction); indeed, all F2 families segregated white seedlings at a frequency of one sixteenth, confirming homozygosity at both hybrid lethality loci. Molecular analyses and whole genome sequencing DNA was extracted from seedlings and adult leaf tissue using a standard CTAB-chloroform protocol [66] modified for use in a 96-well format. Genotyping was performed using a combination of exon-primed intron-spanning size polymorphic markers containing 5’ fluorescent tags (6-FAM or HEX) and SNP-containing gene fragments that were analyzed through Sanger sequencing. We designed size-polymorphic and SNP markers from polymorphisms observed in whole genome re-sequence data of multiple lines of M. guttatus and M. nasutus [29, 67, 68] and confirmed polymorphisms by genotyping parental lines used in our study. A standard touchdown PCR protocol was used in all amplifications and Sanger sequencing reactions were prepared using BigDye v3.1 mastermix (Applied Biosystems, Foster City, USA). Genotyping and Sanger sequencing reactions were run on an ABI3730XL automated DNA sequencer at the Georgia Genomics Facility and analyzed using GENEMARKER [69] and Sequencher (Gene Codes Corporation, Ann Arbor, USA) software, respectively. For whole genome sequencing of bulked segregants, we generated equimolar amounts of DNA from green (N = 26) and white hybrid seedlings (N = 34). Green and white DNA was pooled separately and sent to the Duke Center for Genomic and Computational Biology, where Illumina libraries with unique barcodes were prepared and sequenced using the Illumina Hi-seq platform (100bp single-end reads). Reads from both pools were aligned to the M. guttatus (IM62) reference genome (https://phytozome.jgi.doe.gov), along with previously generated whole genome re-sequence data for DPR104-nas [29]. Reads were aligned using Burrows-Wheeler Aligner (bwa, [70]) with a minimum alignment quality threshold of Q29 (filtering done with samtools, [71]). We identified 235,922 SNPs that differentiated the IM62 reference genome from DPR104-nas using the samtools mpileup function, which provided a list of SNPs that differentiate these two lineages. We used the samtools mpileup function to estimate the frequency of each SNP (‘alternate allele frequency’) within white and green BSA pools. Since SNPs were not based on differences between DPR104-nas and DPR102-gutt, our analysis assumes that M. guttatus lines IM62 and DPR102-gutt (which is not sequenced) share a common set of SNPs. qPCR and RNA sequencing To examine the potential for transcriptional misexpression associated with hybrid lethality, we performed quantitative PCR on a subset of strong candidate genes within hl13 and hl14 (9 genes total, S2 Table). We compared gene expression patterns in seedlings from DPR102-gutt, DPR104-nas, green F2s, and white F2s. We extracted RNA from pools of 10 seedlings for each genotypic class using a Zymo MicroRNA Kit (Zymo Research, Irvine, USA) followed by cDNA synthesis with GOscript Reverse Transcriptase (Promega, Madison, USA). We designed exon-specific primers to amplify fragments of each gene (S4 Table), amplified fragments using standard touchdown PCR, and visualized gene fragments on a 1% agarose gel. We performed an RNAseq experiment to compare genome-wide expression profiles between white and green seedlings. We used lines of DPR102-gutt and DPR104-nas that had been inbred for 5 generations and their green and white F2 progeny, which resulted in three classes of green seedlings (DPR102-gutt, DPR104-nas, and green F2s) and a single class of white seedlings (white F2s). Seedlings with fully expanded cotyledons began to emerge within 3 days and continued to emerge for a week thereafter. We collected pools of 10 seedlings from each biological class directly into 2mL Eppendorf tubes filled with liquid nitrogen. We then extracted RNA from these pools using the Zymo Quick-RNA microprep kit (Zymo Research, Irvine, USA) and estimated RNA concentration using a qubit fluorometer (Life Technologies, Paisley, UK). High quality RNA was subsequently submitted to the Duke Center for Genomic and Computational Biology, where Kapa Stranded mRNA-Seq libraries (Kapa Biosystems, Wilmington, USA) were prepared and samples were sequenced across a single lane of Illumina Hiseq 4000 with single-end 50 bp reads. In total, our analysis involved three replicates each of DPR102-gutt and DPR104 green seedlings, five replicates of green F2 seedlings, and six replicates of white F2 seedlings, where each replicate was a pool of 10 seedlings. We utilized the cufflinks pipeline [72] to assess patterns of differential expression among the four genotypic classes (DPR102-gutt, DPR104-nas, green F2s, and white F2s). We aligned trimmed and filtered reads (Q>20) to the M. guttatus IM62 reference genome in TopHat2 [73], which resulted in an average of 19 million reads aligned per biological replicate. We then assembled transcriptomes in cufflinks using default settings, the reference IM62 transcriptome to guide the assembly (-g), and allowing for both fragment bias (-b) and multi-read (-u) corrections. We used ‘cuffnorm’ to normalize transcript abundance for each genotypic class and ‘cuffdiff’ to calculate differential expression for all pairwise comparisons using default parameters for both program. For data management and sorting, we used Microsoft excel, the R statistical package [74], and the R package CummeRbund [72]. Gene ontology (GO) enrichment analyses were carried out for particular subsets of data that exhibited patterns of differential expression between white and green seedlings. To perform these analyses, we used GOstat [75] and GO::TermFinder [76] implemented in the Phytomine user interface (https://phytozome.jgi.doe.gov). For GO term analyses, we used all annotated genes in the v2.0 IM62 M. guttatus reference assembly to serve as the background population and used a Bonferroni cutoff value of 0.05 to test for significant GO term enrichment in our subset of differentially expressed genes. For our analysis of chloroplast-encoded genes, we generated a list of putative chloroplast genes, since no chloroplast genome assembly is currently available for M. guttatus. We generated this set by first downloading a list of 135 genes present in the chloroplast genome in A. thaliana from the TAIR database (www.arabidopsis.org). We used this list to identify M. guttatus homologs in the Phytomine database (https://phytozome.jgi.doe.gov). This approach yielded a set of 52 putative Mimulus chloroplast genes that are currently included (and, presumably, misassembled) in the nuclear genome (no homologs were identified for the other 83 genes used in our search). A substantial fraction of these genes (69%) occur along chromosome 4 from positions 6,719,000–7,985,375, which contains 183 genes total. Gene sequencing and phylogenetic analyses To obtain full-length pTAC14 sequences from DPR102-gutt and DPR104-nas, we amplified both genomic and cDNA using primers designed within conserved exonic sequence. PCR fragments were amplified using Phusion High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, USA) and either directly sequenced or sequenced after cloning into the TOPO TA Vector (Thermo Fisher, Carlsbad, USA). Additionally, we extracted full-length transcript sequences from RNAseq data by performing de novo assemblies on reads that mapped to candidate genes using the Geneious Assembler (Biomatters, Newark, USA). When reads mapped to duplicated genes, they were combined into a single de novo assembly and 95% confidence consensus sequences were constructed. Using PHYML [77], we constructed a neighbor-joining tree for pTAC14 using 4,319 bp of genomic sequence (excluding 5’ and 3’ UTRs and insertions/deletions coded as single variants) with branch support determined with 1000 bootstraps. For the tree presented in Fig 4B, we used the general time-reversible model with four substitution rate categories and allowed the program to estimate the proportion of variable sites and the gamma distribution parameter (varying these parameters produced identical consensus trees).

Acknowledgments We thank Noland Martin and John Willis for first identifying white seedlings at Don Pedro Reservoir and encouraging us to study them. We thank Amanda Kenney for her help with BSA experiments and intellectual contributions. Our discussions with Dave Hall, Mike Arnold, John Burke, Kelly Dyer, and CJ Tsai were always enlightening and played an important role in shaping the design and interpretation of this study. John Willis, Dave Hall, Nick Arthur, Nick Batora, Rachel Kerwin, Sam Mantel, and two anonymous reviewers provided thoughtful comments, which greatly improved the quality of our manuscript.