Abstract
Strains of Saccharomyces cerevisiae used to make beer, bread, and wine are genetically and phenotypically distinct from wild populations associated with trees. The origins of these domesticated populations are not always clear; human-associated migration and admixture with wild populations have had a strong impact on S. cerevisiae population structure. We examined the population genetic history of beer strains and found that ale strains and the S. cerevisiae portion of allotetraploid lager strains were derived from admixture between populations closely related to European grape wine strains and Asian rice wine strains. Like lager and baking strains, ale strains are polyploid, which provides them with a passive means of remaining isolated from other populations and provides us with a living relic of their ancestral hybridization. To reconstruct their polyploid origin, we phased the genomes of two ale strains and found that ale haplotypes are recombinants between European and Asian alleles and also contain novel alleles derived from extinct or as yet uncharacterized populations. We conclude that modern beer strains are the product of a historical melting pot of fermentation technology.

Author summary
The budding yeast Saccharomyces cerevisiae has long been used to make beer. Yeast strains used to make ales are known to differ genetically and phenotypically from strains used to make wine and from strains isolated from nature, such as oak isolates. Beer strains are also known to be polyploid, having more than two copies of their genome per cell. To determine the ancestry of beer strains, we compared their genomes with those of a large collection of strains isolated from diverse sources and geographic locations. We found that ale, baking, and the S. cerevisiae portion of lager strains have ancestry that is a mixture of European grape wine strains and Asian rice wine strains, and that they carry novel alleles from an extinct or uncharacterized population. The mixed ancestry of beer strains has been maintained in a polyploid state, which provided a means of strain diversification through gain or loss of genetic variation within a strain but also a means of maintaining brewing characteristics by reducing or eliminating genetic exchange with other strains. Our results show that ale strains emerged from a mixture of previously used fermentation technologies.

Citation: Fay JC, Liu P, Ong GT, Dunham MJ, Cromie GA, Jeffery EW, et al. (2019) A polyploid admixed origin of beer yeasts derived from European and Asian wine populations. PLoS Biol 17(3): e3000147. https://doi.org/10.1371/journal.pbio.3000147
Academic Editor: Jeff Gore, MIT, UNITED STATES
Received: November 8, 2018; Accepted: January 30, 2019; Published: March 5, 2019
Copyright: © 2019 Fay et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Sequence reads are available under PRJNA504476 at NCBI's short read archive. Genotype data and data underlying the figures are available at http://doi.org/10.6084/m9.figshare.7550009.v1, as described in the File Summary.
Funding: This work was supported by a National Institutes of Health grant (GM080669) to J. Fay, the Rita Allen Foundation, a gift from Karl Handelsman, and a National Science Foundation grant (1516330) to M. Dunham. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Introduction
The brewer's yeast Saccharomyces cerevisiae is known for its strong fermentative characteristics. The preference for fermentation in the presence of oxygen arose through a multistep evolutionary process around the time of an ancient genome duplication, endowing numerous species with the ability to produce levels of ethanol toxic to many microorganisms [1,2]. One of these species, S. cerevisiae, also gained the ability to competitively dominate many other species in high-sugar, low-nutrient environments, such as grape must [3]. Wine is largely fermented by S. cerevisiae and is thought to be the first fermented beverage, having been made for over 9,000 years [4]. However, S. cerevisiae is not the only Saccharomyces species used to make fermented beverages; others, notably S. uvarum, S. kudriavzevii, S. eubayanus, and hybrid derivatives, are also used, particularly for fermentations at low temperatures [5–8]. Besides S. cerevisiae, the most widely used species is S. pastorianus, an allopolyploid hybrid of S. cerevisiae and S. eubayanus used to make lager beer [7]. This hybrid came into use in Europe during the 15th century and was formed from an S. eubayanus strain closely related to wild populations from North America and Tibet [9,10] and an S. cerevisiae strain related to those used to ferment ales [11–13].

The origin of ale and other domesticated strains of S. cerevisiae is beginning to emerge through comparison with wild populations [12–16]. Multiple genetically distinct populations of S. cerevisiae have been found associated with fermented foods and beverages, including grape wine, Champagne, sake and rice wine, palm wine, coffee, cacao, cheese, and leavened bread [14,17–20]. Ale strains have also been found to be both genetically and phenotypically differentiated from other strains [12,13]. Multiple populations of ale strains have been identified, and these exhibit high rates of heterozygosity and polyploidy [12,13,16].
However, the origin of such domesticated groups is not always clear because it requires comparison to the wild populations from which they were derived, and not all of these wild populations have been identified. The best characterized wild populations of S. cerevisiae have been isolated from oak and other trees in North America, Japan, China, and Europe [21–24]; of these, the European population is the closest relative of European wine strains and the presumed wild lineage from which they were derived. Despite clear differences among many domesticated groups, human-associated admixture is common [20,22,25,26] and can blur the provenance of domesticated strains. For example, wine strains show a clear signature of admixture with other populations, and clinical strains appear to be derived primarily from admixed wine populations [27–29]. Ale strains, apart from a few related to sake and European wine lineages, have no obvious wild population from which they were derived [12,13].

In this study, we examined the origin of ale and lager strains in relation to a diverse collection of S. cerevisiae strains. Through analysis of publicly available genomes and 107 newly sequenced genomes, we inferred a hybrid, polyploid origin of beer strains derived from admixture between close relatives of European and Asian wine strains. This admixture suggests that early industrial strains spread with brewing technology to give rise to modern beer strains, similar to the spread of domesticated plant species with agriculture.

Discussion
Inferring the origin of domesticated organisms can be complicated by extinction of wild progenitor populations, human-associated migration, polyploidy, and admixture with wild populations. In this study, we find that extant beer strains are polyploid and have an admixed origin between close relatives of European and Asian wine strains. Ale genomes, like lager genomes, carry relics of their parental genomes captured in a polyploid state as well as novel beer alleles from an extinct or undiscovered population. Loss of heterozygosity through mitotic exchange provided a means of strain diversification but may also have eroded the signal needed to infer precisely the timing and order of events that gave rise to modern beer strains. Below, we discuss models and implications of an admixed, polyploid origin of beer strains.

Polyploidy is thought to mediate rapid evolution [36], and prior work showed that polyploidy is common in beer and baking strains [12,18,31]. We find that the Ale 1, Ale 2, and Beer/baking populations all have a polyploid origin. Although not all strains had sufficient coverage for calling ploidy, all those that did were either triploid or tetraploid. Chromosome-level aneuploidy is also more common in strains within the Ale 1 (52%), Ale 2 (19%), and Beer/baking (52%) populations than in the nonbeer populations (5.1%). A notable consequence of both polyploidy and aneuploidy is that they can limit admixture with haploid or diploid strains due to low spore viability [34,37,38], thereby maintaining brewing characteristics. Indeed, beer strains exhibit low sporulation efficiency and spore viability [12]. Grape wine strains, and sake strains in particular, have also evolved a more limited capacity to interbreed through low sporulation efficiency [39,40].

Human-associated admixture is well documented in wine strains, which have been dispersed around the globe with the spread of viticulture [20,22,25,26].
However, admixture between close relatives of European grape wine and Asian rice wine populations presents a conundrum regarding where and how these populations became admixed. A crucial yet unresolved question is where European wine strains were domesticated. The discovery of a Mediterranean oak population closely related to European wine strains suggests a European origin of wine strains [21]. An alternative model is that the Mediterranean oak population is a feral wine population and that both the European wine and Mediterranean oak populations are nonnative. Analysis of a diverse collection of Asian strains suggested an East Asian origin of all domesticated S. cerevisiae strains, including European wine strains [14]. Domestic populations from solid- and liquid-state fermentations (bread, milk, distilled liquors, rice wines, and barley wines) were found to be related to wild populations from East Asia. In support of the European wine and Mediterranean oak populations also originating in East Asia, these populations carry duplicated genes involved in maltose metabolism and grouped with fermented milk and other strains isolated from China. However, this model also has some uncertainty given the small number of Chinese isolates within the European wine group, the dispersion of European wine strains with viticulture, and the absence of samples from the Caucasus, where grapes are thought to have been domesticated [4,41]. Given this uncertainty about where European wine strains were domesticated, we put forth two hypotheses regarding the admixed origin of beer strains. First, European wine strains were domesticated in East Asia and admixed in situ with a population related to the Asia/sake group, which contains eight sake/rice wine strains, seven distillery strains, and seven bioethanol strains, mostly from Asia.
Second, European wine strains were domesticated in Europe from a Mediterranean oak population, or perhaps in the Caucasus, and the admixed beer populations arose through East–West transfer of fermentation technology, including yeast, by way of the Silk Route. Resolving these scenarios would be greatly facilitated by finding putative parental populations of diploid, but not necessarily wild, strains that carry the alleles we find to be unique to the Ale 1, Ale 2, Beer/baking, and Lager groups. As yet, such populations have either not been sampled or are extinct.

Even with a clear signature of a polyploid and admixed origin of beer strains, there are uncertainties regarding the founding strains and the order of events. The decay in linkage disequilibrium suggests that admixture occurred prior to polyploidy, and the distribution of beer-specific alleles suggests that admixture involved at least one uncharacterized population. However, polyploid genomes are often labile, and it is hard to know the extent to which mitotic recombination and gene conversion have altered genetic variation in the beer strains. In yeast, the rates of mitotic gene conversion and recombination have been estimated at 1.3 × 10⁻⁶ per cell division and 7 × 10⁻⁶ per 120 kb, respectively [42,43], and both can lead to loss of heterozygosity. Scaling to the size of a tetraploid genome (approximately 48 Mbp), we expect 0.0038 conversion events (using a median tract length of 16.6 kb) and 0.0028 recombination events across the genome per cell division. Three lines of evidence support a role for these mitotic events in beer strains. First, many of the switches between European and Asian alleles involved one or a small number of adjacent SNPs rather than long segments, indicative of gene conversion (S4 Table). Second, one strain (A.2565) shows clear loss of heterozygosity on multiple chromosomes, indicative of mitotic recombination (S4 Fig).
Third, there is substantial genotype diversity within each of the beer populations (Fig 3), as would be expected if loss of heterozygosity occurred during strain divergence, subsequent to the founding of each beer population. Besides mitotic gene conversion and recombination, two other factors must be considered with regard to diversity within the beer populations: outcrossing and de novo mutation. Outcrossing with strains outside of the beer populations is unlikely because there is no evidence for this type of admixture in our analysis, and admixture proportions from the Asian population are fairly constant, at 37% to 47%, across beer strains. However, outcrossing of strains within or between different beer populations may not be easily detected. De novo mutations have undoubtedly occurred, but even using a reasonable estimate of 150 generations per year for brewing strains [12] and a per-base mutation rate of 5 × 10⁻¹⁰ [44], the beer lineage substitution rates yield divergence times of 2.0 × 10⁴ (Ale 1), 1.3 × 10⁴ (Ale 2), 1.1 × 10⁴ (Beer/baking), and 9.2 × 10³ (Lager) years. Therefore, a sizable fraction of beer-specific alleles was likely inherited from populations closely related to European wine and Asian wine populations rather than arising as de novo mutations that accumulated subsequent to polyploidy. Regardless of the relative impact of mitotic recombination, gene conversion, outcrossing, and de novo mutation, beer strains have diversified from one another but have remained relatively distinct from other populations of S. cerevisiae [12,13].

In conclusion, beer strains are the polyploid descendants of strains related, but not identical, to European grape wine and Asian rice wine strains.
Therefore, similar to the multiple origins of domesticated plants, including barley [45] and rice [46,47], beer yeasts are the products of admixture between different domesticated populations and benefited from historical transfer of fermentation technology.
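The genome-wide expectations for mitotic events and the divergence times quoted in the Discussion are simple arithmetic on the stated rates. The Python sketch below reproduces them under two assumptions of ours that the text does not spell out: that the gene conversion rate of 1.3 × 10⁻⁶ applies per median-length (16.6 kb) conversion tract, and that divergence time equals the number of beer-specific alleles per four-fold degenerate site (counts from the Materials and methods) divided by the mutation rate and by the number of generations per year. The function names are hypothetical.

```python
"""Back-of-envelope checks for two Discussion estimates (illustrative sketch)."""

TETRAPLOID_GENOME_KB = 48_000  # approximately 48 Mbp, as stated in the text


def expected_mitotic_events(genome_kb: float = TETRAPLOID_GENOME_KB) -> tuple:
    """Expected gene conversion and crossover events per cell division.

    Assumption: the conversion rate applies per median-length tract, so the
    genome-wide expectation scales with genome_kb / tract_kb.
    """
    conversion_rate = 1.3e-6    # per tract per division (assumed reading of [42])
    tract_kb = 16.6             # median conversion tract length
    recombination_rate = 7e-6   # per 120 kb per division [43]
    conversions = conversion_rate * genome_kb / tract_kb
    recombinations = recombination_rate * genome_kb / 120
    return conversions, recombinations


def divergence_years(beer_alleles: int, sites: int = 1_036_317,
                     mu: float = 5e-10, gens_per_year: float = 150) -> float:
    """Years needed to accumulate the observed divergence by mutation alone."""
    per_site = beer_alleles / sites   # beer-specific alleles per four-fold site
    generations = per_site / mu       # generations at mu per base per generation
    return generations / gens_per_year


if __name__ == "__main__":
    conv, rec = expected_mitotic_events()
    print(f"conversions/division: {conv:.4f}; crossovers/division: {rec:.4f}")
    # Counts of beer-specific alleles (frequency >= 25%) from the Methods
    counts = {"Ale 1": 1586, "Ale 2": 1040, "Beer/baking": 899, "Lager": 716}
    for pop, k in counts.items():
        print(f"{pop}: {divergence_years(k):,.0f} years")
```

Under these assumptions the sketch gives about 0.0038 conversions and 0.0028 crossovers per division, and divergence times close to the reported values for Ale 1, Ale 2, and Lager; for Beer/baking it gives roughly 1.16 × 10⁴ years, which the text reports as 1.1 × 10⁴.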

Materials and methods
Genome sequencing and reference genomes
Genome sequencing was completed for 47 commercial yeast strains: 33 ale, 7 lager, 2 whiskey, and 5 baking strains. For reference, sequencing was also completed for 60 strains of diverse origin, including 22 isolates from trees or other nonhuman-associated sources and 38 isolates from human-associated ferments such as togwa, coffee, and cacao (S1 Table). For each strain, DNA was extracted and indexed libraries were sequenced on Illumina machines (NextSeq, HiSeq2000, or HiSeq2500). A median of 10.7 million reads per strain was obtained, ranging from 272,000 to 26 million. The sequencing data are available at NCBI (PRJNA504476). Genomic data were obtained for 430 strains from publicly available databases, including 138 additional beer strains from [12,13]. We also obtained reference genomes for S. paradoxus, S. mikatae [48], and S. eubayanus (SEUB3.0) [49]. Two large sets of recently published genomes [14,16] were obtained for comparison with our set of 537 genomes. Genotype calls for SNPs identified in this study were obtained from gVCF files of the 1,011 yeast genomes project [16], and genotype calls were generated for 266 strains from China [14] using the mapping and genotyping pipeline described below. Because these latter two data sets only recently became available, they were incorporated only into the S1 Fig heatmap.

Alignment, variant calling, and genotyping
Reads were aligned to the S. cerevisiae S288c reference genome (R64-1-1_20110203) using BWA-v0.7.12-r1039 [50]. Lager strains were mapped to a concatenated S. cerevisiae and S. eubayanus genome, and reads mapping to S. eubayanus were discarded. For short reads (<70 bp), we used BWA-sampe, and for the remainder, we used BWA-mem. Duplicate reads were marked prior to genotyping. Assembled genomes were also mapped using BWA-mem, and flags for secondary alignments were removed to facilitate complete mapping of large contigs. For S. paradoxus and S. mikatae, mapping synthetic reads from shredded contigs gave higher coverage of the S288c genome than mapping full contigs, so we used the former. SNPs were called using short-read data and then genotyped in those strains with assembled genomes. For SNP calling, we used GATK-UnifiedGenotyper-v3.3-0 [51] and applied hard filters excluding sites with QD < 5, FS > 60, MQ < 40, MQRankSum < −12.5, or ReadPosRankSum < −8. The dataset was filtered to remove strains and sites with more than 10% missing data. The strains removed included lager strains of the type 1 Saaz group [11], but we retained S. paradoxus and S. mikatae, for which we obtained calls at 78% and 40% of sites, respectively. Biallelic SNPs with a minor allele frequency of at least 1% and at least four minor allele genotype calls were selected for analysis, resulting in a total of 273,963 SNPs. The 399 strains retained for analysis are listed in S2 Table, and the genotype data are available in variant call format from http://doi.org/10.6084/m9.figshare.7550009.v1. Genotype calls for these SNPs were also obtained for the 1,277 strains in the comparative data set [14,16].

To estimate our genotyping error rate, we compared six pairs of strains that were independently sequenced. Two of the strains, YJF153 and BC217, were haploid derivatives of the diploid strains YPS163 [52] and BC187 [53], respectively, which were also sequenced. The other four pairs were all beer strains independently obtained from Wyeast (Wyeast 1728, 1968, 2565, 2112) and independently sequenced at Washington University in St. Louis and the University of Washington in Seattle. Between the pairs of strains, we found genotype discordance rates of 9.62 × 10⁻⁴ (YJF153/YPS163), 1.31 × 10⁻³ (BC217/BC187), 3.57 × 10⁻³ (L.2112/YMD1874), 3.00 × 10⁻³ (A.2565/YMD1952), 1.81 × 10⁻² (A.1968/YMD1981), and 5.74 × 10⁻³ (A.1728/YMD1866). We retained the six pairs of strains throughout the analysis as a measure of robustness.
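The site-level filters described above can be expressed as a single predicate. The Python sketch below is illustrative and is not the authors' pipeline, which used GATK: the `Site` record and its field names are hypothetical, genotypes are simplified to diploid-style alternate-allele dosages (0, 1, 2), a missing rank-sum annotation is treated as passing, and "minor allele genotype calls" is read as the number of called strains carrying at least one copy of the minor allele. The strain-level missingness filter is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Site:
    """Hypothetical per-site summary; names mirror GATK INFO annotations."""
    qd: float
    fs: float
    mq: float
    mq_rank_sum: Optional[float]        # may be absent at some sites
    read_pos_rank_sum: Optional[float]  # may be absent at some sites
    genotypes: Sequence[Optional[int]]  # per strain: 0 hom-ref, 1 het, 2 hom-alt, None missing


def passes_hard_filters(s: Site) -> bool:
    """False if the site trips any of the hard filters listed in the Methods."""
    if s.qd < 5 or s.fs > 60 or s.mq < 40:
        return False
    # Assumption of this sketch: a missing rank-sum annotation does not fail the site.
    if s.mq_rank_sum is not None and s.mq_rank_sum < -12.5:
        return False
    if s.read_pos_rank_sum is not None and s.read_pos_rank_sum < -8:
        return False
    return True


def passes_site_filters(s: Site, max_missing: float = 0.10,
                        min_maf: float = 0.01, min_minor_calls: int = 4) -> bool:
    """Missingness, minor allele frequency, and minor-allele carrier count."""
    called = [g for g in s.genotypes if g is not None]
    missing = (len(s.genotypes) - len(called)) / len(s.genotypes)
    if not called or missing > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    minor_is_alt = alt_freq <= 0.5
    maf = min(alt_freq, 1 - alt_freq)
    # Called genotypes carrying at least one copy of the minor allele
    carriers = sum(1 for g in called if (g > 0 if minor_is_alt else g < 2))
    return maf >= min_maf and carriers >= min_minor_calls


def keep_site(s: Site) -> bool:
    return passes_hard_filters(s) and passes_site_filters(s)
```

Splitting the predicate in two mirrors the Methods, where the GATK annotation filters and the population-level missingness and frequency filters are separate steps.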
Ploidy and aneuploidy
Ploidy and aneuploidy were assessed from read counts at heterozygous sites and from read coverage, respectively. For ploidy analysis, genotypes of 317 strains came from assemblies, so no information on heterozygous sites was available, and 117 strains had few heterozygous sites, indicating that they were haploid or homozygous diploid. Of the remaining 105 strains, 66 had sufficient coverage at heterozygous sites to make visual designations of ploidy [20,54,55]. Visual designations were based on dominant trends consistent with the expected percentages of reads supporting each allele in diploid (50:50), triploid (33:66), and tetraploid (25:50:75) configurations. Of the 39 strains without sufficient coverage to distinguish triploids from tetraploids, most (33) showed distributions consistent with polyploidy (ploidy > 2), and of these, 29 were beer strains (S2 Fig). Aneuploidy was assessed by visual inspection of read coverage across the genome and was called only for clear cases in which one or more chromosomes showed a deviation in read coverage compared to all other chromosomes.

Population structure and admixture
Population structure was inferred by running ADMIXTURE [56] on a set of 20,394 sites separated by a minimum physical distance of 500 bp. Variants from the 138 strains in a recent study of beer strains [12] were removed because the assemblies eliminated heterozygous sites and raw reads for these genomes were not available. Based on 20 independent runs using between 4 and 20 populations for the 399 strains, we chose 13 populations based on an average change in the log-likelihood greater than 3 standard deviations of the variation in the log-likelihood among independent runs (S3 Fig). The beer populations of interest were not affected by this choice: with 12 populations, the 2 Japanese populations merged, and with 14, a new population of admixed European wine strains was formed (S3 Fig). Population admixture graphs were inferred using Treemix [30].
A subset of 199 strains with less than 1% admixture was used to generate a population admixture graph. The tree was rooted with the China population because two of its strains, HN6 and SX6, were the most closely related to both S. paradoxus and S. mikatae, and blocks of 500 SNPs were used to obtain jackknife standard errors. Five episodes of migration were inferred (P < 4.9 × 10⁻¹²), with weights ranging from 0.18 to 0.49. Migration events were validated using f4 tests of admixture (S3 Table). For tests of tree discordance, we did not use the clinical and lab populations as reference populations because these showed evidence of admixture. f4 admixture proportions were estimated by the ratio of f4(Mediterranean, Africa; test, Europe) to f4(Mediterranean, Africa; Asia, Europe), in which each of the 64 beer strains in the Ale 1, Ale 2, Lager, and Beer/baking populations was tested individually.

Long-read phasing
Three strains were selected for PacBio sequencing and variant phasing. Two were the beer strains A.2565 and A.T58, and the third, YJF1460, was a hybrid we generated by mating a European/wine strain (BC217) and a Japan/North America 2 oak strain (YJF153). PacBio reads were aligned to the S288c reference genome using Blasr [57], and heterozygous variants in each genome were phased using HapCUT2 [58] and our own heuristic phasing method, which accounts for variable ploidy levels across the genome. Average coverage at 56k, 59k, and 33k variant sites was 13.1, 18.8, and 13.0 for YJF1460, A.T58, and A.2565, respectively. Our custom phasing method used the variant call format files and fragment files from HapCUT2 as input and output a variable number of phased haplotypes. HapCUT2 fragment files were generated with a minimum base quality of 10. Reads were merged into haplotypes using a minimum overlap of four matching SNPs and a minimum of 80% matching SNPs.
Reads were iteratively joined to haplotypes using the best-scoring overlap, where score = matches − 5 × mismatches. Haplotypes were formed by three rounds of merging. In the first round, reads were merged into haplotypes without any mismatches. In the second and third rounds, haplotypes were merged using the criteria defined above. Error rates were estimated by counting the minimum number of mismatches of reads to the final set of haplotypes. Error rates of 1.84%, 2.03%, and 1.90% were obtained from comparison of reads to 3,337, 2,452, and 2,607 haplotype alleles for YJF1460, A.T58, and A.2565, respectively. The average number of haplotypes at phased sites was 2.29, 3.27, and 2.98 for YJF1460, A.T58, and A.2565, respectively. Sites where three haplotypes were inferred in the YJF1460 control are largely due to overlapping haplotypes that were too short to merge. The long-read data, custom phasing script, and inferred haplotypes are available from http://doi.org/10.6084/m9.figshare.7550009.v1.

After phasing, two sets of SNPs were selected for analysis. The first set consisted of nearly fixed differences between the Europe/wine and Asia/sake populations. After excluding strains with more than 1% admixture, there were 34,022 sites with an allele frequency of at least 99% in Europe/wine strains (n = 47) and less than 1% in Asia/sake strains (n = 28), or vice versa. These nearly fixed differences were used to quantify switching between European and Asian haplotypes. Switching events were measured by counting switches involving one or more sites, five or more sites, or sites spanning 4 kb or longer (S4 Table). The latter two measures were used to avoid counting switches caused by sequencing errors or mitotic gene conversion events, which should not affect multiple adjacent sites or regions longer than 4 kb [59], respectively.
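The read-to-haplotype scoring rule and the switch counts described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' phasing script: representing reads and haplotypes as dictionaries from variant position to allele is our assumption, the three rounds of merging are not reproduced, and the switch counter reflects one plausible reading of "switches involving five or more sites or spanning 4 kb", in which ancestry runs that fail the size threshold are dropped before switches between the remaining runs are counted. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Fragment = Dict[int, int]  # variant position -> allele (0 = ref, 1 = alt); assumed encoding


def overlap_score(read: Fragment, hap: Fragment, min_matches: int = 4,
                  min_match_frac: float = 0.8) -> Optional[int]:
    """Return matches - 5 * mismatches, or None if the overlap is ineligible."""
    shared = read.keys() & hap.keys()
    matches = sum(1 for pos in shared if read[pos] == hap[pos])
    mismatches = len(shared) - matches
    if matches < min_matches or matches < min_match_frac * len(shared):
        return None
    return matches - 5 * mismatches


def best_haplotype(read: Fragment, haps: Sequence[Fragment]) -> Optional[int]:
    """Index of the best-scoring eligible haplotype, or None if none qualifies."""
    best_idx, best_score = None, None
    for i, hap in enumerate(haps):
        score = overlap_score(read, hap)
        if score is not None and (best_score is None or score > best_score):
            best_idx, best_score = i, score
    return best_idx


Run = Tuple[str, int, int, int]  # (ancestry, n_sites, first_pos, last_pos)


def ancestry_runs(calls: List[Tuple[int, str]]) -> List[Run]:
    """Collapse position-sorted (position, 'E' or 'A') calls into ancestry runs."""
    runs: List[Run] = []
    for pos, anc in calls:
        if runs and runs[-1][0] == anc:
            _, n, first, _ = runs[-1]
            runs[-1] = (anc, n + 1, first, pos)
        else:
            runs.append((anc, 1, pos, pos))
    return runs


def count_switches(calls: List[Tuple[int, str]], min_sites: int = 1,
                   min_span_bp: int = 0) -> int:
    """Count ancestry switches after dropping runs below the size thresholds."""
    kept = [r for r in ancestry_runs(calls)
            if r[1] >= min_sites and r[3] - r[2] >= min_span_bp]
    return sum(1 for prev, cur in zip(kept, kept[1:]) if prev[0] != cur[0])
```

For a haplotype with ten European sites, one Asian site, and nine more European sites, `count_switches` reports two switches with the default settings but none with `min_sites=5` or `min_span_bp=4000`, which is how the stricter measures discount single-site conversions and errors.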
The switching rate for the YJF1460 control was similar to that obtained using HapCUT2 (S4 Table), which minimizes errors when merging reads but assumes a ploidy of two, and SDhaP [60], run assuming a ploidy of two for YJF1460 and four for the two ale strains. The second set consisted of alleles abundant in the four beer populations but absent in all others. After excluding strains with more than 1% admixture, there were 32,829 sites with allele frequencies over 25% in the Ale 1 (n = 13), Ale 2 (n = 12), or Beer/baking (n = 2) strains but less than 1% in all other populations. To avoid problems with low-coverage strains, we estimated population allele frequencies from counts of homozygous calls plus half of heterozygous calls.

Decay in linkage disequilibrium was measured by the covariance in alleles between sites [61]. An exponential decay function was fit to the average covariance of sites binned every 100 bp from 1 kb to 50 kb. Rather than weighting linkage disequilibrium by the allele frequency differences between the two admixed populations, we used the unweighted covariance across the 34,022 sites that show nearly fixed differences between the Europe/wine and Asia/sake populations. For the phased strains, we used the covariance across sites on the same haplotypes. For the population decay estimates, we used only strains with 99% or more ancestry assigned to the Clinical, Laboratory, Ale 1, Ale 2, Beer/baking, or Lager populations. Invariant sites were excluded in each case. We assumed 0.34 kb/cM [35] to translate decay in physical distance to genetic distance and infer the number of meiotic equivalents.

We estimated divergence using four-fold degenerate sites in coding sequences. Excluding splice sites and sites with overlapping gene annotations, there were 1,036,317 four-fold degenerate sites surveyed.
At these sites, we found 1,586, 1,040, 899, and 716 alleles at a frequency of 25% or more in the Ale 1, Ale 2, Beer/baking, and Lager populations, respectively, but not in any other population.
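The Methods do not specify how the exponential was fit to the binned covariances or how the decay rate was converted into meiotic equivalents. The Python and NumPy sketch below makes both steps explicit as assumptions: pairwise covariances are averaged in 100 bp bins from 1 to 50 kb, cov(d) = a·e^(−b·d) + c is fit by a grid search over b with a and c solved by linear least squares (a nonlinear least-squares routine would serve equally well), and the decay rate is read through the standard admixture-LD model in which covariance falls as e^(−t·d) with d in Morgans, so that t is b divided by the Morgans per kb implied by 0.34 kb/cM. Function names and the search range for b are hypothetical.

```python
import numpy as np

KB_PER_CM = 0.34  # physical-to-genetic conversion used in the Methods [35]


def bin_covariance(dist_bp, cov, start_bp=1_000, stop_bp=50_000, width_bp=100):
    """Average covariance in fixed-width distance bins; returns bin centers (kb) and means."""
    dist_bp = np.asarray(dist_bp, dtype=float)
    cov = np.asarray(cov, dtype=float)
    edges = np.arange(start_bp, stop_bp + width_bp, width_bp)
    idx = np.digitize(dist_bp, edges) - 1  # bin i holds edges[i] <= d < edges[i + 1]
    centers, means = [], []
    for i in range(len(edges) - 1):
        mask = idx == i
        if mask.any():
            centers.append((edges[i] + edges[i + 1]) / 2 / 1000)
            means.append(cov[mask].mean())
    return np.array(centers), np.array(means)


def fit_exp_decay(dist_kb, cov, rates=None):
    """Fit cov = a * exp(-b * d) + c: grid search over b, least squares for a and c."""
    dist_kb = np.asarray(dist_kb, dtype=float)
    cov = np.asarray(cov, dtype=float)
    if rates is None:
        rates = np.linspace(1e-3, 2.0, 4000)  # assumed search range for b, per kb
    best = None
    for b in rates:
        design = np.column_stack([np.exp(-b * dist_kb), np.ones_like(dist_kb)])
        coef, *_ = np.linalg.lstsq(design, cov, rcond=None)
        sse = float(np.sum((design @ coef - cov) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(b), float(coef[1]))
    _, a, b, c = best
    return a, b, c


def meiotic_equivalents(b_per_kb, kb_per_cm=KB_PER_CM):
    """Meioses t such that exp(-b * d_kb) equals exp(-t * d_morgans)."""
    morgans_per_kb = 1.0 / (kb_per_cm * 100.0)  # 1 cM = 0.01 Morgan
    return b_per_kb / morgans_per_kb
```

With 0.34 kb/cM, each kilobase corresponds to about 0.029 Morgans, so a fitted decay rate of b per kb translates to roughly 34 × b meiotic equivalents under this model.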

Acknowledgments
We thank Xueying Li, Ching-Hua Shih, Emery Longan, Kathryn Williams, Nilima Walunjkar, Casey Bergman, and Andrea Del Cortona for their comments and suggestions.