Abstract We investigated genomic diversity of a yeast species that is both an opportunistic pathogen and an important industrial yeast. Under the name Candida krusei, it is responsible for about 2% of yeast infections caused by Candida species in humans. Bloodstream infections with C. krusei are problematic because most isolates are fluconazole-resistant. Under the names Pichia kudriavzevii, Issatchenkia orientalis and Candida glycerinogenes, the same yeast, including genetically modified strains, is used for industrial-scale production of glycerol and succinate. It is also used to make some fermented foods. Here, we sequenced the type strains of C. krusei (CBS573T) and P. kudriavzevii (CBS5147T), as well as 30 other clinical and environmental isolates. Our results show conclusively that they are the same species, with collinear genomes 99.6% identical in DNA sequence. Phylogenetic analysis of SNPs does not segregate clinical and environmental isolates into separate clades, suggesting that C. krusei infections are frequently acquired from the environment. Reduced resistance of strains to fluconazole correlates with the presence of one gene instead of two at the ABC11-ABC1 tandem locus. Most isolates are diploid, but one-quarter are triploid. Loss of heterozygosity is common, including at the mating-type locus. Our PacBio/Illumina assembly of the 10.8 Mb CBS573T genome is resolved into 5 complete chromosomes, and was annotated using RNAseq support. Each of the 5 centromeres is a 35 kb gene desert containing a large inverted repeat. This species is a member of the genus Pichia and family Pichiaceae (the methylotrophic yeasts clade), and so is only distantly related to other pathogenic Candida species.

Author summary Infections with yeasts resistant to antifungal drugs are an increasing cause of concern. One species, Candida krusei, has innate resistance to the widely-used drug fluconazole. It is one of the five most prevalent causes of clinical yeast infections, and is responsible for significant levels of morbidity and mortality in immunocompromised patients. In this study, we show that C. krusei is the same species as Pichia kudriavzevii, a yeast that is regarded as non-pathogenic and has important applications in biotechnology and food industries. We examined the genomes of 20 clinical isolates (‘C. krusei’) and 12 environmental isolates (‘P. kudriavzevii’) and find that there is no genetic distinction between them. The environmental isolates have similar levels of drug resistance to the clinical isolates. As well as providing a resource for future studies of this yeast, our results indicate that caution may be needed in the use of drug-resistant P. kudriavzevii strains for biotechnology and food applications.

Citation: Douglass AP, Offei B, Braun-Galleani S, Coughlan AY, Martos AAR, Ortiz-Merino RA, et al. (2018) Population genomics shows no distinction between pathogenic Candida krusei and environmental Pichia kudriavzevii: One species, four names. PLoS Pathog 14(7): e1007138. https://doi.org/10.1371/journal.ppat.1007138 Editor: Aaron P. Mitchell, Carnegie Mellon University, UNITED STATES Received: April 2, 2018; Accepted: June 5, 2018; Published: July 19, 2018 Copyright: © 2018 Douglass et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The sequence data reported in this manuscript has been submitted to the NCBI nucleotide database with the following accession numbers: P. kudriavzevii CBS573 PacBio/Illumina annotated reference genome sequence (CP028773-CP028778); P. kudriavzevii CBS573 RNAseq Illumina reads (SRA accession SRP139056); P. kudriavzevii CBS573 MATα allele region (MH260578); P. kudriavzevii CBS5147 PacBio genome sequence (CP028531-CP028535); Illumina genomic sequencing reads from 32 P. kudriavzevii strains (SRA accession SRP139299); P. fermentans strain fo/MP/02 (Pferm-PL1) Illumina WGS assembly (QAWB00000000); P. norvegensis strain CBS1922 (Pnorv-NO1) Illumina WGS assembly (QAWC00000000). Funding: This study was supported by the Wellcome Trust PhD programme in Computational Infection Biology (105341/Z/14/Z to APD), and Science Foundation Ireland (13/IA/1910 to KHW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Pathogenic Candida species are ascomycete yeasts that cause over 46,000 invasive infections annually in the US alone, with a 30% mortality rate [1]. C. albicans is the most common and the most extensively studied, but non-albicans candidiasis infections are becoming increasingly common. The top five pathogenic Candida species in order of prevalence in invasive candidiasis worldwide are C. albicans (52% of infections), C. glabrata (21%), C. tropicalis (14%), C. parapsilosis (9%) and C. krusei (2%) (calculated from data in [2]). Among these, C. krusei is the least-well studied. Although uncommon in the normal human flora, C. krusei is sometimes carried intestinally by healthy individuals and in one remote Amerindian community it was found to be present in over 30% of the population, much higher than C. albicans, and was probably acquired from food or the environment [3]. As well as being associated with humans, C. krusei has been detected in feral pigeons and other wild animals [3, 4]. In 1980, Kurtzman and colleagues proposed that C. krusei is the asexual form (anamorph) of a species whose sexual form (teleomorph) is Pichia kudriavzevii, which would make the two names synonymous. This proposal was initially made on the basis of DNA reassociation and mating tests [5], and later confirmed when the sequences of the D1/D2 regions of the 26S ribosomal DNA of the type strains of C. krusei and P. kudriavzevii were discovered to be identical [6]. When P. kudriavzevii was first formally described in 1960 [7], it was reported to be able to sporulate, but cultures of the type strain were later described as being unable to conjugate or sporulate [8, 9]. A third name, Issatchenkia orientalis, is obsolete [6] but continues to be used in the literature by some laboratories [10, 11]. Strain CBS5147 was deposited as the type strain of I. orientalis [7], but this species was later renamed P. kudriavzevii [6]; C. krusei has a different type strain, CBS573. P. kudriavzevii isolates are widely distributed in nature. They are often encountered in spontaneous fermentations and the species is used to produce several traditional fermented foods [9, 12]. Crucially, this yeast is not regarded as a pathogen. It has been given ‘generally recognized as safe’ status by the US Food and Drug Administration [13] because it has been used for centuries to make food products such as fermented cassava and cacao in Africa, fermented milk in Tibet and Sudan, and maize beverages in Colombia [13]. It is used in starter cultures for sourdough breads [14], and in starters (daqu) for Chinese vinegar production from wheat [15]. It also has potential as a probiotic [16]. P. kudriavzevii is exceptionally stress-tolerant and has a growing role in biotechnology, for production of bioethanol [17, 18] and succinic acid (a high-value platform chemical) [10]. It is also used for the industrial production of glycerol, under the name Candida glycerinogenes ([19]; see Discussion). Publications related to industrial applications generally use the species names P. kudriavzevii, I. orientalis or C. glycerinogenes in preference to C. krusei, possibly because of the negative safety connotations of using a pathogen in a biotechnological or food context. To date, relatively little genetic or genomic investigation has been carried out on isolates of C. krusei and P. kudriavzevii. Genome sequences have been published for four P. kudriavzevii strains ([10, 20–22]) and one C. krusei clinical isolate [23], but none of these provides a chromosome-level assembly or transcriptome-based annotation. Estimates of the number of genes range from 4949 to 7107 [10, 23], and only Cuomo et al. [23] discussed the genome organization and content. Moreover, the type strain of C. krusei has not been sequenced, and the only available sequence for the type strain of P. kudriavzevii is highly fragmented (Y. Takada et al., NCBI accession number BBOI01000000). Although an extensive study of genetic diversity in clinical isolates was conducted using multilocus sequence typing (MLST) [24], there has never been an analysis that compares both clinical (‘C. krusei’) and environmental (‘P. kudriavzevii’) isolates. As a result we do not know whether the clinical and environmental isolates are genetically distinct. It is important to understand the relatedness of these two types of isolate, because if there is no difference between them it could mean that environmental and industrial strains are capable of causing disease. For instance, Saccharomyces cerevisiae used in food products is capable of causing opportunistic infections [25], so the same could be true of P. kudriavzevii. There is also uncertainty about the ploidy of the species. The MLST study indicated that isolates are diploid [24], as did two genome analyses [10, 23], but evidence of triploid and aneuploid strains has also been reported [26]. C. krusei is of particular concern as a pathogen because of its intrinsic resistance to fluconazole, a drug commonly used for long-term antifungal prophylactic treatment of immunocompromised individuals [27, 28]. Fluconazole resistance in C. krusei is not fully understood but appears to have two causes: its ergosterol synthesis enzyme Erg11 has unusually low affinity for fluconazole, and the drug efflux pumps Abc1 and Abc11 are constitutively expressed [26, 27]. Echinocandins such as micafungin are the current drugs of choice for treatment of C. krusei infections, but echinocandin-resistant strains with point mutations in the FKS1 gene have been reported [29, 30]. Additionally, it is possible that drug resistance may differ between clinical and environmental strains, but this has not been investigated. The very large evolutionary distance between C. krusei and other pathogenic Candida species is often not appreciated in clinical settings. By phylogenetic analysis of rDNA and other genes, systematists have determined that P. kudriavzevii/C. krusei is a species in the genus Pichia which is in the family Pichiaceae, often called the methylotrophic yeasts [9, 31]. It uses the universal genetic code (CUG = Leu) [32]. C. krusei (family Pichiaceae), C. albicans (family Debaryomycetaceae) and C. glabrata (family Saccharomycetaceae) are as distantly related to each other as humans are to sea-squirts [33] and it is rather misleading that they are all named Candida (which simply means that a sexual cycle has not been observed in any of them). Apart from C. krusei, the only other known pathogens in family Pichiaceae are the rare species Pichia norvegensis (also called Candida norvegensis) and Pichia cactophila (also called Candida inconspicua) [34]. To better understand their genetics, phylogeny, and drug resistance, we sequenced the type strains of both C. krusei and P. kudriavzevii, as well as 30 other clinical and environmental isolates. We generated a high-quality reference genome for C. krusei CBS573T, using a combined PacBio/Illumina strategy to assemble complete sequences of its 5 nuclear chromosomes and its mitochondrial genome, and annotated it using RNAseq data to detect introns. We investigated genetic diversity, ploidy, and loss of heterozygosity, as well as centromere and mating-type locus structure. Our results show unequivocally that C. krusei and P. kudriavzevii are the same species, that clinical and environmental strains are not distinct, and that high levels of drug resistance are common in environmental isolates. Our work provides a resource for future molecular biology research on this yeast species that has four names and is both an emerging pathogen and an emerging workhorse for biotechnology.

Discussion Our results confirm that P. kudriavzevii and C. krusei are the same species and demonstrate that their genomes are collinear. The discovery that clinical and environmental isolates are interspersed in a phylogenetic tree of strains and do not form distinct clades indicates that there is no justification for continuing to use both names for this species. A third name, I. orientalis, is obsolete, having been formally replaced by the name P. kudriavzevii [6]. Furthermore, we found that the species has a fourth name, Candida glycerinogenes. Since its discovery by Zhuge in 1973, ‘C. glycerinogenes’ has been used in China for the industrial-scale production of glycerol by fermentation of plant carbohydrates [19]. Extensive research has been carried out into its osmotolerance, and genetic manipulation methods have been developed (e.g., [61, 62]). We find that 37 of the 38 C. glycerinogenes gene sequences available in NCBI are virtually identical to P. kudriavzevii sequences, including the 18S rDNA. The existence of multiple names for this species has almost certainly impeded research into it. In keeping with the One Fungus One Name principle [63], we suggest that P. kudriavzevii should be the only name used in future. One of the most unexpected features of the genome is the structure of its centromeres, which consist of a simple but large IR. The 99% DNA sequence identity of the 8–14 kb units that form the IRs means that centromere organization would have been difficult to deduce without long-read PacBio data. The structure of the centromeres most closely resembles those of Komagataella phaffii, another yeast in the methylotrophs clade. However, the K. phaffii centromeres are much smaller, consisting of just a 2-kb IR on each chromosome with a 1-kb central region [40]. The only other yeasts in this clade whose centromeres have been characterized are Ogataea polymorpha, whose centromeres contain clusters of Ty5-like retrotransposons and do not seem to have an IR structure, and Kuraishia capsulata which has been reported to have point centromeres [40, 64]. The P. kudriavzevii genome does not contain any Ty5-like elements. Its centromeres do contain pseudogenes of Ty3-like elements, and these are more abundant at the centromeres than elsewhere in the genome, but the only intact Ty3-like elements are not centromeric. Centromeres with similar IR structures also occur outside the family Pichiaceae, in C. tropicalis (family Debaryomycetaceae) [41] and Schizosaccharomyces pombe (subphylum Taphrinomycotina) [65]. The centromeres of Sch. pombe chromosomes 1 and 2 are similar in size and organization to those of P. kudriavzevii, whereas its third centromere is larger and more complex [40]. An important remaining question concerns the sexual cycle. When P. kudriavzevii was first described, it was reported to be able to sporulate, forming one spore per ascus [7]. Later studies by Kurtzman and colleagues reported that the type strain of P. kudriavzevii does not mate or sporulate [8, 9]. Our discovery that this strain is triploid provides a possible explanation for its failure to sporulate, or at least its failure to produce viable spores. It will be of interest to re-investigate the question of sporulation using strains that are diploid MATa/α heterozygotes. Of the 32 strains we studied, 20 have this status (Table 2). It will also be of interest to test if mating can be induced between strains with MATα/α and MATa/a genotypes. The genome of CBS573 appears to contain a complete repertoire of sexual cycle genes, including pheromone genes (MFa, MFα) and orthologs of most of the genes in the MAPK kinase pathway that controls mating in S. cerevisiae. It also contains orthologs of many genes involved in meiosis, although IME1, the master inducer of meiosis, has not been found in P. kudriavzevii nor any other species outside the family Saccharomycetaceae [66]. A possible explanation for the triploid isolates is that, for example, a MATa/α diploid underwent loss of heterozygosity to become MATα/α, and then mated with a MATa haploid spore to form a MATa/α/α triploid. Because clinical isolates were found to be closely related to environmental isolates, either infections are being acquired opportunistically from the environment, or yeast strains from infected humans are colonizing the environment. In view of the range of sources, the former possibility is more likely. The use of P. kudriavzevii in biotechnology therefore presents a potential hazard to the health of immunocompromised workers, and potentially also to consumers [67]. Moreover, high resistance to fluconazole is common in environmental isolates. The resistance to fluconazole is shared with P. fermentans (Table 2), P. norvegensis and other P. cactophila clade species [28, 34] and therefore seems to be a trait of the whole genus Pichia. C. krusei and P. kudriavzevii are both categorized as Biosafety Level 1 (BSL-1), which is the lowest level of precaution. In another case of a pathogen with a major biotechnological role, it was suggested that a harmless closely related species should be used as a replacement [68]. Similarly, it may be advisable to consider non-pathogenic Pichia species as possible alternatives for some industrial applications. It would also be advisable to set limits on the levels of drug resistance permissible in P. kudriavzevii strains that are used in industry, particularly the food industry.

Methods Yeast strains, DNA and RNA preparation, and sequencing Yeast strains were obtained from the laboratories and culture collections listed in Table 2. For clinical isolates obtained from hospital laboratories, all isolates came from different patients and where possible we obtained isolates that were taken prior to drug treatment. High molecular weight DNA for PacBio sequencing was purified using Qiagen Puregene Yeast/Bacterial Kit B. DNA for Illumina sequencing was harvested from stationary-phase cultures by homogenization with glass beads followed by phenol-chloroform extraction and ethanol precipitation. Purified DNA was concentrated with the Genomic DNA Clean & Concentrator-10 (Zymo Research, catalog D4010). Isolation of mRNA for RNAseq was done using the MasterPure Yeast RNA Purification kit (Epicentre, Madison, WI, USA) from CBS573 cultures grown to early log phase (OD 600 ~ 1) in YPD media at 30°C. PacBio DNA sequencing of strains CBS573 and CBS5147 was done at the Earlham Institute, UK, with 4 SMRT cells per strain. Illumina sequencing of these two strains was also done at the Earlham Institute using Low Input Transposon Enabled (LITE) libraries. Illumina sequencing of all other strains was done by the core facility of the University of Missouri, USA, using TruSeq libraries (coverage details are given in Table 2). Illumina RNA-seq (30 million reads) of CBS573 was done in-house at University College Dublin. Sequence assembly Assembly of the CBS573 PacBio data using HGAP3 software initially produced seven nuclear contigs (115x coverage) and the mitochondrial genome. Overlaps between the ends of two pairs of contigs were merged manually to obtain five near-complete chromosome sequences. Illumina sequencing of the same strain (35x coverage) was then used to error-correct the chromosome sequences, in particular to remove insertion/deletion errors in homopolymer tracts. Error correction was done using Pilon [69] and manual comparison to de novo Illumina contigs assembled by SPAdes version 3.10 [70]. Assembly of the CBS5147 PacBio data using HGAP3 yielded 13 contigs (94x coverage). This assembly appeared to have a higher level of indel errors than the CBS573 assembly, so we used CBS573 as the reference genome sequence for annotation and downstream analyses. De novo assemblies of the other 30 strains were made using SPAdes version 3.10 [70]. Nucleotide sequence identity of 99.6% between the reference chromosome sequences of CBS573 and CBS5147 was calculated from a MUMmer (v3.23) alignment of the whole genome [37]. Ploidy estimation by propidium iodide staining Ploidy was estimated with a modified version of the method of Popolo et al. [71]. Briefly, aliquots of exponentially growing cells in YM medium (3 g/L yeast extract, 3 g/L malt extract, 5 g/L peptone and 10g/L dextrose), were adjusted to OD 600 = 1 in 1 mL sterile ice-cold water, centrifuged (5 min, 5000 rpm) and fixed in 1 mL cold 70% ethanol for 24 hours at 4 °C. Fixed cells were next treated with 100 μL 1 mg/mL RNAse A for 90 min at 37 °C after centrifugation to remove the ethanol. RNase A treated cells were centrifuged and the pellet stained with 100 μL 0.05 mg/mL propidium iodide at 4 °C for 24 hours. Fifty μL of stained cells was diluted to 500 μL with ice-cold water, filtered with 50 μm celltrics filters (Sysmex, UK) and run on a BD Accuri C6 flow cytometer. Data were analysed using Flowjo software (Flowjo, LLC). SNP analysis BAM alignments of Illumina reads from each strain to the CBS573 reference genome were generated using the Burrows-Wheeler Aligner (BWA) with default parameters [46]. Unmapped reads were removed using SAMtools [72] and headers were added using the AddOrReplaceReadGroups program in Picard Tools [http://picard.sourceforge.net]. Variants against the reference were called with the GATK HaplotypeCaller tool in DISCOVERY genotyping mode with the following parameters: “-stand_emit_conf 10 -stand_call_conf 30—emitRefConfidence GVCF” [73]. The resulting set of 32 GVCF files defined an initial set of 169,789 SNP sites that were variable among the 32 strains, and this set was used for ploidy analyses (Fig 5A). To analyze patterns of LOH, we divided the genome into consecutive 50-kb windows and calculated the number of heterozygous SNPs in each window (with allele frequencies between 0.15 and 0.85, from the initial set of 169,789 sites). Windows containing <30 heterozygous sites were categorized as showing LOH (Fig 5B; S2 Table). The threshold of 30 heterozygous sites was chosen because it is a local minimum in the distribution of heterozygous SNP numbers among all windows in all strains. For phylogenetic and STRUCTURE analyses (Fig 6; S3 Fig), we first filtered the 32 GVCF files to remove low allele frequency sites (allele frequency <0.15). The filtered GVCFs were then jointly genotyped with the GenotypeGVCFs function of GATK to produce a single multisample SNP file containing data on every strain. This filtered dataset contained 150,306 variable sites. For phylogenetic analysis of SNP data, we used the program RRHS [74] to preserve the impact of heterozygous SNPs. This program generated 100 datasets in which, for each heterozygous site in each strain, one allele was chosen randomly. Each of the 100 datasets was used to build a phylogenetic tree by maximum likelihood using IQ-TREE v1.6.5 [53], with option “-m GTR+ASC” to account for ascertainment bias. A single unrooted consensus tree was then constructed from these trees (Fig 6). STRUCTURE [54] uses SNP data to infer the population structure of the genomes in the dataset, assuming that the individuals are drawn from k populations. It also provides a value of estimated log likelihood (ln Pr) for the model used. We ran STRUCTURE (v2.3.4) for values of k between 2 and 8, and present the results for the value that gave the highest log likelihood, k = 4. Drug assays Minimum Inhibitory Concentrations (MICs) of the four antifungal agents were determined using the EUCAST broth dilution method [58] with slight modifications. In brief, stock antifungal agents prepared with EUCAST recommended solvents were diluted to appropriate working concentrations in double strength RPMI-1640 2% G (RPMI-1640 supplemented with 2% w/v glucose). The working concentrations used were 128 mg/L for fluconazole and flucytosine, and 32 mg/L for amphotericin B and micafungin. A ten-series two-fold dilution starting with 200 mL working concentration of each agent was made row-wise in flat-bottomed 96-well plates using double strength RPMI-1640 2% G as diluent. Consequently, the wells of each dilution series yielded 100 mL of twice the recommended series of drug concentrations required for MIC determinations. The last two wells of each row containing an antifungal drug serial dilution were filled with 100 mL of drug-free RPMI-1640 2% G. Yeast inocula were prepared by growing three distinct colonies of each strain overnight on Sabouraud agar at 37°C and suspending them in sterile distilled water. To achieve final cell densities of 0.5–2.5 x 106 cfu/ml in the microtitre wells as recommended, these suspensions were adjusted to OD 600 0.1 and then further diluted 1/10 in sterile water. Cell densities were confirmed by plate counting. Wells of each dilution series as well as the 11th well containing drug-free RPMI-1640 2% G were inoculated with 100 mL of the prepared yeast suspensions. The last well was filled with 100 mL of sterile distilled water to serve as contaminant control. Inoculated plates were incubated without shaking at 37°C for 24 hours. Plates were read for OD 600 values using a Spectramax 190 microplate reader (Molecular Devices, Sunnyvale, California, USA). As recommended in the EUCAST protocol [58], we calculated MIC 90 for amphotericin B, and MIC 50 for the other three drugs. One of the recommended control strains for yeasts in the EUCAST protocol is the type strain of C. krusei (CBS573T, synonymous with ATCC6258T), and we used the type strain of C. parapsilosis (CLIB214T) as a second control. MICs for the two control strains were within EUCAST guideline ranges, except for C. krusei CBS573T in amphotericin B, which was 1 dilution point more resistant than the guideline. Accession numbers The sequence data reported in this manuscript has been submitted to the NCBI nucleotide database with the following accession numbers: P. kudriavzevii CBS573 PacBio/Illumina annotated reference genome sequence (CP028773-CP028778); P. kudriavzevii CBS573 RNAseq Illumina reads (SRA accession SRP139056); P. kudriavzevii CBS573 MATα allele region (MH260578); P. kudriavzevii CBS5147 PacBio genome sequence (CP028531-CP028535); Illumina genomic sequencing reads from 32 P. kudriavzevii strains (SRA accession SRP139299); P. fermentans strain fo/MP/02 (Pferm-PL1) Illumina WGS assembly (QAWB00000000); P. norvegensis strain CBS1922 (Pnorv-NO1) Illumina WGS assembly (QAWC00000000).

Supporting information S1 Fig. Collinearity of CBS5147 and CBS573 chromosomes. Dot matrix plots compare PacBio assemblies of CBS5147 chromosomes (Y-axis) versus CBS573 chromosomes (X-axis). Black diagonals indicate matches in the same orientation, and red diagonals indicate matches in opposite orientations. Plots were constructed using DNAMAN (www.lynnon.com), with a criterion of 50 matches per 50-bp window. Bars at the top of the plots show the locations of annotated protein-coding genes in the CBS573 genome, with an absence of genes at the centromeres. https://doi.org/10.1371/journal.ppat.1007138.s001 (TIF) S2 Fig. Complex relationship among P. kudriavzevii centromeres. 50-kb regions around the centromeres of the 5 chromosomes of CBS573 were concatenated and compared in a dot matrix plot. Black diagonals indicate matches in the same orientation, and red diagonals indicate matches in opposite orientations. Dashed lines mark the ends of the 50-kb section from each chromosome. The cyan grid marks the ends of the three sections of each centromere (IRL, left part of the IR; MID, middle region; IRR, right part of the IR). Locations of PkudTy3A pseudogenes (pink triangles) and PkudTy3B pseudogenes (blue triangles) are shown. The plot was constructed using DNAMAN (www.lynnon.com), with a criterion of 50 matches per 50-bp window. https://doi.org/10.1371/journal.ppat.1007138.s002 (TIF) S3 Fig. Population structure analysis of P. kudriavzevii isolates. The diagram was built from a filtered dataset of 150,306 SNP sites using STRUCTURE [54] with k = 4. Each column represents a strain, and the colors represent the proportion of sites belonging to each of the 4 inferred populations. Populations 1–3 form monophyletic clades in the tree in Fig 6, but population 4 does not. https://doi.org/10.1371/journal.ppat.1007138.s003 (TIF) S4 Fig. Drug resistance assays. Growth of strains after 24 hours (OD 600 ) is plotted versus drug concentration for four drugs: (A) fluconazole, (B) flucytosine, (C) amphotericin B, and (D) micafungin. Histograms (insets) show the distribution of Minimum Inhibitory Concentration (MIC) values for all strains. For P. kudriavzevii, only strains designated as relatively resistant (red) or relatively sensitive (green) are identified in the keys; other strains are plotted as gray lines. Two strains of P. fermentans (blue) and the EUCAST control strains of C. krusei (CBS573T; black circles) and C. parapsilosis (CLIB214; black triangles) are also plotted. MIC is defined as the concentration required to inhibit 50% of growth in fluconazole, flucytosine and micafungin, and 90% in amphotericin B [58]. Each data point is the average of three biological replicates. https://doi.org/10.1371/journal.ppat.1007138.s004 (TIF) S1 Table. Evaluation of genome annotation quality using BUSCO. Annotated protein datasets for C. krusei strain 81-B-5 [23], I. orientalis strain SD108 [10], and P. kudriavzevii strain 129 [21] were downloaded from the NCBI database. BUSCO version 3.0.2 (busco.ezlab.org) [36] was used to compare these annotations and our CBS573 annotation to two reference datasets of single-copy genes that are universally conserved in the Ascomycota lineage, or in the Saccharomycetales lineage. The BUSCO reports show the percentages of proteins in the reference datasets whose orthologs are complete (C), fragmented (F), or missing (M) in each annotation. Complete proteins are subdivided into those that are single-copy (S) or duplicated (D) in the annotations. https://doi.org/10.1371/journal.ppat.1007138.s005 (DOCX) S2 Table. SNP densities and extent of loss-of-heterozygosity (LOH) regions in sequenced strains. https://doi.org/10.1371/journal.ppat.1007138.s006 (DOCX) S1 File. Allele frequencies and sequencing coverage in all 32 strains. For each strain, the plots show: (Top left) Allele frequency of non-reference alleles at polymorphic sites along the genome, as in Fig 5A. (Top right) Sequencing coverage along each chromosome. The red points are segmental means. (Lower 6 panels) Histograms of non-reference allele frequencies in the whole genome, and separately for each chromosome. https://doi.org/10.1371/journal.ppat.1007138.s007 (PDF)

Acknowledgments We thank Francoise Dromer, Katie Dunne, Timo Hautala, Gary Moran, Katarzyna Rajkowska, Tom Rogers (St. James’s Hospital, Trinity College Dublin), Barbara Skerlavaj, and Mingfeng Zhao for strains, and Geraldine Butler, Florent Morio, and Tadeusz Krassowski for comments.