Previous studies have examined fungal communities largely in small disease-centric cohorts, and information detailing the healthy human mycobiome in a large, well-studied cohort is lacking. In this study, extracted DNA from fecal samples from the Human Microbiome Project was used to investigate what constitutes a normal gut mycobiome. This study represents the first time the fecal mycobiome has been described in a large cohort of healthy individuals (over 100 volunteers), with longitudinal samples provided by each volunteer (up to three samples per volunteer, totaling 317 samples). Furthermore, this is the first study that includes ITS, 18S rRNA gene, 16S rRNA gene, and WGS metagenomic sequencing data on the same samples, thus enabling a validation of methods and correlative analyses. The results indicate that fungal diversity is lower than bacterial diversity in the gut, and that yeast genera such as Saccharomyces, Malassezia, and Candida are the most abundant genera present in this cohort. Candida spp. have commonly been identified as members of the healthy human mycobiome, not only in the gut [9, 20] but also at several other body sites, including the oral cavity [21, 22], vagina [24], and skin [23, 30]. Previous studies have observed high levels of Malassezia at different body sites, describing it as a prominent commensal of the skin and oral mycobiomes [21, 23]. Interestingly, a study by Hoffmann et al. examining the mycobiome of the gut in relation to diet in a smaller set of healthy volunteers recognized Saccharomyces and Candida as prevalent members of the gut mycobiome, but did not identify Malassezia as a member of the gut mycobiota [9]. The discrepancy between the Hoffmann study and the results of the current study is likely due to differences in study methodologies: while this study amplified the ITS2 region of the fungal rRNA operon, Hoffmann et al. amplified the Internal Transcribed Spacer 1 (ITS1) region.
In data described in Additional file 4, amplification and sequencing of fecal samples showed that the primers used to amplify the ITS1 region (ITS1F and ITS2 [31, 32], also used in the Hoffmann study) did not detect Malassezia, suggesting that sequence mismatches in the primers may prevent optimal amplification of Malassezia DNA. Alternatively, Malassezia may not have been identified in the Hoffmann study due to differences in cohort characteristics, such as diet or geographical location. While volunteers in this study were recruited from Houston, Texas, the volunteers in the Hoffmann study were recruited from Pennsylvania. Differences in climate may affect the fungi to which individuals are exposed, which may in turn affect the colonization of fungi in the gut.
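The primer-mismatch explanation above can be illustrated with a small sketch. The following Python is not the analysis performed in this study; it only counts positions where a primer (IUPAC degenerate codes allowed) disagrees with a same-length binding site, and separately reports mismatches near the 3' end, where they are generally most disruptive to polymerase extension. The function names, the five-base 3' window, and the toy sequences are assumptions made for illustration; the sequences are not the real ITS1F or ITS2 primers, nor real Malassezia sequence.

```python
# Minimal sketch: locate primer/template mismatches, allowing IUPAC
# degenerate bases in the primer. All sequences below are made-up toy data.

# Each IUPAC code maps to the set of concrete bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_positions(primer, site):
    """Return 0-based positions (5'->3' along the primer) where the site
    base is not permitted by the primer base.

    `site` must be written in the same orientation as the primer, i.e. the
    strand the primer is identical to when it matches perfectly.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [
        i
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC[p]
    ]


def three_prime_mismatches(primer, site, window=5):
    """Return mismatch positions that fall within the last `window` bases
    of the primer (its 3' end). The window size is an assumed heuristic."""
    cutoff = len(primer) - window
    return [i for i in mismatch_positions(primer, site) if i >= cutoff]


if __name__ == "__main__":
    toy_primer = "ACGTRYACGT"        # hypothetical degenerate primer
    toy_site_match = "ACGTGCACGT"    # hypothetical taxon with a perfect site
    toy_site_variant = "ACGTCCACGA"  # hypothetical taxon with two mismatches

    print(mismatch_positions(toy_primer, toy_site_match))
    print(mismatch_positions(toy_primer, toy_site_variant))
    print(three_prime_mismatches(toy_primer, toy_site_variant))
```

In practice, such a check would be run against aligned reference sequences for each taxon of interest; a primer pair with 3' mismatches to a taxon would be expected to underrepresent that taxon, which is consistent with, but does not prove, the explanation offered here.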

We determined that the gut mycobiome is highly variable between individuals as well as within individuals over time. A similar trend was observed in a study following fungal communities in mice, where it was found that the gut mycobiome varied substantially over time in mice receiving antibiotics as well as untreated control mice [33]. Furthermore, it was observed that different cages of mice receiving the same treatment also varied in their dominant fungal lineage. These findings occurred in mice housed in the same animal facility and on a homogeneous diet. Additionally, a human gut mycobiome study comprising 24 individuals with two sampling time points found that detection of the same fungus at both time points occurred less than 20% of the time [20]. While the gut mycobiome was found to be variable within individuals, others have shown that the oral mycobiome stays fairly stable over time within an individual [34]. These results prompt a fundamental unanswered question in the field: which, if any, fungi are truly colonizing the human gut? It is known that the human microbiome is greatly impacted by diet, environment, and lifestyle [9, 35,36,37]. However, a limitation of the culture-independent techniques reported here is that they only assess DNA signatures. Thus, these data cannot distinguish between DNA contributed by live and dead cells and do not differentiate microbes that are colonizing the gut from transients derived from our diet and/or environment. Nevertheless, culture-dependent studies have identified many of the same abundant fungi we have detected here, including Candida spp. [38,39,40,41,42,43], Saccharomyces cerevisiae [40, 43], Malassezia spp. [38, 39, 44], Penicillium spp. [38,39,40, 42], Cladosporium spp. [38, 42], and Aspergillus spp. [38,39,40, 42, 44]. Candida, Penicillium, and Aspergillus spp. have been identified in fecal samples from many different volunteers across several studies, but Malassezia and Saccharomyces spp.
are cultured less consistently. Malassezia has more stringent growth conditions (i.e., it cannot be grown on common yeast-friendly media like Sabouraud or Potato Dextrose), which could account for its lack of detection in many studies. Saccharomyces, on the other hand, is easily cultured, suggesting that its high abundance and prevalence in ITS2 sequencing data may originate from other sources, especially since it is a common component in many foods. This is also likely the case for Cyberlindnera jadinii, a food additive also known as “torula yeast,” which was found in high abundance in some volunteers. Mycologists Suhr and Hallen-Adams have proposed that the majority of fungal taxa detected in culture-independent studies are likely not viable in the gut due to growth constraints (e.g., several Penicillium species do not grow at 37 °C) or known ecological niches (e.g., Ustilago maydis is an obligate maize pathogen) [45]. Nevertheless, colonization is not necessary to exert a biologically significant effect on the host (e.g., many proposed probiotics do not necessarily colonize the gut for prolonged periods [46, 47]). More research must be done to determine which fungi, if any, may be colonizing the human gut and how they may be impacting resident microbes and the host.
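The within-individual variability discussed in this paragraph can be quantified in several ways. As a hedged illustration only, and not the method used in this study or in the cited two-time-point study, the sketch below computes two simple presence/absence summaries for one volunteer sampled twice: the Jaccard similarity between visits and the fraction of taxa from the first visit that are detected again at the second. The function names and the example genus lists are hypothetical.

```python
# Minimal sketch: presence/absence stability of detected taxa between two
# visits from the same volunteer. The genus lists are toy data, not results.


def jaccard_similarity(visit_a, visit_b):
    """Shared taxa divided by all taxa seen at either visit.

    Returns None when no taxa were detected at either visit, because the
    ratio is undefined in that case.
    """
    a, b = set(visit_a), set(visit_b)
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


def redetection_rate(first_visit, second_visit):
    """Fraction of taxa detected at the first visit that are detected again
    at the second visit. Returns None if the first visit had no taxa."""
    first = set(first_visit)
    if not first:
        return None
    return len(first & set(second_visit)) / len(first)


if __name__ == "__main__":
    # Hypothetical detections for a single volunteer at two time points.
    visit_1 = ["Saccharomyces", "Candida", "Malassezia", "Penicillium"]
    visit_2 = ["Saccharomyces", "Cladosporium"]

    print(jaccard_similarity(visit_1, visit_2))
    print(redetection_rate(visit_1, visit_2))
```

Neither summary distinguishes true colonizers from dietary or environmental transients; a taxon that is re-detected may simply be re-ingested, so such values describe detection stability rather than colonization.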

Comparing results between existing mycobiome studies presents many challenges. First, non-standardized approaches are used by various labs to explore the mycobiome, and analysis strategies are rapidly evolving. Many molecular and bioinformatics methods utilized by researchers were optimized for isolation and analysis of bacterial communities and may not always be appropriate for fungi. Although the extraction method used on HMP stool samples was optimized for bacterial community analysis, we determined that this did not have a significant effect on alpha diversity, beta diversity, or taxonomy compared to an extraction method utilizing harsher mechanical lysis that is similar to methods used in current mycobiome studies (Additional file 5). Furthermore, there is still debate on the optimal region of the rRNA operon to assay for fungal community profiling. While the ITS1 region is a common target for molecular studies, our laboratory and others have found ITS2 may be more suitable for detecting fungal commensals. A closer look at ITS1F and ITS2 primers revealed that these commonly used ITS1 region-targeting primers contain critical mismatches to common fungal taxa found in the human microbiome, including Galactomyces geotrichum, Yarrowia lipolytica, and fungi belonging to the Malasseziales and Tremellales orders [48]. Additionally, available fungal databases are quite sparse and less well-curated compared to bacterial databases, both in terms of the overall number of sequences and the accuracy of taxonomic information. Misidentifications in fungal databases occur frequently, a circumstance that is compounded by fungal dimorphism (the ability of some fungi to change morphologically between hyphal and yeast forms depending on environmental conditions). This phenomenon often results in different studies identifying identical ITS sequences as two different fungi. 
Moreover, database entries may contain insufficient taxonomic information to correctly identify fungi, leading to the “Fungi sp.” or “unclassified fungi” identifications seen in our and others’ data [20]. Our study found that approximately 17% of OTUs lacked taxonomic information. Finally, availability of fungal genomes is also lacking compared to bacteria, though there are efforts underway to change this [49]. This scarcity of complete fungal genomes makes identifying fungi in complex samples difficult and is compounded by the generally low relative abundance of fungi compared with other microbes. In the HMP samples used in this study, we found that fungal sequences constituted approximately 0.01% of the total number of metagenomic sequences. However, this number may increase as more fungal genomes are sequenced and more data may be mapped to these genomes.

To confirm that no major components of the mycobiome were being missed due to known ITS2 primer bias, a subset of samples was analyzed by broad eukaryotic 18S rRNA gene amplification and sequencing. Only one additional fungal genus, Tritirachium, was detected that was not among the named genera detected by ITS2 sequencing in the 89 shared samples. The discovery of this low-abundance genus in a single sample was likely due to further sampling of a diverse sample rather than an ITS2 primer bias. The 18S rRNA gene results lend support to the completeness of the ITS2 fungal data but also demonstrate that fungi are not the only microeukaryotes present in the gut. In particular, the animal gut symbiont Blastocystis was present in 25% (11/44) of the volunteers examined, which is within the carriage range found in other developed countries. In contrast, Dientamoeba fragilis, another intestinal microeukaryote common in some healthy populations, was not detected in HMP samples [50]. The Blastocystis subtypes that were detected (ST1, ST2, ST3) are, together with ST4, the most frequently identified in humans [51]. Colonization by Blastocystis has been associated with increased bacterial diversity [52], and this held true for HMP samples. However, the detection of Blastocystis did not correspond to increased fungal diversity, yet another distinct attribute of the mycobiome. We also found that 18S rRNA gene sequencing data mapped to a variety of presumably dietary sources, such as fish, meat, fowl, and plants, raising the possibility that 18S rRNA gene sequencing data could be used to validate, or serve as a surrogate for, dietary information collected by questionnaires.