Conservation and content of highly reduced genomes

We generated three high-quality genome assemblies, including the first P. murina genome assembly and new assemblies for P. carinii and P. jirovecii, all of which are at or near the chromosome level and contain the complete gene set except for a small number of msg (in all three species) or kexin genes (in P. carinii alone) in several subtelomeric regions that may remain unidentified. Compared with these assemblies, the previously reported assemblies for P. carinii7 and P. jirovecii8 are less complete and continuous, with some genes missing entirely or in part, as evidenced by the shorter average gene and protein lengths, the absence of ∼30% of exons in P. jirovecii (Supplementary Table 1), and the lack of both msg and kexin genes in P. carinii and msg genes in P. jirovecii.

The P. murina genome assembly is 7.5 Mb in size across 17 scaffolds, which range in size from 292 to 588 kb (Supplementary Fig. 1a), consistent with the number and size of chromosomes revealed by electrophoretic karyotyping and Southern blotting (Supplementary Figs 2 and 3; Supplementary Note 1). The P. carinii genome assembly is 7.66 Mb in size across 17 scaffolds, which range in size from 268 to 635 kb (Supplementary Fig. 1b), in line with previously reported chromosome number and size from electrophoretic karyotyping experiments14. The P. jirovecii genome assembly is 8.4 Mb in size across 20 scaffolds, ranging in size from 72 to 635 kb (Supplementary Fig. 1c). Previous studies have estimated the P. jirovecii genome to be 7.0 Mb in size distributed over 12–13 chromosomes15, though this estimate may be inaccurate given the poor quality of available P. jirovecii samples.

The genomes of P. murina and P. carinii are very similar in total length, chromosome structure and gene organization, and show limited rearrangements, whereas the genome of P. jirovecii is highly rearranged (Fig. 1a; Supplementary Fig. 4; Supplementary Note 2). Between two P. jirovecii strains, there is ∼0.3% genetic variation, with subtelomeric regions showing the highest diversity (Fig. 1b), as observed in some other fungi16. Despite these rearrangements, conserved syntenic regions include nearly all genes of P. murina when compared with either P. carinii (96.1%) or P. jirovecii (92.9%). In contrast, comparison of P. murina with other phylogenetically related fungi identified only small blocks of three to seven genes with a conserved order in Schizosaccharomyces pombe17 (8.1%) or Taphrina deformans18 (3.3%). The high number of inter-chromosomal rearrangements in Pneumocystis contrasts with the primarily intra-chromosomal pattern reported for Saccharomycetes19 and other ascomycetes.
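To make the notion of "genes in conserved syntenic regions" concrete, the following is a minimal sketch of how such a percentage could be computed from gene order alone. It is not the method used for the figures above: the function name, the input layout and the strict rule that any gene without an orthologue breaks a block are assumptions made for illustration, and the gene and scaffold identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def syntenic_fraction(order_a: List[str],
                      position_b: Dict[str, Tuple[str, int]],
                      min_block: int = 3) -> float:
    """Fraction of genes in order_a that fall in collinear runs of >= min_block.

    order_a    -- gene IDs along one scaffold of genome A, in chromosomal order
    position_b -- maps a genome-A gene to (scaffold, index) of its orthologue
                  in genome B; genes without an orthologue are simply absent
    """
    counted = 0      # genes already assigned to a qualifying block
    run_len = 0      # length of the run currently being extended
    step = 0         # direction of the current run: +1, -1, or 0 if undecided
    prev = None      # orthologue position of the previous gene
    for gene in order_a:
        pos = position_b.get(gene)
        extends = (
            pos is not None and prev is not None
            and pos[0] == prev[0]                 # same scaffold in genome B
            and pos[1] - prev[1] in (1, -1)       # adjacent in genome B
            and step in (0, pos[1] - prev[1])     # same direction as the run
        )
        if extends:
            step = pos[1] - prev[1]
            run_len += 1
        else:
            if run_len >= min_block:
                counted += run_len
            run_len = 1 if pos is not None else 0
            step = 0
        prev = pos
    if run_len >= min_block:
        counted += run_len
    return counted / len(order_a) if order_a else 0.0


# Hypothetical toy data: a1-a3 are collinear on s1, a4-a5 form a run of
# only two genes on s2, and a6 has no orthologue.
toy_order = ["a1", "a2", "a3", "a4", "a5", "a6"]
toy_positions = {"a1": ("s1", 10), "a2": ("s1", 11), "a3": ("s1", 12),
                 "a4": ("s2", 5), "a5": ("s2", 4)}
print(syntenic_fraction(toy_order, toy_positions))
```

Dedicated synteny tools usually tolerate gaps and small local inversions inside a block, so a strict rule like this one gives a conservative lower bound.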

Figure 1: Conservation of Pneumocystis genome structure. (a) Conserved synteny among three Pneumocystis genomes. Shared syntenic regions are depicted with grey boxes. Scaffold numbers are listed on the x-axis, and red dots indicate the location of msg genes. (b) Genome-wide SNP frequency between the P. jirovecii isolates from the United States (RU7) and Switzerland (SE8)8 with the same scaffold order as in the top panel. The region beyond 8 Mb with a high number of SNPs is composed primarily of small scaffolds containing msg genes not assembled into the 20 large scaffolds. Using our genome assembly (RU7) as a reference, we identified a total of 24,902 SNPs, or 1 every 337 bases, between these 2 isolates; these SNPs are over-represented in subtelomeric regions where msg genes are found.
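The SNP figures quoted in this caption can be checked against the ∼0.3% variation and the 8.4 Mb assembly size given in the text. The short calculation below uses only numbers stated in the source; the variable names are illustrative.

```python
# Figures quoted in the Figure 1 caption.
snp_count = 24_902      # SNPs between isolates RU7 and SE8
bases_per_snp = 337     # "1 every 337 bases"

# Length of reference sequence implied by the two figures together.
implied_length_bp = snp_count * bases_per_snp

# Per-base divergence expressed as a percentage.
divergence_pct = 100 / bases_per_snp

print(f"implied compared length: {implied_length_bp / 1e6:.2f} Mb")
print(f"divergence: {divergence_pct:.2f}%")
```

The implied length of about 8.39 Mb matches the reported 8.4 Mb P. jirovecii assembly, and the divergence rounds to the ∼0.3% stated in the text.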

Compared with other fungi, all three Pneumocystis species have a small genome size and a reduced gene set; between 3,675 and 3,812 genes (including transfer RNA (tRNA) and ribosomal RNA (rRNA) genes) are predicted (Table 1) compared with an average of 5,044 genes in the four related Schizosaccharomyces genomes17. Other unusual features of the Pneumocystis genomes are their possession of only a single copy of rRNA genes (Supplementary Fig. 5), a minimal number of tRNA genes and an extremely low GC content, all of which are among the lowest in eukaryotes (Table 1). These features may reflect a slow transcription and translation machinery (Supplementary Note 3), which, together with the loss of many biosynthetic pathways as discussed below, may lead to the slow growth of Pneumocystis organisms as observed in animal models, where doubling times are estimated to be 5–8 days20, as well as to the failure to grow in vitro. The reduced genome size and content likely reflects adaptation to and dependence on human and other mammalian hosts, as has been previously noted for Microsporidia21.

Table 1 Comparison of Pneumocystis and related fungal genomes.

Gene family expansions and contractions

To examine the predicted functional impact of genome reduction, we examined gene gain and loss in Pneumocystis relative to seven closely related Ascomycete species (Supplementary Fig. 6) using protein homology and subcellular localization analysis tools (Supplementary Methods). To identify common features of reduced genomes, we also compared these changes in gene content with those in two microsporidial species21, both of which are intracellular organisms and have the smallest known genomes in the fungal kingdom (Fig. 2). Based on a phylogenetic tree inferred from 413 single copy core orthologues (Supplementary Methods), the three Pneumocystis species cluster with Taphrina deformans18 as a sister group to the Schizosaccharomyces species (Supplementary Fig. 6). As previously recognized based on mitochondrial genome sequences22, P. murina and P. carinii are more closely related to each other than either is to P. jirovecii, although all three species show substantial divergence.

Figure 2: Protein domains depleted or enriched in Pneumocystis. Significantly enriched (top panel) or depleted (lower panel) Pfam domains (Fisher’s exact test, q value<0.05) are included in the heat map if the domains appear at least twice in the following comparisons: Pneumocystis versus Schizosaccharomyces, Pneumocystis versus Schizosaccharomyces and T. deformans, Pneumocystis versus S. cerevisiae and C. albicans, Pneumocystis versus E. cuniculi and E. intestinalis, Pneumocystis versus all others shown. Broader functional categories of proteins are indicated on the left, while specific Pfam domains are listed on the right. The number of proteins containing each domain is indicated within each box for each species. The heat map is colour coded based on a Z score, as indicated by the key at the bottom right. Fungal species are ordered based on their phylogenetic relationship as indicated at the bottom.
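The enrichment criterion in this caption (Fisher's exact test followed by a q-value cutoff) can be sketched with the standard library alone. Two points are assumptions rather than statements from the source: the layout of the 2x2 table (proteins with and without a given domain, in Pneumocystis versus a comparison group), and the use of the Benjamini-Hochberg procedure to obtain q values, since the multiple-testing method is not named here.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    Assumed layout: a, b = proteins with / without the domain in group 1;
    c, d = proteins with / without the domain in group 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x domain-containing proteins in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for one domain: 8 of 10 proteins versus 1 of 6.
print(fisher_exact_two_sided(8, 2, 1, 5))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With per-domain p values collected across all Pfam domains, a domain would pass the caption's threshold when its adjusted value is below 0.05.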

Protein domains enriched in Pneumocystis include host-interacting cell-surface proteins and proteins required for basic cellular functions. The most significantly enriched domains in Pneumocystis are the Msg domains, which are shared among Pneumocystis species but absent in all other sequenced species. Msg is encoded by a large multi-copy gene superfamily (Fig. 3; Supplementary Table 3; Supplementary Data 1; Supplementary Note 4). Msg genes account for about 3–6% of an otherwise highly reduced genome, suggesting that the encoded proteins play an essential role in the organism’s survival. We found extraordinary diversity in genes encoding the Msg superfamily; there were 64 to 179 unique genes per species, which represent the largest surface protein family identified to date in the fungal kingdom23 and which show conservation among most Msg families or subfamilies across species, but also species-specific amplifications. Based on domain structure and phylogenetic analysis, the Msg superfamily is classified into five families, designated Msg-A, -B, -C, -D and -E (Fig. 3). The high diversity of these Msg families therefore could specify different proteins at the cell surface, both between strains and between species. Msg is thought to play an important role in host–pathogen interactions and potentially facilitates evasion of host immune responses through antigenic variation23,24. The fact that different P. jirovecii strains have unique msg repertoires2,25 dramatically expands the potential for antigenic variation, possibly against T-cell rather than B-cell host responses26. Msg antigenic variation by gene recombination is supported by the presence of homologues of all the key genes involved in homologous recombination in Saccharomyces cerevisiae (Supplementary Data 2).

Figure 3: The Msg superfamily in three Pneumocystis species. (a) Phylogeny of 384 Msg proteins identified in P. murina (blue squares), P. carinii (pink circles) and P. jirovecii (green diamonds). They are classified into five families of Msg-A, -B, -C, -D and -E, as indicated by the vertical bars on the right side. The Msg-A family is further classified into three subfamilies of Msg-A1 (classical Msg genes), Msg-A2 (Msr genes) and Msg-A3 (other Msg-associated genes). (b) Schematic representations of conserved domains in five Msg families. (c) Sequence logos showing the frequency of amino acid composition in Msg domains. Previously identified Pfam MSG and Pfam Msg2_C domains are included for comparison. Additional information on the Msg domain analysis is provided in Supplementary Note 4.

Peptidases of the S8 and M16 families are also highly enriched (Fig. 2; Supplementary Data 3; Supplementary Fig. 7). A large difference between the Pneumocystis species is the expansion of the S8B peptidase subfamily (kexin) in P. carinii, which contains 39 copies compared with 0–1 copy in P. murina, P. jirovecii and the other analysed fungi. Kexin is potentially involved in the processing of Msg proteins in the Golgi27,28,29, though in P. carinii many of the kexin genes encode proteins predicted to be on the cell surface through a glycosylphosphatidylinositol anchor29 (Supplementary Fig. 7b). The protein encoded by the single-copy kexin gene in both P. murina and P. jirovecii has been predicted to localize to the Golgi apparatus27,28. Other enriched domains include another cell-surface family, the cysteine-rich CFEM (common in fungal extracellular membrane) domain (Supplementary Fig. 8), as well as proteins involved in regulation of transcription, translation and other cellular activities (histone deacetylase family, ATPase family and the RNA recognition motif (RRM), Supplementary Data 4–6), all of which potentially facilitate the survival of Pneumocystis organisms in the host as discussed below.

By contrast, all three Pneumocystis species show extensive reduction of multiple gene families, as expected from their small genome size (Fig. 2). The most significantly reduced Pfam domains in Pneumocystis comprise the following three major categories: (1) transporters, with <25 transporters in each Pneumocystis species compared with 131–217 in other ascomycetes for the 6 depleted Pfam domains; (2) transcription factors, with only 3 genes in each Pneumocystis species in the 3 depleted Pfam domains, an order of magnitude fewer than other fungi except for Microsporidia; (3) enzymes, including oxidoreductases, hydrolases, transferases and coenzymes. Most of the significantly depleted domains in Pneumocystis are also depleted in Microsporidia (Fig. 2), suggesting common dependencies of these obligate pathogens on their hosts to complement these biologic functions. While additional transcription factor and transporter-associated domains are conserved (Supplementary Data 7 and 8; Supplementary Fig. 9), the total number of transcription factors and transporters in Pneumocystis is among the lowest in fungi.

Substantial reduction and unique features of metabolic pathways

Since the chromosome-level genome assemblies we generated are more complete than previously reported assemblies7,8, we mapped all major metabolic pathways for all three Pneumocystis species (Figs 4 and 5; Supplementary Figs 10 and 11; Supplementary Data 9–19; Supplementary Note 5). Among the most significantly reduced pathways are those involved in amino acid metabolism. As previously noted8,12, each Pneumocystis species lacks ∼80% of genes involved in de novo amino acid synthesis in yeast (Supplementary Data 9). In addition, Pneumocystis has impaired capacity for assimilation of inorganic nitrogen and sulfur10. Consequently, none of the 20 standard amino acids can be synthesized de novo although a few can be synthesized from others. Moreover, in contrast to these earlier reports, we have identified only 1 potential amino acid transporter (Ptr2), which is predicted to localize to the plasma membrane in each species compared with over 20 such transporters in yeasts. Nevertheless, intracellular transport appears highly conserved; nearly half of the 26 mitochondrion- and vacuole-associated amino acid transporters in yeast are preserved. While it is believed that polyamines are ubiquitous in all organisms and serve diverse functions, none of the three Pneumocystis genomes encodes any of the enzymes necessary for de novo synthesis of polyamines, though there is one potential polyamine transporter in each species, consistent with previous in vitro studies of P. carinii30. All three Pneumocystis species have retained the genes required for de novo nucleotide synthesis but are missing nearly all the genes for nucleotide salvage pathways (Supplementary Data 10).

Figure 4: Reduction of carbohydrate and lipid metabolism in Pneumocystis. A condensed version of pathways highlights retained (green arrows) and lost (grey arrows) pathways. Enzymes and membrane transporters absent in all three Pneumocystis species are highlighted in red font; those retained in all three species are highlighted in blue. Yellow and grey boxes indicate metabolites present and absent, respectively. Some metabolites are included more than once as they interact with multiple pathways, in which case the yellow or grey colouring refers to their role in different pathways. Enzyme Sur2 (in pink) is present only in P. murina and not in P. carinii or P. jirovecii. Enzyme Mct1 (in pink) is present in both P. murina and P. carinii but not in P. jirovecii. The boxed question mark (‘?’) leading to inositol indicates a hypothetical enzyme. The names of enzymes, transporters and metabolites follow the standard abbreviated names for S. cerevisiae. The enzyme and transporter names containing two or more digits represent duplicated enzymes and transporters.

Figure 5: Summary of mechanisms of adaptation to host lungs by Pneumocystis. Different mechanisms are highlighted by different colours. Potential mechanisms of uptake of nutrients (which cannot be synthesized de novo) include the use of plasma membrane-localized transporters (indicated by T), conversion of other metabolites scavenged from hosts (indicated by C), endocytosis (indicated by E) and unknown (indicated by U). §Potential uptake of haem or haemoglobin from lungs by endocytosis mediated by CFEM domain-containing proteins. *Cholesterol biosynthesis pathway is retained in P. jirovecii but lost in P. murina and P. carinii. ¶β-glucan is present in cysts and absent in trophic forms. β-glucan as well as chitin and mannan in other fungal pathogens are known pathogen-associated molecular patterns (PAMP) involved in host immune recognition; none of these components is detected in Pneumocystis organisms except for the presence of β-glucan in the cyst form (Figs 6 and 7; Supplementary Fig. 13). Pneumocystis is the only fungus identified to date that cannot synthesize chitin.

For carbohydrate metabolism, loss of a subset of pathways further highlights nutritional dependency. All three Pneumocystis species have a full complement of genes necessary for uptake and catabolism of glucose via glycolysis and the tricarboxylic acid (TCA) cycle (Fig. 4; Supplementary Data 11). In addition, all three species have all the enzymes necessary to convert fructose and mannose to glucose, and to synthesize and utilize glycogen and trehalose. However, key enzymes that convert galactose and sucrose to glucose are missing. Additional notable losses include two enzymes for glyoxylation8, one key enzyme for gluconeogenesis, and all enzymes for pyruvate fermentation. These findings, together with the identification of almost all genes involved in oxidative phosphorylation (Supplementary Data 16), suggest that energy production in Pneumocystis largely relies on glucose through oxidative pathways.

Lipid metabolism genes in Pneumocystis are also greatly reduced in number, thus resulting in distinctive differences in predicted lipid content compared with other fungi (Fig. 4; Supplementary Data 12). All three Pneumocystis species are able to synthesize fecosterol and episterol, but unable to convert them to ergosterol, which potentially accounts for the resistance of Pneumocystis to classical antifungal agents as noted previously11. Moreover, only P. jirovecii but not P. murina or P. carinii can synthesize cholesterol13 (Supplementary Fig. 11). These findings support the hypothesis that P. murina and P. carinii, but not necessarily P. jirovecii, may scavenge cholesterol from their hosts31, though the genes involved in sterol uptake have not been identified. Utilization of cholesterol rather than ergosterol may be contributing to a less rigid cell wall, thus allowing development of the trophic form of Pneumocystis since cholesterol-containing membranes are more flexible than ergosterol-containing membranes. All three Pneumocystis species lack not only the de novo synthesis pathways for myo-inositol, choline, complex sphingolipids, ether lipids, phosphatidylinositol, phosphatidylcholine and fatty acids (the cytosolic pathway involving fas1 and fas2 genes) but also the transporters for direct uptake of these lipids from external sources (Fig. 4; Supplementary Data 12). Nevertheless, alternative mechanisms could supply cells with phosphatidylinositol, phosphatidylcholine, inositol and choline (Fig. 4; Supplementary Note 5).

Strikingly, most of the genes involved in fatty acid β-oxidation are missing in all three species, suggesting that fatty acids are not an energy source for Pneumocystis, further supporting a high reliance on glucose as the main energy source. Of note, all three Pneumocystis species lack the enzymes for synthesis of glycerol from glycerone-phosphate or monoacylglycerol but encode homologues of yeast proteins (Gup1 and Fps1) responsible for glycerol uptake and export (Fig. 4; Supplementary Fig. 9b). Although glycerol is the only non-sugar carbon source that can enter into the TCA cycle, and is also required for synthesis of glycerolphospholipids, these two transporters may also play an important role in maintaining osmotic balance given the loss of critical cell wall components as discussed below.

Cofactor metabolism is also largely reduced in Pneumocystis (Supplementary Data 13). All three Pneumocystis species are missing almost all enzymes required for de novo synthesis and membrane transport of pantothenate, but retain all enzymes needed to convert pantothenate to CoA, as well as a mitochondrial carrier protein (Leu5) for CoA (Supplementary Fig. 10), implying possible scavenging of pantothenate or its downstream metabolites from the host by other mechanisms (for example, endocytosis; Fig. 5; Supplementary Data 14). All three Pneumocystis species lack enzymes required for de novo synthesis of vitamins B1 and H, as well as ubiquinone and siderophores, but have a potential plasma membrane transporter for each of these cofactors (Supplementary Data 13; Fig. 4; Supplementary Fig. 9c,d). P. jirovecii also lacks enzymes for de novo synthesis of NAD, while retaining one salvage pathway using exogenous nicotinic acid mononucleotide imported by a transporter (Tna1). In contrast, both P. murina and P. carinii have all the enzymes and the transporter needed for de novo synthesis and salvage of NAD. All three Pneumocystis species show a near complete absence of proteins necessary for reductive iron assimilation and siderophore biosynthesis32 (Supplementary Data 13). Since each Pneumocystis species encodes five proteins with cysteine-rich CFEM domains (Supplementary Fig. 8), it is possible that, like Candida albicans33, Pneumocystis is able to scavenge iron from host haem and haemoglobin using one or more of these proteins as an extracellular haem receptor.

Our analysis suggests that endocytosis may serve as a mechanism for Pneumocystis to obtain nutrients in the absence of multiple different receptors. In support of this hypothesis, we found that each Pneumocystis species encodes nearly all proteins involved in clathrin-dependent endocytosis (Supplementary Data 14). In addition, genes encoding various degradative enzymes (including proteases, lipases, ATPase families and proteasome shown in Supplementary Data 3,5,12 and 15) and transporters localized in mitochondria and vacuoles are expanded or retained, and some of them are highly expressed (Supplementary Data 20), while biosynthetic pathways for many nutrients (including amino acids, lipids and cofactors, as noted above) and plasma-membrane-associated transporter families are completely lost or reduced. Loss of these transporters may reflect low concentrations of their targets in the host lung milieu.

Lack of chitin, outer chain N-mannans and α-glucan in cell wall

Fungal cell walls typically are composed primarily of chitin, chitosan, glucans, mannans and glycoproteins, which are covalently cross-linked together to protect the cell from changes in environmental stresses, while allowing the organism to interact with its environment. The structure and biosynthesis of such cell walls are unique to fungi and thus serve as an excellent target for antifungal agents. The formation and assembly of the Pneumocystis cell wall remain poorly understood, though numerous studies have found it to be rich in glycoproteins and, in cysts only, β-glucans34,35.

Remarkably, none of the three Pneumocystis species encodes the key enzyme chitin synthase required for chitin synthesis, or chitinases involved in chitin degradation during cell wall remodelling (Fig. 6; Supplementary Data 17). The absence of these two gene families strongly suggests that Pneumocystis does not contain chitin in its cell wall. While each Pneumocystis genome does encode homologues of a few accessory proteins not directly involved in chitin synthesis or degradation (such as Chs5 reported elsewhere36), none of these shares any signature protein domains found in chitin synthase or chitinase (Supplementary Fig. 12). In Saccharomyces cerevisiae, Chs5 functions as a component of the exomer complex involved in export of chitin synthase and other membrane proteins37; in Pneumocystis Chs5 could serve only the latter function, given the absence of chitin synthase. While the detection of chitin in P. carinii has been previously reported36,38 based on reactivity with non-specific lectins such as wheat germ agglutinin, this staining may be detecting other molecules based on the absence of chitin synthases in the genome.

Figure 6: Analysis of chitin in Pneumocystis and related fungi. (a) Enzymes and accessory proteins involved in chitin metabolism in fungi. Pneumocystis genomes do not encode any chitin synthase or chitinase, which are present in other fungi, but retain genes encoding four accessory proteins (Supplementary Data 17). (b) Gas chromatograms of partially methylated alditol acetates of P. carinii and S. cerevisiae (control) cell walls. Terminal and 4-linked N-acetylglucosamine signals (T-GlcNAc and 4-GlcNAc in red font) were detected in S. cerevisiae but not in P. carinii. Glucose (Glc) and mannose (Man) signals were detected in both species. (c–e) Detection of chitin with recombinant chitin-binding domain (Alexafluor 488) using P. murina-infected lung tissue (c) and C. albicans-infected kidneys (e) as a positive control. Pneumocystis organisms are demonstrated in d by dual staining with anti-Msg (red), which labels both trophic forms and cysts, and a dectin-Fc construct (green), which labels β-1,3-glucan in cysts. Chitin staining is absent in P. murina but readily detected in C. albicans, while β-1,3-glucan is easily seen in P. murina. Original magnification, × 400; scale bar, 10 μm.

We assayed for the presence of chitin in the Pneumocystis cell wall using a recombinant chitin-binding domain (CBD). While we found strong reactivity with Candida cell walls by in situ labelling, reactivity with Pneumocystis was totally absent (Fig. 6). In addition, we directly examined the cell wall content of partially purified Pneumocystis organisms and cultured S. cerevisiae cells using mass spectrometric analysis. No chitin-related oligosaccharides were identified in Pneumocystis, while they were detected in S. cerevisiae; glucan-related oligosaccharides were detected in both (Supplementary Note 6). The combination of these genomic and experimental results demonstrates that Pneumocystis is the first identified member of the fungal kingdom that does not have chitin.

β-glucans have been identified in the wall of the cyst form of Pneumocystis, but are absent from the more abundant trophic form34,39. Pneumocystis genomes encode all the enzymes required for β-1,3- and β-1,6-glucan synthesis and degradation34,39,40,41 (Supplementary Data 18). However, Pneumocystis species do not have any genes involved in synthesis and degradation of α-glucan, which has been found in many fungi, and which can block innate immune recognition by the β-glucan receptor42.

Although all Pneumocystis species have abundant surface glycoproteins, especially Msg, little is known about their glycosylation state and other post-translational modifications. We found that Pneumocystis species encode enzymes required for synthesis of the N- and O-linked glycan core structure (containing up to nine mannose residues), which are localized to the endoplasmic reticulum, but lack genes for enzymes residing in the Golgi apparatus, which add mannose outer chains, including all the enzymes comprising mannan polymerase complex I and complex II, and α-1,6-, α-1,2- and α-1,3-mannosyltransferase (Supplementary Data 19). The absence of these enzyme genes suggests that, unlike other fungi, Pneumocystis cell wall proteins including Msg are not highly mannosylated. N-linked profiling of PNGase-F-released N-glycans showed that M5N2 is the predominant N-linked glycan on P. carinii Msg proteins (Supplementary Fig. 13; Supplementary Note 7). Although trace amounts of M6N2 to M9N2 were detected as minor components, mannan type N-glycans with more than nine mannose residues were not detected. Moreover, glycopeptide mapping of Msg tryptic digest by using liquid chromatography-tandem mass spectrometry (LC-MS/MS) identified 31 N-linked glycans in 15 Msg isoforms, all of which carried M5N2 as the predominant component and only 1 of which carried M6N2 as an additional, minor component (Fig. 7; Supplementary Fig. 13; Supplementary Data 21). The lack of N-linked outer chain mannan may allow the organism to avoid recognition by innate immune responses, since in Candida such mannosylation is required for recognition by dectin-2, DC-SIGN and the macrophage mannose receptor of dendritic cells, while mutants that can synthesize only the core structure are poorly recognized43,44.

Figure 7: Lack of hyper-mannose (mannan) glycosylation in Pneumocystis. (a) Diagram of N-linked mannan structure in C. albicans, based on ref. 43. (b) Diagram of N-linked glycans in Pneumocystis, which lack the α-1,6-linked mannose backbone as well as α-1,2- and α-1,3-linked mannose outer chains seen in C. albicans (square brackets). (c,d) Representative results of tandem mass spectrometry (MS/MS)-higher energy collisional dissociation (HCD) and electron transfer dissociation (ETD) analysis of an N-linked glycopeptide carrying Hexose5 HexNAc2 (M5N2) from one Msg isoform (T552_03736) in P. carinii. (c) MS/MS-HCD spectrum of glycopeptides showing the detection of glycan oxonium ions in the low mass region at m/z 163.0603, 204.0868, 366.1398 and 528.1929 (indicated in red font). A series of fragment ions due to neutral loss of the glycan moiety were observed as the main fragment ions in the HCD spectrum. Trace amounts of y-type and b-type peptide fragment ions were detected, confirming the sequence of the peptide backbone. (d) MS/MS-ETD spectra of peptide fragment ions with minimal neutral loss of glycan moiety. All expected peptide c-type and z-type fragment ions were detected except the c8 fragment ion, confirming the peptide sequence with high confidence, as well as the site and mass of the glycosylation modification.
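The four oxonium ion m/z values in this caption can be compared with theoretical values computed from elemental composition. The assignment of each observed ion to a hexose/HexNAc composition is an interpretation made here for illustration and is not spelled out in the caption; the atomic masses are standard monoisotopic values, not taken from the source.

```python
# Standard monoisotopic atomic masses in Da (not from the source).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # mass of the charge-carrying proton


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MASS[element] * count for element, count in formula.items())


# Residue masses (the free sugar minus one water).
HEX = mono_mass({"C": 6, "H": 10, "O": 5})              # hexose, e.g. mannose
HEXNAC = mono_mass({"C": 8, "H": 13, "N": 1, "O": 5})   # N-acetylhexosamine


def oxonium_mz(n_hex, n_hexnac):
    """m/z of a singly protonated oxonium ion with the given composition."""
    return n_hex * HEX + n_hexnac * HEXNAC + PROTON


# Observed m/z from the caption, keyed by an assumed (Hex, HexNAc) assignment.
observed = {(1, 0): 163.0603, (0, 1): 204.0868,
            (1, 1): 366.1398, (2, 1): 528.1929}

for (n_hex, n_hexnac), mz in observed.items():
    calc = oxonium_mz(n_hex, n_hexnac)
    ppm = (mz - calc) / calc * 1e6
    print(f"Hex{n_hex}HexNAc{n_hexnac}: calc {calc:.4f}, obs {mz:.4f}, {ppm:+.1f} ppm")
```

Under these assumed assignments, each observed value lies within 0.001 m/z of the calculated one, which is what one would expect for fragments of a Hexose5 HexNAc2 glycan.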

Transcription enrichment during infection

The expression level of each annotated gene was estimated using RNA-Seq data from three heavily infected animals each for P. murina and P. carinii. Overall, we find evidence of expression for nearly all genes (99%, see Methods) but variation in expression level over 5 orders of magnitude. Using Gene Set Enrichment Analysis (GSEA45), we identified functional categories that were enriched in highly expressed genes in each species (Supplementary Data 20; Supplementary Fig. 14). The most highly enriched categories include Msgs and predicted secreted proteins (24–25 other than Msgs in each species). The enrichment of the latter suggests that additional secreted proteins may play an important role during infection. In addition, many functions involved in general metabolism of RNA and proteins, including the RNA RRM and LSM domain, are enriched among highly expressed genes in Pneumocystis.
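For readers unfamiliar with GSEA, the core of the method is a running-sum statistic over a ranked gene list. The sketch below implements only a simplified, unweighted form of that statistic; it is not the GSEA software cited above, it omits the expression-weighted scoring and permutation-based significance testing that the full method uses, and the gene identifiers are hypothetical.

```python
from typing import Iterable, List


def enrichment_score(ranked_genes: List[str], gene_set: Iterable[str]) -> float:
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes -- all genes, sorted from highest to lowest expression
    gene_set     -- members of the functional category being tested
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best  # maximum deviation of the walk from zero


# Hypothetical ranking in which the set members are the two top genes.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranking, {"g1", "g2"}))
```

A score near +1 means the category is concentrated among the most highly expressed genes, and a score near -1 means it is concentrated among the least expressed.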

Each Pneumocystis genome contains an exceptionally high intron density, with an average of 5 introns per gene, similar to only a few fungal genomes such as Cryptococcus with high levels of splicing and only slightly fewer than that in mammalian genomes (7–9 introns per gene)46. This is unusual for highly compacted fungal genomes where intron loss is usually observed, as in S. cerevisiae47 and Microsporidia48. The transcription and splicing process for genes with many introns requires higher energy and cellular resources; this appears correlated to the relative expansion and high-level expression of RRM domain-containing genes (Fig. 2; Supplementary Fig. 14; Supplementary Data 6) and genes involved in spliceosome and mRNA surveillance in all Pneumocystis genomes (Supplementary Data 22). Using the RNA-Seq data, we identified high-level alternative splicing events in P. murina and P. carinii, with intron retention being the most common (detected for 42–49% of introns) and other types being infrequent (≤3% of introns for each type) (Supplementary Data 23). We assembled full-length alternatively spliced isoforms without premature termination codons for 263 and 275 genes of P. murina and P. carinii, respectively, though there is no functional enrichment of these genes. The high rate of intron retention correlates with the presence of all the components of the nonsense-mediated mRNA decay machinery in each Pneumocystis species (Supplementary Data 22). Alternative splicing of intron-containing genes could increase transcript diversity and regulate gene transcription or mRNA stability in this otherwise reduced genome, as suggested in earlier studies of P. carinii49 and other organisms50.

Adaptation to a host lung environment

Our genome analysis has revealed new insights into the dependence of Pneumocystis on mammalian hosts. Pneumocystis has been identified almost exclusively in the lungs of humans and other mammals, where it remains extracellular but preferentially attaches to type I pneumocytes. Although an environmental reservoir has been hypothesized, there is no convincing evidence of such a reservoir; the current genome data strongly suggest that the entire life cycle occurs in the host. Genes involved in mating and meiosis are present and transcribed in all three Pneumocystis species (Supplementary Data 24), suggesting that sexual reproduction is actively occurring in lung tissue, as postulated in previous studies9,11. Thus, Pneumocystis presumably must obtain nutrients and proliferate in the lung environment, and at the same time it must withstand host defenses for sufficient periods to allow direct transmission to another susceptible host.