The tree of life is arguably the most important organizing principle in biology and perhaps the most widely understood depiction of the evolutionary process. It explains to us how we are related to other organisms and where we may have come from. The tree has undergone some tremendous revolutions since the first version was sketched by Charles Darwin. A major innovation was the construction of phylogenetic trees using DNA sequence information, which opened the way for classification of microbial life. As implemented by Carl Woese and collaborators, this work enabled the definition of three domains: Bacteria, Archaea, and Eukaryotes. More recently, the three-domain topology has been questioned, and eukaryotes (our own branch of life) potentially relocated into the archaeal domain. Beyond this, and as described here, cultivation-independent genomic methods that access sequences from laboratory-intractable organisms have added many new lineages to the tree. Their inclusion further clarifies the extreme minority of life’s diversity that is represented by macroscopic organisms and underscores that our place in biology is dwarfed by bacteria and archaea.

The second method, single-cell genomics, involves the sequencing of fragments of DNA amplified from a single cell or collection of cells with similar rRNA gene sequences. The method has been especially useful for study of eukaryotic cells. The advantage of a method in which a single microbial cell is sequenced is that all reported nucleotides derive from one genome so that linkage patterns can be established; however, binning may be required to remove contaminating sequences. A comparative study that targeted both methods to one microbial community revealed that the single-cell method is of much lower throughput than genome-resolved metagenomics, and the genomes are, on average, significantly less complete. However, the sequences from single-cell genomics and metagenomics analyses generally agree well. Single-cell sequencing can be implemented in a targeted way following screening of rRNA genes. Thus, the resulting sequences directly augment the collection of genomes for organisms belonging to lineages that are unsampled or undersampled.

Cultivation-independent (e.g., 16S rRNA gene-based) survey methods uncovered evidence for vast microbial diversity that was not represented in the set of organisms available in pure culture. This motivated the development of new genomic approaches that can provide comprehensive metabolic insights without the requirement for laboratory growth. The first of these is referred to as community genomics or, more commonly, metagenomics. As metagenomics is sometimes considered to include rRNA gene surveys, we adopt the term “genome-resolved metagenomics” to describe the specific approach that can yield direct information about metabolic capacity on an organism-by-organism basis. In genome-resolved metagenomics, DNA is extracted from a whole community or an enrichment and shotgun sequenced, and the short sequences are assembled into larger genome fragments that are ultimately assigned to genome bins (draft genomes). Typically, metabolic predictions are only undertaken for reasonably high-quality genomes (>70% complete with low contamination by fragments from other organisms). In a few cases, further assembly curation has generated complete (closed) genomes. Importantly, the approach is not limited to Bacteria and Archaea but can provide draft genomes for Eukaryotes and partial or complete genomes of phage, viruses, and plasmids. The disadvantage of this method is that the sampled cells comprise a natural population. Thus, the genome is, to differing extents, a composite of sequence variants distinguished by single-nucleotide polymorphisms and insertions/deletions. On the other hand, the reads can be mapped back to the genome sequence to provide a snapshot of the form and extent of variation within the population.
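The quality cutoff described above can be written as a simple filter. The following is a minimal sketch rather than any published pipeline: the `GenomeBin` record, the bin names, and the 10% contamination ceiling are assumptions made for illustration (the text specifies only >70% completeness and "low" contamination).

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    completeness: float   # percent of expected marker genes recovered
    contamination: float  # percent of marker genes present in extra copies


MIN_COMPLETENESS = 70.0   # from the text: >70% complete
MAX_CONTAMINATION = 10.0  # assumption: the text says only "low contamination"


def passes_quality(genome_bin: GenomeBin) -> bool:
    """Return True if a bin is good enough for metabolic prediction."""
    return (genome_bin.completeness > MIN_COMPLETENESS
            and genome_bin.contamination <= MAX_CONTAMINATION)


def select_for_metabolic_prediction(bins):
    """Keep only the bins that pass the completeness and contamination cutoffs."""
    return [b for b in bins if passes_quality(b)]


# Hypothetical bins: one good, one too incomplete, one too contaminated.
bins = [
    GenomeBin("bin_A", completeness=92.5, contamination=1.8),
    GenomeBin("bin_B", completeness=64.0, contamination=0.5),
    GenomeBin("bin_C", completeness=81.0, contamination=23.0),
]
selected = select_for_metabolic_prediction(bins)
print([b.name for b in selected])
```

In practice, completeness and contamination are themselves estimates (commonly derived from sets of single-copy marker genes), so the cutoffs are best treated as tunable parameters.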

If we are approaching full delineation of the major branches of the tree of life, one may wonder why it took so long and then happened so quickly. It is our perspective that this is a reflection of relatively little prior focus on subsurface environments (which constitute a huge part of the biosphere but are difficult to sample) and challenges associated with the application of cultivation-independent genomic methods to the most complicated ecosystems (e.g., soil). Many subsurface environments are anaerobic, and therefore may be refuges for organisms from lineages that diverged early from primitive life forms. These environments are also enriched for organisms not commonly associated with humans, animals, or crops (thus viewed as lower-priority research targets). Some subsurface environments are low in nutrients and thus harbor slow-growing organisms with complex interdependencies that ensure retention of resources within the ecosystem. Overall, these more recently studied environments are enriched in organisms that have been difficult (or impossible) to cultivate, so their discovery and genomic characterization awaited the development of community-wide cultivation-independent methods.

After the explosive growth in the number of major branches either discovered or genomically resolved via cultivation-independent methods between 2012 and 2016, one may wonder if the appearance of new phylum-level groups will continue indefinitely. An analysis of the rate of discovery of new groups would suggest that this is not the case. Around 8,000 genomes were reconstructed de novo by exploiting the public short-read archive (circa mid-2016). The archive includes samples from a huge array of environment types. The majority of genomes were for bacteria and archaea from clades already known to exist. A subset was assigned to 44 putative phyla that were defined by cultivation-independent genomic methods over the past 5 years. Three potentially additional archaeal phyla and 17 possible bacterial phyla were identified (although some of these may have been independently described in the period between download of the archive and publication). In fact, 2017 saw reports of the first genomic sampling of several new groups, many of which are in the previously undersampled archaeal domain (Table S1). Once found, the major phylum-level lineages of bacteria and archaea are often identified across multiple ecosystems (see below). We suspect that the phylum content of the bacterial and archaeal domains will soon approach saturation, although the topology of the tree may change as algorithms to analyze it improve. The scope for discovery of new classes and groups at finer scales of taxonomic resolution is immense.

The extent of diversity within the CPR is a topic of current debate. Analyses based on 16S rRNA or concatenation of ribosomal proteins suggest that the CPR may be comparable in breadth to all other bacteria. However, it remains uncertain whether this is due to early divergence or rapid evolution (or both). Other estimates of the scale of the CPR predict that they could comprise a maximum of 26% or, in a separate estimate, 25% of bacterial diversity. A reanalysis of this question using the full set of available genomes, genes universal to both groups, and improved phylogenetic methods is required.

An earlier study presented a new version of the tree of life that captured information for organisms whose draft genomes had been reconstructed without prior cultivation. Although the deep branching order is not well supported, it is possible that the root of the bacterial domain is placed between well-known bacteria (including the commonly known major lineages, such as Proteobacteria, Actinobacteria, Firmicutes, and Cyanobacteria, with well-studied representatives) and the bacteria of a recently described and seemingly monophyletic group referred to as the candidate phyla radiation (CPR). Recent detailed phylogenetic analyses have considered the placement of the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota), a newly recognized superphylum that is described in more detail below, within the archaeal domain. To root the archaeal tree, Williams et al. applied a new approach that is based on single-gene tree reconciliation with no need for the use of an outgroup. They proposed that the root of the archaeal domain is placed between all other archaea and the DPANN. Their analyses suggest the monophyly of the DPANN superphylum and the Euryarchaeota phylum. However, due to low sampling of the DPANN superphylum at the time of their analysis, new sequences and approaches are needed to further evaluate this conclusion. Sequencing and bioinformatics methods are evolving rapidly, and new genomes have recently become available. Figure 1 presents an updated view of the three-domain tree that includes new sequences from DPANN archaea.

The trees were calculated using a maximum-likelihood algorithm (RAxML with PROTCATLG model) based on genome sequences containing 14 ribosomal proteins (ribosomal proteins L2, L3, L4, L5, L6, L14, L15, L18, L22, L24, S3, S8, S17, and S19). The concatenated ribosomal protein alignment was constructed as described previously. The updated tree of life is available with full bootstrap values in Newick format in Data S1.

(B) We reconstructed a phylogenetic tree with only the archaeal sequences (with bacteria as an outgroup) in which the monophyly of the DPANN clade is retrieved but the branching order of the deepest branches cannot be confidently resolved. Bootstrap support values are indicated by circles on nodes (black for support of 89% and above and gray for support from 50% to 89%).

(A) Updated from a previously published version of the tree by addition of new DPANN archaeal sequences (many but not all phyla are indicated on the tree). Note that with the use of a significant number of bacterial sequences in a three-domain tree topology (Bacteria, Archaea, and Eukaryotes), the DPANN superphylum does not form a monophyletic clade. As previously discussed, the placement of nanosized lineages and the monophyly of the DPANN within the tree of life remain unclear and important open issues that need to be addressed with the development of new methodological approaches.
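The figure legend refers to a concatenated alignment of 14 ribosomal proteins. The authors' exact procedure is described elsewhere and is not reproduced here; the sketch below only illustrates the general idea of concatenation, under the assumption that each protein has already been aligned separately and that a genome missing a protein is padded with gap characters. The function name and toy sequences are hypothetical.

```python
# The 14 ribosomal proteins named in the legend, in the order listed there.
RIBOSOMAL_PROTEINS = ["L2", "L3", "L4", "L5", "L6", "L14", "L15",
                      "L18", "L22", "L24", "S3", "S8", "S17", "S19"]


def concatenate_alignments(alignments, protein_order):
    """Join per-protein alignments into one supermatrix row per genome.

    alignments: dict mapping protein name -> {genome name: aligned sequence}.
    Genomes lacking a protein receive a run of '-' of that alignment's width.
    """
    genomes = sorted({g for aln in alignments.values() for g in aln})
    parts = {g: [] for g in genomes}
    for protein in protein_order:
        aln = alignments[protein]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment for {protein} has inconsistent widths")
        width = widths.pop()
        for g in genomes:
            parts[g].append(aln.get(g, "-" * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


# Toy example with two of the proteins; genome_2 lacks L3.
toy = {
    "L2": {"genome_1": "MAV-K", "genome_2": "MAVRK"},
    "L3": {"genome_1": "GT-"},
}
result = concatenate_alignments(toy, ["L2", "L3"])
for genome, row in result.items():
    print(genome, row)
```

Real analyses typically also filter out genomes with too few of the marker proteins and trim poorly aligned columns before tree inference; those steps are omitted here.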

Archaea now include at least three other major supergroups: the Euryarchaeota, the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota; proposed name Eocyta), and Asgard archaea, all of which comprise potentially phylum- or superphylum-level clades (see below; Figure 1). In addition, 6 potential phylum-level groups have been proposed within the TACK superphylum, and 15 additional groups affiliate approximately with the Euryarchaeota. Although the phylum-level delineation of Archaea is especially complicated, it appears that new archaeal candidate phyla (35 groups listed in Table S1) expand the scope of archaeal diversity by a factor similar to that noted for bacteria.

There were early indications for the existence of a group of novel archaea potentially analogous to the CPR organisms. These include the co-cultivation of Nanoarchaeum equitans with its host, genomic description and imaging of what were initially referred to as archaeal Richmond Mine acidophilic nanoorganisms (ARMAN), and the genomic characterization of extremely salt-adapted nanohaloarchaea. Based on a collection of draft genomes from single cells (or concentrates), a segment of the archaeal domain referred to as DPANN was proposed. Since then, new genomes and the definition of at least nine major groups have greatly expanded the DPANN lineages.

Based on the current summary, 62 non-CPR bacterial phylum-level groups have representatives that have been genomically described via cultivation-independent methods (Table S1). Given that around 30 bacterial phyla have traditionally been described (a handful of which were candidate phyla that now have recently isolated representatives), the scope of the bacterial domain exclusive of CPR may have approximately tripled. When CPR and non-CPR groups are taken together, we estimate that cultivation-independent genomic methods have expanded the scale of domain Bacteria by over five times.
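The fold-expansion estimates above follow from simple arithmetic on the counts given in this Perspective. The sketch below is a back-of-the-envelope check, not the authors' calculation: it assumes that the 62 non-CPR groups and the 73 CPR groups (the estimate reported with Figure 1 and Table S1) are all additional to the roughly 30 traditionally described phyla, which slightly overstates the increase because a handful of groups overlap.

```python
traditional_phyla = 30   # "around 30" traditionally described bacterial phyla
non_cpr_groups = 62      # non-CPR groups described by cultivation-independent methods
cpr_groups = 73          # estimated CPR groups (Figure 1 and Table S1)

# Expansion of the bacterial domain, excluding the CPR.
non_cpr_fold = (traditional_phyla + non_cpr_groups) / traditional_phyla

# Expansion when CPR and non-CPR groups are taken together.
total_fold = (traditional_phyla + non_cpr_groups + cpr_groups) / traditional_phyla

print(f"non-CPR expansion: {non_cpr_fold:.2f}x")
print(f"total expansion:   {total_fold:.2f}x")
```

Under these assumptions the two ratios come out at roughly 3 and 5.5, consistent with "approximately tripled" and "over five times."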

It should be noted that a widely accepted definition of what constitutes a superphylum or even a new phylum is not available. In our experience, new phylum-level branches are distinct, apparently monophyletic, and have 16S rRNA genes that share <75%–80% identity with the most closely related groups. Ultimately, appropriate phylum and superphylum definitions require strong evidence for deep branch placements, and this is not yet available.
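The 16S rRNA identity criterion mentioned above can be illustrated with a small calculation. This is a sketch under stated assumptions, not a taxonomic tool: it takes two sequences that are already aligned, ignores columns containing a gap in either sequence (one of several possible conventions), and uses 75% and 80% as the bounds of the range quoted in the text. The function names and toy sequences are hypothetical, and identity alone does not establish a phylum; the text also requires distinctness and apparent monophyly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are skipped
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def phylum_level_signal(identity: float, lower: float = 75.0, upper: float = 80.0) -> str:
    """Compare an identity value to the <75%-80% range quoted in the text."""
    if identity < lower:
        return "below range"
    if identity < upper:
        return "within range"
    return "above range"


# Toy aligned fragments (real 16S rRNA genes are roughly 1,500 nucleotides long).
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAC"
identity = percent_identity(seq_a, seq_b)
print(identity, phylum_level_signal(identity))
```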

An earlier study proposed the term Patescibacteria for a superphylum comprising Microgenomates (OP11), Parcubacteria (OD1), and Gracilibacteria (GN02-BD1-5), but the meaning of the term has become confused and even misused as synonymous with the CPR. The term CPR is simply a description of a huge monophyletic radiation of phyla and superphyla that includes the group that ultimately may be referred to as Patescibacteria and dozens of additional phyla and superphyla. Community-wide consultation should be undertaken before a formal name for this radiation is proposed.

16S rRNA gene surveys performed by Norman Pace’s group uncovered sequences from organisms from a “candidate division” (analogous to a candidate phylum) labeled OP11 that were detected in hot springs of Yellowstone National Park. As the diversity of sequences expanded, it was apparent that the OP11 radiation includes multiple potentially phylum-level groups, including the OD1 superphylum now referred to as Parcubacteria and the OP11 superphylum now referred to as Microgenomates. Metagenomics-derived rRNA and protein sequences grouped these into the CPR, and many new major lineages were added. As discussed below, bacteria of the CPR consistently have small genomes and cell sizes, and most are predicted to have symbiotic lifestyles. Importantly, some groups within the CPR are not detected in rRNA gene surveys due to primer mismatch, introns, or both. Currently, we estimate that 73 CPR groups have been identified (Figure 1 and Table S1).

Overall, recent metagenomic studies have substantially expanded the diversity of archaea predicted to be involved in methane metabolism. The inference that Bathyarchaeota and Verstraetearchaeota, members of the TACK superphylum, are methanogens is important because previously, methanogenesis was unknown outside of the Euryarchaeota. New groups such as the Methanomassiliicoccales and Methanofastidiosa also expand the diversity of Euryarchaeotes predicted to be involved in methanogenesis. However, it should be noted that the genes for methanogenesis and methane oxidation via reverse methanogenesis are essentially the same. This observation underlines the importance of experimental (e.g., bioreactor) studies to appropriately define metabolism in situ.

Metagenomics has been critical for the study of methane oxidation. Anaerobic methane-oxidizing euryarchaea-2d (ANME-2d) are often detected in the sulfate-methane transition zone in marine sediments and use a modified and reverse-methanogenesis pathway for growth. Certain ANME-2d from granitic groundwater were incubated with 13C-labeled methane to demonstrate that methane oxidation is linked to microbial sulfate reduction. Others are predicted to conduct reverse methanogenesis using nitrate as the terminal electron acceptor, and this has been experimentally demonstrated using bioreactors and 13C and 15N labeling. It has also been predicted that some ANME-2d oxidize methane using manganese or iron as the electron acceptor. The genomes of anaerobic Methanomassiliicoccales also encode key enzymes involved in methyl-dependent methanogenesis, unlike other members of Thermoplasmata-related lineages. WSA2/Arc1, sibling to both the non-methanogenic Hadesarchaeota and MSBL1, may be methanogens and were recently named “Candidatus Methanofastidiosa.” Based on metabolic predictions from genome sequences from a wastewater treatment bioreactor, they lack pathways for CO2-reducing and aceticlastic methanogenesis, but they may be capable of methane formation via methylated thiol reduction with H2.

Sibling to Thaumarchaeota and Aigarchaeota are the Bathyarchaeota, which appear to be key players in the global carbon cycle in terrestrial and marine anoxic sediments. Some Bathyarchaeota possess the archaeal Wood-Ljungdahl pathway, suggesting a capacity for CO2 fixation via acetogenesis, a process thought to be unique to bacteria. Importantly, some Bathyarchaeota may be methanogens, given key genes that could produce methane from methanol, methyl sulfides, and methylated amines. Verstraetearchaeota (TACK superphylum) are also found in anoxic environments with high methane fluxes and, like some Bathyarchaeota, are predicted to conserve energy via methylotrophic methanogenesis.

An important early result of metagenomics was the discovery of homologs of ammonia monooxygenase genes in archaeal genome fragments reconstructed from a Sargasso Sea metagenome. The existence of an unknown organism responsible for much of the ammonia oxidation in the ocean had been suspected based on in situ measurements that indicated that ammonia oxidation often proceeds at substrate concentrations significantly below the growth threshold of cultured ammonia-oxidizing bacteria. Subsequently, autotrophic ammonia-oxidizing archaea were isolated and their critical role in the global nitrogen cycle demonstrated. In 2008, they were proposed as the third archaeal phylum and named Thaumarchaeota.

Genomic data for uncultivated archaea resulted in key discoveries related to their roles in the carbon, nitrogen, and sulfur geochemical cycles. Excellent recent reviews have been published on the phylogeny and ecological roles of archaea in diverse ecosystems. Thus, we focus here on a few examples related to ammonia and methane metabolism to illustrate the overarching principle that genome-resolved metagenomics can inform our understanding of evolution and biogeochemistry.

A potential phylum-level clade represented by the Altiarchaeales (formerly SM1) appears to branch deeply within the DPANN superphylum based on recent studies and our analyses (Figure 1). However, this placement requires further analysis. Altiarchaeales dominate some cold subsurface anaerobic groundwater habitats. Unlike most archaea, they have an outer membrane and unique surface-attached grappling hooks known as hami. They form biofilms and appear to grow autotrophically on carbon monoxide, acetate, or formate (via a modified archaeal Wood-Ljungdahl pathway).

Understanding of the diversity, distribution, and metabolisms of nano-sized archaea that are members of the DPANN superphylum has been substantially advanced via cultivation-independent genomics studies. DPANN archaea are found in various extreme (e.g., hot, acidic, hypersaline) and temperate (e.g., lake and marine sediments) ecosystems, although they have been relatively rarely detected in soil and the open ocean. The first ARMAN archaeal groups were discovered in acid mine drainage biofilms via genomics-based approaches. The coverage of these groups and the diversity of the environments in which they occur have expanded, and the phyla have been renamed as Parvarchaeota and Micrarchaeota. Both lineages are found in association with other archaea (see below). Single-cell genomics and metagenomics uncovered and defined additional phylum-level lineages within the DPANN superphylum. These include Diapherotrites and Aenigmarchaeota, Nanoarchaeota (which includes the previously described N. equitans), Nanohaloarchaeota, Woesearchaeota, and Pacearchaeota. Common features unifying DPANN archaea are their small genomes, small cell sizes, and limited metabolic repertoires: many lack core biosynthetic pathways for nucleotides, amino acids, and lipids. Most DPANN archaea depend on other microbes to meet their biological requirements. However, some appear to have the genetic potential to be free living with a heterotrophic aerobic and/or fermentative lifestyle.

Another important discovery involves bacteria of the candidate phylum Tectobacteria. The first genomically described members belong to the candidate genus Entotheonella and were described using metagenomic and single-cell sequencing methods targeted at microbial communities in the marine sponge Theonella swinhoei. Entotheonella are predicted to produce a huge variety of bioactive compounds that may mediate ecological interactions. As sponges are well known as rich sources of diverse natural products, the research addressed the question of which organisms are the source of these compounds. The results of this study underline the potential of genome-resolved metagenomics targeted to candidate phyla groups to uncover biosynthetic pathways for a vast treasure trove of secondary metabolites that could address the pressing need for new antimicrobial compounds and other pharmaceuticals.

Another distinct bacterial group is the candidate phylum Rokubacteria, first genomically described via metagenomic analysis of sediments collected at the same site as Zixibacteria and some Melainabacteria. These bacteria had been previously detected in soil 16S rRNA gene surveys. Based on metabolic predictions, it was suggested that the sediment-associated Rokubacteria are acetoclastic heterotrophs that likely use beta-oxidation of fatty acids for energy generation and produce butyrate that would be consumed by other community members. These Rokubacteria were also predicted to contribute to sulfur cycling (via oxidation of thiosulfate to sulfite and its reduction to hydrogen sulfide) and nitrogen cycling (via nitrite oxidation) and can probably oxidize carbon monoxide. Subsequently, it was proposed that Rokubacteria, which are relatively abundant in grassland soil, play a key role in carbon turnover via methanol oxidation. Rokubacteria were recently described as “genomic giants,” detected in the rhizosphere, volcanic mud, oil wells, aquifers, and the deep subsurface.

A long-standing mystery is related to the apparent detection of Cyanobacteria in human fecal samples, suggesting the unexpected existence of these bacteria in gut microbiomes. Adult fecal samples that were relatively enriched in these organisms (based on 16S rRNA gene sequencing) were targeted by genome-resolved metagenomics methods, and a complete (closed) genome was reconstructed. The study included an additional genome for a distinct group that was recovered from an acetate-amended sediment metagenome. Metabolic predictions for these genomes indicated the absence of photosynthetic machinery, leading to the conclusion that these bacteria have a fermentation-based metabolism. Based on phylogenetic analyses, the bacteria were assigned to the candidate phylum Melainabacteria. Genomes from a distinct but related lineage, Sericytochromatia, were subsequently detected in a coal-bed methane well, a laboratory bioreactor biofilm, and acetate-amended sediments. Soo et al. propose both Melainabacteria and Sericytochromatia as classes of Cyanobacteria, a disagreement that reflects the common challenge of achieving consensus regarding taxonomic designations. Due to the absence of photosynthetic machinery in lineages sibling to Cyanobacteria, Soo et al. confirm and extend the earlier conclusion that photosynthesis evolved after the divergence of Cyanobacteria. Further, their analyses suggest that aerobic respiration arose after the evolution of photosynthesis.

A previously unknown bacterium was found to dominate aquifer sediments. A complete curated genome for one population was reconstructed from a metagenome, and the lineage was named Zixibacteria. Gene-by-gene analysis yielded a detailed metabolic prediction for this organism, an overview of which is presented in Figure 2. Notably, the genome encodes an extensive repertoire of redox enzymes that likely indicate roles in iron and arsenic oxidation/reduction, nitrogen compound transformations, hydrogen metabolism, and fermentation. This mixture of aerobic and anaerobic pathways likely confers metabolic versatility, enabling this bacterium to proliferate under changing conditions close to the water table.

ArrA, arsenate reductase; ArxA, arsenite oxidase; NXR, nitrite/nitrate oxidoreductase; PPP, pentose phosphate pathway; PEP, phosphoenolpyruvate; I, complex I or NADH dehydrogenase; I∗, 11-subunit NADH dehydrogenase; II, complex II or succinate dehydrogenase; Alt III, alternative complex III; IV, complex IV or heme-copper oxygen reductase; ETQ:QO, electron transferring quinone oxidoreductase; NiR, nitrite reductase; Mtr, extracellular respiratory pathway that is essential for the reduction or oxidation of iron via multiheme cytochromes; MvhADG/HdrABC, cytoplasmic complex composed of the [NiFe]-hydrogenase MvhADG and the heterodisulfide reductase HdrABC, which is an iron-sulfur flavoprotein; Fd, ferredoxin. NfnAB complex is an iron-sulfur flavoprotein complex. All the red symbols represent c-type cytochromes.

This is a cell cartoon providing a simplified view of the metabolic potential of Zixibacteria, a group of bacteria first reported from sediment. Note the presence of pathways and complexes involved in aerobic growth (beta-oxidation of fatty acids and the terminal oxidase), either oxidation or reduction of ferric/ferrous iron, arsenate/arsenite, and nitrate/nitrite, hydrogen metabolism, and anaerobic respiration via nitrite reduction, as well as fermentation to propionate, acetate, ethanol, and butyrate. Thus, overall, versatile metabolism enables Zixibacteria to thrive in a changing redox environment.

Many bacterial groups have new genomically described representatives (Table S1), but it is beyond the scope of this Perspective to provide details for all of these. Many of the bacterial genomes used to define these putative candidate phyla were reported in a prior study that provided an extensive table in its Supplemental Information predicting the biogeochemically relevant capacities for each genome. Here, we review a few select examples from the literature to illustrate the types of insights into ecological roles and evolutionary histories that have been obtained.

In a new analysis reported here, we used a set of high-quality draft genomes and some complete genomes to provide a first glance into the metabolic variation within the CPR and DPANN simultaneously. As is evident from the descriptions above, limited metabolic capacities are common across both radiations and gaps in metabolic features are often shared (Figure 3). As noted previously, typically bacterial genes (e.g., bacterial transcription factors) occur in some DPANN archaea. The distribution of isoprenoid-related genes is discussed further below.

Although the first draft genomes for CPR were reported only in 2012, there are now thousands of sequences on hand. Based on the first ∼800 genomes, an earlier analysis proposed the existence of at least 35 candidate phyla within the CPR. Parallel cryogenic transmission electron microscope images that targeted post-0.2 μm filtrates collected from the same samples verified small cell volumes for these bacteria (see below). Interestingly, some candidate phyla within the radiation (Gracilibacteria, BD1-5 and Absconditabacteria, SR1) use an alternative genetic code. The repurposing of the UGA stop codon to code for glycine was confirmed for an enrichment that contained Gracilibacteria by metaproteomics. Building upon prior work, later analyses extended the early observation that CPR genomes are small and that most lack numerous biosynthetic pathways (Figure 3). Many are predicted to be unable to produce nucleotides de novo and have minimal amino acid and cofactor biosynthetic capacity. No CPR genomes analyzed to date contain the components necessary to synthesize membrane lipids required for the cell envelope, so further research is needed to determine the nature and sources of these components. The CPR bacteria have unusual ribosome compositions, and whole lineages (groups of putative candidate phyla) are missing what were considered to be universal ribosomal proteins. Parcubacteria have been reported to lack ribosomal small subunit methyltransferase G and ribosome-silencing factor.
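The effect of the alternative genetic code on gene prediction can be shown with a short translation example. The sketch below builds the standard codon table and a variant in which UGA (TGA in DNA) encodes glycine instead of stop, as described above for Gracilibacteria and SR1; to our knowledge this corresponds to NCBI translation table 25. The function names and the example open reading frame are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(overrides=None):
    """Return a codon -> amino acid dict, optionally with reassigned codons."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, STANDARD_AA))
    table.update(overrides or {})
    return table


STANDARD_TABLE = build_codon_table()
UGA_GLY_TABLE = build_codon_table({"TGA": "G"})  # UGA read as glycine


def translate(dna: str, table) -> str:
    """Translate in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


orf = "ATGGCTTGAAAATAA"  # hypothetical sequence containing an in-frame TGA
print(translate(orf, STANDARD_TABLE))  # TGA is read as a stop codon
print(translate(orf, UGA_GLY_TABLE))   # TGA is read as glycine
```

With the standard code, a gene caller would truncate this open reading frame at the internal TGA, which is why genomes from these lineages appear to have fragmented genes unless the alternative code is applied.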

A schematic tree of DPANN (left, top) and CPR (left, bottom) and corresponding overview of presence or absence (ø) of certain biological traits across both radiations. Numbers shown next to the candidate phylum name indicate the number of genomes used in the analysis. It should be noted that although certain capacities are shown as “present,” they may not be found in the genomes of all members of the listed candidate phylum. Variation in shading in each column indicates the overall frequency of each capacity, with stronger colors signifying that the trait is widespread in the group.

A few CPR and DPANN species are associated with eukaryotic hosts. However, most are likely symbionts of bacteria or archaea, given their abundance and diversity in samples that have few, if any, eukaryotes. Based on strong enrichment in post-0.2 μm filtrates, it was predicted that many may be episymbionts, i.e., symbionts that associate with the surfaces of host cells rather than being contained within them. Cryogenic transmission electron microscopy (cryo-TEM) data show pili-like structures that extend from CPR cell surfaces (Figure 4A) and, in some cases, contact other microbial cells (Figures 4B and 4C). These might provide access to nucleic acids or other metabolites. A few CPR and DPANN organisms have been directly shown to be episymbionts. For ARMAN (either Micrarchaeota or Parvarchaeota), cryoelectron tomographic data revealed penetration of their cell interiors via cytoplasmic extensions from larger cells without a cell wall (Figure 4C). Based on the overall microbial composition, the larger cells were identified as Thermoplasmatales archaea. In other cases, cells are attached via short bridges to Thermoplasmatales cells (Figure 4D), probably Gplasma, which were subsequently renamed Cuniculiplasma divulgatum. These points of contact resemble those associated with N. equitans cells attached to host Ignicoccus hospitalis archaea in a co-culture of these organisms. Similarly, a co-culture of a parasitic Saccharibacterium (TM7) and its Actinomyces odontolyticus (Actinobacteria) host has been obtained from the human mouth. Transmission electron microscope images indicate host-episymbiont interaction via a region of cell-cell contact similar in form to those seen for ARMAN archaea and host Thermoplasmatales.

The “lipid divide” has been a central feature used to establish the distinction between bacteria and archaea. The newly reported distribution of the MVA and the MEP pathways in both domains reopens the question of their evolutionary origin and alters the prior conclusion that the MEP pathway is restricted to bacteria.

Previously, the presence of the MVA pathway in a few bacteria was explained by potential acquisition via horizontal gene transfer from archaeal or eukaryotic donors. Later, in-depth phylogenomic analyses indicated that the MVA pathway may have been ancestral in all three domains of life. We conducted phylogenetic analyses using two key enzymes (MVA kinase and PMK) from the MVA pathway to evaluate whether the new CPR and DPANN sequences shed light on the evolutionary origin of isoprenoid biosynthesis (Figure 5). The resulting phylogenetic tree resolves a novel intermediate clade for members of the Microgenomates superphylum that is placed at the base of both families (Figure 5). Of note, the Microgenomates that harbor the unusual new form of MVA kinase also possess the regular MVA kinase (Figure 5). The tree topology rules out recent horizontal gene transfer(s) from other archaea or eukaryotes as the explanation for the MVA pathway in CPR bacteria, with a few important exceptions (transfers from DPANN to Dependentiae [TM6] and from Firmicutes to Woesearchaeota; Figure 5).

(A) Maximum likelihood tree of two key enzymes of the MVA pathway (MVA kinase on the left and PMK on the right, both from the same GHMP kinase superfamily) that are involved in isoprenoid precursor production and present in some CPR and DPANN genomes. Bacteria are highlighted in blue and archaea in yellow. Also included is the gene cluster organization of the MVA pathway found in some CPR organisms from the Microgenomates superphylum.

As expected for Archaea, the MVA pathway occurs in organisms from several DPANN phyla, including Diapherotrites, Micrarchaeota, and Aenigmarchaeota. Similar to the CPR and distinct from other archaea, the DPANN MVA pathway includes the full set of enzymes found in eukaryotes. The MVA pathway is rarely found in Woesearchaeota, which instead have the bacterial MEP pathway. This is important because the MEP pathway had not previously been reported in archaea.

Isoprenoids are metabolites that are essential in all living organisms in the three domains of life. Isoprenoids have diverse metabolic functions, including as quinones, chlorophylls, bacteriochlorophylls, rhodopsins, and carotenoids. Archaeal membranes are composed of isoprenoid-based lipids, the isopentenyl pyrophosphate and dimethylallyl diphosphate precursors of which are typically synthesized via the mevalonate (MVA) pathway. Bacteria use a nonhomologous pathway, the methylerythritol phosphate (MEP) pathway, to synthesize the precursors. Previously, only a few bacteria (mostly Gram positive) were known to possess the MVA pathway (). Interestingly, most of these bacteria predate other bacteria. In new analyses, we revisited the question of the distribution of the MVA pathway in bacteria, making use of hundreds of new genomes from the CPR and DPANN radiations. The results show that some CPR bacteria (mostly Microgenomates and a few Peregrinibacteria) possess the MVA pathway rather than the bacterial MEP pathway (Figure 3). Intriguingly, the MVA pathway of the CPR is of the type found in eukaryotes. The eukaryote MVA pathway includes three enzymes that do not normally occur in the archaeal MVA pathway (phosphomevalonate kinase [PMK], diphosphomevalonate decarboxylase [MDC], and isopentenyl diphosphate isomerase [IDI1]).
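
The pathway distinctions in this paragraph can be expressed as a simple presence/absence rule over annotated enzymes. The Python sketch below is illustrative only. The function name, the gene labels (MVK for MVA kinase, DXS and DXR as two MEP pathway markers), the output labels, and the decision rule itself are assumptions made for this example; it is not the annotation procedure used in the analyses described here, and it does not account for incomplete genomes.

```python
# Illustrative sketch only: labels and rule are assumptions, not the authors' method.

# Enzymes named in the text as part of the eukaryote-type MVA pathway
# that do not normally occur in the archaeal MVA pathway.
EUKARYOTE_TYPE_MARKERS = {"PMK", "MDC", "IDI1"}

# Hypothetical marker labels chosen for this example.
MVA_CORE_MARKER = "MVK"        # MVA kinase
MEP_MARKERS = {"DXS", "DXR"}   # two enzymes of the MEP pathway


def classify_isoprenoid_pathway(enzymes):
    """Return a coarse pathway label from the enzyme labels found in one genome."""
    found = set(enzymes)
    has_mva = MVA_CORE_MARKER in found
    has_mep = MEP_MARKERS <= found  # every MEP marker must be present

    if has_mva and has_mep:
        return "MVA and MEP"
    if has_mva:
        # The three extra enzymes distinguish the eukaryote-type pathway.
        if EUKARYOTE_TYPE_MARKERS <= found:
            return "MVA (eukaryote type)"
        return "MVA (archaeal-like)"
    if has_mep:
        return "MEP"
    return "none detected"


# Hypothetical annotation for a single genome.
print(classify_isoprenoid_pathway(["MVK", "PMK", "MDC", "IDI1"]))
```

Because reduced or partially assembled genomes can lack markers for reasons unrelated to biology, a rule of this kind would at best give a first-pass label to be checked against phylogenetic evidence.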

How the CPR and DPANN May Change Our View of Evolution

As noted above, the availability of genomes for a much more comprehensive set of organisms revealed major new features of the tree of life ( Figure 1 ). The question of whether the tree topology involves two major subgroups within bacteria and archaea—the CPR and DPANN, respectively—is unresolved. Are the apparent groupings of CPR and DPANN artifacts of rapid evolution or reflections of early origins and a long history of diversification?

A phenomenon that could drive fast evolution is genome reduction, which is an obvious consideration for CPR and DPANN, given their small cell and genome sizes. Largely symbiotic lifestyles are linked to rapid gene loss and fast accumulation of mutations in well-studied symbioses (e.g., those that involve partnerships of bacteria and eukaryotic hosts), although these may not be appropriate models for associations in which the host is bacterial or archaeal. The early stages of genome reduction in symbionts of eukaryotes are characterized by proliferation of mobile elements, formation of pseudogenes, multiple genomic rearrangements, and deletion of chromosome fragments. To date, such phenomena have not been described as prominent features of CPR and DPANN genomes (Nelson and Stegen, 2015). In more anciently evolved symbionts, such as Buchnera aphidicola, mobile elements and most pseudogenes have been eliminated (van Ham et al., 2003). Thus, if the CPR and DPANN experienced radiation-wide genome reduction, it may have occurred long ago.

It is important to note that the small genomes of bacterial symbionts of insects and those of human-associated pathogens, such as Chlamydia, do not cluster within the CPR. This suggests that the phylogenetic placement of the CPR is not an artifact of genome streamlining. That said, metabolisms predicted for CPR and DPANN vary dramatically, and while some have sufficient metabolic capacities to suggest that they may be free living, others lack numerous biosynthetic capacities. Thus, we predict that genome reduction may be an important phenomenon in some lineages.

The alternative explanation for the existence of CPR and DPANN as distinct major radiations is that both arose from very early-evolving organisms (possibly with small genomes), and the long branches of the CPR and DPANN are due mostly to undersampling (rather than rapid evolution). Although their apparent deep-branching phylogenetic placement remains in question, other observations make an early origin of both groups from a common ancestral pool worth considering. Most important, perhaps, is the similarity in the suite of biosynthetic capacities that both groups possess (extensive genes of the information system) and in missing pathways (e.g., lack of the electron transport chain and tricarboxylic acid [TCA] cycle that are widely distributed in other bacteria and archaea). If, as one might predict, extensive use of oxygen-based respiration arose relatively late (following the advent of O2-generating photosynthesis at least 2.4 billion years ago), it would make sense that groups with ancient metabolic platforms would be anaerobes. The metabolisms of CPR and DPANN are consistent with such a world, as these organisms appear to be almost exclusively anaerobes, lacking the full TCA cycle and electron transport chain required for aerobic growth. Similarly, almost all are incapable of dissimilatory nitrate reduction, and none are predicted to have metabolisms based on dissimilatory sulfate reduction. Both oxidized nitrogen and sulfur-based compounds would have been at low abundance prior to an oxidized atmosphere. This is speculative, but we imagine that nucleotide and amino acid biosynthesis capacities may have been present in ancestral populations but lost during genome reduction as (most) CPR and DPANN adopted lifestyles that depended on organisms with more recently evolved capacities (e.g., aerobic growth, photosynthesis, and other forms of chemoautotrophy).

A further explanation that cannot be ruled out at this time is that the many similarities in the genomic features of CPR and DPANN may have arisen by convergent evolution, possibly involving late radiation-wide adaptation to anaerobic habitats.

Although what we see now in CPR and DPANN genomes must be a faint echo of what once was, the presence of specific genes in widely divergent groups may hint at features of ancestral organisms that were lost from many modern groups. Included in this list are capacities such as modified glycolysis (reliant on pentose phosphate pathway enzymes), hydrogen metabolism (as suggested for the ancestor of Archaea; Williams et al., 2017), and ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). In fact, one of the most striking cases of widely but sparsely distributed enzymes found in both CPR and DPANN organisms is that of the form II/III- and III-like RuBisCOs, which apparently function in a nucleotide-based pathway that feeds into lower glycolysis and fermentation (Wrighton et al., 2016). This enzyme occurs in some extremely minimal genomes (Dojkabacteria [WS6] and Pacearchaeota), consistent with its central role in the metabolism of these organisms. Thus, RuBisCO that functions in nucleotide metabolism and central carbon metabolism may have been key to the physiology of ancestral CPR and DPANN. A bacterial or archaeal cell is, on average, 20% RNA, and RNA is 40% ribose by weight (i.e., a cell is about 8% ribose; Schönheit et al., 2016). Therefore, it has been suggested that ribose was likely an abundant sugar available on early Earth for fermentation. Hence, the form III- and II/III-based RuBisCO pathway of nucleoside monophosphate conversion to 3-phosphoglycerate may be a relic of ancient heterotrophy.
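
The ribose estimate quoted above is the product of two mass fractions. As a minimal check of that arithmetic, the Python sketch below multiplies the two values given in the text; the function and variable names are hypothetical and introduced only for this example.

```python
# Minimal arithmetic check; names are hypothetical, values are those quoted in the text.

def ribose_fraction_of_cell(rna_fraction_of_cell, ribose_fraction_of_rna):
    """Multiply the two mass fractions to get ribose as a fraction of cell mass."""
    return rna_fraction_of_cell * ribose_fraction_of_rna


# 20% of the cell is RNA, and 40% of RNA is ribose by weight.
estimate = ribose_fraction_of_cell(0.20, 0.40)
print(f"{estimate:.0%}")  # prints "8%"
```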