After 6 weeks in labeling chambers, Avena fatua shoots were highly labeled (~ 94 atom% 13C). DNA was extracted from the rhizosphere and bulk soil samples. By comparing the density separation of the rhizosphere community DNA to the bulk community DNA, we were able to define un-enriched (light), partially 13C-enriched (middle), and highly 13C-enriched (heavy) fractions (Fig. 1). Based on the cutoff values for these fractions, 32 density-separated fractions for the rhizosphere sample were then combined, generating light, middle, and heavy fractions (the bulk sample only contains light and middle fractions due to the absence of 13C-enriched DNA) (Fig. 1).
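The pooling step described above can be sketched as a simple binning of gradient fractions by buoyant density. This is a minimal illustration, not the authors' procedure: the cutoffs reuse the middle-fraction density range (1.737–1.747 g/ml) reported later in this section, the treatment of both boundaries as inclusive is an assumption, and the function names and example measurements are hypothetical.

```python
from collections import defaultdict

# Middle-fraction density window (g/ml) reported for the rhizosphere sample.
# Treating both boundaries as inclusive is an assumption of this sketch.
MIDDLE_LOW, MIDDLE_HIGH = 1.737, 1.747


def classify_fraction(density):
    """Assign one gradient fraction to a pool by its buoyant density."""
    if density < MIDDLE_LOW:
        return "light"
    if density <= MIDDLE_HIGH:
        return "middle"
    return "heavy"


def pool_fractions(fractions):
    """Sum DNA mass per pool; `fractions` is a list of (density, ng_dna)."""
    pools = defaultdict(float)
    for density, ng_dna in fractions:
        pools[classify_fraction(density)] += ng_dna
    return dict(pools)


# Hypothetical measurements, for illustration only (not data from the study).
example = [(1.700, 40.0), (1.720, 55.0), (1.740, 30.0), (1.745, 10.0), (1.760, 5.0)]
print(pool_fractions(example))
```

A bulk soil sample without 13C-enriched DNA would simply yield no "heavy" entry in the returned dictionary.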

Fig. 1 Stable isotope fraction determination. This plot shows the distribution of densities and concentrations of DNA extracted from week 6 rhizosphere and bulk soil following density centrifugation. The black circles on the curves represent individual fraction measurements. The three fractions are designated as light (blue shading), middle (yellow shading), and heavy (red shading). The top numbers indicate the normalized coverage of the T. rhizospherense genome in each fraction. The T. rhizospherense genome had < 1× coverage in each bulk soil fraction.

The light, middle, and heavy density separated fractions from the rhizosphere and bulk samples were sequenced and subjected to genome-resolved metagenomic analyses (Additional file 1: Table S1). From the rhizosphere middle fraction, we assembled 210 Mbp of scaffolds larger than 1 kbp. One especially large scaffold was assembled de novo and could be circularized. Local assembly errors were identified and corrected and three scaffolding gaps were filled by manual curation. Manual curation made use of unplaced paired reads that were mapped back to the gap boundaries to fill gaps. The complete, closed genome is 1.45 Mb in length with a GC content of 49.95%. We were able to recover a single chromosome and we detected no integrated phage or plasmids. The genome was most abundant in the rhizosphere middle fraction at 15× coverage, but was also present in the rhizosphere heavy and light fractions at ~ 3× normalized coverage. The genome had less than 1× coverage in the non-rhizosphere soil (Additional file 1: Table S1).
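The coverage values quoted above can be related to read data with a short calculation. The text does not state how coverage was normalized, so the library-size scaling below is only one common convention and is an assumption; the function names and all input numbers are hypothetical, chosen only so that the raw value matches the 15× figure.

```python
def mean_coverage(mapped_bases, genome_length):
    """Average depth: total bases mapped to the genome per genome position."""
    return mapped_bases / genome_length


def normalized_coverage(mapped_bases, genome_length,
                        library_bases, reference_library_bases):
    """Scale coverage as if every library had `reference_library_bases` bases.

    Assumption: normalization by total sequencing effort per library.
    """
    scale = reference_library_bases / library_bases
    return mean_coverage(mapped_bases, genome_length) * scale


GENOME_LENGTH = 1_450_000  # 1.45 Mb closed genome

# Hypothetical read totals, for illustration only.
print(mean_coverage(21_750_000, GENOME_LENGTH))
print(normalized_coverage(21_750_000, GENOME_LENGTH,
                          library_bases=2_000_000_000,
                          reference_library_bases=1_000_000_000))
```

Scaling to a common library size makes coverage comparable between fractions that were sequenced to different depths.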

DNA buoyant density in a cesium chloride solution is a function of both the extent of isotopic enrichment and the GC content. Low-GC DNA has a lower buoyant density than high-GC DNA [23]. The completed, closed genome has a lower GC content (49.95%) than the rest of the rhizosphere middle fraction assembly (average GC content of scaffolds larger than 1000 bp is 66%). Unlabeled DNA with this GC content would be expected to band at a lower density than the rest of the fraction, so its presence in the middle fraction indicates that 13C may have been incorporated into the DNA. The rhizosphere middle fraction where the genome was mainly detected had a density of 1.737–1.747 g/ml. Given that natural abundance DNA with 49.95% GC content would have a density of ~ 1.71 g/ml [24], we estimate that the DNA from which the genome was assembled was at least 50% enriched in 13C.
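The density argument can be reproduced with a back-of-the-envelope calculation. The sketch below rests on two assumptions that are not taken from this paper: the widely used linear relation between GC fraction and CsCl buoyant density (ρ = 1.660 + 0.098 × GC), which may differ from the relation used in [24], and a density shift of about 0.036 g/ml for fully 13C-labeled DNA. The function names are hypothetical.

```python
def natural_abundance_density(gc_fraction):
    """Buoyant density (g/ml) of unlabeled DNA in CsCl from its GC fraction.

    Assumption: the common linear approximation rho = 1.660 + 0.098 * GC.
    """
    return 1.660 + 0.098 * gc_fraction


def enrichment_estimate(observed_density, gc_fraction, full_label_shift=0.036):
    """Fraction of carbon that is 13C, from the observed density shift.

    Assumption: fully 13C-labeled DNA is ~0.036 g/ml denser than unlabeled DNA
    and the shift scales linearly with enrichment.
    """
    shift = observed_density - natural_abundance_density(gc_fraction)
    return shift / full_label_shift


GC = 0.4995  # GC content of the closed genome

print(round(natural_abundance_density(GC), 3))   # unlabeled density, g/ml
print(round(enrichment_estimate(1.737, GC), 2))  # lower edge of middle fraction
```

Under these assumptions the unlabeled density comes out near 1.71 g/ml, and the lower edge of the middle fraction corresponds to an enrichment of roughly 0.78, which is consistent with the conservative estimate of at least 50%.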

The genome has 1531 protein coding sequences (Additional file 2: Table S2) and a full complement of tRNAs (46 in total). The 5S rRNA, 23S rRNA, and 16S rRNA genes are in a single locus that also includes Ala and Ile tRNA genes. Based on the sequence of the 16S rRNA gene, the organism was assigned to the Saccharibacteria phylum. The closest 16S rRNA gene sequences in NCBI are from the rhizosphere of Pinus massoniana (Fig. 2) [25]. The most closely related genomically described organism is Candidatus Saccharimonas aalborgensis from activated sludge with 84% identity across the full-length 16S rRNA gene [16]. We propose the name Candidatus “Teamsevenus rhizospherense” for the organism described here, given the derivation of the genome from the rhizosphere. In accordance with the phylogenetic analysis, we renamed Subdivision 1 to Candidatus Soliteamseven because this genome is the first described Candidatus species of this clade. The representatives of this clade are mostly found in soil; therefore, we propose the complete taxonomic descriptor: Phylum: Candidatus Saccharibacteria, Class: Candidatus Soliteamseven, Order: Candidatus Teamsevenales, Family: Candidatus Teamsevenaceae, Genus: Candidatus Teamsevenus, Species: Candidatus rhizospherense.

Fig. 2 Phylogeny of Saccharibacteria based on 16S rRNA gene sequences. The maximum-likelihood tree shown was constructed from an alignment containing representative Saccharibacteria. Symbols indicate the environmental origin of the NCBI sequence. Named branches indicate the complete genomes included in this study. The tree scale bar indicates nucleotide substitutions per site. Bootstrap values ≥ 50% are indicated by black dots.

We calculated the GC skew and cumulative GC skew across the closed T. rhizospherense genome and found the symmetrical pattern typical for bacteria, with a single peak and trough indicative of the terminus and origin of replication (Additional file 3: Figure S1). This result both validates the accuracy of the circularized genome and confirms that Saccharibacteria use the typical bacterial pattern of bi-directional replication from a single origin to the terminus (as do some Peregrinibacteria, another group of CPR bacteria [26]). The start of the genome was adjusted to correspond to the predicted origin, which lies between the DNA polymerase III subunit beta and the chromosomal replication initiator protein. It has a full set of ribosomal proteins, except for L30, which is uniformly absent in CPR bacteria [1].
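For readers unfamiliar with the method, GC skew and cumulative GC skew can be computed as in the following minimal sketch. It uses the standard definition (G − C)/(G + C) in non-overlapping windows and takes the cumulative minimum and maximum as the predicted origin and terminus. The window size, function names, and toy sequence are assumptions for illustration and do not reflect the settings used in the analysis.

```python
from itertools import accumulate


def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def origin_and_terminus(seq, window):
    """Estimate origin (cumulative-skew minimum) and terminus (maximum).

    Assumes the sequence is at least one window long.
    """
    cumulative = list(accumulate(gc_skew(seq, window)))
    ori_idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    ter_idx = max(range(len(cumulative)), key=cumulative.__getitem__)
    # Report the coordinate at the end of each extreme window.
    return (ori_idx + 1) * window, (ter_idx + 1) * window


# Toy circular "genome": a G-rich half followed by a C-rich half.
toy = "GGGC" * 250 + "CCCG" * 250
print(origin_and_terminus(toy, window=100))
```

In the toy sequence the cumulative skew rises through the G-rich half and falls through the C-rich half, so the predicted terminus lies at the junction (position 1000) and the predicted origin at the end of the sequence, which coincides with position 0 on a circular chromosome.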

Biosynthetic pathways

The T. rhizospherense genome encodes a number of enzymes for the conversion of nucleotides to NMP, NDP, and NTP and formation of RNA. In addition, we identified genes to phosphorylate G, C, and U. However, the organism lacks the genes required to synthesize 5-phospho-alpha-d-ribose-1-diphosphate (PRPP). Further, it lacks essentially all of the steps for synthesis of nucleotide bases and the pathways that would convert PRPP to inosine monophosphate or uridine monophosphate. T. rhizospherense may have a novel nucleotide biosynthesis pathway, but this is unlikely as nucleotide biosynthesis remains highly conserved across domains [27] and the pathway can be recognized in some CPR bacteria [26]. Thus, we infer that T. rhizospherense does not synthesize its nucleotides de novo but rather acquires them from an external source. The genome encodes several nucleases, an external micrococcal nuclease, and an oligoribonuclease for the breakdown of externally derived DNA and RNA (Fig. 3). The mechanism for DNA and RNA import is unknown, as we did not identify nucleotide transporters. However, there are a number of transporters with unidentified specificity that could be involved in DNA or nucleotide uptake, or the type IV pili could be responsible for this function. A large portion of the genome is dedicated to DNA and RNA repair mechanisms. There are 20 8-oxo-dGTP diphosphatase genes, whose products prevent the incorporation of oxidized nucleotides. These enzymes may be required given that the nucleotides may be scavenged from dead cells and could have accumulated extensive DNA damage. Access to damaged DNA may be a consequence of life in a mostly aerobic environment, a seemingly unusual condition for members of the Saccharibacteria phylum. All other complete genomes were found in mostly anaerobic environments and encode fewer genes with this function.

Fig. 3 Cell diagram of T. rhizospherense. (A) fusaric acid resistance machinery, (B) unidentified importer, (C) glucan 1,3-beta-glucosidase, (D) NADH dehydrogenase II, (E) blue-copper protein, (F) cytochrome bo3 ubiquinol terminal oxidase, (G) F-type H+-transporting ATPase, (H) peptidase, (I) nuclease, (J) root hair, (K) cellulosome, (L) type IV pilus, (M) salicylate hydroxylase, (N) zeatin production, (O) removal of oxidized nucleotides, (P) various antibiotic resistance mechanisms, (Q) intercellular attachment, (R) scavenging of lipids, (S) production of phosphatidyl myo-inositol mannosides, (T) DNA repair machinery.

The T. rhizospherense genome does not encode the ability to synthesize any amino acids de novo from central metabolites. However, it encodes genes to generate amino acids from precursors (e.g., valine and leucine from 2-oxoisovalerate, isoleucine from 2-methyl-2-oxopentanoate, and histidine from l-histidinol phosphate) and to interconvert some amino acids (serine and glycine). We identified genes for proteases that could break down externally derived proteins. No amino acid-specific transporters were annotated, but several transporters of unknown function could import the amino acids. There is little evidence to suggest that externally derived amino acids are broken down for use in the TCA cycle (the only TCA cycle gene identified is a fumarate reductase subunit) or other cycles.

T. rhizospherense appears unable to synthesize fatty acids, yet it encodes three copies of the 3-oxoacyl-(acyl-carrier protein) reductase, five copies of acyl-CoA thioesterase I, and two copies of SGNH hydrolase, indicating the capacity for fatty acid hydrolysis and conversion. We found that the genome contains a number of genes for sequential steps in the glycerophospholipid metabolism pathway. T. rhizospherense may incorporate phosphatidylcholine (possibly derived from eukaryotes), 1,2 diacyl sn-glycerol-3P (from bacteria), or phosphatidylethanolamine (the main bacterial phospholipid) and may be able to interconvert the compounds using a gene annotated as phospholipase D. We identified a putative phosphatidate cytidylyltransferase that could add a head group to 1,2 diacyl sn-glycerol-3P, forming CDP-diacylglycerol. CDP-diacylglycerol may then be converted into one of three products: phosphatidylglycerophosphate (via CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase), cardiolipin (via cardiolipin synthase), or phosphatidyl-1D-myo-inositol (via CDP-diacylglycerol-inositol 3-phosphatidyltransferase).

Interestingly, phosphatidyl-1D-myo-inositol is the precursor for generation of phosphatidylinositol mannosides, glycolipids that are decorated by a chain of mannose molecules and that are found in the cell walls of Mycobacterium [28]. There are several genes for the first step in phosphatidylinositol mannoside biosynthesis, which involves modification of phosphatidyl-1D-myo-inositol by addition of mannose. These include phosphatidylinositol alpha-mannosyltransferase (three copies) and a single copy of alpha-1,6-mannosyltransferase. Subsequently, a polyprenol-P-mannose α-1,2-mannosyltransferase (CAZy glycosyltransferase family 87) adds another mannose group. Other mannose additions may involve the three copies of dolichol-phosphate mannosyltransferase, which transfer mannose from GDP-mannose to dolichol phosphate, a mannose carrier involved in glycosylation [29]. Thus, although T. rhizospherense appears to be unable to synthesize fatty acids, it appears to encode a number of genes that may be involved in the interconversion of membrane lipids, including phosphatidylinositol mannosides, if provided 1,2 diacyl sn-glycerol-3P.

Central metabolism and energy generation

Interestingly, the T. rhizospherense genome encodes a simple, two-subunit cellulosome that may be used to attach to and degrade plant or microbially derived cellulose to cellobiose. Cellobiose is likely converted to d-glucose via one of 14 different glycosyl hydrolases. The genome also encodes genes for the production of d-glucose via breakdown of starch/glycogen and trehalose. The genome contains several genes for hydrolysis of 1,3-β-glucan, one of the most common fungal cell wall polysaccharides [30], to d-glucose. A significant portion of plant root exudates consists of sugars that could be fed directly into the T. rhizospherense metabolism [31]. d-glucose can be converted to d-glucose-6P and d-fructose-6P and fed into the glycolysis pathway. The genome lacks a key step in the glycolysis pathway, the 6-phosphofructokinase enzyme, which takes fructose-6P to fructose 1,6-bisphosphate. However, the missing step is compensated for by the genes in the pentose phosphate pathway that convert fructose-6P to glyceraldehyde-3P, which can then continue in the glycolysis pathway. This is a common workaround strategy in many members of the CPR, which frequently lack 6-phosphofructokinase [32]. The product of the glycolysis pathway with the pentose phosphate pathway workaround is pyruvate. The pyruvate is then likely converted to d-lactate by the predicted d-lactate dehydrogenase protein, which would replenish the NAD+ pool under fermentative conditions. Pyruvate appears not to be converted to acetyl-CoA, since the genome lacks pyruvate dehydrogenase and pyruvate ferredoxin oxidoreductase. Acetyl-CoA metabolism also appears to be lacking in the Saccharibacteria represented by the six other publicly available complete genomes.
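The routing argument in this paragraph (glucose still reaches pyruvate and lactate when 6-phosphofructokinase is absent) can be illustrated with a toy reachability check over a heavily simplified reaction graph. This is a conceptual sketch only: the pentose phosphate pathway and lower glycolysis are each collapsed into a single edge, co-substrates and stoichiometry are ignored, and step labels other than 6-phosphofructokinase and D-lactate dehydrogenase are generic placeholders rather than annotations from the genome.

```python
from collections import deque

# (substrate, product, step label); labels are placeholders unless named in the text.
REACTIONS = [
    ("glucose", "glucose-6P", "glucose phosphorylation"),
    ("glucose-6P", "fructose-6P", "hexose-phosphate isomerization"),
    ("fructose-6P", "fructose-1,6-bisphosphate", "6-phosphofructokinase"),
    ("fructose-1,6-bisphosphate", "glyceraldehyde-3P", "aldolase step"),
    ("fructose-6P", "glyceraldehyde-3P", "pentose phosphate pathway (collapsed)"),
    ("glyceraldehyde-3P", "pyruvate", "lower glycolysis (collapsed)"),
    ("pyruvate", "lactate", "D-lactate dehydrogenase"),
]


def reachable(start, missing):
    """Breadth-first search over reactions whose step is not in `missing`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for substrate, product, step in REACTIONS:
            if substrate == node and step not in missing and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


# Only 6-phosphofructokinase is reported missing; other steps are assumed usable.
products = reachable("glucose", missing={"6-phosphofructokinase"})
print("lactate" in products, "fructose-1,6-bisphosphate" in products)
```

Removing the collapsed pentose phosphate edge as well would disconnect glucose from pyruvate in this toy graph, which is the sense in which that pathway acts as the workaround.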

The T. rhizospherense genome encodes a xylulose-5-phosphate/fructose-6-phosphate phosphoketolase that may convert d-xylulose-5P derived from the pentose phosphate pathway to acetyl-P. The neighboring gene is annotated as an acetate kinase, which may convert acetyl-P to acetate with the production of ATP. Thus, we predict growth via fermentation under anaerobic conditions. ATP also can be produced by an F-type H+-transporting ATPase.

In glycolysis, NAD+ is reduced to NADH. NAD+ can be regenerated via what appears to be a linear electron transport chain that includes a single subunit NADH dehydrogenase (ndh). We identified several predicted active site residues expected for function (Additional file 4: Figure S2A). Modeling revealed a close secondary structure match between the T. rhizospherense protein and the Ndh characterized from Caldalkalibacillus thermarum [33]. The large evolutionary distance between T. rhizospherense and C. thermarum likely accounts for differences in some active site residues.

The genome encodes a few genes in an incomplete pathway for production of quinone-based molecules, two of which are in multicopy (five copies of genes annotated as ubiG and two copies as ubiE). We suspect that quinone is scavenged from an external source. We identified genes for a cytochrome bo3 ubiquinol terminal oxidase (cyo). Subunit I contains some functional residues as well as the residues that distinguish it from cytochrome-c oxidases (Additional file 4: Figure S2B). Again, the discrepancy in the functional residues may be due to the evolutionary distance between Escherichia coli and Saccharibacteria [34]. These genes were also reported in a previous study [8]. The terminal oxidase requires heme to function; however, only the final step of heme biosynthesis is predicted in the genome, with a protein annotated as protoheme IX farnesyltransferase. We believe T. rhizospherense may scavenge heme from the environment as we propose it does for nucleotides, amino acids, and quinones. Five FNR family transcriptional regulators may serve to detect O2. If O2 is available, it may be possible for electrons to be passed linearly from the Ndh to the cytochrome bo3 ubiquinol terminal oxidase, which pumps four protons with the reduction of oxygen [35].

Several predicted proteins, such as a blue-copper protein and an NADH-quinone oxidoreductase subunit L (nuoL)-related protein, also may be involved in electron transfer; NuoL is known to contribute to membrane potential in Ndh systems [36]. The nuoL gene was found in the same region as the ndh and ATPase genes. Three cytosolic NADPH:quinone oxidoreductase genes were identified. These may reduce semiquinone (SQ) formed under conditions of high O2 availability to prevent reaction of SQ with O2 to form oxygen radicals [37]. We identified a novel protein that we predict may be involved in production of an electrochemical gradient. It contains two of the domains found in separate subunits of the Na+-translocating NADH-quinone reductase (Na(+)-NQR), contains an FeS cluster domain, and is likely associated with the cytoplasmic membrane. Although speculative, we hypothesize that this protein is a part of the electron transport chain, converting NADH to NAD+. One domain of the protein is similar to the Na(+)-NQR subunit F domain, which may oxidize NADH and transfer electrons to the iron-sulfur domain, and the Na(+)-NQR subunit B-like domain could form the Na+-translocating channel [38].

Other predicted capacities indicative of lifestyle

The genome lacks a CRISPR-Cas defense system, though one has been noted in the genome of a separate Saccharibacteria [10]. Additionally, we did not find any associated phage or mobile elements, although there were a number of labeled phage contigs in the same sample which will be detailed in a separate publication.

We predict the capacity for twitching motility due to the presence of genes required for type IV pilus assembly and pilT, the twitching motility gene. Type IV pili may be involved in DNA uptake or attaching to other cells, root surfaces, or solids. There are several genes for pseudo-pili and an autotransporter adhesin that may also be involved in cellular attachment. Also annotated were two CAZy carbohydrate-binding module family 44 genes for binding to cellulose and the capacity for biosynthesis of cellulose that could be used to attach to plant surfaces [39].

We identified genes encoding laccase and pyranose 2-oxidase, which may be used for lignin breakdown or detoxification of phenolics, and genes for the detoxification of lignin byproducts, including 3-oxoadipate enol-lactonase and 4-carboxymuconolactone decarboxylase [40].

Interestingly, despite the small size of the genome and lack of many core biosynthetic pathways, we identified genes whose roles may be to modulate plant physiology, consistent with a close relationship with the plant. For example, we identified genes for the production of cis-zeatin (a plant hormone) from isoprenoid precursors (but a pathway for formation of the precursor isopentenyl-PP was not present, so the precursors are likely scavenged). The genome encodes a protein that appears to be cytokinin riboside 5′-monophosphate phosphoribohydrolase, also known as “Lonely Guy,” an enzyme that activates cytokinin (a plant growth hormone). We also found a gene encoding salicylate hydroxylase, which breaks down salicylic acid, a plant defense signaling molecule.

In addition to genes involved in plant interaction, we predict the capability to interact with other soil microbes. The genome contains N-acyl homoserine lactone hydrolase, a gene for quorum quenching of other soil microbes, and a gene to form 3′,5′-cyclic-AMP, which may be involved in intracellular signaling. We predict the presence of genes that confer resistance to bacterially produced antibiotics (beta-lactams, streptomycin, oleandomycin, methylenomycin A, vancomycin, and general macrolides) based on sequence similarity. A gene annotated as phosphatidylglycerol lysyltransferase may produce lysylphosphatidylglycerol, a membrane lipid involved in cationic antimicrobial peptide resistance [41]. We also found a gene that may confer resistance to fusaric acid, an antibiotic made by a common fungal pathogen of grass [42]. This fungus was found to be growing in the rhizosphere (data not shown). Interestingly, there is a possible secreted toxin gene that encodes a 2487 amino acid protein. It contains domains found in polymorphic toxins, rearrangement hotspot repeats, YD repeats, a PA14 domain, and a galactose-binding domain, and the neighboring gene encodes an immunity protein [43].

Comparative genomics

The T. rhizospherense genome is 28% larger than the largest reported complete Saccharibacteria genome, which is from an anaerobic bioreactor. It is 32.4% larger than the average size of all complete Saccharibacteria genomes (Table 1). By re-annotating and analyzing these previously reported complete genomes, we found that the T. rhizospherense genome encodes nearly 40% more unannotated genes than the other Saccharibacteria.
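As a quick check on scale, the percentages above can be inverted to show what reference genome sizes they imply for a 1.45 Mb genome. The values printed are implied by the rounded figures in the text, not sizes taken from Table 1, and the helper name is hypothetical.

```python
def implied_reference_size(size_mb, percent_larger):
    """Reference size such that `size_mb` is `percent_larger` percent larger."""
    return size_mb / (1 + percent_larger / 100)


GENOME_MB = 1.45  # closed T. rhizospherense genome

# Implied size of the largest previously reported complete genome (Mb).
print(round(implied_reference_size(GENOME_MB, 28.0), 2))
# Implied size for the 32.4% comparison (Mb).
print(round(implied_reference_size(GENOME_MB, 32.4), 2))
```

Both implied values fall near 1.1 Mb, so the comparison is against genomes roughly a third of a megabase smaller.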

Table 1 Genome statistics for T. rhizospherense and other complete Saccharibacteria genomes

There are 130 T. rhizospherense functional annotations that were not found in any other Saccharibacteria (152 genes have these annotations, with some annotated to have the same function). A few appear to be involved in amino acid metabolism and others in transcriptional regulation, DNA repair, sugar metabolism, transport, and non-homologous end-joining. The set also includes three genes annotated as NADPH:quinone reductase and a dihydropteroate synthase for use in folate synthesis.

The genome appears to encode a nickel superoxide dismutase that may be used for oxidative stress response. The 49_20 scnpilot genome also encodes a gene with this function (but it is a Fe-Mn family superoxide dismutase). T. rhizospherense may have the ability to convert methylglyoxal to lactate, a two protein pathway (lactoylglutathione lyase and hydroxyacylglutathione hydrolase) that is absent in the other analyzed Saccharibacteria. This pathway is important in detoxification of methylglyoxal.

T. rhizospherense has a notably larger repertoire of carbohydrate active enzymes than occurs in the other Saccharibacteria, including 58 genes with 27 unique annotations. Included in the set and not found in other Saccharibacteria genomes are genes predicted to confer the ability to hydrolyze hemicellulose (AG-oligosaccharides and mannooligosaccharides), amino sugars (galactosaminide), and pectin (oligogalacturonides). The genome encodes a gene for transaldolase, a protein in the pentose phosphate pathway, which is absent in all other analyzed Saccharibacteria genomes.

T. rhizospherense is the only Saccharibacteria with a complete twin-arginine protein translocation system (tatBC). Four genes were predicted to have a TAT motif: two genes of unknown function, a phosphatidylglycerol lysyltransferase (discussed above and not present in any other analyzed genomes), and an MFS transporter, DHA2 family, methylenomycin A resistance protein.

Of the T. rhizospherense genes involved in phosphatidylinositol mannoside production, phosphatidylinositol alpha-mannosyltransferase is found only in T. rhizospherense and 47_87 scnpilot and alpha-1,6-mannosyltransferase is found only in Candidatus T. rhizospherense and Candidatus S. aalborgensis. None of the other genomes encode the plant hormone-related genes: “Lonely Guy” or salicylate hydroxylase.

The T. rhizospherense genome lacks several annotated genes found in most other Saccharibacteria. T. rhizospherense is unable to make (p)ppGpp, an alarmone that downregulates gene expression. Also lacking is dihydrolipoyl dehydrogenase, a subunit of pyruvate dehydrogenase (the function of which is unclear in these bacteria). Also not found are a recJ gene that encodes a single-stranded DNA exonuclease, a glutamine amidotransferase involved in cobyric acid synthesis, the murC gene involved in peptidoglycan synthesis, and the pilW gene involved in pilus stability.