Sample collection and enrichment of thermophilic consortia

The sample collection and enrichment procedures have been described previously15. Briefly, compost samples were collected from Jepson Prairie (JP) Organics, located in Vacaville, CA, in 2008. The compost-derived microbial consortium was initially grown aerobically with unpretreated switchgrass and then switched to microcrystalline cellulose (1% wt/vol; Sigma) as the sole carbon source in liquid M9 medium augmented with vitamins. The enrichments were grown at 60 °C and 200 r.p.m. under aerobic conditions in an aerial rotary shaker and serially transferred every 14 days with a 4% vol/vol inoculum; each transfer is referred to as a passage. Cultivation of the 50 ml culture after passage 80 was scaled up at the Advanced Biofuels Process Demonstration Unit, Lawrence Berkeley National Laboratory. A 500 ml culture was inoculated with a 2.5% vol/vol sample from the 50 ml culture and incubated at 60 °C for 14 days at 150 r.p.m. on a rotary shaker. This culture was inoculated into a 19 l bioreactor (Bioengineering USA) to a total volume of 15 l. The 15 l culture was grown at 60 °C, 150 r.p.m. and 0.26 volume gas (sterile air) per volume liquid per minute (VVM) for 14 days. A 400 l bioreactor (ABEC) was inoculated with 7.5 l of culture from the 19 l bioreactor to a volume of 300 l and incubated for 14 days at 60 °C, 150 r.p.m. agitation and 0.25 VVM air sparging. After 14 days, the final fermentation broth was centrifuged using an Alfa Laval Disc Stack centrifuge at 9,000g at 100 l h–1 with 30 min cell discharge interval and 125 kPa back pressure. The clarified broth was collected in the 300 l holding tank bioreactor and the pelleted biomass collected and stored at –80 °C. The supernatant was concentrated through a tangential flow filtration system with a 10 kDa Biomax filter membrane (EMD Millipore). The transmembrane pressure was set at 13 p.s.i. with feed pressure of 30 p.s.i. The concentrated supernatant was freeze-dried under vacuum for 24 h using a lyophilizer (Labconco) and the resulting powder was stored at –20 °C. CMCase and xylanase activities were measured daily for the 15 l and 300 l bioreactor cultivations by removing samples from the bioreactors each day and assaying the supernatant as described below.
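The scale-up quantities above follow from two simple relations: the inoculum fraction is the inoculum volume divided by the final culture volume, and the air flow is the VVM setting multiplied by the liquid volume. The following is a minimal sketch of that arithmetic using only the values stated in the text; the function names are illustrative and are not part of the original procedure.

```python
# Sketch of the scale-up arithmetic; function names are hypothetical.

def inoculum_fraction(inoculum_l: float, final_volume_l: float) -> float:
    """Fraction (vol/vol) of the final culture contributed by the inoculum."""
    return inoculum_l / final_volume_l


def air_flow_l_per_min(vvm: float, liquid_volume_l: float) -> float:
    """Air flow (l per min) implied by a VVM setting and a liquid volume."""
    return vvm * liquid_volume_l


if __name__ == "__main__":
    # 7.5 l of culture brought to 300 l in the 400 l bioreactor
    print("inoculum fraction:", inoculum_fraction(7.5, 300.0))
    # Aeration of the 15 l culture (0.26 VVM) and the 300 l culture (0.25 VVM)
    print("15 l air flow (l/min):", air_flow_l_per_min(0.26, 15.0))
    print("300 l air flow (l/min):", air_flow_l_per_min(0.25, 300.0))
```

Under these relations the 7.5 l inoculum corresponds to 2.5% vol/vol of the 300 l culture, the same fraction used for the 500 ml stage.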

Sequencing, assembly and binning of metagenomics reads

DNA purification from samples extracted from the 15 l and 300 l bioreactors was performed as previously described13. Illumina sequencing (250 bp × 2) of the metagenomic samples was carried out by the Joint Genome Institute (JGI) and performed as previously described40. The sequencing reads of the DNA samples recovered from the 15 l bioreactor (days 1–14) were trimmed using Trimmomatic (with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36)41 and co-assembled using IDBA-UD42 with the --pre-correction parameter. The 300 l samples (days 4, 5, 7, 10, 12 and 14) were also co-assembled, using the same settings as the 15 l samples. The co-assembled samples for the 15 l and 300 l bioreactor experiments were then binned using MaxBin 2.043 with default parameters, yielding population genomes. The completeness and contamination ratios of these population genomes were assessed using CheckM44. Genomes with contamination rates >10% were re-binned using MaxBin 2.0 by setting the input contig to the genome file and the input abundance to the abundance files extracted during the whole-metagenome binning. The output bins were re-examined using CheckM and the produced bins with higher completeness were chosen to replace the original genome, while the other bins with higher levels of contaminants were discarded. The most likely taxonomic ranks of the recovered genomes were predicted by searching the predicted proteins against the NCBI non-redundant (NR) database, collecting and processing the hits using the least common ancestor (LCA) algorithm proposed by MEGAN445 and assigning the most probable taxonomic rank to the recovered genomes according to the LCA results. GH genes present in the recovered population genomes were identified by using Prodigal46 and then annotated using dbCAN47.
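The bin-curation and taxonomic-assignment logic described above can be illustrated with a short sketch. It assumes that CheckM results have already been parsed into a dictionary of completeness and contamination percentages, and it uses a deliberately simplified LCA (the longest shared lineage prefix) rather than the full MEGAN4 procedure. All function names, bin names and lineages are hypothetical, and choosing the single most complete sub-bin is one interpretation of the replacement rule.

```python
from typing import Dict, List, Sequence

CONTAMINATION_CUTOFF = 10.0  # percent; threshold stated in the text

Stats = Dict[str, float]  # expects keys "completeness" and "contamination"


def bins_to_rebin(checkm: Dict[str, Stats]) -> List[str]:
    """Names of bins whose contamination exceeds the cutoff."""
    return sorted(
        name for name, stats in checkm.items()
        if stats["contamination"] > CONTAMINATION_CUTOFF
    )


def pick_replacement(sub_bins: Dict[str, Stats]) -> str:
    """Sub-bin with the highest completeness (an assumed reading of the rule)."""
    return max(sub_bins, key=lambda name: sub_bins[name]["completeness"])


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Longest lineage prefix shared by all hits (simplified LCA)."""
    shared: List[str] = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


if __name__ == "__main__":
    # Hypothetical CheckM summary for two bins
    checkm = {
        "bin.001": {"completeness": 95.0, "contamination": 2.0},
        "bin.002": {"completeness": 80.0, "contamination": 15.0},
    }
    print(bins_to_rebin(checkm))
    # Hypothetical lineages of database hits for one genome
    hits = [
        ["Bacteria", "Firmicutes", "Bacilli"],
        ["Bacteria", "Firmicutes", "Clostridia"],
    ]
    print(lowest_common_ancestor(hits))
```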

DNA was isolated from a 50 ml aerobic shake flask culture with a bacterial consortium that was adapted from green waste compost obtained from Newby Island Sanitary Landfill in Milpitas, CA, by growth on crystalline cellulose. The enzymatic activities and microbial community membership of this consortium have been described previously13. Metagenomic sequencing was performed by the JGI as described above, and the sequenced reads were assembled as previously described40. Population genomes were recovered by automated binning with MaxBin43 and checked for completeness and contamination with CheckM44.

16S and 23S rRNA gene analysis

A partial 16S rRNA gene (706 bp) was recovered from the ‘Ca. R. cellulovorans’ metagenomic bin. This fragment was used to identify a nearly full-length 16S rRNA gene (1,597 bp; 99.7% identical; JGI gene ID Ga0074251_1085371) that was recovered from the initial assembled metagenome (JGI taxon ID 3300005442) obtained for this consortium when it had been adapted to switchgrass40. This observation indicated that ‘Ca. R. cellulovorans’ was present in the consortium when it was adapted to switchgrass, before transferring the consortium to grow on microcrystalline cellulose. Sequences from three clones (GenBank accessions KC978751, KC978760 and KC978763) were >99% identical to portions of the full-length rRNA ‘Ca. R. cellulovorans’ sequence. These clones were recovered from DNA samples isolated from a time series (14 days) of an adaptation of compost from Newby Island Landfill (Milpitas, CA, USA) to grow with microcrystalline cellulose as the sole carbon source at 60 °C13. As described above, a partial genome was recovered from this cultivation that was >99% identical at the amino acid level to the ‘Ca. R. cellulovorans’ genome. A partial 23S rRNA gene (1,444 bp) was recovered from the ‘Ca. R. cellulovorans’ metagenomic bin. The 16S and 23S rRNA gene sequences were aligned using MUSCLE48 and trimmed using Gblocks49, and the phylogenetic tree was constructed using MEGA550 with the Tamura–Nei model. Bootstrap values were calculated with 1,000 replicates.

Reconstruction of ‘Ca. R. cellulovorans’ GH gene cluster

Six small contigs were identified in the ‘Ca. R. cellulovorans’ population genome, each carrying a partial gene with a catalytic domain (GH9, GH48, GH6/5, 2×GH10, AA10) linked to at least one CBM3. The clustering of these genes was confirmed by PCR amplification of DNA isolated from the cellulolytic consortium. PCR primers (Supplementary Table 8) were designed using the CLC Main Workbench (Qiagen) and PCR products were cloned into pJET1.2/blunt Cloning Vector (Fermentas) and sequenced with an ABI system according to the manufacturer’s instructions. Assembly of gene sequences into a gene cluster and annotation of genes was performed with the CLC Main Workbench and checked for chimeras using the Bellerophon algorithm51.

Phylogenetic analysis

Alignments of protein sequences were performed using the CLUSTALW multiple alignment accessory application in the CLC Main Workbench (Qiagen). In brief, phylogenetic trees were constructed using the CLC Main Workbench applying the maximum likelihood method based on the Whelan and Goldman protein substitution model52. Bootstrap values were calculated with 1,000 replicates.

To build the concatenated protein tree, genes were first searched against the PFAM profiles53 using HMMER354. Genes with PFAM annotations that appear once and only once across all involved genomes were aligned separately using MUSCLE48. After the alignments were concatenated and trimmed using Gblocks49, the concatenated maximum-likelihood protein tree was constructed using MEGA550 with the JTT (Jones, Taylor, Thornton) model. Bootstrap values were estimated with 1,000 replicates.
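The marker selection step can be sketched as follows, assuming that the "once and only once" criterion means that a PFAM family is annotated on exactly one gene in every genome under comparison. The input layout (genome name mapped to a list of family identifiers, one entry per annotated gene), the function names and the placeholder family identifiers are illustrative assumptions. The alignments themselves would still be produced by MUSCLE and trimmed with Gblocks.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def single_copy_families(annotations: Dict[str, List[str]]) -> List[str]:
    """Families annotated on exactly one gene in every genome."""
    shared: Optional[Set[str]] = None
    for families in annotations.values():
        counts = Counter(families)
        once = {family for family, n in counts.items() if n == 1}
        shared = once if shared is None else shared & once
    return sorted(shared or [])


def concatenate(alignments: Dict[str, Dict[str, str]],
                genomes: List[str]) -> Dict[str, str]:
    """Join per-family aligned sequences in a fixed (sorted) family order."""
    order = sorted(alignments)
    return {g: "".join(alignments[f][g] for f in order) for g in genomes}


if __name__ == "__main__":
    # Placeholder family identifiers, not real PFAM accessions
    annotations = {
        "genome_A": ["FAM1", "FAM2", "FAM2"],
        "genome_B": ["FAM1", "FAM2", "FAM3"],
    }
    markers = single_copy_families(annotations)
    print(markers)
    # Toy aligned sequences for the selected marker
    aligned = {"FAM1": {"genome_A": "MK-L", "genome_B": "MKAL"}}
    print(concatenate(aligned, ["genome_A", "genome_B"]))
```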

Protein purification

Lyophilized supernatant (170 mg) obtained from the 300 l cultivation was dissolved in 5 ml H2O and passed through a 0.2 µm filter. The supernatant was desalted by dialysis against the buffer (20 mM Tris, pH 8.0) for 24 h with three buffer changes, followed by a 30 ml NaCl gradient fractionation (0–2 M NaCl) using a 5 ml HiTrap Q HP column on an ÄKTA Protein Purification System (GE Healthcare).

Cellulases in the supernatant from the 300 l cultivation were also enriched by binding to phosphoric acid swollen cellulose (PASC), an adaptation of a procedure previously described for cellulosome purification from Clostridium thermocellum26. Briefly, 250 mg of lyophilized PASC produced from Avicel PH-105 was added to 500 mg of supernatant dissolved in 10 ml H2O and mixed at room temperature with a magnetic stir bar for 30 min. After a binding step at 4 °C for 2 h, the amorphous cellulose was centrifuged for 10 min at 3,000g and rigorously washed for 6 cycles with 25 ml reaction buffer (25 mM 2-(N-morpholino)ethanesulfonic acid (MES), pH 6.0). Washed PASC was resuspended in 10 ml reaction buffer and transferred into dialysis membranes (SnakeSkin and Slide-A-Lyzer; Fisher Scientific) with a 3.5–10 kDa cutoff and dialysed at 60 °C against 4 l reaction buffer at 55–60 °C for up to 48 h with three buffer exchanges per day to prevent possible product inhibition. Dialysis membranes used in this study consisted of regenerated cellulose and were destabilized by cellulases of the substrate, and thus needed to be exchanged every 24 h to prevent membrane rupture. The reaction was considered complete when no further visible changes to the substrate were observed. The enrichment was then split by centrifugation for 20 min at 3,000g into residual biomass (in the pellet) and the affinity digestion protein fraction (AD, in the supernatant).

Measurement of protein concentration and GH activity

Protein concentrations were determined using the bicinchoninic acid (BCA) assay (Pierce BCA Protein Assay Kit, Thermo Scientific) in a 96-well plate (200 μl reaction volume) with bovine serum albumin as the standard. CMCase and xylanase activity assays were conducted as described previously55. Enzyme activity units (U) were defined as µmol of sugar liberated per min. Enzyme activity units for supernatant preparations were calculated as U per ml of supernatant volume. CMCase activity units of purified heterologously expressed proteins were reported as U per mg, representing specific activity measurements.
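The unit definitions above translate directly into three small calculations, sketched here for illustration. The function names and the example numbers are hypothetical and are not measured values from this study.

```python
# Sketch of the activity unit definitions; names and inputs are hypothetical.

def activity_units(sugar_umol: float, minutes: float) -> float:
    """Enzyme activity in U: µmol of sugar liberated per min."""
    return sugar_umol / minutes


def volumetric_activity(units: float, supernatant_ml: float) -> float:
    """Activity of a supernatant preparation in U per ml."""
    return units / supernatant_ml


def specific_activity(units: float, protein_mg: float) -> float:
    """Specific activity of a purified protein in U per mg."""
    return units / protein_mg


if __name__ == "__main__":
    # Illustrative inputs: 3 µmol sugar released in 30 min
    u = activity_units(3.0, 30.0)
    print("U:", u)
    print("U per ml (0.05 ml supernatant assayed):", volumetric_activity(u, 0.05))
    print("U per mg (0.01 mg protein assayed):", specific_activity(u, 0.01))
```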

Soluble p-nitrophenyl (pNP)-labelled substrates were used to determine cellobiohydrolase (pNPC), β-d-glucosidase (pNPG), β-d-xylosidase (pNPX) and α-l-arabinofuranosidase (pNPA) activities56. The p-nitrophenyl substrate (90 μl) was mixed with 10 μl of diluted enzyme, incubated for 30 min and quenched with 50 μl of 2% cold sodium bicarbonate. The absorbance of released p-nitrophenol was measured at 410 nm. Activities using p-nitrophenyl substrates were calculated as U ml−1.

Saccharification of cellulose substrates

Saccharifications were performed in the presence of 2% (wt/vol) Avicel (Sigma) and PASC. Each mixture was prepared in 50 mM MES, pH 6.0 with 10 mg protein per g glucan in biomass to a final volume of 625 µl in a 2 ml screw-cap vial. Saccharifications were carried out at 70 °C in a shaker for 72 h, with 50 µl samples taken every 24 h. All hydrolysates were collected via centrifugation at 21,000g for 5 min and 0.45 μm filtered to remove large biomass particles prior to sugar analysis. After filtration, samples were kept frozen at −20 °C and thawed before analysis. Glucose concentrations were measured on an Agilent 1200 Series HPLC system equipped with an Aminex HPX-87H column (Bio-Rad) and refractive index detector. Samples were run with an isocratic 4 mM sulfuric acid mobile phase. Sugar concentrations were determined using standards containing cellotriose, cellobiose, glucose, xylose and arabinose.
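The substrate and enzyme amounts per reaction follow from the stated loadings. The sketch below works through that arithmetic for a 625 µl reaction at 2% wt/vol substrate and 10 mg protein per g glucan. It assumes, for illustration only, that the substrate is entirely glucan (glucan fraction of 1.0); the function names are hypothetical.

```python
# Sketch of the per-reaction loading arithmetic; names are hypothetical.

def substrate_mass_mg(percent_wt_vol: float, volume_ul: float) -> float:
    """Substrate mass (mg) at a % wt/vol loading (g per 100 ml) in a volume (µl)."""
    grams_per_ml = percent_wt_vol / 100.0
    return grams_per_ml * (volume_ul / 1000.0) * 1000.0


def protein_dose_mg(substrate_mg: float, mg_protein_per_g_glucan: float,
                    glucan_fraction: float = 1.0) -> float:
    """Protein (mg) required; glucan_fraction = 1.0 is an assumption."""
    glucan_g = substrate_mg / 1000.0 * glucan_fraction
    return glucan_g * mg_protein_per_g_glucan


if __name__ == "__main__":
    substrate = substrate_mass_mg(2.0, 625.0)
    print("substrate per reaction (mg):", substrate)
    print("protein per reaction (mg):", protein_dose_mg(substrate, 10.0))
```

With these inputs each reaction contains 12.5 mg of substrate, so a loading of 10 mg protein per g glucan corresponds to 0.125 mg protein under the pure-glucan assumption.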

PAGE and zymograms

SDS–PAGE was performed with 8–16% Protean TGX protein gradient gels (Bio-Rad) with the Tris-glycine-SDS buffer57. Blue Native (BN)–PAGE58 was performed with 3–12% NativePAGE Bis-Tris protein gradient gels (Thermo Scientific) in the presence of 0.02% Coomassie Blue G-250. For subunit analysis of native complexes, individual lanes from the BN–PAGE were excised, incubated in 2% SDS and 160 mM dithiothreitol (DTT), and denatured at 95 °C for 10 min, unless otherwise indicated (Fig. 3e). Proteins were separated with hand-cast 8% polyacrylamide gels. Protein bands were stained with SimplyBlue SafeStain Coomassie Blue dye (Thermo Scientific) according to the manufacturer’s instructions.

Protein bands with activity on CMC and xylan were visualized using a modification of the zymogram technique, as described previously15. Gels were incubated in 2% wt/vol CMC or 2% wt/vol birchwood xylan solutions followed by incubation at 60 °C for up to 2 h in reaction buffer (25 mM MES, pH 6.0). In-gel enzymatic activities were visualized by incubating gels with a 0.5% Congo Red solution for 15 min, followed by multiple washing steps with 20% NaCl.

Glycosylation analysis

Protein glycosylation was visualized in-gel by the periodic acid–Schiff stain27 using a Pierce Glycoprotein Staining Kit (Thermo Scientific) according to the manufacturer’s instructions.

N-glycan analysis was performed as described previously59. However, no N-linked glycans were detected. Total glycosyl compositional analysis was performed by combined gas chromatography/mass spectrometry (GC/MS) of the per-O-trimethylsilyl (TMS) derivatives of the monosaccharide methyl glycosides produced from the sample by acidic methanolysis60. O-linked glycans were released by β-elimination and permethylated59. The permethylated O-linked glycans were analysed by matrix assisted laser desorption/ionization-time of flight (MALDI–TOF) and electrospray ionization tandem mass spectrometry (ESI MS/MS)61 and gas chromatography/mass spectrometry (GC/MS) for linkage analysis62.

Proteinase K digestion

The E. coli-expressed CelA and CelC and the AD fraction were digested at 50 °C for 60 min in reaction buffer (20 mM Tris-HCl, 400 mM NaCl, 0.3% SDS and 5 mM EDTA) containing 75 µg of the respective enzyme and 3.75 µg proteinase K. After heat inactivation of proteinase K at 95 °C, the reaction mixture was analysed by SDS–PAGE (8–16% gradient).

Proteomic analysis

Proteins were digested from SDS–PAGE gels as previously described63. Samples were analysed on an Agilent 6550 iFunnel QTOF mass spectrometer coupled to an Agilent 1290 UHPLC system, as described in ref. 64. Briefly, peptides were loaded onto an Ascentis Express Peptide ES-C18 column (10 cm length × 2.1 mm internal diameter, 2.7 µm particle size; Sigma Aldrich) operating at 60 °C and at a flow rate of 400 µl min–1. A 13.5 min chromatography method with the following gradient was used: the initial starting condition (95% Buffer A (0.1% formic acid) and 5% Buffer B (99.9% acetonitrile, 0.1% formic acid)) was held for 1 min. Buffer B was then increased to 35% in 5.5 min, followed by an increase to 80% B in 1 min, where it was held at 600 µl min–1 for 3.5 min. Buffer B was decreased to 5% over 0.5 min, where it was held for 2 min at 400 µl min–1 to re-equilibrate the column with the starting conditions. Peptides were introduced into the mass spectrometer from the UHPLC by using a Dual Agilent Jet Stream Electrospray Ionization source operating in positive-ion mode. The source parameters used include a gas temperature of 250 °C, drying gas at 14 l min–1, nebulizer at 35 p.s.i.g, sheath gas temperature of 250 °C, sheath gas flow of 11 l min–1, V Cap of 5,000 V, fragmentor V of 180 V and OCT (octopole) 1 RF (radio frequency) V pp of 750 V. The data were acquired with Agilent MassHunter Workstation Software, LC/MS Data Acquisition B.06.01 (Build 6.01.6157). The resultant data files were searched against a data set containing reconstructed population genomes from the 300 l bioreactor, with common contaminants appended, with Mascot version 2.3.02 (Matrix Science), then filtered and refined using Scaffold version 4.6.1 (Proteome Software).
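The chromatography gradient described above can be written as a small table of steps, which makes it easy to confirm that the segments sum to the stated 13.5 min method and to look up the Buffer B percentage at any time point. The sketch below assumes linear ramps between set points. Flow rates are recorded only for the steps where the text states one explicitly (None otherwise); the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    duration_min: float
    end_percent_b: float
    flow_ul_min: Optional[float]  # None: not stated explicitly for this step


START_PERCENT_B = 5.0
GRADIENT: List[Step] = [
    Step(1.0, 5.0, 400.0),   # hold at starting conditions
    Step(5.5, 35.0, None),   # ramp to 35% B
    Step(1.0, 80.0, None),   # ramp to 80% B
    Step(3.5, 80.0, 600.0),  # hold at 80% B
    Step(0.5, 5.0, None),    # return to 5% B
    Step(2.0, 5.0, 400.0),   # re-equilibration
]


def total_minutes(steps: List[Step]) -> float:
    """Total method length in minutes."""
    return sum(step.duration_min for step in steps)


def percent_b_at(t_min: float, steps: List[Step],
                 start: float = START_PERCENT_B) -> float:
    """Buffer B (%) at time t_min, assuming linear ramps within each step."""
    elapsed, current = 0.0, start
    for step in steps:
        if t_min <= elapsed + step.duration_min:
            frac = (t_min - elapsed) / step.duration_min
            return current + frac * (step.end_percent_b - current)
        elapsed += step.duration_min
        current = step.end_percent_b
    return current  # after the last step, the final set point applies


if __name__ == "__main__":
    print("method length (min):", total_minutes(GRADIENT))
    print("% B at 3.75 min:", percent_b_at(3.75, GRADIENT))
```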

Heterologous protein expression

Constructs for the CelABC genes were obtained both by PCR amplification from metagenomic DNA with specific primers (Supplementary Table 8) and synthesis of codon-optimized versions for expression in E. coli (Gen9). Genes were cloned into the modified bacterial expression vector pET39b(+) with a T7/lac promoter and a TEV-cleavable C-terminal 6xHis tag but lacking the DsbA secretion sequence (Novagen) using Gibson assembly65. All reagents were purchased from New England Biolabs. The desired genes without their signal sequences and the expression vector were PCR-amplified, DpnI-digested and incubated with 1× Gibson assembly Master Mix for 15 min at 50 °C. The product was then transformed into chemically competent E. coli DH10α cells for storage and into chemically competent E. coli BL21 (DE3) for heterologous protein expression. Starter cultures (50 ml) of E. coli BL21 (DE3) harbouring plasmids were grown overnight in LB medium containing 25 μg ml–1 kanamycin at 37 °C and shaken at 200 r.p.m. in rotary shakers. Expression was performed in Terrific broth with 2% glycerol, 25 μg ml–1 kanamycin and 2 mM MgSO4. Starter cultures were used to inoculate 1 l of expression medium in a 2 l baffled Erlenmeyer flask and incubated at 18 °C while shaking (200 r.p.m.), and induced with 500 µM isopropyl β-d-thiogalactopyranoside (IPTG). Following induction, cultures were again incubated at 18 °C. At 22 h, cultures were centrifuged at 15,500g for 30 min. Cell pellets were resuspended in 25 ml lysis buffer (50 mM NaPO4, 300 mM NaCl, 5 mM imidazole; pH 7.4) and homogenized with an EmulsiFlex-C3 instrument (Avestin). After incubation at 60 °C for 30 min, lysates were collected via centrifugation at 75,000g for 30 min and 0.45 μm filtered to remove large particles before purification. Polyhistidine-tagged proteins were purified on Cobalt-NTA resin (Thermo Scientific). To cleave the 6xHis tag, 1 g purified protein was incubated with 50 mg His-tagged TEV protease and simultaneously dialysed against 4 l reaction buffer (50 mM NaPO4, 300 mM NaCl; pH 7.4) for 24 h with three reaction buffer exchanges. After a second purification step via Cobalt-NTA resin, the flow-through fractions contained the purified and untagged proteins. Proteins were stored at 4 °C until ready for use. The proteins were >90% pure as visualized by SDS–PAGE (Fig. 4c).

Life Sciences Reporting Summary

Further information on experimental design and reagents is available in the Life Sciences Reporting Summary.

Data availability

Metagenomic sequencing data can be accessed at the JGI IMG website (http://img.jgi.doe.gov/) or the JGI Genome Portal (http://genome.jgi.doe.gov/), and the specific IMG genome IDs are listed in Supplementary Table 12. The draft genome sequence for ‘Candidatus Reconcilibacillus cellulovorans’ has been deposited at GenBank (MOXJ00000000). The gene sequences and plasmid constructs for the ‘Ca. Reconcilibacillus’ cellulases CelA (JPUB_007824), CelB (JPUB_007826) and CelC (JPUB_007828) are available from the public version of the JBEI Registry (https://public-registry.jbei.org) and are physically available from the authors and/or Addgene (http://www.addgene.org) upon request.