Field experiment and sample collection

The research site near the town of Rifle, Colorado (USA), has been described previously42. Briefly, the site is located on a 9-ha floodplain in northwestern Colorado underlain by an aquifer composed of 6–7 m of unconsolidated sands, silts, clays and gravels deposited by the Colorado River. Acetate was introduced into the aquifer through five boreholes arranged in a line perpendicular to the groundwater flow direction and spaced at 1.5-m intervals. Cross-well mixing was used to disperse the injectate across the width of the injection zone.

Groundwater samples were collected before (GWA1) and after (GWB1) acetate amendment. Acetate-amended groundwater was injected upgradient between 3.5 and 5.5 m below the surface to achieve aquifer concentrations of 15 mM acetate (Sigma-Aldrich, Saint Louis, MO, USA) and 2 mM bromide (Sigma-Aldrich). Before acetate amendment, 140 l of groundwater was collected; on 3 September 2011 and 5 September 2011, 7 days (GWB1) and 9 days after the start of acetate amendment, 100 l was collected. In each case, groundwater was pumped and filtered sequentially through a 1.2-μm pore size pre-filter (293-mm diameter Supor-1200 hydrophilic polyethersulfone membrane disc filter; Pall Corporation, Ann Arbor, MI, USA), and biomass was retained on a 0.2-μm pore size filter (293-mm diameter Supor-200 hydrophilic polyethersulfone membrane disc filter; Pall Corporation) and a 0.1-μm pore size sample filter (142-mm diameter Supor-100 hydrophilic polyethersulfone membrane disc filter; Pall Corporation). Filters were immediately frozen in an ethanol–dry ice mixture, stored at −80 °C and shipped overnight to the University of California, Berkeley, for DNA extraction. For cryo-TEM, 500 ml of 0.2-μm filtrate was concentrated to ~500 μl with Vivaspin centrifugal concentrators (30 kDa molecular weight cutoff; GE Healthcare, Pittsburgh, PA, USA) and cryo-plunged immediately (see below). The same groundwater sample (GWB1) was used for the molecular, metagenomic and cryo-TEM correlation analyses.
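The sampling timeline and the Vivaspin volume reduction can be checked with a few lines of arithmetic. The Python sketch below assumes that "7 days" and "9 days" count whole calendar days from the start of amendment and that no cells are lost during concentration; the label for the 9-day sample is a hypothetical name, because the text names only GWB1.

```python
from datetime import date, timedelta

# Sampling dates and elapsed days taken from the text; "day-9 sample" is a hypothetical label.
SAMPLES = {
    "GWB1": (date(2011, 9, 3), 7),
    "day-9 sample": (date(2011, 9, 5), 9),
}


def implied_start(sample_date: date, days_after: int) -> date:
    """Back-calculate the amendment start date (assumes whole calendar days)."""
    return sample_date - timedelta(days=days_after)


def concentration_factor(initial_ml: float, final_ul: float) -> float:
    """Fold concentration from volume reduction, assuming no loss of cells."""
    return initial_ml * 1000.0 / final_ul


if __name__ == "__main__":
    for name, (sampled, days) in SAMPLES.items():
        print(f"{name}: implied start {implied_start(sampled, days).isoformat()}")
    # Vivaspin step: 500 ml of 0.2-um filtrate reduced to ~500 ul.
    print(f"Vivaspin concentration factor: ~{concentration_factor(500, 500):.0f}-fold")
```

Under the whole-day assumption, both post-amendment dates back-calculate to the same start date (27 August 2011), and the concentration step is roughly 1,000-fold.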

DNA extractions

Approximately 1 g of each filter was used for DNA extraction with the PowerMax Soil DNA Isolation Kit (Mo Bio Laboratories Inc., Carlsbad, CA, USA, Cat# 12988). The manufacturer's protocol was followed, except that a freeze/thaw step was added and bead tubes were vortexed for 3.5 min after addition of the SDS reagent and then incubated for 30 min at 65 °C with intermittent shaking. DNA in the 5-ml eluate was concentrated by sodium acetate/ethanol precipitation with glycogen as a carrier and resuspended in the provided elution buffer.
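The precipitation step scales with the 5-ml eluate. The reagent ratios in the sketch below (0.1 volume of 3 M sodium acetate, then 2.5 volumes of ethanol) are a common textbook rule of thumb, not values stated in this protocol, so the numbers are illustrative only.

```python
def precipitation_volumes(sample_ml: float,
                          salt_fraction: float = 0.1,
                          ethanol_fraction: float = 2.5) -> dict[str, float]:
    """Reagent volumes (ml) for a sodium acetate/ethanol DNA precipitation.

    Assumed rule of thumb (not from the protocol): add 0.1 volume of 3 M
    sodium acetate, then 2.5 volumes of cold ethanol relative to the
    salt-adjusted sample volume. The glycogen carrier volume is not modelled.
    """
    salt_ml = sample_ml * salt_fraction
    ethanol_ml = (sample_ml + salt_ml) * ethanol_fraction
    return {
        "sodium_acetate_3M_ml": salt_ml,
        "ethanol_ml": ethanol_ml,
        "total_ml": sample_ml + salt_ml + ethanol_ml,
    }


if __name__ == "__main__":
    # 5-ml eluate from the PowerMax kit, as stated in the text.
    for reagent, volume in precipitation_volumes(5.0).items():
        print(f"{reagent}: {volume:.2f} ml")
```

Under these assumed ratios, the 5-ml eluate needs about 0.5 ml of salt solution and 13.75 ml of ethanol, roughly 19 ml in total.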

Preparation of clone libraries and sequencing

Full-length bacterial 16S rRNA gene sequences were amplified by gradient PCR with the general bacterial primers 27F (5′-AGAGTTTGATCMTGGCTCAG-3′) and 1492R (5′-GGTTACCTTGTTACGACTT-3′)43. Thermocycler conditions were as follows: initial denaturation at 94 °C for 1 min; 25 cycles of denaturation at 94 °C for 30 s, annealing for 30 s across an eight-step gradient from 48 to 59 °C and extension at 72 °C for 1 min; and a final extension at 72 °C for 7 min. Amplicon size was verified by gel electrophoresis, and the PCR product was cleaned up with the UltraClean PCR Clean-up Kit (Mo Bio Laboratories Inc., Carlsbad, CA, USA, Cat# 12500). Clone libraries were generated with a TOPO TA cloning kit and electrocompetent cells (Life Technologies Corp., Grand Island, NY, USA). One hundred transformants from the 0.1- and 0.2-μm clone libraries were verified by colony PCR with the M13 forward (5′-GTAAAACGACGGCCAGT-3′) and reverse (5′-CAGGAAACAGCTATGAC-3′) primers followed by gel electrophoresis. Colony PCR conditions were as follows: E. coli cell lysis and initial denaturation at 95 °C for 10 min; 25 cycles of denaturation at 95 °C for 30 s, annealing at 53 °C for 30 s and extension at 72 °C for 1.5 min; and a final extension at 72 °C for 7 min. Successful transformants were Sanger sequenced with the M13 forward and reverse primers (only for the 0.1-μm filter). Sequences were screened for primer and vector sequence with cross_match (http://www.phrap.org) and NCBI VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html), quality scored with Phred (http://www.phrap.org) and assembled into contigs with Phrap (http://www.phrap.org). Sequences were trimmed to retain only bases with Phred quality ≥ Q20, and high-quality contigs were checked for chimeras with USEARCH 64 (http://www.drive5.com). Sequences were identified by BLAST44 searches against the Arb-Silva database (http://www.arb-silva.de).
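Two details of this step can be made concrete. First, the 27F primer contains the IUPAC ambiguity code M (A or C), so it is a mixture of two sequences. Second, a Phred score Q corresponds to a per-base error probability P through Q = −10 log10 P, so Q20 means a 1% error rate. The trimming function below encodes one assumed reading of "retain only bases Phred ≥ q20" (keep the longest contiguous run of Q ≥ 20 bases); the actual pipeline may trim differently.

```python
from itertools import product

# IUPAC nucleotide codes needed for these primers (plus a few common ones).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG",
         "W": "AT", "S": "CG", "Y": "CT", "K": "GT", "N": "ACGT"}


def expand_degenerate(primer: str) -> list[str]:
    """Every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def phred_error_probability(q: int) -> float:
    """Phred definition Q = -10 * log10(P), solved for the error probability P."""
    return 10 ** (-q / 10)


def longest_high_quality_run(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Longest contiguous stretch in which every base has quality >= min_q.

    Assumed reading of "retain only bases Phred >= q20"; the study's exact
    trimming behaviour may differ.
    """
    if len(seq) != len(quals):
        raise ValueError("one quality score per base is required")
    best_start, best_len, start = 0, 0, None
    for i, q in enumerate(list(quals) + [-1]):  # sentinel closes a trailing run
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return seq[best_start:best_start + best_len]


if __name__ == "__main__":
    print(expand_degenerate("AGAGTTTGATCMTGGCTCAG"))  # 27F: M = A or C
    print(phred_error_probability(20))               # Q20 corresponds to 1% error
    print(longest_high_quality_run("ACGTACGTAC",
                                   [10, 30, 30, 30, 12, 25, 25, 25, 25, 8]))
```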

16S rRNA gene phylogenetic analysis

16S rRNA gene sequences from cells retained on the 0.2-μm filter (50 clones, yielding 21 operational taxonomic units (OTUs) after chimera checking and clustering as described previously) and the 0.1-μm filter (108 clones, yielding 24 OTUs) were obtained by sequencing the clone libraries. Individual clone sequences were clustered at 97% sequence identity using UCLUST (part of USEARCH 64). We also used EMIRGE20 to reconstruct 16S rRNA gene sequences from the Illumina reads after trimming low-quality bases with sickle (https://github.com/najoshi/sickle). Paired-end reads for which both reads were at least 60 nucleotides long after trimming were used as EMIRGE input, and EMIRGE was run for 100 iterations per sample. Reconstructed sequences for all sampled taxa were combined with database sequences representing the most closely related taxa for subsequent analysis. EMIRGE reconstructions yielded 26 and 36 OTUs for the 0.2- and 0.1-μm filters, respectively. EMIRGE, clone library and Arb-Silva database WWE3, OP11 and OD1 16S rRNA gene sequences were aligned with MUSCLE45 using default parameters. A maximum likelihood tree was inferred from the alignment with RAxML46 using the GTRCAT model of nucleotide substitution, 200 bootstrap replicates and E. coli as the outgroup. The tree was edited using iTOL47. Poorly aligned or lower-quality sequences from the Arb-Silva database were removed before further analysis. The environment from which each sequence was obtained was retrieved from the Arb-Silva database using the ARB software package.
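The 97% OTU clustering and the EMIRGE read-pair filter can be illustrated with a simplified sketch. It is not USEARCH's UCLUST implementation: it assumes the sequences are already aligned to a common length (UCLUST instead aligns each query to candidate centroids), processes sequences in order of decreasing ungapped length, and assigns each one to the first existing centroid within 97% identity. The pair filter encodes the stated rule that both mates must be at least 60 nt long after trimming.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching columns between two aligned, equal-length sequences,
    skipping columns where both sequences have a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering (simplified, UCLUST-like): longest sequences
    first; join the first centroid with identity >= threshold, else found a new OTU."""
    clusters: dict[str, list[str]] = {}
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    for name in order:
        for centroid in clusters:
            if identity(seqs[name], seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


def keep_pair(read1: str, read2: str, min_len: int = 60) -> bool:
    """EMIRGE input rule from the text: both trimmed mates must be >= 60 nt."""
    return len(read1) >= min_len and len(read2) >= min_len
```

Because the greedy pass depends on input order, sorting by length (or by abundance) before clustering is what makes the centroids reproducible.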

Metagenomics methods

A total of 9,781,022,700 bp of Illumina data (150-bp paired reads) was generated for GWA1 and 369,257,200 bp for GWB1 at the Joint Genome Institute, Walnut Creek, CA, USA. The same GWB1 sample (0.1-μm filter fraction) was used for cryo-TEM characterization. Sequence data sets were trimmed to remove low-quality bases and assembled with idba_ud48 using default settings. Open reading frames were predicted with Meta-Prodigal49 and preliminarily annotated by USEARCH44 searches against the UniRef90 database (http://www.uniprot.org/). Community composition was profiled primarily using single-copy ribosomal protein S3 genes carried on scaffolds >5 kb in length (detection limit ~0.01%). Organism abundance was determined from sequence coverage. Detailed genome reconstructions for the organisms in these samples will be reported separately.
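Because ribosomal protein S3 is a single-copy gene, the coverage of each rpS3-bearing scaffold can serve as a proxy for population abundance. The sketch below assumes that relative abundance is each population's share of the summed rpS3 coverage and applies the ~0.01% detection limit as a cutoff on that share; the exact normalization used in the study is not specified here, and the coverage values are hypothetical.

```python
DETECTION_LIMIT = 1e-4  # ~0.01%, as stated in the text


def relative_abundance(rps3_coverage: dict[str, float]) -> dict[str, float]:
    """Share of summed rpS3-scaffold coverage per population (assumed normalization)."""
    total = sum(rps3_coverage.values())
    if total <= 0:
        raise ValueError("total coverage must be positive")
    return {name: cov / total for name, cov in rps3_coverage.items()}


def detected(abundance: dict[str, float], limit: float = DETECTION_LIMIT) -> list[str]:
    """Populations at or above the detection limit, most abundant first."""
    hits = [name for name, share in abundance.items() if share >= limit]
    return sorted(hits, key=abundance.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical coverage values for rpS3-bearing scaffolds > 5 kb.
    coverage = {"pop_A": 6000.0, "pop_B": 3999.5, "pop_C": 0.5}
    shares = relative_abundance(coverage)
    for name, share in shares.items():
        print(f"{name}: {share:.4%}")
    print("detected:", detected(shares))
```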

Because sequences from the most abundant populations (high sequence coverage) often assemble poorly, the analysis also used two data subsets per sample (1/10th and 1/50th of the data for GWB1 and 1/9th and 1/27th of the data for GWA1). Community composition analysis used results reconciled across these subassemblies. Genomic data from the subassemblies were binned to specific populations based on GC content, coverage and phylogenetic profile. Each genome was either near-complete or well sampled in one or more data sets. Binning based on phylogenetic profiling was helpful because many organisms in the filtrates were relatively similar to organisms represented in our in-house candidate phyla genomic data set (WWE3, OP11, OD1 and archaea; reported in refs 14, 16 and data to be published elsewhere). Abundances are reported as coverage and/or DNA representation. Coverage was determined from read mapping statistics, and DNA representation was calculated from coverage, approximate genome size and total data size (as above).
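The text names the inputs to DNA representation but not the formula. One natural reading, sketched below as an assumption, is the fraction of all sequenced bases attributable to a genome: coverage × genome size / total data size. The same sketch shows why subsampling can help highly covered populations: randomly keeping a fraction f of the reads scales the expected coverage by f. The genome size and mapped-base count are hypothetical.

```python
def mean_coverage(mapped_bases: float, genome_size_bp: float) -> float:
    """Average sequencing depth: mapped read bases / genome length."""
    return mapped_bases / genome_size_bp


def dna_representation(coverage: float, genome_size_bp: float, total_data_bp: float) -> float:
    """Assumed formula: fraction of all sequenced bases derived from this genome."""
    return coverage * genome_size_bp / total_data_bp


def subset_coverage(coverage: float, fraction: float) -> float:
    """Expected depth after randomly keeping `fraction` of the reads."""
    return coverage * fraction


if __name__ == "__main__":
    GWA1_TOTAL_BP = 9_781_022_700                # from the text
    genome_bp = 1_000_000                         # hypothetical genome size
    cov = mean_coverage(500_000_000, genome_bp)   # hypothetical mapped bases -> 500x
    print(f"coverage: {cov:.0f}x")
    print(f"DNA representation: {dna_representation(cov, genome_bp, GWA1_TOTAL_BP):.2%}")
    for denominator in (9, 27):                   # GWA1 subsets from the text
        print(f"1/{denominator} subset: ~{subset_coverage(cov, 1 / denominator):.1f}x")
```

Under this reading, DNA representation reduces to the genome's mapped bases divided by the total bases sequenced.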

Cryo-TEM specimen preparation in the field

For cryo-TEM and synchrotron infrared (SIR) spectromicroscopy (see below), 200-mesh lacey carbon-coated formvar Cu grids (Ted Pella Inc., Redding, CA, USA) were used. For correlative FISH and TEM, a lacey or continuous formvar support film was laid on TEM nickel finder grids (Maxtaform Finder Grid Style H7, 400 mesh, 63-μm pitch) and the grids were carbon coated. All TEM grids were glow discharged to improve sample deposition. Colloidal gold particles (BBInternational, Cardiff, UK) of 10 nm (for cryo-TEM and SIR spectromicroscopy) or 250 nm (for correlative FISH and TEM) were applied to the TEM grids and allowed to dry before sample addition. Aliquots (5 μl) of the 0.2-μm-filtered groundwater sample were deposited onto the grids, manually blotted with filter paper and plunged into liquid propane at liquid nitrogen temperature using a portable cryo-plunge device on site17. Grids were stored in liquid nitrogen until further analysis.

Clone fluorescence in situ hybridization

Positive controls, E. coli cells each carrying the 16S rRNA gene sequence of one of the three bacterial types (WWE3, OP11 or OD1), were constructed by subcloning with the Novagen AccepTor Vector Kit (EMD Millipore, Merck KGaA, Darmstadt, Germany). Subclones carrying WWE3, OP11 or OD1 16S rRNA gene sequences were identified by sequencing with the pETBlueT7UP forward (5′-TCATAACGTCCCGCGAAA-3′) and pETBlueDown reverse (5′-GTTAAATTGCTAACGCAGTCA-3′) primers followed by BLAST44 searches against the Arb-Silva database. Plasmids containing the WWE3, OP11 or OD1 16S rRNA gene sequences were isolated from these subclones and used to transform the NovaBlue(DE3) strain (EMD Millipore, Merck KGaA) for the subsequent Clone-FISH steps.

For Clone-FISH, E. coli strains transformed with WWE3, OP11 or OD1 sequences were fixed by centrifuging at 15,000 r.p.m. for 2 min at 4 °C, resuspending in 1 ml PBS (pH 7), centrifuging again and resuspending in 250 μl PBS and 750 μl 4% paraformaldehyde. Cells were fixed for 3 h at 4 °C, centrifuged at 15,000 r.p.m. for 2 min at 4 °C and resuspended in a 1:1 mixture of ethanol and PBS. FISH was performed at formamide concentrations ranging from 20 to 50% to identify the concentration that allowed proper hybridization while minimizing apparent nonspecific binding.
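Two quantities implied by the fixation step are worth making explicit. Mixing 750 μl of 4% paraformaldehyde with 250 μl PBS gives a 3% final concentration (C1V1 = C2V2), and the 1:1 ethanol:PBS mix is 50% ethanol if absolute ethanol is assumed. Centrifuge speed is reported in r.p.m., which converts to relative centrifugal force only through the rotor radius; the 8.5-cm radius below is a hypothetical value for a typical microcentrifuge, not one stated in the methods.

```python
def final_concentration(stock_percent: float, stock_ul: float, diluent_ul: float) -> float:
    """C1*V1 = C2*V2: final concentration after mixing stock with diluent."""
    return stock_percent * stock_ul / (stock_ul + diluent_ul)


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) = 1.118e-5 * radius(cm) * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    # 750 ul of 4% paraformaldehyde + 250 ul PBS, as in the fixation step.
    print(f"final PFA: {final_concentration(4.0, 750.0, 250.0):.1f}%")
    # 1:1 ethanol:PBS, assuming absolute (100%) ethanol.
    print(f"final ethanol: {final_concentration(100.0, 500.0, 500.0):.0f}%")
    # Hypothetical rotor radius; the methods report only r.p.m.
    print(f"15,000 r.p.m. at 8.5 cm: ~{rpm_to_rcf(15_000, 8.5):,.0f} x g")
```

Reporting relative centrifugal force rather than r.p.m. would make the spin conditions transferable between centrifuges.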

CARD-FISH

Two approaches were used for correlative cryo-TEM and CARD-FISH. In the first, frozen samples on Ni finder TEM grids were imaged and the CARD-FISH protocol50 was then applied. In the second, frozen samples on Ni finder TEM grids were freeze-dried, embedded in low-gelling-point agarose (0.1% final concentration), dried at room temperature, fixed in paraformaldehyde solution (2% final concentration), washed in sterile Milli-Q water, dehydrated in 50, 80, 90 and 100% ethanol and air dried. Three different oligonucleotide probes (Supplementary Table 8) targeting rRNA were applied to cells on TEM grids. Hybridization was performed as described in ref. 50, with a formamide concentration of 50%, incubation at 46 °C for 3 h and washing at 48 °C for 10 min. Subsequent signal amplification was performed at 46 °C for 10 min. Samples were counterstained with the DNA stain 4′,6-diamidino-2-phenylindole (1 μg ml−1 final concentration).

Confocal laser scanning microscopy was performed on a Zeiss LSM 710 running ZEN 2010 software, release version 6.0 (Carl Zeiss MicroImaging Inc., Thornwood, NY, USA), equipped with argon (458, 488 and 514 nm) and He–Ne (543, 594 and 633 nm) lasers and a 405-nm diode laser (Diode 405-30). The 405-nm diode laser was used for 4′,6-diamidino-2-phenylindole signals (BP filter 410–585 nm). Positively labelled cells (fluorochrome Alexa Fluor 546) were detected using the 543-nm He–Ne laser line (BP filter 548–680 nm). A Plan-Apochromat 100×/1.4 Oil DIC objective (Zeiss) was used.

2D and 3D cryo-TEM

Cryo-TEM images were acquired on a JEOL JEM-3100-FFC electron microscope (JEOL Ltd, Akishima, Tokyo, Japan) equipped with a field emission gun operating at 300 kV, an Omega energy filter (JEOL), a cryo-transfer stage and a Gatan 795 4k × 4k charge-coupled device camera (Gatan Inc., Pleasanton, CA, USA) mounted at the exit of an electron decelerator held at 200–250 kV51. The stage was cooled with liquid nitrogen to 80 K during acquisition of all data sets.

More than 100 2D images were recorded at different magnifications, giving pixel sizes of 0.375, 0.28 or 0.22 nm at the specimen. Underfocus values ranged from 3.6 ± 0.25 μm to 12 ± 0.5 μm, and energy filter slit widths were typically ~30 eV. Grids were surveyed and suitable targets were selected in low-dose defocused diffraction mode to minimize radiation damage.
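The reported pixel sizes translate directly into field of view and Nyquist-limited sampling. The sketch assumes the "4 × 4 K" camera has 4,096 × 4,096 pixels and that each image uses the full sensor. The Nyquist limit is simply twice the pixel size; it bounds what the sampling can represent, not the resolution actually achieved, which also depends on defocus, dose and specimen thickness.

```python
CAMERA_PIXELS = 4096  # assumed: "4 x 4 K" CCD read as 4,096 x 4,096 pixels


def field_of_view_um(pixel_nm: float, n_pixels: int = CAMERA_PIXELS) -> float:
    """Edge length of the imaged area at the specimen, in micrometres."""
    return pixel_nm * n_pixels / 1000.0


def nyquist_nm(pixel_nm: float) -> float:
    """Finest periodicity the sampling can represent (twice the pixel size)."""
    return 2.0 * pixel_nm


if __name__ == "__main__":
    for px in (0.375, 0.28, 0.22):  # 2D pixel sizes from the text
        print(f"{px} nm/px: field of view {field_of_view_um(px):.2f} um, "
              f"Nyquist {nyquist_nm(px):.2f} nm")
```

At 0.375 nm per pixel, for example, a full 4,096-pixel frame spans about 1.5 μm, enough to capture whole ultra-small cells with surrounding context.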

Thirteen tomographic tilt series were acquired under low-dose conditions, typically over an angular range of +65° to −65° (±5°) in 2° increments. Between 61 and 66 images were recorded for each tilt series, acquired semi-automatically with SerialEM (http://bio3d.colorado.edu/)52 adapted for JEOL microscopes. Tilt-series images had a pixel size of 0.56 or 0.746 nm at the specimen. Underfocus values ranged from 3.6 ± 0.25 μm to 9 ± 0.5 μm, and energy filter slit widths were ~30 eV. The average dose per complete tilt series was ~113 e− Å−2. All tomographic reconstructions were computed with IMOD (http://bio3d.colorado.edu/)52. ImageJ 1.38x (NIH, http://rsb.info.nih.gov/ij/)53 was used to analyze the 2D image projections. All movies were created with the open-source package ffmpeg (http://www.ffmpeg.org/). Adobe Photoshop CS5.1 was used to adjust image contrast and to insert calibrated scale bars.
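The tilt scheme and dose can be related with simple arithmetic. A symmetric range of ±60° or ±65° in 2° steps yields 61 or 66 images, which would match the reported image counts. The sketch assumes the ~113 e− Å−2 total dose is split evenly across images; dose-weighted schemes that increase exposure at high tilt would give different per-image values.

```python
def tilt_angles(max_angle: int, step: int = 2) -> list[int]:
    """Symmetric tilt series from -max_angle to +max_angle in fixed increments."""
    return list(range(-max_angle, max_angle + 1, step))


def dose_per_image(total_dose: float, n_images: int) -> float:
    """Per-image dose (e-/A^2), assuming the total is split evenly."""
    return total_dose / n_images


if __name__ == "__main__":
    TOTAL_DOSE = 113.0  # ~113 e-/A^2 per tilt series, from the text
    for max_angle in (60, 65):
        angles = tilt_angles(max_angle)
        print(f"+/-{max_angle} deg: {len(angles)} images, "
              f"~{dose_per_image(TOTAL_DOSE, len(angles)):.2f} e-/A^2 each")
```

With an even split, each image receives under 2 e− Å−2, which is why low-dose acquisition and semi-automated tracking are needed to keep the cumulative exposure within the stated budget.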

SIR spectromicroscopy

Cryo-TEM grids were placed onto BaF2 infrared windows (International Crystal Laboratories, NJ, USA) under liquid nitrogen and then allowed to air dry at ambient temperature on the windows.