Sampling

Sampling dates and locations are detailed in Supplementary Table 10. Triplicate soil cores were collected using a push corer from palsa (intact permafrost), bog (partially thawed) and sedge-dominated (Eriophorum spp.) fen in Stordalen Mire, northern Sweden (68°21′N, 19°03′E, 359 m a.s.l.) on 30th August and 1st September 2010, and 15th June, 12th July, 16th August and 16th October 2011. Cores were subsampled by depth (see Supplementary Table 10), avoiding 1 cm around the edge of the corer, placed in cryotubes, saturated with ~3 volumes LifeGuard solution (MoBio Laboratories, Carlsbad, CA, USA) and stored at −80 °C until processing.

Pore water measurements

Pore water samples were collected from 35 to 40 cm below the peat surface using a syringe connected to a stainless steel tube. Samples were filtered with 25-mm diameter Whatman Grade GF/D glass microfiber filters (2-μm particle retention) and injected into 30 ml evacuated vials sealed with butyl rubber septa. Samples were frozen and shipped to Florida State University for analysis. After thawing, samples were acidified with 0.5 ml of 21% H3PO4 and the headspace was brought to atmospheric pressure with helium. The sample headspace was analysed for concentrations and δ13C of CH4 and CO2 on a continuous-flow Hewlett-Packard 5890 gas chromatograph (Agilent Technologies) at 40 °C coupled to a Finnigan MAT Delta V isotope ratio mass spectrometer via a Conflo 2 interface system (Thermo Scientific, Bremen, Germany). The headspace gas concentrations were converted to pore water concentrations based on their known extraction efficiencies, defined as the proportion of formerly dissolved gas in the headspace. An extraction efficiency of 0.95 (based on repeated extractions) was used for CH4, and the extraction efficiency for CO2 relative to DIC was determined based on CO2 extraction from dissolved bicarbonate standards30. The carbon isotope fractionation factor (αC) was calculated as in Whiticar et al.31:

$$\alpha_C = \frac{\delta^{13}\mathrm{C}_{\mathrm{DIC}} + 1000}{\delta^{13}\mathrm{C}_{\mathrm{CH_4}} + 1000}$$

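The standard errors for αC were propagated from the δ13C errors as follows:

$$s_{\alpha_C} = \alpha_C \sqrt{\left(\frac{s_{\mathrm{DIC}}}{\delta^{13}\mathrm{C}_{\mathrm{DIC}} + 1000}\right)^2 + \left(\frac{s_{\mathrm{CH_4}}}{\delta^{13}\mathrm{C}_{\mathrm{CH_4}} + 1000}\right)^2}$$

where $s_{\alpha_C}$, $s_{\mathrm{DIC}}$ and $s_{\mathrm{CH_4}}$ are the s.e. for αC, δ13C DIC and δ13C CH4, respectively. The pH of the pore water was measured in the field with a Cole-Parmer portable pH meter.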
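As a worked illustration of the fractionation calculation, the following minimal sketch assumes the ratio form αC = (δ13C DIC + 1000)/(δ13C CH4 + 1000) and standard first-order error propagation for a ratio. The function names and the numerical values are hypothetical and are not taken from the study.

```python
import math


def fractionation_factor(d13c_dic, d13c_ch4):
    """Return alpha_C from delta-13C values given in per mil."""
    return (d13c_dic + 1000.0) / (d13c_ch4 + 1000.0)


def fractionation_se(d13c_dic, d13c_ch4, se_dic, se_ch4):
    """First-order propagated standard error of alpha_C (assumed form)."""
    alpha = fractionation_factor(d13c_dic, d13c_ch4)
    rel_dic = se_dic / (d13c_dic + 1000.0)
    rel_ch4 = se_ch4 / (d13c_ch4 + 1000.0)
    return alpha * math.sqrt(rel_dic ** 2 + rel_ch4 ** 2)


# Hypothetical inputs: delta-13C of DIC and CH4 with their standard errors.
alpha = fractionation_factor(-10.0, -60.0)
alpha_se = fractionation_se(-10.0, -60.0, 0.5, 0.5)
print(f"alpha_C = {alpha:.4f} +/- {alpha_se:.4f}")
```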

Methane flux measurements

The autochamber system at Stordalen Mire has previously been described in detail8. Briefly, a system of eight automatic gas-sampling chambers made of transparent Lexan was installed in the three habitat types at Stordalen Mire in 2001 (n=3 each in the palsa and bog habitats, and n=2 in the fen habitat). Chambers cover an area of 0.14 m² (38 cm × 38 cm), with a height of 25–45 cm. Each chamber is closed once every 3 h for a period of 5 min. The chambers are connected to the gas analysis system, located in an adjacent temperature-controlled cabin, by 3/8 inch Dekoron tubing through which air is circulated at ~2.5 l min−1. During the 2011 season, the system was updated with a new chamber design similar to that described by Bubier et al.32 The new chambers cover an area of 0.2 m² (45 cm × 45 cm), with a height ranging from 15 to 75 cm depending on habitat vegetation. At the palsa and bog sites, the chamber base is flush with the ground and the chamber lid (15 cm in height) lifts clear of the base between closures. At the fen site, the chamber base is raised 50–60 cm on Lexan skirts to accommodate large-stature vegetation.

Starting July 1st 2011, methane fluxes were measured using a Quantum Cascade Laser Spectrometer (QCLS, Aerodyne Research Inc.). The QCLS instrument deployed at Stordalen Mire is a modification of the technology described in detail by Santoni et al.33 We connected the QCLS to the main autochamber circulation using ¼ inch Dekoron tubing and a solenoid manifold that enables selection between the autochamber flow and an array of calibration tanks. During measurement periods, filtered (0.45 μm, Teflon filter) and dried (Perma Pure PD-100 T-24MSA) sample air flows at 1.4 SLPM through the 2-l QCLS sample cell volume at 5.6 kPa. A downstream solenoid controls the QCLS return flow so that air only recirculates during autochamber measurement periods; during calibration periods, tank air is vented to the room. Calibrations were done every 60 min using three calibration gases spanning the observed concentration range (1.5–10 p.p.m.). For each calibration period, a linear calibration curve was fitted and the fit parameters were linearly interpolated between calibration periods.
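The calibration scheme described above (a linear fit per calibration period, with fit parameters interpolated in time between periods) can be sketched as follows. This is an illustration under assumptions rather than the instrument's processing code: the function names, tank values and readings are hypothetical, and an ordinary least-squares fit is assumed.

```python
def fit_line(readings, known):
    """Least-squares line mapping instrument readings to known tank values."""
    n = len(readings)
    mean_x = sum(readings) / n
    mean_y = sum(known) / n
    sxx = sum((x - mean_x) ** 2 for x in readings)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(readings, known))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_params(t, t0, params0, t1, params1):
    """Linearly interpolate (slope, intercept) between calibrations at t0 and t1."""
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(params0, params1))


def calibrate(raw, t, t0, params0, t1, params1):
    """Apply the time-interpolated calibration line to one raw reading."""
    slope, intercept = interpolate_params(t, t0, params0, t1, params1)
    return slope * raw + intercept


# Hypothetical tank values (p.p.m.) and readings from two calibrations 60 min apart.
tanks = [1.5, 5.0, 10.0]
cal_a = fit_line([1.48, 4.95, 9.90], tanks)
cal_b = fit_line([1.46, 4.90, 9.80], tanks)
print(calibrate(3.20, t=30.0, t0=0.0, params0=cal_a, t1=60.0, params1=cal_b))
```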

For each autochamber closure, fluxes were calculated using a method consistent with that detailed by Bäckstrand et al.8, using a linear regression of changing headspace CH4 concentration over a period of 2.5 min. Eight 2.5-min regressions were calculated, staggered by 15 s, and the most linear fit (highest r²) was then used to calculate flux. Average fluxes were calculated for the week leading up to and including the sample collection dates based on individual chambers as the unit of replication (n=3 for palsa and bog, n=2 for fen). For sampling dates before the installation of the QCLS, CH4 fluxes were estimated from the CH4 flux data published by Bäckstrand et al.7 Characteristic fluxes for the week leading up to and including the August/September 2010 sampling (day of year 238–244) and June 2011 sampling (day of year 160–166) were calculated by averaging flux measurements for those dates from the 2002 to 2007 data set using individual chambers as the unit of replication.
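The window selection described above can be sketched as follows: eight 150-s regressions, staggered by 15 s, with the highest-r² fit retained. This is a minimal illustration, not the published processing code. The function names, the 5-s sampling interval and the synthetic concentration series are assumptions, and the conversion from slope to flux (which needs chamber volume, area, temperature and pressure) is left out.

```python
def linear_fit_r2(t, c):
    """Return (slope, r_squared) of an ordinary least-squares line through (t, c)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_c = sum(c) / n
    stt = sum((ti - mean_t) ** 2 for ti in t)
    scc = sum((ci - mean_c) ** 2 for ci in c)
    stc = sum((ti - mean_t) * (ci - mean_c) for ti, ci in zip(t, c))
    slope = stc / stt
    r2 = (stc * stc) / (stt * scc) if scc > 0 else 0.0
    return slope, r2


def best_window(t, c, window_s=150.0, step_s=15.0, n_windows=8):
    """Fit staggered windows and return (slope, r2, start) of the most linear one."""
    best = None
    for k in range(n_windows):
        start = t[0] + k * step_s
        end = start + window_s
        idx = [i for i, ti in enumerate(t) if start <= ti <= end]
        if len(idx) < 3:
            continue  # too few points for a meaningful regression
        slope, r2 = linear_fit_r2([t[i] for i in idx], [c[i] for i in idx])
        if best is None or r2 > best[1]:
            best = (slope, r2, start)
    return best


# Synthetic 5-min closure sampled every 5 s: flat for 60 s, then a linear rise.
times = [float(s) for s in range(0, 301, 5)]
conc = [2.0 if s < 60 else 2.0 + 0.01 * (s - 60) for s in times]
print(best_window(times, conc))
```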

SSU rRNA gene amplicon sequencing

Total nucleic acids were extracted from ~2 g peat sample using the PowerMax Total Nucleic Acid extraction kit (MoBio), retaining the LifeGuard preservation solution during lysis. DNA was purified by RNase A digestion, phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation. Approximately 15 ng DNA from each sample was used as a template in PCR reactions. The universal primers 926F (5′-CCTATCCCCTGTGTGCCTTGGCAGTC TCAG AAACTYAAAKGAATTGRCGG-3′; sequencing adapter, key and SSU-specific primer, in that order, separated by spaces) and 1392wR (5′-CCATCTCATCCCTGCGTGTCTCCGAC TCAG XXXXXACGGGCGGTGWGTRC-3′; as above, but also including a variable-length multiplex identifier unique to each sample (Xs, listed in Supplementary Table 11) before the SSU-specific primer) were used to amplify an ~500 bp (V6–V8) region of the SSU rRNA gene from community members (similar to a primer set tested in Engelbrektson et al.34). These primers were confirmed to match exemplar strains from each of the currently known35 seven orders of methanogens (IMG 4.1 (ref. 36) identifier 2518645582, IMG 4.0 identifiers 637000162, 649633067, 2512564055, 638154507, 640753014 and 638154506) using iPCRess 2.2.0 (ref. 37). Template DNA was amplified in duplicate 50 μl reactions containing 1 U Taq DNA polymerase (Fisher), 0.2 mM dNTP mix (Fisher), 2 mM MgCl2 (Fisher), 2 μM of each primer and 10 μg μl−1 BSA (NEB). PCR was performed in a Veriti thermocycler (Applied Biosystems, Carlsbad, CA, USA) with an initial denaturation step of 95 °C for 3 min, 30 cycles of dissociation at 95 °C for 30 s, annealing at 55 °C for 45 s and extension at 74 °C for 30 s, and a final extension of 10 min at 74 °C. Amplicons were sequenced using the reverse primer on the 454 GS FLX (Roche) with samples unrelated to this study using equal volumes. Samples were multiplexed over five separate runs. Informatic analysis methods are detailed in Supplementary Methods.

Metagenome sequencing

For metagenomic sequencing, ~100 ng DNA was sheared using a Covaris S2 (Covaris Inc.) according to the methods outlined in the Ion Fragment Library Kit protocol (publication 4467320 Rev. B). The library was prepared using the Ion Plus Fragment Library Kit and a modified version of the method described in the corresponding user guide (publication 4471989 Rev. B). After the size and concentration of the libraries were determined using the Agilent 2100 Bioanalyzer (Agilent Technologies) with the High Sensitivity DNA Kit (Agilent Technologies), the library was diluted and the Ion OneTouch system was used to prepare the template, using the Ion OneTouch Template Kit and corresponding user guide (publication 4468007 Rev. E). Sequencing of three 316 chips was performed using the Ion Sequencing Kit and associated user guide (publication 4469714 Rev. C). The Ion Torrent Suite versions 2.0 and 2.0.1 were used for analyses and the SFF file was subsequently downloaded for further analysis.

Genome assembly and binning

A total of 533 Mb of single-ended 100 bp Ion Torrent PGM shotgun data were generated. FASTQ and XML files were extracted from the SFF using sff_extract (http://bioinf.comav.upv.es/sff_extract) version 0.2.12 using parameters ‘-Q -s metagenome2.fastq -x metagenome2.xml 1.sff 2.sff 3.sff’. To determine SSU rRNA sequences detected from this set, reads were mapped using BWA-MEM (v0.7.5a) against the GreenGenes 2013 (ref. 38) database 97% representative set. Of the primary archaeal hits that were >95% identical over at least 95 bp, 98% were assigned to Candidatus ‘M. stordalenmirensis’. Extracted sequences were assembled using MIRA39 3.4.0 with parameters ‘--project=metagenome2 --job=denovo,genome,accurate,iontor -MI:sonfs=no’. Contigs with coverage between 22 and 36 were considered for further analysis based on the method by Teeling et al.40, implemented here as a biogem41 called bio-kmer_counter (https://github.com/wwood/bioruby-kmer_counter), and visualized using ggplot2 (ref. 42) (Supplementary Fig. 2). Reads included in the assembly in the remaining 154 contigs were extracted and re-assembled using sffinfo 2.3 and newbler 2.3 (454 Life Sciences). Mate-pair sequencing and scaffolding methods are detailed in the Supplementary Methods.
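To illustrate the kind of per-contig signal used for binning, the sketch below filters contigs by the stated coverage range and computes a raw tetranucleotide frequency vector. It is a simplified stand-in, not the bio-kmer_counter implementation: the method of Teeling et al. works with tetranucleotide usage statistics rather than raw frequencies, and the function names and example sequence here are hypothetical.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a 256-element frequency vector; windows with non-ACGT bases are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def in_coverage_range(coverage, low=22.0, high=36.0):
    """Coverage window stated in the text for contigs taken forward."""
    return low <= coverage <= high


# Hypothetical contig: (name, mean coverage, sequence).
contig = ("contig_1", 28.4, "ACGTACGTTTGACCA")
if in_coverage_range(contig[1]):
    freqs = tetranucleotide_frequencies(contig[2])
    print(contig[0], max(freqs))
```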

Interrogation of gag errors in assembled contigs

Strand-specific errors similar to those recently reported43 introduced frame-shift single-nucleotide deletion errors into ~10% of open-reading frames. These were corrected using a purpose-built algorithm, ‘bio-gag’. To investigate the properties of gag errors, Ion Torrent sequencing was carried out on isolate cultures of Bacillus amyloliquefaciens and Sulfolobus tokodaii. These data are described in a separate report44. De novo assemblies using newbler 2.3 were generated and compared with their respective reference genomes (GenBank identifiers NC_014551.1 and NC_003106.2, respectively) using dnadiff (included with MUMmer, http://mummer.sourceforge.net) version 3.22. The four bases surrounding each single-nucleotide deletion were tabulated and those contexts that contained a deletion of one base from a two-base homopolymer were considered as potential gag errors (Supplementary Fig. 3). Plots were generated using Tablet45 and ggplot2 (ref. 42). Gag errors in the Candidatus ‘M. stordalenmirensis’ genome were corrected with a generally applicable algorithm, presented in Supplementary Methods.
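The context tabulation can be illustrated with a short sketch that, for a deleted reference base, reports the flanking bases and tests whether the base sits in a homopolymer of exactly two. This is not the ‘bio-gag’ algorithm. It assumes that "four bases surrounding" means two bases on each side of the deletion and that positions are 0-based reference coordinates; the function names are hypothetical.

```python
def homopolymer_run_length(ref, pos):
    """Length of the run of identical bases in ref that contains position pos."""
    base = ref[pos]
    start = pos
    while start > 0 and ref[start - 1] == base:
        start -= 1
    end = pos
    while end < len(ref) - 1 and ref[end + 1] == base:
        end += 1
    return end - start + 1


def deletion_context(ref, pos, flank=2):
    """Return (upstream flank, deleted base, downstream flank) around pos."""
    return ref[max(0, pos - flank):pos], ref[pos], ref[pos + 1:pos + 1 + flank]


def is_potential_gag_error(ref, pos):
    """True when the deleted base belongs to a two-base homopolymer."""
    return homopolymer_run_length(ref, pos) == 2


# Hypothetical reference fragment with an 'AA' homopolymer at positions 3-4.
reference = "TTGAAGCC"
print(deletion_context(reference, 3), is_potential_gag_error(reference, 3))
```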

Genome validation

Of the 104 ortholog groups in AMPHORA2 (ref. 11), 99 were found to be single-copy groups using the ‘MarkerScanner.pl’ script of AMPHORA2 (slightly modified for implementation reasons, taking the presence of only a single peptide in the respective output fasta files to mean single copy). In addition, ndk was found using BLASTP 2.2.26+ (ref. 46) using the Methanocaldococcus jannaschii protein (GenBank ID NP_248261.1) as a query sequence. Two genes were interrupted by errors that appear to be gag-like. For the genes miaB and pelA, two Candidatus ‘M. stordalenmirensis’ peptides were reported by AMPHORA2, but in each case inspection of BLASTP against the NCBI ‘nr’ database indicated one belonged to a separate orthologous group. Thus, all 104 AMPHORA2 marker genes were found to be single copy in the Candidatus ‘M. stordalenmirensis’ genome. Genes thought to be single copy in Euryarchaea were also used as validation (see ‘Assessment of the Mackelprang et al. 2011 contigs’ below).

Genome tree

The genome tree was constructed using a concatenated protein-sequence approach using the single-copy genes in AMPHORA2 (ref. 11), with a custom Ruby script (https://github.com/wwood/bbbin/blob/master/yagenome.rb), git commit 3b8a124. For each of the 104 genes outlined in the Supplementary Data 1 of Wu and Scott11, the corresponding hidden Markov model (HMM) was queried against the protein sequences of Candidatus ‘M. stordalenmirensis’ using HMMER’s hmmsearch program47 (version 3.0) with default parameters. The best hit protein sequence was parsed from the ‘tblout’ format using a custom biogem bio-hmmer3_report (https://github.com/wwood/bioruby-hmmer3_report) version 724862b and aligned to the HMM using hmmalign with parameters ‘--allcol --trim’, and the resulting Stockholm format file was then converted to FASTA using seqmagick git commit 6816f9d (http://fhcrc.github.com/seqmagick). Lowercase (unaligned) portions of the aligned sequence were then removed. The aligned sequences from each of the 104 HMMs were then concatenated into an overall Candidatus ‘M. stordalenmirensis’ sequence. Where no hit was identified, a custom biogem41 bio-hmmer_model (https://github.com/wwood/bioruby-hmmer_model) version 0.0.2 was used to parse out the length of the HMM and an equivalent number of gap characters was added to the overall alignment instead. The same procedure was repeated on all finished archaeal proteomes available from IMG version 4 (ref. 36). A FASTA file of the overall sequences for each genome (Supplementary Data 2) was used to construct a phylogenetic tree using FastTree48 version 2.1.3 with default parameters. Sequence identifiers were then converted to a more human-readable form using the newick utils nw_rename program49, and visualized using Archaeopteryx50, ARB51 and Inkscape (http://inkscape.org).
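The concatenation step, including gap padding for markers with no hit, can be sketched as follows. This is an illustrative Python version rather than the authors' Ruby script; the function names and toy inputs are hypothetical, and it assumes hmmalign-style output in which insert columns appear as lowercase residues or '.' characters.

```python
def strip_unaligned(aligned):
    """Remove insert columns (lowercase residues and '.') from an aligned sequence."""
    return "".join(ch for ch in aligned if not ch.islower() and ch != ".")


def concatenate_markers(marker_order, hmm_lengths, alignments):
    """Join per-marker alignments in a fixed order, padding missing markers with gaps."""
    parts = []
    for marker in marker_order:
        aligned = alignments.get(marker)
        if aligned is None:
            # No hit for this marker: add one gap per HMM match column.
            parts.append("-" * hmm_lengths[marker])
            continue
        trimmed = strip_unaligned(aligned)
        if len(trimmed) != hmm_lengths[marker]:
            raise ValueError(f"{marker}: aligned length does not match HMM length")
        parts.append(trimmed)
    return "".join(parts)


# Toy example with two hypothetical markers; 'markerB' has no hit in this genome.
order = ["markerA", "markerB"]
lengths = {"markerA": 3, "markerB": 4}
print(concatenate_markers(order, lengths, {"markerA": "Mk-.L"}))
```

Keeping every genome's concatenated sequence the same length in this way is what allows the per-genome sequences to be used together as a single alignment for tree building.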

Assessment of the Mackelprang et al. 2011 contigs

The contig sequences for the Hess Creek genome were downloaded from NCBI (GenBank accession AGCH01000000.1). Single-copy gene analysis was carried out using AMPHORA2 (ref. 11) as for the Candidatus ‘M. stordalenmirensis’ genome. Manual inspection of single-copy genes occurring in multiple copies confirmed the presence of multiple distinct orthologues (Supplementary Table 7). Genes thought to be single copy in Euryarchaea were also used as validation, using CheckM 0.3.1 (https://github.com/Ecogenomics/CheckM). Out of the 136 PFAM domains found to be single copy in at least 95% of Euryarchaeal genomes, 21 were zero copy, 47 were single copy, 68 were dual copy and 1 was triple copy (estimated genome completion 85%, estimated genome contamination 50%). In contrast, the Candidatus ‘M. stordalenmirensis’ genome had 7 zero copy, 127 single copy and 2 dual copy (estimated genome completion 95%, estimated genome contamination 1%).
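One simple way to turn marker copy counts into completion and contamination estimates is shown below. The formulas are an assumption made for illustration (completion as the fraction of markers present at least once, contamination as extra copies per marker) and may differ from what CheckM 0.3.1 computes. The function name is hypothetical; the input counts are those reported above for Candidatus ‘M. stordalenmirensis’.

```python
def completion_and_contamination(copy_counts):
    """copy_counts maps copy number -> number of marker domains with that copy number."""
    total = sum(copy_counts.values())
    present = sum(n for copies, n in copy_counts.items() if copies >= 1)
    extra = sum((copies - 1) * n for copies, n in copy_counts.items() if copies > 1)
    return 100.0 * present / total, 100.0 * extra / total


# 7 zero-copy, 127 single-copy and 2 dual-copy domains out of 136.
completion, contamination = completion_and_contamination({0: 7, 1: 127, 2: 2})
print(round(completion), round(contamination))  # prints "95 1"
```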

To determine how many of the Hess Creek contigs were most similar to the genome of Candidatus ‘M. stordalenmirensis’, representative archaeal strains from IMG 4.0 (ftp://ftp.jgi-psf.org/pub/IMG/img_core_v400/)36 were chosen at random using the img_metadata_scanner.rb script of a custom-built rubygem img_scripts (https://github.com/wwood/img_scripts version 0.0.1) with parameters ‘--sample Species Status=Finished’. This script in turn relied on another custom rubygem, bio-img_metadata version 0.0.3 (https://github.com/wwood/bioruby-img_metadata). The Hess Creek sequences were queried against a BLAST database made from the randomly selected concatenated genome nucleotide sequences and the Candidatus ‘M. stordalenmirensis’ genome using BLASTN 2.2.26+ (ref. 46) with default parameters. Of the 174 Hess Creek contigs, 139 showed highest similarity to the Candidatus ‘M. stordalenmirensis’ genome (identities 72–94%, aligned region lengths 110–10,290 bp), two were weakly (e-value >1e-6) similar to other genomes and the remaining 33 did not show similarity to any sequence in the database. On a per length basis, 31% of the Hess Creek contigs showed significant similarity to the Candidatus ‘M. stordalenmirensis’ genome and the reciprocal comparison showed 22% significant similarity (BLASTN e-value <1e-5, assessed using a custom script, https://github.com/wwood/bbbin/blob/master/blast_overlap_percentage.rb, git version 2cfaec2). The partial mcrA gene was identified in the Hess Creek contigs with TBLASTN using the Candidatus ‘M. stordalenmirensis’ McrA protein sequence as a query. Attempts to locate an SSU rRNA gene were performed by querying both the Candidatus ‘M. stordalenmirensis’ and Methanocella paludicola (IMG gene identifier 646465173) SSU rRNA gene sequences against the Hess Creek contigs using BLASTN through SequenceServer (http://sequenceserver.com/).

Genome annotation

Genome annotation was carried out using prokka 1.5.2 (Prokka: prokaryotic genome annotation system, http://bioinformatics.net.au/software.prokka.shtml). Genes of interest were further investigated using KEGG52, MetaCyc53, FastTree48, BLASTP+ (ref. 46) against IMG 4.0 proteomes36, UniRef90 (ref. 54) and PFAM55.

Metaproteomics

Triplicate soil cores were collected in a sedge-dominated (Eriophorum spp.) fen on September 1st 2010 (locations 68° 21.203 N, 19° 02.799 E; 68° 21.202 N, 19° 02.808 E; and 68° 21.196 N, 19° 02.808 E). Further metaproteomic methods are detailed in the Supplementary Methods.

Distribution of Candidatus ‘M. stordalenmirensis’

Global distribution of Candidatus ‘M. stordalenmirensis’ was surveyed by searching the NCBI ‘nt’ database. An overview of studies where Candidatus ‘M. stordalenmirensis’ was found is provided in Supplementary Tables 8 and 9. Searching of the ‘nt’ database used BLAST 2.2.22 (ref. 56) with the following parameters: ‘-v 200000 -b 200000 -p blastn -m 8’. The resultant tab-separated values file was parsed to extract hits with >97% identity using bio-table (https://github.com/pjotrp/bioruby-table). Hits were then downloaded from NCBI using genbank-download git version 292a2f8 (https://bitbucket.org/simongreenhill/genbank-download/, Greenhill unpublished) and individually used as queries to search a BLAST database consisting of the merged GreenGenes/Silva database as above, as well as the Candidatus ‘M. stordalenmirensis’ SSU rRNA gene region. This search was conducted using BLASTN 2.2.26+ (ref. 46) using the parameters ‘--max_target_seqs 1 -outfmt 6’. Those sequences that hit Candidatus ‘M. stordalenmirensis’ with identity >97% and could be associated with a peer-reviewed report were considered Candidatus ‘M. stordalenmirensis’ phylotypes. GenBank entries were linked to peer-reviewed publications using the PubMed57 identifier present in the GenBank entry or, failing that, were found manually using Google Scholar (http://scholar.google.com) or PubMed using a combination of the ‘TITLE’ and ‘AUTHOR’ fields of the GenBank entry.
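The identity filter applied to the tabular BLAST output can be sketched in Python as below. The study used bio-table for this step, so this is only an illustrative equivalent. It relies on the standard 12-column tabular layout (‘-m 8’ in legacy BLAST, ‘-outfmt 6’ in BLAST+), in which the third column is percent identity; the function name and the example rows are hypothetical.

```python
import csv
import io


def hits_above_identity(handle, min_identity=97.0):
    """Yield (subject_id, percent_identity) for tabular BLAST rows above the cut-off."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        identity = float(row[2])
        if identity > min_identity:
            yield row[1], identity


# Two hypothetical hits in 12-column tabular format.
example = io.StringIO(
    "query1\tsubjA\t98.50\t400\t6\t0\t1\t400\t1\t400\t0.0\t700\n"
    "query1\tsubjB\t96.00\t400\t16\t0\t1\t400\t1\t400\t0.0\t650\n"
)
for subject, identity in hits_above_identity(example):
    print(subject, identity)
```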

Description of Candidatus ‘M. stordalenmirensis’

‘Methanoflorens’ (Me.tha.no.flo.ren’s. N.L. n. methanum (from French n. méth(yle) and chemical suffix -ane), methane; N.L. pref. methano-, pertaining to methane; N.L. masc. substantive from L. part. masc. adj. florens, flourishing, to bloom; N.L. masc. adj. ‘Methanoflorens’, methane producer that blooms). ‘stordalenmirensis’ (stor.da.len.mir.en'sis. N.L. masc. adj. ‘stordalenmirensis’, of or belonging to Stordalen Mire, Sweden from where the species was characterised). Methanoflorentaceae (Me.tha.no.flo.ren.ta.ce'a.e. N.L. n. ‘Methanoflorens’ -entis, type genus of the family; suff. -aceae, ending to denote a family; N.L. fem. pl. n. Methanoflorentaceae, the family of the genus ‘Methanoflorens’).