Experimental design

Study population

CFI subjects were recruited at five different sites within the United States; Miami, Salt Lake City, Boston, New York City, and Sierra, Nevada and informed consent was obtained as described in Klimas et al. [22]. Cases were included in the study based on confirmed diagnosis by an expert clinician and based on either or both of the recognized ME/CFS case definitions, namely, the “1994 Fukuda criteria” and the “Canadian criteria”. The cohort was deliberately oversampled for patients with acute onset of the illness in the past 3 years. Gender, ethnicity, age (within 5 years)-matched healthy controls were specifically recruited for this study and were only included if they were both physically and mentally healthy. Time of sampling was also matched within 12 weeks from the sampling of cases and controls. Within a week of the questionnaire completion, blood was drawn, at which time the subjects underwent a full physical exam as well as filling out additional questionnaires to assess the severity of the illness on the day of sampling. All samples were processed similarly. Of the 405 subjects who participated in the CFI project, we received DNA from 193 patients with ME/CFS and 196 age and gender-matched healthy individuals from the CFI Biobank at the Duke Human Vaccine Institute (DHVI). DNA was prepared at DHVI from whole blood collected in Paxgene tubes (Qiagen, Germantown, MD).

Health questionnaires

In addition to demographic information, the Medical Outcomes Survey Short Form-36 (SF-36) and DePaul Symptom Questionnaire (DSQ) were administered to both cases and controls by the CFI physicians [22]. The SF-36 is a self-report general health survey of 36 items designed to compare the burden of disease on eight different aspects of well-being [23]. The DSQ is a self-report survey of the frequency and severity of 54 symptoms specifically associated with CFS, and covers various case definitions including the Fukuda et al. CFS criteria [24], Canadian Clinical criteria [25], and ME International Consensus criteria [26]. Only cases were surveyed by the DSQ. In the DSQ, frequency and severity of symptoms are scored from least to most on a scale from 0–4. An overall “distress” score is also calculated for each symptom on a scale from 0–16, by multiplying the frequency by severity scores [2, 27, 28]. In the CFI study, distress scores for the 54 symptoms were grouped into 10 functional clusters, and average cluster scores were calculated for each case [22].

DNA preparation and sequencing

Upon receipt from the Duke biobank, aliquots of the DNA samples were placed in 96-well plates before further processing. To avoid amplification of short nuclear mitochondrial copies, high-specificity long range PCR was performed, using NEB LongAmp® Hot Start Taq DNA polymerase (New England Biolabs®, http://www.neb.com). The 50 uL PCR reaction consisted of the following mixture: 10 uL 5X LongAmp Taq Reaction Buffer, 1.5 uL of dNTPs (1 mM), 4 uL of each primer pairs (10 uM each, see below for details), 2 uL of template DNA, 2 uL of LongAmp Hot Start Taq DNA polymerase (2500 U/mL) and 30.5 uL of nuclease-free water. Amplification was performed with a thermocycler PTC-100 (MJ Research Inc., Waltham, MA, USA), starting with 30 s at 94 °C, followed by 30 cycles consisting of denaturation (20 s at 94 °C), annealing (1 min at 55 °C) and extension (10 min at 65 °C) and a final extension at 65 °C for 10 min.

The 16,569 bp mitochondrial genome was amplified using two primer pairs (F402: ATCTTTTGGCGGTATGCACTTT; R11428: GGCTTCGACATGGGCTTT and F8940: CCCCATACTAGTTATTATCGAAACC; R2818: GCCCCAACCGAAATTTTTAAT, from Lyons et al. [29] leading to two PCR products of 11,026 bp and 10,447 bp respectively, overlapping by almost 2500 bp. The overlap was necessary to prevent loss of end sequence information due to the Mu technology used in the subsequent library preparation.

The success of the PCR amplification was confirmed by electrophoresis and 5 uL of post-PCR reaction product was treated to degrade unused primers and nucleotides using USB® HT ExoSAP-IT® High-Throughput PCR Product Cleanup (Affymetrix, http://www.affymetrix.com) according to the manufacturer’s instructions.

The treated PCR products were diluted with 50 uL of nuclease-free water before Quant-IT™ PicoGreen® dsDNA Assay Kit quantification (http://www.lifetechnologies.com) on a plate reader. The final mix of overlapping fragments was performed to obtain a final concentration of 0.2 ng/uL of each overlapping PCR product per sample before Illumina library preparation (http://www.illumina.com).

Library preparation was implemented by the Cornell Biotechnology Resource Center (BRC, http://www.biotech.cornell.edu/biotechnology-resource-center-brc) following the manufacturer’s instructions for the Nextera® XT DNA Sample Preparation Kit (96 samples, FC-131-1096, http://www.illumina.com) and the Nextera XT Index Kits V2 Set A, B, C and D (FC-131-2001, FC-131-2002, FC-131-2003 and FC-131-2004, respectively, http://www.illumina.com) to be able to multiplex 384 samples during the sequencing reaction. After quality control for library size and amplification using Cornell BRC’s Fragment analyser, the Illumina MiSeq instrument generated paired-end reads (2 × 300 bp) with the MiSeq v3 kit. Three consecutive and identical runs were executed on the same library to increase coverage depth. Average mitochondrial DNA sequencing depth was 1497.3 ± 488.8 across patients and controls (Additional file 1: Figure S1).

Bioinformatics and statistical analysis

Read mapping

Sequencing reads were mapped to the entire human reference genome (GRCh37) using the BWA-MEM alignment algorithm [30]. A number of methods were applied to filter out variation due to sequencing artifacts and errors. After mapping, a custom bash function was used to remove reads with more than three mismatches compared to the mitochondrial DNA reference sequence; the revised Cambridge Reference Sequence (rCRS) [31]. Duplicate reads were identified and removed using the Mark Duplicates function in the Picard tools suite. To avoid including reads sequenced from regions of the nuclear genome sharing high similarity with mitochondrial DNA (NUMTs), reads that did not map uniquely to the mitochondrial genome were discarded using SAMtools [32, 33]. SAMtools was also used to remove reads not mapping in a proper mate-pair to the mitochondrial genome using the command “samtools view –f 3 [bam file] chrM:1-16,569”. Strand-specific sequence alignment files were used to generate corresponding forward and reverse-strand pileup files with the SAMtools “mpileup” function. While generating pileup files, reads and bases with a PHRED quality scores less than 30 (0.001 chance of sequencing error) were discarded from the dataset using the “–q 30” and “–Q 30” flags. Strand-specific pileup files were then used to identify mtDNA consensus sequences and heteroplasmies as described below.

Consensus sequence identification

In order to measure associations with mtDNA SNPs and haplogroups, mtDNA consensus sequences were identified for each individual. From the pileup files, the major allele at each mtDNA base-pair position was called as the allele in the consensus sequence. Positions which did not have at least 10× sequencing coverage on each strand were not included.

Haplogroup classification

Individuals were classified into mtDNA haplogroups using the online Haplogrep tool [34] which infers mtDNA haplogroups based on the Phylotree database of observed human mtDNA variation [35]. A custom python script was used to convert the consensus sequences into the HSD input file format used by Haplogrep. All positions where the consensus allele for an individual did not match the revised rCRS were included in the HSD file [31]. In cases where the set of mtDNA mutations for an individual matched multiple haplogroups, the individual was assigned to the haplogroup with the highest overall Haplogrep ranking.

Haplogroup associations

Individual clinical phenotypes were measured using regression models in PLINK [36]. Only the five haplogroups that reached a minimum 5 % frequency threshold within our study population were tested for associations. In the PLINK regression analysis, each haplogroup was represented as a separate binary covariate where a value of one indicated that the given individual belonged to that haplogroup and a value of zero indicated they did not. For phenotypes that had more than two values, a linear regression was applied, while a logistic regression was applied to those with only two. A dummy SNP was included in order to perform the PLINK analysis, but had no effect on the significance values of each haplogroup-phenotype association. Multiple-test correction was applied to results using the Benjamini-Hochberg approach to calculate q values and significance of associations was determined using a 5 % FDR [37]. P values were sorted in ascending order and each divided by its percentile rank to obtain an estimated FDR.

Single-Marker analysis

Single-marker association testing was used to measure the association between SNPs and ME/CFS case/control status, as well as all symptoms surveyed in the SF-36 and DSQ as well as symptom clusters. Association testing was performed using the PLINK whole genome analysis [36]. Allele information at each position was obtained from the consensus sequence described previously. Before testing, mtDNA SNPs were filtered based on missingness, if data were missing from more than 10 % of individuals (the “—geno 0.1”), and allele frequency, excluding SNPs with frequency less than 5 % were (“—maf 0.05”). The sex, collection site, and age of each individual were included as covariates in the analyses. Sex was included using the “—sex” flag. In a separate covariate file, each collection site was assigned a binary variable indicating whether an individual’s information had been collected at that site. Age was included as a single variable in the covariate file. A logistic regression model (–logistic) was used to measure associations between mtDNA SNPs and the binary case–control status, while a linear model (–linear) was used to measure associations with quantitative SF-36 and DSQ symptoms. False Discovery Rate (FDR) was used, following the method of Benjamini and Hochberg [37] to correct for multiple testing by using the “—adjust” flag.

Heteroplasmy calling

Mitochondrial heteroplasmies can be present at very low frequencies where they are difficult to distinguish from sequencing error. A multi-step approach, based on previous studies with experimental and simulated data, was used to improve the chance of identifying true heteroplasmies. After read-mapping and quality control steps were applied to the data, as described in the “Read Mapping” section, data in the strand-specific pileup files was used to detect heteroplasmies. Minor alleles had to be present at a frequency of 1 % on both sequencing strands, and only mtDNA positions with at least 500× coverage on each strand were included. After using SAMtools to generate strand-specific pileup files, a custom Python script was used to collect and parse coverage and allele data. Double-strand validation was also used to check that the heteroplasmic alleles were present on both DNA strands at a frequency of at least 1 % [20, 38].