Abstract Increasing clinical and biochemical evidence implicate mitochondrial dysfunction in the pathophysiology of Autism Spectrum Disorder (ASD), but little is known about the biological basis for this connection. A possible cause of ASD is the genetic variation in the mitochondrial DNA (mtDNA) sequence, which has yet to be thoroughly investigated in large genomic studies of ASD. Here we evaluated mtDNA variation, including the mixture of different mtDNA molecules in the same individual (i.e., heteroplasmy), using whole-exome sequencing data from mother-proband-sibling trios from simplex families (n = 903) where only one child is affected by ASD. We found that heteroplasmic mutations in autistic probands were enriched at non-polymorphic mtDNA sites (P = 0.0015), which were more likely to confer deleterious effects than heteroplasmies at polymorphic mtDNA sites. Accordingly, we observed a ~1.5-fold enrichment of nonsynonymous mutations (P = 0.0028) as well as a ~2.2-fold enrichment of predicted pathogenic mutations (P = 0.0016) in autistic probands compared to their non-autistic siblings. Both nonsynonymous and predicted pathogenic mutations private to probands conferred increased risk of ASD (Odds Ratio, OR[95% CI] = 1.87[1.14–3.11] and 2.55[1.26–5.51], respectively), and their influence on ASD was most pronounced in families with probands showing diminished IQ and/or impaired social behavior compared to their non-autistic siblings. We also showed that the genetic transmission pattern of mtDNA heteroplasmies with high pathogenic potential differed between mother-autistic proband pairs and mother-sibling pairs, implicating developmental and possibly in utero contributions. Taken together, our genetic findings substantiate pathogenic mtDNA mutations as a potential cause for ASD and synergize with recent work calling attention to their unique metabolic phenotypes for diagnosis and treatment of children with ASD.

Author Summary Mitochondria contain their own genome, the mitochondrial DNA (mtDNA), and are abundant in the brain where they produce energy and intracellular signals required for normal brain function and cognition. Mitochondrial dysfunction has been proposed as a cause of Autism Spectrum Disorder (ASD), but the genetic basis has not been established. By analyzing mtDNA sequences from 903 ASD children along with their unaffected siblings and mothers, we found a unique pattern of heteroplasmic mtDNA mutations (the coexistence of both mutant and wide-type mtDNA molecules in a cell) associated with increased risk of ASD. A large fraction of these mtDNA heteroplasmies are pathogenic and can be either inherited maternally or acquired during development. Our findings unveil an important issue about pathogenic mtDNA heteroplasmy transmission between mothers and children, which could underlie many neurodevelopmental disorders in children that are shown to involve mitochondrial dysfunction. Evaluating mtDNA heteroplasmies in high-risk families could help diagnosis and treatment of these diseases.

Citation: Wang Y, Picard M, Gu Z (2016) Genetic Evidence for Elevated Pathogenicity of Mitochondrial DNA Heteroplasmy in Autism Spectrum Disorder. PLoS Genet 12(10): e1006391. https://doi.org/10.1371/journal.pgen.1006391 Editor: Santhosh Girirajan, Pennsylvania State University, UNITED STATES Received: June 12, 2016; Accepted: September 28, 2016; Published: October 28, 2016 Copyright: © 2016 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All relevant data are within the paper and its Supporting Information files. Funding: This work was supported by various funds from Cornell University, NSF MCB-1243588, NIH 1R01AI085286, and a grant for ENN science and technology development to ZG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction Autism Spectrum Disorder (ASD) refers to a broad collection of complex, neurodevelopmental disorders, characterized by impairment of communicative and social interactions [1]. ASD usually manifests at an early stage of development in pre-pubertal children [2]. The current prevalence of ASD in the United States according to the 2014 annual report from The Centers for Disease Control and Prevention (CDC) is 1 in 68 children (1.47%), with a skewed gender ratio of 4 affected boys to 1 affected girl [3]. Population-based studies suggest an even higher prevalence of 2.24% in the United States [4]. Treatment for ASD mostly relies on behavioral interventions with specialized training to lessen social, verbal and cognitive deficits. However, the effectiveness varies widely among children with ASD and mild forms of symptoms such as attention deficit and social difficulties may persist throughout life [5]. Although ASD is traditionally described as a developmental disorder of the central nervous system, emerging evidence suggests that systemic physiological abnormalities, including dysregulated inflammation and immune system [6,7], elevated oxidative stress [8], and mitochondrial dysfunction [9–12], are present in peripheral tissues as well as in brains of autistic patients [13]. Epidemiological studies also identified a staggeringly high comorbidity between ASD and mitochondrial disorder (MD) [14]. MD is a heterogeneous group of diseases due to maternally inherited or sporadic defects in genes encoding the mitochondrial oxidative phosphorylation (OXPHOS) system [15], the core functional component of mitochondria responsible for generating energy. MD usually results from genetic mutations on nuclear DNA or mitochondrial DNA (mtDNA) genes [15]. It has an extremely low birth prevalence of 6.2 in 100,000 children [16] and only about 1 in 4,300 adults is affected or at risk of developing MD [17]. However, the incidence of MD among autistic patients is estimated up to 5% by some studies [14], over 200-fold the incidence of MD in general populations. In addition, expression of OXPHOS genes has been shown to decrease in brains of autistic patients [18], suggesting a biological overlap in the pathogenesis of MD and ASD. Moreover, decreased activity of the five OXPHOS complexes has been observed in leukocytes, buccal cells, muscle biopsies and brains of autistic patients [11,14,19], suggesting the increased MD incidence among autistic patients is due not to a specific protein defect but to defects across components of the OXPHOS system. Notwithstanding ever-growing evidence implicating mitochondrial dysfunction in the pathophysiology of ASD, little is known about the biological basis of this connection [19]. Polymorphisms in genes in mitochondria-related pathways, such as PARK2 [20] and SLC25A12 [21], were previously found to be associated with ASD in genome-wide or candidate gene association studies, suggesting a genetic basis. However, in recent comprehensive genetic studies among over two thousand affected families, few of the identified risk loci for ASD harbor genes showing a direct link with mitochondrial function [22–25]. The results from mtDNA studies are also mixed. A number of studies demonstrated elevation of mtDNA deletions [10,26], point mutations [9,26,27], or copy numbers [10,28] in autistic patients. Interestingly, increased mtDNA copy number can occur secondarily to mitochondrial defects [29], further indicating the presence of genetic and/or functional defects in ASD. But few of these findings have been confirmed in large populations [30]. Because there are thousands of mtDNA molecules in a single cell, new mutations can arise and coexist with the wide-type mtDNA in a state called heteroplasmy, which has been implicated in a wide range of diseases [31,32]. To evaluate mtDNA heteroplasmy among healthy individuals, next-generation sequencing technologies such as mtDNA-targeted sequencing [33–36] or computational approaches, which directly leverage (off-target) reads from existing datasets produced by whole-genome sequencing [37,38] or whole-exome sequencing [39,40] have been widely harnessed in recent years. These approaches largely outperform Sanger sequencing and microarray-based sequencing in their ability to detect mtDNA heteroplasmies, and have proven practical for large population-based studies [37–39]. In the current study, we investigated the connection between mtDNA variation, especially heteroplasmy, and ASD by leveraging off-target reads from exome-sequencing data sets among simplex families from the Simons Foundation Autism Research Initiative (SFARI) Simons Collection deposited at the National Autism Research Database [23]. Each simplex family is composed of one autistic proband, his or her non-autistic sibling and their parents. Because the proband and the sibling have the same mtDNA background, share family environment, and are of comparable age, our study avoids genetic and environmental confounding factors that have limited the interpretation of studies among unrelated case-control pairs or child-parent pairs [30]. Given that the mtDNA is maternally inherited, we analyzed data for mother-proband-sibling trios from 903 simplex families and utilized stringent criteria to determine mtDNA homoplasmies and heteroplasmies. We then compared mtDNA variation between autistic probands and non-autistic siblings, along with the transmission pattern of mtDNA mutations between the mother and the child. We showed that autistic children have overrepresented pathogenic mutations on mtDNA distinct from their unaffected siblings. Our results offer an important angle to explain mitochondrial dysfunction in ASD patients, pointing towards the accumulation of mtDNA mutations of high pathogenic potential during development as one possible mechanism underlying the metabolic pathophysiology of ASD.

Discussion By employing a set of stringent criteria in calling heteroplasmies, we found a prevalence that 21.1% of individuals carry at least one medium-to-high fraction heteroplasmy with MAF of at least 5%. In order to draw meaningful comparisons with previous studies, we re-calculated the prevalence of heteroplasmy by applying the same MAF detection threshold at 5% in studies using blood specimens. In Rebolledo-Jaramillo et al.’s study, the prevalence of heteroplasmy is 30.8% among 39 children and 41.0% among their mothers [36]. Li et al. also identified a 41.0% prevalence of heteroplasmy in the blood specimens from 139 individuals with an average age of 58 [34]. Both studies employed mtDNA-targeted deep sequencing to sequence mtDNA to an extremely high coverage, yielding sufficient power for detecting heteroplasmies with MAF down to 1%. Using a similar design to our study, Ding et al. attained a 33.2% prevalence of heteroplasmy among 2,120 individuals with an average age of 47 years by analyzing the whole-genome sequencing data of SardiNIA Project participants [38]. A 180X depth of average coverage was obtained on mtDNA. Ding et al. also noticed a considerable elevation of heteroplasmy incidence among elderly Sardinians [38]. Hence, the prevalence of heteroplasmy among children and young adults could be lower than 33.2%, which was close to our estimate of heteroplasmy prevalence at 21.1%. A 20–30% prevalence of medium-to-high fraction heteroplasmies in blood was also verified by Sanger-based sequencing, showing that both the prevalence and frequency spectrum of mtDNA heteroplasmies were largely comparable to results generated by massively parallel sequencing [50]. The average depth of mtDNA coverage was significantly lower in mothers (132X [interquartile range, IQR:96X-150X], t-test, P<0.0003) than that in siblings (142X [IQR:103X-162X]) and probands (148X [IQR:105X-166X]). This effect could be related to age-dependent decrease of mtDNA copy number in blood [38], and could theoretically have influenced some results in the study. However, the difference of mtDNA coverage between probands and siblings in this dataset was minor and was not statistically significant (t-test, P = 0.074). To ensure such differences in mtDNA sequencing coverage did not confound our results, we performed a proof-of-concept analysis by down-sampling the complete dataset to harmonize sequencing coverage in the mother-proband-sibling trio of each family. In brief, we down-sampled reads of individuals to the lowest depth sequenced in the mother-proband-sibling trio at each mtDNA site. This procedure ensured that individuals from the same family had equal sequencing coverage at every mtDNA site, thus eliminating the possible influence of distinct sequencing coverage on heteroplasmy calling between different groups. Results from this analysis showed that a similar number of heteroplasmies were detected from down-sampled reads compared to total reads in all groups (S1 Text), indicating that lowering read counts in some individuals did not decrease power to identify heteroplasmy differences among family members. The difference between the complete and the down-sampled datasets among mothers, siblings, and probands was also comparable to each other (Fisher’s exact test, P>0.34 for pair-wise comparisons, S1 Text). Moreover, the enrichment of nonsynonymous and predicated pathogenic mutations in probands remained consistent in independent down-sampling analyses, close to the enrichment level that we computed based on total reads (S9 Fig). Down-sampling reads in each family also did not qualitatively alter the transmission pattern of heteroplasmies between mothers and children (S1 Text). Overall, this test indicated that sequencing coverage differences between mothers and children did not significantly bias heteroplasmy calling and comparisons in the current study. An extremely high transition-to-transversion ratio was among de novo mutations in children: only 4 out of the 210 de novo heteroplasmic mutations were transversion mutations (three detected in probands and one detected in siblings). Such a mutation pattern is indicative of mtDNA replication errors and thus early developmental effects, rather than mtDNA damage caused by elevated oxidative stress associated with ASD [51]. Cell and animal studies [52,53] as well as mitochondrial pathophysiology [31] studies have demonstrated that genetic defects in OXPHOS genes culminate in elevated oxidative stress, inflammation, and metabolic abnormalities. However, the impact of oxidative stress on mtDNA may prevail among low-fraction heteroplasmies and mtDNA deletions. Further studies using mtDNA-targeted deep sequencing will enable us to systematically evaluate mtDNA mutation patterns in autistic patients. We found that autistic probands carried overrepresented heteroplasmic mutations at non-polymorphic mtDNA sites, resulting in a ~1.5-fold enrichment of nonsynonymous mutations (P = 0.0028) as well as a ~2.2-fold enrichment of predicted pathogenic mutations (P = 0.0016) compared to those in non-autistic siblings (Fig 2C). The enrichment of predicted pathogenic mutations in autistic probands was not affected by the pathogenicity scores used and was further confirmed by the analysis of disease-associated mutations (S4 Fig). These results point to significant elevation of pathogenicity on mtDNA for autistic patients and thus provide a molecular basis for mitochondria-related pathophysiological phenotypes, as well as the high prevalence of MD among autistic patients [14,19]. In addition, we observed an overrepresented transmission of mtDNA mutations of high pathogenic potential from the mother to the autistic proband (Fig 3 and S2 Table). Due to the complete maternal inheritance of mtDNA [46,54], purifying selection in the maternal line may retain pathogenic mutations that possess a high mitochondrial threshold in females but a low mitochondrial threshold in males [55]. A recent study showed a male-biased expression profile for mitochondria-related genes during early childhood and puberty in the human brain [56]. Accordingly, pathogenic mtDNA mutations may adversely impact brain function when transmitted from the mother to the male proband, possibly contributing to the observed sexual dimorphism whereby males are more susceptible to neurodevelopmental disorders [57]. Of interest, such a transmission pattern of ASD-associated variants is also evidenced by the maternal transmission disequilibrium of nuclear LGD SNVs and rare CNVs in male autistic probands, but is inconclusive in female probands [24]. We assume that there can be a similar sex-dependent effect of pathogenic mtDNA heteroplasmies on ASD. Of the 903 families, there were 815 and 88 families with male and female probands, respectively, for a ratio of male:female of 9.3:1. A significant difference in transmission pattern was identified for nonsynonymous and predicted pathogenic mtDNA heteroplasmies in male probands compared to that in their siblings (ratio>1.3, one-tailed paired t-test, P<0.02, S2 Table), but no difference was present in families with female probands (ratio = 1, S2 Table). In contrast, comparable transmission patterns of mtDNA heteroplasmies to autistic probands were found when we stratified families according to the sibling sex. Given that the sample size of female probands was rather small in the current dataset, additional data on more ASD families with female probands would be required to confirm this finding and potentially reveal other differences in mtDNA heteroplasmy between families with male probands and families with female probands. The majority (>65%) of mtDNA mutations detected in the current study had derived allele fraction in the range of 5% to 20%, below the clinical threshold for most disease-causing heteroplasmies [42]. It is still elusive how low-to-medium fraction heteroplasmies affect mitochondrial function. Picard et al. found that cells harboring the mtDNA m.3243A>G heteroplasmic mutation undergo a biphasic phenotypic change in response to the rise of heteroplasmy level; a medium heteroplasmy level (10%-30%) is presumed to cause mild OXPHOS deficiency in Diabetes [58] and Autism [59], while a high heteroplasmy level (50%-90%) gives rise to diverse, severe forms of neurodegenerative diseases such as mitochondrial encephalomyopathies [60]. Hence, individuals carrying a high degree of mtDNA heteroplasmies are prone to manifest systematic symptoms of MD rather than behavioral signs of ASD. Indeed, it is estimated that 10%-20% of MD patients have some autistic endophenotypes, further suggesting a possible, shared etiology for these two diseases [19]. Some neurological and developmental pathologies of MD [61] are present in ASD patients, including seizure, eating disorders, impaired motor ability and developmental delay [45,62,63]. Among the 33 autistic children detected with predicted pathogenic and/or disease-associated private mtDNA mutations in the current study, whose medical records were also available at SFARI phenotype database, six were reported with febrile or non-febrile seizures and two were diagnosed with epilepsy (S3 Table). One autistic child was found to have anorexia nervosa and the other two were reported with movement abnormalities (S3 Table). Early signs of metabolic or developmental defects related to MD [64] in these 33 autistic children included neonatal hypotonia (n = 3) and hypoglycemia (n = 3) (S3 Table), both of which were identified as risk factors for ASD [65,66]. Medical records also revealed a significantly higher incidence of mental retardation among the 33 autistic children than that in the rest of the database (Fisher’s exact test, P = 0.048, S3 Table), in line with our findings on the negative association between pathogenic mtDNA mutations and IQ. The overall prevalence of aforementioned abnormalities was 1.2 times higher in these 33 autistic children than other autistic children included in this study as well as in the rest of the SFARI database (36.4% vs. 16.7%, Fisher’s exact test, P<0.008, S3 Table). Furthermore, a decrease in height was found among these autistic children compared to other autistic children in the SFARI database (Mann-Whitney test, P = 0.026, S3 Table). Shorter stature was previously proposed as one of the major symptoms of MD [61,67] and was discovered in patients with mtDNA(m.3243A>G) heteroplasmy-caused diabetes (i.e. maternally inherited diabetes and deafness, MIDD) compared to non-MIDD diabetic patients such as patients diagnosed with type 1 or type 2 diabetes [68], suggesting a common underlying pathophysiology of mtDNA origin. Taken together, these evidence indicate that pathogenic mtDNA mutations may be responsible for a wide range of developmental abnormalities in children which confer susceptibility to ASD. It also calls attention to ASD children with one or more developmental abnormalities or related co-morbid clinical conditions for further testing on mtDNA and mitochondrial defects. In sum, our findings support the connection between mtDNA variation and ASD, and call on further mtDNA-targeted deep sequencing and functional studies in more ASD families, which are of great importance to understand mtDNA mutation patterns and their contribution to ASD-related phenotypes. Indeed, mitochondrial dysfunction has been implicated in many childhood disorders, especially neurodevelopmental disorders [69]. These diseases manifest metabolic and physiological phenotypes that converge upon mitochondrial dysfunction, and may have mtDNA defects as a common harbinger. Further understanding the genetic architecture of mitochondrial genome in these phenotypes will provide crucial insights into pathogenesis, diagnosis, and treatment of these diseases.

Materials and Methods Study samples and sequencing data We downloaded the whole-exome sequencing data for 1,905 simplex families from the National Database for Autism Research (NDAR) under the study DOI:10.15154/1149697 (last accessed Nov. 2015). The raw sequencing data were generated by three genome centers including CSHL (n = 933), YALE (n = 599) and UW (n = 373). Only reads generated by CSHL had sufficient coverage on mtDNA (S1 Fig). We obtained similar results using alignment files generated from the same exome sequencing data set by Krumm et al. [24]. A possible explanation for such discrepancy of mtDNA coverage across different genome centers might lie in the different exome capturing protocol and sequencing read length used by each center [23,40]. Hence, we analyzed the 933 families sequenced at CSHL. We retrieved pertinent sample phenotypes including sex, ethnicity, age, full-scale intelligence quotient scores, Social Responsiveness Scale scores, and medical records from the supplementary data of previous publications on the SFARI Simons Collection [23,70], the sample information sheet available at NDAR, and/or the SFARI phenotype database (version 15). Sequencing data processing We first extracted sequencing read pairs mapped to mtDNA from the downloaded alignment files. In order to filter out reads resulting from nuclear mitochondrial sequences (NUMTs), extracted reads were realigned to the complete human reference genome consisting of both human nuclear DNA and mtDNA sequences by using BWA (version 0.7.2) [71]. The reference sequences of the GRCh38 human genome assembly were obtained from the 1000Genomes project ftp site [41]. Reads mapped to the nuclear DNA were removed. Next, we performed a series of data processing steps to ensure the quality of remaining reads for calling mtDNA homoplasmies and heteroplasmies, including (1) local realignment around indels; (2) base quality recalibration; (3) detection and exclusion of duplicated reads, reads shorter than 40 bp, reads with ≥5% mismatches, as well as reads that were not aligned in a pair on mtDNA. The remaining reads were then piled up against the revised Cambridge Reference Sequence (rCRS) by using “samtool mpileup” with a mapping quality filter at 20 [72]. The list of tools and specific commands employed in each step was detailed in S10 Fig. A comparison of methods for heteroplasmy calling with previous studies was given in S4 Table. Point homoplasmy and heteroplasmy calling To reduce the false positive rate, we only considered reads with a recalibrated base quality Phred score ≥20 (tantamount to ≤0.01 sequencing error) for calling point homoplasmies and heteroplasmies. For homoplasmy, the alternative allele according to rCRS must be supported by at least 10 reads and not be in the heteroplasmic state. Criteria for calling heteroplasmies encompassed (1) >40X depth of coverage, (2) the alternative allele being observed on both strands, and (3) comparable allele fractions calculated from reads mapped to two strands (Fisher’s exact test, P≥0.01). We also required that an mtDNA site was valid for calling heteroplasmies if its median site coverage was >70X depth in the study population. Based on one-tailed power calculation for one sample proportion test, it yielded >90% power of detecting real heteroplasmies with minor allele fraction (MAF) ≥ 5% over 1% sequencing error (S11 Fig). As a result, we obtained a total of 12,180 candidate heteroplasmies from 13,704 valid mtDNA sites (83% of total mtDNA sites, S5 Table and S12 Fig). Maximum likelihood based approach for estimating heteroplasmy fraction Next, we utilized a maximum likelihood based approach to estimate MAF for each heteroplasmy by taking into account the base quality information of covered reads [37,40]. In brief, the likelihood of observing n reads with the minor allele and m reads with the major allele at a particular site was computed as: , where f refers to the MAF of interest and ε i is the corresponding sequencing error for mapped read i. We then took the maximum-likelihood estimate of f and calculated the log likelihood ratio between the heteroplasmic model and the homoplasmic model (LLR = log(L( )/L( ))). LLR score >5 was used as a criterion to define heteroplasmies of high confidence, equivalent to a false positive rate <10−5 [40]. Of heteroplasmies detected with MAF ≥5% at valid sites, over 98% had a LLR score over 5 (S13 Fig), which was consistent with our power calculation. In addition to insufficient sequencing depth, heteroplasmies with a low LLR score can also result from sequencing error or complications from nuclear mitochondrial sequences (NUMTs). Both give rise to a much lower transition-to-transversion (Ti/Tv) ratio compared to real heteroplasmies on mtDNA, which predominantly arise from mutations caused by endogenous, mtDNA polymerase replication errors [51]. When filtered at MAF ≥5%, high-confidence heteroplasmies yielded a Ti/Tv ratio of 29.5, in line with the hypothesized mutation pattern of mtDNA. High-confidence heteroplasmies could be recovered with the LLR score filter >5 down to MAF = 2% (greater than the base quality error threshold 1% used), which had a Ti/Tv ratio comparable to that of heteroplasmies with MAF ≥5% (22.8 vs. 29.5; chi-square test P = 0.5; S6 Table). But the majority of low-fraction heteroplasmies (MAF<5%) were probably missing due to insufficient sequencing depth. Sample quality control We first removed 19 samples with average mtDNA coverage ≤40X depth. To minimize DNA contamination problems, we further removed samples (n = 19) if they had (1) >5 heteroplasmies located at polymorphic mtDNA sites in this population and/or (2) >50% of heteroplasmies detectable as homoplasmies in another sample, which also accounted for >50% of homoplasmic sites in that particular sample distinct from those in the sample under investigation. Contamination detection was based on high-confidence heteroplasmies with MAF ≥2%. 903 (97.1%) families with sequencing data from the proband, the sibling and the mother remained and were used for further analysis. The exome sequencing data from the father were processed using the same method but were only utilized for exploring data quality and detecting sample contamination. The average depth of mtDNA coverage in the remaining samples was ~141X, comparable to the coverage reported in previous studies using whole-exome or whole-genome sequencing data [38,39,73]. Comparisons of site-specific sequencing coverage for homoplasmic and heteroplasmic sites also showed strong correlations between individuals within the same family (R2>0.6, S14 Fig). After sample quality control, we obtained 677 high-confidence heteroplasmies with MAF ≥5% in 903 families (S1 Dataset). We checked the mtDNA site for each of the 677 heteroplasmies within the family where it was found. 649 of them (95.9%) were located at mtDNA sites (n = 447) where all members from a family had >40X sequencing depth and thus were used for comparing incidence and studying transmission pattern of heteroplasmies (S1 Dataset). We further defined the major allele in a family, the allele with the highest fraction in the mother-proband-sibling trio, as the reference allele for each site on the consensus sequence, and considered a heteroplasmy inherited if the reference allele and the derived allele in the child were detected in the mother with fraction ≥2%. Otherwise we considered it de novo in the child. Annotation of mtDNA variants mtDNA variants were annotated by using the ANNOVAR pipeline [74]. Pathogenicity predictions for all possible mtDNA nucleotide substitutions according to the revised Cambridge Reference Sequence (rCRS) were retrieved from the CADD database (version 1.3). The CADD score is a composite matric for deleteriousness of human mutations obtained through agglomerating annotation information from 63 distinct sources, which outperforms most popular pathogenicity predictors such as PolyPhen-2 and SIFT [75]. We followed the recommended cutoff for the scaled CADD score at 15 to define pathogenic mutations and used a more stringent cutoff at 20 for sensitivity testing. Alternative pathogenicity scores were also used for verification, including PolyPhen-2 [76], Mutation Assessor [77] and MutPred [78]. PolyPhen-2 scores and Mutation Assessor scores were extracted from the MitImpact2 database (version 2.4) [79], and MutPred scores were retrieved from a previous study [43]. All of them generated pathogenicity scores comparable to CADD scores (S15 Fig). A list of disease-associated, nonsynonymous and RNA mutations was compiled according to the MITOMAP website [80] and the ClinVar database [81] (last accessed Jan. 2016). Mutations with ambiguous annotations such as “unclear”, “conflicting reports” or “further studies needed”, or listed as “synergistic” or “secondary” on the MITOMAP website were not included. Both position and mutant allele information were used for defining mutations. In addition, we constructed a list of mutations which were predicted pathogenic by at least two of the five pathogenicity predictors: (1) CADD Phred score >15; (2) PolyPhen-2 possible and probable damaging; (3) Mutation Assessor medium and high impact; (4) MutPred score >0.6; (5) associated with disease. We relied on this list of mutations to replicate results from analyses using only CADD scores. Related results were denoted as “combined prediction” in S4 Fig, S5 Fig and S9 Fig. mtDNA haplogroups The mtDNA haplogroup information was assessed by using HaploGrep2 [82]. The resulting distribution of macro-haplogroups among the 903 families was representative of that among the U.S. population [83] (S7 Table). The concordance rate between the mtDNA-inferred ancestral lineage and self-reported maternal ethnicity or race was >96.8% in non-admixed populations including non-Hispanic white (n = 654) and Asian (n = 49) (S7 Table), in close agreement with the concordance rate recently reported for the NHANES study [83] and among 588 mtDNA-sequenced individuals [84]. Statistical testing Incidence of mtDNA mutations between probands and siblings was compared at a family level by using paired t-test. Differences in transmission pattern in mother-autistic proband pairs and mother-sibling pairs were assessed by using Fisher’s exact test. Other statistical methods, such as correlation test, ANOVA, linear regression and logistic regression, were also used as indicated in the text. Statistical testing in this study was carried out by using R (version 3.2.2) and python (version 2.7.5)/scipy (version 0.16.1).

Acknowledgments We thank Yiping Wang, Ruoyu Zhang and Dr. Kaixiong Ye for their discussions for this project, and Drs. Lars Steinmetz and Haiyuan Yu for their comments on the manuscript. We are very grateful to the three reviewers and the editor for their constructive comments. We also appreciate being able to obtain access to the exome sequencing data from the National Database for Autism Research (NDAR) and to the phenotypic data from the Simons Foundation Autism Research Initiative (SFARI) database.

Author Contributions Conceptualization: YW ZG. Data curation: YW MP ZG. Formal analysis: YW MP ZG. Funding acquisition: ZG. Investigation: YW MP ZG. Methodology: YW ZG. Project administration: ZG. Resources: YW MP ZG. Software: YW ZG. Supervision: ZG. Validation: YW MP ZG. Visualization: YW MP ZG. Writing – original draft: YW MP ZG. Writing – review & editing: YW MP ZG.