Participant recruitment, phenotyping, and DNA sequencing

All participants were recruited to SPARK under a centralized IRB protocol (Western IRB Protocol #20151664). All participants provided written informed consent to take part in the study. Written informed consent was obtained from all legal guardians or parents for all participants age 18 and younger and all participants age 18 and older who have a legal guardian. Assent was also obtained from dependent participants age 10 and older. Participants are asked to fill out questionnaires online as described here: https://www.sfari.org/spark-phenotypic-measures/. Families are classified as multiplex if the initial individual with ASD registered in the study has a first-degree family member with ASD, as indicated either by enrollment or survey report.

Essential phenotypic information was curated across language and motor development, co-morbidities, and scores on the Repetitive Behavior Scale-Revised,58 Social Communication Questionnaire-Lifetime,59 and Developmental Coordination Disorder Questionnaire60 (Table 2). In SSC, all phenotype details were determined through clinical evaluation and interview; specifically, language delay was defined by the Autism Diagnostic Observation Schedule module (1–4) per age,61 and regression was determined from the Autism Diagnostic Interview-Revised.62 For SPARK, all variables were taken from parent report. Rates of language disorder and psychiatric co-morbidities are lower in SSC, likely reflecting DSM-IV diagnostic practice at the time.

Saliva was collected using the OGD-500 kit (DNA Genotek), and DNA was extracted in a CLIA-certified laboratory at the Baylor Miraca Genetics Laboratories (Houston, TX) or PreventionGenetics (Marshfield, WI). Exome capture was performed using VCRome with the spike-in probe set PKv2 at the Baylor College of Medicine Human Genome Sequencing Center (Houston, TX). Captured exome libraries were sequenced on the Illumina HiSeq platform with 100-bp paired-end reads. Samples were sequenced to a minimum standard of >85% of the target covered at 20×; on average, 96% of the target was covered at 20×. The Illumina HumanCoreExome array (550K SNP sites) was used for genotyping.

Read alignment and QC

Sequencing reads were aligned to build 37 of the human genome using bwa version 0.6.2-r126,63 duplicates were marked using Picard version 1.93 MarkDuplicates, and indels were realigned using GATK64 version 2.5-2-gf57256b IndelRealigner. Quality checks were performed on the BAM files using SAMtools65 version 1.3.1 flagstat and Picard version 2.5.0 CalculateHsMetrics. Overall, 98 ± 1.8% of reads mapped to the genome, 96 ± 2.3% of reads were properly paired, and 87 ± 15% of targeted regions had ≥10× coverage.

KING66 was used for relatedness inference based on genotypes at exome SNPs (MAF >0.01). The estimated kinship coefficient and the number of SNPs with zero shared alleles (IBS0) were plotted for each pair of individuals. Parent–offspring pairs, sibling pairs, and unrelated pairs formed separate clusters on the scatterplot (Supplementary Fig. 1). One outlier parent–offspring pair (SP0002452 and mother) showed higher-than-expected IBS0, caused by parental chr6 iso-UPD. For the sex check, we used pairwise scatterplots of the chromosome X heterozygote-to-homozygote (het/hom) ratio and the sequencing depths of chromosomes X and Y normalized by mean autosomal depth. Two samples with sex chromosome aneuploidy were identified as outliers in the scatterplot (Supplementary Fig. 2).
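The sex-check classification can be sketched as follows; the function, input layout, and thresholds are illustrative assumptions, not the study's actual cutoffs:

```python
def infer_sex(x_het_hom_ratio, x_depth_norm, y_depth_norm,
              het_cut=0.5, x_cut=0.75, y_cut=0.25):
    """Classify sample sex from the chrX het/hom ratio and the chrX/chrY
    depths normalized by mean autosomal depth.

    All thresholds are illustrative placeholders, not the study's values.
    """
    # XX-like: many heterozygous chrX calls and near-diploid chrX depth
    xx_like = x_het_hom_ratio > het_cut and x_depth_norm > x_cut
    # XY-like: few heterozygous chrX calls and appreciable chrY depth
    xy_like = x_het_hom_ratio <= het_cut and y_depth_norm > y_cut
    if xx_like and not xy_like:
        return "XX"
    if xy_like and not xx_like:
        return "XY"
    # inconsistent signals: candidate sex-chromosome aneuploidy / outlier
    return "outlier"
```

Samples falling outside both clusters (as in Supplementary Fig. 2) would be returned as outliers for manual review.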

Variant calling

De novo SNV/indel detection

De novo sequence variants were called by three groups—University of Washington (UW), Simons Foundation (SF), Columbia University Medical Center (CUMC)—according to the methods below.

UW

Variants were called from whole exome sequence (WES) data using FreeBayes67 and GATK.64 FreeBayes version v1.1.0-3-g961e5f3 was used with the following parameters: --use-best-n-alleles 4 -C 2 -m 20 -q 20; and GATK version 3.7 HaplotypeCaller was used with the following parameters: -A AlleleBalanceBySample -A DepthPerAlleleBySample -A MappingQualityZeroBySample -A StrandBiasBySample -A Coverage -A FisherStrand -A HaplotypeScore -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A VariantType. After calling, bcftools68 version 1.3.1 norm was used with the following parameters: -c e -O z -s -m -both. We identified candidate de novo calls by intersecting the FreeBayes and GATK VCF files and selecting variants present in the offspring but absent in both parents. We required a minimum of ten sequence reads in all members of the parent–offspring trio, an allele balance >0.25, and a Phred quality >20 for both the FreeBayes and GATK calls.
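The trio-level filter described above can be sketched as below; the dictionary fields and helper name are hypothetical, while the depth, allele balance, and quality thresholds are the ones stated in the text:

```python
def is_candidate_denovo(child, mother, father,
                        min_dp=10, min_ab=0.25, min_qual=20):
    """Trio filter for a candidate de novo call at one site.

    Each sample is a dict with read depth 'dp', alternate-read count 'alt',
    and call quality 'qual' (field names are illustrative).
    """
    # require minimum depth in all members of the trio
    if any(s["dp"] < min_dp for s in (child, mother, father)):
        return False
    # variant present in the offspring, absent in both parents
    if child["alt"] == 0 or mother["alt"] > 0 or father["alt"] > 0:
        return False
    # allele balance and quality thresholds in the child
    ab = child["alt"] / child["dp"]
    return ab > min_ab and child["qual"] > min_qual
```

In practice this check would be applied only to sites present in both the FreeBayes and GATK call sets.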

SF

Sequence data were preprocessed using GATK best practices, and variant calls were generated with three variant callers: GATK v3.6,69 FreeBayes v1.1.0-441, and Platypus v0.8.1-0.70 GATK: gVCF files were generated for each sample with GATK HaplotypeCaller (minimum confidence thresholds for calling and emitting were set to 30 and 10, respectively); joint variant calling was performed using GATK GenotypeGVCFs with the recommended default hard filters. For SNPs, we filtered out: QD <2.0 || FS >60.0 || MQ <40.0 || MQRankSum <−12.5 || ReadPosRankSum <−8.0. For indels, we filtered out: QD <2.0 || FS >200.0 || ReadPosRankSum <−20.0. FreeBayes: variants were called with default settings for optimal genotyping of indels in lower-complexity sequence. The final data set included candidate calls with a quality of 5 or greater. Platypus: variant calling was performed with local assembly, allowing at most ten haplotypes. Variants were filtered out for allele bias (p-value < 0.0001), bad reads (>0.9), sequence complexity (>0.99), and RMSMQ (<20); additional filters were applied on estimated haplotype population frequency (FR), total coverage at the locus (TC), and Phred-scaled variant quality (QUAL): (FR[0] ≤ 0.5 and TC < 4 and QUAL < 20), or (TC < 13 and QUAL < 10), or (FR[0] > 0.5 and TC < 4 and QUAL < 50). For each variant caller, a variant was identified as a candidate de novo variant if it was called in the proband, occurred only once in the cohort, and had an alternative allele fraction between 0.2 and 0.8. Both parents were required to have the homozygous reference genotype at the de novo locus, and read coverage of the variant locus had to be at least ten reads in each sample in the trio.
De novo candidate variants were classified by the DNMFilter algorithm,71 retrained with the SSC data set3,14: 1800 de novo mutations identified by both Iossifov et al.3 and Krumm et al.,14 1104 validated SNVs and indels from both studies, and 400 variants that failed validation. We also randomly selected ~3000 negative examples from the pool of all SSC variants that were not confirmed to be de novo. After merging de novo candidate variants from the three variant callers, candidates were retained if they occurred only once in the cohort, passed the hard filters, and had an assigned de novo probability greater than 0.88 for SNVs or greater than 0.0045 for small indels; in the latter case, the total parental alternative allele count was also required to be <3 reads.

CUMC

Variants were called from aligned sequence data using GATK HaplotypeCaller to generate individual-level gVCF files. All samples in the cohort were then jointly genotyped, and variant quality was recalibrated, using GATK v3.8.64 A variant present in the offspring with homozygous reference genotypes in both parents was considered a potential de novo variant. We used a series of filters to identify de novo variants. Briefly, we included variants that passed the VQSR filter (tranche ≤99.7 for SNVs and ≤99.0 for indels) and had GATK Fisher strand (FS) ≤25 and quality by depth (QD) ≥2. We required candidate de novo variants in probands to have ≥5 reads supporting the alternative allele, ≥20% alternative allele fraction, Phred-scaled genotype quality (GQ) ≥60, and population allele frequency ≤0.1% in ExAC; and required both parents to have ≥10 reference reads, <5% alternative allele fraction, and GQ ≥30.

De novo SNV/indel consensus call set and annotation

De novo variants were independently called by three centers—UW, SF, and CUMC. De novo variants called by all three groups were included in the final list by default. Those called by only one or two groups were manually evaluated and included in the final list if consensus was reached among all groups after discussion and manual inspection of IGV plots. Variants were annotated by ANNOVAR72 based on GENCODE Basic v19.73 Candidate variants in the 59-gene ACMG secondary findings v2 list74 (except PTEN, TSC1, and TSC2) were excluded. Coding de novo variants—nonsense, missense, or synonymous SNVs, frameshift or nonframeshift indels, and splice-site variants—were annotated. De novo variants were also annotated with snpEff version 4.1g75 (reference GRCh37.75), SFARI Gene scores (version q1, 2018, https://gene.sfari.org/database/gene-scoring/), CADD,10 MPC,11 and findings from the Deciphering Developmental Disorders project (gene2phenotype).16

Inherited singleton variants

We first applied the following filters to individual genotypes. We required a minimum read depth ≥10 and GQ ≥30, and an allele balance <0.1 for homozygous reference calls, >0.9 for homozygous alternative calls, and 0.3–0.7 for heterozygous SNVs (0.25–0.75 for heterozygous indels). Genotype calls not passing these criteria were set to missing. We then removed variants with missing genotypes in >25% of founders. We focused the analysis on singleton variants, in which the alternative allele was seen in only one parent in the data. We calibrated GATK's VQSLOD score for SNVs and indels separately such that synonymous singleton SNVs and nonframeshift singleton indels were transmitted 50% of the time (Supplementary Fig. 14). The resulting VQSLOD cutoffs are −1.85 for SNVs and −1.51 for indels. As mentioned in the Results section, inherited LGD variants are less likely to cause a complete loss of function of the gene. To prioritize inherited LGD variants, we required the variant to be annotated as HC (high-confidence) by LOFTEE v0.312 using default parameters in >60% of GENCODE transcripts.
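The transmission-based calibration can be sketched as follows. The data layout and the grid search over candidate cutoffs are illustrative assumptions; the target is the 50% transmission rate expected for neutral variants such as synonymous singletons:

```python
def transmission_rate(variants, cutoff):
    """Fraction of parental singleton variants transmitted to offspring,
    among variants passing a VQSLOD cutoff. Each variant is a dict with
    'vqslod' and a 0/1 'transmitted' flag (illustrative layout)."""
    kept = [v for v in variants if v["vqslod"] >= cutoff]
    if not kept:
        return None
    return sum(v["transmitted"] for v in kept) / len(kept)

def calibrate_cutoff(variants, cutoffs, target=0.5):
    """Pick the cutoff whose transmission rate is closest to the target
    (50% expected under Mendelian transmission for neutral variants)."""
    best, best_dist = None, float("inf")
    for c in cutoffs:
        r = transmission_rate(variants, c)
        if r is None:
            continue
        d = abs(r - target)
        if d < best_dist:
            best, best_dist = c, d
    return best
```

Because false-positive parental calls are rarely "transmitted", low cutoffs drag the observed rate below 50%, so raising the cutoff until the rate reaches 50% removes mostly artifacts.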

Identification of mosaic mutations

Mosaic SNVs were independently called by two centers—Oregon Health & Science University (OHSU) and CUMC. The OHSU approach was previously published23 and used a binomial deviation and a logistic regression model to score candidate mosaic variants. The CUMC approach was novel, based on beta-binomial deviation with an FDR-based procedure to determine per-site thresholds.

OHSU

SNVs were called as previously described.23 In brief, pileups were generated from processed BAMs using SAMtools (v1.1) with BAQ disabled and a mapQ threshold of 29 (samtools mpileup -B -q 29 -d 1500). Variants were called on individual samples using VarScan 2.3.2, LoFreq 2.1.1, and an in-house mpileup parsing script (mPUP). Additional parameters for VarScan were: --min-var-freq 1 × 10−15 --p-value 0.1. Per-sample caller outputs were combined and annotated using ANNOVAR (03/22/15 release) with the following databases: RefSeq genes (obtained 03/2017), segmental duplications (UCSC track genomicSuperDups, obtained 03/25/2015), repetitive regions (UCSC tracks simpleRepeat and hg19_rmsk, obtained 03/25/2015), Exome Aggregation Consortium (ExAC) release 0.3 (obtained 11/29/2015), Exome Sequencing Project (ESP) 6500 (obtained 12/22/2014), and 1000 Genomes Phase 3 version 5 (obtained 12/16/2014).

Variants were filtered based on the best practices established in Krupp et al.:23 (1) variant must be exonic or disrupt a canonical splice site, (2) have a population frequency of ≤0.5%, (3) have at least five alternative reads, (4) not be in a known segmental duplication or repetitive regions (SDTRF), (5) called by at least two variant callers, (6) SPARK cohort count ≤1 and SSC cohort count ≤2, (7) variant read mismatch ≤3, and (8) allele fraction upper 90% confidence interval ≤0.05. For a variant to be considered de novo, parental alternative allele count must be ≤4 reads. De novo variants were considered to be candidate mosaic variants if: (1) the probability the allele fraction significantly deviated from heterozygous (PHET) was ≤0.001, (2) the allele fraction upper 90% confidence interval was <0.4, and (3) a logistic regression model score was ≥0.518.

CUMC

SNVs were called on a per-trio basis using SAMtools (v1.3.1-42) and BCFtools (v1.3.1-174). We generated trio VCF files using the samtools 'mpileup' command with options '-q 20 -Q 13' (mapQ and baseQ thresholds of 20 and 13, respectively), followed by bcftools 'call' with option '-p 1.1' to expand the set of variant positions evaluated for mosaicism. In contrast to the OHSU pipeline, BAQ was used to potentially reduce false-positive SNV calls caused by misalignment.76 To identify de novo variants from trio VCF files, we selected sites with (i) a minimum of six reads supporting the alternate allele in the proband and (ii) for parents, a minimum depth of ten reads and zero alternate allele reads. Variants were then annotated using ANNOVAR (v2017-07-17) with information from the refGene, gnomAD (March 2017), 1000 Genomes (August 2015), ExAC, genomicSuperDups, COSMIC (v70), and dbSNP (v147) databases. CADD10 and MPC11 were used to annotate variant functional consequences.

Preprocessing and QC

To reduce the noise introduced by our variant calling approach, we preprocessed our variants using a set of filters. Since our method is allelic depth-dependent, we took a conservative filtering approach to reduce the impact of false positives on model parameter estimation. We first filtered our variant call set for rare heterozygous coding variants (MAF ≤ 1 × 10−4 across all populations represented in the gnomAD and ExAC databases). To account for regions of the reference genome that are more challenging to resolve, we removed variant sites in regions of nonunique mappability (score <1; 300 bp), likely segmental duplication (score >0.95), and known low complexity.77 We then excluded sites located in MUC and HLA genes and imposed a maximum variant read depth of 500. To account for common technical artifacts, we used SAMtools PV4 p-values with a threshold of 1 × 10−3 to exclude sites with evidence of baseQ bias, mapQ bias, or tail distance bias. To account for potential strand bias, we used an in-house script to flag sites with either (1) zero alternate allele reads on the forward or reverse strand, or (2) p < 1 × 10−3 and OR < 0.33 or OR > 3 when applying Fisher's exact test to strand-based reference and alternative allele counts. Finally, we excluded sites with frequency >1% in the SPARK pilot, sites belonging to outlier samples (with abnormally high de novo SNV counts; cutoff = 7), and complex variants (defined as sites with neighboring de novo SNVs within 10 bp).
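The strand-bias flag can be sketched as below. The 2×2 table holds reference/alternative read counts by strand; the exact-test implementation here is a generic two-sided Fisher's exact test built from the hypergeometric distribution, standing in for the study's in-house script:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def hyper(x):  # P(first cell == x) under the hypergeometric null
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = hyper(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # sum probabilities of all tables at least as extreme as the observed one
    return sum(p for p in (hyper(x) for x in range(lo, hi + 1))
               if p <= p_obs + 1e-12)

def strand_biased(fwd_ref, rev_ref, fwd_alt, rev_alt,
                  p_cut=1e-3, or_lo=0.33, or_hi=3.0):
    """Flag a site for strand bias, mirroring the filter described above.

    Also flags sites with zero alternate support on either strand.
    """
    if fwd_alt == 0 or rev_alt == 0:
        return True
    p = fisher_exact_2x2(fwd_ref, rev_ref, fwd_alt, rev_alt)
    denom = rev_ref * fwd_alt
    odds = (fwd_ref * rev_alt) / denom if denom else float("inf")
    return p < p_cut and (odds < or_lo or odds > or_hi)
```

A balanced site (similar alternate support on both strands) passes, while alternate reads piling up on one strand are flagged.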

IGV visualization of low allele fraction de novo SNVs

To identify likely false positives among our low allele fraction (VAF <0.3) de novo SNVs, we used the Integrative Genomics Viewer (IGV v2.3.97) to visualize the local read pileup at each variant across all members of a given trio. We focused on the allele fraction range 0.0–0.3 because this range captures the majority of the technical artifacts that would negatively affect downstream parameter estimation. Sites were filtered out if (1) there were inconsistent mismatches in the reads supporting the mosaic allele, (2) the site overlapped or was adjacent to an indel, (3) the supporting reads had low MAPQ or were not primary alignments, (4) there was evidence of technical bias (strand, read position, tail distance), or (5) the site was mainly supported by soft-clipped reads.

Empirical Bayes postzygotic mutation detection model

To distinguish variant sites showing evidence of mosaicism from germline heterozygous sites, we modeled the number of reads supporting the variant allele (Nalt) as a function of the total site depth (N). In the typical case, Nalt follows a binomial model with parameters N (site depth) and p (mean VAF). However, we observed notable overdispersion78,79 in the distribution of variant allele fraction compared with the expectation under this binomial model. To account for this overdispersion, we instead modeled Nalt using a beta-binomial distribution. We estimated an overdispersion parameter θ for our model as follows: for site depth values N in the range 1–500, we (1) binned variants by identifying all sites with depth N, (2) calculated a maximum-likelihood estimate of θ using N and all Nalt values for variants in a given bin, and (3) estimated a global θ by taking the average of θ values across all bins, weighted by the number of variants in each bin.
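The θ-estimation procedure can be sketched as follows. The parameterization (mean VAF 0.5 with α = β = 1/θ) and the grid search in place of a proper optimizer are simplifying assumptions:

```python
from math import lgamma

def log_betabinom_pmf(k, n, alpha, beta):
    """Log PMF of the beta-binomial distribution via log-gamma functions."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + lgamma(k + alpha) + lgamma(n - k + beta)
            - lgamma(n + alpha + beta)
            + lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta))

def mle_theta(n, alt_counts, thetas):
    """Grid-search MLE of overdispersion theta for one depth bin,
    assuming mean VAF 0.5 (alpha = beta = 1/theta)."""
    def loglik(theta):
        a = 1.0 / theta
        return sum(log_betabinom_pmf(k, n, a, a) for k in alt_counts)
    return max(thetas, key=loglik)

def global_theta(bins, thetas):
    """Average per-bin theta estimates, weighted by the bin sizes.
    bins maps depth N -> list of Nalt values at that depth."""
    num = den = 0.0
    for n, alt_counts in bins.items():
        num += mle_theta(n, alt_counts, thetas) * len(alt_counts)
        den += len(alt_counts)
    return num / den
```

Tightly clustered allele fractions yield a small θ (near-binomial), while overdispersed fractions push the estimate up.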

We used an expectation-maximization (EM) algorithm to jointly estimate the fraction of mosaics among apparent de novo mutations and the FDR of candidate mosaics. This initial mosaic fraction estimate gives a prior probability of mosaicism independent of sequencing depth or variant caller and allows us to calculate, for each variant in our input set, the posterior odds that a given site is mosaic rather than germline.
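The EM scheme above can be illustrated with a toy two-component mixture: germline heterozygous sites (binomial, VAF 0.5) versus mosaic sites at a fixed lower VAF. Using a plain binomial rather than the beta-binomial described above, and fixing the mosaic VAF, are deliberate simplifications:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def em_mosaic_fraction(sites, p_mosaic_vaf=0.2, n_iter=50):
    """Toy EM for the fraction of mosaics among apparent de novo calls.

    sites: list of (alt_reads, depth) tuples. Returns the estimated mosaic
    fraction (the prior) and per-site posterior probabilities of mosaicism.
    """
    pi = 0.1  # initial guess of the mosaic fraction
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each site is mosaic
        post = []
        for k, n in sites:
            lm = pi * binom_pmf(k, n, p_mosaic_vaf)        # mosaic likelihood
            lg = (1 - pi) * binom_pmf(k, n, 0.5)           # germline likelihood
            post.append(lm / (lm + lg))
        # M-step: update the mosaic fraction from the posteriors
        pi = sum(post) / len(post)
    return pi, post
```

The converged prior plays the role of the estimated mosaic fraction, and the per-site posteriors support posterior-odds and FDR calculations.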

Finalized union mosaic call set and validation selection

The high-confidence call sets from the two parallel mosaic detection approaches were combined, and all candidate mosaic variants were inspected manually in IGV. Variants in regions with multiple mismatches or poor mapping quality were removed, and the remaining variants comprised the high-confidence mosaic call set. For calls unique to one approach, the variant was annotated with the quality filter it initially failed. Variants flagged as low-confidence germline by the CUMC approach but as mosaic by the OHSU approach had posterior odds >1 and were thus retained in the union call set.

CNV detection

De novo and rare inherited CNVs were independently called by two centers—UW and SF. The final CNV list included all autosomal CNVs called by both the SF and UW pipelines, either with reciprocal overlap of at least 50% or when the CNV from one pipeline fell completely within the CNV from the other pipeline. In both cases, the overlapping region was reported as the final region and annotated as described below. CNVs called by only one pipeline were considered high-confidence if they were called by at least two tools or if they were de novo CNVs confirmed by manual inspection of plots of the exome data. High-confidence CNVs were also included in the final list after discussion and manual inspection of plots of the exome data. De novo CNVs were additionally inspected on BAF and LRR plots of the genotyping data. CNVs with at least 75% overlap with known segmental duplications (segDups track for hg19 from the UCSC browser) were excluded. All CNVs were annotated with the RefSeq hg19 genes, OMIM genes, brain embryonically expressed genes,3 brain critical genes,19 ASD-significant genes,80 and ASD-related genes8,14 whose coding regions overlapped the CNV. CNVs ≥50 kbp in size were annotated with morbidity map81 case and control frequencies using a 50% reciprocal overlap, while CNVs <50 kbp were annotated with their frequency in the 1000 Genomes Project82 using a 50% reciprocal overlap. We note that some events may be missed by this annotation because of platform differences (e.g., exome, array, and genome), but the two analyses provide reasonable insight into the population prevalence of large and small CNVs in the general population. In addition, each affected gene was annotated with pLI (ExAC release 0.3, http://exac.broadinstitute.org/downloads), ASD,83 RVIS,84 LGD,85 and SFARI Gene scores (version q1, 2018, https://gene.sfari.org/database/gene-scoring/).
De novo CNVs affecting DUSP22 and olfactory genes were excluded due to the high variability in copy number of those regions among individuals.86
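The 50% reciprocal-overlap and containment criteria used to merge calls from the two pipelines can be sketched as follows (function names are illustrative):

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Reciprocal overlap of two intervals: the intersection length as a
    fraction of each interval, taking the smaller of the two fractions."""
    inter = min(a_end, b_end) - max(a_start, b_start)
    if inter <= 0:
        return 0.0
    return min(inter / (a_end - a_start), inter / (b_end - b_start))

def concordant(a, b, min_ro=0.5):
    """Two CNV calls agree if their reciprocal overlap is >= 50% or if one
    call is entirely contained within the other."""
    a_start, a_end = a
    b_start, b_end = b
    contained = (a_start >= b_start and a_end <= b_end) or \
                (b_start >= a_start and b_end <= a_end)
    return contained or reciprocal_overlap(a_start, a_end, b_start, b_end) >= min_ro
```

For concordant pairs, the intersection of the two intervals would then be reported as the final CNV region.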

UW, detection using XHMM and CoNIFER

CNVs were called from WES data using CoNIFER87 and XHMM.88 CoNIFER version v0.2.2 was used with the S value threshold, --svd 7, as suggested by the scree plot. XHMM version statgen-xhmm-3c57d886bc96 was used with the following parameters: --minTargetSize 10 --maxTargetSize 10000 --minMeanTargetRD 10 --maxMeanTargetRD 500 --minMeanSampleRD 25 --maxMeanSampleRD 200 --maxSdSampleRD 150 to filter samples and targets and then mean-center the targets; PVE_mean --PVE_mean_factor 0.7 to normalize the mean-centered data using PCA information; and --maxSdTargetRD 30 to filter and z-score center (by sample) the PCA-normalized data, and then discover CNVs in all samples. Calls from CoNIFER and XHMM were merged into a VCF file using https://github.com/zeeev/mergeSVcallers with parameters -t xhmm,conifer -r 0.5 -s 50000; the merged VCF was then sorted with Picard version v2.5.0 and zipped and indexed with Tabix version v0.2.6. We re-genotyped each XHMM and CoNIFER CNV event by assessing the RPKM values from the CoNIFER workflow for each individual. Probands were considered to have a deletion if their average RPKM value was less than −1.5 s.d. and a duplication if their average RPKM value was greater than 1.5 s.d. For an event to be considered variant in a parent, we required an average ZRPKM less than −1.3 for deletions or greater than 1.3 for duplications.

UW, CNV validation using SNP microarray

We generated an independent CNV call set for validation purposes using SNP microarray genotyping data from the Illumina InfiniumCoreExome-24_v1.1 array; IDATs (n = 1421) were processed using Illumina GenomeStudio software. CNV analysis was performed using the Illumina CNVpartition algorithm version v3.2.0, and Log R Ratio data for all samples and probes were exported. PennCNV89 version v1.0.4 was used to detect CNVs with the following parameters: -test -hmm -pfb all.pfb -gcmodelfile -confidence. We determined the maximum and minimum overlap of SNP microarray CNVs based on the presence of WES probes, to make the array calls more comparable to the exome calls, and considered an event supported by PennCNV or CNVpartition if there was at least 50% reciprocal overlap. We also generated per-probe copy number estimates using CRLMM90,91 version 1.38.0 as previously described14 and genotyped each candidate WES CNV. Deletions were considered variant if they had a p-value less than 0.05 and a mean percentile rank less than 30; duplications, a p-value less than 0.05 and a mean percentile rank greater than 70. CNVs passing the RPKM genotyping were combined with the CNV data from CRLMM, PennCNV, and CNVpartition. We considered WES CNVs valid if there was support for a gain or loss from the PennCNV, CNVpartition, or CRLMM approaches described above. We assessed inheritance using both SNP and WES data and preferentially scored inherited events over de novo CNVs.

SF

CNVs were called with two tools: xHMM v1.092 and CLAMMS v1.1.93 xHMM: CNVs were called with default settings (except without filtering on the maximum target size), including filtering of low-complexity and GC-extreme targets. CLAMMS: CNVs were called with INSERT_SIZE = 390 bp, training per-sample models on sample-specific reference panels because of the observed batch effect in the data; CLAMMS calls were filtered out if Q_EXACT was less than 0 or Q_SOME less than 100, or if they came from samples with more than 70 predicted CNVs of size at least 10 kb and quality score Q_SCORE at least 300. The inheritance status of autosomal CNVs was determined by the default xHMM protocol for de novo CNV identification with plink 1.0794 and Plink/Seq 0.10 [https://atgu.mgh.harvard.edu/plinkseq/]. A similar protocol was implemented in Java for the CLAMMS analysis. For each tool, two tiers of CNV calls—the most confident calls (tier 1) and less confident calls (tier 2)—were defined based on de novo and transmission rates at different cutoffs on quality scores: SQ (Phred-scaled quality of some CNV event in the interval) and NQ (Phred-scaled quality of not being diploid, i.e., a DEL or DUP event in the interval) in xHMM, and Q_SOME (Phred-scaled quality of any CNV being in the interval) in CLAMMS. xHMM tier 1 included all autosomal CNVs with both SQ and NQ quality scores of at least 60, and tier 2 all autosomal CNV calls with quality scores between 30 and 60. Samples with more than 10 de novo CNVs of size at least 10 kb in xHMM tier 1 were excluded. CLAMMS tier 1 included all predictions with quality score 999, except predictions for 25 probands with CNVs larger than 500 kb and quality score 999 or predictions whose region was partially inherited and partially de novo; tier 2 included the predictions excluded from tier 1 as well as all CNVs with quality score Q_SOME of at least 400 and less than 999.
Predictions from either method with fewer than 3 exons or with at least 75% overlap with known segmental duplications (segDups track for hg19 from the UCSC browser) were removed from the list. The final list of CNV predictions included all tier 1 CNVs predicted by either xHMM or CLAMMS and the intersection of the tier 2 sets from both tools, that is, CNVs confirmed by the two tools with reciprocal or cumulative reciprocal overlap of at least 50%; in the latter case, a CNV predicted by one tool is covered by a set of CNVs predicted by the other tool. If a CNV from xHMM or CLAMMS was confirmed by the other tool, the overlapping region was reported as the final region. CNVs were removed from the analysis if more than half of their length overlapped an ACMG secondary findings v2 gene74 (except PTEN, TSC1, and TSC2). If such a gene covered less than 50% of the CNV, the portion of the CNV outside the gene was kept if at least 25% of its length was not covered by segmental duplications. To identify higher-confidence CNV predictions, xHMM and CLAMMS plots were manually inspected for each CNV in the final SF list. In addition, SF predictions were compared with PennCNV89,95,96 calls from array data with a confidence score of at least 100; all reciprocal overlaps of at least 50% were treated as additional evidence of CNV support.

UW, chromosome aneuploidy assessment

We also assessed evidence of chromosomal aneuploidy by calculating sequence read depth using SAMtools65 version 1.4 on a per-chromosome basis, normalizing by the relative density of WES probes, and comparing the normalized value for each chromosome with the normalized value for chromosome 1 (assumed to be diploid). For autosomes, we multiplied this ratio by two to estimate chromosomal copy number; we did not multiply by two for the X or Y chromosomes. To further assess chromosomal copy number, heterozygosity was calculated for all SNPs and indels, and for heterozygous sites the absolute mean deviation from 0.5 was also calculated. We assessed both metrics to identify outliers, and aneuploidies were required to have support from both the read depth and SNP/indel metrics.
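The read-depth portion of this check can be sketched as below; the input layout and the probe-density normalization are illustrative assumptions:

```python
def estimate_copy_number(chrom_depths, probe_density, autosome_ref="chr1"):
    """Estimate per-chromosome copy number from read depth.

    chrom_depths: mean read depth per chromosome; probe_density: relative WES
    probe density per chromosome (both illustrative inputs). Depths are
    normalized by probe density and compared with chr1 (assumed diploid);
    following the text, autosomal ratios are doubled while the X and Y
    ratios are reported as-is.
    """
    norm = {c: chrom_depths[c] / probe_density[c] for c in chrom_depths}
    ref = norm[autosome_ref]
    cn = {}
    for c, v in norm.items():
        ratio = v / ref
        cn[c] = ratio * 2 if c not in ("chrX", "chrY") else ratio
    return cn
```

Chromosomes whose estimate deviates from the expected value, together with an aberrant heterozygous-site VAF deviation, would be flagged as candidate aneuploidies.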

Burden of de novo variants

Baseline mutation rates for different classes of de novo variants in each GENCODE coding gene were calculated using a previously described mutation model.9 Briefly, the trinucleotide sequence context was used to determine the probability of each base mutating to each of the other possible bases. The mutation rate of each functional class of point mutations in a gene was then calculated by summing the mutation rates of each nucleotide in the longest transcript. The rate of frameshift indels was assumed to be 1.1 times the rate of nonsense point mutations. The expected number of variants in each gene set was calculated by summing the class-specific variant rate of each gene in the set, multiplied by twice the number of patients (and, for chromosome X, further adjusted for the female-to-male ratio97).

The observed number of variants in each gene set and case group was then compared with the baseline expectation using a Poisson test. In all analyses, constrained genes were defined by a pLI score ≥0.5. To compare with previously published ASD studies, we collected published de novo variants identified in 4773 simplex trios from the three largest ASD studies to date.3,4,7 To account for platform differences, the baseline mutation rate of each gene was scaled so that the exome-wide expected number of silent variants matched the observed count.
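The burden comparison can be sketched as below, with per-gene haploid mutation rates and a one-sided Poisson test; the rates in the example are arbitrary illustrative numbers:

```python
from math import exp

def expected_count(gene_rates, n_patients):
    """Expected de novo count in a gene set: per-gene haploid mutation rates
    summed over the set and multiplied by two alleles per patient."""
    return sum(gene_rates) * 2 * n_patients

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the PMF recurrence."""
    if k == 0:
        return 1.0
    term = cdf = exp(-lam)  # P(X = 0)
    for i in range(1, k):
        term *= lam / i      # P(X = i) from P(X = i - 1)
        cdf += term
    return 1.0 - cdf

def burden_test(observed, gene_rates, n_patients):
    """One-sided Poisson test of observed vs expected de novo burden.
    Returns (expected count, p-value)."""
    lam = expected_count(gene_rates, n_patients)
    return lam, poisson_sf(observed, lam)
```

An excess of observed variants over the expectation yields a small p-value, indicating de novo burden in the gene set.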

TADA analysis

To perform the TADA analysis of de novo variants, we assumed the fraction of disease genes to be 5%, as estimated in previous studies.26,98 The prior relative risks for LGD variants and D-mis variants (defined by CADD ≥25) were specified as Gamma(18,1) and Gamma(6,1), respectively. The prior mean relative risks were determined using the relationship between burden and relative risk as described previously.26 The baseline mutation rate of each gene was the same as that used in the burden analysis. The analysis was performed on de novo variants of the 4773 published trios alone and after combining them with de novo variants identified in the SPARK pilot trios.

Laminar layer and cell type enrichment

To evaluate expression specificity across laminar layers of the human developing cortex, we analyzed RNA-seq data from neocortical samples of BrainSpan48 following the method of Parikshak et al.45 Expression specificity was measured by a t-statistic comparing the expression level in each layer against all other layers. Two candidate ASD risk genes (PAX5, KDM1B) were excluded from the analysis due to low expression (RPKM <1 in at least 20% of available neocortical samples). To evaluate cell-type specificity, we used published data on mouse neuronal cell types inferred from single-cell RNA-seq of fetal and adult mouse brains generated by the Karolinska Institutet (KI),99 and on human CNS cell types inferred from single-nucleus RNA-seq data.50 Mouse orthologs of human genes were retrieved from the MGI database.100 Cell-type specificity was measured by a specificity index, defined as the mean expression level in one cell type divided by the sum of mean expression levels across all cell types.101 To analyze the overall specificity of a gene set, the mean specificity of its genes was compared with that of 10,000 randomly drawn gene sets matched for transcript length and GC content, and enrichment was measured as the number of standard deviations from the mean specificity of the random gene sets.101
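The specificity index and the permutation-based enrichment measure can be sketched as:

```python
from statistics import mean, stdev

def specificity_index(mean_expr_by_celltype):
    """Specificity of a gene across cell types: mean expression in each cell
    type divided by the sum of mean expression over all cell types."""
    total = sum(mean_expr_by_celltype.values())
    if total == 0:
        return {c: 0.0 for c in mean_expr_by_celltype}
    return {c: v / total for c, v in mean_expr_by_celltype.items()}

def enrichment_z(gene_set_score, null_scores):
    """Enrichment of a gene set's mean specificity, expressed in standard
    deviations from the mean of matched random gene sets."""
    return (gene_set_score - mean(null_scores)) / stdev(null_scores)
```

In the study, `null_scores` would come from 10,000 random gene sets matched on transcript length and GC content; here it is a short illustrative list.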

Network and functional analysis

The network depicted in Fig. 2a was constructed using the top decile of forecASD genes, SFARI genes scoring 1 or 2, and SPARK newly implicated genes (6 in total). These genes were projected onto the STRING network102 (v10) using the igraph R package (1708 genes). Edges within the STRING network were thresholded at 0.4, according to the authors' recommendation. The largest connected subcomponent (1664 genes) was then extracted as the basis for further network analysis. Clustering was performed on the fully connected network using the fastgreedy community function in the igraph package. Clusters with fewer than 30 genes were not considered further (none of these clusters contained the six genes highlighted here). Following the first round of clustering, clusters with >150 genes were subjected to an additional round of clustering, with the goal of separating broad gene functions into more specific subcomponents. This process resulted in ten clusters. Each cluster was assessed for functional enrichment using the Gene Ontology103 as accessed through the clusterProfiler package in R. In the functional analysis, the background gene universe was always set to the full set of genes represented among the ten clusters. Visualization of this network analysis was performed in Cytoscape.104 The top five most significant GO terms associated with each cluster are available in Supplementary Data 9. Cluster labels in Fig. 2 were chosen as the most representative among the top terms for each cluster. Figure 2b was constructed using the subset of the larger network (Fig. 2a) corresponding to SPARK newly implicated genes and SFARI genes scoring 1 or 2 (88 genes). These genes were projected onto the STRING network within Cytoscape using the STRINGapp. All nonzero-weighted edges were considered. The fully connected component was visualized, which resulted in two genes being dropped (DEAF1 and RANBP17).
Edges adjacent to newly implicated genes with a STRING interaction score of ≥0.4 are highlighted.

ForecASD analysis

We used forecASD,40 a recently developed method that indexes support for a gene's relation to ASD by integrating genetic, expression, and network evidence through machine learning. We examined the forecASD scores of candidate ASD risk genes from the TADA analysis and compared them with the remainder of the genome using a Wilcoxon rank-sum test. We similarly used the Wilcoxon test on two predictive features used by forecASD (BrainSpan_score and STRING_score) to assess whether the new genes resembled known ASD risk genes in brain expression patterns and network connectivity. Importantly, because forecASD uses previously published TADA scores among its predictive features, and these are strongly correlated with the updated TADA scores, we investigated whether the elevated forecASD scores of our candidate genes could be explained solely by the previous TADA scores. Specifically, we fit a logistic regression model with the candidate ASD risk genes labeled as '1' and 500 size-matched background genes (not listed in the SFARI Gene database) labeled as '0' in the dependent variable (Y). Separate models were fit using either forecASD or TADA8 scores as predictors, and both together in a full model. Both TADA and forecASD were significantly associated with the "new gene" indicator when considered in isolation (p < 0.001 for both, Z-test on logistic regression coefficients). However, when included together in a model of Y, forecASD remained significantly associated (p = 0.00012) while TADA did not (p = 0.41). The Akaike information criterion (AIC) indicated that the forecASD-only model was a better fit than either the TADA-only or the TADA + forecASD model. This analysis suggests that the elevated forecASD scores observed in the ten new genes cannot be fully explained by the use of TADA as a predictor in forecASD.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.