Subject cohort and sequencing

From 2015 to 2016, we recruited subjects with EIEE for whom no underlying diagnosis was identified despite extensive prior testing. We excluded subjects with established genetic, metabolic, structural, or birth trauma-related causes. The final cohort included 14 subjects for whom DNA was also available from both parents (see Table 1 and Supp. Table 1 for extensive phenotypic and prior testing information for these subjects). We anticipated that, for the majority of subjects, the causative variant would be a de novo mutation,13,14,15,16 a class of variant that is notoriously difficult to detect accurately from short-read sequencing data.17 Therefore, we performed deep whole-genome Illumina sequencing on all 14 families (i.e., 42 individuals). Two sequencing lanes from two distinct DNA libraries were used to maximize discovery in each family, producing an average of 65× (range 51× to 93×) median coverage per individual (see Supp. Table 2 and Supp. Fig. 1). Increased sequence coverage provides greater power to detect de novo mutations in subjects, and it also reduces false positive de novo mutation predictions in cases where the transmitted allele is not sequenced in one of the parents.17,18
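The effect of coverage on false positive de novo calls can be illustrated with a simple sampling argument. The sketch below is not a model taken from this study: it assumes that each read covering a heterozygous parental site samples either allele independently with probability 0.5, so the chance that the transmitted allele goes unobserved in that parent shrinks geometrically with depth. The function names and the `min_reads` threshold are illustrative.

```python
from math import comb


def prob_allele_unseen(depth: int, allele_fraction: float = 0.5) -> float:
    """Probability that none of `depth` independent reads carries the allele."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be in [0, 1]")
    return (1.0 - allele_fraction) ** depth


def prob_fewer_than(depth: int, min_reads: int, allele_fraction: float = 0.5) -> float:
    """Binomial probability of seeing fewer than `min_reads` allele-carrying reads.

    This is the chance that a caller requiring `min_reads` supporting reads
    (a hypothetical threshold) would miss a truly heterozygous parental allele.
    """
    p = allele_fraction
    return sum(
        comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
        for k in range(min_reads)
    )


if __name__ == "__main__":
    # Compare a shallow depth, a typical depth, and the average depth reported above.
    for depth in (10, 30, 65):
        print(depth, prob_allele_unseen(depth), prob_fewer_than(depth, 3))
```

Under this simplified model, a parental allele that is missed at low depth makes an inherited variant look de novo in the child, which is why the miss probability matters for specificity as well as sensitivity.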

Table 1 Summary of clinical phenotypes and prior genetic testing for each EIEE subject

Variant identification

After sequence alignment with BWA-MEM,19 we carried out comprehensive detection of genetic variation in each EIEE family trio, using a combination of existing alignment-based tools and our reference-free approach (Methods). We scanned each family for single-nucleotide variants (SNVs) and insertions/deletions (INDELs) using the GATK20 best practices pipeline. We also used LUMPY21 to detect structural variants (SVs) and copy number variants (CNVs), in conjunction with SVTyper22 to generate SV genotypes for each family member. Because of the strong prior expectation that the causative variant would be a de novo mutation in the affected child, we also applied RUFUS,23 our k-mer-based, alignment-free analysis algorithm designed specifically to reduce false positive de novo mutation predictions (see Methods) and to reveal mutations that can be missed by alignment-based approaches.

Variant prioritization

With candidate de novo mutations detected in the 14 probands, we followed a tiered variant prioritization strategy to identify causative mutations (see Table 2). We first targeted missense, frameshift, or nonsense coding mutations within known genes associated with EIEE using both GEMINI24 and the web-based variant visualization and interrogation tool gene.iobio (http://gene.iobio.io). GEMINI was used to identify de novo mutations in genes that ClinVar25 associated with the terms “epileptic” and “infant”. To prioritize variants with gene.iobio, we first created an inclusive list of 223 EIEE candidate genes (Supp. Table 4) by merging genes across EIEE-specific gene panel tests and ClinVar,25 followed by a Phenolyzer26 search with the relevant phenotype search terms (see Methods). Candidate variants were classified as “pathogenic” or “likely pathogenic” according to ACMG criteria.27
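The first prioritization tier can be sketched as two small steps: merge several gene lists into one inclusive candidate set, then keep de novo coding variants that fall in that set. The code below is a plain-Python illustration only. It does not use or reproduce the GEMINI or gene.iobio interfaces, and the `Variant` record, the function names, and the tiny example lists in the usage section are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Consequence classes targeted in the first tier, as described in the text.
CODING_CONSEQUENCES = {"missense", "frameshift", "nonsense"}


@dataclass(frozen=True)
class Variant:
    """Minimal, hypothetical variant record used only for this sketch."""
    gene: str
    consequence: str
    is_de_novo: bool


def build_candidate_genes(*gene_lists: Iterable[str]) -> Set[str]:
    """Union several gene lists, normalizing symbols to upper case."""
    merged: Set[str] = set()
    for genes in gene_lists:
        merged.update(symbol.strip().upper() for symbol in genes)
    return merged


def first_tier(variants: Iterable[Variant], candidate_genes: Set[str]) -> List[Variant]:
    """Keep de novo coding variants located in a candidate gene."""
    return [
        v for v in variants
        if v.is_de_novo
        and v.consequence in CODING_CONSEQUENCES
        and v.gene.upper() in candidate_genes
    ]


if __name__ == "__main__":
    # Made-up stand-ins for a panel-derived list and a database-derived list.
    panel = ["SCN1A", "KCNQ2"]
    database = ["kcnq2", "STXBP1"]
    genes = build_candidate_genes(panel, database)
    calls = [
        Variant("SCN1A", "missense", True),
        Variant("SCN1A", "synonymous", True),
        Variant("TTN", "missense", True),
    ]
    print(first_tier(calls, genes))
```

Taking the union rather than the intersection of the source lists keeps the candidate set inclusive, which matches the stated goal of not discarding a plausible gene at the first tier.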

Table 2 Mutations and affected genes identified for each subject

In 9 of the 14 subjects, GEMINI identified a single de novo variant with high confidence in pathogenicity. Of these, seven subjects carried de novo missense variants in ion-channel genes (SCN1A, SCN2A, SCN8A, KCNQ2) with known association to EIEE (Table 2, Supp. Table 5). One subject had a de novo missense variant in the eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) gene, and another subject harbored a one-base-pair frameshift insertion in the syntaxin-binding protein 1 (STXBP1) gene. In addition, gene.iobio identified a likely pathogenic mutation in a tenth subject within the phosphatidylinositol glycan anchor biosynthesis class A (PIGA) gene. Notably, these procedures allowed us to rapidly (in less than 5 min) screen a comprehensive candidate gene list and identify diagnostic variants in EIEE-associated genes for 10 of the 14 subjects (subjects #1, 3–6, 8, 11–14).

For the remaining four subjects, we searched for de novo SVs predicted to disrupt genes that have been previously implicated in EIEE. In subject #7, we detected a 63 kb de novo duplication within CDKL5. This copy number mutation created a tandem duplication of exons 5 through 15 (Fig. 1a) that we predicted to cause a frameshift when splicing of the mutant transcript joins exon 15 with the duplicated exon 5. In turn, the frameshift is predicted to create a stop codon five amino acids downstream from the end of the first copy of exon 15. The tandem duplication, frameshift, and stop gain were confirmed by sequencing cDNA derived from a fresh blood sample from subject #7 (Fig. 1b). This mutation is predicted to have an X-linked recessive effect in our male patient in a gene previously associated28,29,30 with EIEE.
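The frameshift prediction follows from simple arithmetic on exon lengths: when a block of coding exons is duplicated in tandem, the second copy is read in the original frame only if the total coding length of the block is a multiple of three. The sketch below illustrates that check and a scan for the first in-frame stop codon. The exon lengths and the sequence in the usage section are invented for illustration and are not the CDKL5 values, and the check assumes every duplicated exon is entirely coding.

```python
from typing import Optional, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def duplication_shifts_frame(exon_coding_lengths: Sequence[int]) -> bool:
    """True if a tandem duplication of these coding exons shifts the reading frame.

    The second copy starts in the phase left by the end of the first copy,
    so the frame is preserved only when the block length is divisible by 3.
    """
    return sum(exon_coding_lengths) % 3 != 0


def codons_before_stop(sequence: str, start: int = 0) -> Optional[int]:
    """Count sense codons read from `start` before the first stop codon.

    Returns None if no stop codon is found in that frame.
    """
    seq = sequence.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


if __name__ == "__main__":
    # Hypothetical coding lengths for a duplicated block of exons.
    block = [100, 50, 31]
    print(duplication_shifts_frame(block))
    # Hypothetical sequence read across a junction, scanned in two frames.
    print(codons_before_stop("ATGGCCTAA"), codons_before_stop("ATGGCCTAA", start=1))
```

A length check of this kind only predicts the frameshift; confirming it, as was done here by cDNA sequencing, still requires observing the spliced transcript.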

Fig. 1 a A 63 kb de novo tandem duplication in CDKL5 duplicates exons 5 through 15 (for Ensembl canonical transcript ENST00000379989) in subject 7. b Targeted cDNA sequencing confirms the predicted frameshift and stop gain mutation caused by the de novo tandem duplication

For subjects #2, #9, and #10, we then searched for de novo missense or putative loss of function (i.e., nonsense, frameshift, and splice donor/acceptor) mutations in protein coding regions of genes not previously associated with EIEE. This search yielded 18, 22, and 12 GATK-called variants, and 0, 1, and 2 RUFUS-called variants, respectively. After manually excluding low-quality variant calls and reviewing the potential for association with the phenotype, we excluded all but a single de novo variant (i.e., the one called by RUFUS) in subject #9, located in the DNA-binding SAND domain of the DEAF1 gene. Missense variants in the SAND domain of DEAF1 have been previously reported in association with dominant intellectual disability phenotypes and a severe recessive epilepsy phenotype.31,32 The same allele identified in subject #9 (p.G212S) was recently reported in a 15-year-old male with developmental regression and seizures.33 Functional studies suggest that this allele eliminates both DEAF1 transcriptional repression activity and DEAF1–DNA interactions.

Subject #10 harbored a de novo missense variant in CAMK2G, the gamma subunit of the calcium/calmodulin-dependent protein kinase II (CAMKII) complex. CAMKII is a multi-subunit complex that plays an essential role in synaptic function including learning and memory.34 The alpha and beta isoforms (CAMK2A and CAMK2B) are involved in calcium signaling in glutamatergic synapses.35 Furthermore, the CAMKII complex has been implicated in temporal lobe epilepsy,36 and de novo mutations in CAMK2A and CAMK2B were reported to cause intellectual disability.37 The variant identified in our subject replaces a threonine with a methionine in a highly conserved region of the catalytic subunit of CAMK2G. This variant is extremely rare: it is observed as a heterozygote in only one Finnish individual of >138,000 individuals sequenced in the gnomAD database,38 and incomplete penetrance could explain the lack of a known seizure phenotype for the gnomAD individual. Although CAMK2G has not been directly associated with epilepsy or other clinical phenotypes, it has been predicted to be a drug target for refractory epilepsies.39 A separate de novo variant in CAMK2G (c.1075G>A, p.V359M) was observed in a developmental disorder proband as part of the DDD study,40 but details of its pathogenicity and the associated phenotype were not available.

Lastly, for subject #2 we identified a de novo, inverted, balanced translocation between chromosome 2p16.1 and chromosome Xq28 (Fig. 2). This rearrangement moves a short but gene-dense segment of chromosome X to chromosome 2. The translocated segment of chromosome X includes 92 genes with a breakpoint between MAGEA4 and GABRE. In this segment, three genes coding for GABA receptor subunits (GABRE, GABRA3, and GABRQ) and MECP2 have potential neurological phenotypes. Other GABA receptor genes including GABRA1, GABRB1, and GABRB3 have been associated with severe epilepsy phenotypes.41 While we did not find a sequence variant associated with epilepsy in this subject, the translocation likely disrupts patterns of X-inactivation and alters transcription patterns.42 In addition, MECP2 is associated with Rett syndrome and is approximately 2 Mb from the translocation breakpoint. There is some phenotypic similarity between subject #2 and patients with Rett syndrome, including microcephaly, seizures, and developmental regression. Moreover, a Rett syndrome phenotype was described in a previous patient42 with a pericentric inversion in the vicinity of MECP2. We also identified a de novo variant in subject #2 that impacts an intronic or upstream (depending on the isoform) Pol II binding site within MECP2, though it is unclear whether this variant changes the transcript level. Given the known association between MECP2 and infantile seizure disorders, as well as the Rett-like phenotype of this subject, we hypothesize that the disruption of MECP2 transcription is the most plausible mechanism.

Fig. 2 An inverted, reciprocal translocation between chromosomes X and 2. a The inverted translocation in subject 2 results in DNA exchange between the X chromosome and chromosome 2. The chromosome 2 break occurred in the p arm at position 59,405,748, leaving minor (24%) and major (76%) portions, and the chromosome X break occurred at the extreme q arm at position 151,118,513, leaving minor (3%) and major (97%) portions. As a result, GABRE, GABRA3, and MECP2 are translocated from the X chromosome to chromosome 2. b A de novo mutation in subject 2 is also observed that is intronic to multiple isoforms (e.g., ENST00000303391) of MECP2 and upstream of other isoforms (e.g., ENST00000415944) of MECP2. The mutation lies within the observed binding site of multiple transcription factors, including Pol II

This study represents the first diagnostic application of our RUFUS de novo mutation detection method23 (manuscript in preparation). In contrast to the read alignment-based variant detection methods that are most commonly used today, the alignment-free, k-mer-based RUFUS algorithm directly compares k-mers in the sequencing reads between a child and his/her parents to identify child-specific k-mers that suggest de novo mutations. This strategy avoids the vast majority of the false positive mutation calls that arise from read alignment artifacts in alignment-based methods. Therefore, the main advantage of RUFUS over alignment-based detection approaches is its much higher specificity for calling mutations. For example, RUFUS detected on average 1.7 coding de novo mutations per subject, as compared to the average of 61.8 de novo mutations detected by GATK (Supp. Table 3). In fact, in 6 of the 14 subject genomes, RUFUS called only a single coding variant, and in 7 of the 14 subjects only a single amino acid-changing variant (see Fig. 3 for an example). Furthermore, RUFUS detects all forms of de novo mutation in a single step, including SNVs, short INDELs, and SVs, thereby eliminating the need to run multiple detection programs on the data. RUFUS detected all diagnostic and putative disease-causing mutations uncovered in this study, while reporting only a handful of additional mutations affecting coding sequences.
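The core idea of comparing k-mers between a child and the parents can be shown with a toy example. The sketch below is not the RUFUS implementation: it simply counts k-mers in each sample and reports those seen at least `min_child_count` times in the child and never in either parent. It ignores reverse complements, sequencing-error modeling, and any later assembly or variant-calling steps, and the function names, the threshold, and the example reads are all illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Sequence, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer of length `k` across a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def child_specific_kmers(
    child_reads: Iterable[str],
    parent_read_sets: Sequence[Iterable[str]],
    k: int,
    min_child_count: int = 2,
) -> Set[str]:
    """Return k-mers well supported in the child and absent from all parents."""
    child_counts = count_kmers(child_reads, k)
    parental: Set[str] = set()
    for reads in parent_read_sets:
        parental.update(count_kmers(reads, k))
    return {
        kmer for kmer, n in child_counts.items()
        if n >= min_child_count and kmer not in parental
    }


if __name__ == "__main__":
    # Toy reads: the child differs from both parents by one substitution.
    mother = ["ACGTTGCAAGGCT"]
    father = ["ACGTTGCAAGGCT"]
    child = ["ACGTTGCTAGGCT", "ACGTTGCTAGGCT"]
    print(sorted(child_specific_kmers(child, [mother, father], k=5)))
```

In this toy case a single substitution produces a cluster of k overlapping child-specific k-mers, and requiring more than one supporting read in the child is a crude guard against k-mers created by isolated sequencing errors.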