When treating seriously ill children, time is of the essence. Clark et al. built an automated pipeline to analyze EHR data and genome sequencing data from dried blood spots to deliver a potential diagnosis for hospitalized, often critically ill, children with suspected genetic diseases. Their pipeline required minimal user intervention, increasing usability and shortening time to diagnosis, delivering a provisional finding in a median time of less than 24 hours. Although this pipeline would need to be adapted for use at different hospital systems, such an automated tool could aid clinicians to expedite an accurate genetic disease diagnosis, potentially hastening lifesaving changes to patient care.

By informing timely targeted treatments, rapid whole-genome sequencing can improve the outcomes of seriously ill children with genetic diseases, particularly infants in neonatal and pediatric intensive care units (ICUs). The need for highly qualified professionals to decipher results, however, precludes widespread implementation. We describe a platform for population-scale, provisional diagnosis of genetic diseases with automated phenotyping and interpretation. Genome sequencing was expedited by bead-based genome library preparation directly from blood samples and sequencing of paired 100-nt reads in 15.5 hours. Clinical natural language processing (CNLP) automatically extracted children’s deep phenomes from electronic health records with 80% precision and 93% recall. In 101 children with 105 genetic diseases, a mean of 4.3 CNLP-extracted phenotypic features matched the expected phenotypic features of those diseases, compared with a match of 0.9 phenotypic features used in manual interpretation. We automated provisional diagnosis by combining the ranking of the similarity of a patient’s CNLP phenome with respect to the expected phenotypic features of all genetic diseases, together with the ranking of the pathogenicity of all of the patient’s genomic variants. Automated, retrospective diagnoses concurred well with expert manual interpretation (97% recall and 99% precision in 95 children with 97 genetic diseases). Prospectively, our platform correctly diagnosed three of seven seriously ill ICU infants (100% precision and recall) with a mean time saving of 22:19 hours. In each case, the diagnosis affected treatment. Genome sequencing with automated phenotyping and interpretation in a median of 20:10 hours may increase adoption in ICUs and, thereby, timely implementation of precise treatments.

We previously described diagnosis by rWGS in 26 hours in a research setting ( 16 , 17 ). In the clinical studies reported to date, however, the fastest genetic diagnosis by genome sequencing was 37 hours, the mean time to diagnosis was 16 days, and the largest cohort comprised only 63 patients ( 8 , 16 – 30 ). The small cohort size and longer time to diagnosis in those clinical studies substantiate the limitations of current methods of rWGS. Here, we report methods for clinical diagnosis of genetic diseases in a median of 20:10 hours that can be scaled to 30 patients per week per genome sequencing instrument, with automated provisional diagnosis.

Clinical studies are starting to substantiate the diagnostic and clinical utility and cost effectiveness of rWGS in seriously ill infants in ICUs, with reported rates of diagnosis of 42 to 57%, changes in medical management in 30 to 72% of cases, and altered outcomes in 24 to 34% of cases ( 12 , 14 , 16 – 30 ). This evidence has led to calls for accelerated implementation in national health care systems as the new standard of care ( 31 – 33 ). The National Health Service of the United Kingdom, for example, will offer whole-genome sequencing as part of care for all seriously ill children from 2019 ( 34 ). The major impediments to universal implementation in ICUs are absence of reimbursement outside the United Kingdom, lack of knowledge of genomic medicine by pediatricians, and the high capital and labor intensity of current clinical rWGS and interpretation.

Genetic diseases are the leading cause of infant mortality in the United States, particularly among about 15% of infants admitted to neonatal, pediatric, and cardiovascular intensive care units (ICUs) ( 1 – 11 ). As disease progression in infants is rapid, etiologic diagnosis must be equally fast to inform interventions that can lessen suffering, morbidity, and mortality ( 12 , 13 ). Unfortunately, this is rarely the case. More than 13,000 genetic diseases are known ( 14 , 15 ), and their presentations often overlap in seriously ill infants and are typically abridged with respect to classical descriptions ( 14 , 15 ). Standard genome sequencing takes weeks to return results, which is too slow to guide inpatient management. Rapid whole-genome sequencing (rWGS) provides faster diagnosis, enabling precision medicine interventions in time to decrease the morbidity and mortality of infants with genetic diseases ( 12 , 13 ). Furthermore, in genetic diseases with uniformly dismal prognosis, rapid diagnosis facilitates end-of-life care decisions that can alleviate suffering and aid the grieving process.

The third diagnosis was made in patient 412, a 3-day-old boy admitted to the neonatal ICU with seizures and a strong family history of infantile seizures responsive to phenobarbital. The autonomous system identified a likely pathogenic, heterozygous variant in the potassium voltage-gated channel, KQT-like subfamily, member 2 gene (KCNQ2 c.1051C > G). This gene is associated with autosomal dominant benign familial neonatal seizures 1 (OMIM disease record 121200). The diagnosis was made in 20:53 hours, which was 27:30 hours earlier than a concurrent run with the fastest manual methods. A verbal provisional result was conveyed to the clinical team upon review of the result by a laboratory director as the diagnosis provided confidence in treatment with phenobarbital and changed the prognosis. For the remaining four patients, no diagnosis was evident with either the manual or autonomous method.

The second diagnosis was made in patient 7052, a previously healthy 17-month-old boy admitted to the pediatric ICU with pseudomonal septic shock, metabolic acidosis, ecthyma gangrenosum, and hypogammaglobulinemia. Rapid sequencing of the singleton proband, with automated interpretation, identified a pathogenic hemizygous variant in the Bruton tyrosine kinase gene (BTK c.974 + 2 T > C) associated with X-linked agammaglobulinemia 1 (OMIM #300755) in 22:04 hours. This was 16:33 hours earlier than a concurrent trio run with the fastest manual methods. The provisional result provided confidence in treatment with high-dose intravenous immunoglobulin (to maintain serum immunoglobulin G concentration of >600 mg/dl) and 6 weeks of antibiotic treatment. This provisional diagnosis was verbally conveyed to the clinical team upon review of the autonomous result by a laboratory director. Clinical whole-genome sequencing subsequently returned the same result and showed the variant to be maternally inherited.

We prospectively compared the performance of the autonomous diagnostic system with the fastest manual methods in seven seriously ill infants in ICUs and three previously diagnosed infants ( Table 1 ). The median time from blood sample to diagnosis with the autonomous platform was 19:56 hours (range, 19:10 to 31:02 hours), compared with the median manual time of 48:23 hours (range, 34:38 to 56:03 hours). This included two automated runs that were delayed by operator error or data center downtime. The autonomous system coupled with InterVar post-processing made three diagnoses and no false-positive diagnoses. All three diagnoses were confirmed by manual methods and Sanger sequencing. The first was for patient 352, a 7-week-old female, admitted to the pediatric ICU with diabetic ketoacidosis. rWGS was performed on the singleton proband. In 19:11 hours, the autonomous system identified a previously unreported, heterozygous missense variant in the insulin gene (INS c.26C > G, p.Pro9Arg), which is associated with autosomal dominant permanent neonatal diabetes mellitus (OMIM disease record 606176). According to American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) pathogenicity criteria, the variant was of uncertain significance (VUS). After 42:04 hours, parent-child trio sequencing with the fastest manual methods confirmed the result and showed the variant to be de novo, which changed the variant classification to likely pathogenic.
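The reported mean time saving can be checked with a short calculation. The sketch below assumes that the 42:04 figure is the manual time to result for patient 352, and it takes the savings for patients 7052 and 412 (16:33 and 27:30 hours) from their case descriptions; the helper names are illustrative only.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'hh:mm' duration string to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total_minutes: float) -> str:
    """Format a duration in minutes as 'hh:mm', rounded to the nearest minute."""
    rounded = round(total_minutes)
    return f"{rounded // 60}:{rounded % 60:02d}"


# Patient 352: manual trio result (assumed 42:04) minus autonomous result (19:11).
saving_352 = to_minutes("42:04") - to_minutes("19:11")

# Savings for patients 7052 and 412 as stated in their case descriptions.
savings = [saving_352, to_minutes("16:33"), to_minutes("27:30")]

mean_saving = to_hhmm(sum(savings) / len(savings))
print(to_hhmm(saving_352), mean_saving)
```

Under these assumptions the three savings average to the 22:19 hours quoted for the prospective diagnoses.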

Infant 6159, with autosomal dominant Alport syndrome (COL4A4 c.4715C > T, p.Pro1572Leu), had hematuria, nephrotic syndrome, glomerulonephritis, hypertension, and anasarca. OMIM indicated that COL4A4-associated Alport syndrome (CAS) was autosomal recessive, and p.Pro1572Leu was recorded as pathogenic in ClinVar for autosomal recessive Alport syndrome. There are, however, a large number of reports of autosomal dominant CAS. The variant was maternally inherited. Because the infant’s mother was asymptomatic, we assumed that she exhibited incomplete penetrance of autosomal dominant CAS, as has been reported ( 51 , 52 ). The autonomous system classified the infant as a carrier for autosomal recessive CAS.

We retrospectively examined the concordance between the autonomous system and previous, team-based, manual expert interpretation in 95 of the 101 children, diagnosed with 97 of the 105 genetic diseases (table S15). We excluded eight findings that had been reported but that were considered incidental (without current evidence of any of the expected phenotypic features). This cohort was diverse in race and ancestry. Eleven diagnoses were associated with SVs, and 86 were associated with nucleotide variants. No training patients were included in the test set. In two patients, a revised clinical report was issued of a new diagnosis (infant 6007, EIEE9, Xp22 del, and patient 6033, Cockayne syndrome B, ERCC6 p.Gly528Glu and c.-15 + 3G > T, which was validated by functional studies). Therefore, initial expert manual interpretation had a recall of 98% (95 of 97). Although we did not re-analyze manual diagnoses, none of them had been demoted in the period since initially reported clinically. The autonomous diagnostic system had a precision of 99% (93 of 94) and a recall of 97% (94 of 97). For nucleotide variants and SVs, the median rank of the correct diagnosis was first (range, 1 to 4 for nucleotide variants; range, 1 to 13 for SVs) (table S18).

We also wrote scripts to automatically transfer a patient’s nucleotide and structural variants (SVs) from the DRAGEN platform to MOON as soon as variant calling finished, without user intervention. For rWGS, there was a mean of 4,742,595 nucleotide variants and 19.3 SVs, and rapid whole-exome sequencing (rWES) had a mean of 39,066 nucleotide variants and 10.3 SVs per patient (table S16). Of these, MOON retained 67,589 nucleotide variants and 12 SVs and 791 nucleotide variants and 4.5 SVs for rWGS and rWES, respectively, that had allele frequencies of <2% and affected known disease genes (table S17). A Bayesian framework and probabilistic model in MOON ranked the pathogenicity of these variants with 15 in silico prediction tools, ClinVar assertions, and inheritance pattern–based allele frequencies. In singleton and family trio analyses, on average five and three provisional diagnoses were ranked, respectively (table S18). Because MOON was optimized for sensitivity, it shortlisted a median of six nucleotide variants per diagnosed subject (range, 2 to 24) and often shortlisted false-positive diagnoses in cases considered negative by manual interpretation. Both were largely remedied, however, by processing the MOON output in InterVar software and retaining only pathogenic and likely pathogenic variants ( 49 ). InterVar classified variants with regard to 18 of the 28 consensus pathogenicity recommendations ( 50 ), specifically triaging variants of uncertain significance (VUS). Automated interpretation took a median of 5 min from transfer of variants and HPO terms to display of the provisional diagnosis and supporting evidence, including patient phenotypic features matching that disorder, for laboratory director review. In four timed runs, the time from blood samples or blood spot receipt to display of the correct diagnosis as the top-ranked variant was 19:14 to 20:25 hours (median, 19:38 hours; Table 1 , retrospective cases).
This conformed well to a daily clinical operation cycle: Sample receipt in the morning enabled library preparation in the afternoon, genome sequencing overnight, and provisional reporting early the following morning for laboratory director review.
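The two filtering stages described above (retention of rare variants in known disease genes, then triage to pathogenic and likely pathogenic classifications) can be sketched as follows. This is a minimal illustration, not the MOON or InterVar implementation: the `Variant` fields, gene symbols, and example values are hypothetical, and only the <2% allele-frequency threshold and the retained classifications come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    gene: str
    allele_frequency: float
    classification: str  # hypothetical ACMG/AMP-style label


# Hypothetical stand-in for the list of known disease genes.
KNOWN_DISEASE_GENES = {"GENE_A", "GENE_B"}
MAX_ALLELE_FREQUENCY = 0.02  # "<2%" threshold stated in the text
REPORTABLE = {"pathogenic", "likely pathogenic"}


def shortlist(variants):
    """Stage 1: keep rare variants that affect a known disease gene."""
    return [
        v for v in variants
        if v.allele_frequency < MAX_ALLELE_FREQUENCY
        and v.gene in KNOWN_DISEASE_GENES
    ]


def triage(variants):
    """Stage 2: keep only pathogenic and likely pathogenic variants."""
    return [v for v in variants if v.classification in REPORTABLE]


variants = [
    Variant("v1", "GENE_A", 0.0001, "likely pathogenic"),
    Variant("v2", "GENE_A", 0.15, "benign"),                 # too common
    Variant("v3", "GENE_C", 0.001, "pathogenic"),            # not a known disease gene
    Variant("v4", "GENE_B", 0.01, "uncertain significance"), # removed at triage
]

candidates = shortlist(variants)
provisional = triage(candidates)
print([v.name for v in candidates], [v.name for v in provisional])
```

Keeping the stages separate mirrors the design described in the text, in which a sensitive first pass is followed by a stricter classification step.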

The remaining step in automated diagnosis of genetic diseases was to combine the automated ranking of the patient’s CNLP phenome with respect to all genetic diseases, together with the automated ranking of the pathogenicity of all their genomic variants based on literature knowledge and in silico tools ( Fig. 1 and fig. S3). We wrote scripts to automatically transfer the patient’s CNLP-derived phenotypic features and genomic variants to autonomous interpretation software (MOON, Diploid). MOON identified the phenotypic features associated with each genetic disease by natural language processing of the medical literature. Typically, this was a larger set of phenotypic features than those listed in the OMIM Clinical Synopsis. MOON then compared the patient’s phenotypic features with those associated with each genetic disease and rank-ordered the genetic diseases on the basis of their likelihood of causing the child’s illness.
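The idea of combining a phenotype-based disease ranking with a variant pathogenicity ranking can be illustrated with a simple rank-sum sketch. This is not the method used by MOON, whose model is described only as Bayesian and probabilistic; the rank-sum rule, the disease labels, and all scores below are assumptions chosen purely for illustration.

```python
def rank_descending(scores):
    """Map each key to its 1-based rank, with the highest score ranked first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: position + 1 for position, key in enumerate(ordered)}


def combine(phenotype_similarity, variant_pathogenicity):
    """Order diseases that have a candidate variant by the sum of both ranks."""
    phenotype_rank = rank_descending(phenotype_similarity)
    variant_rank = rank_descending(variant_pathogenicity)
    worst = len(phenotype_rank) + 1  # rank for diseases with no phenotype score
    combined = {
        disease: phenotype_rank.get(disease, worst) + variant_rank[disease]
        for disease in variant_rank
    }
    return sorted(combined, key=lambda disease: (combined[disease], disease))


# Hypothetical similarity of the patient's phenome to each disease.
phenotype_similarity = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1}
# Hypothetical pathogenicity of the best variant found for each disease.
variant_pathogenicity = {"B": 0.6, "C": 0.9, "D": 0.2}

ranking = combine(phenotype_similarity, variant_pathogenicity)
print(ranking)
```

In this toy example, disease A matches the phenome best but has no candidate variant, so it is never proposed; disease C ranks first because it scores well on both axes.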

Traditionally, genetic diseases have been clinically diagnosed by the identification of one or more pathognomonic phenotypic features. Such phenotypic features have high IC (the negative logarithm of the probability of that phenotypic feature being observed in all OMIM diseases; Fig. 2 ) ( 48 ). A potential concern was that phenotypic features extracted by CNLP would have less IC than those prioritized manually by experts during interpretation. However, among the 101 children, the mean IC of CNLP phenotypic features (8.1; SD, 2.0; range, 2.6 to 11.4) was significantly higher than that of manual (7.8; SD, 2.0; range, 2.1 to 11.4; P = 0.003, Mann-Whitney U test) or OMIM phenotypic features (7.3; SD, 1.7; range, 3.2 to 11.4; P < 0.0001, Mann-Whitney U test) ( Fig. 3E ). We note that the mean IC correlated significantly with the number of phenotypic features extracted manually and by CNLP [Spearman’s rho, 0.24 (P = 0.02) and 0.44 (P < 0.0001), respectively; Fig. 3C ]. Among the 274 children who were not diagnosed by genome sequencing, the mean IC of CNLP phenotypic features was likewise higher than that of manual phenotypic features ( Fig. 3F ), and the mean IC correlated significantly with the number of phenotypic features extracted by CNLP [Spearman’s rho, 0.30 (P < 0.0001); Fig. 3G ].

In the 101 diagnosed children, phenotypic features extracted by CNLP overlapped expected OMIM phenotypic features (mean, 4.31 terms; SD, 4.59; range, 0 to 32) significantly more than the manually extracted phenotypic features (mean, 0.92 terms; SD, 1.02; range, 0 to 4; P < 0.0001, paired Wilcoxon test) ( Fig. 3B ). Although the cohort included eight genetic diseases that were incidental findings, their exclusion did not materially change these results (table S15 and fig. S1). Thus, the recall of OMIM phenotypic features by CNLP, although small (mean, 0.20; SD, 0.16; range, 0 to 0.67), was substantially greater than the sparse expert manual phenotypic features used in expert manual interpretation (mean, 0.04; SD, 0.06; range, 0 to 0.25) (fig. S2). However, the much larger number of phenotypic features extracted by CNLP was associated with lower precision (mean, 0.04; SD, 0.03; range, 0 to 0.15) than manual extraction (mean, 0.25; SD, 0.30; range, 0 to 1) when compared with OMIM, indicating that, by design, an autonomous diagnostic system should not penalize false-positive phenotypic features. Recall and F1 values increased when phenotypic features with one degree of hierarchical separation to those extracted were included [(mean CNLP recall with inexact matches, 0.29; SD, 0.22; range, 0 to 1), (mean CNLP F1 with inexact matches, 0.12; SD, 0.08; range, 0 to 0.38), and (mean CNLP F1 with exact matches, 0.06; SD, 0.05; range, 0 to 0.23)], indicating that, by design, an autonomous system should include hierarchical parents of extracted terms (fig. S2).
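The difference between exact and inexact (one degree of hierarchical separation) matching can be made concrete with a small sketch. The toy hierarchy and term names below are hypothetical rather than real HPO identifiers, and the rule used for inexact matching (a term counts as matched if it, its parent, or one of its direct children appears in the other set) is an assumption about how such matching could be scored, not the authors' stated procedure.

```python
# Hypothetical toy hierarchy (child -> parent); not real HPO identifiers.
PARENT = {
    "neonatal_seizure": "seizure",
    "seizure": "neuro_abnormality",
    "ketoacidosis": "acidosis",
}


def neighbours(term):
    """Return the term together with its parent and direct children."""
    near = {term}
    if term in PARENT:
        near.add(PARENT[term])
    near.update(child for child, parent in PARENT.items() if parent == term)
    return near


def scores(extracted, expected, inexact=False):
    """Precision, recall, and F1 of extracted terms against expected terms."""
    def hit(term, others):
        candidates = neighbours(term) if inexact else {term}
        return bool(candidates & others)

    precision = sum(hit(t, expected) for t in extracted) / len(extracted) if extracted else 0.0
    recall = sum(hit(t, extracted) for t in expected) / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


extracted = {"seizure", "acidosis", "fever", "cough"}  # e.g. CNLP output
expected = {"neonatal_seizure", "acidosis"}            # e.g. expected disease features

exact = scores(extracted, expected)
relaxed = scores(extracted, expected, inexact=True)
print(exact, relaxed)
```

In this toy case the extracted parent term "seizure" fails to match the expected "neonatal_seizure" exactly but matches once one-step relatives are allowed, which raises both recall and F1.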

( A to D ) One hundred one children diagnosed with 105 genetic diseases. ( E to H ) Two hundred seventy-four children with suspected genetic diseases that were not diagnosed by genome sequencing. Phenotypic features identified by manual EHR review are in yellow, those identified by CNLP are in red, and the expected phenotypic features, derived from the OMIM Clinical Synopsis, are in blue. (A) Frequency distribution of the number of phenotypic features (log-transformed) in 101 children with genetic diseases. The mean number of features detected per patient was 4.2 (SD, 2.6; range, 1 to 16) for manual review, 116.1 (SD, 93.6; range, 13 to 521) for CNLP, and 27.3 (SD, 22.8; range, 1 to 100) for OMIM (OMIM versus manual, P < .0001; CNLP versus OMIM, P < .0001; CNLP versus manual, P < 0.0001; paired Wilcoxon tests). (B) Frequency distribution of IC for each phenotypic feature set in 101 diagnosed patients. The mean IC was 7.8 (SD, 2.0; range, 2.1 to 11.4) for manual review, 8.1 (SD, 2.0; range, 2.6 to 11.4) for CNLP, and 7.3 (SD, 1.7; range, 3.2 to 11.4) for OMIM (manual versus OMIM, P < .0001; CNLP versus OMIM, P < .0001; manual versus CNLP, P = 0.003; Mann-Whitney U tests). (C) Correlation of the mean IC of phenotypic terms with the number of phenotypic terms in each patient. Spearman’s rank correlation coefficient (r s ) was 0.24 for manually extracted phenotypic features (P = 0.02), 0.44 for CNLP (P < 0.0001), and −0.001 for OMIM (P > 0.05). (D) Venn diagram showing overlap of phenotypic terms by the three methods for diagnosed patients. Phenotypic features extracted by CNLP overlapped expected OMIM phenotypic features (mean, 4.31 terms; SD, 4.59; range, 0 to 32) significantly more than manually (mean, 0.92 terms; SD, 1.02; range, 0 to 4; P < 0.0001, paired Wilcoxon test for the difference in the number of terms that overlap with OMIM). 
(E) Frequency distribution of the number of phenotypic features (log-transformed) in 274 children with suspected genetic diseases that were not diagnosed by genome sequencing. The mean number of features was 3.0 (SD, 1.9; range, 1 to 12) for manual review and 90.7 (SD, 81.1; range, 6 to 482) for CNLP (CNLP versus manual, P < 0.0001; paired Wilcoxon test). (F) Frequency distribution IC for each phenotypic feature set in 274 undiagnosed patients. The mean IC was 7.7 (SD, 2.1; range, 2.1 to 11.4) for manual review and 8.1 (SD, 2.0; range, 2.6 to 11.4) for CNLP (manual versus CNLP, P < 0.0001; Mann-Whitney U test). (G) Correlation of the mean IC of phenotypic terms with the number of phenotypic terms in each patient. r s was 0.02 for manually extracted phenotypic features (P > 0.05) and 0.30 for CNLP (P < 0.0001). (H) Venn diagram showing overlap of phenotypic terms for undiagnosed patients by CNLP and manual methods.

The performance of the optimized CNLP was tested with the EHRs of 10 test children who had received genome sequencing for genetic disease diagnosis. The training and test sets did not overlap. Both exact EHR phenotypic feature matches and their hierarchical root terms were extracted from the first record until time of enrollment for genome sequencing. CNLP identified a mean of 86.7 phenotypic features (SD, 32.8; range, 26 to 158) (table S4) in about 20 s per patient. A detailed manual review of the EHR was performed to identify all true-positive, false-positive, and false-negative CNLP phenotypic features in the test children. The precision (positive predictive value) of CNLP was 80% and the recall (sensitivity) was 93% (table S4), which were superior to previous CNLP-based extraction of HPO terms ( 36 , 41 ). The principal reasons for false positives were as follows: (i) incorrect CLiX encoding (n = 89, 38% of 237 phenotypic features) due to misinterpreted context (n = 31), unrecognized headings (n = 23), incorrect acronym expansion (n = 21), incorrect interpretation of a clinical word (n = 8), or incorrectly attributed finding site for disease (n = 6); (ii) ambiguity of source text (unrecognized or incorrect syntax, abbreviations, acronyms, or terminology; n = 46, 19% of 237); (iii) incongruity among SNOMED CT, HPO, and clinical acumen (n = 20, 8%); (iv) failure to recognize a pasted citation as nonclinical text (n = 68, 29%); and (v) incorrect query logic (n = 14, 6%) (tables S5 to S14).
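The false-positive breakdown above can be tallied to confirm that the five categories account for all 237 false-positive phenotypic features and that the sub-causes of incorrect CLiX encoding sum to 89. The counts are taken from the text; the short category labels are paraphrases introduced here.

```python
# Counts of false-positive phenotypic features by principal cause (from the text).
false_positive_causes = {
    "incorrect CLiX encoding": 89,
    "ambiguity of source text": 46,
    "incongruity among SNOMED CT, HPO, and clinical acumen": 20,
    "pasted citation not recognized as nonclinical text": 68,
    "incorrect query logic": 14,
}

# Sub-causes of incorrect CLiX encoding (from the text).
encoding_sub_causes = {
    "misinterpreted context": 31,
    "unrecognized headings": 23,
    "incorrect acronym expansion": 21,
    "incorrect interpretation of a clinical word": 8,
    "incorrectly attributed finding site": 6,
}

total = sum(false_positive_causes.values())
shares = {
    cause: round(100 * count / total)
    for cause, count in false_positive_causes.items()
}

for cause, percent in shares.items():
    print(f"{percent:>3d}%  {cause}")
```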

( A ) Example CNLP of a sentence from the EHR of an 8-day-old baby (patient 341) with maple syrup urine disease, showing four extracted HPO terms. ED, emergency department. ( B ) Hierarchical display of HPO phenotypic features extracted by manual review of the EHR of neonate 341 and by CNLP (red) and expected phenotypic features (from the OMIM Clinical Synopsis; blue). Yellow circles: Phenotypic features extracted by both CNLP and expert review. Purple circles: Phenotypic overlap between CNLP and OMIM. Gray circles: The location of parent terms of identified phenotypic features within the HPO hierarchy. The information content (IC) was defined by $\mathrm{IC}(\text{phenotype}) = -\log(p_{\text{phenotype}})$, where $p_{\text{phenotype}}$ was the probability of observing the exact term or one of its subclasses across all diseases in OMIM. IC increases from top (general) to bottom (specific).
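The IC definition can be illustrated with a toy calculation. The hierarchy and disease annotations below are hypothetical stand-ins for HPO and the OMIM Clinical Synopsis, and the natural logarithm is assumed because the base is not stated.

```python
import math

# Hypothetical toy hierarchy (child -> parent); not real HPO identifiers.
PARENT = {
    "seizure": "neuro_abnormality",
    "neonatal_seizure": "seizure",
    "hypotonia": "neuro_abnormality",
}

# Hypothetical disease annotations standing in for OMIM Clinical Synopsis entries.
DISEASE_TERMS = {
    "disease_a": {"neonatal_seizure"},
    "disease_b": {"seizure", "hypotonia"},
    "disease_c": {"hypotonia"},
    "disease_d": {"neonatal_seizure", "hypotonia"},
}


def ancestors_and_self(term):
    """Return the term and every ancestor above it in the hierarchy."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def information_content(term):
    """IC = -log(p), where p is the fraction of diseases annotated with
    the term itself or with any of its subclasses."""
    hits = sum(
        1
        for terms in DISEASE_TERMS.values()
        if any(term in ancestors_and_self(t) for t in terms)
    )
    if hits == 0:
        raise ValueError(f"{term!r} is not observed in any disease")
    # log(total / hits) equals -log(hits / total).
    return math.log(len(DISEASE_TERMS) / hits)


for name in ("neuro_abnormality", "seizure", "neonatal_seizure"):
    print(name, round(information_content(name), 3))
```

General terms near the top of the hierarchy are observed in most diseases and therefore carry little information, whereas specific terms lower down carry more.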

Genetic disease diagnosis requires determination of a differential diagnosis based on the overlap of the observed clinical features of a child’s illness (phenotypic features) with the expected features of all genetic diseases. However, a comprehensive EHR review can take hours. In addition, manual phenotypic feature selection can be sparse and subjective ( 36 , 37 ), and even expert reviewers can carry an unwritten bias into interpretation ( Fig. 1A ). We sought automated, complete phenotypic feature extraction from EHRs, unbiased by expert opinion. The simplest approach would be to extract universal, structured phenotypic features, such as International Classification of Diseases (ICD) medical diagnosis codes or diagnosis-related group (DRG) codes. However, these are sparse and lack sufficient specificity ( 38 , 39 ). Instead, we extracted clinical features from unstructured text in patient EHRs by CNLP that we optimized for identification of patients with orphan diseases (CLiX ENRICH, Clinithink Ltd.) ( Figs. 1B and 2A ). We then iteratively optimized the protocol for the Rady Children’s Hospital Epic EHRs using a training set of 16 children who had received genome sequencing for genetic disease diagnosis (table S3). The standard output from CLiX ENRICH is in the form of Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). However, our automated methods required phenotypic features described in the HPO, a hierarchical reference vocabulary designed for description of the clinical features of genetic diseases ( Fig. 2B ). For this reason, we mapped 7706 (60%) of 12,786 HPO terms (13,685 including synonyms) and 75.4% of Orphanet Rare Disease HPO terms (released in June 2018) to SNOMED CT by lexical and logical methods and then manually verified them (data file S1). This enabled automated translation of phenotypic features extracted from the EHR by CNLP from SNOMED CT concepts to HPO terms ( Fig. 1B ). 
In contrast, Dhombres and Bodenreider ( 40 ) mapped 92% of HPO terms to SNOMED CT, but only 49% were shown to be ontologically valid and clinically relevant.
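The translation step from SNOMED CT concepts to HPO terms amounts to a lookup in a verified mapping table, with unmapped concepts set aside. The sketch below uses placeholder identifiers rather than real SNOMED CT or HPO codes, and the function name is illustrative; only the coverage figures (7706 of 12,786 HPO terms) come from the text.

```python
# Placeholder mapping; identifiers are hypothetical, not real SNOMED CT or HPO codes.
SNOMED_TO_HPO = {
    "SCT:example-seizure": "HP:example-seizure",
    "SCT:example-acidosis": "HP:example-acidosis",
}


def translate(snomed_concepts):
    """Translate CNLP output from SNOMED CT concepts to HPO terms.

    Returns the set of mapped HPO terms and the list of concepts with no
    verified mapping, which cannot be used downstream."""
    hpo_terms = set()
    unmapped = []
    for concept in snomed_concepts:
        term = SNOMED_TO_HPO.get(concept)
        if term is None:
            unmapped.append(concept)
        else:
            hpo_terms.add(term)
    return hpo_terms, unmapped


hpo_terms, unmapped = translate(
    ["SCT:example-seizure", "SCT:example-rash", "SCT:example-acidosis"]
)

# Mapping coverage reported in the text: 7706 of 12,786 HPO terms.
coverage_percent = round(100 * 7706 / 12786)
print(sorted(hpo_terms), unmapped, coverage_percent)
```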

Dynamic Read Analysis for GENomics (DRAGEN, Illumina) is a hardware and software platform for alignment and variant calling that has been highly optimized for speed, sensitivity, and accuracy ( 16 ). We wrote scripts to automate the transfer of files from the sequencer to the DRAGEN platform. The DRAGEN platform then automatically aligned the reads to the reference genome and identified and genotyped nucleotide variants. Alignment and variant calling took a median of 1 hour for 150 Gb of 101-nt paired-end sequences (primary and secondary analyses; Table 1 ). Analytic performance of this new method, from blood sample receipt to output of genomic variant genotypes, was similar to standard clinical methods with reference human genome samples, retrospective patient samples, and prospective patient samples, except for lower sensitivity in the detection of nucleotide insertions and deletions (tables S1 and S2). The new method did not assess structural variations.

Following the preparatory steps, our previous method performed rWGS with the HiSeq 2500 sequencer (Illumina) in rapid run mode, with one sample sequenced per sequencing instrument [~120 gigabases (Gb) of 2 × 101 nucleotides (nt)] in ~25 hours ( Fig. 1A ) ( 16 , 17 ). Here, we instead performed rWGS with the NovaSeq 6000 sequencer and S1 flow cell (Illumina) ( Fig. 1B ), as this instrument was faster and less labor intensive, requiring fewer steps to set up a sequencing run and automatically washing the instrument after a run. In four timed runs with retrospective samples, genome sequencing of 2 × 101 nt took a mean 15:32 hours and yielded 404 to 537 Gb per flow cell, sufficient for two to three 40× genome sequences ( Table 1 and table S1).
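The statement that one flow cell yields enough data for two to three 40× genomes can be reproduced with a simple division. The per-genome budget of 150 Gb is an assumption, borrowed from the input size quoted for alignment and variant calling; the authors' actual allocation per sample is not stated.

```python
# Assumed raw sequence budget per 40x genome, in gigabases (an assumption).
GB_PER_GENOME = 150


def genomes_per_flow_cell(yield_gb: int) -> int:
    """Number of whole genomes that fit into one flow cell's yield."""
    return yield_gb // GB_PER_GENOME


low, high = genomes_per_flow_cell(404), genomes_per_flow_cell(537)
print(low, high)
```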

Primary (1°) and secondary (2°) analyses: Conversion of raw data from base call to FASTQ format, read alignment to the reference genomes, and variant calling. Tertiary (3°) analysis processing: Time to process variants and phenotypic features and make them available for manual interpretation in Opal interpretation software (Fabric Genomics) or to display a provisional, automated diagnosis(es) in MOON interpretation software (Diploid). Std., rapid standard methods; auto., rapid, autonomous platform; dev. delay, global developmental delay; PPHN, persistent pulmonary hypertension of the newborn; HIE, hypoxic ischemic encephalopathy; n.a., not applicable. Patients 263, 6124, and 3003 were retrospectively analyzed by the autonomous system. Patient 263 was analyzed two times by the autonomous system. Patients 6194, 290, 352, 362, 412, and 7072 were prospectively analyzed by both autonomous and standard diagnostic methods.

( A ) Steps in conventional clinical diagnosis of a single patient by genome sequencing (GS) with manual analysis and interpretation in a minimum of 26 hours but with a mean time to diagnosis of 16 days ( 8 , 16 – 30 ). Genome sequencing was requested manually. We manually extracted genomic DNA from blood samples, assessed the DNA quality (QA), and manually normalized the DNA concentration. We then manually prepared TruSeq PCR-free DNA sequencing libraries, performed the QA again, and manually normalized the library concentration. Genome sequencing was performed on the HiSeq 2500 system (Illumina) in rapid run mode (RRM). Sequences were manually transferred to the DRAGEN Platform version 1 (Illumina) for alignment and variant calling. Phenotypic features were identified by manual review of the electronic health record (EHR). Variant files and phenotypic features were manually loaded into Opal software (Fabric), and interpretation was performed manually. ( B ) Steps in autonomous diagnosis of up to six patients concurrently in a minimum of 19 hours (fig. S3). 
Steps included (i) automation of order entry from the EHR with a portal; (ii) manual or robotic preparation of Nextera DNA Flex sequencing libraries directly from the blood in 2.5 hours; (iii) rapid 40-fold coverage genome sequencing in 15.5 hours with the NovaSeq 6000 system and S1 flow cell (Illumina); (iv) automation of sequence transfer, alignment, and variant calling in 1 hour with the DRAGEN platform, version 2 (Illumina); (v) automated extraction of patient phenomes from the EHR by clinical natural language processing (CNLP) and translation to Human Phenotype Ontology (HPO) terms in 20 s; and (vi) automated transfer of variant and phenotype files and automated Bayesian comparison of the CNLP phenome with those of all genetic diseases (MOON, Diploid) combined with automated assessment of the pathogenicity of their genomic variants based on aggregated literature knowledge and in silico predictive tools (InterVar) and with automated display of the highest-ranked provisional diagnosis(es).
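The step durations listed for the autonomous workflow can be summed to see how they approach the stated minimum of 19 hours. The sketch assumes that the wet-lab and compute steps run strictly in series, that the 20-s CNLP extraction runs in parallel and does not add to the total, and that interpretation takes the 5-min median reported elsewhere in the text.

```python
# Durations in minutes for the serial steps of the autonomous workflow.
serial_steps = {
    "library preparation (Nextera DNA Flex)": 150,     # 2.5 hours
    "genome sequencing (NovaSeq 6000, S1)": 930,       # 15.5 hours
    "transfer, alignment, variant calling": 60,        # 1 hour
    "automated interpretation": 5,                     # median of 5 min
}

total_minutes = sum(serial_steps.values())
total_hhmm = f"{total_minutes // 60}:{total_minutes % 60:02d}"
print(total_hhmm)
```

The difference between this idealized total and the observed run times reflects hands-on and transfer time between steps, which this sketch does not model.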

In light of the limitations of current methods of rWGS, we developed an automated platform for rapid, high-throughput, provisional diagnosis of genetic diseases with genome sequencing by automating and accelerating our conventional workflow ( Fig. 1 ). Conventional clinical genome sequencing requires preparatory steps of manual purification of genomic DNA from blood samples, DNA quality assessment, normalization of DNA concentration, sequencing library preparation, and library quality assessment ( Fig. 1A ). Instead, we manually prepared sequencing libraries directly from blood samples or dried blood spots using microbeads to which transposons were attached (Nextera DNA Flex Library Prep Kit, Illumina Inc.; Fig. 1B ) ( 35 ), because this method was both faster and less labor intensive. Dried blood spots are the sample type used in mandatory newborn screening worldwide. In four timed runs with retrospective samples, manual Nextera library preparation from dried blood spots took a mean of 2 hours and 45 min, compared with at least 10 hours by conventional DNA purification and library preparation (TruSeq DNA PCR-free Library Prep Kit, Illumina Inc.; Table 1 ). As with standard methods, Nextera Flex allowed samples to be prepared in batches and was amenable to automation with liquid-handling robots.

DISCUSSION

Previously, the fastest time to diagnosis by genome sequencing in clinical practice was 37 hours (8, 15–26) . The protocol was, however, extremely labor and capital intensive and was limited to one sample at a time. Here, we described a prototypic, autonomous system for genetic disease diagnosis in a median of 20:10 hours requiring decreased user intervention and a throughput of up to two parent-child trios or six probands per run. Most decision-making in ICUs is made deliberatively in morning rounds attended by a multidisciplinary health care team. Thus, a potential 20-hour diagnosis would return results to the on-call physician who had ordered testing in time for morning rounds. This would simplify information transfer during rounds and facilitate management decisions. A 20-hour diagnosis is important in seriously ill infants because most timely genomic diagnoses result in changes in ICU management (16–25).

Our autonomous platform for potential 20-hour diagnosis of genetic diseases was designed to meet the needs of acutely ill infants in ICUs with diseases of unknown etiology. It has been estimated that 10 to 12% of infants admitted to regional ICUs may benefit from same-day diagnosis and implementation of targeted treatments (8, 16–30). In 2014, the U.S. Food and Drug Administration (FDA) permitted provisional reporting in seriously ill children when the diagnosis indicated changes in management that could improve outcomes and where a delay in reporting until confirmation of results by Sanger sequencing could result in avoidable morbidity or mortality (18, 20, 21). In our previous experience, provisional diagnoses were reported in 17% (114 of 684) of genome sequencing cases, with a mean time to report of 3.6 days. Presentations in which 20-hour diagnoses were likely to be associated with improved outcomes included neonatal epileptic encephalopathies, metabolic diseases (as in patient 352), septic shock possibly associated with immunodeficiency (as in patient 7052), organ failure, and extracorporeal membrane oxygenation that is considered in the absence of a known disease etiology (18–24, 28). Thus, a circumscribed application of an autonomous diagnostic system is to identify provisional diagnoses for laboratory director review, earlier than standard rapid testing, in a subset of neonatal and pediatric ICU admissions in which morbidity or mortality is likely to be avoided by early institution of targeted treatment. It will be important to evaluate the proportion of seriously ill patients and extent of urgent health care settings in which a potential 20-hour diagnosis would inform acute interventions and for which a longer time to result would not be effective.

This paper demonstrated the automated extraction of a deep, digital phenome from the EHR. The analytic performance of the extraction of phenotypic features from the EHRs of children with genetic diseases by CNLP herein was considerably better than in previous reports and appeared adequate for replacement of expert manual EHR review (36, 41). CNLP extracted 27-fold more phenotypic features from the EHR than those selected by experts during manual interpretation, consistent with previous reports (36, 41, 47). In addition, the mean IC of the CNLP phenome was greater than that of the phenotypic features selected by experts during manual interpretation. The superiority of deep CNLP phenomes was shown by their substantially greater overlap with the expected (OMIM) clinical features than was seen for the features selected by experts during manual interpretation. Phenotypic features selected by experts during manual interpretation had poorer diagnostic utility than CNLP-based phenotypic features when used in the autonomous diagnostic system. This concurred with two recent reports of genome sequencing of cohorts of patients in which the rate of diagnosis was greater when more than 15 phenotypic features were used at time of interpretation than when one to five features were used (53, 54).
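
The comparison of mean IC between phenomes can be illustrated with a small sketch. It assumes a common frequency-based definition in which the information content of a phenotype term is the negative log of the fraction of diseases annotated with it; the definition actually used in this study may differ. The function names, disease labels, and term labels are hypothetical, the terms are plain strings rather than real HPO identifiers, and the ontology hierarchy (propagation of annotations to ancestor terms) is ignored for brevity.

```python
import math
from collections import Counter


def information_content(disease_annotations):
    """Return {term: IC}, with IC = -ln(fraction of diseases annotated with the term).

    Assumption: simple annotation frequency, no ancestor propagation.
    """
    n_diseases = len(disease_annotations)
    counts = Counter(
        term for terms in disease_annotations.values() for term in set(terms)
    )
    return {term: -math.log(count / n_diseases) for term, count in counts.items()}


def mean_ic(phenome, ic):
    """Mean IC over the terms of a phenome that have a known IC value."""
    scored = [ic[term] for term in phenome if term in ic]
    return sum(scored) / len(scored) if scored else 0.0


# Toy annotation table (hypothetical diseases and terms, for illustration only).
annotations = {
    "disease_A": {"seizure", "hypotonia"},
    "disease_B": {"seizure"},
    "disease_C": {"seizure", "hyperammonemia"},
    "disease_D": {"hypotonia"},
}
ic = information_content(annotations)

# A phenome containing a rarer, more specific term has a higher mean IC.
sparse_phenome = {"seizure"}
deeper_phenome = {"seizure", "hyperammonemia"}
print(mean_ic(sparse_phenome, ic), mean_ic(deeper_phenome, ic))
```

Under this definition, terms annotated to few diseases carry more information, so a phenome that includes specific features scores higher than one limited to common, nonspecific features.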

Here, we described fully automated interpretation of sequencing results. In 95 seriously ill children, the automated system had 97% recall and 99% precision in recapitulating 97 genetic disease diagnoses made by a team of experts. Where the system suggested more than one diagnosis, the median rank of a variant associated with the correct diagnosis was first. The three false-negative automated results had explanations that either can be addressed by parameter adjustments or were of types that cause assessments of variant pathogenicity to vary between laboratories (55). Prospectively, molecular laboratory directors determined that the automated system made correct provisional diagnoses in three of seven seriously ill ICU infants (100% precision and recall) with an average time saving of 22:19 hours. In light of insufficient expert analysts, molecular laboratory directors, medical geneticists, and genetic counselors to expand genomic diagnosis to regional ICU infants worldwide, such diagnostic performance was sufficient to suggest several high-throughput clinical applications (31–33). First, supervised autonomous systems may provide effective first-tier, provisional diagnoses, allowing valuable cognitive resources to be reserved for unsolved or difficult cases, manual curation of variants, and clinical report generation that includes a summary of medical management literature. Second, in the roughly 67% of cases where manual interpretation fails to provide a diagnosis, it is difficult to know when analysis should be considered complete. With further development, autonomous diagnostic systems could provide an independent, objective analysis in such cases. Third, autonomous systems could reanalyze unsolved cases periodically. This is burdensome to perform manually because 250 new gene-disease associations and 9200 new variant-disease associations are reported annually. However, reanalysis yields up to 8 to 10% new diagnoses per annum (56–60). Automated reanalysis could include updated CNLP of the EHR, which would be useful when the phenotype evolves with time. A known risk of genetic testing is overtreatment as a result of overdiagnosis (61). Periodic, autonomous reanalysis would also detect cases where the diagnosis is changed as a result of reclassification of the causality of the gene or pathogenicity of the variant and/or where phenome overlap was minimal. An autonomous system, akin to an autopilot, can decrease the labor intensity of genome interpretation. One hundred six years after the invention of the autopilot, however, two pilots are still employed in cockpits of commercial aircraft. Likewise, a skilled team will still be required to curate the literature and make tough decisions/classifications for the foreseeable future.
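
The retrospective recall and precision quoted above follow from the standard definitions, as the sketch below shows. The 97 expert diagnoses and three false negatives are taken from the text. The single false positive is an assumption chosen to be consistent with the reported 99% precision; the text does not state the false-positive count. The function name is hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


expert_diagnoses = 97   # from the text
false_negatives = 3     # from the text
false_positives = 1     # ASSUMPTION: not stated; consistent with ~99% precision

true_positives = expert_diagnoses - false_negatives  # 94
precision, recall = precision_recall(true_positives, false_positives, false_negatives)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these counts, recall is 94/97 and precision is 94/95, which round to the reported 97% and 99%.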

The automated system has several limitations. First, system performance is partly predicated on the quality of the history and physical examination and on the completeness of the write-up in EHR notes. The performance of the autonomous diagnostic system, although acceptable, is anticipated to improve with additional training, increased mapping of HPO terms associated with genetic diseases in OMIM, Orphanet, and the literature to SNOMED CT (the native language of the CNLP), inclusion of phenotypes from structured EHR fields, measurements of phenotype severity (such as phenotype term frequency in EHR documents), and material-negative phenotypes (pathognomonic phenotypes whose absence rules out a specific diagnosis). As part of this, a quantitative data model is needed for improved multivariate matching of nonindependent phenotypes that appropriately weights related, inexact phenotype matches. Although it would have been possible to do so, the automated system did not take advantage of commercial variant database annotations, such as the Human Gene Mutation Database, and did not eliminate the labor-intensive literature curation that is the current standard for variant reporting. Diagnosis of genetic diseases due to SVs requires standard library preparation and additional software steps that add several hours to turnaround time. Because the autonomous system uses the same knowledge of allele and disease frequencies as manual interpretation, which underrepresents minority races or ethnicities, pathogenicity assertions in the latter groups are less certain. Likewise, because the autonomous system uses the same consensus guidelines for variant pathogenicity determination as manual interpretation, it is subject to the same general limitations of assertions of pathogenicity (55–61).

The major barriers to widespread adoption of genomic medicine for seriously ill infants with disorders of unknown etiology are an untrained medical workforce and a substantial shortage of domain experts, including medical geneticists, molecular laboratory directors, and genetic counselors. Manual genome analysis and interpretation are very labor intensive. In addition, the extreme number of rare genetic diseases precludes easy domain mastery by nonexperts. Thus, pediatric genomic medicine may be one of the first clinical areas where artificial intelligence is necessary for general adoption (62). Diagnosis of seriously ill infants with diseases of unknown etiology represents an early application of autonomous diagnostic systems because such cases are abundant in ICUs and a faster time to result is critical for optimal outcomes.