Our increasingly sophisticated ability to phenotype humans, coupled with differences in physiology between humans and model organisms, argues that primary mutation discovery in humans will remain crucial to progress. Moreover, now that a finite set of protein-coding genes has been defined, determining the phenotypic consequences of their variation represents a vital and attainable goal, partly because of advances in the production and analysis of whole-exome sequencing (WES) and whole-genome sequencing (WGS) data. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding diseases and illuminate normal biology and disease mechanisms.

Whereas protein-coding regions compose only about 1% of the human genome, the overwhelming majority of Mendelian phenotypes identified thus far result from altered function, localization, or presence of the encoded proteins. Furthermore, few Mendelian phenotypes appear to be caused exclusively by mutations outside coding regions. This is not only a matter of ascertainment bias, given that loci that are well mapped by unbiased analysis of linkage data prior to the discovery of underlying causes have yielded extremely high ratios in favor of variants that alter protein function. However, progress in the elucidation of promoters and tissue-specific regulatory elements by projects such as ENCODE and in linking perturbations in these elements to alterations of gene expression holds promise for the identification of new Mendelian phenotypes caused by non-coding mutations. Mendelian phenotypes for which mutations have not been discovered in coding regions or canonical splice sites are ideal candidates for such studies.

Much remains to be learned. The HGP and subsequent annotation efforts have established that there are ∼19,000 predicted protein-coding genes in humans. Nearly all are conserved across the vertebrate lineage and are highly conserved since the origin of mammals ∼150–200 million years ago, suggesting that certain mutations in every non-redundant gene will have phenotypic consequences, either constitutively or in response to specific environmental challenges. The continuing pace of discovery of new Mendelian phenotypes and the variants and genes underlying them supports this contention.

Improved understanding of human disease was a primary goal of the Human Genome Project (HGP). This promise has, in part, been realized with the identification of the consequence of germline mutation (single-nucleotide variants [SNVs] and copy-number variants [CNVs]) for more than 2,900 protein-coding genes in humans. These disease-associated mutations directly link DNA variants to altered protein function or dosage and to human phenotypes, thus transforming our understanding of the basic biology of development and physiological homeostasis in health and disease. Indeed, much of what is known about the relationship between gene function and human phenotypes is based on the study of rare variants underlying Mendelian phenotypes. Furthermore, these discoveries have identified new preventative, diagnostic, and therapeutic strategies for a growing number of rare and common diseases.

Molecular diagnostic rates for many Mendelian phenotypes are similarly low. For example, of 292 well-characterized categories of Mendelian phenotypes for which clinical testing is available (GeneReviews, July 2014; Table S1), a causal variant can be identified in only 52% of cases overall (Figure 1A). Diagnostic rates vary widely by phenotype and are inversely correlated with the level of genetic heterogeneity (Spearman correlation ρ = −0.155, p value = 0.008019; Figure 1B; Table S2). These observations are consistent with diagnostic rates of 25%–75% for many major categories of inherited conditions (e.g., kidney disease, cardiomyopathy, and epilepsy). The increasing availability of clinical WES testing holds promise for improving diagnostic yield, and although the diagnostic rate reported with this technology is currently only ∼25%–30%, it represents a substantially higher rate than the diagnostic yield of other genomic assays, such as karyotyping (<5%) and array comparative genomic hybridization (∼15%–20%). Importantly, studies of the diagnostic efficacy of clinical WES show that a substantial fraction (25%–30%) of the diagnostic successes depends on recent progress in the discovery of genes underlying disease. This observation highlights the value of continued research into the genetic basis of Mendelian phenotypes. Moreover, it predicts that diagnostic rates will continue to increase as we work toward a more complete catalog of the genes and variants responsible for human disease.
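The inverse relationship reported above is a Spearman rank correlation between the number of causal genes per condition and the diagnostic rate. As a minimal sketch of how such a coefficient is obtained, the following Python computes Spearman's ρ as the Pearson correlation of tie-averaged ranks. The function names and the eight data points are hypothetical illustrations and are not taken from Table S2; the per-condition data behind ρ = −0.155 are not reproduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: causal genes per condition and its diagnostic rate.
causal_genes = [1, 1, 2, 3, 5, 8, 12, 20]
diagnostic_rate = [0.90, 0.75, 0.80, 0.60, 0.65, 0.40, 0.45, 0.20]
rho = spearman_rho(causal_genes, diagnostic_rate)
```

A negative `rho` indicates that conditions with more causal genes tend to have lower diagnostic rates. The sketch does not compute a p value and does not guard against constant input, for which the denominator is zero.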

Figure 1. (A) Histogram of the percentage of individuals who had a Mendelian condition (x axis) and who received a corresponding molecular diagnosis from clinical testing. Collectively, for 292 Mendelian conditions, a causal variant could be identified in only ∼52% of affected subjects overall.

(B) Boxplots show the molecular diagnostic rate (y axis) for Mendelian conditions organized by the number of causal genes (x axis). The diagnostic rate per condition is inversely correlated with the level of genetic heterogeneity (Spearman correlation ρ = −0.155, p value = 0.008019).

All Mendelian conditions, or phenotypic series, included are listed in GeneReviews and might be genetically heterogeneous (i.e., caused by mutations in one or more genes).


It remains a challenge to diagnose many Mendelian phenotypes by phenotypic features and conventional diagnostic testing. In a general clinical genetics setting, the diagnostic rate is ∼50%. Across a broader range of rare diseases, diagnostic rates are even lower. For example, in the NIH Undiagnosed Disease Program, the diagnostic rate was, despite state-of-the-art evaluations, 34% in adults and 11% in children. Moreover, the time to diagnosis is often prolonged (the “diagnostic odyssey”); in a European survey of the time to diagnosis of eight rare diseases, including cystic fibrosis (MIM: 602421) and fragile X syndrome (MIM: 309550), 25% of families waited between 5 and 30 years for a diagnosis, and the initial diagnosis was incorrect in 40% of these families.

In aggregate, clinically recognized Mendelian phenotypes compose a substantial fraction (∼0.4% of live births) of known human diseases, and if all congenital anomalies are included, ∼8% of live births have a genetic disorder recognizable by early adulthood. This translates to approximately eight million children born worldwide each year with a “serious” genetic condition, defined as a condition that is life threatening or has the potential to result in disability. In the US alone, Mendelian disorders collectively affect more than 25 million people and are associated with high morbidity, mortality, and economic burden in both pediatric and adult populations. Birth defects, of which Mendelian phenotypes compose an unknown but most likely substantial proportion, are the most common cause of death in the first year of life; each year, more than three million children under the age of 5 years die from a birth defect, and a similar number survive with substantial morbidity. Beyond the emotional burden, each child with a genetic disorder has been estimated to cost the healthcare system a total of $5,000,000 during their lifetime.

Studies in mice with engineered loss-of-function (LOF) mutations suggest that the majority of the gene knockouts compatible with survival to birth are associated with a recognizable phenotype, whereas ∼30% of gene knockouts lead to in utero or perinatal lethality. Of the latter, it remains to be determined whether partial LOF mutations (i.e., hypomorphic alleles) or other classes of mutations (e.g., gain of function or dosage differences due to gene amplification) in the same genes might result in viable phenotypes. Nevertheless, given the high degree of evolutionary conservation between humans and mice, mutations in the majority of non-redundant human protein-coding genes are likely to result in Mendelian phenotypes, most of which remain to be characterized (Figure 2).

Figure 2. Of the ∼19,000 protein-coding genes predicted to exist in the human genome, variants that cause Mendelian phenotypes have been identified in ∼2,937 (∼15.5%; orange squares). Genes underlying ∼643 Mendelian phenotypes (∼3.38%; gray squares) have been mapped but not yet identified. On the basis of analysis of knockout mouse models, LOF variants in up to ∼30% of genes (∼5,960; red squares) could result in embryonic lethality in humans. Note that the consequences of missense variants in these genes could be different. For a minimum of ∼52% of genes (∼10,330; blue squares), the impact in humans has not yet been determined. Collectively, ∼16,063 genes remain candidates for Mendelian phenotypes.

Our knowledge of the diversity of Mendelian phenotypes is increasingly sophisticated, but substantial gaps remain. Specifically, it is challenging to establish the number of Mendelian phenotypes that exist, to delineate new Mendelian phenotypes, to distinguish novel from known Mendelian phenotypes, to define what constitutes expansion of a known phenotype, and to develop metrics for assessing the relationships and diversity of phenotypes caused by variants in the same gene. These gaps are due, in part, to the diversity of “normal” human morphology and physiology, the challenge of defining normal versus abnormal, and the difficulty of setting limits for traits with quantitative distributions. Moreover, biological variation is, in large part, not binary but rather described by partially overlapping distributions. With these caveats in mind, we analyzed OMIM entries (February 2015) to identify 7,440 rare Mendelian phenotypes (7,315 monogenic; 125 chromosomal duplications and/or deletions). This number is not static; ∼300 new Mendelian phenotypes are added to OMIM each year, and this probably underestimates the number of phenotypes newly recognized each year. Delineation of new Mendelian phenotypes in populations worldwide is limited by a lack of infrastructure, resources, and expertise. Moreover, studies of model organisms show that the number and types of recognized phenotypes increase with expanding environmental challenges. Therefore, to completely enumerate “all” human Mendelian phenotypes, it will be necessary to consider a more comprehensive span of environmental conditions and develop more-sophisticated tools to evaluate phenotype.

Use of other approaches, such as identification of common variants of small effect, might be less effective at facilitating drug development. For example, of 348 proteins specifically targeted by current therapeutics, 42.5% are encoded by a gene responsible for a Mendelian phenotype, whereas only 28.2% are encoded by a gene found within GWAS signals (the closest downstream and upstream genes were counted per intergenic signal, and all overlapping genes were counted per coding signal). Accounting for the over-representation of genes underlying Mendelian phenotypes in GWAS signals, 27.3% of proteins targeted are encoded by a gene that underlies a Mendelian phenotype but is not found in a GWAS signal, whereas 13.6% are encoded by a gene found only in a GWAS signal. Moreover, compared to therapeutics that are still in clinical trials, currently approved therapeutics are enriched with drugs that target a protein encoded by a gene in which mutations are responsible for a Mendelian phenotype (42.5% of approved therapeutics versus 32.8% of those in trials), suggesting that drugs associated with a gene underlying a Mendelian phenotype more often receive FDA approval. No such relationship is observed for genes found within GWAS signals (28.2% are FDA approved, whereas 29.4% are in clinical trials). Accordingly, using information about whether a target protein is encoded by a gene underlying a Mendelian phenotype might help to stratify drug candidates for development.

Development of new therapeutics to address common diseases that constitute major public-health problems is limited by ignorance regarding the fundamental biology underlying disease pathogenesis. As a consequence, 90% of drugs entering human clinical trials fail, commonly because of a lack of efficacy and/or unanticipated mechanism-based adverse effects. Studies of families affected by rare Mendelian phenotypes segregating with large-effect mutations that increase or decrease risk for common disease can directly establish the causal relationship between genes and pathways and common diseases and identify targets likely to have large beneficial effects and fewer mechanism-based adverse effects when manipulated. For example, certain Mendelian forms of high and low blood pressure are due to mutations that cause increases and decreases, respectively, in renal salt reabsorption and net salt balance; these discoveries identified promising new therapeutic targets, such as KCNJ1 (potassium channel, inwardly rectifying, subfamily J, member 1 [MIM: 600359]), for which drugs are now in clinical trials. Understanding the role of salt balance in blood pressure has provided the scientific basis for public-health efforts in more than 30 countries to reduce heart attacks, strokes, and mortality by modest reduction in dietary salt intake. Similarly, understanding the physiological effects of CFTR (cystic fibrosis transmembrane conductance regulator [MIM: 602421]) mutations responsible for cystic fibrosis has led to allele-specific therapies that significantly improve pulmonary function in affected individuals. Other common-disease drugs based on gene discoveries for Mendelian phenotypes (e.g., orexin antagonists for sleep, beta-site APP-cleaving enzyme 1 [BACE1] inhibitors for Alzheimer dementia, and proprotein convertase, subtilisin/kexin type 9 [PCSK9] monoclonal antibodies to lower low-density lipoprotein levels) are undergoing advanced clinical trials. Discoveries such as these will directly facilitate the goals of the Precision Medicine Initiative.

The etiologies of common diseases, such as hypertension, coronary artery disease, diabetes, obesity, scoliosis, and autism, are heterogeneous and typically include a small subset of individuals with a monogenic condition underlying their diagnosis with a common disease. The variants responsible for this small fraction of affected individuals rarely explain much of the genetic contribution to these common diseases, but they are nevertheless often highly relevant to our understanding of more-general mechanisms of these conditions. A classic example in cardiovascular disease research is the identification of the genetic basis of rare, monogenic forms of hypercholesterolemia, which provided critical insights into the relevance of lipid transport. In turn, these findings have led to the development of new therapies for common, complex cardiovascular diseases by targeting the implicated genes and pathways. Collectively, nearly 20% of genes implicated in Mendelian phenotypes also either contain or are nearest to a variant responsible for a genome-wide association study (GWAS) signal that achieves genome-wide significance for a complex trait (Figure 3A; Supplemental Material and Methods; Figure S1). In contrast, ∼15% of all genes overall underlie a Mendelian phenotype, suggesting that genes implicated in Mendelian phenotypes are enriched in GWAS signals. The fraction of genes that are found near GWAS signals and in which variants are responsible for Mendelian phenotypes is also positively correlated with the strength of association (Figure 3B). Widespread co-morbidity among Mendelian phenotypes and complex diseases provides further evidence that variation in genes that underlie Mendelian phenotypes plays a role in complex disease.

Figure 3. (A) Plot of the fraction of GWAS-signal genes that are also implicated in Mendelian phenotypes (MPs). Each orange dot represents the proportion of GWAS signals that, in a sliding window of 500 GWAS signals, are mapped to a gene also known to underlie a Mendelian condition. In GWAS signals, approximately 26.6% of genes with the top 500 lowest p values underlie a Mendelian phenotype. In contrast, only 14.2% of genes overall are known to underlie a Mendelian phenotype, suggesting that GWAS signals are more likely to be enriched with genes implicated in Mendelian phenotypes. Varied colored dots represent the percentage of genes underlying a Mendelian phenotype in GWAS signals underlying different phenotypic categories as follows (of increasing percentages from bottom to top): 10% for reproductive traits (blue); 11% for respiratory traits (gold); 13% for autoimmune inflammatory traits (dark green); 16% for immunologic traits (blue); 17% for mental-health traits (teal); 19% for infectious-disease traits (gray); 21% for anthropometric traits (brown); 23% for cancer (red); 25% for cardiovascular traits (tan); 26% for metabolomics traits (yellow); 28% for pharmacogenetic traits (green); and 33% for musculoskeletal traits (blue).

(B) Cumulative plot of the proportion of GWAS signals in which a gene underlying a Mendelian phenotype (MP) was found (orange dots) and GWAS signals in which a gene underlying a Mendelian phenotype was not found (gray dots). At virtually every p value, a higher proportion of GWAS signals overlapped genes underlying Mendelian phenotypes.
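The sliding-window proportion described for panel (A) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually used: it assumes that signals are sorted by ascending p value, that the window advances one signal at a time, and that each signal carries a single true/false flag for whether it maps to a gene underlying a Mendelian phenotype. The function name and the toy data are hypothetical.

```python
def sliding_window_fraction(is_mendelian, window=500):
    """Fraction of flagged signals in each consecutive window.

    `is_mendelian` is a list of 0/1 flags, already sorted by ascending
    GWAS p value (an assumption of this sketch).
    """
    if window <= 0 or window > len(is_mendelian):
        raise ValueError("window must be between 1 and the number of signals")
    count = sum(is_mendelian[:window])
    fractions = [count / window]
    for i in range(window, len(is_mendelian)):
        # Slide by one: add the entering signal, drop the leaving one.
        count += is_mendelian[i] - is_mendelian[i - window]
        fractions.append(count / window)
    return fractions


# Hypothetical toy input: (p value, maps to a Mendelian-phenotype gene?)
signals = [(3e-9, False), (1e-30, True), (5e-8, False), (2e-12, True), (4e-10, True)]
signals.sort(key=lambda s: s[0])
flags = [int(flag) for _, flag in signals]
toy_fractions = sliding_window_fraction(flags, window=2)
```

With real data, the first element of the result would correspond to the window containing the 500 strongest signals.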

Although substantial progress has been made toward identifying the genetic basis of Mendelian phenotypes, the genes underlying about half of all known Mendelian phenotypes (i.e., 3,152) have not yet been discovered, despite the fact that ∼20% (i.e., 643) have been mapped (∼80% with robust linkage data [e.g., significant linkage to a single region or recurrent structural variants (SVs) involving the same region] according to a manual review) as per data from OMIM (February 2015). Most of these “unsolved” Mendelian phenotypes are rare and often have high locus heterogeneity and/or are intractable to mapping-based approaches because they are caused by de novo mutations in the germline or mosaicism in somatic tissues. Sequencing technologies and analytical approaches have now sufficiently matured to make gene discovery at scale for all Mendelian phenotypes feasible and cost effective. To this end, national and international efforts led by the human genetics community have emerged to identify the genetic basis of Mendelian phenotypes at scale even as the number of recognized phenotypes continues to increase each year.

The first successful efforts to identify genes underlying Mendelian phenotypes often required extensive prior knowledge of disease biology, including the identity of the affected protein. In 1986, discovery of mutations causing chronic granulomatous disease in CYBB (MIM: 300481) demonstrated that mapping followed by sequencing of genes within the maximum-likelihood interval offered a promising alternative for discovering genes underlying disease, and during the next 10 years, 42 genes associated with Mendelian phenotypes were identified via positional cloning. The ensuing two decades witnessed a steady accumulation of genes discovered to underlie Mendelian phenotypes by a combination of positional cloning and candidate-gene approaches. However, it became increasingly obvious that gene identification for most Mendelian phenotypes without a known cause was difficult via these approaches. Gene-discovery strategies based on WES and WGS introduced powerful alternatives that were agnostic to both known biology and mapping data. Combined with conventional genetic approaches, WES and WGS have proved to be disruptive technologies that have rapidly accelerated the discovery of genes underlying Mendelian phenotypes, such that the pace of gene discovery has increased from an average of ∼166 per year between 2005 and 2009 to 236 per year between 2010 and 2014. Between January 2010 and February 2015, ∼555 and ∼613 genes associated with monogenic Mendelian phenotypes were discovered via next-generation sequencing approaches and conventional approaches, respectively. However, over this time period, there has been a rapid shift toward the use of WES and WGS, and since 2013, WES and WGS have made almost three times as many discoveries as conventional methods (Figure 4; Figure S2).

Figure 4. Since the introduction of WES and WGS in 2010, the pace of discovery of genes implicated in Mendelian phenotypes per year has increased substantially, and the proportion of discoveries made by WES or WGS (blue) versus conventional approaches (red) has steadily increased (see Supplemental Material and Methods for a detailed description of the analysis). Since 2013, WES and WGS have discovered nearly three times as many genes as conventional approaches.

The CMGs were expected to make substantial progress toward discovering the genomic basis of most, if not all, known Mendelian phenotypes. Specifically, the CMGs had the following goals: (1) assess the genetic basis of ∼1,000 Mendelian phenotypes in collaboration with investigators worldwide; (2) develop new methods and approaches for discovering the genetic basis of Mendelian phenotypes; (3) generate public resources that can be leveraged by the biomedical community to facilitate investigator-initiated gene-discovery efforts, studies of gene function, and clinical translation and interpretation of human genome variation; and (4) lead and coordinate US efforts with other large-scale projects aimed at discovering genes implicated in Mendelian phenotypes. Key to accomplishing these goals was that collaborating clinicians and investigators were able to access WES, WGS, and technical expertise from the CMGs at no cost and preserve their control over data sharing, analysis, and rights to publish. It was also anticipated that the overall genetic architecture of Mendelian phenotypes would be further elucidated and that novel underlying genetic mechanisms might be revealed.

Widespread, convenient, and cost-effective application of WES and WGS for finding genes underlying Mendelian phenotypes posed a number of challenges when the strategy was first introduced. Moreover, achieving the goal of finding all genes underlying all Mendelian phenotypes requires searching the entire human population and therefore necessitates a worldwide collaboration among clinicians and scientists to identify and characterize both novel and well-known Mendelian phenotypes. Accordingly, multiple national efforts were initiated to establish the collaborative framework and physical infrastructure necessary for undertaking large-scale identification of affected individuals, genomic sequencing, and gene discovery for Mendelian phenotypes; these included the NHGRI- and NHLBI-supported Centers for Mendelian Genomics (CMGs), Finding of Rare Disease Genes (FORGE) Canada, and the Wellcome Trust Deciphering Developmental Disorders (WTDDD), each of which was established in 2011. The CMGs consist of three centers: (1) the University of Washington Center for Mendelian Genomics, (2) the Baylor-Hopkins Center for Mendelian Genomics, and (3) the Yale Center for Mendelian Genomics. All of these consortia, as well as a myriad of individual investigators and small research groups, have made major contributions to gene discovery for Mendelian phenotypes over the past 4 years.

CMG Discoveries


As of January 2015, 18,863 samples representing 579 known and 470 novel Mendelian phenotypes from 8,838 families (see the CMG investigated phenotypes in the Web Resources) have been assessed by the CMGs in partnership with 529 investigators from 261 institutions in 36 countries (i.e., ∼1 of every 5 countries in the world) (Figure 5). Of these, 60% of countries, 32% of institutions, and 20% of investigators are located outside of North America, Europe, or Australia (Figure 6). Exome and whole-genome data have been produced for 16,226 and 96 samples, respectively, and about half of these sequences can be deposited in dbGaP (Figure 5). Additionally, data for newly identified causal variants have been made available through ClinVar and via a new track on the UCSC Genome Browser. Finally, web-based tools developed by the CMGs, such as GeneMatcher and Geno2MP, provide a mechanism for investigators with a candidate gene for a Mendelian phenotype to connect with other clinicians and/or basic scientists around the world with an interest in the same gene and to link phenotypic profiles to rare variants, respectively. Accordingly, the CMGs have empowered the entire international rare-disease research community.

Figure 5. Overview of Deliverables from the CMGs. Collectively, the CMGs have worked with 529 investigators from 36 countries to collect and sequence 16,226 exomes and 96 genomes. Analyses of these data have resulted in 956 discoveries. These discoveries, as well as tools and technical methods developed by the CMGs, have led to the publication of 146 manuscripts.

Figure 6. Worldwide Interactions with the CMGs. In collaboration with 529 investigators representing 261 institutions in 36 countries (or 1 of every 5 countries [orange] in the world), the CMGs have collected 18,863 samples from 8,838 families. Approximately 60% (n = 20) of these countries are located outside of North America, Europe, or Australia.

To assess progress toward identifying the genes underlying Mendelian phenotypes, it is critical to apply objective discovery metrics. To date, it has been challenging to quantify and compare reported discovery rates across different contexts (e.g., clinical service versus research). Because of its perceived simplicity, one discovery metric that has been suggested is the “solve rate,” or the proportion of investigated families in whom a causal variant for a Mendelian phenotype is identified. This definition is not particularly useful on its own given that one could, for example, achieve a high solve rate by sequencing only families affected by disorders for which the mutated gene was previously known.
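The limitation of the solve rate can be made concrete with a small sketch. The function below implements the definition given above (solved families divided by investigated families); the two cohorts are entirely hypothetical and serve only to show how case mix alone can move the number.

```python
def solve_rate(families):
    """Proportion of investigated families with an identified causal variant."""
    if not families:
        raise ValueError("no families investigated")
    return sum(1 for family in families if family["solved"]) / len(families)


# Hypothetical cohort A: phenotypes whose mutated gene was already known.
cohort_known = [
    {"gene_known_beforehand": True, "solved": solved}
    for solved in (True,) * 9 + (False,)
]

# Hypothetical cohort B: unexplained phenotypes, where solving is harder.
cohort_unexplained = [
    {"gene_known_beforehand": False, "solved": solved}
    for solved in (True,) * 3 + (False,) * 7
]

rate_known = solve_rate(cohort_known)
rate_unexplained = solve_rate(cohort_unexplained)
```

In this toy example the first cohort scores far higher even though it yields no new gene discoveries, which is why the solve rate needs to be read alongside what kind of phenotype and gene each solved family represents.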


In an attempt to provide clearly defined measures, we developed three complementary discovery metrics and applied them to phenotypes studied by the CMGs on the basis of strict criteria for (1) variant causality, (2) definitions of novel and known phenotypes, and (3) definitions of novel and known genes underlying a Mendelian phenotype (Table 1; Figure 7). This was necessary because although multiple guidelines for assessing causality have previously been proposed, none have been operationalized, much less used for assessing large-scale gene-discovery efforts studying thousands of families and samples across hundreds of rare Mendelian phenotypes.

Table 1. Definitions of Terms Used to Characterize Discovery Type

| Term | Definition |
| --- | --- |
| Phenotype | the collection of observable or measurable traits of an individual |
| Known phenotype | a Mendelian phenotype with a MIM number |
| Explained, known phenotype | a Mendelian phenotype with a MIM number and for which a causal variant(s) in one or more genes is known |
| Unexplained, known phenotype | a Mendelian phenotype with a MIM number and for which no causal variant(s) has been reported |
| New phenotype | a Mendelian phenotype without a MIM number (MIM number assigned thereafter) |
| Known gene | a gene in which a causal variant(s) has been previously associated with a Mendelian phenotype |
| Novel gene | a gene in which a causal variant(s) has not been previously associated with a Mendelian phenotype |
| Known gene; explained, known phenotype | a Mendelian phenotype with a MIM number and for which a causal variant was found in a gene previously associated with the same phenotype |
| Novel gene; unexplained, known phenotype | a Mendelian phenotype with a MIM number, for which no causal variant(s) has been reported, and for which a causal variant was discovered in a novel gene |
| Novel gene; new phenotype | a Mendelian phenotype without a MIM number and for which a causal variant(s) was found in a gene in which a causal variant(s) has not been previously associated with a Mendelian phenotype (MIM number assigned thereafter) |
| Known gene; unexplained, known phenotype | a Mendelian phenotype with a MIM number, for which a causal variant(s) has not been reported, and for which a causal variant was found in a gene previously associated with a different phenotype |
| Known gene; new phenotype | a Mendelian phenotype without a MIM number and for which a causal variant was found in a gene previously associated with a different phenotype (MIM number assigned thereafter) |
| Phenotype expansion | expansion of the spectrum of clinical characteristics of an explained, known Mendelian phenotype |

Figure 7. Criteria for Establishing Causality of Discoveries. Flow diagram of decisions and criteria used for establishing whether gene discoveries by CMGs (Table 2) were considered causal by conservative or suggestive guidelines.

The overall diagnostic rate, defined as the proportion of families in whom a causal variant was identified, was 0.31 and 0.40 per conservative and suggestive causality criteria, respectively. This is comparable to diagnostic rates achieved by clinical WES, but neither the diagnostic rate in the CMGs nor its comparison to diagnostic rates of clinical service labs is a highly informative metric of success. On the one hand, families studied by the CMGs are specifically selected to have phenotypes that are less likely to be explained by an already known gene, thereby potentially lowering the CMG diagnostic rate. On the other hand, the CMGs often have the advantage of studying multiple individuals in a family and multiple families affected by the same unexplained phenotype, which is predicted to improve the diagnostic rate.

To date, 647 and 309 genes by conservative and suggestive causality criteria, respectively, or a total of 956 genes, were discovered by the CMGs to be implicated in a Mendelian phenotype (Table 2). Of the genes discovered by conservative criteria, 327 were (1) a gene that was not previously known to underlie a Mendelian phenotype (i.e., novel gene) but was found to be implicated in a known but unexplained phenotype (i.e., a phenotype with an OMIM number but for which no underlying gene was known; n = 25) or a novel phenotype (i.e., without an OMIM number; n = 107); (2) a gene that was previously known to underlie a Mendelian phenotype (i.e., known gene) and explained either a different known (n = 4) or a novel (n = 17) phenotype; or (3) a gene that was previously implicated in a Mendelian phenotype and was now discovered to be associated with an expanded set of clinical features (i.e., phenotypic expansion, n = 174).

Table 2. Summary of Discoveries of Genes Underlying Mendelian Phenotypes

| Category | Discovery type | Conservative evidence of causality | Suggestive evidence of causality | Total |
| --- | --- | --- | --- | --- |
| Known | known gene; explained, known phenotype | 320 | 19 | 339 |
| Novel | phenotype expansion | 174 | 24 | 198 |
| Novel | known gene; unexplained, known phenotype | 4 | 0 | 4 |
| Novel | known gene; new phenotype | 17 | 7 | 24 |
| Novel | novel gene; unexplained, known phenotype | 25 | 27 | 52 |
| Novel | novel gene; new phenotype | 107 | 232 | 339 |
| Total novel | | 327 | 290 | 617 |
| Total number of discoveries | | 647 | 309 | 956 |
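The totals in Table 2 follow directly from its rows. As a quick consistency check, the sketch below re-adds the per-row counts; the dictionary layout and variable names are illustrative choices, whereas the numbers are those given in Table 2.

```python
# (conservative, suggestive) counts per discovery type, as listed in Table 2.
KNOWN = "known gene; explained, known phenotype"
table2 = {
    KNOWN: (320, 19),
    "phenotype expansion": (174, 24),
    "known gene; unexplained, known phenotype": (4, 0),
    "known gene; new phenotype": (17, 7),
    "novel gene; unexplained, known phenotype": (25, 27),
    "novel gene; new phenotype": (107, 232),
}

# Every row except the "Known" row counts toward the novel discoveries.
novel_conservative = sum(c for name, (c, _) in table2.items() if name != KNOWN)
novel_suggestive = sum(s for name, (_, s) in table2.items() if name != KNOWN)
total_conservative = sum(c for c, _ in table2.values())
total_suggestive = sum(s for _, s in table2.values())
total_discoveries = total_conservative + total_suggestive
```

The sums reproduce the "Total novel" row (327, 290, 617) and the "Total number of discoveries" row (647, 309, 956).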

Of our gene discoveries, 320 involved identification of a known gene that explained a known phenotype; the vast majority of these phenotypes (e.g., non-syndromic hearing loss [MIM: PS220290, PS124900], asphyxiating thoracic dysplasia [MIM: PS208500], and oculocutaneous albinism [MIM: PS203100]) had high locus heterogeneity. Less commonly, clinical screening failed to identify a causal variant that was discovered by WES, or a family was recognized in retrospect to be affected by an explained, known Mendelian phenotype with a clinical presentation that was unusual but not different enough to be classified as a phenotypic expansion. Overall, the causal-gene-identification rate, defined as the ratio of causal genes identified to Mendelian phenotypes studied, was 0.51 genes identified per Mendelian phenotype studied.

Figure 8. Breakdown of Discoveries Made in the 1,049 Mendelian Phenotypes Assessed in the CMG Pipeline
Phenotypes entering the CMG pipelines are putatively either new phenotypes or unexplained, known phenotypes. A substantial fraction (i.e., 32%) of phenotypes were found to have causal variants in known genes, consistent with explained, known phenotypes. However, a larger fraction (40%) of phenotypes assessed resulted in discoveries of novel genes in addition to the expansion of 198 Mendelian phenotypes. For ∼28% of phenotypes assessed, no causal variant has yet been discovered. Novel genes are those that were not associated with any Mendelian phenotype when a project was accepted by the CMGs. Phenotypes are defined on a gene- and/or genotype-centric basis: if a novel gene was discovered for a known, explained phenotype, the phenotype was reclassified as a novel phenotype because it is almost certain that deeper phenotyping would reveal (molecular, biochemical, or physiological) differences that distinguish the novel phenotype from the previously known, explained phenotype caused by mutations in another gene.

If gene discoveries meeting conservative and suggestive criteria are combined, 617 were (1) a novel gene that was found to underlie a known, unexplained Mendelian phenotype (n = 52) or a novel phenotype (n = 339); (2) a known gene that explained a different known Mendelian phenotype (n = 4) or a novel Mendelian phenotype (n = 24); or (3) a gene underlying a phenotypic expansion (n = 198) (Figure 8). 339 discoveries were for a gene previously known to underlie the Mendelian phenotype studied. Accordingly, the causal-gene-identification rate combining conservative and suggestive criteria was 0.76 genes per Mendelian phenotype studied.
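The rounded percentages in the Figure 8 caption can be related to the combined counts with a short calculation. This is a sketch under two assumptions that the text does not state explicitly: that the 40% slice corresponds to the non-expansion novel discoveries (52 + 339 + 4 + 24 = 419), and that each of these discoveries maps to one of the 1,049 phenotypes assessed. The grouping was chosen because it reproduces the rounded figures; the function and key names are hypothetical.

```python
# Combined (conservative + suggestive) counts quoted in the text and Table 2.
PHENOTYPES_ASSESSED = 1049
KNOWN_GENE_KNOWN_PHENOTYPE = 339
# Assumption: the "novel" slice excludes the 198 phenotype expansions.
NOVEL_NON_EXPANSION = 52 + 339 + 4 + 24


def figure8_shares(assessed, known, novel):
    """Return whole-number percentages for each slice of the breakdown."""
    # Assumption: phenotypes without a known or novel discovery are unsolved.
    unsolved = assessed - known - novel
    counts = {
        "known gene": known,
        "novel discovery": novel,
        "no causal variant yet": unsolved,
    }
    return {name: round(100 * count / assessed) for name, count in counts.items()}


if __name__ == "__main__":
    shares = figure8_shares(
        PHENOTYPES_ASSESSED, KNOWN_GENE_KNOWN_PHENOTYPE, NOVEL_NON_EXPANSION
    )
    for name, percent in shares.items():
        print(f"{name}: {percent}%")
```

Under these assumptions the three slices come out at 32%, 40%, and 28%, matching the caption. The gene-identification rates reported in the text are counted in genes per phenotype and are not reproduced by this calculation.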

Figure 9. Discovery Metrics under Different Models of Inheritance for Mendelian Phenotypes Studied by the CMGs
(A) The percentage of Mendelian phenotypes for which a gene was discovered on the basis of conservative causality criteria per different models of inheritance with mapping data (dark green) or without mapping data (light green) is shown. Also shown is the percentage of Mendelian phenotypes for which a causal gene was not found per different models of inheritance with mapping data (dark gray) or without mapping data (light gray). Note that for most phenotypes analyzed under an autosomal-recessive homozygous model that failed, mapping data were available; however, the statistical significance of the mapping data varied (e.g., number and length of runs of homozygosity, magnitude of LOD score, etc.). The mean number of genes discovered per Mendelian phenotype was 0.52 or 0.76 on the basis of only conservative or combined conservative and suggestive criteria, respectively. These figures do not include results from persons found to have more than one Mendelian phenotype.
(B) Classification of discoveries of genes underlying Mendelian phenotypes as known (white squares) or novel (blue squares).
(C) Percentage of Mendelian phenotypes for which a novel discovery (dark blue) or known discovery (light blue) was made on the basis of conservative causality criteria per different models of inheritance. The mean number of novel discoveries per Mendelian phenotype was 0.52 or 0.66 on the basis of only conservative or combined conservative and suggestive criteria, respectively.
Abbreviations are as follows: AD, autosomal dominant; AR, autosomal recessive (when recessive inheritance was clear, but analysis of both consanguineous and non-consanguineous families contributed to the discovery); AR homozygous, autosomal recessive in a consanguineous family; AR heterozygous, autosomal recessive in a non-consanguineous family (i.e., compound-heterozygous mutations).

Analysis of gene-identification rates by the mode of inheritance used for modeling segregation in the analysis of each phenotype provides further resolution about the types of Mendelian phenotypes for which gene discovery was successful and the challenges that remain. Gene-identification rates based on conservative criteria ranged from 0.29 (multiple models) to 0.66 (autosomal recessive) (Figure 9A); for comparison, if a causal gene were identified for every phenotype, this ratio would approach a maximum value of 1. Gene discoveries in consanguineous families were sometimes complicated by locus heterogeneity and by the rarity of the phenotype, consistent with a lower-than-anticipated gene-identification rate (0.60). Lastly, the novel-discovery rate, or the proportion of Mendelian phenotypes in which the gene was newly discovered to underlie a novel or unexplained phenotype, including phenotype expansions, was 0.52 under conservative criteria (Figures 9B and 9C). With conservative and suggestive causality criteria combined, the novel-discovery rate was 0.66.