It is worth reminding ourselves that regardless of its impact on medicine, the sequencing of the human genome represents a monumental achievement. The genome is the blueprint that quite literally specifies how to build a human, even if we do not yet fully understand the means by which it does so. To have gone from observing the double helix to the assembly and rudimentary understanding of the human genome’s 3 billion nucleotides in 50 years is a stunning trajectory, with no obvious equivalent other than our progression from the first powered flight to a moon landing in about the same amount of time. Furthermore, although it has only been 15 years since an achievement that will be remembered for millennia, the Human Genome Project (HGP) has already had scientific and economic impacts that more than amply justify its cost ().

This praise notwithstanding, we should not forget that the prioritization and cost of the HGP were justified by, and its completion celebrated with, the setting of ambitious expectations about the time frame on which it would transform the diagnosis, treatment, and prevention of a broad swath of human diseases. In this Perspective, we attempt to take stock of the progress made in, as well as the hurdles to, the clinical translation of the human genome, that is, the nascent field of genomic medicine. For the citizens who funded it, has the bet of the HGP paid off? If it has not, will it ever? Is the value proposition as originally laid out still justified, or do we need to recalibrate?

This is a large topic to undertake, and we have organized this review as follows. First, we summarize the key technological developments since the HGP. Second, we consider the successes and challenges to genomic medicine in four areas: common inherited diseases, rare inherited diseases, reproductive health, and cancer (Figures 1 and 2). Finally, we take stock of the field as a whole and suggest areas that warrant further investment to fully unlock its potential.

There are many modalities for genomics to have an impact on clinical care, with entry points for application that span the human life cycle from conception to death.

The precipitous rate at which genotyping and sequencing costs have dropped was scarcely anticipated at the completion of the HGP in 2003. Given that it has only been a few years since the full maturation of these technologies, the number of humans who have already been genotyped by arrays or subjected to exome or genome sequencing is staggering. Although a comprehensive count is not easily achieved, it is estimated that the number of individuals genotyped by direct-to-consumer genealogy companies was less than 1 million as recently as 2014 but 3 million by 2016 and 12 million by 2018 (Figure 3, left). The number of individual humans whose genomes have been sequenced is estimated to have gone from 1 in 2003 to over 50,000 by 2015 and over 1.5 million by 2018 (Figure 3, right). These trends are driven by distinct forces in the research, medical, and direct-to-consumer fields and do not show any signs of abating. For example, large cohorts, including nationwide efforts such as the UK Biobank and US All of Us programs, are collectively targeting the genome sequencing of over 25 million humans ().

We show estimates of the number of individuals who have received genetic testing in the form of direct-to-consumer microarrays (DTC) and non-invasive prenatal testing (NIPT) (left) and whole-genome sequencing (WGS) (right) as a function of time. For NIPT, estimates are from, and. For DTC and WGS, estimates are from Illumina (personal communication), with estimates of WGS based on equivalents of 30X coverage.

If there is one area where we have over-delivered as a field since the HGP, it is in the development and deployment of technologies for ascertaining interindividual genetic differences. Two technologies now critically underpin nearly every aspect of genomic medicine. First, high-density DNA microarrays can be used to genotype millions of specific positions in each of many human genomes. Coupled with population-based maps of linkage disequilibrium (LD), array-based genotyping enables the ascertainment of most common genetic variation in a human genome for a remarkably low cost (initially hundreds, now tens, of dollars per individual) (). Second, massively parallel DNA sequencing technologies, which have steadily improved since their introduction in 2005, can generate billions of short sequencing reads within a day or less (). Also known as next-generation sequencing (NGS), such platforms now permit the near-comprehensive ascertainment of both rare and common genetic variation for about $1,000 per individual (or a few hundred dollars, if one selectively sequences the exome or coding regions of the genome). Importantly, both array-based genotyping and NGS depend heavily on the availability of a high-quality reference genome such as the one generated by the HGP, the former for designing probes with which to query positions of common variation and the latter for providing the reference against which short reads are mapped, so as to localize bona fide variants and distinguish them from sequencing errors. Of note, NGS has also become an incredibly powerful tool for quantifying a broad range of molecular phenomena, e.g., transcriptomes (RNA sequencing, RNA-seq), protein-DNA binding (chromatin immunoprecipitation sequencing, ChIP-seq), etc., essentially through the counting of molecules ().
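
To make the role of LD more concrete, the following is a minimal sketch in Python of one common LD measure, r², computed between two biallelic sites from phased haplotypes. The function name and the toy haplotypes are illustrative assumptions and do not come from any specific genotyping pipeline. When r² between a genotyped site and an ungenotyped site is close to 1, the array genotype is a good proxy for the ungenotyped variant.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic sites, given phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried by haplotype i at the
    two sites (hypothetical toy input, not a real data format).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # allele-1 frequency at site A
    p_b = sum(hap_b) / n                      # allele-1 frequency at site B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy example: eight haplotypes, one array-typed site and two untyped sites.
typed_site = [1, 1, 1, 1, 0, 0, 0, 0]
linked_site = [1, 1, 1, 1, 0, 0, 0, 0]       # always co-inherited with typed_site
unlinked_site = [1, 1, 0, 0, 1, 1, 0, 0]     # statistically independent of it

r2_linked = ld_r2(typed_site, linked_site)
r2_unlinked = ld_r2(typed_site, unlinked_site)
```

In this toy case the linked site is perfectly tagged by the typed site, whereas the unlinked site is not tagged at all.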

The HGP was completed in 2003 at an estimated cost of $2.7 billion, primarily through the brute-force scaling of automated Sanger sequencing of large insert clones, followed by hierarchical assembly (). The commonplace use of the article “the” in conjunction with “human genome” emphasizes the nearly perfect similarity of individual humans to one another (∼99.9%) but downplays the millions of differences (∼0.1%) that make each of us genetically unique. However, the raison d’être for the field of human genetics lies not with our similarities but with our differences: more specifically, with disentangling how our genotypic differences underlie our phenotypic differences.

Genomic Medicine → Common Disease

Whether fairly or not, much of the discussion about the perceived shortcomings of genomic medicine has centered on genome-wide association studies (GWASs). In brief, most genetic variants in individual human genomes are common (allele frequency > 1%), leading to the hypothesis that our individual genetic risk for common diseases derives mostly from common variants, as opposed to the rare variants or de novo mutations that underlie Mendelian disorders (Manolio et al., 2009). The GWAS framework, first proposed by Risch and Merikangas in 1996 as an alternative to linkage studies (which had succeeded for Mendelian diseases but largely failed for common diseases), is designed to detect even subtle associations between common variants and common diseases on a systematic, genome-wide basis (Risch and Merikangas, 1996). Around 2005, several developments converged to enable well-powered GWAS, including public catalogs of common human genetic variants, initial maps of LD among common variants in human populations, and cost-effective array-based genotyping technologies (Collins et al., 1997; Gunderson et al., 2005; International HapMap Consortium, 2005). Over the ensuing decade, through the genome-wide genotyping of increasingly large cohorts of cases and controls, the imputation of additional genotypes based on LD maps, and the application of appropriately corrected statistical tests, the field has collectively discovered over 100,000 unique, robust associations between common variants and common diseases (Burdett et al., 2018).
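
As an illustration of what an "appropriately corrected" test can look like for a single variant, the sketch below runs a 1-degree-of-freedom chi-square test on a 2×2 table of allele counts in cases and controls and compares the p value to the conventional genome-wide significance threshold of 5 × 10⁻⁸ (roughly a Bonferroni correction for one million independent tests). This is a simplified sketch under stated assumptions: the function name and the allele counts are hypothetical, and real analyses typically use regression models with covariates such as ancestry.

```python
import math

# Conventional genome-wide significance threshold (about 0.05 / 1e6 tests).
GENOME_WIDE_ALPHA = 5e-8


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts."""
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    margins = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
               * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = n * cross ** 2 / margins
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Hypothetical variant: allele frequency 0.22 in cases versus 0.20 in controls.
# 10,000 cases and 10,000 controls contribute 20,000 alleles per group.
or_small, chi2_small, p_small = allelic_test(4400, 15600, 4000, 16000)
# The same frequencies in a cohort four times larger.
or_large, chi2_large, p_large = allelic_test(17600, 62400, 16000, 64000)

small_is_significant = p_small < GENOME_WIDE_ALPHA
large_is_significant = p_large < GENOME_WIDE_ALPHA
```

With these toy counts the odds ratio is identical in both cohorts, but only the larger cohort clears the genome-wide threshold, which is one way to see why cohort sizes grew so quickly over the decade.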

This sounds like success, so why are we so unhappy? It is worth taking a step back and asking: for what reasons do we want to investigate the genetic basis for common human diseases in the first place? One motivation is risk prediction, that is, using genetic factors to better stratify which individuals are at higher risk for specific common diseases, which may facilitate preventative measures and/or the better allocation of resources across a heterogeneously susceptible population. A second motivation is target identification, grounded in the view that our historical approach to understanding the pathogenesis of common diseases has been largely ad hoc and therefore prone to false positives and negatives. In contrast, GWASs provide a systematic, genome-wide approach for identifying genes that play a role in each disease. As this should result in a longer, higher-quality list of potential drug targets, GWASs were, and by some still are, expected to accelerate our ability to develop effective therapies.

So what has gone wrong? A first challenge, primarily to the goal of risk prediction, has been that with few exceptions, the genetic component of common human disease risk consists of an extremely large number of variants of small effects, the vast majority of which would require astronomically large study sizes to definitively implicate. A subset of these weakly associated variants achieves genome-wide significance, but the effect sizes are usually modest even for these, and they have limited predictive power whether taken individually or considered together.
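
A rough sense of how study size scales with effect size can be had from a standard power approximation. The sketch below assumes a quantitative trait and a 1-degree-of-freedom association test, for which the non-centrality parameter is approximately N·q/(1 − q), where q is the fraction of trait variance explained by the variant. The function name, the default 80% power, and the example values of q are assumptions chosen for illustration; disease (case-control) studies require a different calculation.

```python
import math
from statistics import NormalDist


def required_sample_size(var_explained, alpha=5e-8, power=0.8):
    """Approximate N needed to detect a variant explaining `var_explained`
    of trait variance at two-sided significance `alpha` with given power."""
    if not 0 < var_explained < 1:
        raise ValueError("var_explained must be strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # significance quantile
    z_power = std_normal.inv_cdf(power)           # power quantile
    ncp = (z_alpha + z_power) ** 2                # required non-centrality
    return math.ceil(ncp * (1 - var_explained) / var_explained)


# Hypothetical variants explaining 1%, 0.1%, and 0.01% of trait variance.
n_large_effect = required_sample_size(0.01)
n_medium_effect = required_sample_size(0.001)
n_small_effect = required_sample_size(0.0001)
```

Under these assumptions the required sample size grows roughly in inverse proportion to the variance explained, from thousands of individuals for a 1% variant to hundreds of thousands for a 0.01% variant.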

A second challenge is that for most common diseases, genome-wide-significant common variants turn out to explain only a small minority of their heritability. This was recognized relatively early in the GWAS era, and many potential explanations were put forth (Manolio et al., 2009). A leading hypothesis that emerged was that rare variants might explain a substantial fraction of this “missing heritability,” motivating large-scale exome- and genome-sequencing studies of common diseases. However, even when reasonably well-powered studies are conducted, this hypothesis has not been borne out, or at least not yet. For example, in type II diabetes, it was shown that lower-frequency variants are collectively likely to contribute less to heritability than common variants (Fuchsberger et al., 2016). Recently, the mystery of missing heritability has been solved to a large extent by the demonstration that common variants as a class account for a much larger proportion of heritability than the subset that achieve genome-wide significance (Yang et al., 2010).

A third challenge, primarily to the goal of therapeutic target identification, has been that the same LD structure that makes GWAS considerably cheaper to execute ironically limits its resolution, the consequence being that we have succeeded in implicating tens of thousands of haplotypes rather than tens of thousands of specific variants. Although considerable effort has been invested in fine-mapping, the task of confidently dissecting which variants are causally responsible for each observed association between a haplotype and a common disease can be maddening.

A fourth challenge, also more relevant to the goal of target identification, is that the vast majority of the GWAS-defined heritability signal partitions to non-coding regions of the genome, and much of it to cell-type-specific regulatory elements (Finucane et al., 2015). As most enhancers are not definitively linked to genes, even if one is successful in pinpointing a causal regulatory variant, identifying the gene through which it mediates its subtle effects on disease risk, not to mention the mechanisms by which the gene acts, represents additional hurdles. A major rate limiter to further progress in this field is that we lack scalable solutions for any of these tasks, in part because they require non-trivial experiments incorporating disease-specific biology.

A fifth challenge, raised in a recent perspective by Boyle et al., is that gene regulatory networks are so densely interconnected, and GWAS so well-powered to detect subtle effects, that many bona fide associations may be due to genes that subtly impact genes in core pathways but themselves are only peripherally relevant to the phenotype (Boyle et al., 2017). An implication of this “omnigenic” model is that many if not the vast majority of GWAS signals, even if successfully fine-mapped, may not meaningfully inform target identification or our understanding of disease.

Finally, as the cohorts required to identify additional GWAS signals grow larger and larger, a broader question is when do we stop caring? How can one credibly argue for the marginal value of the 100th significant association with type II diabetes, when the vast majority of the first 99 have larger effect sizes but have yet to be effectively followed up on with respect to identifying the causal variants and genes?

On one hand, we feel that these are fair concerns to raise, provided that they are raised constructively. At the same time, for a goal as audacious as dissecting the basis of all common human diseases, we should not expect that the solution to every obstacle should have been established in advance, or we would have never gotten started. Furthermore, despite these non-trivial challenges, we actually remain quite positive with regard to the ultimate impact that GWASs will have on the diagnosis, treatment, and prevention of common diseases. There are four main reasons for our optimism.

First, it is retrospectively unsurprising that many of the strongest GWAS associations came early, as smaller studies were only powered to detect large effects, and large effects seem more likely to be mediated through core genes and pathways. The vast majority of GWASs have been conducted in European populations (Visscher et al., 2017), and with the exception of some unique subpopulations, we are skeptical of the marginal value of ever-larger studies in these same populations for the purpose of gene discovery. However, each non-European population represents a fresh source of variants common to that population, and comparatively smaller studies in these populations may yield additional large-effect signals (presumably easier to fine-map and more likely to be therapeutically relevant) for a reasonable cost. Furthermore, smaller studies in populations with less LD (e.g., African ancestry) can facilitate the fine-mapping of associations identified in other populations (Willer et al., 2013).

Second, there are an increasing number of clear examples of GWASs shedding light on the specific pathways and cell types that are most relevant for particular common diseases, of association signals being followed up on to implicate specific variants and genes, and of these insights having meaningful consequences for how the disease will be approached from a drug-discovery perspective. These are reviewed elsewhere (Visscher et al., 2017), but a particularly compelling example is the use of GWAS together with Mendelian randomization to convincingly demonstrate that the associations of LDL cholesterol and triglyceride levels with coronary artery disease (CAD) reflect causal relationships, whereas the association of HDL cholesterol levels with CAD does not (Do et al., 2013; Voight et al., 2012). A more general observation is that the pharmaceutical industry is an increasingly sophisticated consumer of GWAS analyses in order to make maximally well-informed decisions about target selection for drug discovery (Nelson et al., 2015). On a related topic, the list of genetic variants that impact drug response, i.e., pharmacogenomic interactions, is growing, with many of the newer discoveries made via GWASs (Motsinger-Reif et al., 2013). Of note, despite the clear clinical utility and often large effect sizes of these interactions, pharmacogenomics has been slow to achieve clinical adoption, illustrating how the science is often only the first of many challenges.
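
The logic of Mendelian randomization can be sketched with two standard summary-statistic estimators: the per-variant Wald ratio (effect on the outcome divided by effect on the exposure) and the inverse-variance-weighted (IVW) combination across variants. The sketch below is illustrative only. The function names and all effect sizes are hypothetical, not values from the cited studies, and the estimators rely on the usual instrumental-variable assumptions (for example, that the variants affect the outcome only through the exposure).

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate and first-order standard error from one variant."""
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    return estimate, std_error


def ivw_estimate(variants):
    """Inverse-variance-weighted estimate across variants.

    `variants` holds (beta_exposure, beta_outcome, se_outcome) tuples.
    """
    variants = list(variants)
    if not variants:
        raise ValueError("at least one variant is required")
    numerator = sum(bx * by / se ** 2 for bx, by, se in variants)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in variants)
    return numerator / denominator, denominator ** -0.5


# Hypothetical exposure whose variants shift the outcome proportionally,
# as expected if the exposure is causal for the outcome.
causal_like = [(0.10, 0.05, 0.01), (0.20, 0.10, 0.02)]
# Hypothetical exposure whose variants leave the outcome unchanged,
# as expected if the observed correlation is not causal.
null_like = [(0.10, 0.00, 0.01), (0.20, 0.00, 0.02)]

causal_estimate, causal_se = ivw_estimate(causal_like)
null_estimate, null_se = ivw_estimate(null_like)
```

In this toy setup the first exposure yields a consistent non-zero estimate across variants, while the second yields an estimate of zero, mirroring the qualitative contrast drawn above between LDL cholesterol or triglycerides and HDL cholesterol.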

Third, although we are still far from where we need to be, the toolkit for identifying the variants and genes that causally underlie GWAS signals is steadily improving. These include statistical methods that incorporate biochemical annotations (to identify which variants lie in bona fide regulatory regions), expression quantitative trait locus (QTL) studies (to locate genes whose expression is modulated by the same haplotype as a disease), massively parallel reporter assays (to pinpoint variants with regulatory effects), and CRISPR/Cas9 genome editing (to test the functional consequences of a specific variant, or potentially libraries of variants, in their endogenous genomic context). Methods are also advancing for linking regulatory elements to the gene(s) that they regulate, e.g., by 3C-based identification of “loops” or by coupling CRISPR/Cas9 perturbations and single-cell readouts (Gasperini et al., 2019; Mumbach et al., 2016). To date, such tools have been applied to investigate only a small number of GWAS signals. However, as they become more widely used and more scalable, the number of common disease associations for which the causal variants and genes are known is likely to grow.

Fourth, as long evidenced by plant and animal breeding programs, we need not restrict ourselves to genome-wide significant associations to build phenotypic predictors from GWAS results. Polygenic risk scores (PRSs) are not a new concept (Wray et al., 2013), but an increasing number of studies are showing that PRSs that incorporate information from common variants throughout the genome (including from vast numbers of single nucleotide variants [SNVs] that fail to achieve genome-wide significance) achieve reasonable performance in stratifying risk for complex diseases in humans. For example, Khera et al. recently reported that a PRS trained on a portion of the UK Biobank (training set) identifies 2.5% of the remaining participants (test set) that are at 4-fold higher risk for CAD, essentially equivalent to monogenic hypercholesterolemia but impacting a much larger proportion of the population (Khera et al., 2018). Analogous results were obtained for breast cancer and obesity. Through PRS, we may more effectively deliver on the HGP’s promise of better predicting individual risk for common diseases, without necessarily requiring any understanding of the biology on which those predictors are based.
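
In its simplest form, a PRS is a weighted sum of an individual's risk-allele dosages, with weights taken from GWAS effect-size estimates. The sketch below illustrates that calculation and the selection of the highest-scoring fraction of a cohort. It is a minimal sketch under stated assumptions: the variant identifiers, weights, dosages, and function names are all hypothetical, and published scores such as that of Khera et al. involve additional steps (for example, adjusting weights for LD) that are not shown.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant).

    Variants without a weight are ignored; both arguments are dicts keyed
    by a variant identifier (hypothetical identifiers in this sketch).
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def top_fraction(scores, fraction):
    """Return the identifiers of individuals in the top `fraction` of scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = math.ceil(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical per-allele weights (e.g., log odds ratios from a GWAS).
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}

# Hypothetical cohort of four individuals with their allele dosages.
cohort = {
    "person_a": {"variant_1": 2, "variant_2": 1, "variant_3": 0},
    "person_b": {"variant_1": 0, "variant_2": 2, "variant_3": 0},
    "person_c": {"variant_1": 2, "variant_2": 0, "variant_3": 2},
    "person_d": {"variant_1": 1, "variant_2": 1, "variant_3": 1},
}

scores = {person: polygenic_score(d, weights) for person, d in cohort.items()}
high_risk = top_fraction(scores, 0.25)
```

Note that nothing in this calculation depends on knowing which variants are causal or which genes they act through, which is the sense in which such predictors do not require an understanding of the underlying biology.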