These methods, which center on unbiased and largely genome-wide approaches, have underpinned most of our understanding of disease etiology.

There are many challenges left in understanding the genetic basis of AD and PD and much work still to do; however, there are other challenges that we must face in parallel. First, how can we efficiently move from a locus to a gene? Second, once we have genes, how do we build an accurate picture of pathogenesis? And last, as we move toward potential therapies, how can we test these efficiently (Figure 3)?

Thus, we can see that the likely advances in PD and AD genetics will center on a combination of WGS, WES, deep resequencing of known loci, and GWA studies.

Genes that contain causal mutations are shown in blue; genes that contain moderate effect protein coding risk alleles are shown in red; GWA-identified loci that contain modest effect risk alleles are represented by proximal gene symbols in black.

What is notable in the genetic architecture of PD is that the same loci show up under different risk categories; that is, some loci contain an allelic series across a range of frequencies and functional impact. A notable example exists at the gene encoding alpha-synuclein. SNCA point mutations are a rare and highly penetrant cause of young-onset PD; SNCA dosage mutations are also rare, and their penetrance is linked to copy number (); and common variability at SNCA increases risk for PD only modestly (odds ratio ∼1.5) (Figure 2). Likewise, the p.G2019S mutation in LRRK2 occurs in about 2% of North American Caucasian PD patients and is moderately penetrant by late age. There are also common protein-coding variants in LRRK2 that increase risk for disease ∼2-fold and, lastly, common non-coding variants at the LRRK2 locus that increase risk ∼1.2-fold. This phenomenon of pleomorphic risk does not yet appear generalizable to every disease or locus, but it is reasonable to suggest that known risk loci should be investigated for other types of disease risk. Again, second-generation sequencing provides an opportunity to address this question through targeted resequencing, in which an entire genomic locus is sequenced in a very large number of samples. This method has the potential to answer several questions: first, it can be used to fine map association signals identified by GWAS; second, it can identify rare coding, non-coding, and copy number variants associated with disease independently of the GWA signal that nominated the locus; and third, if a coding variant associated with disease is identified, it reveals the gene that is the biologic effector at that locus (Figure 2).
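The effect sizes quoted above are odds ratios. As a minimal illustration of how such a figure is derived, the Python sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2×2 table of risk-allele counts in cases and controls. The counts are hypothetical, chosen only to give an odds ratio of 1.5, and the function name is illustrative rather than taken from any of the cited studies.

```python
import math


def odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table.

    Arguments are allele counts: risk and other alleles in cases, then in controls.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction first")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = 1.959963984540054  # two-sided 95% standard normal quantile
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, (lower, upper)


# Hypothetical allele counts: 1,000 case alleles and 1,000 control alleles.
estimate, (lower, upper) = odds_ratio(600, 400, 500, 500)
```

With these counts the interval runs from roughly 1.26 to 1.79. Published association analyses more often estimate odds ratios by logistic regression so that covariates such as ancestry can be included.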

Some success has already been achieved in identifying rare risk alleles in AD. In 2012, Guerreiro and colleagues identified homozygous TREM2 mutations as a cause of frontotemporal dementia (). Mutations in TREM2 had previously been shown to cause Nasu-Hakola disease, a rare disease characterized by bone cysts and early-onset progressive dementia (). We had previously predicted that variability at genes causing young-onset autosomal recessive neurological diseases would contribute to late-onset neurological diseases (). Because of the involvement of TREM2 in recessive diseases with a neurological component, this gene was a natural candidate for investigation in AD. Our work, and that of others, revealed that rare variants at TREM2 do indeed confer moderate risk for AD (). In both instances, the foundation for this discovery was next-generation sequencing data (WGS and WES), and it is likely that sequencing will be the primary means of discovering additional rare risk loci. Most of the variants discovered thus far that confer moderate risk for disease (OR > 2) alter the amino acid sequence of a protein; this, coupled with the lower price of WES compared to WGS, is often used to argue that WES will remain the dominant methodology for some time to come. This argument is quite circular, because attempts to look at non-coding sequence for rare risk variants have so far been limited, and the consequences of such variants are difficult to interpret. Inevitably, as the true cost of WGS approaches that of WES, WGS will become the dominant approach. Interpretation of non-coding variability will remain a challenge, but it will improve as our understanding of the functional relevance of non-coding motifs and regions increases.
We predict that WGS will become standard and that advances in technology will center on more efficient generation of WGS data and on single-molecule, low-error, long-read sequencing.
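Rare variants such as those at TREM2 are individually too infrequent to test one at a time, so sequencing studies commonly collapse them into a gene-level carrier count. The sketch below is one simple version of such a burden test, assuming that per-person allele counts at qualifying rare sites are already available: it counts carriers in cases and controls and applies a one-sided Fisher's exact test. It illustrates the general technique and is not necessarily the analysis used in the TREM2 studies; the genotype data are hypothetical.

```python
from math import comb


def count_carriers(genotypes, qualifying_sites):
    """Collapse rare variants: a person is a carrier if they have at least one
    alternate allele at any qualifying site (genotypes are per-site allele counts)."""
    return sum(1 for person in genotypes if any(person[s] > 0 for s in qualifying_sites))


def fisher_exact_greater(case_carriers, n_cases, ctrl_carriers, n_controls):
    """One-sided Fisher's exact p-value for enrichment of carriers among cases."""
    total_carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_controls, total_carriers)
    # Sum hypergeometric probabilities of seeing at least this many case carriers.
    return sum(
        comb(n_cases, k) * comb(n_controls, total_carriers - k)
        for k in range(case_carriers, min(n_cases, total_carriers) + 1)
    ) / denom


# Hypothetical data: rows are people, columns are three rare variant sites in one gene.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
sites = [0, 1, 2]
case_carriers = count_carriers(cases, sites)
ctrl_carriers = count_carriers(controls, sites)
p_value = fisher_exact_greater(case_carriers, len(cases), ctrl_carriers, len(controls))
```

With six people per group, three case carriers against one control carrier is far from significant (p ≈ 0.27), which is why rare-variant discovery depends on sequencing very large numbers of samples.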

Following the considerable success of GWAS, one question that remains is how much genetic influence is left to find. An analysis of the heritability of PD using genetic sharing estimates suggests that ∼30% of the risk for PD can be attributed to genetic factors. This is likely a significant underestimate: very rare variants are difficult to capture using this method, and because the approach estimates narrow sense heritability (the proportion of trait variance that is due to additive genetic factors), it does not take into account genetic factors whose effects are dominant or multiplicative. Notably, however, these analyses show that GWAS have so far identified only one-tenth of this narrow sense heritable component, so there is clearly much left to find (). The next question, then, is where to look. Undoubtedly there are common risk loci for AD and PD that remain undiscovered; extending the size of current GWA studies will yield results, as will performing GWA in diverse populations and using alternate data mining methods to prioritize loci. The limiting factors in identifying additional common risk loci through GWA are sample availability and cost. Most previous efforts have relied on marginal increases in sample size and meta-analysis of existing datasets; however, this pool of extant data has in large part been exhausted. The continued search for such loci will therefore likely require large increases in sample size, and in the current funding climate there is unlikely to be much appetite for such a substantial resource investment. Thus, gains in this area are likely to be less dramatic than in previous years. However, some of this missing heritability probably lies in risk alleles that are too rare to detect using traditional GWAS methods, and this is an area of much interest and investigation.
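For reference, the quantities discussed above can be written in standard quantitative-genetic notation, under the usual simplifying assumption of no gene-environment covariance or interaction. Here $\sigma_P^2$ is phenotypic variance, $\sigma_A^2$ additive genetic variance, $\sigma_D^2$ and $\sigma_I^2$ dominance and epistatic (interaction) variance, and $\sigma_E^2$ environmental variance; $h^2$ stands for the ∼30% estimate. The last line merely restates the paragraph's figures: taken at face value, and setting aside the conversion between observed and liability scales for a binary trait, the GWAS loci would account for roughly 3% of the variance in liability to PD.

```latex
\begin{aligned}
\sigma_P^2 &= \sigma_A^2 + \sigma_D^2 + \sigma_I^2 + \sigma_E^2 \\
H^2 &= \frac{\sigma_A^2 + \sigma_D^2 + \sigma_I^2}{\sigma_P^2},
\qquad h^2 = \frac{\sigma_A^2}{\sigma_P^2} \le H^2 \\
h^2_{\mathrm{GWAS}} &\approx \tfrac{1}{10}\, h^2 \approx \tfrac{1}{10} \times 0.30 = 0.03
\end{aligned}
```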

In the fields of AD and PD, this work has been driven by large consortia. In AD, a large number of GWA efforts have now identified more than 18 loci that contain risk alleles (). Likewise, a large number of GWA studies have been published in PD, and since 2009 these have reported reliable and replicable risk loci (). In the latest meta-analysis of PD GWA studies, 28 independent risk loci have been identified and confirmed (). Like most GWAS hits, these PD loci individually confer only modest risk for disease. Examined collectively, however, they show that the 20% of individuals with the highest burden of genetic risk are about 3.5 times as likely to develop disease as the 20% with the lowest burden.
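The top-versus-bottom comparison above can be sketched as follows. Each person's genetic risk score is taken to be the sum of their risk-allele dosages weighted by the log odds ratio of each locus (a common weighting choice, assumed here); people are then ranked by score, and the odds of disease in the top fifth are divided by the odds in the bottom fifth. The function names and the toy data are hypothetical, and the toy data are not meant to reproduce the 3.5-fold figure.

```python
import math


def risk_score(dosages, odds_ratios):
    """Weighted allele count: sum of risk-allele dosage (0, 1 or 2) times log(OR) per locus."""
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


def extreme_fifth_odds_ratio(scores, is_case):
    """Odds of disease in the highest-scoring 20% divided by odds in the lowest 20%."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = len(order) // 5
    if k == 0:
        raise ValueError("need at least five individuals")

    def odds(group):
        cases = sum(is_case[i] for i in group)
        controls = len(group) - cases
        if cases == 0 or controls == 0:
            raise ValueError("odds undefined: group has no cases or no controls")
        return cases / controls

    return odds(order[-k:]) / odds(order[:k])


# Hypothetical toy data: 20 people whose scores are already computed (0..19).
scores = [float(i) for i in range(20)]
is_case = [1, 0, 0, 0] + [0] * 12 + [1, 1, 1, 0]
# Top fifth: 3 cases, 1 control (odds 3); bottom fifth: 1 case, 3 controls (odds 1/3).
ratio = extreme_fifth_odds_ratio(scores, is_case)
```

In practice the score would be computed with `risk_score` from each person's genotypes at the confirmed loci before ranking.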

The challenge of identifying common risk variability was eventually met, again by a system-wide method: genotyping using whole-genome SNP arrays. The extremely high information content of SNP arrays, and our ability to predict (impute) the genotypes of many millions of variants from these data, provided a revolutionary tool for risk variant identification: genome-wide association (GWA). GWA broadly and comprehensively tests common genetic variation for association with a trait, and once again the system-wide nature of the method is critical to its success. GWA studies (GWAS) using these assays were initially controversial but have proven remarkably successful in identifying the genetic basis of complex diseases and traits.
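At its core, a GWA scan repeats a simple association test at every genotyped or imputed variant and judges the results against a stringent threshold that accounts for the number of tests performed. The sketch below uses a Pearson chi-square test on a 2×2 table of allele counts; the 5 × 10⁻⁸ cutoff is the widely used genome-wide significance convention rather than a figure from this Perspective, and the SNP identifiers and counts are hypothetical. Production analyses typically use logistic regression on genotype dosages with ancestry covariates instead.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p


def scan(snps):
    """Return the ids of SNPs whose allelic p-value passes the genome-wide threshold."""
    hits = []
    for snp_id, counts in snps.items():
        _, p = allelic_chi2(*counts)
        if p < GENOME_WIDE_ALPHA:
            hits.append(snp_id)
    return hits


# Hypothetical counts: (case alt, case ref, control alt, control ref) alleles.
snps = {
    "snp_null": (5000, 5000, 5000, 5000),
    "snp_strong": (6000, 4000, 4000, 6000),
}
hits = scan(snps)
```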

While our success in genetic linkage and positional cloning is illustrative, so too are our failures during the same period. In part because PCR amplification was broadly available and easy, candidate gene association studies were widely applied during the mid/late 1990s and early 2000s with the aim of identifying risk variability for complex disease. These studies were easy to perform, relatively cheap, and extensively used; yet they were almost universally unsuccessful. With hindsight, the likelihood of nominating the correct gene and testing the right variants within it was vanishingly small, particularly as we later appreciated that typical risk effect sizes were too small to be detected by most of these studies. Candidate gene association studies were, in essence, the antithesis of unbiased genome-wide approaches, and they showed that our ability to guess at the molecules involved in the rather enigmatic cascade of events that constitutes disease was poor.
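A rough power calculation makes the point about effect sizes concrete. The sketch below approximates the power of a two-sided allelic test (a Wald test on the log odds ratio, evaluated at expected allele counts) for a given sample size, control allele frequency, and odds ratio. The scenario values are hypothetical and chosen only for illustration: with 500 cases and 500 controls, a risk allele at 30% frequency with an odds ratio of 1.2 is detected at α = 0.05 less than half the time, whereas 5,000 of each makes detection near certain.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test via a Wald test on log(OR).

    Assumes each person contributes two independent alleles and that the case
    allele frequency follows from the control frequency and the odds ratio.
    """
    p0 = control_freq
    case_odds = odds_ratio * p0 / (1 - p0)
    p1 = case_odds / (1 + case_odds)  # implied risk-allele frequency in cases
    a, b = 2 * n_cases * p1, 2 * n_cases * (1 - p1)  # expected case allele counts
    c, d = 2 * n_controls * p0, 2 * n_controls * (1 - p0)  # expected control allele counts
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(math.log(odds_ratio)) / se
    # Probability the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


small_study = allelic_power(500, 500, 0.30, 1.2, 0.05)
large_study = allelic_power(5000, 5000, 0.30, 1.2, 0.05)
```

Replacing α = 0.05 with a genome-wide threshold reduces the power of the small study to well under 1%, which is the same arithmetic that later drove GWA consortia toward very large sample sizes.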

While a detailed discussion of the monogenic forms of these diseases is outside the remit of this Perspective, catalogs of the dominant and recessive mutations for Alzheimer's disease (AD) and PD can be found online (http://www.molgen.ua.ac.be/ADmutations/ and http://www.molgen.vib-ua.be/PDMutDB/) ().

The initial major successes in disease genetics depended on genetic linkage screens, a genome-wide and unbiased method of identifying chromosomal regions that segregate with disease. Linkage, followed by positional cloning, was the primary tool of genetic discovery from the late 1980s and is still in use today. Although these efforts involved a great deal of time and resources, they were reliable and served as the mainstay of genetics for a number of years, identifying large numbers of genetic causes of monogenic disease (). The prime method in the search for causes of monogenic disease is now second-generation sequencing, usually whole-exome sequencing (WES) but also whole-genome sequencing (WGS). In the context of Parkinson's disease (PD), this has led to the identification of the p.D620N mutation in VPS35 as a cause of disease and the nomination of DNAJC13 and CHCHD2 mutations as disease causing (). Both WGS and WES will continue to provide insights into the causes of monogenic PD. In each of these approaches, the key to success was the development of methods to interrogate an entire system, whether that system comprised chunks of inherited DNA broken down by meiosis, as in genetic linkage studies, or the protein-coding regions of the genome, as in exome sequencing.
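The logic of a linkage screen can be illustrated with the classic two-point LOD score, which compares the likelihood of the observed recombinants under linkage at recombination fraction θ with their likelihood under no linkage (θ = 0.5): LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N] for R recombinants among N meioses. The sketch below assumes the simplest case of phase-known, fully informative meioses; real pedigree analyses must handle unknown phase, penetrance, and missing genotypes. The family data are hypothetical, and the threshold of 3 mentioned afterward is the conventional one rather than a figure from this Perspective.

```python
import math


def lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_l_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants, meioses, grid):
    """Return (best LOD, theta) over a grid of recombination fractions."""
    return max((lod(recombinants, meioses, t), t) for t in grid)


# Hypothetical family data: 1 recombinant among 20 scored meioses.
grid = [i / 100 for i in range(1, 51)]
best_lod, best_theta = max_lod(1, 20, grid)
```

Here the LOD peaks at θ = 0.05 with a value of about 4.3, above the conventional threshold of 3 for declaring linkage.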

While our success in genetics has been both tangible and substantial, translating this understanding to the development of mechanism-based therapies has been a much more difficult, and less productive, endeavor. A common theme in the success of genetics is that the rate of progress has been dependent on the use of unbiased and system-wide assays; when we have not relied on these tools and instead used our perceived knowledge of the disease to predict genetic targets, our success has been less impressive. We argue here that until now the understanding of pathobiology has been limited by the absence of unbiased and system-wide approaches for understanding molecular processes and that the traditional approach to teasing apart pathogenic function is inefficient and often misleading. We suggest that in order to take advantage of the growing fund of genetic discovery in understanding the molecular basis of disease, we will need to develop and use high content system-wide approaches.

There has been a great deal of success in the early parts of this schema. In a little over 20 years, human disease genetics has moved from a backwater of biology to the foundation of much of our understanding of complex biological questions. While this success has been hard won, the tools and skills needed to identify disease-linked genes have improved so much that the route to disease gene identification is becoming clearer, more straightforward, and, for certain types of risk variation, routine.

A core aim of human disease genetics is to facilitate the development of etiology-based treatments. The field's approach has centered on the notion that identifying gene mutations will ultimately allow us to understand the molecular processes that initiate and sustain disease pathogenesis. This knowledge in turn will allow the development of mechanism-based therapies (Figure 1).

Insight into the mechanisms underlying disease has been achieved using traditional hypothesis-driven functional approaches. In the context of Alzheimer's disease, the most prominent example of this effort has come from translating amyloid precursor protein (APP) and presenilin mutations into the amyloid cascade hypothesis, which represents more than two decades of hard-won functional effort (). While this progress has been absolutely critical in formulating novel therapeutic approaches for Alzheimer's disease, it has come at a substantial cost. Most of the pathogenic hypotheses tested have either been shown to be wrong or ultimately revealed to be biologically true but pathologically unimportant. The approaches used to test these ideas were, of necessity, almost exclusively traditional, hypothesis-testing reductionist experiments. In this context, we have become used to a scientific culture of failure, punctuated by rare significant success. We argue here that much of the initial work characterizing the biology of disease-implicated proteins could be performed more efficiently using the burgeoning hypothesis-generating whole-genome screening methods. Such an approach does not remove the risk of chasing false leads, but it may minimize it, and notably it would do so at the earliest stages of investigation. Importantly, this effort does not seek to remove the need for traditional functional work, but simply to refocus it on largely unbiased hypotheses (or, more accurately, hypotheses formed without the perception of understanding).

Secreted amyloid beta-protein similar to that in the senile plaques of Alzheimer’s disease is increased in vivo by the presenilin 1 and 2 and APP mutations linked to familial Alzheimer’s disease.

Comprehensive Screening and Big Data Integration

Because the early successes in human disease genetics lay in identifying the gene mutations that underlie monogenic forms of disease, our approaches to understanding the pathobiologic role of disease genes and their products have centered on manipulating systems using protein-coding mutations. These approaches are difficult to apply to alleles that confer only low to moderate risk. For most disease-associated loci identified by GWA, the effector gene is unknown, so any attempt to understand pathobiology must begin with an assessment that reveals the gene and protein of interest. For many loci, the underlying risk allele (which is often unknown) confers only minor risk for disease, and one might predict that its pathobiological effect is also either minor or not evident under basal conditions. Lastly, most low to moderate risk alleles are not protein-coding variants, and thus they must exert their effect by altering expression of a transcript. These constraints represent considerable challenges; they certainly require a rethinking of our approach to understanding pathobiology and a retooling of functional research groups. However, we believe that the return on meeting these challenges is likely to be so meaningful that we must prepare to meet and overcome them.

Novarino G., Fenstermaker A.G., Zaki M.S., Hofree M., Silhavy J.L., Heiberg A.D., Abdellateef M., Rosti B., Scott E., Mansour L., et al. (2014). Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders.

Guerreiro R., Escott-Price V., Darwent L., Parkkinen L., Ansorge O., Hernandez D.G., Nalls M.A., Clark L., Honig L., Marder K., et al.; International Parkinson's Disease Genomics Consortium (IPDGC) (2015). Genome-wide analysis of genetic correlation in dementia with Lewy bodies, Parkinson's and Alzheimer's diseases.

Holmans P., Moskvina V., Jones L., Sharma M., Vedernikov A., Buchel F., Saad M., Bras J.M., Bettella F., Nicolaou N., et al.; International Parkinson's Disease Genomics Consortium (2013). A pathway-based analysis provides additional support for an immune-related genetic susceptibility to Parkinson's disease.

International Genomics of Alzheimer's Disease Consortium (IGAP) (2015). Convergent genetic and expression data implicate immunity in Alzheimer's disease.

Jones L., Holmans P.A., Hamshere M.L., Harold D., Moskvina V., Ivanov D., Pocklington A., Abraham R., Hollingworth P., Sims R., et al. (2010). Genetic evidence implicates the immune system and cholesterol metabolism in the aetiology of Alzheimer's disease.

Moskvina V., Harold D., Russo G., Vedernikov A., Sharma M., Saad M., Holmans P., Bras J.M., Bettella F., Keller M.F., et al.; IPDGC and GERAD Investigators (2013). Analysis of genome-wide association studies of Alzheimer disease and of Parkinson disease to determine if these 2 diseases share a common genetic risk.

A large number of existing datasets can already help in this regard, and these have the potential to reveal pathobiology and identify the causal gene at risk loci. A seminal example of this approach was recently published in the field of hereditary spastic paraplegia (HSP) genetics (). In this work, the authors used the large number of known HSP genes and existing protein interaction databases to build an HSPome, a protein interaction network centered on HSP proteins. Constructing this network helped nominate proteins/genes for involvement in the molecular pathogenesis of HSP. Using this information, the authors then reexamined exome-sequencing data generated in HSP patients and identified novel genetic causes of the disease. This work illustrates the power of integrating large-scale genetic and functional data and shows that the combination can inform at both the genetic and the pathobiological level. Implicitly, this work also answers a common criticism of continued genetic work: why continue to find low-risk genes when we do not know how the genes we already have are involved in the disease process? In short, we believe that the larger the number of genes and loci we have, the better our chance of connecting their protein products in a pathologic network, in turn allowing us to build a complete picture of the pathogenic process. Work along these lines has been attempted in both AD and PD, most prominently as pathway-based analysis of GWA-implicated genes. This work centers on mining interaction or literature-based datasets in an attempt to reveal whether the GWA genes collectively highlight pathways of pathobiological relevance.
Within both AD and PD, the immune system has been highlighted using this approach, although notably there does not appear to be a strong shared genetic component between these two diseases (). Additionally, it has been argued that cholesterol metabolism may also be a significant pathway in AD ().
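The HSPome strategy described above can be reduced to a toy form: build an undirected protein interaction graph, then rank candidate genes (for example, genes carrying rare variants in patient exomes) by how many known disease proteins they interact with directly. The sketch below assumes that simple neighbor-count score, which is far cruder than the published network analysis; all gene names and interactions are hypothetical placeholders.

```python
from collections import defaultdict


def build_network(edges):
    """Undirected interaction graph as an adjacency map; self-loops are ignored."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def prioritize(graph, known_genes, candidates):
    """Rank candidates by number of direct interactions with known disease genes."""
    known = set(known_genes)
    scored = []
    for gene in candidates:
        if gene in known:
            continue
        score = len(graph.get(gene, set()) & known)
        if score > 0:
            scored.append((gene, score))
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical interactions and gene names, for illustration only.
edges = [("KNOWN1", "CAND_A"), ("KNOWN2", "CAND_A"), ("KNOWN1", "CAND_B"), ("CAND_C", "OTHER")]
network = build_network(edges)
ranking = prioritize(network, {"KNOWN1", "KNOWN2"}, ["CAND_A", "CAND_B", "CAND_C"])
```

`ranking` places CAND_A (two known-gene partners) ahead of CAND_B (one), while CAND_C, with no connection to the known genes, is dropped; in the workflow described above, such a ranking would guide a second look at patient sequencing data.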

Satake W., Nakabayashi Y., Mizuta I., Hirota Y., Ito C., Kubo M., Kawaguchi T., Tsunoda T., Watanabe M., Takeda A., et al. (2009). Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease.

Simón-Sánchez J., Schulte C., Bras J.M., Sharma M., Gibbs J.R., Berg D., Paisan-Ruiz C., Lichtner P., Scholz S.W., Hernandez D.G., et al. (2009). Genome-wide association study reveals genetic risk underlying Parkinson's disease.

International Parkinson's Disease Genomics Consortium (IPDGC); Wellcome Trust Case Control Consortium 2 (WTCCC2) (2011). A two-stage meta-analysis identifies several new loci for Parkinson's disease.

A more commonly used data integration method relies on quantitative trait locus (QTL) data, in which the quantitative trait is a biologic measure. Most typically, this aims to provide broad maps of the genetic control of effects proximal to genetic variability, such as gene expression, protein levels, and DNA methylation. These maps can then be used to chart the immediate biologic effects of variants linked to disease, and it is now fairly typical to combine expression QTL (eQTL) work with GWAS. This work has limitations, the primary one being that an association between a biologic trait and a risk variant does not necessarily imply that the effect is disease related. For example, an early risk locus for Parkinson's disease was nominated on the long arm of chromosome 1 and denoted PARK16 (). Initial work suggested that the risk alleles were also highly significant QTLs for expression of NUCKS1 and for DNA methylation at PM20D1, with a less significant eQTL for RAB7L1 (now RAB29) (); nevertheless, subsequent functional evidence strongly suggests that RAB29 is the disease-related gene at this locus.
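A single-variant expression QTL test is, at its simplest, a regression of a gene's expression level on genotype dosage across individuals. The sketch below fits ordinary least squares by hand and returns the slope and its t-statistic; it omits the covariates (for example, ancestry and technical factors) and the multiple-testing correction that a real eQTL map needs, and the data are hypothetical. As the PARK16 example shows, a strong result from such a test does not by itself identify the disease-relevant gene.

```python
import math
from statistics import mean


def eqtl_test(dosages, expression):
    """Fit expression ~ intercept + slope * dosage by OLS; return (slope, t-statistic)."""
    n = len(dosages)
    if n < 3:
        raise ValueError("need at least three individuals")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    residual_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se_slope


# Hypothetical data: alternate-allele dosage (0, 1, 2) and normalized expression.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, t_stat = eqtl_test(dosages, expression)
```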

There are also limits that depend on the biological source used for the QTL map; for example, a typical expression QTL map would examine the relationship between genetic variability and gene expression from a tissue, such as human brain. While quite general genetic effects on constitutive expression are likely to be detectable in such a tissue, expression changes that only occur in a particular cell type, or those that are only evident after induction of expression (for example, in response to cell stress), will not be detected. It is notable that for many loci identified by GWA, no QTL effect has been identified.

To date, much of the work on expression has relied on array-based assays, which have considerable limitations: they are relatively insensitive to splice changes, cannot detect unknown or unassayed transcripts, and have a narrow dynamic range. While the resolution of this approach is likely to improve with transcriptome sequencing, the problem of tissue-specific and induced expression will remain and may need to be addressed with reference experiments aimed at recapitulating these effects, such as QTL mapping in differentiated iPSC-derived cells.

Claussnitzer M., Dankel S.N., Kim K.-H., Quon G., Meuleman W., Haugen C., Glunk V., Sousa I.S., Beaudry J.L., Puviindran V., et al. (2015). FTO obesity variant circuitry and adipocyte browning in humans.

Smemo S., Tena J.J., Kim K.-H., Gamazon E.R., Sakabe N.J., Gómez-Marín C., Aneas I., Credidio F.L., Sobreira D.R., Wasserman N.F., et al. (2014). Obesity-associated variants within FTO form long-range functional connections with IRX3.

GTEx Consortium (2013). The Genotype-Tissue Expression (GTEx) project.

Nalls M.A., Saad M., Noyce A.J., Keller M.F., Schrag A., Bestwick J.P., Traynor B.J., Gibbs J.R., Hernandez D.G., Cookson M.R., et al.; International Parkinson's Disease Genomics Consortium (IPDGC); Wellcome Trust Case Control Consortium 2 (WTCCC2); North American Brain Expression Consortium (NABEC); United Kingdom Brain Expression Consortium (UKBEC) (2014b). Genetic comorbidities in Parkinson's disease.

Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., Nicolae D.L., Cox N.J., Im H.K.; GTEx Consortium (2015). A gene-based association method for mapping traits using reference transcriptome data.

Notably, analysis of expression as an effector of GWAS signals is moving beyond correlative studies into mechanistic investigation; this work has clearly shown the complexity of gene regulation and serves as a warning that the most obvious association may not be disease relevant. A prime example comes from the investigation of the FTO locus, where genetic variability is strongly associated with risk for obesity and diabetes; FTO expression is also linked with obesity, and it was believed that the genetic risk at this locus was mediated through FTO. Recent work, however, suggests that this may not be the case (). A critical resource for this type of mechanistic approach is data from the ENCODE (Encyclopedia of DNA Elements) project, which aims to build a comprehensive list of functional elements within the human genome. ENCODE data come from a wide variety of experimental approaches and tissues and cover RNA transcripts, chromatin states, and transcriptional regulation. Such data allow investigators to rapidly take the first steps toward understanding the role of disease-linked genetic variability in the context of altered gene regulation. Likewise, public sources of gene expression and genetic data, such as the GTEx, Braineac, UKBEC, and NABEC datasets, allow the integration of genetic, transcriptomic, and regulatory data to better understand the genetic control of gene expression in the context of disease (). This work is being extended in creative ways, not only to understand the function of disease-linked variants but also to identify disease-linked genes that traditional GWA studies had not discovered ().
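The gene-based method cited above (Gamazon et al., 2015) builds on this idea: a model trained in reference transcriptome data predicts each person's expression of a gene from their genotypes, and the predicted expression is then tested for association with disease. The sketch below assumes the per-variant weights are already given (they are hypothetical here, as are the SNP names and genotypes) and uses a simple Welch t-statistic between cases and controls. It is a simplified illustration of the idea, not the published implementation, which uses regression-based testing.

```python
import math
from statistics import mean, variance


def predicted_expression(dosages, weights):
    """Genetically predicted expression: weighted sum of allele dosages at model SNPs."""
    return sum(w * dosages[snp] for snp, w in weights.items())


def welch_t(group1, group2):
    """Welch's t-statistic for a difference in means between two groups."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical expression model for one gene and hypothetical genotypes.
weights = {"snp1": 0.5, "snp2": -0.2}
cases = [{"snp1": 2, "snp2": 0}, {"snp1": 1, "snp2": 1}, {"snp1": 2, "snp2": 1}]
controls = [{"snp1": 0, "snp2": 2}, {"snp1": 1, "snp2": 2}, {"snp1": 0, "snp2": 0}]
case_expr = [predicted_expression(d, weights) for d in cases]
ctrl_expr = [predicted_expression(d, weights) for d in controls]
t_stat = welch_t(case_expr, ctrl_expr)
```

Because the test is on a gene-level predicted trait rather than on single variants, a signal can point to a gene even when no individual variant reaches significance, which is one way such methods can nominate genes missed by standard GWA.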

Beilina A., Rudenko I.N., Kaganovich A., Civiero L., Chau H., Kalia S.K., Kalia L.V., Lobbestael E., Chia R., Ndukwe K., et al.; International Parkinson's Disease Genomics Consortium; North American Brain Expression Consortium (2014). Unbiased screen for interactors of leucine-rich repeat kinase 2 supports a common pathway for sporadic and familial Parkinson disease.

Pankratz N., Wilk J.B., Latourelle J.C., DeStefano A.L., Halter C., Pugh E.W., Doheny K.F., Gusella J.F., Nichols W.C., Foroud T., Myers R.H.; PSG-PROGENI and GenePD Investigators, Coordinators and Molecular Genetic Laboratories (2009). Genomewide association study for susceptibility genes contributing to familial Parkinson disease.

Figure 4. The Application of Both Unbiased Genetics Methods and High Content Protein-Protein Interaction Assays Revealed the Interaction between Lrrk2, GAK, and Rab7L1. Not only did this work establish GAK and RAB7L1 as the pathologically relevant genes at these GWA-identified risk loci, but it also considerably expanded our understanding of Lrrk2 function, a critical aim in PD research.

Another broad-scale screening approach, in this instance combining genetic and protein interaction data, was recently performed in PD, and we believe it illustrates the power of integrating big data and a likely path forward for complex disease research (). In this work, the authors performed unbiased high-content screens for interactors of the known PD protein Lrrk2. As with most high-content screens, a large number of potential interactors were identified, and proteins would typically have been prioritized for follow-up on the basis of factors such as putative function and cellular expression. Instead, the authors combined GWA results with the hits from the Lrrk2 interactor screen and found that two of the hits were encoded by genes under GWA peaks (). These two proteins, GAK and Rab7L1 (RAB29), were subsequently shown to form a complex with Lrrk2 that promotes clearance of Golgi-derived vesicles through the autophagy-lysosome system (Figure 4).
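The prioritization step in the Lrrk2 study can be sketched as a simple intersection: take the hits from an interactor screen and keep those encoded by genes lying under a GWA peak. The sketch below assumes that "under a peak" means within a fixed ±250 kb window of a lead SNP on the same chromosome, which is a hypothetical simplification (real analyses often define intervals by linkage disequilibrium). The gene names and coordinates are placeholders, not the real positions of GAK or RAB29, and the sketch does not test whether the overlap exceeds chance.

```python
def genes_near_peaks(lead_snps, genes, window_bp):
    """Names of genes lying within window_bp of any GWA lead SNP on the same chromosome.

    lead_snps: list of (chromosome, position); genes: name -> (chromosome, start, end).
    """
    nearby = set()
    for chrom, pos in lead_snps:
        for name, (g_chrom, start, end) in genes.items():
            if g_chrom == chrom and start - window_bp <= pos <= end + window_bp:
                nearby.add(name)
    return nearby


def screen_hits_under_peaks(screen_hits, lead_snps, genes, window_bp=250_000):
    """Intersect interactor-screen hits with genes under GWA peaks (sorted for readability)."""
    return sorted(set(screen_hits) & genes_near_peaks(lead_snps, genes, window_bp))


# Hypothetical coordinates and gene names, for illustration only.
genes = {
    "GENE_A": ("1", 1_000_000, 1_020_000),
    "GENE_B": ("4", 900_000, 950_000),
    "GENE_C": ("2", 5_000_000, 5_100_000),
    "GENE_D": ("1", 3_000_000, 3_010_000),
}
lead_snps = [("1", 1_100_000), ("4", 800_000)]
interactors = ["GENE_B", "GENE_C", "GENE_D", "GENE_A"]
nominated = screen_hits_under_peaks(interactors, lead_snps, genes)
```

Of the four hypothetical interactors, only GENE_A and GENE_B fall near a lead SNP, mirroring how two hits were singled out from a long list of screen results.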

Figure 5. The Primary Challenge Following Gene Identification Remains Understanding the Pathobiological Process; Typically This Problem Has Been Tackled Using Limited Scale and Reductionist Methods, which Require Some A Priori Hypotheses Regarding Potential Mechanism of Action. We suggest here that this understanding can be greatly facilitated using burgeoning high-content screening technologies and large-scale data mining. Not only can this implicate specific processes in disease, but it can also aid in further mapping the genetic basis of disease.

What we hope to have illustrated is that these high-content data can be informative about the molecular etiology of disease, particularly when integrated with a large list of genes/loci. Notably, such experiments both provide information about the molecular etiology of disease and support the nomination of particular genes within known risk loci, creating a self-propagating knowledge generator (Figure 5).