The availability of data is not synonymous, however, with the presence of meaning. The challenge of our current times is to leverage the wealth of recently-available genomic and epigenomic data to derive true biological meaning from GWAS-implicated disease risk loci.

While each locus-trait association will certainly have unique features that require thoughtful “bespoke” delineations of appropriate post-GWAS functional studies, we outline here a general approach that might be applicable to many such studies. We moreover highlight reports exemplifying key steps in this approach.

To begin to prioritize daVs at a given association signal, Bayesian approaches can be used to determine the probability that each daV is causal for the association, resulting in a “credible set” of candidate causal variants, which might range in size from a single variant to hundreds of variants. Furthermore, because most GWASs are performed initially in genetically similar groups of cases and controls, leading to the association of traits with haplotypes as defined in these genetic groups, trans-ethnic fine-mapping can be used to refine the region of association. Specifically, the reduced LD and smaller haplotype blocks in certain populations, particularly African populations, may reduce the number of candidate causal variants. For example, Guthridge et al. (2014) used such an approach, combined with re-sequencing of the candidate region, to reduce the number of candidate causal variants at a lupus-associated locus from 30 to 3. After employing these and other statistical methods, fine-mapped daVs can be investigated for association with gene expression levels in many cell and tissue types, using publicly available eQTL data. While studies integrating GWASs and eQTL data have reported that nearly half of all daVs are associated with gene mRNA levels in at least one cell type, there are several other mechanisms by which a functional variant could influence disease risk. First, a variant could affect protein levels through effects on translation or protein stability without an effect on mRNA levels; indeed, up to one-third of variants that associate with protein levels (pQTLs) do not associate with the mRNA levels of the same gene, although only a few studies have examined this overlap. In addition, a GWAS causal variant might alter the amino acid sequence of a protein, thereby affecting protein function rather than abundance. These possibilities can usually be excluded, however, if there are no daVs in exonic regions.
In such a case, the association of a daV with mRNA expression levels of one or more potential target genes is important for downstream analyses. Namely, conditional and colocalization analyses can be performed using the sentinel GWAS and eQTL variants to determine if both effects are likely driven by the same underlying mechanism. If so, testable hypotheses regarding the function of the disease risk causal variant—that it either increases or decreases the expression of a specific gene or genes—follow naturally.
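The credible-set construction described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: a single causal variant per signal, equal prior probability for every variant, and a Wakefield-style approximate Bayes factor with an assumed prior effect-size variance of 0.04. The function names and the toy summary statistics are hypothetical and are not taken from any of the studies discussed here.

```python
import math

def log_abf(beta, se, prior_var=0.04):
    """Log approximate Bayes factor (association vs. null) for one variant.

    prior_var is an assumed prior variance on the true effect size.
    """
    v = se ** 2
    z = beta / se
    shrink = prior_var / (v + prior_var)
    # log of sqrt(v / (v + prior_var)) * exp(z^2 / 2 * shrink)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink

def credible_set(stats, coverage=0.95, prior_var=0.04):
    """stats maps variant id -> (beta, se); returns (posteriors, credible set)."""
    log_bfs = {vid: log_abf(b, s, prior_var) for vid, (b, s) in stats.items()}
    top = max(log_bfs.values())  # subtract the maximum for numerical stability
    weights = {vid: math.exp(lbf - top) for vid, lbf in log_bfs.items()}
    total = sum(weights.values())
    post = {vid: w / total for vid, w in weights.items()}
    chosen, cumulative = [], 0.0
    for vid in sorted(post, key=post.get, reverse=True):
        chosen.append(vid)
        cumulative += post[vid]
        if cumulative >= coverage:
            break
    return post, chosen

# Hypothetical summary statistics (effect size, standard error) at one locus.
toy_stats = {
    "rs_a": (0.280, 0.04),
    "rs_b": (0.272, 0.04),
    "rs_c": (0.100, 0.04),
    "rs_d": (0.020, 0.04),
}
posteriors, cs95 = credible_set(toy_stats)
print(cs95)
```

In this toy locus the two most strongly associated variants share nearly all of the posterior probability, so neither can be excluded on statistical grounds alone; this is the situation in which eQTL and epigenomic evidence become useful.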

The resolution of microarray-based GWASs can be greatly increased by performing imputation of variants that were not directly genotyped, using population-based sequencing data, such as that from the 1000 Genomes Project. In this way, the significance of association of virtually all common (minor allele frequency ≥ 1%) variants with disease risk can be estimated. Conditional analyses can additionally be performed to determine if multiple weakly linked or unlinked causal variants are contributing to the association of the same locus with disease risk, as compared to a situation in which only one signal exists at the locus in question. In one example of the former situation, Glubb and colleagues (2015) performed a meta-analysis of breast cancer GWASs, finding a complex pattern of association involving at least three independent signals at and around the MAP3K1 locus. In an example of the latter situation, Wu and colleagues (2014) performed conditional analyses on seven loci associated with levels of adiponectin, an adipocyte-secreted protein associated with cardiovascular and metabolic traits. After conditioning on the sentinel GWAS SNP for each locus, six out of seven loci showed no residual association at any other variants, suggesting that these associations are driven by one or more strongly linked functional variants.
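The logic of a conditional analysis can be illustrated with a small sketch. It assumes a quantitative trait, additive genotype coding (0, 1, or 2 copies of an allele), no covariates, and individual-level data; the function names and toy genotypes are hypothetical. Published analyses rely on dedicated software and formal test statistics, so this shows only the underlying idea: the effect of a second variant is estimated after the sentinel variant has been regressed out.

```python
def _mean(values):
    return sum(values) / len(values)

def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def conditional_effect(trait, sentinel, candidate):
    """Effect of `candidate` on `trait` after conditioning on `sentinel`.

    Regressing both trait and candidate on the sentinel, then regressing the
    residuals on each other, gives the same coefficient as fitting
    trait ~ sentinel + candidate jointly.
    """
    ry = residualize(trait, sentinel)
    rx = residualize(candidate, sentinel)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

# Hypothetical genotypes for ten individuals.
sentinel = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
candidate = [0, 0, 1, 0, 2, 2, 0, 1, 1, 1]  # correlated with the sentinel

# Scenario 1: the sentinel alone explains the trait.
trait_one_signal = [1.0 + 0.5 * s for s in sentinel]
# Scenario 2: the candidate contributes an additional, independent effect.
trait_two_signals = [t + 0.3 * c for t, c in zip(trait_one_signal, candidate)]

print(conditional_effect(trait_one_signal, sentinel, candidate))
print(conditional_effect(trait_two_signals, sentinel, candidate))
```

In the first scenario the conditional effect is essentially zero, mirroring loci where no residual association remains after conditioning; in the second it recovers the additional effect, mirroring loci with more than one independent signal.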

Taken together, the prioritization of candidate causal variants based on epigenomic annotations may yield fruitful directions for downstream investigation. Moreover, the availability of “user-friendly” tools for this prioritization, recently reviewed elsewhere, makes these types of analyses accessible to many types of scientists. We close this section, however, with a consideration of the limitations of these existing data. First, many gene-regulatory processes are known to be context-dependent. Because the vast majority of epigenomic and eQTL studies have been performed on resting (unstimulated) cells, these studies might be limited in their ability to identify context-dependent effects. Second, some cell and tissue types are more difficult to obtain and/or culture than others, which may preclude their incorporation into consortium-based, large-scale studies. Thus, for some diseases and loci in which these cell and tissue types play driver roles, the currently available datasets might be less useful. In such a scenario, approaches taking into account evolutionary conservation might be helpful in prioritizing candidate causal variants. In summary, limitations in causal variant identification might stem from the nature of existing epigenomic and eQTL datasets for some diseases and some loci. However, the bottleneck in our global understanding of risk loci found by GWAS is more likely to be due to a lack of disease-focused functional biological studies downstream of GWAS locus discovery than to a lack of epigenomic and eQTL datasets.

In addition, epigenomic datasets have been used to investigate loci that are associated with disease risk by GWASs, but do not reach statistical significance after correction for multiple hypothesis testing. GWASs typically employ the Bonferroni correction method for multiple hypothesis testing, which might be overly conservative due to LD between nearby SNPs throughout the genome. Thus, some SNPs that do not reach the conventionally accepted genome-wide significance threshold (p < 5 × 10⁻⁸) might represent true disease risk loci. To identify such loci, a study by Wang et al. (2016) examined the overlap of SNPs associated with cardiac QT interval with epigenetic enhancer marks in cardiac and non-cardiac tissues. The authors found that both genome-wide significant SNPs (p < 5 × 10⁻⁸) and “sub-threshold” SNPs (5 × 10⁻⁸ ≤ p ≤ 1 × 10⁻⁴) were significantly enriched in predicted cardiac enhancers, and > 70% of enhancers harboring sub-threshold SNPs exhibited allele-specific regulatory activity in induced pluripotent stem cell (iPSC)-derived cardiomyocyte luciferase reporter assays. Furthermore, enhancer-associated sub-threshold SNPs were more strongly associated with QT interval than non-enhancer-associated sub-threshold SNPs, and the enhancer-associated SNPs were more likely to reach genome-wide significance in larger GWAS meta-analyses.
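The arithmetic behind the genome-wide threshold, and the notion of a sub-threshold band, can be made concrete with a short sketch. It assumes a family-wise error rate of 0.05 and roughly one million independent tests, a round number often used to motivate the conventional 5 × 10⁻⁸ cutoff. The upper bound of the sub-threshold band (`sub_upper`) is an assumed illustrative parameter, and the function names are hypothetical.

```python
import math

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold

def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test p-value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_tests

def classify(p, genome_wide=GENOME_WIDE, sub_upper=1e-4):
    """Place a p-value into one of three bands (sub_upper is an assumption)."""
    if p < genome_wide:
        return "genome-wide significant"
    if p <= sub_upper:
        return "sub-threshold"
    return "not significant"

print(bonferroni_threshold())
for p in (1e-9, 3e-6, 0.01):
    print(p, classify(p))
```

Because LD makes many tests correlated, the effective number of independent tests is smaller than the number of SNPs tested, which is why a correction of this kind can be conservative and why the sub-threshold band may contain true associations.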

To test such a genetic regulatory function hypothesis, one can capitalize on the wealth of previously mentioned, publicly available epigenomic data to prioritize candidate causal variants. Specifically, overlap with accessible chromatin, TF binding, and/or histone marks associated with regulatory activity might all suggest a functional effect for a given candidate causal variant located within a predicted CRE. Moreover, the pattern of histone modifications observed at a putative CRE can help predict which type of regulatory element it may be (e.g., promoter, enhancer, or insulator), guiding the choice of functional assay. Such a “filtering” approach is exemplified in a study of the 8q21 locus associated with allergic diseases. The sentinel GWAS SNP was found to associate with the expression of PAG1 in B lymphoblasts, and ENCODE data were used to select 35 candidate causal SNPs (out of a total of 118 that are in moderate LD (r² ≥ 0.6) with the sentinel SNP) overlapping four distinct regions of DNase I hypersensitivity and enhancer-associated histone marks in this cell type. These potential CREs were then investigated by multiple approaches, including chromosome conformation capture (3C) and reporter gene assays.
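A filtering step of this kind reduces to two checks per variant: sufficient LD with the sentinel SNP, and overlap with at least one annotated regulatory interval. The sketch below is a minimal illustration, assuming variants carry 1-based positions and annotations are given as BED-style intervals (0-based start, exclusive end). All identifiers and coordinates are hypothetical and do not correspond to the 8q21 study.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int     # 1-based position
    r2: float    # LD (r squared) with the sentinel SNP

class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED-style)
    end: int     # exclusive

def overlaps(variant, interval):
    """True if the 1-based variant position falls inside the BED-style interval."""
    return (variant.chrom == interval.chrom
            and interval.start < variant.pos <= interval.end)

def prioritize(variants, intervals, min_r2=0.6):
    """Keep variants in LD with the sentinel that overlap any annotation."""
    return [v for v in variants
            if v.r2 >= min_r2 and any(overlaps(v, iv) for iv in intervals)]

# Hypothetical open-chromatin / enhancer-mark regions and candidate variants.
regions = [Interval("chr8", 1000, 1100), Interval("chr8", 5000, 5200)]
candidates = [
    Variant("rs_hyp1", "chr8", 1050, 0.90),  # in LD, inside a region
    Variant("rs_hyp2", "chr8", 1100, 0.70),  # last base of the first region
    Variant("rs_hyp3", "chr8", 1101, 0.95),  # in LD, just outside
    Variant("rs_hyp4", "chr8", 5100, 0.40),  # inside a region, weak LD
]
kept = prioritize(candidates, regions)
print([v.rsid for v in kept])
```

Being explicit about the coordinate convention matters in practice, since off-by-one errors at interval boundaries are a common source of discrepancies between annotation overlaps.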

Testing the Function of a Regulatory Variant

Once a list of candidate CREs is identified, all containing one or more potential causal variants, various experimental approaches can be used to test the functions of these regions. A common approach involves in silico analysis to determine whether a particular variant is predicted to disrupt a TF binding motif, with the caveat that many causal variants that may in fact disrupt TF binding do not reside in known TF motifs. For example, only 10%–20% of predicted autoimmune GWAS causal cis-regulatory variants may reside in known TF motifs. An alternative approach is to functionally test all candidate CREs, using both the risk and protective alleles of the candidate causal variants. Cell culture-based reporter assays have been widely used for these purposes: the candidate CRE is cloned into a physiologically relevant position with respect to the reporter gene and transfected into a relevant cell type, and the activities of CREs containing alternate alleles (or haplotypes, if multiple daVs overlap the CRE) are compared. Because some CREs are not only cell type-specific but also signal-dependent, attention to the appropriate experimental conditions in which to test the variant is important.
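The in silico motif analysis mentioned above typically compares how well the two alleles of a variant match a position weight matrix (PWM). The following sketch assumes a uniform background base frequency of 0.25 and a made-up four-position PWM; it is not a real matrix for any TF, and the function names are hypothetical. It scores each allele as a sum of log-odds and reports the difference.

```python
import math

# Hypothetical PWM: one dict of base probabilities per motif position.
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

def log_odds_score(seq, pwm, background=0.25):
    """Sum of per-position log2 odds of the sequence under the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and PWM must have the same length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))

def motif_delta(ref_seq, alt_seq, pwm):
    """Alt-minus-ref score; negative values suggest the alt allele weakens the match."""
    return log_odds_score(alt_seq, pwm) - log_odds_score(ref_seq, pwm)

ref_site = "GATA"  # reference allele in its sequence context
alt_site = "GCTA"  # alternate allele changes the second position
delta = motif_delta(ref_site, alt_site, TOY_PWM)
print(round(delta, 3))
```

A negative difference indicates a predicted loss of motif match for the alternate allele. As noted in the text, a variant outside any known motif yields no informative score, so the absence of a predicted disruption is weak evidence against function.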

Rather than testing reporter constructs one-by-one in cell culture contexts, several groups have developed massively parallel reporter assays (MPRAs), in which thousands of variants can be tested in a single experiment. For example, Tewhey and colleagues (2016) investigated ∼30,000 SNPs representing > 3,500 eQTL signals (eSNPs), testing each eSNP and all variants in perfect LD with it for enhancer activity in immortalized liver and B lymphoblast cell lines. Approximately 12% of the putative CREs displayed enhancer function in one or both of the cell types tested, and of these, ∼25% contained SNPs that caused significant changes in reporter gene expression. Importantly, ∼80% of the expression differences caused by these variants agreed with the direction of previously published eQTL effects in the same cell type. In addition, the majority of functional variants identified in this study altered reporter gene levels by less than 2-fold, consistent with eQTL effect sizes predicted by previous studies. These results underline the importance of investigating the cellular or organismal effects of modest changes in target gene expression.
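The quantities compared in an MPRA can be sketched as follows. Each allele's activity is taken here as the log2 ratio of RNA barcode counts to plasmid DNA counts, and the allelic effect as the difference between alleles. The pseudocount, the function names, and the toy counts are assumptions made for illustration; published analyses use count-based statistical models with replicate-aware significance testing rather than this simple average.

```python
import math

def activity(rna, dna, pseudocount=1.0):
    """log2 ratio of RNA to plasmid DNA counts for one replicate."""
    return math.log2((rna + pseudocount) / (dna + pseudocount))

def allelic_skew(ref_counts, alt_counts):
    """Mean alt activity minus mean ref activity, in log2 units.

    Each argument is a list of (rna, dna) count pairs, one per replicate.
    """
    ref = [activity(r, d) for r, d in ref_counts]
    alt = [activity(r, d) for r, d in alt_counts]
    return sum(alt) / len(alt) - sum(ref) / len(ref)

def concordant(skew, eqtl_beta):
    """True if the reporter effect and the eQTL effect point the same way."""
    return skew * eqtl_beta > 0

# Hypothetical counts for two replicates of each allele.
ref_reps = [(199, 99), (399, 199)]
alt_reps = [(299, 99), (599, 199)]
skew = allelic_skew(ref_reps, alt_reps)
print(round(2 ** skew, 2), concordant(skew, eqtl_beta=0.2))
```

The toy variant corresponds to a 1.5-fold change, in line with the modest effect sizes described above, and the sign comparison mirrors the check of directional agreement with eQTL data.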

While reporter assays are often useful in determining the function of a potential regulatory variant, they have several limitations. First, reporter assays can display a significant amount of transcriptional noise, and thus are not always reproducible. Second, small differences in reporter activity can result from small differences in the molar amounts of each plasmid that is transfected into cells, which is unavoidable even with the most accurate DNA concentration measurements. These issues can make small differences in expression difficult to distinguish statistically. Perhaps most importantly, reporter assays test the transcriptional function of a variant in the context of plasmid DNA, rather than the native genomic context in which the variant actually exists. This situation can produce false negative and false positive results, due to the intricate relationships between DNA, histones, transcription factors, noncoding RNAs, and long-range chromatin interactions.

In light of these issues, a more physiologically relevant method to confirm the function of a regulatory variant may be genome editing, pioneered through the use of zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), and more recently overtaken by the nucleic acid-based clustered regularly interspaced short palindromic repeat (CRISPR)-based systems. In one of the first applications of genome editing to a GWAS-nominated locus, Bauer and colleagues (2013) edited a region in the mouse ortholog of BCL11A. The orthologous region in humans harbors the top SNPs associated with fetal hemoglobin levels, for which BCL11A is a known repressor. Thus, the causal variant may function by regulating BCL11A, affecting downstream levels of fetal and embryonic β-globin; indeed, manipulating this pathway is an attractive prospect for treating β-hemoglobinopathies. The authors demonstrated that several top GWAS SNPs fall within three distinct regions of open chromatin and enhancer-associated histone marks that are specific to human erythroid cells, consistent with the erythroid-specific expression patterns of the globin genes, and the top candidate SNP was hypothesized to disrupt binding of the erythroid TFs GATA1 and TAL1. Using TALENs, this group deleted a 10 kb intronic interval containing the putative causal variant in a murine erythroleukemia cell line, which resulted in dramatically reduced expression of Bcl11a and a concomitant increase in embryonic β-globin, thus establishing the region as a functional Bcl11a enhancer required for repression of embryonic β-globin.

In the example above, a large genomic region was deleted to demonstrate the importance of that region to gene regulatory function. However, genome editing can also be used to make more precise changes, such as mutating an individual SNP from one allele to the other. In Spisák et al. (2015), the authors used TALEN-mediated homology directed repair (HDR) to confirm the functional role of a SNP previously reported to influence prostate cancer risk by modulating RFX6 expression. Specifically, they compared edited and unedited prostate cancer cell line clones, and demonstrated that the candidate causal variant altered RFX6 expression levels by ∼2-fold. Moreover, the authors characterized the regulatory potential of the region harboring the SNP by fusing a catalytically inactive TALE array with either a VP64 transcriptional activation domain, or LSD1, a histone lysine-specific demethylase known to remove H3K4 methylation enhancer marks and decrease enhancer activity. As expected, site-specific recruitment of VP64 and LSD1 to the putative causal SNP increased and decreased RFX6 levels, respectively, establishing the region harboring the causal variant as a bona fide regulatory element. Thus, genome editing technologies can also be used to validate potential CREs by altering epigenetic state, rather than the underlying DNA sequence.

A more recent study used CRISPR/Cas9 gene editing to investigate candidate causal variants at the SNCA locus, which is associated with risk for Parkinson’s disease (PD), and which encodes α-synuclein, the protein that accumulates in the characteristic Lewy body inclusions of PD. The authors demonstrated that the SNCA risk haplotype is associated with increased SNCA brain expression, which has previously been associated with PD pathogenesis, since families with duplications or triplications of the SNCA locus (resulting in increased SNCA levels) exhibit Mendelian forms of PD. After prioritizing candidate causal variants based on epigenetic signatures and in silico TF motif predictions, the authors deleted a 500 bp putative enhancer at this locus containing two SNPs in human embryonic stem (ES) cells. They reinserted the 500 bp region using HDR with either the risk or protective alleles of the two SNPs, and differentiated the ES cells into neural precursors and mixed neuronal cultures. Cell clones bearing the risk-associated alleles of the enhancer SNPs demonstrated significantly higher SNCA levels than clones bearing the protective alleles, and this effect was driven entirely by the variant predicted to be functional by in silico and experimental analyses.