Genome-Wide Association Study Methodology and Overview

The human genome contains millions of loci that are commonly polymorphic across people (e.g., single nucleotide polymorphisms, SNPs) and modern genetic methods (e.g., GWAS) are capable of detecting associations between these genetic variants and complex genetic phenotypes, including PTSD. GWASs have been tremendously successful in recent years, identifying thousands of genetic variants associated with complex genetic phenotypes [14••, 15]. This includes over 200 risk loci for psychiatric disorders discovered already [26••, 37,38,39,40]. Notably, most of these discoveries were impossible with modest GWAS that included only thousands of individuals. In addition, just as petroleum engineers can make accurate predictions about how much oil is left in the ground in a particular area (using data and statistical models), statistical geneticists can now predict the likely number of genetic risk variants for a given phenotype that can ultimately be discovered [16•, 37, 41]. Concrete evidence supporting such predictions is already available for phenotypes including height [42] and schizophrenia [37].

Overall, GWAS results have revealed certain points and principles about genetic effects on psychiatric disorders, including the expectation that phenotypes like PTSD are highly polygenic and that there are strict bounds on variant effect sizes (as described above). GWAS results have also revealed complexity in understanding robust GWAS results, even after loci are identified. For example, it is difficult to find the true risk variant(s) within a given GWAS locus. Further, determining the functional effect of a true risk variant is also difficult. These points are described further below, as they pertain to PTSD.

Candidate Gene Study Methodology and Overview

With the current availability of affordable, high-coverage genotyping and analysis methods (e.g., GWAS), candidate gene studies are no longer necessary. Candidate gene studies measure only small numbers of genetic variations (typically one to ten polymorphisms, out of the many millions of common polymorphisms that exist). In contrast, each GWAS typically encompasses all candidate polymorphisms, as well as thousands of variations in and around each candidate gene, and also millions of other genetic variants throughout the genome. Thus, the genomic coverage afforded by GWAS makes candidate gene studies obsolete. A second reason why candidate gene studies are no longer recommended is that GWAS data afford far superior analytical procedures. Simply put, it is impossible to correct for known confounders of genetic studies (i.e., population stratification and subtle relatedness) using candidate gene data alone. A third reason why geneticists routinely disregard candidate gene findings is that the replication record of candidate gene studies has been notoriously poor. GWAS results have shown that candidate gene hypotheses were typically wrong in two major ways: they specified the wrong polymorphisms, and they explicitly or implicitly hypothesized effect sizes that were too large. This gives a sensible explanation for the poor replication record of candidate gene findings, namely, that nearly all results were false positives [18, 19, 20•, 43,44,45,46]. Exceptions to this poor replication record exist for substance-related phenotypes, in which some of the correct genes (though perhaps not the correct polymorphisms) have been previously hypothesized (e.g., ADH1B associations to alcohol phenotypes [47,48,49,50]). Given these points, we do not review candidate gene findings for PTSD. Fortunately, however, the first robust findings from GWAS of PTSD are just emerging, and so we review available and forthcoming findings below.

Robust Molecular Genetic Findings for PTSD Are Emerging

Million Veterans Program

The Million Veterans Program [51] (MVP) biobank is one of the world’s leading repositories of genetic and phenotypic information, and is an unprecedented resource for the study of PTSD. A conference abstract on the MVP GWAS of PTSD re-experiencing symptoms has been published [12••], and additional results about other PTSD phenotypes measured within MVP will be available in future publications. Regarding PTSD re-experiencing symptoms, MVP researchers examined a sample of 146,660 European-ancestry participants and 19,983 African-ancestry participants. This dataset afforded the discovery of eight loci at the level of genome-wide significance (i.e., p < 5 × 10⁻⁸). These loci include a chromosome 3 locus with top variant rs2777888 (p = 2.1 × 10⁻¹¹). This variant is located in an intron of the CAMKV gene (CaM kinase like vesicle associated), which is highly expressed in the brain. An extended locus on chromosome 17 was also identified, with lead SNP rs2532252 (p = 4.5 × 10⁻¹⁰) closest to the KANSL1 gene (KAT8 regulatory NSL complex subunit 1). This locus also encompasses the CRHR1 gene (corticotropin releasing hormone receptor 1), a previous candidate gene for PTSD. This means that CRHR1 may be associated with PTSD, but further research is needed to assess this possibility given the large number of genes and regulatory regions in this broad locus (see notes about fine-mapping below). A third locus, on chromosome 18, lies in a region previously associated with schizophrenia, in the TCF4 gene (transcription factor 4; top PTSD SNP rs2123392, p = 5.4 × 10⁻¹¹). The discovery of these loci [12••] is a tremendous step forward for PTSD genetics. The forthcoming full manuscript, as well as further genetic studies of PTSD phenotypes from MVP, will provide motivating results for the field. The major limitation of the MVP studies is the exclusive focus on military samples. The Psychiatric Genomics Consortium studies (below) include both military and civilian samples, from a wide variety of contexts relevant to PTSD.
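To make the genome-wide significance criterion concrete, the following minimal sketch checks the three lead-variant p-values quoted above against the conventional threshold. It assumes the usual rationale for that threshold (a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests); the function and variable names are illustrative only and do not come from the MVP analysis.

```python
import math

# Conventional genome-wide significance threshold: a Bonferroni correction of
# alpha = 0.05 for roughly one million independent common-variant tests.
# The one-million figure is a widely used approximation, not a study-specific count.
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GWS_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # approximately 5e-8

# Lead variants and p-values exactly as reported in the text above.
reported_hits = {
    "rs2777888 (chr3, CAMKV)": 2.1e-11,
    "rs2532252 (chr17, KANSL1)": 4.5e-10,
    "rs2123392 (chr18, TCF4)": 5.4e-11,
}


def is_genome_wide_significant(p_value, threshold=GWS_THRESHOLD):
    """Return True if the p-value falls below the genome-wide threshold."""
    return p_value < threshold


def neg_log10(p_value):
    """Convert a p-value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p_value)


for label, p in reported_hits.items():
    print(
        f"{label}: p = {p:.1e}, -log10(p) = {neg_log10(p):.2f}, "
        f"genome-wide significant = {is_genome_wide_significant(p)}"
    )
```

The −log10 scale is included because association results are usually plotted that way, with the significance line drawn at about 7.3.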

PTSD Group of the Psychiatric Genomics Consortium (PGC-PTSD) and the UK Biobank

As has been the case for much of complex trait genetics research, the formation of international consortia focused on the genetics of PTSD has been a critical step for discovery because far larger sample sizes can be achieved through sharing of data. In psychiatry, the largest genetics consortium is the Psychiatric Genomics Consortium (PGC) [41]. The PGC made possible the identification of over 100 risk loci for schizophrenia, as reported in 2014 [37], and more loci have consistently been identified as data aggregation within the PGC schizophrenia group has continued. The PGC-PTSD group [21, 22] has been employing the same strategy used by other successful PGC groups, and the first empirical paper for the group had a sample size of 20,070 [11••], from the combined analysis of 11 previous GWAS of PTSD [52,53,54,55,56]. This study was notable because it was the first to be adequately powered to estimate SNP-based heritability (h²SNP) for PTSD. Intriguingly, the h²SNP estimate for females (29%) was higher than the h²SNP estimate for males (7%), consistent with twin study PTSD heritability estimates (i.e., female h²twin estimates are higher than male h²twin estimates), as described above. Analyses from the PGC-PTSD group also revealed shared genetic effects between PTSD and schizophrenia, bipolar disorder, and depression [11••, 13••].

The second wave of data analysis from the PGC-PTSD group (abstract currently available [13••], and manuscript forthcoming) replicated and extended the findings from the first PGC-PTSD paper [11••] and also identified potential specific risk loci and genes for PTSD. At the time of writing of this review, the following information about top loci is available from the published abstract [13••]. Stratified analyses revealed two loci (on chromosomes 6 and 13) that exceeded genome-wide significance in the European ancestry analyses (6q25, p = 3.1 × 10⁻⁹ and 13q32, p = 2.7 × 10⁻⁸). In the African ancestry analyses, a separate locus on chromosome 13 exceeded genome-wide significance (13q.21, p = 3.8 × 10⁻⁸). Further, polygenic analyses make it clear that the identification of many more loci will occur once adequate power is achieved. Thus, sample collection is continuing within the PGC-PTSD group. As with the loci identified in the Million Veterans Program, the next step for the PGC-PTSD loci is “fine-mapping.” Fine-mapping [57•] refers to various analytical and biological procedures used to refine the signal within a particular locus, ideally to the resolution of individual variants causally associated with disease.

In closing this section about GWAS results, we note that the methodological advantages of GWAS (described above) do not imply that GWAS results should be accepted indiscriminately. Rather, consumers of the GWAS literature should be aware of certain guidelines in the evaluation of GWAS results. Above all, sample size has proven to be the best indicator of how many loci will be discovered and how robust findings will prove to be upon investigation in novel samples. Thus, for a given phenotype such as PTSD, a good rule of thumb is that the largest GWAS (i.e., largest N) will likely provide the best information about molecular genetic influences on PTSD. Another indicator of power in GWAS is the presence of a significant SNP-heritability estimate (h²SNP).

In general, polygenic analyses of heritability and genetic correlations can be conducted with smaller GWASs than are necessary for the identification of individual risk loci [23, 58]. Sample sizes of many tens of thousands of participants have been necessary for risk locus discovery [26••, 37], whereas methods like GCTA [23] can be used to estimate h²SNP using just thousands of samples. For these reasons, we chose to focus on the MVP and PGC-PTSD GWAS results instead of smaller GWASs, which were not even adequately powered for heritability analyses (and, by extension, were less likely to be adequately powered to detect individual loci). At the same time, it is important to recognize that the progress made by GWAS consortia in recent years would not have been possible without the considerable efforts involved in the conduct of each individual GWAS. The individual GWASs that made recent consortium findings possible are provided in the reference list [52,53,54,55,56, 59,60,61,62,63,64,65,66] and described in detail in the PGC papers.
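The dependence of locus discovery on sample size can be illustrated with a rough power calculation. The sketch below is a textbook approximation rather than the method used by MVP or the PGC. It assumes a quantitative trait, an additive single-variant test with one degree of freedom, and a hypothetical variant explaining 0.05% of phenotypic variance; that effect size is chosen purely for illustration and is not an estimate for PTSD. For case-control designs, the effective sample size would also depend on the case-to-control ratio, which this sketch ignores.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumes a quantitative trait, so the test statistic is approximately
    normal with mean sqrt(NCP) and unit variance, where
    NCP = N * q / (1 - q) and q is the variance explained by the variant.
    """
    ncp = n_samples * variance_explained / (1.0 - variance_explained)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = ncp ** 0.5
    # Probability that |Z| exceeds the critical value under the alternative.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical effect size (0.05% of variance), for illustration only.
HYPOTHETICAL_VARIANCE_EXPLAINED = 0.0005

for n in (5_000, 20_000, 50_000, 100_000, 200_000):
    power = gwas_power(n, HYPOTHETICAL_VARIANCE_EXPLAINED)
    print(f"N = {n:>7,}: approximate power = {power:.4f}")
```

Under these assumptions, power is negligible with a few thousand participants and becomes substantial only at around a hundred thousand, which is consistent with the sample sizes reported for locus discovery above.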

Future Directions in Molecular Genetic Studies of PTSD and Related Analyses

Given that large-scale GWAS offers a proven strategy for success in the identification of risk loci for complex genetic phenotypes, the future steps for PTSD genetics are relatively clear. First, many more loci can be identified as current efforts to increase sample size continue. Based on evidence from other phenotypes, it is likely that there will ultimately be thousands of loci associated with PTSD. If we are to realize the full potential of GWAS, which is the identification of entirely novel clues about PTSD etiology, then it makes sense to identify more robust risk loci for PTSD. Next, each locus needs to be investigated using fine-mapping approaches in order to identify the causal variant(s) within each locus. In tandem, genetic analyses that use GWAS data can also identify the specific cell types relevant to PTSD [67]. Further, the strength and direction of genetic relationships between PTSD and other psychiatric and medical disorders can be discovered through genetic correlation analyses [23, 68], as has been successfully achieved for schizophrenia and many other phenotypes [67, 69,70,71].

In addition to the focus of this review (genetic variations), there are important related areas of inquiry, such as gene expression (i.e., transcriptomic) and epigenetic (including methylation) studies. It is important to keep in mind that the same challenges of scope apply to these fields as apply to genetic association studies. In other words, there is a good chance that very large studies will be necessary to discover robust relationships between gene expression, methylation, and PTSD. Therefore, researchers should interpret results from small studies, and those that are not widely replicated, cautiously. Indeed, work within the PGC-PTSD group (unpublished) shows that genome-wide significant methylation results from individual studies are often not consistent across studies. For this reason, the group developed systematic quality control procedures for the analysis of epigenome-wide association studies (EWAS) [72], and empirical results are forthcoming. An even greater consideration for gene expression, methylation, and other epigenetic studies is that results are highly variable across cell types and tissues [73,74,75,76]. Thus, results in accessible tissues like blood may only be partially correlated (if at all) with results in relevant brain cell types. Nevertheless, preliminary results are available. For example, gene expression in blood has been examined in a meta-analysis of 540 individuals, combined from multiple individual studies [77]. Finally, additional efforts are underway to quantify molecular genetic effects on trauma exposure (Dalvie et al., in prep); to assess diverse genetic correlations with PTSD (Ratanatharathorn et al., in prep); to understand relationships between genetic variation, brain imaging phenotypes, and PTSD (only imaging data have been published [78]); and to understand relationships among PTSD, genetics, sex, and gender [11••, 79]. As these efforts continue, and in particular as adequately powered studies are created, we can expect important discoveries about genetic, transcriptomic, and epigenetic influences on PTSD.