We brought together studies from all over the world to perform GWA study meta-analyses in over 13,000 long-lived individuals of European, East Asian and African American ancestry to characterise the genetic architecture of human longevity. We used the 1000 Genomes reference panel for imputation, which expanded genome coverage compared with previous GWA studies of longevity. Consistent with previous reports, rs429358, which defines ApoE ε4, was associated with decreased odds of becoming long-lived. Moreover, we report a genome-wide significant association of rs7412, which defines ApoE ε2, with increased odds of becoming long-lived. We also found a genome-wide significant association at a locus near GPR78. Gene-level association analysis showed that increased expression of KANSL1, CRHR1, ARL17A, and LRRC37A2 and decreased expression of ANKRD31 and BLOC1S1 were associated with increased odds of becoming long-lived. Genetic correlation analysis showed that our longevity phenotypes are genetically correlated with father’s age at death, CAD and T2D-related phenotypes.

Genetic variation in APOE is well known to be associated with longevity and lifespan; the first report, from a small candidate gene study, appeared more than two decades ago27. Since then, numerous candidate gene studies, including studies of individuals of diverse ancestry, have identified associations of ApoE with longevity28,29,30,31,32. However, rs7412, the genetic variant that defines ApoE ε2, has not previously been reported to show a genome-wide significant association in GWA studies of longevity and lifespan. Our detection of this association may be explained by our use of the 1000 Genomes reference panel for imputation, whereas earlier GWA studies used the HapMap reference panel, which has limited coverage of this variant. ApoE mediates cholesterol metabolism in peripheral tissues and is the principal cholesterol carrier in the brain. The ApoE ε2 and ε4 variants have previously been associated with a decreased (ε2) or increased (ε4) risk of several age-related diseases, such as cardiovascular disease and Alzheimer’s disease33, which could explain their effects on longevity. The opposite effects of the two variants may be attributable to differences in the structural and biophysical properties of the protein isoforms, since ApoE ε2 shows high and ApoE ε4 low stability upon folding34.

We also found a genome-wide significant association of rs7676745, located on chromosome 4 near GPR78. This locus still requires replication in independent cohorts, given that we were not able to replicate this variant in the cohorts in which de novo genotyping was applied. According to PhenoScanner (http://www.phenoscanner.medschl.cam.ac.uk/)35, this locus has not been associated with other traits, although other genetic variants in GPR78 have been associated with several diseases and traits in the UK Biobank, including death due to a variety of disorders. The GPR78 protein belongs to the family of G-protein-coupled receptors, whose main function is to mediate physiological responses to various extracellular signals, including hormones and neurotransmitters36. However, the specific function of GPR78 is still largely unknown, although it has been shown to play a role in lung cancer metastasis37.

To maximise power for discovery, we meta-analysed results from all studies that contained long-lived individuals meeting our 90th and/or 99th percentile case definitions, had genome-wide genetic data, and were able to participate. As a consequence, no independent cohort with genome-wide genotype data and participants reaching the ages of our case definitions remained in which to replicate our findings. We therefore tried to validate our findings using two related phenotypes, parental longevity and parental lifespan, in the UK Biobank. We applied our case and control definitions to the parents of genotyped middle-aged UK Biobank participants rather than to the participants themselves, as none of the latter fulfilled the age criteria for cases in our study. Although this resulted in relatively large data sets for both the 90th and 99th percentile analyses, the power to replicate our findings with the parental longevity traits was lower than it would have been with traits of the genotyped individuals themselves, because each participant inherits only half of each parent’s genome. In addition, many of the genotyped individuals, who were 40–69 years old at recruitment, will never reach the age corresponding to the 90th, let alone the 99th, percentile of their birth cohort. This may explain why, apart from the genetic variants at the APOE locus, we were unable to validate any of our suggestive associations (P ≤ 1 × 10⁻⁶) in these data sets. On the other hand, we were able to validate one additional locus, CDKN2A/B, in the parental lifespan data set. This is not surprising, since this locus had already been reported to be associated with parental lifespan20.
However, it is unclear why our reported variants at this locus, rs7039467 and rs2184061, are not associated with parental longevity, given that the most significant parental lifespan-associated variant at this locus, rs1556516, also shows a nominally significant effect on parental longevity (see Supplementary Table 2). We hypothesise that this may be due to differences in LD structure between the reference panels used for imputation.
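The loss of power from using parental phenotypes can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an additive genetic effect on a linear scale and a single parent: because a participant carries, on average, half of that parent's allele dosage, the slope of the parental phenotype on the participant's genotype is half the direct effect. The helper names (`required_n`, `proxy_sample_size_ratio`), the effect size and the default genome-wide threshold are hypothetical, and the normal-approximation sample-size formula is a generic one, not necessarily the calculation used in this study.

```python
from statistics import NormalDist


def required_n(beta: float, alpha: float = 5e-8, power: float = 0.8,
               var_per_sample: float = 1.0) -> float:
    """Approximate sample size for a two-sided Wald test of an effect `beta`.

    Normal approximation: n = var_per_sample * ((z_{1-alpha/2} + z_power) / beta)^2.
    `var_per_sample` is a hypothetical variance term; it cancels in ratios.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return var_per_sample * ((z_alpha + z_power) / beta) ** 2


def proxy_sample_size_ratio(dilution: float = 0.5, beta: float = 0.1) -> float:
    """How many times more samples are needed when the effect is scaled by `dilution`.

    dilution = 0.5 models the halved genotype-phenotype slope for one parent
    (assumption: additive effect on a linear scale).
    """
    return required_n(beta * dilution) / required_n(beta)


if __name__ == "__main__":
    # Halving the effect requires roughly four times as many samples.
    print(round(proxy_sample_size_ratio(0.5), 6))  # 4.0
```

The ratio depends only on the dilution factor (it scales as 1/dilution², because the test's non-centrality grows with β²n), so the fourfold inflation holds whatever significance threshold is chosen.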

We detected significant associations at two previously identified longevity/lifespan-related loci, FOXO3 and CDKN2A/B. For the other loci, we did not find evidence for replication (P > 7.8 × 10⁻⁴), despite having adequate power (≥ 0.8) to replicate all but one (rs28926173) of the examined genetic variants associated with the discrete longevity phenotypes. We were not able to calculate our power to replicate the variants associated with the continuous lifespan-related phenotypes, although we should have had adequate power to replicate variants with a minor allele frequency (MAF) > 12% and an OR > 1.1 (based on the 90th percentile versus all controls analysis). However, several of the variants associated with parental lifespan show a directionally consistent and nominally significant association with our phenotypes, indicating that they may also be relevant for longevity. The failure to replicate previously reported loci could be due to the use of a longevity phenotype different from those used in previous studies, the small effect sizes of some of the variants associated with parental lifespan, and the modest power of our study. The significant association at the FOXO3 locus is not surprising, since this locus was previously reported in the longevity GWA study from the CHARGE consortium7, many of whose cohorts are included in these meta-analyses. So far, three functional longevity-associated variants have been identified at the FOXO3 locus (rs2802292, rs12206094, and rs4946935). For all three, an allele-specific response to cellular stress was observed. Consistently, the longevity-associated alleles of all three variants were shown to induce FOXO3 expression38,39. The CDKN2A/B locus has previously been associated with parental lifespan and parents’ attained age in the UK Biobank, as well as with a range of age-related diseases13,20,40.
The longevity-associated allele of the most significant variant at this locus (rs1556516) has also been associated with lower odds of developing CAD41. Although the molecular mechanism behind this association is still unclear, it is known that genes encoded at the CDKN2A/B locus are involved in cellular senescence42, a known hallmark of ageing in animal models43.
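For the discrete phenotypes, replication power of this kind can be approximated with a standard allelic-test calculation. The sketch below is a generic normal approximation on the log odds ratio, not necessarily the procedure used here; it takes the replication threshold P = 7.8 × 10⁻⁴ from the text as its default significance level, while the function name and the sample sizes in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases: int, n_controls: int, maf_controls: float,
                  odds_ratio: float, alpha: float = 7.8e-4) -> float:
    """Approximate power of a two-sided allelic (2x2 allele-count) association test.

    Uses a normal approximation for the log odds ratio; the negligible
    contribution of the opposite tail is ignored.
    """
    p0 = maf_controls
    # Case allele frequency implied by a per-allele odds ratio:
    # p1 / (1 - p1) = odds_ratio * p0 / (1 - p0)
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Expected allele counts (two alleles per individual).
    a, b = 2 * n_cases * p1, 2 * n_cases * (1 - p1)
    c, d = 2 * n_controls * p0, 2 * n_controls * (1 - p0)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(math.log(odds_ratio)) / se_log_or - z_crit)


if __name__ == "__main__":
    # Hypothetical sample sizes, for illustration only.
    print(allelic_power(n_cases=10_000, n_controls=10_000,
                        maf_controls=0.12, odds_ratio=1.1))
```

Power rises with the odds ratio, the minor allele frequency and the sample size, which is why, for a fixed sample, a minimum MAF and OR (such as the MAF > 12% and OR > 1.1 quoted above) mark out the range of effects that could be replicated.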

The gene-level association analysis identified associations of increased (KANSL1, CRHR1, ARL17A, and LRRC37A2) or decreased (ANKRD31 and BLOC1S1) genetically driven, tissue-specific gene expression with survival to the 90th percentile age. The increased expression of KANSL1, CRHR1, ARL17A, and LRRC37A2 on chromosome 17q21.31 is regulated by different genetic variants, indicating that these associations may be independent. More functional work is needed to determine the exact relationship between the altered genetically driven, tissue-specific expression of these genes and human longevity.

A limitation of MetaXcan is that the underlying GTEx models might not have been adequately adjusted for age, which could be problematic for an age-related phenotype like longevity. However, MetaXcan has successfully been used to identify gene-level associations with age-related diseases and traits, such as Alzheimer’s disease and age-related macular degeneration25.
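Gene-level tests of this kind can be computed from GWAS summary statistics alone. As far as we are aware, the summary-statistics form of MetaXcan (S-PrediXcan) approximates the gene-level z-score as z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l is the expression-prediction weight of SNP l, σ_l its genotype standard deviation, z_l its GWAS z-score, and σ_g² = wᵀΓw the variance of predicted expression given the SNP covariance matrix Γ. The pure-Python sketch below implements that formula; the function name and all inputs are hypothetical, and in practice the weights would come from the GTEx-trained models and the covariances from a reference panel.

```python
import math


def gene_level_z(weights, snp_z, snp_sd, snp_cov):
    """Gene-level z-score from SNP summary statistics (S-PrediXcan-style formula).

    weights : expression-prediction weights w_l for the SNPs in one gene model
    snp_z   : GWAS z-scores (beta_l / se_l) for the same SNPs
    snp_sd  : genotype standard deviations sigma_l
    snp_cov : genotype covariance matrix Gamma (list of lists) for the same SNPs
    """
    k = len(weights)
    # Variance of genetically predicted expression: sigma_g^2 = w' Gamma w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(k) for j in range(k))
    sigma_g = math.sqrt(var_g)
    # Weighted sum of SNP z-scores, scaled to unit predicted-expression variance
    return sum(w * sd / sigma_g * z for w, sd, z in zip(weights, snp_sd, snp_z))


if __name__ == "__main__":
    # Two uncorrelated SNPs with equal weights and z = 2 each: z_g = 2 * sqrt(2)
    z_g = gene_level_z([0.3, 0.3], [2.0, 2.0], [0.5, 0.5],
                       [[0.25, 0.0], [0.0, 0.25]])
    print(round(z_g, 4))  # 2.8284
```

For a single-SNP model the formula reduces to ±z_l, with the sign of the prediction weight: the direction of a gene-level association follows the direction in which the allele shifts predicted expression, which is how an "increased" or "decreased" expression association is read off.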

The genetic correlation analyses showed that survival to ages corresponding to the 90th and 99th percentile shares genetic associations with father’s age at death, CAD and T2D-related phenotypes, suggesting that survival to old ages may at least partially be explained by protective influences on the mechanisms underlying these traits. The genetic correlation with CAD and T2D-related phenotypes is expected, since individuals from long-lived families have previously been reported to show a decreased prevalence of cardiovascular disease and T2D44,45. The higher genetic correlation of our longevity phenotypes with father’s than with mother’s age at death may be explained by the difference in the prevalence of cardiovascular disease and T2D between men and women in the last century46,47, which may be at least partially attributable to a difference in smoking prevalence48. Hence, the correlation of our longevity phenotypes with the parental age at death phenotypes from the UK Biobank is likely to reflect the absence of death from specific diseases (i.e., those with a higher prevalence in men). Longevity-specific loci, on the other hand, would be expected to have beneficial effects on multiple diseases simultaneously, since long-lived individuals show a delay in overall morbidity49.
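The text does not state how the genetic correlations were estimated. A method commonly used with GWAS summary statistics is cross-trait LD score regression, and the relationship below is given only under the assumption that such a method was used. For SNP j with z-scores z₁ⱼ and z₂ⱼ in two GWAS of sizes N₁ and N₂, LD score ℓⱼ, M SNPs, N_s overlapping samples with phenotypic correlation ρ, genetic covariance ρ_g, and SNP heritabilities h₁² and h₂²:

```latex
\mathbb{E}\left[z_{1j}\, z_{2j}\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

The slope of z₁ⱼz₂ⱼ on ℓⱼ carries the genetic covariance, which is scaled by the two heritabilities to give the genetic correlation r_g; sample overlap enters only through the intercept, which matters when two meta-analyses share the same cases.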

Our study design imposed an age gap between cases and controls to reduce outcome misclassification, which we expected to increase power by increasing the genetic effect size. It has been correctly noted that longevity study designs with an age gap between cases and controls yield an effect estimate that combines an OR term and a relative risk (RR) term, which could lead to the identification of genetic variant associations related to early mortality (OR) rather than survival past the case age threshold (RR) (for more details, see Sebastiani et al.50). However, we have presented evidence that imposing a case–control age gap did not greatly influence our results or prevent our replication of variant associations previously discovered using study designs without a case–control age gap. First, our sensitivity analysis indicated that reducing the age gap between cases and controls had a minimal effect on our results. This analysis compared results using dead controls, all of whom had died before reaching the 60th percentile age, with results using all controls, which comprised the dead controls plus individuals whose age at last contact was below the 60th percentile age but whose age at death was unknown. There is likely to be some outcome misclassification among the living controls, since a small percentage may survive beyond the age corresponding to the 90th or 99th survival percentile. On the other hand, the age gap between cases and controls was narrower for all controls than for dead controls. Despite this narrower age gap, the suggestive associations from comparing 90th percentile cases with all controls and with dead controls were essentially unchanged, and the genetic correlation between the results of these two meta-analyses was very high, indicating that the age gap had little or no impact on our results.
Second, if we had discovered a large number of genome-wide significant variant associations in our study, it could be argued that the OR term, reflecting early mortality, contributed to some or all of them. However, the only genome-wide significant variant associations we detected were at the APOE locus, whose associations have been identified using multiple study designs, including designs with no pre-specified age gap between cases and controls14, and at the GPR78 locus. Third, our study design is unlikely to have prevented the replication of findings from previous GWA studies of survival to extreme ages (i.e., 99th percentile cases) that did not include a case–control age gap, since such studies would only identify variants associated with survival past the minimum case age and not with early mortality. For variants with no association with early mortality, the association estimate in our study would be expected to have an OR term equal to one and an RR term greater than one. Nothing prevents our study design from also detecting this type of variant association, as our estimated association parameter reflects both the OR and RR terms.
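The OR and RR terms can be made explicit with a simple two-stage survival model. Let S_G be the probability that an individual with genotype G survives to the control age threshold, and R_G the probability of then surviving from that threshold to the case age threshold. If cases are sampled among those surviving past the case age and controls among those who died before the control age (the dead-controls design), the genotype odds ratio equals [S₁/(1 − S₁)] / [S₀/(1 − S₀)] × R₁/R₀: an odds ratio for early survival times a relative risk for late survival. The sketch below assumes a binary carrier/non-carrier genotype and hypothetical probabilities, computes the estimate directly from the case and control probabilities, and compares it with that product; it does not model the living individuals included in the all-controls analysis.

```python
def genotype_or_with_age_gap(s_early, r_late):
    """Genotype odds ratio expected in a case-control design with an age gap.

    s_early: (S_0, S_1), P(survive to the control age threshold) for
             non-carriers (index 0) and carriers (index 1)
    r_late:  (R_0, R_1), P(survive from the control to the case age threshold)
    Cases survive past the case age; (dead) controls die before the control age.
    """
    (s0, s1), (r0, r1) = s_early, r_late
    p_case = (s0 * r0, s1 * r1)   # P(become a case | genotype)
    p_ctrl = (1 - s0, 1 - s1)     # P(become a dead control | genotype)
    # Genotype-independent sampling fractions cancel in the odds ratio.
    return (p_case[1] / p_case[0]) / (p_ctrl[1] / p_ctrl[0])


def or_and_rr_terms(s_early, r_late):
    """Split the estimate into the early-mortality OR term and the late-survival RR term."""
    (s0, s1), (r0, r1) = s_early, r_late
    or_term = (s1 / (1 - s1)) / (s0 / (1 - s0))
    rr_term = r1 / r0
    return or_term, rr_term


if __name__ == "__main__":
    # Hypothetical variant with no effect on early mortality (S_0 = S_1) but a
    # 20% higher chance of surviving from the control to the case threshold.
    s, r = (0.4, 0.4), (0.25, 0.30)
    print(round(genotype_or_with_age_gap(s, r), 6))  # 1.2
    print(or_and_rr_terms(s, r))
```

Setting S₀ = S₁ reproduces the situation described above: the OR term equals one and the estimate equals the RR term, so a variant that acts only on late survival remains detectable.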

The majority of previous GWA studies of longevity used survival to a pre-defined age threshold (e.g., 85, 90, or 100 years) as the selection criterion to define long-lived cases. Although these studies used a consistent phenotype for each cohort included in the GWA study, this type of selection may give rise to heterogeneity, given that survival probabilities differ between sexes and birth cohorts22. Moreover, it was recently shown that the heritable component of longevity is strongest in individuals belonging to the top 10% survivors of their birth cohort6. Hence, instead of using a pre-defined age threshold, we selected cases based on country-, sex- and birth cohort-specific life tables. To define controls, we used the 60th percentile age, since we wanted to include as many controls as possible (preferably from the same cohort as the cases) while leaving a large enough age gap between cases and controls. Using the 1920 birth cohort as an example, the difference between the 60th and 90th percentile ages is 14 years (men) or 11 years (women), which is quite substantial. The difference between the 70th and 90th percentile ages, on the other hand, is considerably smaller (9 years (men) or 7 years (women)), and the living controls are more likely to reach the 90th percentile age, which increases the risk of outcome misclassification. Moreover, even when the 60th percentile controls are selected from much later birth cohorts (e.g., 1940) than the cases (e.g., 1900), the ages will not overlap.
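Percentile-based case and control ages can be read off a cohort life table. The sketch below finds the age by which a given percentage of a birth cohort has died, interpolating linearly in the number of survivors l_x between tabulated ages. The interpolation scheme, the function name and the life-table values are hypothetical; they are not the country-, sex- and birth cohort-specific tables used in the study.

```python
def percentile_age(ages, survivors, percentile):
    """Age by which `percentile` % of a birth cohort has died.

    ages      : ascending exact ages x of a cohort life table
    survivors : l_x, number still alive at age x (non-increasing; survivors[0] is the radix)
    percentile: value in (0, 100), e.g. 60, 90 or 99
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    target = survivors[0] * (1 - percentile / 100)  # survivors left at that age
    for i in range(1, len(ages)):
        if survivors[i] <= target:
            x0, x1 = ages[i - 1], ages[i]
            l0, l1 = survivors[i - 1], survivors[i]
            # Linear interpolation of l_x between x0 and x1
            return x0 + (l0 - target) / (l0 - l1) * (x1 - x0)
    raise ValueError("life table ends before the requested percentile is reached")


if __name__ == "__main__":
    # Hypothetical cohort life table (radix 100,000)
    ages = [0, 60, 70, 80, 90, 100, 110]
    lx = [100_000, 80_000, 65_000, 40_000, 12_000, 1_000, 0]
    control_age = percentile_age(ages, lx, 60)
    case_age = percentile_age(ages, lx, 90)
    print(control_age, round(case_age, 1), round(case_age - control_age, 1))  # 80.0 91.8 11.8
```

In this made-up table the gap between the 60th and 90th percentile ages is about 11.8 years; with real cohort tables the same calculation yields gaps such as the 14 years (men) quoted above for the 1920 birth cohort.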

Our study has several limitations. First, we did not analyse the sex and mitochondrial chromosomes, since we were unable to gather enough cohorts that could contribute to the analysis of these chromosomes. These chromosomes may therefore harbour longevity-associated loci that we have missed. Second, although we included as many cohorts as possible, the sample size of our study is still relatively small (especially for the 99th percentile analysis) compared with GWA studies of age-related diseases, such as T2D and cardiovascular disease, and of parental age at death11,51,52. This limited our power to detect loci with a low MAF (<1%) that contribute to longevity. Third, we did not perform sex-stratified analyses and may thus have missed sex-specific longevity-related genetic variants. The reasons are that (1) we identified only a limited number of suggestive associations in our unstratified 90th and 99th percentile analyses, (2) our sample size is modest (especially when stratified by sex), and (3) thus far, no genome-wide significant sex-specific longevity locus has been reported.

Given that we have included nearly all cohorts of long-lived individuals with genome-wide genetic data in our study, it will be challenging to increase the sample size in future GWA studies using the same extreme phenotypes. Future genetic studies of longevity may therefore benefit from the use of alternative phenotypes or more rigorous phenotype definitions. Alternative phenotypes that could be used include parental lifespan and healthspan-related phenotypes analysed in the UK Biobank, as well as biomarkers of healthy ageing20,53,54. One way to strengthen the longevity phenotype is to select cases from families with multiple individuals belonging to the top 10% survivors of their birth cohort6. Moreover, given the limited number of longevity-associated genetic variants identified through GWA studies and the availability of affordable exome and whole-genome sequencing, future genetic studies of longevity may also benefit from the analysis of rare genetic variants. Ideally, such studies should also include participants from genetically diverse populations. Most cohorts currently included in genetic longevity studies are of European descent, while some longevity loci may be specific to non-European populations, as exemplified by the previously reported genome-wide associations of genetic variants in IL6 and ANKRD20A9P in Han Chinese9. Moreover, a recent genetic study of multiple complex traits has shown the benefit of analysing diverse populations55.

In conclusion, we performed a genome-wide association study of longevity-related phenotypes in individuals of European, East Asian and African American ancestry and identified associations of the APOE and GPR78 loci with these phenotypes. Moreover, our gene-level association analyses highlight a role for tissue-specific expression of genes at chromosomes 5q13.3, 12q13.2, 17q21.31, and 19q13.32 in longevity. Genetic correlation analyses show that our longevity-related phenotypes are genetically correlated with several disease-related phenotypes, which could help to identify phenotypes for use as potential biomarkers of longevity in future (genetic) studies.