The legacy of human-Neandertal interbreeding Non-African humans are estimated to have inherited on average 1.5 to 4% of their genomes from Neandertals. However, how this genetic legacy affects human traits is unknown. Simonti et al. combined genotyping data with electronic health records. Individual Neandertal alleles were correlated with clinically relevant phenotypes in individuals of European descent. These archaic genetic variants were associated with medical conditions affecting the skin, the blood, and the risk of depression. Science, this issue p. 737

Abstract Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)–derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neandertal alleles were significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neandertal haplotypes, and demonstrate the utility of EHR data in evolutionary analyses.

As anatomically modern human (AMH) groups left Africa and began to spread across Europe and Asia ~60,000 years ago, they encountered other archaic hominins. The fossil record suggests that AMHs and several archaic hominins overlapped in space and time (1), and genomic analyses of modern and ancient humans and of extinct Neandertals and Denisovans have revealed interbreeding between these groups (2, 3). As a result, the genomes of modern Eurasians contain a small fraction (~1.5 to 4%) of DNA inherited from interbreeding with Neandertals around 50,000 years ago (4–6).

The patterns of surviving Neandertal DNA across modern Eurasian genomes indicate that introgressed Neandertal DNA experienced strong selective pressures. Surviving Neandertal lineages are significantly depleted in several genomic regions, such as on the X chromosome and the q arm of chromosome 7, suggesting that there are deleterious consequences of Neandertal DNA at many loci (5, 6). However, some Neandertal alleles are found at higher than expected frequencies and thus may have provided an evolutionary advantage to AMH populations (5–7). Consistent with this hypothesis, Neandertals are believed to have lived out of Africa long enough to adapt to the climatic, dietary, and pathogenic landscapes found at higher latitudes.

Indeed, isolated introgressed loci have been identified with potential roles in human adaptation (7, 8). Furthermore, recent studies of genomic regions enriched in Neandertal alleles have suggested potential effects on skin and hair phenotypes, lipid metabolism, depression, and other traits (5, 6, 9). However, whether introgressed Neandertal alleles have a significant functional effect on these traits in human populations has not been established, because of the difficulty of confidently identifying Neandertal-derived DNA and the expense of performing tests for trait association between individuals with and without Neandertal ancestry at specific sites.

We addressed these challenges by integrating the phenotype data present in electronic health records (EHRs) with high-resolution maps of Neandertal haplotypes across individual human genomes (Fig. 1A). We performed a large-scale assessment of the functional effects of DNA inherited from Neandertals on health-related traits in modern populations of European ancestry. In particular, we analyzed genotype and phenotype data from the Electronic Medical Records and Genomics (eMERGE) Network, a consortium that unites EHR systems linked to patient genetic data from nine sites across the United States (10). EHRs contain quantitative and qualitative data on individuals’ traits; however, algorithms are required to derive consistent phenotypes appropriate for use in genetic association testing from these records. For the majority of our analyses, we used a set of 1689 hierarchically related phenotypes (including 1087 leaf phenotypes) defined from the use of International Classification of Diseases (ICD-9) billing codes in the EHRs (11). We analyzed a set of 28,416 adults of European ancestry from across the eMERGE sites who had been genotyped on genome-wide arrays and had sufficient EHR data to define phenotypes. These individuals naturally fell into separate discovery and replication cohorts on the basis of their inclusion in the eMERGE Network Phase 1 (E1; N = 13,686 individuals) or Phase 2 (E2; N = 14,730 individuals) data releases (12).

Fig. 1 Analysis of EHRs reveals clinical effects of Neandertal alleles in modern humans. (A) Thousands of Neandertal alleles were identified in ~28,000 individuals of European ancestry across the eMERGE Network. We derived phenotypes for each individual from data in their EHRs. (B) To test Neandertal alleles in aggregate for phenotype associations, we computed the genetic similarity of all pairs of individuals over 1495 genotyped Neandertal loci and their phenotypic similarity over 46 EHR-derived traits. (C) We estimated the overall variance in risk explained by Neandertal alleles using mixed linear models in GCTA (15) and found that Neandertal alleles explain significant variance in several traits (Table 1). (D) To test individual Neandertal alleles for trait associations, we performed a discovery meta-analysis across eMERGE E1 sites with sufficient data. We then ran a replication meta-analysis over the independent eMERGE E2 cohort. This approach identified and replicated several associations (Table 2). The example forest plot illustrates the association of Neandertal SNP rs3917862 with hypercoagulable state in each site with >=20 cases for the separate discovery and replication analyses. (E) rs3917862 is located in an intron of P-selectin (SELP), a gene that mediates leukocyte action at injuries in the early stages of inflammation. The Neandertal allele is significantly associated (linear regression, P = 0.005) with increased expression of SELP in tibial artery data from GTEx.

To identify Neandertal alleles in the genotyping data available from eMERGE, we used a recent genome-wide map of ~6000 Neandertal haplotypes inferred by computing the S* statistic (13) and refining putative introgressed haplotypes by comparing sequenced individuals from the 1000 Genomes (1KG) Project (14) with the Altai Neandertal genome (3, 6). We defined ~135,000 high-confidence “Neandertal single-nucleotide polymorphisms (SNPs)” among the introgressed haplotypes by filtering out SNPs whose frequency significantly differed from the overall Neandertal haplotype frequency and removing haplotypes with fewer than four likely Neandertal-derived SNPs (11). This filtering was necessary to remove variants unlikely to derive from Neandertal admixture.

Neandertal variants have been hypothesized to influence many phenotypes in AMHs, including lipid metabolism, immunity, depression, digestion, hair, and skin, on the basis of the enrichment of Neandertal variants in regions of the genome relevant to these traits (3, 5, 6, 9). Accordingly, we first tested these hypotheses using genome-wide complex trait analysis (GCTA) to estimate the phenotypic risk explained by 1495 genotyped common (minor allele frequency > 1%) Neandertal SNPs for a set of 46 high-prevalence phenotypes from the hypothesized categories, using age, sex, and eMERGE site as covariates (Fig. 1,B and C) (15). Neandertal SNPs explained a significant [likelihood ratio test; false discovery rate (FDR) < 0.05 over all phenotype tests] percent of the risk in three traits in the E1 discovery cohort (Table 1): depression (2.03%, P = 0.0036), myocardial infarction (1.39%, P = 0.0026), and corns and callosities (1.26%, P = 0.01). Neandertal SNPs also explained a nominally significant (P < 0.1) percent of risk for nine additional traits, including actinic and seborrheic keratosis, coronary atherosclerosis, and obesity (Table 1).

Table 1 Neandertal alleles explain risk for human clinical traits. The eight traits for which Neandertal alleles explained a nominally significant proportion of variance in risk in both the E1 discovery and E2 replication analyses are listed (GCTA, P < 0.1). The depression association remained significant after controlling the false discovery rate at 5%. The Neandertal associations with actinic keratosis, mood disorders, and depression were also maintained in a two-GRM model that considered the risk explained by non-Neandertal variants. Phenotypes are sorted by their E2 P value. View this table:

Of the 12 nominally significant associations, 8 were replicated in the independent E2 data set, including actinic keratosis (P = 0.0059), mood disorders (P = 0.018), depression (P = 0.020), obesity (P = 0.030), and seborrheic keratosis (P = 0.045) at P < 0.1 (Table 1; likelihood ratio test). We also tested whether the percent of phenotypic variance explained by Neandertal SNPs remained significant in the context of non-Neandertal SNPs by including an additional genetic relationship matrix (GRM) computed from non-Neandertal SNPs across the rest of the human genome in the mixed linear model (11). Depression (P = 0.031), mood disorders (P = 0.029), and actinic keratosis (P = 0.036) were replicated with these stricter criteria in the independent E2 cohort.

These analyses establish the influence of Neandertal SNPs in concert on the variance in these traits. We estimated individual effects for each SNP by the best linear unbiased predictions (BLUPs); this indicated that a similar number of Neandertal SNPs increased and decreased risk for each associated phenotype (table S1) (11). To gain insight into the loci driving these associations, we analyzed the genomic distribution of the 10% of SNPs with the highest and lowest BLUPs for actinic keratosis and depression. We found enrichment (FDR < 0.05; hypergeometric test) for many functional annotations: most notably, keratinocyte differentiation and several immune functions for actinic keratosis and regions involved in neurological diseases, cell migration, and circadian clock genes for depression (fig. S1 and table S2) (11).

The significant replicated association of Neandertal SNPs with mood disorders, in particular depression, is intriguing because Neandertal alleles are enriched near genes associated with long-term depression (5), and human-Neandertal DNA and methylation differences have been hypothesized to influence neurological and psychiatric phenotypes (16, 17). Depression risk in modern human populations is influenced by sunlight exposure (18), which differs between high and low latitudes, and we found enrichment of circadian clock genes near the Neandertal alleles that contribute most to this association (11). The replicated nominal association of Neandertal SNPs with actinic keratosis (precancerous scaly skin lesions) further links introgressed alleles in AMHs to a phenotype directly related to sun exposure. It also suggests that the signatures of adaptive introgression and strong enrichment of Neandertal alleles near genes associated with keratin filament formation (5) and keratinocytes (6) reflect the influence of Neandertal alleles on a modern human phenotype. However, further genetic analyses are necessary to resolve the differences in the strength of this association between E1 and E2. These results establish the impact of Neandertal DNA on diseases in AMHs that involve traits potentially influenced by environmental differences experienced by non-African populations.

GCTA quantifies the overall influence of Neandertal SNPs together on traits in AMHs. To identify individual Neandertal loci associated with AMH phenotypes and potentially discover additional biological systems influenced by Neandertal admixture, we performed a phenome-wide association study (PheWAS) of these 1495 Neandertal SNPs with 1152 EHR-derived phenotypes with at least 20 cases in at least one site (Fig. 1D). PheWAS allows for large-scale characterization of the effects of variants of interest (19). We carried out two meta-analyses across the eMERGE Network sites: one over the discovery cohort and one over the replication cohort. We focus on the meta-analyses here (Table 2 and table S3), but a pooled analysis using the eMERGE site as a covariate produced largely consistent results (table S4).

Table 2 Individual Neandertal SNPs with significant replicating phenotype associations. Four locus-wise Bonferroni significant Neandertal SNP–phenotype associations replicated (with a fixed effect P < 0.05 and consistent direction of effect). Nominally significant replicating results can be found in table S3 and in the PheWAS Catalog (https://phewas.mc.vanderbilt.edu/neanderthal). Chr, chromosome. View this table:

Four Neandertal SNP–phenotype associations passed a locus-wise Bonferroni corrected significance threshold (P = 3.3 × 10–5) in the E1 meta-analysis and were replicated (P < 0.05) with the same direction of effect in the independent E2 meta-analysis (Table 2). The strongest signal was a Neandertal SNP (rs3917862, 6.5% European (EUR) 1KG frequency) in an intron of P-selectin (SELP) that was significantly associated with hypercoagulable state in both E1 and E2 (Table 2; Fig. 1D). This haplotype contains several genes directly involved in blood coagulation and inflammation, most notably SELP, which encodes a cell adhesion protein expressed on the surface of endothelial cells and platelets that recruits leukocytes to injuries during inflammation.The gene encoding factor V (F5), a coagulation cofactor associated with several coagulation defects, is located ~37 kilobases downstream. The Neandertal haplotype overlaps histone modifications suggestive of gene-regulatory activity in blood cells and vein epithelial cells (fig. S2). Using data from the Genotype-Tissue Expression (GTEx) Project (20), we found indications that the Neandertal allele at rs3917862 significantly increased the expression of SELP (P = 0.005) and F5 (P = 0.05) in arteries (Fig. 1E and fig. S2). The association may be influenced by the F5 Leiden (F5L) mutation associated with hypercoagulable state; however, the Neandertal allele appears to have an additional influence on risk (11). The Neandertal SNP is in low linkage disequilibrium (LD) with F5L (r2= 0.07, D′ = 0.42), and increases risk for venous thromboembolism, beyond the risk associated with F5L (21). Furthermore, manual review of the EHRs for 16 hypercoagulable state cases revealed a diverse set of causes, and only 4 out of the 11 individuals tested for F5L had the mutation. Due to the direct interaction of coagulation factors with pathogens, these genes have been common targets of positive selection across vertebrate evolution, and F5 has experienced positive selection in primates (22). Thus, it is possible that this Neandertal haplotype and the associated hypercoagulability provided an advantage in early AMHs outside of Africa.

The second replicating association was a SNP (rs12049593, 5.0% EUR frequency) in an intron of SLC35F3, a putative thiamine transporter, which associates with protein-calorie malnutrition. Thiamine is crucial to carbohydrate metabolism for all cells, particularly those with increased energy requirements (23). Variants in high LD with this SNP (r2 > 0.8, D′ = 1) are found in regions bearing enhancer histone marks in the gastrointestinal (GI) tract, brain, and other tissues. Decreased expression of this transporter in the brain or GI tract could exacerbate malnutrition or its symptoms. It is possible that new dietary pressures may have caused changes in carbohydrate metabolism to be beneficial in early human migrants out of Africa; indeed, there is evidence suggesting that Neandertal introgression probably influenced lipid catabolism in Europeans (9). More recently, the reduction of thiamine present in foods from the grain-refining process, as well as an increased intake of simple carbohydrates, make this a potentially harmful allele, because it could reduce thiamine availability although modern diets increase demand.

Another Neandertal SNP (rs11030043, 9.0% EUR frequency) is upstream of stromal interaction molecule 1 (STIM1) and is associated with a phenotype encompassing incontinence, bladder pain, and urinary tract disorders (fig. S4A). STIM1 is a ubiquitously expressed gene involved in intracellular calcium signaling. Variants in high LD with the Neandertal SNP are found in regions bearing enhancer histone marks and DNase I hypersensitive sites in the brain. Because of this, we examined whether this SNP was associated with gene expression levels in brain tissues in GTEx. The Neandertal allele is associated with significantly decreased expression of STIM1 in the caudate basal ganglia (P = 0.02; fig. S4B), a region of the brain connected to bladder dysfunction, particularly in those with neurological conditions such as Parkinson’s (24).

The last replicated association was between rs901033 (0.5% EUR frequency) and tobacco use disorder. This SNP is in an intron of SLC6A11, a solute carrier family neurotransmitter transporter that is responsible for reuptake of the neurotransmitter γ-aminobutyric acid (GABA). Nicotine addiction disrupts GABAergic signaling in the brain and reduces expression of SLC6A11 (25). This is the second Neandertal SNP to be associated with smoking risk (5).

To test whether Neandertal SNPs were enriched for association with specific classes of phenotypes, we compared the distribution of replicating phenotype associations for a set of 1056 LD-pruned (r2 < 0.5) Neandertal SNPs (table S5) with the associations found in five allele frequency matched non-Neandertal SNP sets (consisting of a total of 5280 SNPs) at a relaxed PheWAS discovery threshold (P < 0.001) (11). Overall, the Neandertal SNPs appear to influence significantly different classes of phenotypes (chi-squared test, P = 0.017; Fig. 2). In particular, Neandertal SNPs were consistently associated with more neurological and psychiatric phenotypes and fewer digestive phenotypes (Fig. 2) (11).

Fig. 2 Neandertal SNPs associate with different phenotypes than matched non-Neandertal SNPs. Each bar gives the difference between the number of replicated Neandertal SNP associations with a phenotype group (at a relaxed discovery threshold of P < 0.001) and the number expected from a PheWAS over five sets of non-Neandertal sites matched to the allele frequency of tested Neandertal SNPs. The phenotype distributions were significantly different (chi-squared test, P = 0.017), with more Neandertal SNPs associated with neurological and psychiatric phenotypes than expected and fewer digestive phenotypes. The enrichment and depletion were consistent across all five matched non-Neandertal sets (* indicates P < 0.05 for all five comparisons; binomial test) (11).

Given the enrichment for associations with psychiatric and neurological phenotypes, we tested whether Neandertal SNPs were significantly associated with changes in gene expression in previous expression quantitative trait loci (eQTL) analyses of the cerebellum and temporal cortex. Twenty-nine Neandertal SNPs were significant brain cis-eQTL in the cerebellum or temporal cortex (FDR < 0.05) (11). This represents significant enrichment for brain eQTL among Neandertal SNPs as compared to the non-Neandertal control SNPs (one-tailed binomial test; P = 1.68 × 10−4 for the cerebellum and P = 3.49 × 10−5 for the temporal cortex) (11). Taken together, the influence of Neandertal SNPs on depression risk (Table 1), the association of individual Neandertal SNPs with diseases with a neurological basis (Table 2), the enrichment for nominal associations with psychiatric and neurological phenotypes (Fig. 2), and the enrichment for brain eQTL in Neandertal SNPs suggest that Neandertal introgression influenced AMH brain phenotypes.

Our approach establishes a new method for understanding the phenotypic legacy of admixture between AMHs and archaic hominins. Using a large clinical cohort, we discovered functional associations between Neandertal alleles and AMH traits, influencing the skin, the immune system, depression, addiction, and metabolism. Furthermore, several lines of evidence suggest enrichment for associations between Neandertal alleles and neurological and psychiatric phenotypes, as well as the importance of differences in sun exposure between high and low latitudes. It is possible that some Neandertal alleles provided a benefit in early AMH populations as they moved out of Africa, but have become detrimental in modern Western environments.

EHR data, paired with DNA sequencing, hold promise for characterizing the phenotypic impact of regions identified through evolutionary analyses. However, there are currently limitations to this approach. It is difficult to extract nonclinical phenotypes from EHRs, and we were not able to analyze all Neandertal haplotypes due to the limited coverage of the available genotyping data. Nonetheless, as EHRs are increasingly linked to whole-genome sequence data and more sophisticated methods are developed for extracting phenotypes from the rich data stored in EHRs, we anticipate further insights into the functional effects of archaic introgression.

Supplementary Materials www.sciencemag.org/content/351/6274/737/suppl/DC1 Materials and Methods Supplementary Text Figs. S1 to S4 Tables S1 to S5 References (26–51)