Neanderthal ancestry in contemporary human populations

We searched for regional similarities to the Neanderthal genome in the genomes of 11 contemporary human populations, which have the best genome coverage in the 1,000 genomes project: three populations of African ancestry—HapMap African ancestry individuals from South West of the United States (ASW), Luhya individuals (LWK) and Yoruba individuals (YRI); three populations of East Asian ancestry—Han Chinese in Beijing (CHB), Han Chinese from South China (CHS) and Japanese (JPT); and five populations of European ancestry—CEPH individuals (CEU), HapMap Finnish individuals from Finland (FIN), British individuals from England and Scotland (GBR), Iberian populations in Spain (IBS) and Toscan individuals (TSI)13. For each pair of the human populations, the human genome sequences were compared with the Neanderthal and the chimpanzee genome sequences. We used the high-coverage Neanderthal genome sequence, obtained from the bone of one individual excavated in 2008 in the east gallery of Denisova Cave in the Altai mountains (Altai)14,15, combined with the low-coverage Neanderthal genome sequence, obtained from three individuals (Vindjia)10, to reconstruct a consensus Neanderthal genome sequence by excluding all sites with sequence variation among individuals. We further used the reference chimpanzee genome in combination with genome sequence data from 10 chimpanzees16 to exclude sites variable among chimpanzees. Only single nucleotide sites displaying sequence differences between the chimpanzee16,17 and the two reference Neanderthal genomes10,15 were used in the analysis (n=1,158,559).

Similar to Green et al.10, we used the D statistic to detect potential Neanderthal gene flow. For each pair of human populations, we defined two configurations for genomic sites: sites where the sequence of an individual from one human population (population A) matched the Neanderthal rather than chimpanzee genotype were assigned to the ABBA configuration; and sites where the sequence of the other human population (population B) matched the Neanderthal rather than chimpanzee genotype were assigned to the BABA configuration (Fig. 1a). For each population, the D statistic, which we will refer to as the fraction of Neanderthal-like sites (NLS), was calculated as the ratio of (#ABBA−#BABA) to (#ABBA+#BABA), where # stands for the number of genomic sites in the specific genotype configuration (ABBA or BABA; see Methods). While D-statistic values reflect relative similarity between the Neanderthal and the modern human genomes tested, they do not provide a quantitative estimate of Neanderthal ancestry, that is, a 5% D-statistic value reflects a higher similarity between population A and Neanderthal compared with that for population B, but does not signify a 5% level of Neanderthal ancestry in the population A genome.

Figure 1: Proportions of NLS in contemporary human populations. (a) Schematic representation of genomic distance calculations between contemporary human populations and Neanderthals. The genomes of out-of-Africa individuals were compared with the genomes of individuals of purely African ancestry (YRI). Single nucleotide differences from the Neanderthal genotype in an African genome were referred to as ‘ABBA’, while sites with the Neanderthal genotype in an out-of-Africa genome were referred to as ‘BABA’. (b) Average proportions of NLS in contemporary African (AF), European (EU) and Asian (AS) populations calculated based on sequence data from the 1,000 genomes project13; blue: genome wide (n=1,158,559 sites), red: LCP genes (n=498 sites). The error bars show the s.d. of the NLS proportion estimates. (c) Genomic distances between 11 contemporary human populations and Neanderthals; blue, genome wide; red, LCP genes. The maximal bar length corresponds to a NLS frequency of 30%. Placement of ASW and CEU individuals in sub-Saharan Africa and Western Europe, respectively, reflects their approximate historical geographical origins rather than their present location. Full size image

In agreement with previous observations10, the genomes of contemporary humans of European and Asian descent showed greater similarities to the Neanderthal genome than did the genomes of the three populations of purely African descent. On average, the NLS frequency was 6.1±0.2% for contemporary humans of European and Asian descent, thus indicating a substantial excess of NLS in contemporary out-of-Africa populations (Fig. 1b,c, blue bars; Supplementary Tables 1–3). This D-statistic estimate is similar to the ones reported by other studies (4.8±0.2%)10, with the higher values obtained in our study potentially arising from the additional filtering of genomic sites polymorphic in Neanderthals. Further, in agreement with other studies10, there was no substantial difference in the genome-wide frequencies of NLS between European and Asian populations, with a slight tendency for higher frequencies in Asians: 5.9±0.08 and 6.2±0.06%, respectively18,19.

For each pair of human populations, we searched for the presence of functional groups of genes showing an unusual excess, or paucity, of NLS. This analysis, based on gene groups compiled according to gene ontology terms20, and conducted using the gene set enrichment analysis (GSEA) algorithm21, yielded an unexpected observation; we indeed find significant clustering of NLS in specific functional groups, but these functional groups differed substantially between contemporary European and Asian populations (Supplementary Data 1). Functional groups showing NLS enrichment in Asian populations mainly represent immune and haematopoietic pathways. The strongest signal of NLS enrichment was, however, observed in contemporary Europeans and included two functional groups: the lipid catabolic process (LCP) and its nested term—cellular LCP (GSEA, permutations P<0.01, significance score >3; Fig. 2a). Specifically, genes in the LCP term had the greatest excess of NLS in populations of European descent, with an average NLS frequency of 20.8±2.6% versus 5.9±0.08% genome wide (two-sided t-test, P<0.0001, n=379 Europeans and n=246 Africans; Fig. 1b,c, red bars; Supplementary Table 4; Supplementary Fig. 1). Further, among examined out-of-Africa human populations, the excess of NLS in LCP genes was only observed in individuals of European descent: the average NLS frequency in Asians is 6.7±0.7% in LCP genes versus 6.2±0.06% genome wide (Supplementary Table 4).

Figure 2: Outstanding genetic features of lipid catabolism genes. (a) Clustering of NLS in functional categories in the genomes of contemporary humans of European (EU) and Asian (AS) descent. The significance scores are proportional to the NES and inversely proportional to the false discovery rates calculated for each gene ontology (GO) term in the biological process category using the GSEA algorithm. Sizes of circles and squares represent NLS fractions in Europeans and Asians, respectively, for each GO term. Numbers near the circles mark the top two GO terms with a functional enrichment significance score greater than three, based on the distribution of Neanderthal-like genomic sites in Europeans: (1) LCP, (2) cellular LCP. (b) Positive selection signals in LCP gene regions estimated using CMS scores. Black squares represent the frequency of sites with elevated CMS scores (>1), potentially indicating genomic regions under recent positive selection in LCP gene regions, normalized by the frequency of such sites in all annotated genes within the same population. The boxplots show the variation of normalized site frequency estimates obtained by 1,000 bootstraps over LCP gene regions (n=38). The boxes show quartiles and the median of the data, the whiskers extend to the minimum and maximum data values located within 0.5 interquartile range from the box. The numbers above the boxplots show the proportion of bootstrap values where the normalized site frequency of elevated CMS scores in Africans (AF) or Asians (AS) was greater than, or equal to, the normalized site frequency in Europeans (EU). Full size image

The excess of NLS observed in LCP genes for populations of European descent was based on a large number of sites (n=498), and was robust to bootstrapping across sites (P<0.01, 1,000 bootstraps, Supplementary Table 5; Supplementary Fig. 2). Notably, NLS were located in 23 independent genomic regions. Among the remaining 15 LCP genes that did not contain NLS, 8 did not contain sites showing divergence between Neanderthals and chimpanzees and the remaining 7 contained only a few such divergent sites (Supplementary Table 6). It is furthermore robust to the potential effects of DNA damage characteristic of ancient DNA samples, as excluding the C/T and A/G substitutions that may stem from deamination of cytosine residues in ancient DNA22,23 did not affect the results (Supplementary Table 4). Repeating the analysis using the genome sequences of African (ASW, LWK, YRI), European (CEU, FIN, GBR, IBS) and East Asian (CHB, JPT) individuals, which were sequenced to deeper coverage at the pilot stage of the 1,000 genomes project, as well as the high-coverage genome sequences of African (ASW, LWK, YRI, MKK—Maasai in Kinyawa, Kenya), European (CEU, TSI) and East Asian (CHB, JPT) individuals provided by the Complete Genomics human diversity set24, confirmed our observations (Supplementary Tables 7 and 8). Finally, repeating the analysis using the high-coverage (Altai) and the low-coverage (Vindjia) Neanderthal genomes, separately, resulted in similar findings (Supplementary Table 9).

Adaptive signature of Neanderthal sequences in LCP genes

The excess of NLS in LCP genes in the genomes of contemporary Europeans may be due to a rapid spread of Neanderthal alleles in European ancestors because of their adaptive significance. Specifically, one may hypothesize that, over time, Neanderthals acquired changes to lipid catabolism, which were beneficial for survival in the environmental conditions of prehistoric Europe and Central Asia. These adaptive variants may then have been acquired by the modern humans through introgression and rapidly brought to high frequency by positive selection. To test this hypothesis, we searched for signatures of positive selection in the genomes of contemporary humans of European, Asian and African decent using composite of multiple signals (CMS) scores25. High CMS values indicate genomic regions under recent positive selection based on three distinct signatures of selection: long-range haplotypes, differentiated alleles and high-frequency-derived alleles. We indeed found a significant excess of high CMS scores in the LCP gene regions of contemporary Europeans but not Asians or Africans (Fig. 2b). This effect was robust at different CMS score cutoffs and was specific to LCP: no significant excess of high CMS scores in individuals of European descent was observed in comparable genomic regions containing other metabolic genes (Supplementary Fig. 3). Furthermore, within the LCP term, high CMS scores found in contemporary Europeans were associated with genes containing the excess of NLS, but were not associated with other LCP genes (two-sample Wilcoxon test, P=0.0003; n=45 and 20; Supplementary Fig. 4).

Metabolic changes associated with Neanderthal ancestry

The observed signatures of positive selection suggest that genetic variants shared with Neanderthals resulted in adaptive changes in lipid catabolism in Europeans, but not in Asians. At a physiological level, these adaptive changes should affect the concentrations of LCP metabolites, and the expression levels of the corresponding metabolic enzymes, in a manner specific to Europeans. To test this, we analysed the lipid composition of prefrontal cortex (PFC) tissue in 14 adult humans of European, African and Asian descent, as well as 14 adult chimpanzees (Supplementary Data 2) using C 8 -reversed phase liquid chromatography coupled to high-resolution mass spectrometry26,27. Out of 4,243 detected mass spectrometric peaks, 1,314 could be computationally matched to lipid compounds belonging to 63 metabolic categories using metabolite annotation databases28,29. After elimination of low confidence matches supported by less than 5 mass spectrometric peaks, 16 metabolic categories, containing 1,253 peaks, remained and were used in further analyses (Fig. 3a; Supplementary Data 2).

Figure 3: Lipid concentration and gene expression divergence in LCP and other metabolic pathways. (a) Principal component analysis based on normalized intensities of 1,314 annotated mass spectrometric peaks corresponding to 63 metabolic categories. Each circle represents an individual: blue, Asians; grey, Africans; red, Europeans; black, chimpanzees. (b) The distribution of lipid concentration divergence estimates measured between chimpanzees and humans of African (AF, n=4 individuals), Asian (AS, n=5 individuals) and European (EU, n=5 individuals) descent for metabolic categories directly linked to LCP genes (red, n=1,090 mass spectrometric peaks) and metabolites in other metabolic pathways (grey, n=163 mass spectrometric peaks). To minimize the influence of environmental differences among populations, metabolic divergence in the LCP term was normalized to the divergence of all other metabolic pathways within the same population. The numbers above the red boxplots show the proportion of values from the LCP divergence distribution obtained by 1,000 bootstraps over individuals within populations that were smaller than, or equal to, the divergence values calculated based on other metabolic pathways represented by the grey boxplots. All boxes in this and the other panels show quartiles and the median of the data, the whiskers extend to the minimum and maximum data values located within 0.5 interquartile range from the box. (c) Principal component analysis based on the expression levels of 25,813 genes. Each circle represents an individual; colours are as in panel a. (d) The distribution of gene expression divergence estimates measured between chimpanzees and human populations for LCP genes directly linked to seven metabolic categories shown in panel b (red, n=6 expressed genes) and other LCP genes (grey, n=26 expressed genes). Normalization procedure and significance estimation were conducted the same way as for metabolite data presented in panel b. Full size image

Seven of these 16 metabolic categories were directly linked with genes in the LCP term (Supplementary Table 10) based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation30. In Europeans, the concentrations of lipids within these seven metabolic categories linked with LCP were more diverged from chimpanzees than the concentrations of lipids in the other nine metabolic categories not linked with LCP (q<0.0001; q value shows the proportion of the LCP divergence values that were smaller than, or equal to, the divergence values in other metabolic categories). By contrast, in contemporary Africans or Asians there were no differences between the concentration divergence of lipids associated with LCP and the lipids associated with other metabolic categories (q>0.1; Fig. 3b). This result was not driven by one or several metabolites, but represented a general property of this metabolite group, as shown by bootstrap analysis (q<0.0001) Further, this result was not caused by differences among population samples with respect to age, sex, tissue preservation or postmortem delay (Supplementary Table 11; Supplementary Fig. 5). We note that, while we cannot control for environmental differences among populations, our analysis is based on the relative divergence of seven metabolic categories associated to LCP and was normalized to the divergence of the other nine metabolic categories not associated with LCP within the same population. This normalization removes the influence of environmental factors affecting LCP and non-LCP metabolites to the same extent. Furthermore, our study design provides no indications that environmental effects should be particular to Europeans: all individuals of European and African descent used for lipidome analysis came from the same region within the United States, while all individuals of Asian descent came from central China.

Expression changes associated with Neanderthal ancestry

We next asked whether the greater concentration divergence of LCP metabolites observed in Europeans could be linked to a similarly accelerated expression level divergence of the corresponding enzymes. To test this, we measured gene expression levels in the 14 human PFC samples used in the lipid analysis, as well as 6 out of 14 chimpanzee PFC samples, using high-throughput RNA sequencing (RNA-seq). For each sample, we obtained an average of 15 million reads, 85% of which could be mapped31 uniquely to the corresponding genome (Fig. 3c; Supplementary Table 12; Supplementary Fig. 6). Using KEGG pathway annotation, six genes annotated in the LCP term and detected in the RNA-seq data could be directly linked to the seven LCP metabolic categories (Supplementary Table 10). For these six genes the expression divergence from chimpanzees, relative to the expression divergence of other LCP genes expressed in PFC, was the largest in Europeans (q<0.0001; q value shows the proportion of the divergence values that were smaller than, or equal to, the divergence values in other LCP genes), intermediate in Asians (q=0.04) and absent in Africans (q=0.59; Fig. 3d; Supplementary Fig. 6). This result was robust, as shown by bootstrap analysis, and was not caused by differences in age, sex, RNA quality or postmortem delay among the three human populations (Supplementary Fig. 7). Thus, accelerated metabolic divergence in the LCP term found in Europeans appears to be linked to an accelerated expression level divergence of the corresponding metabolic enzymes.

Notably, the gene regions of the six LCP enzymes linked to European-specific metabolic changes contained an even higher proportion of NLS in Europeans (31.6±4.1%) than all LCP genes (20.8±2.6%). Furthermore, these NLS were not distributed uniformly within the gene regions, but clustered in the vicinity of transcription start sites, suggesting that they may have a role in causing gene expression level changes seen in Europeans (two-sample Wilcoxon test, P=0.048; n=14 and 27; Supplementary Fig. 8). By contrast, individuals of Asian and African descent did not show any significant excess of NLS within the same gene regions (Fig. 4a).

Figure 4: Potential regulatory effects of NLS in LCP pathway. (a) Average proportions of NLS in contemporary African (AF), European (EU) and Asian (AS) populations calculated based on sequence data from the 1,000 genomes project13; blue, genome wide (GW, n=1,158,559 sites); red, all LCP genes (n=498 sites); black, LCP genes connected to European-specific metabolic changes (MC, n=114 sites). The error bars show the s.d. of the NLS proportion estimates. (b) Relative concentration levels of 2-lysophosphatidylcholine, in three contemporary human populations and in chimpanzees (CH). The boxplots represent the median and the variation of normalized, z-transformed metabolite concentrations in each sample group calculated by 1,000 bootstraps over individuals within populations (n=11 mass spectrometric peaks). The *** indicates significance of 2-lysophosphatidylcholine concentration difference between European (n=5) and chimpanzee (n=14) individuals (P<0.001) estimated by the bootstrapping procedure. (c) Gene expression levels of genes directly linked with 2-lysophosphatidylcholine according to KEGG annotation, in the three human populations and chimpanzees. The boxplots are as in panel b. The *** indicates significance of expression difference between European (n=5) and chimpanzee (n=6) individuals (P<0.001) for genes directly linked with 2-lysophosphatidylcholine (n=21) estimated by the bootstrapping procedure. Full size image

Possible functional implications of changes in LCP

While we find changes in lipid catabolism particular to Europeans at the metabolite concentration and enzyme expression levels, the significance of these changes at the organismal level remains to be investigated. Still, the changes observed at the molecular level provide some clues. Among the seven metabolic categories associated with LCP, 2-lysophosphatidylcholine has been implicated in a number of functions, including reactive oxygen species generation, apoptotic and non-apoptotic death, as well as glucose-dependent insulin secretion32 (Fig. 4b,c). Furthermore, genetic variants linked to obesity (DAVID33, Fisher’s exact test, P<0.01 after multiple testing correction, n=38) hypertriglyceridemia and coronary heart disease, as well as triglycerides and cholesterol levels (DAVID, Fisher’s exact test, P<0.01 after multiple testing correction, n=38) in genome-wide association studies34 show a significant enrichment of LPC genes containing an excess of NLS (Supplementary Table 13). Notably, frequencies of these diseases have been shown to differ between individuals of European descent and other human populations35. These observations support a contribution of Neanderthal genetic variants to the phenotype of contemporary Europeans.