Genetic clustering and uniparentally inherited markers

We report genome-wide data at a targeted set of 1.2 million single nucleotide polymorphisms (SNPs)18,27 for 59 Eneolithic and BA individuals from the Caucasus region. After filtering out 14 individuals that were first-degree relatives or showed evidence of contamination (Supplementary Data 1, Supplementary Note 3), we retained 45 individuals for downstream analyses using a cutoff of 30,000 SNPs. We merged our newly generated samples with previously published ancient and modern data (Supplementary Data 2). We first performed principal component analysis (PCA)28 and ADMIXTURE29 analysis to assess the genetic affinities of the ancient individuals qualitatively (Fig. 2). Based on the PCA and ADMIXTURE plots we observe two distinct genetic clusters: one falls with previously published ancient individuals from the West Eurasian steppe (henceforth termed ‘Steppe’), and the second clusters with present-day southern Caucasian populations and ancient BA individuals from today’s Armenia (henceforth termed ‘Caucasus’), while a few individuals take intermediate positions between the two. The stark distinction seen in our temporal transect is also visible in the Y-chromosome haplogroup distribution, with R1/R1b1 and Q1a2 types in the Steppe cluster and L, J, and G2 types in the Caucasus cluster (Fig. 3a, Supplementary Data 1, Supplementary Note 4). In contrast, the mitochondrial haplogroup distribution is more diverse and similar in both groups (Fig. 3b, Supplementary Data 1).

Fig. 3 Results from uniparentally inherited markers. Comparison of Y-chromosome (a) and mitochondrial (b) haplogroup distribution in the Steppe and Caucasus clusters.

The two distinct clusters are already visible in the oldest individuals of our temporal transect, dated to the Eneolithic period (~6300–6100 yBP/4300–4100 calBCE). Three individuals from the sites of Progress 2 and Vonyuchka 1 in the North Caucasus piedmont steppe (‘Eneolithic steppe’), which harbour EHG and CHG related ancestry, are genetically very similar to Eneolithic individuals from Khvalynsk II and the Samara region18,22. This extends the cline of dilution of EHG ancestry via CHG-related ancestry to sites immediately north of the Caucasus foothills (Fig. 1c; Fig. 2d).

In contrast, the oldest individuals from the northern mountain flank itself, which are three first-degree-related individuals from the Unakozovskaya cave associated with the Darkveti-Meshoko Eneolithic culture (analysis label ‘Eneolithic Caucasus’) show mixed ancestry mostly derived from sources related to the Anatolian Neolithic (orange) and CHG/Iran Neolithic (green) in the ADMIXTURE plot (Fig. 2c). While similar ancestry profiles have been reported for Anatolian and Armenian Chalcolithic and BA individuals9,19, this result suggests the presence of this mixed ancestry north of the Caucasus as early as ~6500 years ago.

Ancient North Eurasian ancestry in Steppe Maykop individuals

Four individuals from mounds in the grass steppe zone, archaeologically associated with the ‘Steppe Maykop’ cultural complex (Supplementary Notes 1 and 2), lack the Anatolian farmer-related (AF) component when compared to contemporaneous Maykop individuals from the foothills. Instead they carry a third and fourth ancestry component that is linked deeply to Upper Paleolithic Siberians (maximized in the individual Afontova Gora 3 (AG3)30,31 and Native Americans, respectively, and in modern-day North Asians, such as North Siberian Nganasan; Supplementary Data 3). To illustrate this affinity with ‘ancient North Eurasians’ (ANE)21, we also ran PCA with 147 Eurasian (Supplementary Fig. 1A) and 29 Native American populations (Supplementary Fig. 1B). The latter represents a cline from ANE-rich steppe populations such as EHG, Eneolithic individuals, AG3 and Mal’ta 1 (MA1) to modern-day Native Americans at the opposite end. To formally test the excess of alleles shared with ANE/Native Americans we performed f4-statistics of the form f4(Mbuti, X; Steppe Maykop, Eneolithic steppe), which resulted in significantly positive Z-scores (Z > 3) for AG3, MA1, EHG, Clovis and Kennewick for the ancient populations and many present-day Native American populations (Supplementary Table 1). Based on these observations we used qpWave and qpAdm methods to model the number of ancestral sources contributing to the Steppe Maykop individuals and their relative ancestry coefficients. Simple two-way models of Steppe Maykop as an admixture of Eneolithic steppe, AG3 or Kennewick do not fit (Supplementary Table 2). However, we could successfully model Steppe Maykop ancestry as being derived from populations related to all three sources (p-value 0.371 for rank 2): Eneolithic steppe (63.5 ± 2.9%), AG3 (29.6 ± 3.4%) and Kennewick (6.9 ± 1.0%) (Fig. 4; Supplementary Table 3). We note that the Kennewick-related signal is most likely driven by the East Eurasian part of Native American ancestry, as the f4-statistics f4(Steppe_Maykop, Fitted Steppe_Maykop; Outgroup1, Outgroup2) show that the Steppe Maykop individuals share more alleles not only with Karitiana but also with Han Chinese (Supplementary Table 2).
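The logic of an f4 test with a Z-score can be illustrated with a minimal sketch. It assumes per-SNP allele frequencies are already available as plain lists, uses simulated toy frequencies rather than any data from this study, and replaces the position-based blocks of production tools with equally sized contiguous SNP blocks; the function names are hypothetical. In this sketch f4(A, B; C, D) is the mean of (pA − pB)(pC − pD), so its sign depends on the order of the populations.

```python
import math
import random


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_jackknife(pa, pb, pc, pd, n_blocks=20):
    """Return (estimate, standard error, Z) from a delete-one-block jackknife
    over contiguous, equally sized SNP blocks (leftover SNPs are dropped)."""
    size = len(pa) // n_blocks
    pops = [list(p[: size * n_blocks]) for p in (pa, pb, pc, pd)]
    estimate = f4(*pops)
    leave_one_out = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        leave_one_out.append(f4(*[p[:lo] + p[hi:] for p in pops]))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (v - mean_loo) ** 2 for v in leave_one_out
    )
    se = math.sqrt(variance)
    return estimate, se, estimate / se


def simulate(n_snps=2000, seed=1):
    """Toy frequencies: X and P1 share a drift term that P2 and the outgroup lack."""
    rng = random.Random(seed)
    outgroup, x, p1, p2 = [], [], [], []
    for _ in range(n_snps):
        base = rng.uniform(0.1, 0.9)
        shared = rng.uniform(-0.05, 0.05)
        outgroup.append(base)
        x.append(base + shared)
        p1.append(base + 0.5 * shared + rng.uniform(-0.01, 0.01))
        p2.append(base + rng.uniform(-0.01, 0.01))
    return outgroup, x, p1, p2


outgroup, x, p1, p2 = simulate()
est, se, z = f4_jackknife(outgroup, x, p1, p2)
print(f"f4 = {est:.2e}, SE = {se:.2e}, Z = {z:.1f}")
```

With this argument order the shared drift between X and P1 yields a strongly negative statistic; swapping P1 and P2 flips the sign, which is why the population order must be stated alongside any reported Z-score.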

Fig. 4 Modelling results for the Steppe and Caucasus cluster. Admixture proportions based on (temporally and geographically) distal and proximal models, showing additional AF ancestry in Steppe groups (a) and additional gene flow from the south in some of the Steppe groups as well as the Caucasus groups (b) (see also Supplementary Tables 10, 14 and 19).

Characterising the Caucasus ancestry profile

The Maykop period, represented by 12 individuals from eight Maykop sites (Maykop, n = 2; a cultural variant ‘Novosvobodnaya’ from the site Klady, n = 4; and Late Maykop, n = 6) in the northern foothills appears homogeneous. These individuals closely resemble the preceding Eneolithic Caucasus individuals and present a continuation of the local genetic profile. This ancestry persists in the following centuries at least until ~3100 yBP (1100 calBCE), as revealed by individuals from Kura-Araxes from both the northeast (Velikent, Dagestan) and the South Caucasus (Kaps, Armenia), as well as MBA/LBA individuals (e.g. Kudachurt, Marchenkova Gora) from the north. Overall, this Caucasus ancestry profile falls among the ‘Armenian and Iranian Chalcolithic’ individuals and is indistinguishable from other Kura-Araxes individuals (Armenian EBA) on the PCA plot (Fig. 2), suggesting a dual origin involving Anatolian/Levantine and Iran Neolithic/CHG ancestry, with only minimal EHG/WHG contribution possibly as part of the AF ancestry9.

Admixture f3-statistics of the form f3(X, Y; target) with the Caucasus cluster as target resulted in significantly negative Z-scores (Z < −3) when CHG (or AG3 in Late Maykop) were used as one and Anatolian farmers as the second potential source (Supplementary Table 4). We also used qpWave to determine the number of streams of ancestry and found that a minimum of two is sufficient (Supplementary Table 5).
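Why a negative admixture f3 points to mixture can be seen in a small sketch. It assumes population allele frequencies are known exactly, uses made-up frequencies, and omits both the finite-sample correction for the target and the jackknife that production tools apply; the function name is hypothetical. If the target is an exact mixture α·X + (1 − α)·Y, each per-SNP term equals −α(1 − α)(pX − pY)², which cannot be positive.

```python
def f3(px, py, pt):
    """f3(X, Y; target): mean over SNPs of (pT - pX) * (pT - pY)."""
    return sum((t - x) * (t - y) for x, y, t in zip(px, py, pt)) / len(pt)


# Made-up allele frequencies for two hypothetical source populations.
src_x = [0.10, 0.80, 0.35, 0.60, 0.05]
src_y = [0.70, 0.20, 0.55, 0.10, 0.45]

# Target built as an exact 40/60 mixture of the two sources.
alpha = 0.4
mixed = [alpha * x + (1 - alpha) * y for x, y in zip(src_x, src_y)]

# Target that merely drifted away from X, without any contribution from Y.
drifted = [x + 0.05 for x in src_x]

print(f3(src_x, src_y, mixed))    # negative: consistent with admixture
print(f3(src_x, src_y, drifted))  # positive: no evidence of admixture
```

A positive value does not exclude admixture, because drift in the target after mixing adds a positive term; only a significantly negative value is informative.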

We then tested whether each temporal/cultural group of the Caucasus cluster could be modelled as a simple two-way admixture by exploring all possible pairs of sources in qpWave. We found support for CHG as one source and AF ancestry or a derived form such as is found in southeastern Europe as the other (Supplementary Table 6). We focused on mixture models of proximal sources (Fig. 4b) such as CHG and Anatolian Chalcolithic for all six groups of the Caucasus cluster (Eneolithic Caucasus, Maykop and Late Maykop, Maykop-Novosvobodnaya, Kura-Araxes, and Dolmen LBA), with admixture proportions on a genetic cline of 40–72% Anatolian Chalcolithic related and 28–60% CHG related (Supplementary Table 7). When we explored Romania_EN and Bulgaria_Neolithic individuals as alternative southeast European sources (30–46% and 32–49%), the CHG proportions increased to 54–70% and 51–68%, respectively. We hypothesize that alternative models, replacing the Anatolian Chalcolithic individual with yet unsampled populations from eastern Anatolia, South Caucasus or northern Mesopotamia, will likely also provide a fit to some of the tested Caucasus groups. Models with Iran Neolithic as substitute for CHG could also explain the data in a two-way admixture with the combination of Armenia Chalcolithic or Anatolia Chalcolithic as the other source. However, models replacing CHG with EHG received no support (Supplementary Table 8), indicating no strong influence of admixture from the adjacent steppe to the north. We also found no direct evidence of EHG or WHG ancestry in Caucasus groups (Supplementary Table 9), but observed that Kura-Araxes and Maykop-Novosvobodnaya individuals had likely received additional Iran Chalcolithic-related ancestry (24.9% and 37.4%, respectively; Fig. 4; Supplementary Table 10).
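How a two-way mixture proportion can be read off f4-statistics is sketched below with an f4-ratio, which is a simpler relative of qpAdm and not the procedure used here: qpAdm fits the weights against many outgroups at once, whereas this sketch uses a single outgroup pair. All frequencies are made up and the function names are hypothetical. If the target is α·S1 + (1 − α)·S2, then f4(O1, O2; target, S2) = α·f4(O1, O2; S1, S2), so the ratio recovers α.

```python
def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_ratio(out1, out2, target, src1, src2):
    """Estimate the fraction of target ancestry derived from src1,
    assuming target = alpha * src1 + (1 - alpha) * src2."""
    return f4(out1, out2, target, src2) / f4(out1, out2, src1, src2)


# Made-up allele frequencies for two outgroups and two sources.
out1 = [0.20, 0.50, 0.90, 0.40, 0.30]
out2 = [0.60, 0.10, 0.70, 0.80, 0.35]
src1 = [0.10, 0.80, 0.35, 0.60, 0.05]
src2 = [0.70, 0.20, 0.55, 0.10, 0.45]

# Target constructed as a 55/45 mixture of the two sources.
true_alpha = 0.55
target = [true_alpha * a + (1 - true_alpha) * b for a, b in zip(src1, src2)]

print(f4_ratio(out1, out2, target, src1, src2))  # recovers the mixture fraction
```

The estimate is only meaningful when the denominator is clearly non-zero, that is, when the outgroups are differentially related to the two sources.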

Characterising the Steppe ancestry profile

Individuals from the North Caucasian steppe associated with the Yamnaya cultural formation (5300–4400 BP, 3300–2400 calBCE) appear genetically almost identical to previously reported Yamnaya individuals from Kalmykia19 immediately to the north, the middle Volga region18,22, Ukraine, and to other BA individuals from the Eurasian steppes who share the characteristic ‘steppe ancestry’ profile as a mixture of EHG and CHG-related ancestry9,13. These individuals form a tight cluster in PCA space (Fig. 2) and can be shown formally to be a mixture by significantly negative admixture f3-statistics of the form f3(EHG, CHG; target) (Supplementary Fig. 2). This cluster also involves individuals of the North Caucasus culture (4800–4500 BP, 2800–2500 calBCE) in the piedmont steppe, who share the steppe ancestry profile, as do individuals from the Catacomb culture in the Kuban, Caspian and piedmont steppes (4600–4200 BP, 2600–2200 calBCE), which succeeded the Yamnaya horizon.

The individuals of the MBA post-Catacomb horizon (4200–3700 BP, 2200–1700 calBCE) such as Late North Caucasus and Lola cultures represent both ancestry profiles common in the North Caucasus: individuals from the mountain site Kabardinka show a typical steppe ancestry profile, whereas individuals from the site Kudachurt 90 km to the west or our most recent individual from the western LBA Dolmen culture (3400–3200 BP, 1400–1200 calBCE) retain the ‘southern’ Caucasus profile. In contrast, one Lola culture individual resembles the ancestry profile of the Steppe Maykop individuals.

Admixture into the steppe zone from the south

Evidence for interaction between the Caucasus and the Steppe clusters is visible in our genetic data from individuals associated with the later Steppe Maykop phase around 5300–5100 years ago. These ‘outlier’ individuals were buried in the same mounds as those with steppe and in particular Steppe Maykop ancestry profiles but share a higher proportion of AF ancestry visible in the ADMIXTURE plot and are also shifted towards the Caucasus cluster in PC space (Fig. 2d). This observation is confirmed by formal D-statistics (Supplementary Fig. 3). By modelling Steppe Maykop outliers successfully as a two-way mixture of Steppe Maykop and representatives of the Caucasus cluster (Supplementary Table 3), we can show that these individuals received additional ‘Anatolian and Iranian Neolithic ancestry’, most likely from contemporaneous sources in the south. We used ALDER32 to estimate an average admixture time for the observed farmer-related ancestry in Steppe Maykop outliers of 20 generations or 560 years ago (Supplementary Note 5).
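The conversion from generations to years in the ALDER estimate is simple arithmetic, sketched here. The generation time of 28 years is an assumption inferred from the two figures given in the text (560 years / 20 generations), and the function name is hypothetical.

```python
# Assumption: 28 years per generation, implied by 560 years / 20 generations.
GENERATION_YEARS = 28


def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    """Convert an admixture date in generations to calendar years."""
    return generations * years_per_generation


print(generations_to_years(20))  # 560
```

The result is an interval before the lifetime of the sampled individuals, not a calendar date.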

Anatolian farmer-related ancestry in steppe groups

Eneolithic Samara individuals form a cline in PC space running from EHG to CHG (Fig. 2d), which is continued by the newly reported Eneolithic steppe individuals. However, the trajectory of this cline changes in the subsequent centuries. Here we observe a cline from Eneolithic_steppe towards the Caucasus cluster. We can qualitatively explain this ‘tilting cline’ by developments south of the Caucasus, where Iranian and AF ancestries continue to mix, resulting in a blend that is also observed in the Caucasus cluster, from where it could have spread onto the steppe. The first appearance of ‘combined farmer-related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results suggest that Yamnaya and later groups of the West Eurasian steppe also carry some farmer-related ancestry, as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2d) compared to the preceding Eneolithic steppe individuals. The ‘tilting cline’ is also confirmed by admixture f3-statistics, which provide statistically significant negative values for AG3 and any AF group as the two sources (Supplementary Table 11). Using f- and D-statistics we also observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, distinguishing the Eneolithic steppe from later groups. In addition, we find the Caucasus cluster or Levant/AF groups to share more alleles with Steppe groups than with EHG or Samara_Eneolithic (Supplementary Figs. 4 and 5). MLBA groups such as Poltavka, Andronovo, Srubnaya, and Sintashta show a further increase of AF ancestry consistent with previous studies9,22, reflecting different processes not directly related to events in the Caucasus (Supplementary Fig. 6).

We then used qpWave and qpAdm to explore the number of ancestry sources for the AF component to evaluate whether geographically proximate groups contributed plausibly to the subtle shift of Eneolithic ancestry in the steppe towards Neolithic groups. Specifically, we tested whether any of the Eurasian steppe ancestry groups can be successfully modelled as a two-way admixture between Eneolithic steppe and a population X derived from Anatolian- or Iranian farmer-related ancestry, respectively. Surprisingly, we found that a minimum of four streams of ancestry is needed to explain all eight steppe ancestry groups tested (Fig. 2; Supplementary Table 12). Importantly, our results show a subtle contribution of both AF ancestry and WHG-related ancestry (Fig. 4; Supplementary Tables 13 and 14), likely brought in through MN/LN farming groups from adjacent regions in the West. A direct source of AF ancestry can be ruled out (Supplementary Table 15). At present, due to the limits of our resolution, we cannot identify a single best source population. However, geographically proximal and contemporaneous groups such as Globular Amphora and Eneolithic groups from the Black Sea area (Ukraine and Bulgaria), representing all four distal sources (CHG, EHG, WHG, and Anatolian_Neolithic), are among the best supported candidates (Fig. 4; Supplementary Table 16). Applying the same method to the subsequent North Caucasian Steppe groups such as Catacomb, (Late) North Caucasus confirms this pattern (Supplementary Table 16).

Using qpAdm with Globular Amphora as a proximate surrogate population, we estimated the contribution of AF ancestry into Yamnaya and other steppe groups. We find that Yamnaya Samara individuals have 13.2 ± 2.7% and Ukraine or Caucasus Yamnaya individuals 16.6 ± 2.9% AF ancestry (Fig. 4; Supplementary Table 17)—statistically indistinguishable proportions. Substituting Globular Amphora with Iberia Chalcolithic does not alter the results profoundly (Supplementary Table 18). This suggests that the source population was a mixture of AF ancestry and a minimum of 20% WHG ancestry, a genetic profile shared by many European MN/LN and Chalcolithic individuals of the 3rd millennium BCE analysed thus far.
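One way to check that the two AF estimates are statistically indistinguishable is a simple Z-test on their difference, sketched below. It assumes the two estimates are independent and approximately normally distributed, which is a simplification because both come from models sharing reference populations; the function name is hypothetical.

```python
import math


def z_difference(est1, se1, est2, se2):
    """Z-score and two-sided normal p-value for the difference between two
    estimates, assuming independence and approximate normality."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# AF ancestry estimates (percent) for Ukraine/Caucasus Yamnaya and Yamnaya Samara.
z, p = z_difference(16.6, 2.9, 13.2, 2.7)
print(f"Z = {z:.2f}, two-sided p = {p:.2f}")
```

The difference of 3.4 percentage points corresponds to a Z of about 0.86, well below the conventional threshold of 1.96.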

To account for potentially un-modelled ancestry from the Caucasus groups, we added ‘Eneolithic Caucasus’ as an additional source to build a three-way model. We found that Yamnaya Caucasus, Yamnaya Ukraine Ozera, North Caucasus and Late North Caucasus had likely received additional ancestry (6–40%) from nearby Caucasus groups (Supplementary Table 19). This suggests a more complex and dynamic picture of steppe ancestry groups through time, including the formation of a local variant of steppe ancestry in the North Caucasian steppe from the local Eneolithic, a contribution of Steppe Maykop groups, and population continuity between the early Yamnaya period and the MBA (5300–3200 BP, 3300–2200 calBCE).

Insights from micro-transects through time

The availability of multiple individuals from a single burial mound allowed us to test genetic continuity on a micro-transect level. By focusing on two kurgans (Marinskaya 5 and Sharakhalsun 6) with four and five individuals, respectively, we observe that the genetic ancestry varied through time, alternating between the Steppe and Caucasus ancestries (Supplementary Fig. 7), suggesting a shifting genetic border between the two genetic clusters. We also detected various degrees of kinship between individuals buried in the same mound, which supports the view that particular mounds reflected genealogical lineages. Overall, we observe a balanced sex ratio within our sites across the individuals tested (Supplementary Note 4).

A joint model of ancient populations of the Caucasus region

Our fitted qpGraph model recapitulates the genetic separation between the Caucasus and Steppe groups with the Eneolithic steppe individuals deriving more than 60% of ancestry from EHG and the remainder from a CHG-related basal lineage, whereas the Maykop group received about 86.4% from CHG, 9.6% Anatolian farming related ancestry, and 4% from EHG. The Yamnaya individuals from the Caucasus derived the majority of their ancestry from Eneolithic steppe individuals, but also received about 16% from Globular Amphora-related farmers (Fig. 5, Supplementary Note 6).