Authenticity of ancient-DNA results

Based on the shotgun sequencing, 134 of the 141 individuals sampled were included in mitochondrial capture. Mitochondrial genomes for 103 individuals passed the quality control thresholds, while 31 samples were excluded from further analyses due to insufficient data (less than fivefold mitochondrial coverage) or high contamination levels (Supplementary Table S1). Ancient-DNA yield for all 103 samples was assessed with several authentication criteria. All samples showed fragment sizes ranging between 40–250 bp, as expected for ancient DNA29. Fragments under 30 bp were filtered out as a mapping quality control. All samples had an average fragment length of 47 to 95 bp. Authentic ancient DNA is typically more fragmented than modern DNA, and fragments as short as 50–65 bp are common. The samples included in the downstream analyses yielded between 1426 and 395345 unique human mitochondrial fragments with an average coverage ranging from 5-fold to 1683-fold. The first-base damage on the fragments varied between 5–36% on the 3′-end and 4–34% on the 5′-end. Previous studies have shown that cytosine deamination is influenced by the age of the sample30,31 and the mean temperature of the site31. Considering the climatic conditions in Finland, e.g., low mean temperature, and the relatively young age especially for the post-medieval samples, 3′ and 5′ damage values below 5% are plausible. No samples were therefore omitted from the study based on these criteria.

The contamination rates of the 103 samples were further evaluated by Schmutzi32. 36 samples had Schmutzi contamination estimates exceeding 5% and were excluded (Supplementary Table S1). The remaining samples were then analyzed with ContamMix33: the resulting crude contamination estimates as well as the a posteriori estimates of contamination along with their 95% confidence intervals (CI) from the MCMC are reported in Supplementary Table S1. The CIs ranged from 0% to 17.2%; in ten cases they exceeded 10%, even though estimates by Schmutzi had remained below 5%. These cases were visually inspected with Geneious 11.0.3 (www.geneious.com). For each of them, the majority call supported the previously assigned haplogroup.
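
The screening logic described above can be summarised as a small decision rule. The following is a minimal sketch, not the pipeline used in the study: the thresholds (fivefold coverage, 5% Schmutzi estimate, 10% upper CI bound) are taken from the text, whereas the sample identifiers, field names and values are hypothetical.

```python
from dataclasses import dataclass

MIN_COVERAGE = 5.0        # fivefold mitochondrial coverage (from the text)
MAX_CONTAMINATION = 0.05  # 5% Schmutzi estimate (from the text)
CI_REVIEW_LIMIT = 0.10    # ContamMix upper CI bound that prompted visual inspection


@dataclass
class Sample:
    sample_id: str             # hypothetical identifier
    coverage: float            # mean mitochondrial coverage (fold)
    schmutzi: float            # Schmutzi contamination estimate (fraction)
    contammix_ci_upper: float  # upper bound of the ContamMix 95% CI (fraction)


def classify(sample: Sample) -> str:
    """Return the screening outcome for one sample."""
    if sample.coverage < MIN_COVERAGE:
        return "exclude: low coverage"
    if sample.schmutzi > MAX_CONTAMINATION:
        return "exclude: contamination"
    if sample.contammix_ci_upper > CI_REVIEW_LIMIT:
        return "keep: inspect visually"
    return "keep"


# Hypothetical example records, not values from Supplementary Table S1.
samples = [
    Sample("S1", coverage=3.2, schmutzi=0.01, contammix_ci_upper=0.02),
    Sample("S2", coverage=120.0, schmutzi=0.08, contammix_ci_upper=0.09),
    Sample("S3", coverage=45.0, schmutzi=0.02, contammix_ci_upper=0.172),
    Sample("S4", coverage=900.0, schmutzi=0.01, contammix_ci_upper=0.03),
]
results = {s.sample_id: classify(s) for s in samples}
print(results)
```

In this sketch a wide ContamMix interval alone does not exclude a sample; it only flags it for inspection, mirroring the ten cases that were checked in Geneious.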

Radiocarbon dating

For this study, we report new 14C dates for 42 individuals (Supplementary Table S1). Radiocarbon dates for nine individuals were determined previously (see Supplementary Table S1). Based on radiocarbon dates and/or dating of the context, the studied burial sites cover the timespan from the Roman Iron Age (300 AD) to historical times (19th century). For the sites Levänluhta, Luistari, Hollola, Hiitola, Tuukkala, Pälkäne and Porvoo, the highest posterior density (HPD) regions for each site's start and end boundaries were determined. The mean values of the obtained phase boundaries are presented in Table 1, and the 68% and 95% HPD regions are presented in Supplementary Table S2 and in Supplementary Fig. S1. The intervals between the mean boundary values obtained from radiocarbon dates were in accordance with the dates determined from the archaeological context (Table 1).
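
To illustrate what a 68% or 95% HPD region expresses, the sketch below finds the shortest interval that contains a given share of posterior draws for a boundary date. It is an illustration only: the function name and the draws are hypothetical, and the calibration software used in the study may report disjoint HPD regions for multimodal posteriors, which this single-interval version cannot represent.

```python
def hpd_interval(draws, mass_percent):
    """Shortest single interval containing at least mass_percent % of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    # Integer ceiling of n * mass_percent / 100: number of draws the interval must cover.
    k = max(1, (n * mass_percent + 99) // 100)
    # Slide a window of k consecutive draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Hypothetical posterior draws for a start boundary (calendar years AD).
boundary_draws = [948, 951, 955, 957, 960, 962, 966, 971, 985, 1020]
print(hpd_interval(boundary_draws, 68))
print(hpd_interval(boundary_draws, 95))
```

Because the interval is chosen by width rather than by symmetric tail areas, a long tail on one side (here the late draws) is excluded first.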

MtDNA data and haplotypic variation

A total of 95 unique complete-mitogenome haplotypes were observed among the 103 complete sequences retrieved: three haplotypes were shared between sampling sites and five within a site. In the latter cases, the placement of the skeletal samples suggests that the shared haplotypes have been carried by different individuals, who may have been maternally related: identical haplotypes (haplogroup U5a2a1e) were obtained from remains of a c. 5-year-old child (grave 18, TU666) and an older woman (grave 7, TU655) from Hollola. Identical haplotypes (haplogroup H85) were also observed in a middle-aged adult (grave 6, TU661) and a c. 18-month-old child (grave 15, TU668) from Hollola. At the Hiitola site, identical haplotypes (haplogroup W6) were shared between two individuals from distinct graves (individual TU566 from grave 80 and individual TU675 from grave 30). At the Tuukkala site, two individuals showed identical haplotypes (haplogroup H10e, individuals TU631 and TU645). At Turku, two adults shared the same haplotype belonging to the basal haplogroup H (samples TU582 and TU588). Haplotypes for 103 individuals are presented in Supplementary Table S3.

As the subsequent statistical methods assume that samples derive from unrelated individuals, five samples (TU666, TU668, TU675, TU645 and TU588), one from each pair of identical haplotypes within a site, were removed from the subsequent analyses due to their possible maternal relatedness.
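
The filtering step can be sketched as keeping one sample per site and haplotype. In the example below the sample identifiers, sites and haplogroups come from the text; using the haplogroup label as a stand-in for the full haplotype, and listing the retained sample first in each pair, are assumptions made for illustration.

```python
# (sample_id, site, haplotype key). The haplogroup label stands in for the
# complete-mitogenome haplotype, and the retained sample is listed first.
records = [
    ("TU655", "Hollola", "U5a2a1e"),
    ("TU666", "Hollola", "U5a2a1e"),
    ("TU661", "Hollola", "H85"),
    ("TU668", "Hollola", "H85"),
    ("TU566", "Hiitola", "W6"),
    ("TU675", "Hiitola", "W6"),
    ("TU631", "Tuukkala", "H10e"),
    ("TU645", "Tuukkala", "H10e"),
    ("TU582", "Turku", "H"),
    ("TU588", "Turku", "H"),
]


def drop_within_site_duplicates(rows):
    """Keep the first sample for each (site, haplotype) pair; report the rest."""
    seen = set()
    kept, removed = [], []
    for sample_id, site, haplotype in rows:
        key = (site, haplotype)
        if key in seen:
            removed.append(sample_id)
        else:
            seen.add(key)
            kept.append(sample_id)
    return kept, removed


kept, removed = drop_within_site_duplicates(records)
print(removed)
```

Haplotypes shared between different sites are left in place, since the key includes the site.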

The mean number of pairwise differences, calculated from complete mitochondrial genomes, was highest within Porvoo (MNPD = 33.7 ± 16.8) and lowest within Renko (MNPD = 21.8 ± 10.8) (Supplementary Table S4). Due to the small number of individuals per site and the use of unique complete mtDNA sequences, haplotype diversities (H) were relatively high (with a mean of 1.0 and standard deviations ranging from 0.0202 to 0.1768).
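
Both statistics can be illustrated on toy data. The sketch below assumes aligned sequences of equal length and uses the standard unbiased estimator of haplotype diversity, n/(n-1) × (1 - Σp²); the toy sequences are hypothetical and the software used in the study may treat gaps or ambiguous bases differently. It also shows why diversity reaches 1.0 when every haplotype at a site is unique.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(seqs):
    """Mean number of pairwise differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freq_sq = sum((count / n) ** 2 for count in Counter(seqs).values())
    return n / (n - 1) * (1 - freq_sq)


toy_unique = ["ACGT", "ACGA", "TCGA"]   # hypothetical, all haplotypes distinct
toy_shared = ["ACGT", "ACGT", "TCGA"]   # hypothetical, one haplotype seen twice
print(mean_pairwise_differences(toy_unique))
print(haplotype_diversity(toy_unique), haplotype_diversity(toy_shared))
```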

MtDNA haplogroup composition at the ancient sites

Burial site-specific haplogroup frequencies of the 98 complete mitochondrial sequences showed considerable between-site variation (Fig. 2 and Supplementary Table S5). The observed frequencies of the main haplogroups in the whole dataset resembled the prevalence among contemporary Finns. As today, haplogroups U and H were the most common, yet with slightly higher overall frequencies than today (U 33.7% vs. 24.1%, and H 41.8% vs. 33.2%). However, when grouped temporally into Iron-Age and medieval sites (IAM) and early-modern and modern sites (EMM), differences were observed: the IAM sites (i.e., Levänluhta, Luistari, Hollola, Hiitola and Tuukkala) demonstrated significantly higher overall prevalence of haplogroup U (40.9%) than the EMM sites (i.e., Pälkäne, Porvoo, Renko, Turku and Hamina, 18.8%) but also high inter-site variability. Among the EMM samples haplogroup H dominated (U 18.8%, H 46.9%).

Figure 2 (a) MtDNA haplogroup distribution at each site. Only unique haplotypes per site are included. Ages of the sites are presented based on the interval for mean values of phase boundaries for start and end distribution when available (i.e. for Levänluhta, Luistari, Hollola, Hiitola, Tuukkala, Pälkäne, and Porvoo, see Section 2.2.) and based on archaeological context for other sites. (b) MtDNA haplogroup frequencies when pooled according to chronological and geographical criteria. Only unique haplotypes within site are included. Iron-Age and medieval south-west includes Levänluhta, Hollola and Luistari; Iron-Age and medieval east includes Hiitola and Tuukkala; Early modern and modern includes Pälkäne, Porvoo, Renko; Frequencies for contemporary Finns from28.

This inter-site variability of the haplogroup U/H ratio also had a clear spatial pattern among the IAM samples. The western cluster (IAM south-west: Levänluhta, Luistari and Hollola) had average U and H frequencies of 58.3% and 27.8%, respectively, whereas the corresponding values in the eastern cluster (IAM east: Hiitola, Tuukkala) were 20.0% and 53.3%. In IAM east the highest frequency for an individual subhaplogroup was 30.0%, obtained for H1. Strikingly, this U/H ratio is the opposite of that observed in contemporary eastern and western Finns.

Differences in haplogroup composition between the sites

Among the 12 Levänluhta samples, five individuals carried haplogroup (hg) U5b, four of which belonged to the sub-hg U5b1b1a. Additionally, the Levänluhta site included three individuals with hg U5a, resulting in a total frequency of 66.7% for hg U5. In contrast, with only two haplotypes of sub-hg H1, the frequency of hg H was well below values observed in modern European populations. The high U5b1b1a frequency resembles that observed today in Saami populations of northern Europe. This corresponds well with a related recent study showing a close genetic affinity between Levänluhta individuals and modern Saami18. However, the Levänluhta individuals also carried mtDNA haplogroups that are absent or rare among the Saami population today: U5a, H1 (0.0–4.0%34) and haplogroups K and T. The Levänluhta site clearly showed a unique composition, which resulted in significant genetic distances to all other ancient sites at sequence level, with Φ ST values of >10% (see below).
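
The frequencies quoted for Levänluhta follow directly from the counts given in the text, as the short sketch below shows. The counts for U5b, U5b1b1a, U5a and H1 are from the text; the split of the two remaining samples into one K and one T is an assumption, since only the presence of both haplogroups is stated.

```python
# Counts from the text: five U5b (four of them U5b1b1a), three U5a, two H1.
# The remaining two samples are assumed to be one K and one T.
levanluhta = ["U5b1b1a"] * 4 + ["U5b"] + ["U5a"] * 3 + ["H1"] * 2 + ["K", "T"]


def frequency_percent(labels, prefix):
    """Percentage of samples whose haplogroup label starts with the given prefix."""
    matches = sum(label.startswith(prefix) for label in labels)
    return round(100 * matches / len(labels), 1)


print(frequency_percent(levanluhta, "U5"))       # all U5 lineages
print(frequency_percent(levanluhta, "U5b1b1a"))  # Saami-typical sub-hg
print(frequency_percent(levanluhta, "H"))        # all hg H lineages
```

Matching on the label prefix works here because haplogroup nomenclature is hierarchical, so every U5b1b1a sample also counts as U5b and U5.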

Individuals from the Hollola site, 14C dated to 955–1390 calAD (Table 1), also showed a high overall frequency of hg U (64.3%), similar to Levänluhta. However, differences in subhaplogroup distribution between Hollola and Levänluhta suggest a possible non-modern-Saami-like hunter-gatherer ancestry in this region. Interestingly, subhaplogroup U5b1b1a, typical among contemporary Saami, was not observed in Hollola. In contrast, most of the Hollola U haplotypes belong to haplogroups U4 and U5a (frequencies in Hollola 28.6% and 14.3%, respectively), which are rare or absent in Saami today34. Moreover, U4 is also rare in modern Finns while the frequency for U5a is around 6%28,35. Haplotypes belonging to different subhaplogroups of hg H were more common in Hollola than in Levänluhta, occurring in altogether five samples. Haplogroups K and T were absent in the Hollola sample.

A rather different picture emerged from the Luistari samples, which showed a substantial genetic distance to Levänluhta (Φ ST = 0.134, p < 0.01). Haplogroup U5b1 was entirely absent, and the U haplotypes observed belong to subhaplogroups U4, U5b2 and U2. Lineage U2 is prevalent in some Uralic-speaking groups today36. The overall haplogroup distribution in Luistari was more similar than those of the Levänluhta and Hollola sites to modern European populations, which are dominated by the agriculture-associated Neolithic haplogroup H and include occurrences of T2 and W1 (see Introduction).

The two easternmost sites, Hiitola and Tuukkala, proved genetically distant from the western Levänluhta and Hollola sites, despite being approximately contemporaneous with the Hollola individuals. The Neolithic signal in the mtDNA gene pool of ancient Finns in general was much stronger in the east. Both Hiitola and Tuukkala samples showed high frequencies of hg H (61.5% and 47.1%, respectively), together with other Neolithic haplogroups J, K, W and X. Notably, these eastern sites shared three haplogroups: H1a7, H1a8a and H10g. According to GenBank searches these three haplogroups are rare in modern populations: for H1a7 four modern sequences were found, two in Finnish (KY620272 and MF686118), one in Swedish (KJ487971) and one in British (GU797829) populations. For haplogroup H1a8a only two matches were found, one among Finnish (JX153203) and one of an unknown origin (JQ701944), whereas three modern sequences were found for haplogroup H10g: two Finnish (KR732275 and MF497508) and one from Russia (GU122976). Notably, H1* are known to be common in modern Karelia37. The eastern sites also comprise rare subhaplogroups U1 (hg U1b2 in Hiitola) and U8 (hg U8b1a2b in Tuukkala), which are atypical for contemporary Finns.

The early modern and modern sites show frequencies of U and H similar to those of the combined Iron-Age and medieval east (18.8% and 46.9%, respectively). In contrast to the IAM sites and contemporary Finns, the EMM sites harbor a high prevalence of haplogroup T: its frequency in EMM is as high as 21.9%, while in other Finnish populations it is less than 8% (Supplementary Tables S5 and S8). Individual JK1954 from Hamina belonged to haplogroup C, which is lacking from contemporary Finns28 (Supplementary Table S5) and suggests a possible eastern origin. Nevertheless, additional autosomal data are needed to confirm the genetic background of individual JK1954.

When contrasted with haplogroup frequencies observed in contemporary Finns, our simulations (Supplementary Fig. S2) showed that the ancient sites are significantly different, and that these differences cannot be explained by sampling effects. This applied especially to haplogroup U5 in total and to subhaplogroup U5b in Levänluhta, hg U4 in both Luistari and Hollola as well as hg H1 in the Hiitola dataset.
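
The idea of testing whether sampling alone could produce such frequencies can be sketched with a simple Monte Carlo draw. This is not the simulation design of Supplementary Fig. S2, which is not described in this section. The observed count (8 of 12 Levänluhta samples carrying U5) follows from the 66.7% reported above; the population frequency of 24.1% is the value given for all of hg U among contemporary Finns, used here as a conservative upper bound for U5; the function name, number of draws and seed are hypothetical.

```python
import random


def sampling_p_value(observed, n, population_freq, draws=10_000, seed=1):
    """Share of simulated samples of size n with at least `observed` carriers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Each individual carries the haplogroup with probability population_freq.
        count = sum(rng.random() < population_freq for _ in range(n))
        if count >= observed:
            hits += 1
    return hits / draws


# 8 of 12 Levänluhta samples carried U5; 0.241 is the contemporary frequency of
# all hg U, which cannot be lower than that of U5 alone.
p_u5 = sampling_p_value(observed=8, n=12, population_freq=0.241)
print(p_u5)
```

A small value indicates that a sample of 12 drawn from a population with the contemporary frequency would rarely contain so many U5 carriers.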

Genetic distances among sites and to contemporary Finns

When we calculated genetic distances between sites, we observed that Levänluhta differed significantly from all the other sites, except Hollola (Φ ST = 0.05042, p = 0.02441) and Tuukkala (Φ ST = 0.04387, p = 0.06055) (Fig. 3a and Supplementary Table S6). The largest distance from Levänluhta was to the eastern Hiitola site (Φ ST = 0.15468). The distance between Levänluhta and contemporary Finns was smaller but still significant, with a distance to contemporary north-east (NE) Φ ST = 0.04077 and to contemporary south-west (SW) slightly higher Φ ST = 0.06473. While Luistari differed only from Levänluhta, the Hollola site differed both from Hiitola and the EMM (Φ ST = 0.05205 and Φ ST = 0.05135, respectively, p < 0.05 for both), but not from Levänluhta (Φ ST = 0.06445, p > 0.05) (Fig. 3a and Supplementary Table S6). Hiitola differed, in addition to Levänluhta and Hollola, from EMM and from both groups of contemporary Finns (Φ ST = 0.04111 for NE and Φ ST = 0.03437 for SW). When considering the genetic distances between individual sites, it should be noted that the relatively small sample sizes might affect the Φ ST values, and the results should be interpreted with caution. However, for the pooled IAM and EMM sites (see Fig. 2b), for which the sample sizes are ≥30, the genetic distance calculations should be less sensitive to bias caused by small sample sizes.
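
For readers unfamiliar with Φ ST, the sketch below computes a pairwise value in the AMOVA framework from the number of pairwise differences between aligned sequences, with a label-permutation test for significance. It is a simplified illustration under stated assumptions, not the software or settings used in the study: the toy sequences are hypothetical, and the default of 1023 permutations is chosen only because the reported p-values are multiples of 1/1024 (for example 0.00098), which would be consistent with that count.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: summed pairwise differences divided by group size."""
    if len(seqs) < 2:
        return 0.0
    return sum(hamming(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(pop_a, pop_b):
    """Pairwise Phi_ST for two populations (lists of aligned sequences)."""
    pop_a, pop_b = list(pop_a), list(pop_b)
    n_a, n_b = len(pop_a), len(pop_b)
    n = n_a + n_b
    ssd_within = ssd(pop_a) + ssd(pop_b)
    ssd_among = ssd(pop_a + pop_b) - ssd_within
    msd_within = ssd_within / (n - 2)     # N - P degrees of freedom, P = 2
    msd_among = ssd_among                 # P - 1 = 1 degree of freedom
    n0 = n - (n_a ** 2 + n_b ** 2) / n    # effective sample size per population
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total else 0.0


def permutation_p(pop_a, pop_b, permutations=1023, seed=1):
    """P-value from shuffling individuals between the two populations."""
    rng = random.Random(seed)
    observed = phi_st(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if phi_st(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


west = ["AAAA", "AAAA"]   # hypothetical toy populations
east = ["TTTT", "TTTT"]
print(phi_st(west, east), permutation_p(west, east))
```

Negative estimates can occur when within-population variation exceeds the variation among populations; they are usually read as no detectable differentiation.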

Figure 3 (a) Pairwise Φ ST distances for ancient and contemporary Finns. Early modern and modern Finns consist of individuals from Pälkäne, Porvoo, Renko, Julin and Hamina sites. Contemporary Finns are divided into south-west (SW) and north-east (NE) subpopulations according to28. Φ ST values are presented on a scale starting from zero (Φ ST values and p-values are presented in Supplementary Table S6). (b) Pairwise Φ ST distances for ancient and contemporary Finns. Iron-Age and medieval (IAM) sites are grouped into subpopulations: IAM south-west consists of Levänluhta, Luistari and Hollola; IAM east consists of Hiitola and Tuukkala. Early modern and modern Finns contain individuals from Pälkäne, Porvoo, Renko, Julin and Hamina sites. Contemporary Finns are divided into south-west (SW) and north-east (NE) subpopulations according to Palo et al.27 and Neuvonen et al.28.

Clustering the IAM sites roughly by geographical location into IAM south-west (hg U more prevalent) and IAM east (hg H more prevalent) further demonstrated a pattern opposite to the modern mtDNA diversity distribution (Fig. 3b). IAM south-west differed statistically significantly from contemporary SW (Φ ST = 0.01670, p = 0.00488) and EMM (Φ ST = 0.05350, p = 0.00098) but not from contemporary NE (Φ ST = 0.00036, p = 0.41895). In addition, EMM and contemporary SW differed from each other (Φ ST = 0.01140, p = 0.04102). Conversely, IAM east differed from the contemporary NE (Φ ST = 0.00849) more than from contemporary SW (Φ ST = 0.00514).

The haplotype-level median-joining network (Supplementary Fig. S3) demonstrates that ancient and contemporary Finns exhibit, in principle, the same main haplogroups, whereas the most notable differences are in the haplogroup frequencies between the ancient populations. Individuals from IAM eastern sites are more prevalent in the haplogroup H cluster, while individuals from IAM southwestern sites are more concentrated in the haplogroup U cluster. Contemporary Finns are found in both clusters, indicating a possible mixture of IAM southwestern and IAM eastern populations.

Main haplogroup frequencies in space and time

To evaluate the possible impact of spatial and temporal factors on the distributions of haplogroup U, largely associated with European hunter-gatherers, and the farmer-associated haplogroup H within the IAM sites, we performed multinomial logistic regression analyses. In a stepwise forward analysis, the only statistically significant independent variable explaining the differences in the haplogroup composition was the distance from the eastern reference point Lahdenpohja (for the comparisons with ‘H’ and ‘Others’, the significance for Lahdenpohja was 0.013 and 0.103, respectively) (Supplementary Table S7). Neither the ages of the samples nor the distances from the southern and western reference points were required for the best-fit model. However, the addition of the eastern reference point significantly improved the fit between model and data (p = 0.027). Based on the odds ratios, an individual from the southwest is less likely to belong to haplogroups ‘H’ or ‘Others’ than an individual from an eastern archaeological site. Similar results were obtained when using hunter-gatherer-associated haplogroups (U and V), farmer-associated haplogroups (H, J, K and T) and ‘Others’ as categorically distributed dependent variables. We chose to include haplogroup V as ‘hunter-gatherer’, although there is no direct evidence for an association of hg V with hunter-gatherers. This is assumed here because of V’s northern distribution and its high prevalence (up to 58%34) among the Saami, the archetypal nomadic population lacking many farmer-associated haplogroups34,38. Distance from the eastern reference point was the only predictor included in the model (with significance of 0.031 for farmer-associated haplogroups and 0.082 for other haplogroups). Assuming that haplogroups U and H can be associated with hunter-gatherers and farmers, respectively, the results suggest that the more central-European-like, farmer-related haplogroups spread from the east. However, as mentioned above, the association of hg V is unclear. Omitting V from the hunter-gatherer group does not change the results notably (Supplementary Table S7).
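
To make the interpretation of the odds ratios concrete, the sketch below shows how a multinomial logistic model with a single predictor turns a distance into category probabilities. It assumes that hg U is the reference category, which the comparison against ‘H’ and ‘Others’ suggests but the text does not state. All coefficients are hypothetical and are not the fitted values of Supplementary Table S7.

```python
import math

# Hypothetical (intercept, slope) pairs; the slope is per 100 km of distance from
# the eastern reference point. 'U' is the assumed reference category with score 0.
COEFS = {"H": (1.0, -0.5), "Others": (0.5, -0.3)}


def category_probabilities(distance_100km, coefs=COEFS):
    """Softmax over the linear predictors, with the reference category fixed at 0."""
    scores = {"U": 0.0}
    for category, (intercept, slope) in coefs.items():
        scores[category] = intercept + slope * distance_100km
    denominator = sum(math.exp(score) for score in scores.values())
    return {category: math.exp(score) / denominator for category, score in scores.items()}


def odds_ratio(slope):
    """Multiplicative change in the odds (category vs. reference) per unit of distance."""
    return math.exp(slope)


near_east = category_probabilities(0.0)   # at the eastern reference point
far_west = category_probabilities(4.0)    # 400 km away from it
print(near_east["H"], far_west["H"], odds_ratio(COEFS["H"][1]))
```

With a negative slope the odds ratio is below 1, so the odds of ‘H’ relative to ‘U’ shrink with increasing distance from the eastern reference point, which is the direction of effect described in the text.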

Genetic affinities of ancient Finns to other ancient and contemporary populations

To further explore the affinity of ancient Finns to other ancient and contemporary populations, we carried out principal component analysis (PCA) based on haplogroup frequencies. We plotted the first two principal components, which account for 55% of the total variance, for ancient Finns, 31 other ancient populations, contemporary Finns and Saami (Figs 4, S4). Interestingly, the southwestern Iron-Age sites Levänluhta and Hollola fall close to hunter-gatherer populations from the Baltic, Central and Southern Europe. In addition, Levänluhta is located in proximity to modern-day Saami. This suggests a hunter-gatherer type of maternal ancestry at these two sites. In contrast, the eastern IAM sites Hiitola and Tuukkala, the EMM sites and contemporary SW Finns clustered with European Neolithic, Bronze-Age and Iron-Age populations. The southwestern site Luistari, as well as the contemporary NE Finns, were located roughly between these two clusters, indicating a possible mix of maternal ancestry from hunter-gatherers and Neolithic farmers. However, as with the genetic distances presented in Section 2.6., the small sample sizes of ancient populations might cause the haplogroup frequencies to deviate from those of the original source population, subsequently affecting the PCA. To evaluate the possible bias, we performed random subsampling of contemporary SW and NE Finns (fifty iterations, for each N = 15) and carried out PCA with the same reference populations as for Fig. 4. Supplementary Fig. S5 demonstrates the amount of variation induced.
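
The subsampling step can be sketched as follows. This is an approximation under stated assumptions rather than the procedure used in the study: it draws haplogroup labels with replacement from pooled frequencies instead of subsampling individuals from the SW and NE datasets, it stops before the PCA itself, and the function name and seed are hypothetical. The U (24.1%) and H (33.2%) frequencies for contemporary Finns are those quoted earlier; all remaining haplogroups are lumped into ‘Others’.

```python
import random
from collections import Counter

# Contemporary Finnish frequencies of U and H as quoted in the text;
# 'Others' is the remainder (1 - 0.241 - 0.332).
CONTEMPORARY = {"U": 0.241, "H": 0.332, "Others": 0.427}


def subsample_frequencies(freqs, n=15, iterations=50, seed=1):
    """Draw `iterations` samples of size n and return their haplogroup frequencies."""
    rng = random.Random(seed)
    groups = list(freqs)
    weights = [freqs[group] for group in groups]
    replicates = []
    for _ in range(iterations):
        counts = Counter(rng.choices(groups, weights=weights, k=n))
        replicates.append({group: counts[group] / n for group in groups})
    return replicates


replicates = subsample_frequencies(CONTEMPORARY)
u_values = [replicate["U"] for replicate in replicates]
print(min(u_values), max(u_values))
```

The spread of the replicate frequencies gives a sense of how far a sample of 15 individuals can drift from its source population before any PCA is applied.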