Abstract

The Etruscan culture is documented in Etruria, Central Italy, from the 8th to the 1st century BC. For more than 2,000 years there has been disagreement on whether the Etruscans’ biological origins were local or in Anatolia. Genetic affinities with both Tuscan and Anatolian populations have been reported, but so far all attempts to fit the Etruscans and modern populations into the same genealogy have failed. We extracted and typed the hypervariable region of mitochondrial DNA of 14 individuals buried in two Etruscan necropoleis, analyzing them along with other Etruscan and Medieval samples and 4,910 contemporary individuals from the Mediterranean basin. Comparing ancient (30 Etruscans, 27 Medieval individuals) and modern DNA sequences (370 Tuscans) with the results of millions of computer simulations, we show that the Etruscans can be considered ancestral, with a high degree of confidence, to the current inhabitants of Casentino and Volterra, but not to the general contemporary population of the former Etruscan homeland. By further considering two Anatolian samples (35 and 123 individuals), we estimate that the genetic links between Tuscany and Anatolia date back at least 5,000 years, strongly suggesting that the Etruscan culture developed locally, and not as an immediate consequence of immigration from the Eastern Mediterranean shores.

Citation: Ghirotto S, Tassi F, Fumagalli E, Colonna V, Sandionigi A, Lari M, et al. (2013) Origins and Evolution of the Etruscans’ mtDNA. PLoS ONE 8(2): e55519. https://doi.org/10.1371/journal.pone.0055519
Editor: John Hawks, University of Wisconsin, United States of America
Received: July 20, 2012; Accepted: December 24, 2012; Published: February 6, 2013
Copyright: © 2013 Ghirotto et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Study supported by the Italian Ministry for Universities Funds PRIN 2008 to GB and DC and FIRB 2008 (RBFR08U07M) to ER, DC and GB, by the “Futuro in ricerca” grant RBFR08U07M to ML, ER, GC, GD and DC, by the Fondazione Cassa di Risparmio di Ferrara and by Associazione Archeologica Odysseus Casale di Pari. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Introduction

The Etruscan culture is documented in Central Italy (current Tuscany and Northern Latium, formerly known as Etruria) between the 8th and the 1st century BC. Questions about the Etruscans’ origins and fate have been debated for millennia. Herodotus and Livy regarded them as immigrants, from Lydia (i.e. Western Anatolia) and from north of the Alps, respectively, whereas for Dionysius of Halicarnassus they were an autochthonous population [1]. Previous DNA studies, far from settling the issue, have raised further questions. The Etruscans’ mitochondrial DNAs (mtDNAs) appear similar, but seldom identical, to those currently observed in Tuscany [2], [3]. Assuming reasonable effects of genetic drift and mutation, these levels of resemblance proved incompatible with the notion that modern Tuscans are descended from Etruscan ancestors [4], [5]. Explanations for this result include the (extreme) possibility that the Etruscans became extinct, but also that their modern descendants are few and geographically dispersed, or that the ancient sample studied represents a small social elite rather than the entire population [4]. As for the Etruscans’ origins, ancient DNA is of little use, because the pre-Etruscan inhabitants of Central Italy, of the Villanovan culture, cremated their dead [1], and hence their genetic features are unknown. DNAs from modern humans and cattle in Tuscany show affinities with Near Eastern DNAs, which was interpreted as supporting Herodotus’ narrative [2], [6], but these studies assumed that modern Tuscans are descended from Etruscan ancestors, in contrast with the ancient DNA evidence [5]. The claim that systematic errors in the Etruscan DNA sequences led to flawed genealogical inference [2], [7] is not supported by a careful reanalysis of the data [8]. What previous studies overlooked is the potential genetic effect of population subdivision.
If most of the Etruscans’ descendants lived in isolated communities over the last 2,000 years, their DNAs may still persist in some localities, but will escape detection unless they are sought at the appropriate (i.e., smaller) geographical scale. Indeed, previous work in another area of Italy [9] showed that modern populations separated by only tens of kilometers can differ sharply in their genealogical relationships with ancient populations. To investigate the biological relationships between contemporary and ancient populations in greater geographical detail, we thus sampled multiple burials in classical Etruria. MtDNA was extracted from bones, amplified and sequenced by a combination of classical methods and Next Generation Sequencing. After adding these sequences to the other Etruscan sequences produced in our lab [3], we compared them, through methods of Approximate Bayesian Computation, with those of relevant ancient and modern human populations. These include Medieval Tuscans (n = 27) [5], contemporary Tuscans from three sites in historical Etruria (Casentino, n = 122; Murlo, n = 86; Volterra, n = 114) [2] and from Florence [10] (n = 48) (Figure 1). The sample from Florence serves as a control, since no special relationship is expected between the DNAs of the Etruscans and those of the inhabitants of a large city, after millennia of immigration.

Figure 1. Geographic location of the samples considered in the ABC analysis. Triangles, contemporary Tuscans (n = 370); circles, Medieval Tuscans: 1. Massa Carrara (n = 3); 2. Florence (n = 10); 3. Pisa (n = 6); 4. Livorno (n = 3); 5. Siena (n = 4); 6. Grosseto (n = 1); squares, Etruscans: 1. Castelfranco di Sotto (n = 1); 2. Volterra (n = 3); 3. Casenovole (n = 10); 4. Castelluccio di Pienza (n = 1); 5. Magliano/Marsiliana (n = 6); 6. Tarquinia (n = 9).
https://doi.org/10.1371/journal.pone.0055519.g001

We thus tried to address two questions: (1) whether an analysis at a small geographical scale can provide evidence of genealogical continuity between the Etruscans and some current inhabitants of historical Etruria, and (2) whether the observed degree of genetic resemblance between modern inhabitants of Tuscany and Western Anatolia has anything to do with the Etruscans’ origins. To answer them, for each modern population we designed and compared three demographic models differing in their genealogical relationships with the ancient samples (see Materials and Methods for details). We identified the model best fitting each set of observed data, and then estimated, under an isolation-with-migration (IM) framework, the separation time between Tuscan and Anatolian populations [11], evaluating whether the estimated time can be reconciled with an Etruscan origin in Anatolia and a subsequent migration to Italy around the 8th century BC.

Discussion

MtDNA data give much stronger support to a model of genetic continuity between the Etruscans and some Tuscans, under plausible population sizes and mutation rates, than to any other model tested. However, this clear picture emerges only when modern Tuscan communities are considered separately, highlighting the importance of population structure even at a small geographical scale. In a previous analysis of smaller samples we found no evidence of genealogical continuity since Etruscan times [5]. In this study, the larger sample sizes allowed us to investigate separately the relationships of each modern population with the Etruscans. A model of genealogical continuity across 2,500 years thus proved to best fit the observed data for Volterra, and especially Casentino, but not for another community living in an area also rich in Etruscan archaeological remains (Murlo), nor (as expected) for the bulk of the current Tuscan population, here represented by a forensic sample of the inhabitants of Florence. Therefore, the present analysis indicates that the Etruscan genetic heritage is still present, but only in some isolates, whereas current Tuscans are not generally descended from Etruscan ancestors along the female lines. It also shows that there is no necessary correlation between the presence of archaeological remains and the biological roots of the inhabitants of the areas where these remains occur. Because Medieval Tuscans appear directly descended from Etruscan ancestors, one can reasonably speculate that the genetic makeup of the Murlo and Florence populations was modified by immigration in the last five centuries. As for the second question, the IM analysis shows that there may indeed have been a genealogical link between modern Tuscans and the inhabitants of what Herodotus considered the Etruscans’ homeland, Western Anatolia.
However, even under the unrealistic assumption of complete reciprocal isolation for millennia, the likely separation of the Tuscan and Anatolian gene pools must be placed long before the onset of the Etruscan culture, at least in Neolithic times; if isolation was incomplete, the estimated separation must be placed even further back in time. Consistent with this view, Etruscan and Neolithic mtDNAs are close to each other in the two-dimensional plot of Figure S4C; however, a formal test would be needed to draw firm conclusions from the simple observation of a genetic similarity. Separation times were very similar whether estimated from a sample from Western Anatolia or from an expanded sample including individuals from much of Anatolia, so the choice of the Anatolian population does not seem to affect the results of this analysis. A general problem in ancient human DNA studies is the quality of the data; errors resulting from contamination, or from poor preservation of DNA in the specimens, are common.
However, there are several reasons to be confident that the Etruscan sequences obtained in this study are authentic: (i) bones were recovered from burials according to the most stringent existing procedures and sent directly to the ancient DNA laboratory without manipulation; (ii) the mtDNA HVR-I motifs of the people who came in contact with the bones at any stage of the analysis do not match those obtained from the ancient samples (Table S1); (iii) the ancient samples were typed following the most stringent standard criteria for ancient DNA authentication; (iv) we used two different sequencing procedures (classical and high-throughput), and the results obtained from different extractions and sequencing methodologies are concordant, except in homopolymeric stretches ≥5 bp, which are problematic for 454 pyrosequencing; in these regions, consensus sequences were determined from the standard sequencing procedure only; (v) the sequences make phylogenetic sense, i.e. they do not appear to be combinations of different sequences, which would suggest contamination by exogenous DNA. Testing complex evolutionary models with such ancient DNA data has become possible with the development of ABC and other recent Bayesian inference methods [24], [25]. These models, albeit more elaborate than those that can be tested otherwise, are still necessarily schematic representations of the processes affecting populations over millennia. Many phenomena that could not be incorporated in the models, such as immigration from other sources or additional demographic fluctuations, most likely occurred and left a mark on the patterns of genetic diversity. In addition, specific phenomena may have involved mostly or exclusively males, resulting in genetic changes that are not recorded in mtDNA variation.
Still, if we rule out the unlikely hypothesis that the population history of the Etruscans and their descendants was radically different for males and females, the picture emerging from this study is rather clear. The additional tests we ran (Type I error, Table 3) show that, at these sample sizes, we had a high probability of identifying the correct evolutionary model. As also suggested by the analysis of skull diversity [26], contacts between people from the Eastern Mediterranean shores and Central Italy likely date back to a remote stage of prehistory, possibly to the spread of farmers from the Near East during the Neolithic period [27], [28], but not necessarily so (we only estimated a minimum separation time between gene pools). At any rate, these contacts occurred much earlier than, and hence appear unrelated to, the onset of the Etruscan culture (Figure 5). We conclude that no available genetic evidence suggests an Etruscan origin outside Italy. While their culture disappeared from the records, the Etruscans’ mtDNAs did not; traces of this heritage are still recognizable. However, most current inhabitants of the ancient Etruscan homeland appear descended from different ancestors along the female lines, as clearly shown by the analysis of the urban (Florence) sample. Genetic continuity since Etruscan times is still evident only in relatively isolated localities, such as Casentino and Volterra.

Materials and Methods

DNA Extraction and Characterization of the Etruscan Samples

We obtained 18 bone samples (each represented by two fragments of the right tibia) from a multiple burial at Casenovole, Southern Tuscany, near Grosseto. Their approximate age, based on archaeological evidence, is the 3rd century BC. Permission to genetically characterize these ancient samples was granted by the Soprintendenza Archeologica per la Toscana (Archaeological Authority for Tuscany), Siena. The bone fragments were freshly excavated and collected according to the most stringent ancient DNA criteria [29] by one of us (EP) and can safely be regarded as belonging to different individuals (minimum number of individuals estimated in the burial = 21). These fragments were processed in the ancient DNA facilities at the University of Florence using standard ancient DNA procedures [30]. After a first round of DNA extraction, the samples were subjected to multiple PCRs, cloning and cycle sequencing. In a second step, DNA was independently re-extracted from the samples that had given positive results in the first analysis. This time, after multiple PCRs, the amplicons were not cloned but ligated to the appropriate adaptor sequences and directly sequenced with 454/Roche technology. The 454/Roche Low Molecular Weight DNA (LMW DNA) protocol was applied, with a final modification added to increase the recovery of a single-stranded library [31]. Libraries were quantified by real-time quantitative PCR (qPCR) using KAPA Library Quant Kits (KAPA Biosystems, MA, USA). Sample libraries were independently amplified on beads by emulsion PCR (emPCR); the beads were then enriched, counted and loaded onto a 454/Roche PicoTiterPlate (PTP) divided into 16 regions. Sequencing was performed according to the 454/Roche protocol, and the resulting reads were filtered and mapped against the Cambridge reference sequence [32].
For each sample and amplicon, a masking procedure removed primer sequences from the reads, and a multiple alignment was obtained with the 454/Roche Amplicon Variant Analysis (AVA) software. A consensus was generated by custom scripting and then mapped onto the mitochondrial DNA reference sequence (GenBank accession number: J01415). Complete mtDNA HVR-I sequences could be retrieved for all samples. At each site, the most frequent nucleotide was observed in 97.7–98.8% of the reads, depending on the sample. Unmapped reads were then analyzed and found to consist mostly of primer dimers. Final consensus sequences of the 10 samples were determined by comparing the results of both the standard procedure (575 clones) and Next Generation Sequencing (127,837 reads). Four additional samples from Tarquinia, sequenced in 2004 but never published, brought the total of Etruscan samples typed for this study to 14.

Ancient and Modern mtDNA Diversity

In all statistical analyses, we replaced the nucleotides at positions 16180–16188 and 16190–16193 with those of the CRS, because these positions contain two stretches of adenines and cytosines known to cause apparent length polymorphism in the mtDNA sequence [33], [34]. Summary statistics were estimated with Arlequin ver. 3.5.1 [35]. The Fst values between the populations in the EUR dataset and the Etruscans were interpolated on a map by Kriging, using the Spatial Analyst extension in ArcGIS 10 (ESRI; Redlands, CA, USA). Genetic distances between the Etruscans and each population in the ANC, TUS and EUR datasets were visualized by Multidimensional Scaling (MDS), using the cmdscale function in the R environment [36].

Approximate Bayesian Computation

Inferring demographic and evolutionary processes from genetic data requires testing models that are often too complex for their likelihoods to be derived.
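Although the likelihood of such models is hard to write down, simulating data under them is comparatively easy, and this is what simulation-based inference exploits. As a deliberately simplified illustration, the sketch below simulates one coalescent genealogy for a single constant-size population with all samples taken at the same time (unlike the serially sampled, multi-population models simulated with SERIALSIMCOAL in this study), places infinite-sites mutations on it, and returns the number of segregating sites. The function names, the scaled mutation parameter `theta` and the choice of statistic are hypothetical; this is not the authors' code.

```python
import numpy as np


def simulate_segregating_sites(theta, n, rng):
    """Toy coalescent with infinite-sites mutation for one constant-size population.

    Time is in coalescent units: while k lineages remain, the waiting time to
    the next coalescence is exponential with rate k(k-1)/2. Mutations fall on
    the total branch length as a Poisson process with rate theta/2, so the
    expected number of segregating sites is theta * (1 + 1/2 + ... + 1/(n-1)).
    """
    k = np.arange(2, n + 1)                              # lineages in each epoch
    waits = rng.exponential(scale=2.0 / (k * (k - 1)))   # epoch durations
    total_length = float(np.sum(k * waits))              # k branches per epoch
    return int(rng.poisson(theta / 2.0 * total_length))


def expected_segregating_sites(theta, n):
    """Watterson's expectation, E[S] = theta * sum_{i=1}^{n-1} 1/i."""
    return theta * float(np.sum(1.0 / np.arange(1, n)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    draws = [simulate_segregating_sites(5.0, 30, rng) for _ in range(10_000)]
    print(np.mean(draws), expected_segregating_sites(5.0, 30))
```

Comparing the simulated mean with Watterson's expectation, as in the `__main__` block, is a quick check that a simulator of this kind is wired correctly.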
Approximate Bayesian Computation (ABC) [37] offers a valid alternative. Summary statistics estimated from the data are compared with those generated by simulation, and the posterior distributions of the models’ parameters can be approximated by simulating large numbers of gene genealogies. We generated gene genealogies in which individuals are sampled at different moments in time using the Bayesian version of SERIALSIMCOAL [38]. At every iteration, the parameters of the model (population sizes, mutation rates, timing of demographic processes) were treated as random variables, and their values were drawn from broad prior distributions; the ages and sizes of the simulated samples were equal to those of the observed samples. We then calculated the Euclidean distance between observed and simulated statistics and ordered the simulations by this distance. In total, 24 million simulations were run (1 million for each combination of 3 models, 4 modern populations in the TUS dataset and two demographic scenarios, respectively including or not including a recent bottleneck). All procedures were developed in the R environment [36] using scripts from [39]. We selected the summary statistics via PCA, keeping for the ABC analysis those statistics that proved more strongly correlated with the variance of the parameters (Table S2).

Demographic Models and Priors

The three demographic models tested differ in the relationships between modern and ancient samples (Figure 4); under each model, each population in the TUS dataset was independently compared with the Etruscan and Medieval populations. All prior distributions were uniform and wide. The effective modern population size ranged from 100 to 200,000; for the time of the onset of the expansion (under Model 1) and the separation time (under Models 2 and 3), the priors ranged from 101 (one generation before the Etruscans) to 1,500 generations ago.
Priors for the mutation rate encompassed both the low value estimated from phylogenies [40] and the high value estimated from pedigrees [41], ranging from 0.0003 to 0.0075 mutations per generation for HVR-I. The Medieval and Etruscan effective population sizes were drawn from a prior distribution spanning 100 to 50,000, as suggested in Guimaraes et al. [5]. Ancestral population sizes varied from 5 to 6,000 individuals. The entire procedure was repeated under a demographic scenario including a population bottleneck corresponding to the 14th-century plague epidemics, in which an estimated one-third of the population was lost [42].

Model Selection and Parameter Estimation

The posterior probabilities (PPs) of the 24 combinations of models (3), modern populations (4) and demographic scenarios (2) were calculated either (i) by a simple rejection procedure (AR) [43], retaining the 100 simulations with the shortest distance between observed and simulated statistics [44]; or (ii) by a weighted multinomial logistic regression (LR) [44], retaining the 50,000 simulations with the shortest distance between observed and simulated statistics. In both cases, we normalized the PPs so that they sum to 1 across all models being compared. The parameters of the best-fitting model were estimated from the 2,000 simulations closest to the observed dataset, after a logtan transformation of the parameters [45], following Beaumont [37].

Additional Tests: Type I Error and Posterior Predictive Tests

We estimated the probability of rejecting a true null hypothesis by evaluating the Type I error, i.e. the proportion of 1,000 pseudo-datasets generated under each model that the ABC analysis did not assign to the correct model. In addition, to test whether the data can actually be reproduced under a specific demographic model, we carried out a posterior predictive test [9], [25].
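The simple rejection (AR) model choice, the Type I error estimate and the logtan transformation described above can be sketched as follows. Under simple rejection, a model's posterior probability is the share of retained simulations produced by that model, so the proportions already sum to 1. For logtan, we assume the commonly used form y = ln(tan(π/2 · (x − a)/(b − a))) for a parameter bounded in (a, b); this should be checked against [45]. The weighted multinomial logistic regression (LR) variant is not sketched, and all function names are hypothetical.

```python
import numpy as np


def rejection_model_posteriors(model_ids, distances, keep, n_models):
    """Simple rejection (AR) model choice.

    model_ids[i] is the model (0 .. n_models-1) that generated simulation i and
    distances[i] its Euclidean distance to the observed statistics. The posterior
    probability of each model is its share among the `keep` closest simulations.
    """
    closest = np.argsort(np.asarray(distances))[:keep]
    counts = np.bincount(np.asarray(model_ids)[closest], minlength=n_models)
    return counts / counts.sum()  # normalized: sums to 1 over the compared models


def type_i_error(true_model, chosen_models):
    """Share of pseudo-observed datasets simulated under `true_model`
    that the analysis assigns to some other model."""
    return float(np.mean(np.asarray(chosen_models) != true_model))


def logtan(x, lo, hi):
    """Map a parameter bounded in (lo, hi) onto the real line (assumed form)."""
    u = (np.asarray(x, dtype=float) - lo) / (hi - lo)
    return np.log(np.tan(u * np.pi / 2))


def inverse_logtan(y, lo, hi):
    """Back-transform values (e.g. after regression adjustment) into (lo, hi)."""
    return lo + (hi - lo) * (2 / np.pi) * np.arctan(np.exp(y))


if __name__ == "__main__":
    # Hypothetical toy input: 3 models, 6 simulations, keep the 4 closest.
    ids = [0, 1, 2, 0, 0, 1]
    dist = [0.3, 0.9, 0.2, 0.1, 0.4, 0.8]
    print(rejection_model_posteriors(ids, dist, keep=4, n_models=3))
```

The transformation keeps regression-adjusted parameter values inside their prior bounds once they are mapped back with `inverse_logtan`.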
For the posterior predictive test, we simulated 10,000 datasets under the model with the highest probability, using the estimated posterior parameter distribution, and calculated a posterior predictive P-value for each statistic; these probabilities were then combined into a global P-value, taking their non-independence into account [46].

The Isolation with Migration (IM) Model

We estimated the likely separation time between the Tuscan and Anatolian gene pools by Isolation with Migration (IM), a method generating posterior probabilities for complex models in which populations need not be at equilibrium [19]. Seven parameters were estimated from the data, namely the sizes of the ancestral and daughter populations ($N_A$, $N_1$, $N_2$), the rates of gene flow between daughter populations ($m_1$, $m_2$), the time since the split ($t$), and the proportion of the members of the ancestral population giving rise to the first daughter population ($s$) [47]. Because any degree of genetic exchange increases the estimate of $t$, after some preliminary tests we set $m_1 = m_2 = 0$. Most tests were run fixing the mutation rate at the value estimated in the ABC analysis (0.003 mutational events per locus per generation), but we repeated the whole IM analysis with both lower and higher values (0.0014 and 0.0060 mutational events per locus per generation, respectively; [13], [23]) under a Hasegawa-Kishino-Yano (HKY; [48]) mutational model with inheritance scalar 0.25, as recommended for mtDNA data. For each mutation rate tested, we ran several analyses starting from different random seeds to assess the consistency of the results; moreover, to improve the exploration of the parameter space, and thereby convergence, we coupled the Markov chains, running 5 chains simultaneously per run.
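Two of the computations above can be sketched briefly. The first function returns a posterior predictive P-value per summary statistic; the two-sided form is an assumption, and the way these values were combined into a global P-value [46] is not reproduced. The second converts a split time into years, assuming (as is usual for IM-type programs) that $t$ is reported scaled by the per-locus mutation rate, $t = T\mu$ with $T$ in generations. The 25-year generation time is an assumption, consistent with the priors treating 100 generations as the roughly 2,500 years since Etruscan times. Function names and the scaled time in the usage block are hypothetical.

```python
import numpy as np


def posterior_predictive_pvalues(observed, simulated):
    """Two-sided posterior predictive P-value for each summary statistic.

    observed:  shape (n_stats,), statistics of the real data.
    simulated: shape (n_sims, n_stats), statistics of datasets simulated under
               the best model with parameters drawn from its posterior.
    """
    obs = np.asarray(observed, dtype=float)
    sims = np.asarray(simulated, dtype=float)
    lower = np.mean(sims <= obs, axis=0)  # share of simulations at or below observed
    upper = np.mean(sims >= obs, axis=0)  # share at or above observed
    return np.minimum(1.0, 2.0 * np.minimum(lower, upper))


def split_time_in_years(t_scaled, mu_per_generation, years_per_generation=25.0):
    """Convert a mutation-scaled split time (t = T * mu) into years.

    Assumes T is in generations and mu is the per-locus mutation rate per
    generation; the 25-year generation time is an assumption.
    """
    generations = t_scaled / mu_per_generation
    return generations * years_per_generation


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Hypothetical: 10,000 simulated datasets with 3 statistics each.
    sims = rng.normal(size=(10_000, 3))
    print(posterior_predictive_pvalues([0.1, -1.5, 2.8], sims))
    # One hypothetical scaled time under the three mutation rates used above.
    for mu in (0.0014, 0.003, 0.006):
        print(mu, split_time_in_years(0.5, mu))
```

The loop makes the dependence on the mutation rate explicit: for the same scaled time, a lower rate implies an older separation, which is why the analysis was repeated with lower and higher values.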

Acknowledgments Computational support for the data analysis has been provided by CINECA (Bologna) and CASPUR (Roma) HPC facilities. We thank Carlo Previderé for sharing with us unpublished data, Sibelle Vilaça for her help with the graphics, Alessandro Achilli, Andrea Benazzo, Mathias Currat, Martin Richards and especially Stefano Mona for discussion and suggestions.

Author Contributions Conceived and designed the experiments: SG DC GB. Performed the experiments: SG FT EF AS ML SV EP GC ER GDB. Analyzed the data: SG FT EF VC. Wrote the paper: SG DC GB.