First, we explored the genetic relationships of the European Romani with other worldwide populations using previously published genome-wide data sets (4,587 individuals and 51,328 shared SNPs; see the “Reference datasets” section in Supplemental Experimental Procedures). In a first classical multidimensional scaling analysis (MDS, also known as principal coordinates analysis) [] based on identity-by-state (IBS) distances, worldwide individuals tend to be distributed in the first two dimensions (as in []), with European Romani located with other west Eurasian populations (Figure 2A and Figure S1A available online). We then performed a second MDS focusing on west Eurasians using balanced sample sizes and geographic coverage (Figures 2B and S1B). The first dimension separates Indians from non-Romani European, Caucasus, and Middle East individuals, and places the European Romani, Central Asians, and Pakistanis in between. The second dimension places European Romani close to non-Romani Europeans, with several Romani individuals falling within the latter group, which could indicate recent admixture.
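The two steps named above (pairwise IBS distances, then classical MDS) can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts 0/1/2 with `None` for missing calls, and it uses plain power iteration instead of a full eigendecomposition so that it needs no external libraries. The function names and the toy genotypes are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def ibs_distance(g1, g2):
    """Return 1 minus the mean IBS sharing over jointly observed SNPs."""
    shared, n = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:  # skip SNPs with a missing call
            continue
        shared += 1.0 - abs(a - b) / 2.0  # 2, 1 or 0 shared alleles -> 1, 0.5, 0
        n += 1
    if n == 0:
        raise ValueError("no jointly observed SNPs")
    return 1.0 - shared / n


def ibs_distance_matrix(genotypes):
    """Symmetric matrix of pairwise IBS distances between individuals."""
    n = len(genotypes)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = ibs_distance(genotypes[i], genotypes[j])
    return d


def classical_mds(d, dims=2, iters=200, tol=1e-12):
    """Classical MDS: double-centre squared distances, keep top eigenvectors."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n)) / sum(x * x for x in v)
        if lam <= tol:  # no further positive eigenvalue found; stop early
            break
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy example (invented genotypes, three individuals, four SNPs)
genotypes = [[0, 0, 0, 0], [2, 2, 2, 2], [1, 1, 1, 1]]
coords = classical_mds(ibs_distance_matrix(genotypes))
```

For data sets of the size analyzed here, a dedicated tool or a numerical library's eigendecomposition would be the practical choice. Power iteration is used only to keep the sketch self-contained.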

(C) ADMIXTURE analysis at k = 2, k = 3, k = 5, k = 8, and k = 13 ancestral components with the same individuals as in (B). Each vertical bar represents an individual, and colors show the proportion of that individual's ancestry assigned to each of the k ancestral components. See Figures S1D and S1E for additional values of k and for the names of the populations included in each of the Indian states shown in the figure.

15. Saitou N., Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

16. Weir B.S., Cockerham C.C. Estimating F-statistics for the analysis of population structure.

17. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals.

Next, we constructed a neighbor-joining tree [15] based on FST distances [16], using sub-Saharan Africans (Yoruba) as an out-group. All European Romani groups except the Welsh Romani appear on the same branch, with no non-Romani European groups (Figure S1C), which suggests a common origin of the European Romani. Welsh Romani appear to share ancestry with non-Romani Europeans and show evidence of strong genetic drift. However, putative recent admixture with other populations could alter the position of the European Romani relative to the other populations in the tree. Therefore, we applied the ADMIXTURE clustering method [17] to estimate the membership of each individual in a range of k hypothetical ancestral populations (k = 2 to k = 15; see Figures 2C, S1D, and S1E). At k = 2, a longitudinal gradient in the amount of ancestry from each component is observed from India to Europe (|Spearman's rho| = 0.935, p < 10, after exclusion of European Romani; Figure S1F). European Romani show a lower frequency of the main ancestral component in Indians (dark blue) than populations from Central Asia and Pakistan (28% versus 47%, p < 10, Mann-Whitney test), and a higher frequency than Caucasus, Middle East, and non-Romani European populations (28% versus 9%, p < 10, Mann-Whitney test). This result suggests that the origin of the European Romani could be located in Central or South Asia (Pakistan and India). Notably, the main ancestry component present in Middle Easterners at k = 3 (Figure 2C, in dark green) shows the lowest average in the European Romani, followed by the Indian populations (3.6% and 6.3%, respectively). This result may indicate a low genetic contribution to the European Romani from the Near or Middle East. At k = 5, an ancestral component present mainly in European Romani emerges (Figure 2C, in red).
At k = 8 (a well-supported value of k; see Figure S1G), this ancestry component (red) is almost absent from all non-Romani individuals (on average 1.52%; 95% confidence interval = 0%–5.5%). At this k, almost 25% of all European Romani show considerable amounts (above 30%) of the component mainly present in non-Romani Europeans (Figure 2C, in gray). Further population substructure within the European Romani is observed at k = 13. The new component (Figure 2C, in black) is mainly present in Croatian Romani (average ∼76%), less frequent in the remaining Balkan Romani (average 23% across Bulgarian, Serbian, and Greek Romani), and rare in Romani groups from northern and western Europe (e.g., 6.7% in Baltic and Iberian Romani).
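The rank-based statistics quoted for the k = 2 component (Spearman's rho for the longitudinal gradient and the Mann-Whitney test for group comparisons) can be sketched as follows. This is an illustration only, not the authors' pipeline: the function names are hypothetical, the p value uses a normal approximation without tie or continuity correction (an assumption made for brevity, so results may differ slightly from statistical packages), and the toy numbers are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p from a normal approximation)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p


# Invented toy data: population longitude (degrees east) versus the mean
# proportion of one ancestry component in that population.
longitude = [5.0, 20.0, 45.0, 70.0, 80.0]
component = [0.05, 0.08, 0.15, 0.45, 0.80]
rho = spearman_rho(longitude, component)

# Invented per-individual proportions for two groups being compared.
group_a = [0.25, 0.30, 0.28, 0.22]
group_b = [0.45, 0.50, 0.47, 0.49]
u_stat, p_value = mann_whitney_u(group_a, group_b)
```

Taking the absolute value of `rho` corresponds to the |Spearman's rho| reported in the text, where only the strength of the gradient matters and not its sign.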