An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations.

All humans can trace their ancestry back to Africa (Cann et al. 1987), where the ancestors of anatomically modern humans first diverged from primates (Patterson et al. 2006), and then from archaic humans (Prüfer et al. 2014). Humans began leaving Africa through a number of coastal routes, and estimates suggest that these “out-of-Africa” migrations reached the Arabian Peninsula as early as 125,000 yr ago (Armitage et al. 2011) and as late as 60,000 yr ago (Henn et al. 2012). After entering the Arabian Peninsula, human ancestors entered South Asia and spread to Australia (Rasmussen et al. 2011), Europe, and eventually, the Americas. The individuals in these migrations were the most direct ancestors of ancient non-African peoples, and they established the contemporary non-African populations recognized today (Cavalli-Sforza and Feldman 2003).

The relationship between contemporary Arab populations and these ancient human migrations is an open question (Lazaridis et al. 2014; Shriner et al. 2014). Given that the Arabian Peninsula was an initial site of egress from Africa, one hypothesis is that the original out-of-Africa migrations established ancient populations on the peninsula that were direct ancestors of contemporary Arab populations (Lazaridis et al. 2014). These people would therefore be direct descendants of the earliest split in the lineages that established Eurasian and other contemporary non-African populations (Armitage et al. 2011; Rasmussen et al. 2011; Henn et al. 2012; Lazaridis et al. 2014; Shriner et al. 2014). If this hypothesis is correct, we would expect that there are contemporary, indigenous Arabs who are the most distant relatives of other Eurasians. To assess this hypothesis, we carried out deep-coverage genome sequencing of 104 unrelated natives of the Arabian Peninsula who are citizens of the nation of Qatar (Supplemental Fig. 1), including 56 of indigenous Bedouin ancestry who are the best representatives of autochthonous Arabs, and compared these genomes to contemporary genomes of Africa, Asia, Europe, and the Americas (The 1000 Genomes Project Consortium 2012; Lazaridis et al. 2014).

Results

Y Chromosome and mitochondrial DNA haplogroups

We next analyzed the Y Chromosome (Chr Y) and mitochondrial DNA (mtDNA) to assess the degree to which the Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari ancestry groups represent distinct subpopulations (Fig. 2). The Chr Y haplogroups showed almost no overlap between the Q1 (Bedouin) Qataris and Q2 (Persian-South Asian) Qataris, for which an Analysis of Molecular Variance (AMOVA) was highly significant (P < 0.018) (Supplemental Table VII). The Arab haplogroup J1 was the dominant haplogroup in the Q1 (Bedouin) Qataris, but this haplogroup was not represented at all among the Q2 (Persian-South Asian) Qataris (Fig. 2A). This confirmed that these are genetically well-defined subpopulations that are relatively isolated from one another (Omberg et al. 2012). There was also a strong partitioning of the Chr Y haplogroups when considering the Q3 (African) Qataris, both when considering Q1 (Bedouin) versus Q3 (African) (AMOVA P < 1 × 10⁻⁵) and Q2 (Persian-South Asian) versus Q3 (African) (AMOVA P < 0.028). The Q3 (African) had largely African haplogroups, a result consistent with the known recent African admixture of this subpopulation (Omberg et al. 2012).

The mtDNA haplogroups were less partitioned among the Qataris, although they still showed significant partitioning between each pair of subpopulations (AMOVA Q1 versus Q2 P < 0.035, Q1 versus Q3 P < 1 × 10⁻⁵, Q2 versus Q3 P < 0.017) and among all three considered simultaneously (AMOVA P < 1 × 10⁻⁵) (Supplemental Table VII). The mtDNA haplogroups also included more worldwide geographic diversity overall, indicating a different male versus female pattern of intermarriage among these subpopulations (Sandridge et al. 2010). Together the Chr Y and mtDNA haplogroups indicate that the Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) ancestry groups represent genetic subpopulations that not only reflect known migration history (Hunter-Zinck et al. 2010; Omberg et al. 2012) but that also represent units defined by a patrilocal society with strong historical barriers to intermarriage (Esposito 2001; Cavalli-Sforza and Feldman 2003), in which gene flow has been dominated by female movement (i.e., admixture occurring through females marrying into the relatively isolated subpopulations), as well as female influxes from other geographic areas.

Figure 2. Y Chromosome (Chr Y) and mitochondrial DNA (mtDNA) haplogroup assignments. The Chr Y and mtDNA haplogroups were determined for Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African). (A) Pie charts of the haplogroup frequencies for Chr Y. (B) Pie charts of the haplogroup frequencies for mtDNA.
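
To make the idea of testing haplogroup partitioning between two subpopulations concrete, the following is a minimal sketch of a permutation test on haplogroup-frequency differences. It is a simplified stand-in and not the AMOVA procedure used for the reported P-values; the function names, the squared-difference statistic, and the toy haplogroup labels are all assumptions made for illustration.

```python
import random
from collections import Counter


def haplogroup_freqs(labels):
    """Return the relative frequency of each haplogroup label."""
    counts = Counter(labels)
    total = len(labels)
    return {hap: n / total for hap, n in counts.items()}


def freq_distance(group_a, group_b):
    """Sum of squared haplogroup-frequency differences between two groups."""
    fa, fb = haplogroup_freqs(group_a), haplogroup_freqs(group_b)
    return sum((fa.get(h, 0.0) - fb.get(h, 0.0)) ** 2 for h in set(fa) | set(fb))


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Share of label shuffles whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = freq_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    cut = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if freq_distance(pooled[:cut], pooled[cut:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy labels only (hypothetical, not the study data).
toy_q1 = ["J1"] * 8 + ["E1b"] * 2
toy_q2 = ["R1a"] * 5 + ["L"] * 3 + ["J2"] * 2
print(permutation_p_value(toy_q1, toy_q2))
```

A small P-value here only says that the two label sets are unlikely to come from one pooled population; AMOVA additionally accounts for molecular distances between haplotypes, which this sketch ignores.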

X-linked and autosomal diversity

To further analyze the relative male and female contributions to the genetics of the Qatari Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) subpopulations, we analyzed genome-wide ratios of X-linked and autosomal (X/A) diversity and X/A diversity ratios for genome intervals >0.18 cM from genes (Supplemental Table VIII; Supplemental Fig. 6). For both of these ratios, the Q1 (Bedouin) and Q2 (Persian-South Asian) were lower than for African populations but were higher than for Europeans and Asians. This points to a higher effective population size of females in the Q1 (Bedouin) and Q2 (Persian-South Asian), possibly a consequence of the out-of-Africa migrations, which were believed to be biased toward migration of males over females (Gottipati et al. 2011; Arbiza et al. 2014). The Q3 (African) Qataris had X/A diversity ratios that were higher, even when compared to African populations. This may be driven by a smaller male effective population size, a possible consequence of a polygamous culture and the ancestry of the Q3 (African) subpopulation that was a result of the historical slave trade into the region from Africa (Omberg et al. 2012).

We also analyzed the relative ratios of X-linked and autosomal (X/A) diversity in nongenic regions of the female Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) genomes compared to females in African populations of the 1000 Genomes Project (Supplemental Table IX). The relative X/A ratios of both the Q1 (Bedouin) and Q2 (Persian-South Asian) to African populations were slightly higher than when comparing European to African populations (Gottipati et al. 2011; Arbiza et al. 2014). This could indicate a slightly less extreme set of bottleneck events encountered since the out-of-Africa migrations by the direct ancestors of the Q1 (Bedouin) and Q2 (Persian-South Asian) compared to the bottlenecks encountered by the direct ancestors of Europeans. The relative X/A diversity ratios of Q3 (African) to African populations were closer to one, consistent with the known African admixture of this subpopulation (Omberg et al. 2012).
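
As an illustration of the quantities discussed in this section, the sketch below computes per-site nucleotide diversity from allele frequencies, an X/A diversity ratio, and a relative X/A ratio against a reference population. The function names and input layout are assumptions for illustration; the study's actual pipeline (including any filtering or normalization steps) is not reproduced here.

```python
def nucleotide_diversity(alt_freqs, n_chrom, n_sites):
    """Per-site nucleotide diversity (pi).

    alt_freqs: alternate-allele frequencies at the variant sites.
    n_chrom:   number of sampled chromosomes (for the n/(n-1) correction).
    n_sites:   total callable sites, including invariant ones.
    """
    correction = n_chrom / (n_chrom - 1)
    heterozygosity = sum(2.0 * p * (1.0 - p) for p in alt_freqs)
    return correction * heterozygosity / n_sites


def xa_ratio(pi_x, pi_autosomes):
    """Ratio of X-linked to autosomal diversity."""
    return pi_x / pi_autosomes


def relative_xa(xa_population, xa_reference):
    """X/A ratio of a population relative to a reference (e.g., an African panel)."""
    return xa_population / xa_reference
```

For orientation, a neutral population with equal numbers of breeding males and females is expected to have an X/A ratio near 0.75, because there are three X Chromosomes for every four autosomes; departures from that value are what motivate the sex-bias interpretations above.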

Admixture analysis

The signal of an ancient bottleneck in the Q1 (Bedouin) is not unexpected given previous analyses of genomic admixture that found <1% African ancestry in this subpopulation (Omberg et al. 2012) and studies of worldwide population structure, which have inferred that the Q1 (Bedouin) genomes have the greatest proportion of Arab genetic ancestry, even when compared to Bedouins from outside Qatar and to Arabs in surrounding countries, including Yemen and Saudi Arabia (Hodgson et al. 2014; Shriner et al. 2014). To confirm a similarly minute amount of African admixture for the Q1 (Bedouin) in our sample, we applied three methodologies: (1) an ADMIXTURE (Alexander et al. 2009) analysis of the genome-wide ancestry proportions in the 104 Qataris, the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012), and Human Origins samples (Lazaridis et al. 2014); (2) an ALDER (Loh et al. 2013) analysis of the proportion and timing of African ancestry in these same populations; and (3) a SupportMix (Omberg et al. 2012) analysis of the population assignments of local genomic segments of the 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qatari genomes.

The ADMIXTURE analysis identified K = 12 ancestral populations as having the lowest cross-validation error (Supplemental Fig. 7A). At this level of resolution, the Q1 (Bedouin) had a high average (84%) proportion of ancestry that was also present in the Human Origins Bedouin B population at a high average proportion (93%) (Supplemental Fig. 7B,C); this same ancestry was also shared with Saudis, and at lower levels among other Middle Eastern populations. This ancestry therefore appears to be the signal of an indigenous Arab ancestral population. The Bedouin A population also shared this ancestry but at a lower average proportion (45%) and appeared to be more admixed overall. The Q2 (Persian-South Asian) shared a large proportion (45% on average) of ancestry that dominates in Iranians (46% on average), consistent with a Persian ancestral population (Omberg et al. 2012). The Q3 (African) shared the majority of ancestry with African populations as expected and were considerably admixed overall, again consistent with the known history of this subpopulation (Supplemental Fig. 7A; Omberg et al. 2012).

The ALDER analysis determined the relative percentage of African (Yoruba) ancestry in the Q1 (Bedouin) (2.6% ± 1.37) and Q2 (Persian-South Asian) (5.0% ± 1.41) at levels on par with estimates for other populations sampled in the region (Supplemental Fig. 8; Supplemental Table X), including Human Origins Bedouin and Saudi. This confirmed that recent African admixture is limited to the Q3 (African) subpopulation (37.6% ± 0.9), an estimate on par with African American populations. An estimate of the timing of African admixture placed the number of generations for Q1 (Bedouin) (15.2) and Q2 (Persian-South Asian) (14.0) slightly higher than Q3 (African) (9.3), consistent with the Q1 (Bedouin) and Q2 (Persian-South Asian) reflecting more distant African admixture events and with the Q3 (African) reflecting the historical timing of the African slave trade in the region (Omberg et al. 2012).

The SupportMix analysis used six of the 1000 Genomes populations (two European, two Asian, and two African) (see Supplemental Methods for details) as ancestral proxy reference panels and produced a set of “best guess” admixture assignments based on highest similarity to these genomes. Although these 1000 Genomes populations do not include appropriate local populations most closely related to the Qataris needed for assessment of the true admixture composition of the genomes, the ancestry track length distribution of haplotypes assigned to African populations (Yoruba or Luhya) provides a qualitative indicator of whether the subpopulations experienced recent admixture with African populations. As expected, the track lengths of the Q1 (Bedouin) and Q2 (Persian-South Asian) assigned to African 1000 Genomes populations were far shorter than those for Q3 (African) (Supplemental Fig. 9), again confirming that recent African admixture is limited to the Q3 (African) subpopulation.
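
The admixture timing estimates above rest on the fact that admixture-induced linkage disequilibrium decays roughly exponentially with genetic distance, at a rate set by the number of generations since mixing. The sketch below illustrates that relationship with a plain log-linear least-squares fit; it is an assumption-laden simplification (no affine term, no jackknife standard errors, strictly positive inputs) and is not the ALDER implementation. The function name and inputs are hypothetical.

```python
import math


def fit_admixture_generations(dist_cm, weighted_ld):
    """Fit weighted_ld ~ amplitude * exp(-n * d), with d in Morgans.

    dist_cm:     genetic distances between SNP pairs, in centimorgans.
    weighted_ld: positive weighted LD values at those distances.
    Returns (n_generations, amplitude).
    """
    xs = [d / 100.0 for d in dist_cm]          # cM -> Morgans
    ys = [math.log(v) for v in weighted_ld]    # linearize the exponential
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve generated with n = 15.2 generations.
distances = [0.5 * k for k in range(1, 41)]
curve = [0.002 * math.exp(-15.2 * d / 100.0) for d in distances]
generations, amplitude = fit_admixture_generations(distances, curve)
```

The same decay logic explains the SupportMix track-length comparison: more generations since admixture means more recombination, faster LD decay, and shorter ancestry tracks.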

Neanderthal ancestry

We next analyzed Neanderthal admixture contributions to the ancestry of Q1 (Bedouin) compared to the Q2 (Persian-South Asian) and Q3 (African) Qataris, the 1000 Genomes populations, and the populations of the Human Origins samples using the F4 ratio and Patterson's D-statistic (Fig. 4; Supplemental Fig. 10; Supplemental Table XI; Patterson et al. 2012). The results for both methods were highly correlated (Supplemental Fig. 10A). The Q1 (Bedouin; F4 ratio = 0.026, D-statistic = 0.000) had more Neanderthal admixture than all African populations, including Q3 (African; F4 ratio range = −0.017 to 0.024, D-statistic range = −0.031 to −0.003). The Q1 (Bedouin) also had Neanderthal admixture at levels comparable to Q2 (Persian-South Asian; F4 ratio = 0.024, D-statistic = −0.003) and to other Middle Eastern populations, including other Bedouin populations (Human Origins Bedouin A F4 ratio = 0.022, D-statistic = −0.003 and Bedouin B F4 ratio = 0.024, D-statistic = −0.003) and Saudi (F4 ratio = 0.026, D-statistic = −0.001). Interestingly, the Q1 (Bedouin) did not tend to have higher Neanderthal admixture levels when considering populations outside of the Middle East, where the bulk of European populations had higher Neanderthal admixture (F4 ratio range = 0.018 to 0.041, D-statistic range = 0.003 to 0.010). Yet, the percentage of Neanderthal admixture with the Q1 (Bedouin) was higher than expected if it could be entirely explained by later admixture events between the Q1 (Bedouin) and Europeans (observed F4 ratio = 0.026 versus expected F4 ratio = 0.00247).

Figure 4. Neanderthal ancestry in world populations. F4 ratio estimation as implemented in ADMIXTOOLS 3.0 (Patterson et al. 2012) was used to calculate the Neanderthal ancestry proportion for each population in the combined data set of Qatari genomes, the 1000 Genomes Project, and Human Origins. The F4 ratio estimates α, the proportion of Neanderthal ancestry in a population. Shown are the results for populations of interest, including highest and lowest scoring populations from each region (the 1000 Genomes Project, Africa; the 1000 Genomes Project, America; the 1000 Genomes Project, East Asia; the 1000 Genomes Project, Europe; Human Origins, Africa; Human Origins, America; Human Origins, Central Asia/Siberia; Human Origins, East Asia; Human Origins, Oceania; Human Origins, South Asia; Human Origins, West Eurasia), Middle Eastern populations (Human Origins), Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African). Populations are color-coded by region, and a distinct color is used for each Qatari population. A full set of results is presented in Supplemental Figure 10 and Supplemental Table XI. The population codes are as in the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012).

The higher Neanderthal ancestry in the Q1 (Bedouin) Qatari compared to African populations places the divergence of ancestral Arabs after the out-of-Africa bottleneck. Given the current evidence of the geographic range of Neanderthal populations stretching from Europe and the Mediterranean through Northern and Central Asia (Fu et al. 2014; Hershkovitz et al. 2015), the lower Neanderthal ancestry in the Q1 (Bedouin) Qatari compared to populations within the ancestral Neanderthal range is also consistent with an early divergence of the ancestors of indigenous Arabs from other lineages that populated Asia and Europe. Yet, since the Neanderthal admixture in the Q1 (Bedouin) cannot be entirely explained by admixture with Europeans, this indicates there was some admixture between Neanderthals and ancestors of the Q1 (Bedouin) in the region of the Arabian Peninsula.
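
To show how the two statistics in this section are built from allele frequencies, the sketch below follows the general definitions in Patterson et al. (2012): an f4 statistic is an average product of allele-frequency differences, the F4 ratio divides two such statistics to estimate an admixture proportion α, and the D-statistic normalizes the same numerator. The argument names describe generic population roles and are assumptions; which archaic, African, and outgroup genomes filled those roles in this study is not specified here, and ADMIXTOOLS additionally provides block-jackknife standard errors that this sketch omits.

```python
def f4(pop_a, pop_b, pop_c, pop_d):
    """Mean over SNPs of (a - b) * (c - d), using allele frequencies."""
    total = sum((a - b) * (c - d)
                for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d))
    return total / len(pop_a)


def f4_ratio(relative_of_source, outgroup, target, unadmixed, source):
    """Estimate alpha, the share of `target` ancestry derived from `source`.

    alpha = f4(A, O; X, C) / f4(A, O; B, C), where X is the target,
    B the admixing source, C the unadmixed population, A a relative of B,
    and O an outgroup.
    """
    numerator = f4(relative_of_source, outgroup, target, unadmixed)
    denominator = f4(relative_of_source, outgroup, source, unadmixed)
    return numerator / denominator


def d_statistic(pop_w, pop_x, pop_y, pop_z):
    """Patterson's D for populations (W, X; Y, Z) from allele frequencies."""
    num = 0.0
    den = 0.0
    for w, x, y, z in zip(pop_w, pop_x, pop_y, pop_z):
        num += (w - x) * (y - z)
        den += (w + x - 2.0 * w * x) * (y + z - 2.0 * y * z)
    return num / den
```

Because the f4 numerator is linear in the target's allele frequencies, a target that is exactly a mixture α·B + (1 − α)·C returns α from `f4_ratio`, which is why the statistic can be read as an ancestry proportion.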

TreeMix analysis

We also analyzed the autosomes of the combined 96 Q1 (Bedouin), Q2 (Persian-South Asian), or Q3 (African) Qataris, and non-admixed populations of the 1000 Genomes Project using the population split and mixture inference method TreeMix (Pickrell and Pritchard 2012) to assess the relative genetic similarity of populations based on high-density, genome-wide allele frequencies. The analysis returned an overall tree for the 1000 Genomes populations that mirrored those found previously (Shriner et al. 2014) with the addition of the Q1 (Bedouin) and Q2 (Persian-South Asian) clustering on the branch that includes Europeans (Pérez-Miranda et al. 2006) and the Q3 (African) clustering with African populations (Fig. 5). When migrations were allowed in the analysis, no migration events were observed between the Q1 (Bedouin) and African populations, even when allowing as many as five migration events (Supplemental Fig. 11). These results are also consistent with what is known of the migration history of the Arabian Peninsula, including migration both to and from Europe during ancient and more recent eras of civilization, which resulted in detectable admixture from European populations in both the Q1 (Bedouin) and Q2 (Persian-South Asian) (Omberg et al. 2012).

Figure 5. TreeMix (Pickrell and Pritchard 2012) hierarchical clustering analysis of the Q1 (Bedouin), Q2 (Persian-South Asian), and Q3 (African) and the 1000 Genomes Project samples. Shown is a maximum-likelihood tree of population splits inferred without subsequent migration events, in which branch lengths estimate divergence between populations (Europeans in shades of purple: CEU, FIN, GBR, IBS, TSI; East Asians in shades of brown: CHB, CHS, JPT; Africans in shades of orange: LWK, YRI, with the Q1 [Bedouin] in red, Q2 [Persian-South Asian] in azure, and Q3 [African] in black). When allowing from one to five migration events in separate TreeMix analyses, none of the admixture loops connected the Q1 (Bedouin) with any African populations (Supplemental Fig. 11), consistent with the Q1 (Bedouin) having no recent African admixture.
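
TreeMix works from population allele frequencies, and its input is often summarized as a covariance matrix of those frequencies after centering each SNP on its mean across populations. The sketch below computes that kind of matrix for a handful of populations; it is offered as an illustration of the summary under the stated assumption, not as TreeMix's exact estimator (which also applies sample-size corrections and then fits a tree with optional migration edges). The function name and the toy frequencies are hypothetical.

```python
def frequency_covariance(freqs):
    """Covariance of allele frequencies between populations.

    freqs: dict mapping population name -> list of allele frequencies,
           one entry per SNP, same SNP order in every population.
    Returns a nested dict cov[p][q].
    """
    pops = list(freqs)
    n_snps = len(freqs[pops[0]])
    # Center each SNP on its mean frequency across populations.
    means = [sum(freqs[p][k] for p in pops) / len(pops) for k in range(n_snps)]
    centered = {p: [freqs[p][k] - means[k] for k in range(n_snps)] for p in pops}
    return {
        p: {
            q: sum(x * y for x, y in zip(centered[p], centered[q])) / n_snps
            for q in pops
        }
        for p in pops
    }


# Toy frequencies only (hypothetical, not the study data).
toy = {
    "popA": [0.10, 0.20, 0.35],
    "popB": [0.12, 0.18, 0.40],
    "popC": [0.90, 0.80, 0.60],
}
cov = frequency_covariance(toy)
```

Populations that share drift history have positive covariance with each other, which is the signal a tree-building method uses to place them on the same branch.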