Abstract Widely considered as one of the cradles of human civilization, Mesopotamia is largely situated in the Republic of Iraq, which is also the birthplace of the Sumerian, Akkadian, Assyrian and Babylonian civilizations. These lands were subsequently ruled by the Persians, Greeks, Romans, Arabs, Mongolians, Ottomans and finally the British prior to independence. As a direct consequence of this rich history, the contemporary Iraqi population comprises a true mosaic of different ethnicities, which includes Arabs, Kurds, Turkmens, Assyrians, and Yazidis among others. As such, the genetics of the contemporary Iraqi populations are of anthropological and forensic interest. In an effort to contribute to a better understanding of the genetic basis of this ethnic diversity, a total of 500 samples were collected from Northern Iraqi volunteers belonging to five major ethnic groups, namely: Arabs (n = 102), Kurds (n = 104), Turkmens (n = 102), Yazidis (n = 106) and Syriacs (n = 86). 17-loci Y-STR analyses were carried out using the AmpFlSTR Yfiler system, and subsequently in silico haplogroup assignments were made to gain insights from a molecular anthropology perspective. Systematic comparisons of the paternal lineages of these five Northern Iraqi ethnic groups, not only among themselves but also in the context of the larger genetic landscape of the Near East and beyond, were then made through the use of two different genetic distance metrics and the associated data visualization methods. Taken together, results from the current study suggested the presence of intricate Y-chromosomal lineage patterns among the five ethnic groups analyzed, wherein both interconnectivity and independent microvariation were observed in parallel, albeit in a differential manner. Notably, the novel Y-STR data on Turkmens, Syriacs and Yazidis from Northern Iraq are the first of their kind in the literature. 
Data presented herein is expected to contribute to further population and forensic investigations in Northern Iraq in particular and the Near East in general.

Citation: Dogan S, Gurkan C, Dogan M, Balkaya HE, Tunc R, Demirdov DK, et al. (2017) A glimpse at the intricate mosaic of ethnicities from Mesopotamia: Paternal lineages of the Northern Iraqi Arabs, Kurds, Syriacs, Turkmens and Yazidis. PLoS ONE 12(11): e0187408. https://doi.org/10.1371/journal.pone.0187408 Editor: Chuan-Chao Wang, Harvard Medical School, UNITED STATES Received: June 8, 2017; Accepted: October 9, 2017; Published: November 3, 2017 Copyright: © 2017 Dogan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The following YHRD (https://yhrd.org/) Accession Numbers were assigned for the five novel Y-STR datasets from the current study: Northern Iraq [Arab]: YA004212; Northern Iraq [Kurdish]: YA004213; Northern Iraq [Syriac]: YA004214; Northern Iraq [Turkmen]: YA004215; and Northern Iraq [Yazidi]: YA004216. All five Y-STR datasets are also available at the Figshare online digital repository (https://doi.org/10.6084/m9.figshare.5530510.v1). Funding: The author(s) received no specific funding for this work. Competing interests: The authors have declared that no competing interests exist.

Introduction Often considered as one of the cradles of human civilization, Mesopotamia encompasses the ancient fertile lands defined by the Tigris and Euphrates river systems. Today, these lands are largely situated in Iraq, which shares borders with Jordan to the west, Syria to the north-west, Turkey to the north, Kuwait and Saudi Arabia to the south and Iran to the east (Fig 1). Iraq has a population of ~40 million, comprising mainly Arabs and Kurds, but also Assyrian, Turkmen, Shabaki, Yazidi, Armenian, Mandean, Circassian and Kawliya minorities. Accordingly, the population genetics of Iraqis is of interest not only because of this ethnic diversity, but also because the country was home to the Sumerian, Akkadian, Assyrian and Babylonian civilizations, and was ruled by the Persians, Greeks, Arabs, Mongolians, Ottomans and British [1, 2].


Fig 1. Immediate geographic location of the Republic of Iraq (a modified version of the Iraq physiography map taken from the CIA’s World Factbook). Only the self-reported birthplaces of the volunteers (e.g. cities, towns, etc.) are shown on the map. https://doi.org/10.1371/journal.pone.0187408.g001

Among the Northern Iraqi populations, Arabs are regarded as a panethnicity that largely adheres to different sects of Islam and is native to an immense geography spanning from the Atlantic coast of North Africa to the Horn of Africa in the East, as well as the entire Arabian Peninsula and a large portion of the Near East. Iraqi Arabs have been the majority in the region since the 3rd century AD, when the first Arab Kingdom was formed outside the Arabian Peninsula [3]. Arabs are estimated to comprise 75–80% of the entire Iraqi population, while Kurds, the largest ethnic minority in Iraq, comprise 15–20% and constitute the majority in Northern Iraq [1]. Kurds are of Indo-European origin, and speak the Kurdish language, a subgroup of Northwestern Iranian languages [4]. Kurdish people are considered to be one of the native inhabitants of Iraq, although there is no definitive account of their precise origin [4]. Turkmens, also known as Turcomans, largely exist as a prominent minority beyond the immediate Southeastern borders of Modern Turkey, across Northern Syria, Northern Iraq and Northeastern Iran. Iraqi Turkmens are the third largest ethnic group in the country and mostly live in an area extending from northwest to southeast of Iraq, including the provinces of Mosul, Erbil and Kirkuk [5]. As in the case of other ethnic minorities in Iraq, precise population data are not available, but Iraqi Turkmens are estimated to constitute between 3% and 13% of the entire Iraqi population [6]. Yazidis, also known as Yezidis, are an ethnoreligious group largely inhabiting Northern Syria and Northern Iraq. 
A distinguishing feature of Yazidis among the other Mesopotamian populations is their religion, Yazidism or Yazdanism, which is linked with the ancient Mesopotamian religions and combines aspects of Zoroastrianism, Islam, Christianity and Judaism [7]. Finally, Syriacs, also known as Assyrians, Chaldeans and Arameans, are also an ethnoreligious group native to the Middle East, largely inhabiting a region spanning modern Syria, Iraq and Iran. Syriacs are a Semitic people who speak modern Aramaic and adhere to different sects of Christianity. Syriacs are also an indigenous ethnic group of Modern Iraq, and are known to inhabit major cities, as well as the mountainous regions to the east of Mosul, near Dohuk and Akra [8]. Recent estimates suggest that there are 133,000 Assyrians in Iraq, or less than 1% of the total population [9]. At least from a population genetics perspective, the contemporary Iraqi populations remain almost unexplored. In such cases, investigations on the paternal and maternal lineages, which are based on the Y-chromosome and mitochondrial DNA, respectively, can provide very useful primers [10]. On the one hand, variations among different paternal lineages are best described in terms of Y-chromosomal haplogroups, which are in turn defined by unique combinations of Y-chromosomal single nucleotide polymorphisms (Y-SNPs). On the other hand, Y-chromosomal short tandem repeat markers (Y-STRs) are another highly useful set of markers and offer further advantages through their higher mutation rates compared to Y-SNPs, hence allowing more detailed investigations within each haplogroup. Over the last decade, in silico Y-chromosomal haplogroup assignment tools have also become available, which allow haplogroup assignment for a given paternal lineage based on Y-STR data alone and with accuracies over 95% [11]. 
The aim of the current study was to contribute to a better understanding of the genetic basis of the Northern Iraqi ethnic diversity through a comparative analysis of the paternal lineages belonging to five of the most populous ethnicities from the region. To achieve this, a total of 500 samples were collected from the Arab, Kurd, Turkmen, Yazidi and Syriac communities, and each was analyzed by 17-loci Y-STR haplotyping and then in silico haplogroup assignment. Systematic comparisons of the paternal lineages, not only among themselves but also in the context of the larger genetic landscape of the Near East and beyond, revealed the presence of intricate Y-chromosomal lineage patterns among the five ethnic groups analyzed, wherein both interconnectivity and independent microvariation were observed in parallel, albeit in a differential manner.

Materials and methods A total of 500 buccal swab samples were collected from healthy and unrelated individuals, each of whom was aged 18 and above and belonged to one of the five major ethnic groups in Northern Iraq as follows: Arabs (n = 102), Kurds (n = 104), Syriacs (n = 86), Turkmens (n = 102) and Yazidis (n = 106). Determination of ethnicity was based on that of both parents. While the Arab, Kurdish and Turkmen samples were largely collected from among the students of the Salahaddin University in Erbil, the Syriac and Yazidi samples were mostly collected at various refugee camps in Erbil. Yet, the actual birthplaces of the volunteers encompassed a wider geography from Northern Iraq as depicted in Fig 1. All samples were collected with written informed consent and according to the principles of the Helsinki Declaration of the World Medical Association. Local translators were also available to ensure informed consent. Approvals for the study were provided by the Ethics Committee of the Department of Genetics and Bioengineering, as well as that of the Faculty of Engineering and Information Systems, both at the International Burch University. All sample collections in Northern Iraq were carried out through the College of Education-Scientific Department at the University of Salahaddin, which also approved the project, procured the requisite permissions from the local authorities and actively participated in the realization of the project. Genomic DNA extractions and 17-loci Y-STR haplotyping (DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and Y-GATA-H4) were carried out with the Life Technologies PureLink™ Genomic DNA Mini Kit and AmpFlSTR® Y-filer™ Kit, respectively. Capillary gel electrophoreses were conducted on a Life Technologies ABI 3130 Genetic Analyzer. 
Alleles were assigned according to the current International Society for Forensic Genetics (ISFG) guidelines for forensic Y-STR analysis [12]. Samples with Y-STR haplotypes bearing bi-allelic patterns at loci other than DYS385a/b were further typed with autosomal STRs (Life Technologies AmpFlSTR® Identifiler™ Kit) to ascertain their single-source status. All DNA extractions and typing were conducted at the Turkish Cypriot DNA Laboratory as previously described [13, 14]. Y-STR haplotyping and autosomal STR genotyping proficiencies were certified through participation in the YHRD Quality Control Exercise (2013) and ISFG English-Speaking Working Group Relationship Testing Workshop (2015). The following YHRD Accession Numbers were assigned for the five novel Y-STR datasets from the current study: Northern Iraq [Arab]: YA004212; Northern Iraq [Kurdish]: YA004213; Northern Iraq [Syriac]: YA004214; Northern Iraq [Turkmen]: YA004215; and Northern Iraq [Yazidi]: YA004216. All of the five Y-STR datasets are also available at the Figshare online digital repository (https://doi.org/10.6084/m9.figshare.5530510.v1). Haplotype and allele frequencies were calculated using the direct counting method. Statistical parameters of forensic interest, such as gene diversity (GD) and haplotype diversity (HD), were both calculated according to Nei’s formula [15]. Analysis of molecular variance (AMOVA) and the subsequent visualization by multi-dimensional scaling (MDS) were carried out using the YHRD online tool [16]. The AMOVA/MDS genetic distance measures were based on Slatkin’s R_ST values, the significance of which was ascertained with probability (P) values (10,000 permutations), which were revised following a Bonferroni correction to account for potential Type I errors [17]. 
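The diversity and multiple-testing steps described above can be illustrated with a minimal sketch. It assumes the commonly used unbiased form of Nei's diversity, n/(n − 1) · (1 − Σ p_i²), with frequencies obtained by direct counting, and a plain Bonferroni adjustment of the significance threshold. The function names and the toy haplotypes are hypothetical and are not taken from the study's own scripts or data.

```python
from collections import Counter


def nei_diversity(items):
    """Unbiased Nei diversity: n/(n-1) * (1 - sum of squared frequencies).

    `items` holds one entry per sampled individual: an allele for gene
    diversity (GD) at a single locus, or a whole haplotype (as a tuple)
    for haplotype diversity (HD). Frequencies come from direct counting.
    """
    n = len(items)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(items)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


# Toy data (hypothetical): four individuals, three distinct haplotypes.
toy_haplotypes = [("14", "13"), ("14", "13"), ("15", "13"), ("16", "12")]
hd = nei_diversity(toy_haplotypes)

# Five populations give 5 * 4 / 2 = 10 pairwise comparisons.
threshold = bonferroni_alpha(0.05, 10)
```

Representing each haplotype as a tuple keeps it hashable, so the same function serves for single-locus allele lists and for full haplotypes.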
In addition to the five novel Y-STR datasets from the current study, the following datasets from nearby and distant populations and with at least 17-loci Y-STR coverage were also included during AMOVA/MDS analysis (population sample size, YHRD Accession No.): Kuwait City, Kuwait [Arab] (n = 285, YA003763), Iraq [Iraqi] (n = 124, YA003858), Beirut, Lebanon [Lebanese] (n = 555, YA003785 & YA003859), Iran [Iranian] (n = 104, YA004237), Cyprus [Turkish Cypriot] (n = 380, YA003850), Cyprus [Greek Cypriot] (n = 344, YA004186), Cukurova, Turkey [Turk] (n = 249, YA003668), Southeastern Anatolia, Turkey [Turkish] (n = 150, YA003727 and YA004118), Marmara Region, Turkey [Turkish] (n = 385, YA004119), Afghanistan [Pathan] (n = 125, YA003701), Russian Federation [Russian] (n = 204, YA004184), Ulaanbaatar, Mongolia [Mongolian] (n = 261, YA004127), Dhaka, Bangladesh [Bangladeshi] (n = 348, YA003445), Beijing, China [Han] (n = 847, YA003197, YA003470, YA003861 and YA004160), Albania [Albanian] (n = 100, YA003096), Bosnia and Herzegovina [Bosnian] (n = 100, YA003787), Marche, Italy [Italian] (n = 165, YA003069), Upper Bavaria, Germany [German] (n = 200, YA003790), and Tanzania [Tanzanian] (n = 101, YA004196). Prior to the AMOVA/MDS analysis, the online YHRD tool removes all haplotypes with (a) null, (b) partial/intermediate alleles (e.g. DYS458*.2), (c) duplicated alleles (except for DYS385), etc. Yet, considering that (a) there are 86 haplotypes with DYS458*.2 in the combined dataset from Northern Iraq (Table 1), and that (b) DYS458*.2-bearing haplotypes are almost exclusively associated with the J1 haplogroup, to ensure the inclusion of the maximum number of haplotypes during AMOVA/MDS, all allelic data at the DYS458 locus were excluded instead (i.e. AMOVA/MDS analysis was carried out with 16-loci Y-STR datasets).
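The effect of excluding DYS458 before filtering can be sketched as follows. This is a simplified stand-in for the filter described above, not the YHRD tool's actual implementation: it assumes each haplotype is a dictionary mapping a locus name to an allele string, with an empty string for a null allele, a decimal point marking an intermediate allele, and a comma separating multiple alleles. The function names and the toy haplotype are hypothetical.

```python
def is_usable(haplotype):
    """Return False if the haplotype would be dropped by the simplified filter.

    Rejects null alleles (""), intermediate alleles (e.g. "18.2") and
    multi-allelic patterns (e.g. "14,15") at any locus other than DYS385.
    """
    for locus, allele in haplotype.items():
        if allele == "":
            return False
        if "." in allele:
            return False
        if "," in allele and locus != "DYS385":
            return False
    return True


def drop_locus(haplotype, locus="DYS458"):
    """Return a copy of the haplotype without the given locus."""
    return {k: v for k, v in haplotype.items() if k != locus}


# Toy haplotype (hypothetical) carrying an intermediate DYS458 allele.
toy = {"DYS19": "14", "DYS385": "13,18", "DYS458": "18.2"}

kept_with_all_loci = is_usable(toy)                 # rejected because of "18.2"
kept_without_dys458 = is_usable(drop_locus(toy))    # retained once DYS458 is removed
```

Dropping one locus from every dataset trades a small loss of resolution for retaining the DYS458*.2-bearing haplotypes, which is the trade-off the paragraph above describes.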


Table 1. The number of times each allelic microvariant, bi-allelic pattern and null allele were observed in each of the five ethnic groups from Northern Iraq and their overall frequency in the combined population. https://doi.org/10.1371/journal.pone.0187408.t001

A neighbor-joining (N-J) phylogenetic tree based on the Nei’s discriminant analysis (D_A) genetic distance metric and the allele frequencies of each dataset was constructed using the POPTREE2 software [18]. Bootstrap values were calculated based on 10,000 replications. Along with the five novel Y-STR datasets from the current study, the following population datasets with equivalent loci coverages were included during analysis: Cyprus [Greek Cypriot] (n = 344) [19]; Cyprus [Greek Cypriot II] (n = 574) [20]; Iran [East Iranian] (n = 200) and Iran [West Iranian] (n = 124) [21]; West Asia [Armenian, Erzurum origin] (n = 99), West Asia [Armenian, Hemsheni] (n = 89), West Asia [Armenian, Krasnodar] (n = 117), West Asia [Armenian, Adygei] (n = 49), West Asia [Armenian, Don] (n = 92) [22]; Greece [Greek] (n = 214), Iraq [Iraqi] (n = 124), Barcelona, Spain [Spanish] (n = 78), Bohemia, Czechia [Czech] (n = 72), Hungary [Hungarian] (n = 143), Upper Bavaria, Germany [German] (n = 200), Bosnia and Herzegovina [Bosnian] (n = 100), Marche, Italy [Italian] (n = 170), Sicily, Italy [Italian] (n = 157), Central Poland [Polish] (n = 102), Central England [English] (n = 81), Lebanon [Lebanese] (n = 505), Beijing, China [Han] (n = 246), Ibadan, Nigeria [Yoruba] (n = 81), Kinyawa, Kenya [Maasai] (n = 100), Philippines [Filipino] (n = 169), Southern India, India [Tamil] (n = 126) and Tokyo, Japan [Japanese] (n = 59) [23]; Iraq [Iraqi II] (n = 400) [24]; Lebanon [Maronite] (n = 196) [21]; Cyprus [Turkish Cypriot] (n = 380) [13]; Afghanistan [Turkmen] (n = 73) [25]; Uzbekistan [Turkmen] (n = 83) [26]; Marmara Region, Turkey [Turkish] (n = 385) [YHRD Accession No.: YA004119]; Cukurova, 
Turkey [Turk] (n = 249) [27]; and Southeastern Anatolia, Turkey [Turkish] (n = 86+64) [28] and [YHRD Accession No.: YA004118]. 17-loci Y-STR-based in silico haplogroup assignments were made using the 21-haplogroup batch processing version of the Whit Athey algorithm [29]. Validation of the in silico haplogroup assignments was carried out using a second algorithm called NevGen Y-DNA Haplogroup Predictor (www.nevgen.org). A stand-alone Python program was implemented, which called the NevGen haplogroup prediction AJAX API directly for each haplotype to allow automated processing of all Y-STR haplotypes. Prior to the NevGen analysis, null alleles, intermediate/partial alleles and multi-allelic patterns (except for DYS385) were each assigned a value of ‘0’. Median-joining network (M-JN) analyses were carried out using the Network v.5.0.0.1 software (www.fluxus-engineering.com) as previously described [13]. Briefly, (a) all haplotypes with intermediate/partial alleles and/or multi-allelic patterns were removed prior to analysis, (b) a default epsilon parameter value of zero was used, and (c) maximum parsimony post-processing was applied again with the default parameters. Time to the most recent common ancestor (TMRCA) estimates were made on the resultant M-JN trees by selecting a proposed central ancestral node and then all the other nodes in the remaining network as the descendant nodes. Each TMRCA estimate was made in duplicate based on a generation time of 25 years, and the genealogical and evolutionary Y-STR mutation rates of 0.00267 and 0.00069, respectively, both per locus per generation [30–33].
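How the two mutation rates and the 25-year generation time turn into a duplicate TMRCA estimate can be sketched with the rho statistic, i.e. the average number of mutational steps separating the descendants from the chosen ancestral node. The text above does not spell out the calculation performed by the Network software, so the following is only an approximation under stated assumptions: steps are counted as direct repeat-number differences from the ancestor (a single-step mutation model) rather than along network links, and the per-haplotype rate is taken as the per-locus rate times the number of loci. All names and the toy haplotypes are hypothetical.

```python
GENEALOGICAL_RATE = 0.00267   # per locus per generation (from the text)
EVOLUTIONARY_RATE = 0.00069   # per locus per generation (from the text)
GENERATION_YEARS = 25         # generation time (from the text)


def rho(ancestor, descendants):
    """Mean number of single-repeat steps from the ancestor to each descendant.

    Simplification: distance is the sum of absolute repeat differences per
    locus, not the path length along a median-joining network.
    """
    total_steps = sum(
        sum(abs(a - d) for a, d in zip(ancestor, hap)) for hap in descendants
    )
    return total_steps / len(descendants)


def tmrca_years(rho_value, n_loci, rate, generation_years=GENERATION_YEARS):
    """Convert rho into years: generations = rho / (rate * n_loci)."""
    return rho_value / (rate * n_loci) * generation_years


# Toy three-locus example (hypothetical repeat counts).
ancestor = (14, 13, 16)
descendants = [(14, 13, 16), (15, 13, 16), (14, 12, 17), (14, 13, 17)]

r = rho(ancestor, descendants)
age_genealogical = tmrca_years(r, n_loci=3, rate=GENEALOGICAL_RATE)
age_evolutionary = tmrca_years(r, n_loci=3, rate=EVOLUTIONARY_RATE)
```

Because the evolutionary rate is roughly a quarter of the genealogical rate, the same network yields an evolutionary-rate age close to four times older, which is why reporting both estimates gives a useful bracket.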

Discussion HD values ranging between 0.97456 and 0.99739 were observed for the Syriac and Kurdish population datasets, respectively, and intermediate values for the remaining three ethnic groups analyzed (Table 2). An immediate difference between the 17-loci Y-STR datasets obtained was in the number of haplotype replicates observed, at both intra- and inter-population levels, as reflected by the UH values observed: Arabs (78.43%), Kurds (80.77%), Syriacs (36.05%), Turkmens (72.55%) and Yazidis (22.64%). Such low UH values observed for the Syriac and Yazidi ethnic groups are perhaps reflective of the well-documented isolation and/or strict, religious endogamy in these communities [7, 35]. The observed DC values for each population dataset also exhibited significant variations, ranging from 47.17% for Yazidis to 89.42% for Kurds and intermediate values for the other three ethnicities (Table 2). A somewhat counteracting effect was the observation of numerous rare genetic variations that could potentially help during forensic investigations and may also provide novel insights from an anthropological perspective (Table 1). Although based on two different genetic distance metrics, namely R_ST and Nei’s D_A, and also on analyses comprising largely different population datasets, AMOVA/MDS (Table 3 and Fig 2) and N-J phylogenetic tree (S2 Table and Fig 3) analyses seemingly revealed concordant results whereby each of the new population datasets from the current study was found to be distinct in the sense that they all exhibited differential clustering with each other and those from other nearby/distant populations. To provide further insights from an anthropological perspective, haplogroup assignments were made with the popular Whit Athey haplogroup assignment algorithm, the results of which were then further validated through the use of a second algorithm, namely the NevGen Y-DNA Haplogroup Predictor (S3 Table). 
Observation of a ‘gross discrepancy rate’ of 10.2% and a ‘corrected discrepancy rate’ of only 5.8% suggested that such in silico haplogroup assignment tools could perhaps provide some insights when proper Y-SNP data are not available. So, with great caution, the following relevant conclusions were made based on such in silico produced data alone. The R (25%) and J (39%) macrohaplogroups were found to account for over 60% in total for the combined dataset from Northern Iraq, which is consistent with the fact that both macrohaplogroups are thought to originate from the Near East as pre-Last Glacial Maximum events that subsequently spread to Europe during late Mesolithic and early Neolithic times, respectively (Table 4 and Fig 4) [36, 37]. In contrast, significant variations were observed in the actual distribution of specific sub-clades of these and other macrohaplogroups among the five different ethnic groups from Northern Iraq, perhaps akin to other highly admixed and/or divergent populations from the Near East [13, 37–39]. While there are a number of earlier studies on the paternal lineages of various Kurdish populations, these correspond to smaller population samples and/or loci coverages than those in the current study [39–43]. One of these earlier studies included Y-SNP-based haplogroup distributions for four Kurdish populations in total from Turkey, Georgia and Turkmenistan, where J2 and R were observed at up to 32% and 37%, respectively [42]. In a more recent study focusing on different ethnic groups from Iran, haplogroups J2 and R were both observed at 24% in Kurds, wherein R1a alone accounted for 20% [39]. Consequently, results from these earlier studies are in good agreement with those for Northern Iraqi Kurds from the current study, wherein J2 subclades were found to account for 22%, while lineages R1a and R1b together accounted for 21%, with R1a at 17%. 
Y-chromosomal data on various Arabic-speaking populations across a wide geography ranging from North Africa to West Asia are also available in the literature, often pointing to the heterogeneous nature of these populations and reflecting their panethnic composition. Y-chromosomal haplogroup distributions in Marsh Arabs from the eastern part of Iraq were also investigated, wherein J1 was found to be the most prevalent lineage with its three markers accounting for 81% in total [44]. Hence, results from the current study on the Northern Iraqi Arabs are in good agreement with those for Marsh Arabs because J1 lineages accounted for around 39% in the former, constituting the highest not only in this ethnic group, but also among all five analyzed. Considering that J1 is thought to originate from a geographical zone that includes northeastern Syria, northern Iraq and eastern Turkey, from where it expanded to the rest of the Near East and North Africa, such high prevalence of J1 among Iraqi Arabs is indicative of their indigenous nature [45]. There are also a number of earlier investigations on the paternal lineages of various Turkmen populations [25, 26, 39, 46]. However, a distinction should perhaps be made between the Turkic populations from Turkmenistan in Central Asia and those elsewhere, such as in Northern Iraq and Northern Syria. At least the Northern Iraqi Turkmen, although still Turkic and thus with historical links with Central Asia, have even closer links with the Turkic populations from Anatolia and/or Azerbaijan/Northwestern Iran. Earlier investigations on the Turkmen populations in Afghanistan, Uzbekistan and Iran suggested that haplogroup Q was the most prevalent, accounting for 34%, 73% and 43%, in that order [25, 26, 39]. 
An earlier study from the Turkmenistan population per se also exists, albeit of relatively poor Y-SNP typing resolution, whereby the most prevalent haplogroups observed were P(xR1a), J and N(x3) with the frequencies of 52%, 24% and 10%, in that order [46]. Results from the current study suggest that haplogroup distribution for the Northern Iraqi Turkmen population is more similar to that of other Northern Iraqi populations, such as Kurds, as well as Turkish populations in Southeastern Anatolia and Cyprus [13, 37]. Results from the current study also suggested that the paternal lineages of the Northern Iraqi Syriacs are rather homogeneous, and exhibit signs of a strong population bottleneck, a situation perhaps even further emphasized due to strict endogamy known to be practiced in this ethnic group (Table 2). This also seems to be the case for the Northern Iraqi Yazidis, where strict endogamy is also practiced in a relatively small and isolated population of around half a million people [7, 47]. In the case of Northern Iraqi Syriacs, significant R_ST genetic distances were observed with all other nearby populations, except for the Yazidis from the current study, and Iraqis, Iranians, Italian (Marche) and Turkish populations from Cukurova, the Marmara Region and Southeastern Anatolia in general (Table 3, Fig 2). In contrast, the Northern Iraqi Yazidis were found to have non-significant R_ST genetic distances with all other four ethnic groups from the current study, as well as those from Albania, Cyprus, Iraq, Iran, Lebanon and Italy (Marche), as well as the Turkish populations from the Marmara Region and Southeastern Anatolia (Table 3, Fig 2). Consequently, despite corresponding to isolated and homogeneous populations, contemporary Syriacs and Yazidis from Northern Iraq may in fact have a stronger continuity with the original genetic stock of the Mesopotamian people, which possibly provided the basis for the ethnogenesis of various subsequent Near Eastern populations. 
Such an observation seems to be in line with genetic distance calculations based on a different method, namely Nei’s D_A genetic distance, whereby the Northern Iraqi Syriac and Yazidi populations from the current study were found to be positioned in the middle of a genetic continuum between the Near East and Southeastern Europe. Earlier Y-chromosomal haplogroup distribution data on Syriacs from Northern Iraq (n = 7) and Iran (n = 48 and 55) suggested an overall dominance by the R and J haplogroups [35, 39, 45]. In particular, in the most recent study with the highest haplogroup resolution (n = 48), R1a, R1b, J1 and J2 sub-clades were found to account for 8%, 29%, 15% and 15% in that order among Assyrians from Iran [39]. In this respect, the results from the current study, albeit on Northern Iraqi Syriacs (n = 86), are in good agreement because J and R subclades were observed at 36% and 41%, respectively, where R1a, R1b, J1 and J2 sub-clades accounted for 11%, 30%, 12% and 24%. Unfortunately, no previously published data exist on the Y-chromosomal haplogroup distributions in Yazidis from Northern Iraq or elsewhere, hence precluding comparisons with those from the current study. Results from the current study suggest dominance by R haplogroup subclades among Yazidis, where R1a and R1b account for 9% and 21%, respectively. M-JN and associated TMRCA analyses on haplotypes with J1, J2a1b, R1a and R1b haplogroup assignments among Northern Iraqis all suggested in situ radiation as a plausible model to explain the diversity of the corresponding paternal lineages. 
This is because there were seemingly: (a) a number of star-like descent clusters in the J1 network, exclusively or partially comprised of Arab haplotypes, which dominated the overall network, (b) two star-like descent clusters in the R1b network, one comprising Syriac and the other Yazidi haplotypes, which also both dominated the overall network, and (c) two star-like descent clusters in the J2a1b network, one comprising Syriac / Kurdish and the other Yazidi haplotypes, although the overall network was dominated by Kurdish haplotypes. In conclusion, data presented herein constitute a significant primer for further population studies and forensic investigations in Northern Iraq, such as the missing person identification efforts due to past and present conflicts. Novel insights into the molecular anthropology of Near Eastern populations are also expected, given the scarcity to date of genetic data from this corner of the world of immense historical importance. However, it should be noted that the major limitation of this study is the lack of Y-SNP genotyping.

Supporting information S1 Table. 17-loci Y-STR haplotypes observed in the Northern Iraqi populations (n = 500). https://doi.org/10.1371/journal.pone.0187408.s001 (DOCX) S2 Table. Pairwise genetic distance matrix based on Nei's D_A values between the five major ethnic groups from Northern Iraq and representative nearby and distant populations. https://doi.org/10.1371/journal.pone.0187408.s002 (XLS) S3 Table. In silico Y-chromosomal haplogroup assignments for the Northern Iraqi samples by the Whit Athey 21-haplogroup prediction and the NevGen Y-DNA haplogroup predictor algorithms (n = 500). https://doi.org/10.1371/journal.pone.0187408.s003 (DOC) S1 File. Table A: Allele frequencies of the 17 Y-STR loci for the combined Northern Iraqi population (n = 500). Table B: Allele frequencies of the 17 Y-STR loci for the Northern Iraq Arab population (n = 102). Table C: Allele frequencies of the 17 Y-STR loci for the Northern Iraq Kurdish population (n = 104). Table D: Allele frequencies of the 17 Y-STR loci for the Northern Iraq Syriac population (n = 86). Table E: Allele frequencies of the 17 Y-STR loci for the Northern Iraq Turkmen population (n = 102). Table F: Allele frequencies of the 17 Y-STR loci for the Northern Iraq Yazidi population (n = 106). https://doi.org/10.1371/journal.pone.0187408.s004 (DOCX)

Acknowledgments We thank all the volunteers who donated samples and all the local contributors who helped with the sample collections. We also thank Dr. Huseyin Sevay for implementing the stand-alone Python program for automatically retrieving NevGen haplogroup predictions, and for the help with the preparation of input files for the POPTREE N-J phylogenetic analysis.