In this study, in order to address the compelling question as to whether, the maternal gene pool of J&K was conserved across millennia, as expected from other phylogenetic studies of Indian populations, we targeted the mitogenomes, the strongest genetic tool for tracing direct maternal inheritance19, of 83 individuals belonging to different ethnic groups across Jammu and Kashmir.

From the data set, we identified a high diversity of maternal HGs in the region as the 83 mitogenomes included types from haplogroup (HG) to M, U, H, W, R, K, F, D, T, A, C and I. These major HG, included the lineages from as far as both Europe and Central Asia. Further, the phylogenetic diversity documented here has not been observed before in Indian populations. From the 83 mitogenomes, we identified 19 novel lineages, belonging to the following groups: M3a1c, M3a1d, M3a2b, M3a2b1, M65a3, M65a3a, M5a1b1, M5a1b1a, M5a6, M39b1a, W6e, F1c1a2a, U2a1a1, U2c2, U2c2a, U2c2a1, K2a5c, A21a and R0a2o. HGs were assigned to each individual according to the nomenclature of PhyloTree.org20 (mtDNA tree Build 17). These novel lineages were only designated when at least 2 individuals could be found belonging to that lineage (Table 1). Interestingly, some of the lineages called for a rearrangement of the existing phylogenetic tree (mtDNA tree build 17) to accommodate the previously unknown diversity. To avoid confusion, we named the lineages to follow the existing nomenclature of phylotree (build 17) but highlighted the branches that need rearrangement in the manuscript. In addition, a few individuals were observed with distinct haplotypes than those in existing phylotree (mtDNA tree build 17) and in literature. However, as these set of variants were observed only once, and thus were considered as possible private polymorphisms, and were added as new branches in the existing maternal phylogenetic tree (supplementary Data 1 and 2). It is possible that these branches are potentially new haplogroups with deep (Pleistocene) time differentiation (Fig. 1), yet we lacked replicate samples and further evidence to be characterized as such, due to a smaller sample size in the present preliminary study from the region.

Table 1 Novel HGs with common variants in the samples. Haplotypes shared by at least two unrelated individual were considered to constitute ‘new haplogroups’. Full size table

Figure 1 Phylogenetic tree and Bayesian Skyline plot based on complete mitogenomes of Jammu and Kashmir. The branching patterns of phylotree and spike in BSP shows expansion of maternal gene pool of Jammu and Kashmir past Last Glacial Maximum and early Holocene around a time period of 15,000–8000 YBP (highlighted). The solid line in the middle in BSP is the median estimate, while outer thin lines show the 95% highest posterior density (HPD) limits. Good convergence was achieved by applying the HKY72 and strict clock models79, With the prior mutation rates having a mean of (2.67 × 10−8)73. Full size image

Distribution of Indian specific haplogroups

Previous literature has reported that 60% of the maternal lineages are from the ancient macrohaplogroup M in the Indian subcontinent21. Further, some of the sub-HGs of M are in-situ and deep rooted in India22. These basal branches of macrohaplogroup M, which are old and highly diverse, suggest the initial settlement of India was likely during the earliest waves of modern human from Africa23. We found novel lineages of HG M in the J&K region in high frequency and diversity. For example: two new sub-HGs of M3a1 and M3a2 found in South India23 were observed as Sub-HGs, M3a1c and M3a1d, defined by variant motif T152C, A9051G, A9218G and A7670C, T14208C, A15649G, T16234C, respectively along with one additional sub-branch. Similarly, M3a2 can now be classified into two HGs: M3a2b and M3a2b1, defined by variant C7967T and T15458C, respectively, with one sub-branch. One sub-branch was added to each HG: M3c1a, M3c2, M4 and M65b. We also found novel sub-haplogroups in M65a, M65a3 further differentiating into M65a3a, and with additional three sub-branches in M65a1, as per existing tree (mtDNA tree build 17). However introduction of these branches call for rearrangement of the tree as: M65a2 to M65a1a; M65a3 to M65a1b; M65a3a to M65a1b1 (Supplementary Data 1). Incidentally, HG M65 is believed to have arisen in the area of the Indian subcontinent during the Upper Paleolithic period. However, it is reported to be present in people inhabiting present day Tibet, Himalayas, India and Pakistan24. Five sub-branches were added to M30 and one to M30b in the present study. One sub-branch was added to M5a1a; whereas, M5a1b that is reported in South India25, was also found in J&K with two sub-HGs: M5a1b1 and M5a1b1a, defined by variants C3954T, T9833C and A15902G, respectively. Two sub-branches were also added to M5a6 and one to M5a and a new sub-haplogroup, M5a6a, was added to M5a6, defined by variants T4500C and G10589A. M39 has been documented in the tribal populations of India23. We found sub-branch of M39b1, classified to M39b1a and defined by variants T58A, A1446T, G3531A and T16304C. Despite the fact that the frequency of HG M is higher in the South than North India10,26, we have observed diverse and distinct sub-HGs of HG M in this preliminary study (Supplementary Data 1), indicating a deep in time differentiation and migration to the region. These observations highlight the possibility that a lot has to be discovered in the region, which may have great implications in understanding the phylogeny of HG M in India.

Also seen in previous work from India, we found HG U to be predominant and the second most frequent and highly diverse group in the present study. HG U is known to be distributed geographically from Europe and North Africa through India and Central Asia21,27,28. It is the second most frequent maternal lineage in India as it is also in Europe29. The autochthonous HGs like U2a, U2b and U2c were observed in Indian populations28,30,31. To these HGs, newer sub HGs are added in the present study. U2a, reported to be present in Pakistan32, showed a new sub-haplogroup, designated as U2a1a and defined by variant T15629C. Two sub-branches were added to the HG U2b2 and one sub-branch to U2c1. Basal HGs U2c2, U2c2a, U2c2a1 were also observed and defined by variants: C13934T, and T4772C, C16188T, A16207G, and T63C, C64T, G9554T, A15954G, G16213A, respectively. A sub-branch was added to U2e1′2′3 and U7a3. Further, two sub-branches were added to U7a, U7a3b; and one sub-branch each to U7a3a, U7b, U5a1i1 and U1a1c1d (Supplementary Data 2). HG U7 has been found in various parts of India, Pakistan and Iran10 but is also widely distributed across Europe, near East and South Asia33. Additionally, one sub-branch was added to each HG R5a2 and R30b1; both R5 and R30 are common Indian HGs31.

Presence of non-Indian haplogroups in J&K

We identified various other HGs (sub-branches and some new HGs) of R, W, H, I, T, K, C, A, F and D but with relatively less frequency. These HGs appeared in low frequency individually, which in the present study may be due to a small sample size. However, all these groups taken together represented 29/83 studied samples (34.9%), thus more than a third of lineages were contributed from outside of India. Some of these HGs were mainly the West Eurasian and North-East Asian HGs, rarely have been found in India. Moreover, given the extent of diversity, these are some of the most interesting observations made in the present study. Various phylogenetic analyses (phylogenetic tree and Bayesian analysis), indicate an introduction of this genetic component during post Last Glacial Maximum (<14000 YBP) to early Holocene period (>8000 YBP) (Fig. 1), which may coincide with dispersal of language and adoption of agriculture34. Thus, the diversity observed could have arisen as a result of large scale female migration to the region in post-settlement times, but could also be differentiation of many of the lineages in the region, in-situ, since Pleistocene settlement. In order to make this differentiation, we need further and more extensive study of the region, including possible ancient DNA contributions.

In the present study, various new HGs belonging to macro haplogroup N were observed. A new sub-haplogroup was added to R0a2: the R0a2o, defined by variant T11152C and two sub-branches (Supplementary Data 2). R0 is a West Eurasian haplogroup35, and has also been reported in population of Iran36. Further, one sub-branch each was added to HGs, W4 and W6, and two sub-braches to W3. Additionally, a new sub-haplogroup, not reported earlier, was defined as W6e with basal variant G143A and further differentiation in the region with variants G7521A, T16189C, C16355T and T16362C (Supplementary data 2). The W haplogroup has been reported to be the one of the dominant groups in Iran, Pakistan10; and is also widely distributed with low frequencies over European continent, the Near East and West Asia37. Similarly haplogroup H, commonly present in Caucasus and widely spread within Europe with a rare presence in India38, was observed in the present study. Further, one sub-branch was added to each HG H2a and H6a1b: and two sub-branches to each sub-haplogropu H2b and H41a. Also, one sub-branch was added to each, I4b and T2g1 (I4b and T2 are present in Europe, the Near East and the Caucasia39,40). Another novel sub-haplogroup of K2a5: K2a5c, defined by A15799G at basal position with further differentiation possessing variants C9356T and A3397G C7241T was reported for the first time in India. HG K2a5 has predominantly been documented in West Eurasians41. European gene pool has been affected by major gene flow from steppe region of North of the Caucasus and has resulted in a shift in the genetic profile of Europeans during Bronze Age42,43.

It has also been suggested that groups from the steppe regions of Eurasia were the source population of Indo-European languages to Europe42 and also likely to South Asia44. This linguistic spread could explain the presence of these European lineage in South Asia. The other plausible migration of maternal European gene pool to this region could have been from the Southern coast of Iran, Afghanistan; and present Pakistan through Baluchistan, acting as conduits for human dispersals36,45. These geographically adjacent regions were connected with the State of Jammu and Kashmir and were also regions that saw similar historical gene flow from Europe as India did. therefore, the extent of diversity, we propose, may either indicate differentiation of various population groups in the neighboring region36 or be the remnants of natural diversity associated with various migrations of high magnitude through various invaders, like Alexander the Great, the Persian Empire, Arabs, Turks, Syrians and Afghans. Thus, the historical migrants too could have introduced the genetic component of West and Central Asia to this region12 which interestingly was not exclusively that of males.

In addition, many Eastern Asian haplogroups were also observed. Ancient branches of HG C(C4a1) present in Eastern Asia, may have been introduced to J&K with movement south from central and Eastern Asia with the concurrent expansions in Northern Asia46. HG A one of the major haplogroup in Tibetans47 and a common haplogroup in Northern and Eastern Asia48, was found in the form of one new sub-haplogroup, A21a, defined by C12603T and T16092C variants. The two HGs A and C have also been reported in Tibet with a high frequency among Sherpa population47. Likewise, HG F1c, which is predominantly present in the Tibetans49 and Chinese populations50 was observed in J&K in sub-branches as F1c1a2a, defined by variants A234G and G1927A. Within haplogroup D4 seen in Han Chinese populations51, we added sub-branches to HG D4g2a1c and D4q. The presence of these lineages representing North East Asian gene pool in J&K may have been contributed through the Silk Route when people migrated in and out of the region.

Affinity with the Global Populations

In order to visualize the maternal closeness of J&K population with various population of the world, genetic distances amongst different populations were estimated as Fst values52. In addition to J&K samples from present study samples from population sets, Africa(AFR)53, Caucasian(CAU)54, China(CH)50, Europe(EUR)53, India(IN1)31, India(IN2)55, Japan(JAP)56, assorted Jew(JEW)57, Onge(ONG)58, Pakistan(PAK)32, Papua New Guinea(PNG)59 and Native Americans(AME)53 were included for the analysis. Fst values were further used to generate Multi Dimensional Scaling (MDS) plot (Supplementary Fig. 1). First plot was generated by taking population of JK as whole. It was observed that JK population set clusters in between Asian and European population sets (Supplementary Fig. 1a). To have a better understanding of genetic affinity of samples bearing known Indian and non-Indian haplogroups found in J&K population, samples were divided into two sets JK1 and JK2. JK1 set constituted the HGs known in literature majorly as Indian population specific haplogroups (M, U2 & U7). JK2 set was comprised of all other HGs (A, C, D, F, H, I, K, R, T, U1, U5, W), found in the study. Interestingly, JK2 clustered with European and Caucasian population set (Supplementary Fig. 1b).

Maternal population expansion in the region

The initial settlement of humans in South Asia is reported to have occurred between 40,000–70,000 years before present (YBP)60, whereas the initial maternal colonization in Indian subcontinent is reported to have occurred around 40,000–45,000 YBP61. We found various lineages of macrohaplogroup M and N, clearly indicating high diversity and maternal gene flow to and from the region as discussed and also indicated by large negative values of Fu’s Fs statistics and the highly significant values of Tajima’s D (Supplementary Table 1). This substantiated the pattern of a maternal population expansion in Jammu and Kashmir population. Bayesian Skyline Analysis (BSA) and phylogenetic tree branching patterns indicated a gradual population growth over 35,000 YBP but an expansion episode can be detected around 15,000–8000 YBP, which could be attributed to an expansion after the Last Glacial Maximum (Fig. 1) that could be insitu differentiation or associated with agriculture and language dispersal also indicated by some of the archeological evidences from the region.

Coalescence Age estimates of major Haplogroups

We tried to estimate the coalescent ages (expressed as years before present, YBP) of major haplogroups found in our sample set using BEAST software62. The most frequent haplogroups of macrohaplogroup M in the present study were M3a, M5, M30, and M65; whereas, of macrohaplogroup N were U, U2, U7, W and H (Supplementary Table 2). Most of these coalescence time periods in the present study were in agreement with the estimates reported in literature. The coalescence time period of M3a has been reported as 16400 YBP22, and observed as 16838.1 YBP with 95% HPD of (12654.3–22671.5). Similarly, M5a has been estimated with an age of 23100 YBP22, whereas we observed 15453.4 YBP with 95% HPD of (11878.9–20377.2). In case of M30, reported to have originated 15400 YBP22, showed in our estimates its age as 13663.9 YBP with 95% HPD of (10237.1–18628.2), in the region. The coalescence age of M65 which has been reported as 20600 YBP34, in our study has been observed as 18115.9 YBP with 95% HPD of (13786.7–24394.0).

One of the major haplogroups, haplogroup U and some of its lineages which have been reported to be differentiated in the Indian subcontinent21,63,64, showed its coalescence age of 33568.2 YBP with 95% HPD of (29494.3–39721.8) in J&K. The mean age of HG U has been reported as 46531.141 and the coalescence age of U2 has been reported as 42805.741 YBP. The age for the latter when calculated in the region was 28503.0 YBP with 95% HPD of (24666.1–33724.1). Further, HG U7 which has been known to be differentiated in south Asia with the age of 15600 YBP33, showed the coalescence age estimate of 13060.5 with 95% HPD of (10014.2–17556.7), in the present study. In contrast, the absolute coalescence age estimates of European haplogroups in J&K were not consistent with those in Europe, though 95% HPD values overlapped; West Eurasian haplogroup W has been reported in Europe with the age of 18400 YBP37, we observed an age of 15033.1 YBP with 95%HPD of (11539.9–20022.5). While as, Haplogroup H which has been estimated to originate around (12846.0) YBP41, but our estimates show it around 15747.3 YBP in J&K with 95% HPD of (11917.5–21518.1). Thus, the diversity and deep in time coalescence ages of known European HGs in the region highlight the importance of the region for exploring its plausibility as differentiation ground of many maternal lineages that might have migrated to Europe from the region post glacial maximum. Overall, this age estimation has provided an overview of the expansion of mtDNA haplogroups in the region, indicating population expansion in the J&K region at different time intervals, most likely with post-settlement gene flow east from western Eurasia to India.