I am often asked by people online to give an “elevator pitch” as to the genetic history of the Indian subcontinent. At this point I think we’ve got ~90 percent of the story. Modern humans arrived in the Indian subcontinent ~50,000 years ago, and pushed onward to East Asia, but over the past ~10,000 years massive changes have occurred genetically due to the intrusion of populations from the northwest and northeast, with likely total cultural turnover. What do I mean by this? First, it’s highly probable that all of the extant language families of the Indian subcontinent are rooted in lineages which were present outside of the Indian subcontinent before the Holocene. In other words, during the Ice Age the ancestral linguistic entities which gave rise to Indo-European, Dravidian, and Austro-Asiatic were present outside the confines of India, Pakistan, Bangladesh, Sri Lanka, Nepal, and Bhutan. The only exception here is the languages of the indigenous peoples of the Andaman Islands.*

Older historical works on South Asia often have a preface which suggests that the Austro-Asiatic Munda languages, and those of the Dravidians, were deeply indigenous to the region, to be marginalized in the north and west of the subcontinent by Indo-Aryan dialects which arrived relatively recently. This strikes me as likely wrong in terms of broad brush impressions. I now believe that Peter Bellwood was probably correct to argue in First Farmers that the arrival of Dravidian languages in the subcontinent was mediated through the arrival of agriculturalists, and may not have predated the Indo-Aryans by very much time at all in most of the subcontinent. I am even more confident that the Munda people are descended from a group with relatively recent origins in Southeast Asia, approximately contemporaneous with, though likely marginally preceding, the arrival of Indo-Aryans. What you see in South Asia today when it comes to linguistic-cultural agglomerations is the jostling of groups whose origins are all exogenous and date to the post-Neolithic period. Though the Pleistocene genetic heritage of South Asia persists to a great extent, as culturally coherent units I doubt there is much of the Pleistocene left in the region (with the exception again of the Andaman Islands).

Let’s talk about the Munda people first. Most of South Asian social-demographic analysis focuses on a divide between two disparate elements. Culturally, Indo-Aryan vs. Dravidian. Religiously, Hindu vs. Muslim. Genetically, Ancestral North Indian (ANI) vs. Ancestral South Indian (ASI). These dyads are useful analytically, but they elide the more richly textured diversity of the subcontinent (in the case of Muslim vs. Hindu, neither group, especially the “Hindu” category, is very homogeneous). According to a new paper, A late Neolithic expansion of Y chromosomal haplogroup O2a1-M95 from east to west, as much as ~15% of the Y chromosomal lineages of South Asia may be attributed to Austro-Asiatic populations such as the Munda. The authors use quite old-fashioned methods. That is, they’re about 10-15 years old, an eon in modern genetics! Basically the focus is on fast-evolving microsatellite lineages, and the patterns of variation thereof. But the power of the paper is the massive data set, which has strong representation of many populations. By looking at thousands of individuals from some regions they were able to observe patterns with a very high degree of confidence as to their representativeness of a given group.

The following table illustrates what I’m talking about:

The cultural-historical debate is whether the Austro-Asiatic languages are indigenous to South Asia or not. The balance of the evidence now seems to be that they are not. What likely occurred is that the Austro-Asiatic languages waxed with the rise of an agricultural diaspora, whose locus of origin was in what is today the southern regions of China proper. More precisely, the Austro-Asiatic languages may have spread with rice farming across Southeast Asia and eastern South Asia. Likely they were the first on the scene in Southeast Asia, as Bellwood reports in First Farmers and First Migrants that archaeology and anthropometrics can detect admixture between the farmers arriving from the north and native hunter-gatherers in places like the Red River valley in northern Vietnam ~4,000 years ago. The frequency of O2a1-M95 for regions and populations is subdivided very precisely in the above paper, and it is clear that in island Southeast Asia its proportions match those in an earlier paper on autosomal inferences of Austro-Asiatic ancestry. Populations in eastern Indonesia and in the Philippines have minimal numbers of males carrying lineages of O2a1-M95, while the densely populated island of Java has frequencies of ~50%.

The clincher for why O2a1-M95, and therefore Austro-Asiatic populations, are likely exogenous to India genetically would be the genetic diversity of the lineages. In short, there is tentative information from the variation on the microsatellites that the coalescence of the diverse lineages in Laos is the deepest by a few thousand years. But there was another paper from a few years back which makes my confidence in these results higher, Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture, which presented autosomal data which was very persuasive to me. In particular, the derived variant of EDAR, which is present at very high frequencies among Northeast Asian and Amerindian populations, is present at ~5% frequency among Munda groups. Among Dravidian populations in South India, according to the 1000 Genomes Browser, the frequency is less than 1%, while it is absent among populations in Northwest India, aside from those with clear East Asian admixture.

Next we address the issue of the Dravidian languages. A new paper in Human Genetics, West Eurasian mtDNA lineages in India: an insight into the spread of the Dravidian language and the origins of the caste system, points to an association between particular mtDNA lineages in South India and southern Iran, in particular the region which was once inhabited by the Elamites, who have been posited to have an association with the Dravidian languages. I don’t put particular stock in the philological association between Dravidian languages today and Elamite; I can’t judge it with any degree of certainty or competency. But the genetic data is certainly suggestive. Here’s the portion which is relevant:

The autochthonous subhaplogroups—HV14a1 and U1a1a4 uniquely found in contemporary Dravidian speakers share their ancestry primarily with the Near East-Iran populations (Derenko et al. 2013). The coalescence times of HV14a1 and U1a1a4 were estimated to be ~10.5–17.9 kya. The shared ancestry of the Dravidian of South India and Iranian of Near East populations has been shown in the HV14 and U1a1 phylogeny (Fig. 1a) and their time estimates are consistent with the proto-Elamo-Dravidian language diffusion hypothesis which emphasized that the proto-Dravidian language evolved over 15 kya, specifically in western Asia before the beginning of agricultural development ~11 kya. This language was introduced by Neolithic pastoralists, and was thought to be associated with the spread of these west Eurasian-specific mtDNAs to peninsular India (Pagel et al. 2013). The Y-chromosome haplogroup L1a has added a further dimension to this hypothesis. The subclades of haplogroup L such as L1a, L1b, and L1c were found predominantly in Iranian populations of western Asia (Grugni et al. 2012). In India, only the L1a lineage was observed and was largely restricted to the Dravidian-speaking populations of south India (Sahoo et al. 2006; Sengupta et al. 2006). The coalescence time (~9.1 kya) (Sengupta et al. 2006) and the virtual absence in Indo-Aryan speakers in north indicate that the L1a lineage arrived from western Asia during the Neolithic period and perhaps was associated with the spread of the Dravidian language to India.

It has long been presumed that the Dravidian languages are primal to South Asia. But that was before modern genomics revolutionized our understanding of Indian genetic history. More or less all South Asian populations are a fusion between a deeply indigenous strain with distant affinities to the peoples of eastern Eurasia (ASI), and a group very close to the ones typically found in Western Eurasia (ANI). There are no pure indigenes. South Indian tribal populations, who are presumed to be the closest to indigenous groups, are at least ~25% ANI, if not more. To presume that the Dravidian languages are indigenous to South Asia one would have to assume that this exogenous element was absorbed by the cultural substrate, something I find implausible on cross-cultural grounds (more dominant South Asian social elites, even ones of pure Dravidian extraction, such as the Reddy group, have higher fractions of ANI). Additionally, Dravidian languages themselves are not particularly variegated, as one might expect if there was deep local structure, as is the case in inland Papua and pre-Columbian America.

Of course the title of this post has to do with males, so with that, let’s look back to a paper which was first posted on the web last year (though finally “published” this March), The phylogenetic and geographic structure of Y-chromosome haplogroup R1a. Here’s the important part:

…Using the 8 R1a lineages, with an average length of 48 SNPs accumulated since the common ancestor, we estimate the splintering of R1a-M417 to have occurred rather recently, ~5800 years ago (95% CI: 4800–6800). The slowest mutation rate estimate would inflate these time estimates by one-third, and the fastest would deflate them by 17%. With reference to Figure 1, all fully sequenced R1a individuals share SNPs from M420 to M417. Below branch 23 in Figure 5, we see a split between Europeans, defined by Z282 (branch 22), and Asians, defined by Z93 and M746 (branch 19; Z95, which was used in the population survey, would also map to branch 19, but it falls just outside an inclusion boundary for the sequencing data). Star-like branching near the root of the Asian subtree suggests rapid growth and dispersal. The four subhaplogroups of Z93 (branches 9-M582, 10-M560, 12-Z2125, and 17-M780, L657) constitute a multifurcation unresolved by 10 Mb of sequencing; it is likely that no further resolution of this part of the tree will be possible with current technology. Similarly, the shared European branch has just three SNPs.
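
To make the arithmetic in that passage concrete, here is a minimal back-of-envelope sketch in Python. It uses only the figures quoted above (the ~5800 year point estimate, the 48 SNPs, and the one-third and 17% adjustments). The function name and the "years per SNP" figure are my own illustrative assumptions, not anything the authors report; in particular, years per SNP over the sequenced region is a crude ratio, not a mutation rate.

```python
# Figures taken from the quoted passage.
POINT_ESTIMATE_YEARS = 5800      # estimated age of the R1a-M417 split
SNPS_SINCE_ANCESTOR = 48         # average SNPs accumulated per lineage
INFLATE_IF_SLOWEST_RATE = 1 / 3  # slowest mutation rate: +one-third
DEFLATE_IF_FASTEST_RATE = 0.17   # fastest mutation rate: -17%


def rate_sensitivity(estimate, inflate, deflate):
    """Hypothetical helper: rescale a date estimate under the slowest
    and fastest mutation-rate assumptions (returns youngest, oldest)."""
    youngest = estimate * (1 - deflate)
    oldest = estimate * (1 + inflate)
    return youngest, oldest


youngest, oldest = rate_sensitivity(
    POINT_ESTIMATE_YEARS, INFLATE_IF_SLOWEST_RATE, DEFLATE_IF_FASTEST_RATE
)

# Assumption: a crude average, just to give a feel for the clock's grain.
years_per_snp = POINT_ESTIMATE_YEARS / SNPS_SINCE_ANCESTOR

print(f"Point estimate rescaled by rate choice: {youngest:.0f} to {oldest:.0f} years")
print(f"Roughly {years_per_snp:.0f} years per SNP over the sequenced region")
```

Even under the slowest rate the split stays well inside the Holocene, which is the point that matters for the argument here.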

The authors emphasize that the TMRCA has a wide confidence interval. I don’t think so. There’s now a fair amount of work on sequencing R1b and R1a lineages which are very common across Eurasia, and one thing is clear: they’re star-shaped phylogenies which are likely reflecting massive population expansions relatively recently (see A recent bottleneck of Y chromosome diversity coincides with a global change in culture). Additionally, they note that the “Asian” (which includes South, Central, and Southwest Asia) and the European branches of R1a1a are relatively well separated, and that the greatest diversity of R1a1a can be found in Iran.

I doubt that R1a1a was associated with one ethno-linguistic group at the end of the last Ice Age. It is present at relatively high frequencies in low caste and tribal populations in South India, so I am skeptical of an exclusive association with Indo-Europeans, though in Europe it may actually be that it arrived only with Indo-Europeans. But, the fact that R1a1a is so common all across Eurasia points to a genetic-cultural revolution. Just as Haplogroup O2a1 is almost certainly rooted in populations outside of South Asia before the Holocene, so is the case with R1a1a. They came with groups of men who brought a new dominant lifestyle. From the west came wheat and cattle. From the east, rice.

The latest research suggests about half the ancestry of modern South Asians dates to the Pleistocene. That is, it predates 10,000 BC. The majority of the mtDNA lineages are from this ancestral element. But culturally this group likely had minimal influence. One question which comes to mind is whether the ASI ancestry is from many groups, or from only a few which were assimilated into an expanding group of agriculturalists. If the former, then one expects the ASI ancestral segments to exhibit a tendency toward regional structure. I suspect though that this is not the case, and that the genetic landscape of modern India is characterized by overlapping populations which are all hybrids of different regional groups which only recently expanded. The pattern of Munda groups in South Asia, surrounded by Dravidian and Indo-European speaking groups, points one to the possibility that these groups were pioneers of some sort, but eventually lost.

* Language isolates like Kusunda and Nihali may date to the era before the Holocene, but without relatives we can’t really make a good guess. Possible relationships of Kusunda to Andaman or Papuan languages strike me as implausible due to the time depth of separation.