With the genetic data currently available, it is difficult to deduce the direction of migration either into India or out of India during the Bronze Age

On June 17, The Hindu published an article by Tony Joseph (“How genetics is settling the Aryan migration debate”) on current genetic research in India, which stated that “scientists are converging” on an Aryan migration into the Subcontinent around 2000-1500 BC. This conclusion was based mainly on results from paternally inherited markers (Y chromosome) published on March 23, 2017, in the journal BMC Evolutionary Biology by a team of 16 co-authors, including Martin P. Richards of the University of Huddersfield. The team compiled and analysed Y chromosome data drawn mainly from South Asian populations living in the U.K. and U.S. However, anyone who understands the complexity of India's population will appreciate that Indians living outside the Subcontinent do not reflect its full diversity, as most of them belong to caste populations from a limited subset of regions.

Under-representation

A recent paper by Dhriti Sengupta and colleagues (‘Genome Biology and Evolution’, 2016; 8:3460-3470) showed that the South Asian populations included in the “1000 Genomes Project” under-represent the genomic diversity of the Subcontinent. Tribal populations are among the founding populations of India; any conclusion drawn without studying them will fail to capture the complete genetic information of the Subcontinent.

Marina Silva/Richards et al. argued that the maternal ancestry (mtDNA) of the Subcontinent is largely indigenous, whereas 17.5% of the paternal ancestry (Y chromosome) is associated with the haplogroup R1a, an indication of the arrival of Bronze Age Indo-European speakers. However, India is a nation of close to 4,700 ethnic populations, including socially stratified communities, many of which have maintained endogamy (marrying within the community) for thousands of years. These populations were barely sampled in the Y chromosome analysis led by Silva et al., so that analysis cannot provide an accurate characterisation of R1a frequencies in India (several tribal populations carry haplogroup R1a at substantial frequencies).

It is equally important to understand that the Y chromosome phylogeny has been shaped by genetic drift (lineage loss). Given the high level of endogamy in the Subcontinent, a study that concentrates only on specific populations is more likely to miss the less frequent R1a branches. These factors must be considered before drawing any strong conclusions about Indian populations. The statement by Silva et al. that 17.5% of Indians carry the R1a haplogroup actually means that 17.5% of the samples they analysed (from people living in the U.K. and U.S.) carry R1a, not that 17.5% of Indians carry R1a!

Genetic affinities

The genetic affinity of Indians with Europeans is not new information. In a study published in Nature (2009; 461:489-494), scientists from the CSIR-Centre for Cellular and Molecular Biology (CCMB), Hyderabad, and Harvard Medical School (HMS), U.S., used more than 5,00,000 autosomal genetic markers to show that the Ancestral North Indians (ANI) share genetic affinities with Europeans, Caucasians and West Asians. However, this study differs crucially from the one published by Silva et al.: the CSIR-CCMB and HMS study included samples representing all the social and linguistic groups of India. The same Nature paper also showed that when the Gujarati Indians in Houston (GIH) were analysed for genetic affinities with different ethnic populations of India, the GIH samples formed two clusters in a Principal Component Analysis (PCA): one grouped with Indian populations, while the other formed an independent cluster.

Similarly, a recent study (‘Neurology Genetics’, 2017; 3:3, e149) by Robert D.S. Pitceathly and colleagues from University College London and CSIR-CCMB analysed 74 patients living in the U.K. with neuromuscular diseases of mitochondrial origin and found a mutation in the RNASEH1 gene in three families of Indian origin. However, this mutation was absent in patients from India with neuromuscular diseases of mitochondrial origin. Since the mutation had earlier been reported in Europeans, these three families may have admixed with local Europeans, which highlights the importance of the source of samples.

Another study, published in The American Journal of Human Genetics (2011; 89:731-744) by Mait Metspalu and colleagues, with the involvement of CSIR-CCMB, analysed 142 samples from 30 ethnic groups and reported that “Modeling of the observed haplotype diversities suggests that both Indian ancestry components (ANI and ASI) are older than the purported Indo-Aryan invasion 3,500 YBP (years before present)”. It also found that, “consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians”.

We agree that the major Indian R1a1 branch, L657, is no more than 5,000 years old. However, the phylogenetic structure of this branch cannot be regarded as derived from either European or Central Asian lineages. The split from the European branch occurred around 6,000 years ago; thereafter, the Asian branch (Z93) gave rise to the South Asian L657, a sister branch of lineages present in West Asia, Europe and Central Asia. Expansions of this kind, which are associated with most of the world's Y chromosome lineages, as Monika Karmin et al. showed in 2015, were most likely linked to a dramatic decline in male-lineage genetic diversity four to eight thousand years ago (‘Genome Research’, 2015; 25:459-466). Moreover, our unpublished data are consistent with the early presence of several R1a branches in India.

The Aryan invasion/migration has long been a topic of intense debate. However, one has to understand the complexity of Indian populations and select samples for analysis carefully; otherwise, the findings could be biased and confusing.

With the information currently available, it is difficult to deduce the direction of haplogroup R1a migration either into India or out of India, although the genetic data certainly show that there was migration between the regions. CSIR-CCMB and Harvard Medical School are now investigating a larger number of samples, which we hope will throw more light on this debate.

Tony Joseph responds: The rejoinder makes a technical point in suggesting that the South Asian populations included in the “1000 Genomes Project” under-represent the complete genomic diversity of the Subcontinent and that, therefore, the 17.5% R1a frequency arrived at by the ‘BMC Evolutionary Biology’ study may not be precise. That a sample under-represents the complete genomic diversity of India could be said of virtually any study whatsoever, including the studies that the authors of the rejoinder have done. The point about the Marina Silva/Martin P. Richards et al. study is that its conclusions about the chronology of multiple migrations into South Asia do not depend on the precise R1a percentage; they remain robust whether that percentage is 12.5%, 17.5% or 22.5%. The precision of the percentage, or the alleged under-representation, would have been an issue had the study drawn detailed conclusions about, say, how the Bronze Age migrations spread across different regions of India. Since it does not, under-representation is not a material issue. In an email to me on May 29, weeks before my article was published, Prof. Richards said this about the sample: “It’s true that some of the 1000 Genomes Project (1KGP) sequences that we analysed for genome-wide and Y-chromosome data were sampled from Indians in the U.K. and U.S., and lack tribal groups, which might well be an issue for a detailed regional study of the subcontinent (our mtDNA database was much larger). But we are simply looking at the big picture across the region (what was the role of Palaeolithic, Neolithic and Bronze Age settlement, primarily) and the signals we describe across the five 1KGP sample sets are clear and consistent and also fit well with the lower-resolution data that has been collected in the past (e.g. for R1a distributions). By putting everything together, we feel the sketch of the big picture that we propose is very well supported, even though there will certainly be a huge amount of further analysis needed to work through the regional details.”

The second argument the rejoinder makes, as summed up in its last paragraph, is that ‘Out of India’ is a possible explanation for the genetic spread that we observe. This is helpful insofar as it accepts that the genetic spread does need an explanation. The problem with proposing ‘Out of India’ as that explanation is this: the hypothesis is not new; it has been around for decades. Yet the rejoinder does not cite a single peer-reviewed genetic study that makes a serious case for ‘Out of India’. If the hypothesis were tenable at all, shouldn’t there have been many peer-reviewed papers by now making the case and fleshing out the details?

K. Thangaraj is with the CSIR-Centre for Cellular and Molecular Biology, Hyderabad, and G. Chaubey is with the Estonian Biocentre in Tartu, Estonia

Tony Joseph is a writer and former editor of ‘BusinessWorld’. Twitter: @tjoseph0010