Genetics is about to answer a question that has vexed historians for a century. The author examines the range of possible answers and their implications.

Who built the Indus Valley civilisation? There are few questions more fundamental to our understanding of Indian history than this. On the answer to it hang many details of the country’s past: How did we come to be as we are — culturally, ethnically and linguistically? And what explains the way we are spread out geographically in the subcontinent?

Although this question has always been asked, the correct answer to it has proved resistant to the wiles and charms of historians, archaeologists, linguists and philologists for nearly a century, ever since Harappa and Mohenjo-daro were discovered in the 1920s. The fault doesn’t lie with the remarkable men and women who built what was easily the largest civilisation of their time. They had a script of their own and they left behind enough of their writings on tablets, seals, tools, pottery, and ornaments. But we haven’t managed to decipher it yet. So we don’t know who they were, where they came from, what language they spoke, what kind of social organisation or rulers they had, or even what their names were. Perhaps the only thing we know about their identity is that their trading partners in West Asia referred to their land as “Meluhha.”


There has been no shortage of informed and uninformed guesses about their identity, though. Some have argued that they were speakers of a proto-Dravidian language (the predecessor of Tamil, Telugu, Kannada, Malayalam, and others). Some say that they were the ancestors of today’s Munda tribals. And some believe that they were Vedic Aryans who spoke Sanskrit or proto-Sanskrit (a language that belongs to the Indo-European family which includes English, Persian, German, Italian, and so on). But nobody has ever had enough evidence to settle the issue one way or the other, and so the debate has raged on endlessly and often bitterly.

Meanwhile, our understanding of our own history and heritage has remained fragmentary and contradictory, with a shadow of doubt and resentment settling over the whole issue.

A project with a difference

All this could now change thanks to the science of genetics and four ancient skeletons excavated from a village called Rakhigarhi in Haryana. The four people to whom these bones once belonged — a couple, a boy and a man — lived roughly 4,600 years ago when the Indus Valley civilisation was in full bloom. Rakhigarhi is today a quiet little place amidst lush green fields in Hissar district, about a seven-hour drive from New Delhi, but when those four people were alive, it would have been counted as among the biggest cities of the Indus Valley civilisation.

The site was excavated and the skeletons were recovered in the beginning of 2014 by a team of archaeologists led by Vasant Shinde, Vice Chancellor of Deccan College, Pune. For the 61-year-old Shinde, this project is the culmination of a long and distinguished career in archaeology that has seen him lead excavations at important Harappan and other sites across the country. But Rakhigarhi is a project with a difference.

In the three-and-a-half years since its excavation, Professor Shinde has brought together scientists from Indian and international institutions like the Centre for Cellular and Molecular Biology, Hyderabad (CCMB), Harvard Medical School, Seoul National University, and the University of Cambridge to work on different parts of the project, including extracting and analysing DNA from these ancient people, reconstructing their faces, and studying the remains of their habitation to understand their daily habits and ways of life.

The DNA analysis will also help figure out their height, body features, and even the colour of their eyes. In other words, we will know, rather intimately, and with a fair degree of certainty, who lived in the Indus Valley city of Rakhigarhi. It is in that sense that the Rakhigarhi ancient DNA project is unlike any other archaeological excavation that has been done in India. As Professor Shinde says: “We may have excavated a lot of burials, structures, pottery and seals. But what is new in that?”

The last time Professor Shinde tried to take ancient DNA from an Indus Valley site was when he led an excavation at Farmana in 2007-2010. Farmana is also in Haryana, about 100 km away from Rakhigarhi, and the team excavated probably the largest Harappan burial site, with more than 70 burials. But that didn’t turn out well.

“One of our aims was to understand the Harappan population and we wanted to get DNA for that. So we excavated the burials and tried to extract the DNA but we failed miserably. We even got some scientists from Japan. They also failed, even though they had used some advanced techniques. Then we realised that our method was wrong, in the sense we had kept the burial site open for too long — one-and-a-half to two months, so that people could see that we were not there to dig out treasures. This is a different kind of treasure for us. A lot of people came, and contamination also happened. Then big rains came and everything got flooded. So we realised that we had done it wrongly.”

What Professor Shinde and his team learned was that once the skeletons are excavated, they should be documented and packed for analysis immediately. So that is what they decided to do in Rakhigarhi, where they started excavating for skeletons. But the problems didn’t end there, as the box below explains. The efforts of geneticist Niraj Rai, now with the Birbal Sahni Institute of Paleosciences in Lucknow, and earlier with the CCMB, were critical to the attempt to decode the ancient DNA.

The research is expected to be published in a leading international journal in a month or so, and is awaited by the scientific community around the world with a kind of anticipation that is rarely witnessed. We do not yet know what it will say. But here’s a look at a few scenarios, some likely and some quite unlikely, and how differently each could impact our understanding of our history. The reality could be more complex than each of the scenarios depicted here, but they are a good starting point.

Scenario 1: The Harappans as Vedic Aryans

In the ancient DNA from Rakhigarhi, scientists identify R1a, one of the hundreds of Y-DNA haplogroups (or male lineages that are passed on from fathers to sons). They also identify H2b — one of the hundreds of mt-DNA haplogroups (or female lineages that are passed on from mothers to daughters) — that has often been found in proximity to R1a.

There is no reason whatsoever to think that this would be the research finding, but if it is, it would cause a global convulsion in the fields of population genetics, history and linguistics. It would also cause great cheer among advocates of the theory that the Indus Valley civilisation was Vedic Aryan.

The global churning would be caused by the fact that such a finding would go against the current understanding of the spread of Indo-European languages across Eurasia and also against current genetic evidence. R1a is the haplogroup most closely associated with Indo-European language speakers in a vast swathe of the Eurasian landmass, ranging from Ireland and the U.K. to Italy, France, Germany, Poland, Russia, Iran and northern India. In the majority of European countries, especially in central and eastern Europe, R1a has a frequency of 40-60%. In India, it has a frequency of about 17.5% — it is most common among north Indian Brahmins and least common among the tribals and the northeastern populations.

The most widely accepted theory of Indo-European language spread is that proto-Indo-European, the ancient language from which all other Indo-European languages arose, was spoken in or near the Pontic-Caspian Steppe, north of the Black and Caspian Seas, by horse-riding, chariot-driving pastoralists who had also acquired mastery over bronze technology. With the advantage that these new practices and technology conferred on them, they began spreading out to Europe around 3,000 BC and to South Asia around 2,000 BC, carrying their language and culture with them. Ancient DNA findings from the Steppe and Central Asia in recent times have given a significant boost to this theory. For example, R1a-Z93, a sub-haplogroup of R1a that was found in ancient DNA from the Srubnaya and Andronovo cultures of the Eurasian Steppe, matches exactly the R1a-Z93 sub-haplogroup that is commonly found today in India.

So, if the geneticists do find R1a among the ancient residents of Rakhigarhi (and also the mt-DNA haplogroup H2b), it will deal a blow to the currently accepted chronology of migrations from the Steppes. It would mean that R1a-carrying Indo-European language speakers were already present in India around 2,600 BC, when the Indus Valley civilisation was flourishing. For the advocates of the Vedic Aryans-as-Harappans theory, the finding would be the long-awaited confirmation of what they have always asserted without proof: The Indus Valley civilisation was Vedic, and the Aryans were those who built it.

Their opponents could still argue that the presence of R1a in Rakhigarhi might only be representative of a small, early band of newly arrived Aryans who may have merged with the local population and been buried in what must have been to them an alien civilisation. No matter how this debate proceeds, there is no question that finding R1a in the Rakhigarhi samples would give a significant boost to the theory that the Indus Valley civilisation was Vedic Aryan.

Scenario 2: The Harappans as West Asian migrants who may have brought the Dravidian languages to India

Scientists discover Y-DNA haplogroups J2 and L1a among the Rakhigarhi residents, along with mt-DNA haplogroups such as HV, K1 and T1. All these haplogroups are often associated with the origins and spread of agriculture and urbanisation in the earliest cradle of human civilisation, the Fertile Crescent in West Asia. This is a crescent-shaped region that would cover parts of today's Iraq, Iran, Syria, Egypt, Turkey and Israel, among others. A finding such as this would not cause a major upset to any existing understanding of Asian history, but it would disprove the theory that the Indus civilisation was built by Vedic Aryans. This is because the absence of R1a and the presence of haplogroups with West Asian affinities would suggest that Indo-European language speakers were not present in Rakhigarhi while the Indus Valley civilisation was thriving.

Who would be cheered to hear such a result, though? That is hard to answer. One could say that the advocates of Dravidian language speakers would be, because there are grounds to think that the migrants from West Asia may have spoken a language or languages closely related to today’s Tamil, Telugu, Kannada, and so on. But the difficulty is that for the Dravidian partisans to cheer this result, they would also have to accept the fact that they too were migrants to India, much like the Sanskrit or proto-Sanskrit-speaking Aryans who arrived many millennia later.

What are the grounds for theorising that the West Asian migrants might have brought Dravidian languages with them? There are two. One, the presence of the Dravidian language Brahui in Pakistan's Balochistan region. This leaves open the tantalising possibility that a Dravidian-related language was once spoken widely in this region, before the Aryan migrations forced a language shift. And two, linguistic research findings that show similarities between Dravidian languages and Elamite, the extinct language of Elam, an ancient civilisation with its capital at Susa, in what is today southwest Iran. These grounds are yet to be fully proved; there is, for instance, an argument that Brahui speakers are comparatively recent migrants from Central India to Balochistan. But if we assume that these grounds are valid, the question that arises is: Why did Dravidian languages survive and thrive in south India, while getting almost wiped out in the north?

The argument could be that when the Indus Valley civilisation collapsed around 2,000 BC (for various reasons that may include climate change), there were large-scale movements of population, some of which may have been to the south where their language and culture took hold and flourished, perhaps replacing other languages that were being spoken there by the early descendants of the original Out of Africa (OOA) migrants who settled in India around 60,000 BC.

On the other hand, the Dravidian language-speaking Indus Valley people who remained in the north were probably overwhelmed by the later migrations there of Indo-European language speakers, resulting in a language shift towards the latter. The fact that a late Harappan site has been found as far south as Daimabad, a deserted village in Ahmednagar district, Maharashtra, provides some support to the theory that Indus Valley citizens migrated southwards after the civilisation collapsed in the Indus Valley itself. The late Harappan culture at the Daimabad site is dated to 2,300-1,800 BC.

But this, of course, is not the only possible explanation. It is conceivable that the West Asians who migrated to the Indus Valley spoke a language or languages closer to, say, Sumerian or Akkadian rather than proto-Dravidian, and that these languages went extinct possibly under the impact of later migrations.

Scenario 3: The original settlers of India as Harappans

Scientists discover Y-DNA haplogroup H and mt-DNA haplogroups M2 and M36 in the Rakhigarhi ancient DNA. All these haplogroups are indisputably autochthonous, or indigenous. In other words, they are descendant lineages of the original OOA migrants. These lineages are spread far and wide across India today, though they vary significantly in their distribution. Female mt-DNA haplogroups that are descended from the OOA migrants dominate the Indian population with a frequency of 70-80% today, while Y-DNA lineages of the same descent are present at a far lower percentage, of around 10-40%, depending on the population group. This asymmetry is not necessarily surprising: male lineages die out and get replaced at a faster rate than female lineages because of the male-biased nature of human conflicts and wars, at least from the Neolithic period onwards.

Considering the widespread presence of OOA lineages among the Indian population even today, it would not be surprising if the scientists were to find H and M lineages in ancient Rakhigarhi. It would suggest that agriculture and urban civilisation in the Indus Valley were, by and large, an indigenous development, not necessarily driven by large-scale migrations from anywhere else.

Scenario 4: The Mundas in the Indus Valley

Scientists discover Y-DNA haplogroup O2a and mt-DNA haplogroup M4a in the Rakhigarhi ancient DNA. These haplogroups are associated with the speakers of Austro-Asiatic languages such as Mundari, Santali and Khasi. These haplogroups and related languages are also present in Southeast Asia. In India, speakers of these languages are currently found mostly in Central and East India. Even though a prominent philologist of Harvard University, Michael Witzel, has argued the case for a language close to Munda (which he calls Para-Munda) being one of the languages of the erstwhile Indus Valley, a finding of this nature would come as a surprise to most others. One inconsistency is that the agricultural backbone of the Indus Valley civilisation was barley and wheat, while the Austro-Asiatic language family is spread across regions (such as Southeast Asia) where rice is the most important crop. In fact, one theory is that rice cultivation was pioneered in China, from where it spread to Southeast Asia and was brought to India by the Austro-Asiatic language speakers. So, if the geneticists do find haplogroups O2a and M4a in Rakhigarhi, some of our current understanding of Indian history may have to be revised.

With this, both the likely and unlikely scenarios of what the scientists could find stand broadly covered. Theoretically, one can add one more possibility: the discovery of Y-DNA haplogroup O3e and mt-DNA haplogroup M33a, both of which are associated with speakers of Tibeto-Burman languages in eastern India such as the Garo, Naga and Tani. But this can be ruled out in practice, because the current consensus is that Tibeto-Burman speakers arrived in India much later and, therefore, could not have been in the Indus Valley in 2,600 BC.

Whatever the scientists find, there is a possibility that the results would lead to much recrimination, as is already happening over interpretations of medieval Indian history. But to fight over the findings would be pointless. Every single Indian, like the vast majority of the human population, is a descendant of migrants, and almost every Indian carries multiple lineages. (The possible exception is the Onge of the Andaman and Nicobar Islands, who escaped the large-scale genetic mixing that occurred in India between 2,000 BC and the early centuries of the Common Era, as suggested by recent genetic research.) Some of our lineages come from the original OOA migrants, some from Neolithic migrations from West Asia, some from Neolithic migrations from East Asia, some from Bronze Age migrations from the Steppes, and some from migrations that happened even later.

No matter what the geneticists come up with, the fact will remain that the sources of our civilisation are multiple, not single. Diversity is built into the way our population groups arrived, evolved and mingled. Our cultural traditions, myths, beliefs, practices, languages, and physical attributes are all, at the same time, both indigenously evolved and adapted as well as acquired from elsewhere, much the same as it is for every other civilisation on earth. Our uniqueness lies in the way we have created a common fabric out of it all, over time. What the genetic findings should help us do is understand the multiple sources of our civilisation, so that we can cherish our common heritage in a deeper, more meaningful manner.

How the DNA was captured

It has long been known that the key to many puzzles of ancient human history lies in ancient DNA (aDNA). But it was only within the last eight years or so that technology advanced enough for geneticists to confidently sequence aDNA extracted from human skeletons that are thousands or even tens of thousands of years old.

But one problem still remained: DNA preserves far better in cold climates than in warm climates and, therefore, all the early aDNA studies were done on fossils recovered from cold regions. Extracting and analysing aDNA in Africa, India or West Asia remained a formidable challenge. The science of genetics had to wait for one more leap before it could tackle this problem too. This happened sometime in 2014, when it was found that the petrous bone, which houses the inner ear, could yield up to 100 times more DNA than other skeletal elements, a vital advantage especially in contexts where DNA preserves poorly. This discovery was followed by the development of new techniques to enrich the extracted DNA and filter out microbial and non-informative human DNA. These new methods were first put to use in a path-breaking study published in 2015 titled "Genomic insights into the origin of farming in the ancient Near East", co-authored by geneticist David Reich of Harvard Medical School.

Using these techniques required new tools and new skills, and scientist Niraj Rai, then with the CCMB and now with the Birbal Sahni Institute of Paleosciences in Lucknow, spent a few months at Harvard Medical School in 2016. Rai, who has done extensive work in population genetics, has been the leading Indian scientist directly involved in the ancient DNA analysis from Rakhigarhi and is now working on other ancient DNA samples from around the country. "The most difficult challenge was always ensuring the integrity of the DNA," he says. "The petrous bone discovery was a turning point. Without it, this may not have been possible."

Tony Joseph is a writer and former editor of BusinessWorld. Twitter: @tjoseph0010