In early September, the journals Cell and Science published two long-awaited papers based on the science of ancient DNA. The papers confirmed and expanded on what has been reported in the recent past about human settlement in South Asia. The study in Cell is based on the DNA from a single sample, a female, from a 4,500-year-old burial site in the Harappan city of Rakhigarhi. A summary of the results includes three key findings: the individual was from a population that is the largest source of ancestry for South Asians; Iranian-related ancestry in South Asia split from the Iranian-plateau lineages more than twelve thousand years ago; and the first farmers of the Fertile Crescent (a region that was the cradle of the Egyptian, Phoenician, Assyrian and Mesopotamian civilisations) contributed little to no ancestry to later South Asians. This indicates not only that the greater part of most South Asians’ genome is derived from the Harappan people, but also that farming may well have had an independent origin in the region. Most South Asians carry some ancestry derived from steppe pastoralists, ranging from less than ten percent to a little over twenty percent. This ancestry is entirely absent in the Harappan genome, suggesting that the steppe pastoralists migrated to the subcontinent in substantial numbers after the decline of the Harappan civilisation. Both papers clearly spell out the likelihood that the steppe pastoralists brought the Indo-European languages to the subcontinent.
Hartosh Singh Bal, the political editor of The Caravan, spoke to Vagheesh Narasimhan, the lead author of the Science paper. Narasimhan is a post-doctoral fellow at the Reich Laboratory, at Harvard Medical School. His research is focused on using ancient DNA to understand human migration and evolution. They discussed the research methodologies that led to the laboratory’s findings, possible correlations between languages and genetics, and what the results say about when the steppe pastoralists arrived in India. The geneticists’ work so far, Narasimhan said, maps the settling of the subcontinent till 1000 BC. But a similar study of more recent samples is necessary to answer questions central to an understanding of Indian society today, including the origins and the evolution of the caste system.
Hartosh Singh Bal: Could you tell us about the work on human migration and ancient DNA that has been going on at the Reich lab?
Vagheesh Narasimhan: We’ve been working with ancient DNA for the past five years or so, and prior to that we’ve been working with modern data. Revolutionary new technology with the ability to sequence ancient samples emerged around ten or so years ago, with the sequencing of the first Neanderthal genome in 2010. Since then we’ve started utilising this technology not just to study very deep ancient history but also to study more recent events. In 2009, our lab first began examining Indian genetic history. We sequenced a large number of samples from modern India and tried to reconstruct population history from these. We know today that’s a very challenging thing to do, and it leaves a lot of ambiguity about what happened in the past. Directly having ancient DNA sequences (that is, genomes of individuals who were buried in the ground thousands, if not tens of thousands, of years ago) allows us to examine how humans moved through space and time. Along with radiocarbon data, which gives very precise temporal information about the samples, we can then examine exactly how human migrations happened. So that’s the history of the field and how we started to look at India.
HSB: Sampling ancient DNA poses real challenges, and particularly so in the subcontinent. Why is this?
VN: For two reasons. One is technical: ancient DNA degrades. You have DNA that lies in the soil or skeletons and this degrades just by natural processes, so you have less and less of it. The length of the DNA fragments that remain becomes shorter and shorter, and it becomes so short that it is no longer valuable to use them in analysis. Warm and humid climates provide additional challenges as the rate of degradation increases under such conditions. The second issue is that we hadn’t thus far had access to the kind of material that’s been available, say, in Europe or Central Asia, or even the Americas. But based on the success that we’ve had with several of these projects, we hope to put in proposals with relevant authorities including the Anthropological [Survey of India] and the Archaeological Survey of India. We have also been getting access to hundreds of samples from Pakistan, so it would be really great if we can get access to samples from India as well.
HSB: When you do manage to extract a necessary fragment of ancient DNA, however small, what are the next steps?
VN: We take 75 milligrams of bone powder—that’s less than the size of the nail of your pinkie finger. People say it is a destructive analysis process but actually the amount of material we take is much less than other procedures that have been used over the decades including radiocarbon dating. We take utmost care on each sample to ensure that sampling is as non-invasive and non-destructive as possible. Members of the lab have pioneered techniques to do this and published guides to best practices in archeological sampling.
The second thing is that our bodies contain a large amount of DNA, including on our skin, and handling samples directly with our bare hands can overwhelm the amount of ancient DNA present in the sample. To control for this, we work in a lab in which there is a clean room which isolates people working on the sample, so there’s no way for the sample to be contaminated by us. We have a very sophisticated process to make sure that the environment that we’re working with is clean. By clean, I mean it doesn’t have various other environmental DNA that has been floating around that could contaminate what’s in the sample. Airflow mechanisms are isolated from the rest of the building so the air that is filtered into that environment is really specific to that room where we’re performing the extractions. UV light and surface cleaning with bleach and other reagents are further procedures we follow to remove environmental DNA contamination.
Outside of the laboratory, we also have various bioinformatic checks to make sure that there’s no contamination in the analysis that we’re doing. Ancient DNA has characteristic signatures of DNA damage and we can use this information to separate it from modern DNA contaminants. Thus, to summarise, we take this powder which we prepare specifically; work the powder into DNA sequence libraries using a chemical process where we crack the cells present in the bone (bone is also a type of tissue in the human body) and obtain the DNA from there. We then sequence this DNA and then analyse it using statistical methods.
HSB: The lab has also examined the settling of Europe. Is this work then part of a larger programme that extends beyond South Asia to examine the Indo-European question?
VN: Absolutely, we’re just interested in building an atlas of ancient DNA all over the world. We’re trying to find out how human beings change and evolve over time all over the world. I think that’s an interesting question, we don’t know anything about it, and during the past few years we have made great strides in understanding this question. Prior to these recent studies, the vast majority of our work and samples came from Europe. Our work has repeatedly yielded surprises that have upended orthodoxies in the literature, including showing that populations in Western Europe have largely been replaced in successive waves, not just once, but twice. Take the British Isles, for example. We first showed that hunter-gatherer populations there were replaced by populations related to the first farmers of western Anatolia around five or six thousand years ago. Then we showed that these farming populations, who were responsible for constructing major sites such as Stonehenge, were themselves almost entirely replaced in the Bronze Age by incoming populations from Central Europe, who in turn had about fifty percent of ancestry from the East, the Eurasian Steppe.
The Indo-European question is a sort of deeper question. It is not a question posed to genetics, it is a question that was originally posed to linguistics because it is trying to ask why languages widely spoken from Dublin to Delhi, as people like saying, are from a common language family. This has been a mystery for over two hundred years and people have had multiple explanations for this. There are two main explanations people in academic circles give: one is to do with the spread of farming from the Near East, and the second is to do with the spread of steppe pastoralists from the Pontic-Caspian Steppe.
Now we have data from ancient DNA which starts to examine these questions. We try and look in various parts of the world where Indo-European languages are being spoken, including in Europe, and parts of Europe where Indo-European and non-Indo-European have been historically attested. So, it’s like Greece and the Mediterranean on one hand and Spain on the other, where today non-Indo-European languages continue to be spoken. We also look in Central Asia, in parts of Russia, in the Altai mountains, as well as in India and Iran.
HSB: How do you correlate an issue like language to genetics?
VN: You cannot. There is never going to be direct proof because bones don’t speak. However, I think it is important to understand that there are actually three different aspects to what we’re trying to understand. One is archaeological: what does the material culture on the ground look like? The second question is linguistics: what are the languages these people are speaking in a particular place in a particular time? The third is genetics: how are people moving around? These need not all be connected and they don’t always follow each other. People could be speaking one language and they would be participating in another particular culture. However, the fact that Indo-European languages are spoken so widely demands an explanation. How is it possible that, for example, Sanskrit and Greek are more closely related to each other than Tamil and Sanskrit, which are much closer geographically, at least today? The fact that people moved both into Europe and into India at some point in time, and that the way this happened mirrors the shared features within the language family, helps to explain why the movement of people could have spread Indo-European languages in the ancient world.
HSB: I want to return to the Indo-European question by keeping our discussion chronological. So, before looking at the steppe pastoralists, could I ask you what ancient DNA tells us about the settling of the subcontinent from the very beginning? Your work talks of a genetic substrate common to most Indian genomes today, which you call the Andamanese hunter-gatherer. What does this correlate to, what does it reflect, and where would we find some traces of it today?
VN: By substrate I assume you mean the ancestry we say is related to Andamanese hunter-gatherers. I think the word “related” itself needs clarification. There is the misconception that, when we describe ancestral sources, it implies one population moved to another part of the world. Here we use “related” to refer to the fact that two populations descend from a common ancestor. In this case, the ancestry in modern Indians is very deeply related to the Andamanese hunter-gatherers. By deeply, I mean this ancestry type diverged about 30,000 years ago from the Andamanese hunter-gatherers of today. We use the Andamanese hunter-gatherers as a proxy because we don’t have any source populations for that ancestry type which are not mixed with any other group. Thus, the Andamanese hunter-gatherers are used as a proxy population to reflect what the ancestry of the first hunter-gatherers of India used to look like. So, this particular ancestry refers to the first group of people who peopled the southeast of India at some point of time. We don’t know when exactly, but it is a substrate that permeates through most of India.
HSB: If there is no existing population that corresponds to this substrate, we would not be able to say who they resembled or what kind of language they spoke?
VN: That’s exactly right. We don’t know.
HSB: But this substrate is the deepest substrate you find in the ancient DNA of the population you term the Indus Valley periphery population. What is this Indus Valley periphery population?
VN: This is a very interesting question. You mention that this is the deepest substrate but our work is now actually showing that there’s another group of hunter-gatherers who must have lived somewhere in the broad vicinity of the northwest of the subcontinent. This substrate is related to the early hunter-gatherers and farmers of the Iranian plateau, but deeply diverged from them just as those in the southeast would have been deeply diverged from the Andamanese hunter-gatherers.
The Indus Periphery population is actually a mixture of these two different types of hunter-gatherer ancestries which must have lived broadly in the vicinity of the southeast and the northwest of South Asia. At present we do not have any data from early hunter-gatherer populations from a geographic area ranging from the eastern fringe of Iran to Sri Lanka and therefore, we only infer their range based on observing these ancestries in later populations, such as those from the Indus Valley civilisation. So, the extent and distribution of these types of ancestries in ancient times is yet to be determined. What we do know is that these two ancestries mix around six to eight thousand years ago, forming a gradient of ancestry that we call the Indus Periphery or Indus Valley cline.
HSB: And is this Indus Valley Periphery population, in light of the ancient DNA sample from the Indus Valley site of Rakhigarhi, a good match for the population of the Indus Valley?
VN: That is exactly what we see. The Rakhigarhi sample sits right in the middle of that ancestry cline between these two groups of hunter-gatherers.
HSB: So the Indus Valley population is a mix of these two populations with a greater contribution from the northwest hunter-gatherers?
VN: That is exactly right.
HSB: And is this proportion still visible in most of the modern Indian population?
VN: Yes, but most modern Indian populations actually have a much higher Andamanese hunter-gatherer related proportion, so there’s additional mixing that happens maybe around three thousand years ago. If we take someone from modern Haryana today, they’ll have much more Andamanese hunter-gatherer ancestry compared to what we see in the Rakhigarhi sample. Seemingly, this mixing of ancestries, which began several millennia before the Indus Valley civilisation reached its maturity, continued several millennia after its decline. Regardless, both of these ancestry types taken together are the primary source of ancestry for virtually every modern Indian today. Between eighty and a hundred percent of the ancestry of the population of India is basically derived from ancestry related to that of the Indus Valley civilisation, which in turn had ancestry from hunter-gatherer populations which lived broadly in the northwest and southeast of the subcontinent.
HSB: When we are talking of the Indus Valley Population and Rakhigarhi being one sample, how do the 11 periphery samples back this up and make this a more robust result?
VN: People are too concerned about it being a single sample and I want to talk about this directly. There are again misconceptions here. The first [thing to know] is that a single genome not only integrates information across thousands of that person’s ancestors, it is also extremely informative about the genetics of the population. It is not necessary to sample hundreds of individuals; you actually end up just getting repeat information in many cases.
The second thing is that we also have multiple individuals, 11 outlier individuals from the periphery of the subcontinent at sites with demonstrated cultural contact with the Indus Valley civilisation: Shahr-i-Sokhta, on the border between Baluchistan and Eastern Iran, and Gonur Tepe in Turkmenistan from the contemporary Bactria-Margiana Archeological Complex [the archaeological designation for the Bronze Age in Central Asia]. These individuals appear to be migrants to these sites based on archeological context, isotopic data and genetic ancestry. Taken together, these 11 samples and the single individual from Rakhigarhi are the source population for 110 samples that we have access to from the Swat Valley in Pakistan and thousands and thousands of samples that we have access to from modern-day India.
In fact, we show that they are the only possible source population from over five hundred samples that we have from Central Asia and Iran which are geographically proximate to India. That these others are not the source populations and only these samples are the source populations is telling you that this must be reflecting a type of ancestry that must have been present across northwestern India in the Bronze Age. Therefore, these 11 samples along with the single individual from Rakhigarhi confine the ancestry in a way. Even with additional sampling it is likely that we are only going to be sampling from this genomic distribution.
HSB: Would the fact that the one sample from Rakhigarhi matches 11 other samples rule out the possibility of that one sample being an archaeological exception in some ways?
VN: The point is, if it was an archaeological exception, you have to ask why that population fits as a source population for more than a billion Indians today and why it matches an ancestry gradient set up by 11 other samples which appear to be migrants to Central Asia from South Asia.
HSB: This brings us back to the steppe pastoralists. What do we learn from the fact that these 11 outliers and the Rakhigarhi sample did not have any genetic contribution from the steppe pastoralists?
VN: Nothing other than the fact they don’t have any such ancestry. This ancestry must have come later—it is almost ubiquitous in all modern-day South Asians and in hundreds of samples that we have from the Swat Valley, just a thousand years after the Rakhigarhi sample.
HSB: Does this place limits on when the steppe pastoralists could have come to South Asia?
VN: Yes, in two ways. From Central Asia, we actually have direct radiocarbon dates. We also have genetic data from almost three hundred samples from the steppes and about two hundred samples from Turkmenistan, Uzbekistan, Tajikistan and Iran. So, we have a lot of information from parts of the world that lie between the Pontic-Caspian steppes and India. We can actually observe the movements of people happening in time along with their ancestry types descending through that space. We apply the same procedure that we used in Europe, where we showed that people from the Eurasian steppe moved into Eastern Europe, then Central Europe, and finally to the fringes of Western Europe and so on. We can actually watch this ancestry shift through time.
Similarly, we can track the movement of this ancestry into the periphery of South Asia only around 2000 BC—that’s the first line of evidence. The second line of evidence is from the 110 samples that we have from the Swat valley in the far northwest of the region. All of those samples have a proportion of steppe ancestry and we can directly date when that steppe ancestry arrived in the genomes of those individuals using a statistical technique. And when we do that, we also obtain a date that’s very consistent with what we are directly observing from the movement of people where we actually have ancient DNA of steppe pastoralists as they move south. Taken together, we make an estimate as to when this process happened.
HSB: And this is a date well after the decline and end of the Indus Valley civilisation?
VN: Yes, the broader process must be happening after the decline.
HSB: We have had later historical expansions from the steppes region down to India—the Shakas, the Huns. But this is not the same population?
VN: No. We actually directly examine this in the paper. And all of those populations left very little impact or almost no impact on the genetic makeup of South Asia today. Similarly, later movements during the Islamic period did not contribute much either. By and large, the majority of the populations in India are unaffected by movements after the Bronze Age historically.
HSB: What is the degree of genetic contribution from steppe pastoralists in modern Indians?
VN: It is a maximum of around twenty percent.
HSB: Could this degree of contribution have been the result of a small band of people arriving into the subcontinent?
VN: It is not possible. It is likely to have been a large number of people and I think this is similar to what we see in Europe. For example, in the British Isles, there is a 90-percent population replacement and a 100-percent replacement in the male lineages. Likewise, in Spain, there is a 40-percent replacement genome-wide and a 100-percent replacement in the Y chromosomes, in the male lineages. So, in Europe, where there are also substantial farming populations, we have already seen evidence for population turnover. But steppe pastoralists have made much more significant impacts in an extremely male-biased way in that part of the world.
HSB: And this male bias is a pattern that is also seen in South Asia?
VN: Yes, but not to the same marked extent. We also have evidence for the first time of a female-biased migration into Iron Age populations in the Swat Valley.
HSB: How is this migration of a pretty substantial population into South Asia linked to the Indo-European question?
VN: This is a bigger question and has to do with how we are linking ancestries with language. We start with the assumption—it is a big assumption, but I’ll try and convince you it is true—that the spread of languages, at least in pre-state societies, must have occurred due to movements of substantial numbers of people. It is not that [at that time] we had Skype where we could learn Chinese from anywhere in the world. The question has to be: why would you choose to shift language just randomly by chance? It is much more plausible that movements of large numbers of people were likely the mechanism by which language is spread in the ancient world. So, if we find evidence where this occurs and it correlates strongly with language distribution, it is likely to be the most plausible explanation for our observations. That is the guiding principle that we’re using.
We know where Indo-European languages are spoken today. They go all the way from the British Isles to North India. We try and understand if there is a common ancestry type that links all of these geographical locations in the world where Indo-European languages are spoken today or historically attested.
What we find are three things. One, there’s a steppe-pastoralist ancestry that connects all of these regions or a large majority of such regions, and this ancestry is seen at significant levels. The second is that the manner in which that ancestry spreads reflects well-known linguistic information about how certain languages share linguistic features as opposed to others, and the movement of people reflects that sharing of different linguistic features. The third line of evidence is that in regions of the world where Indo-European and non-Indo-European languages are attested (say, in India, where we have the Dravidian and Indo-European languages) we find that the individuals who are speaking Indo-European languages are also enriched for steppe-pastoralist ancestry.
This doesn’t happen just in India but it happens in Greece, where the earliest Indo-European languages of Europe are attested, between the Minoans and the Mycenaeans. The Mycenaeans are an Indo-European-speaking population; we don’t actually know what the Minoans spoke but we think [it is] likely not to be Indo-European. There again we see that the difference between the two is steppe-pastoralist ancestry. You see it in the Basques and other Spanish individuals, where Basque is not an Indo-European language but Spanish is an Indo-European language. The difference between the two ancestries is steppe ancestry. We see it in different parts of Italy where historically attested Indo-European and non-Indo-European languages have been observed and again, you see a difference between Indo-European and non-Indo-European populations and their association with steppe ancestry in terms of genetic continuation. We see it again in Northwest China in Xinjiang, where Tocharian languages [an extinct branch of Indo-European languages] are no longer spoken but were spoken in the past. It would be very bizarre if all of these things happened just by chance and the strand of steppe-pastoralist ancestry was not associated with Indo-European languages.
HSB: So can we say with some certainty that the language of the Vedas, any Sanskrit or proto-Sanskrit, arrived in India with the steppe pastoralists?
VN: Well, that’s a different question. You are asking whether Sanskrit arrives with them.
The point is that the geography of the composition of the Rig Veda [the earliest Sanskrit text] is something that is inferred from philological data, and it is not clear where exactly it is, though it is highly likely that the events described therein are restricted to Northwest South Asia. If it is, then the earliest evidence we have for Sanskrit anywhere is seen within South Asia, and by that period of time it’s already divergent from the language families we see in eastern Iran. So the divergence must have happened at some point and where it happened geographically is not yet known.
HSB: One of the main authors of these papers, Vasant Shinde, has in various press conferences claimed that the Indus Valley culture was a Vedic culture or was the source of much of Vedic culture. Of course, the two statements are very different. I think what he is saying is that it was the Vedic culture. Is it a statement you would agree with?
VN: I think it depends on what you mean by Vedic culture. We can take an analogy from Europe again, where we have a lot of data about ancestry shifts and material culture before and after the arrival of steppe pastoralists. There, the Bell-Beaker culture [the term refers to the early European Bronze Age, deriving its name from the inverted bell-beaker vessel used during it] of Central Europe is one that originated largely in Western Europe, but was associated with skeletons that had up to fifty-percent ancestry related to the incoming pastoralists from the Eurasian Steppe. Thus, in Europe we have an unambiguous example of people with ancestry from the steppe making profound demographic impacts on the regions into which they spread while adopting important aspects of local material culture. Their cultural practices and agricultural lifestyle reflect indigenous traditions and material culture that is far from what you are seeing from the Eurasian steppe.
Similarly, if you ask archeologists about transitions in culture in India, the evidence suggests that there is very little change in the material culture from the Harappan period to the post-Harappan period. The way people are eating, the way people are burying their dead, the way crops and grains are being grown and so on and so forth remain basically the same. Moreover, there is a striking difference in the cultural practices of the populations from the Central Steppe and populations in the Vedic period in India, including the most prevalent Vedic rituals of Soma and Homa, for which there is no precedent on the Steppe. Thus, in South Asia, just as in Europe, the arriving steppe pastoralists who mixed into local populations clearly adopted local cultural practices, which we call today Vedic culture.
HSB: What you are saying would rule out the possibility of an Indo-European language being spoken in the Indus Valley civilisation.
VN: Yes, I would agree.
HSB: After this large migration of steppe pastoralists into South Asia, there is a huge transition in the Indian population, a mixing that goes on for over a thousand years and sort of explains the genetics of much of the Indian population today. What is underway during this period?
VN: That process is not actually well known. We know that this ancestry arrives. We know that from a single geographic space: the Swat Valley. We know that this ancestry is patchy when it first arrives, including at 1000 BC. Some populations have more, some populations must have had less, and the dynamics of how that ancestry spreads is not known. I think we specifically say that we don’t actually understand this process at all.
We need samples from the Iron Age all over India, from the post-Harappan period, to actually understand how the dynamics of this process happens. We know how it happens in Europe from lots of data that we’re collecting now, but we’re not actually sure how it happens in India.
It could be that this ancestry first integrates into the northwest, and there’s a sort of polity or state society that’s established there and thereafter, it expands into other parts of India. Or it could be that soon after its arrival, this ancestry slowly trickles into the neighboring geographical locations and then forms this ancestry gradient that we observe in modern India today. There’s really no way to know without ancient DNA.
HSB: How does this great mixing, which seems to have influenced almost all our populations from the south to the north, just literally freeze over into the caste system?
VN: At some point in time the mixing stops happening and we’re trying to understand why that happens or when it happens. Without data it is difficult to say, and as I mentioned, we don’t have data for any time after 1000 BC from any part of South Asia. Even from that time, the only data we have is from the very extreme northwest. What we have so far is one Harappan genome, another 11 outlier samples, 110 samples from the far northwest, and then modern samples. We’re being as conservative as possible and just describing large-scale events at this point. Additional samples that we hope to obtain and study will enable us to address many more detailed questions.
India is an extremely diverse and complex place, linguistically, genetically and culturally, and studies like this are helping to bring new lines of evidence to bear. For far too long, far too much has been written about these topics with very little data. We hope that in bringing new quantitative lines of evidence we can address directly population transformation—or lack thereof—that occurred not just in India but across the world. Another important aspect of our work on population genetics is its relationship to medical genetics. Genetics of populations in India has largely been understudied, relative to that of Europe, and studies like this are hoping to close that gap, and we hope to leverage this rich information to improve human health.
This interview has been edited and condensed.