

Whenever I post about Indian genetics, really weird comments pop up that go like this: “this guy doesn’t know anything about genetics, he totally ignores the research [usually published in the late 2000s and utilizing mtDNA haplogroups] of [Indian researcher that I don’t really know] who has proven [something which has been superseded long ago].” Usually these show up on my Facebook account, though they also pop up on Twitter. Sometimes I’ll indulge these people, but usually I just ignore them. If your world-view needs to be supported by mtDNA haplogroup analyses published in Human Biology, more power to you! Or if eight-marker autosomal microsatellite studies from 2005 are the last you want to hear about genetics…by all means.

As it is, today you can resolve in a few hours what’s going on with questions about Indian genetics, or whether the Chinese are genetically differentiated, as long as you don’t have too strong an agenda, can get data, and don’t go sniffing for particular results. A few weeks ago a friend who is from a Tamil Brahmin background asked me if I knew anything about the genetics of this group. Well, a bit. Above and to the left is a bar plot with admixture fractions from the Harappa DNA Project. You can see that the Tamil Brahmins are homogeneous. This suggests that they’re an endogamous community with genetic coherency.

But how do they relate to other South Indians and other Brahmins? This is a question that is politically fraught. I really don’t care though, because I’m not Indian, and even if I were, I still wouldn’t care. I don’t have Zack’s data set, but I do have three Tamil Brahmin genotypes. You can see them on the PCA plot above. The North Indian data set is all Punjabi, while the South Indians are a mix of non-Brahmin Tamils and Telugus, from the 1000 Genomes. The rest is from the Estonian Biocentre data. The results are clear: Tamil Brahmins are strongly shifted toward the North Indian cluster, but in comparison to Uttar Pradesh Brahmins they are South Indian skewed. The most parsimonious explanation, taking into account their generally agreed upon communal history of migration from northern India, is that they are predominantly a northern origin caste with some admixture from the local substrate. This seems entirely consistent with how we know demographic processes work.
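
To make the logic of reading that PCA plot concrete, here is a minimal sketch on simulated data. Nothing in it comes from the real genotypes: the allele frequencies, sample sizes, and the 70/30 mixing proportion are all made-up assumptions, and `genotype_pca` and `simulate` are hypothetical helper names. It only illustrates why a group with mostly "northern" ancestry plus some local admixture should land between the two reference clusters on PC1, closer to the northern one.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0  # drop SNPs with no variation in this sample set
    # Center and scale each SNP, then take the SVD of the sample-by-SNP matrix.
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate(rng, freqs, n_samples):
    """Draw diploid genotypes given per-SNP allele frequencies."""
    return rng.binomial(2, freqs, size=(n_samples, freqs.size))


rng = np.random.default_rng(0)
n_snps = 5000

# Toy allele frequencies (assumptions, not estimates from any real population).
north = rng.uniform(0.1, 0.9, n_snps)
south = np.clip(north + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
mixed = 0.7 * north + 0.3 * south  # hypothetical 70/30 admixed group

samples = np.vstack([
    simulate(rng, north, 20),
    simulate(rng, south, 20),
    simulate(rng, mixed, 5),
])

pcs = genotype_pca(samples)
north_pc1 = pcs[:20, 0].mean()
south_pc1 = pcs[20:40, 0].mean()
mixed_pc1 = pcs[40:, 0].mean()
print("PC1 means (north, south, mixed):", north_pc1, south_pc1, mixed_pc1)
```

On real data you would of course also want LD pruning and some care about uneven sample sizes, neither of which this toy handles.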

Using TreeMix I generated 20 plots for each of two different data sets that include Tamil Brahmins. All the plots are here (tar.gz). But below are two representative plots.

In the first set of plots the Tamil Brahmins tend to be near the positions of the North Indian groups, but have a consistent migration edge from near the Velamas. From what I can tell the Velamas are not a marginal group, but somewhat elite. It seems entirely reasonable that native gene flow into Brahmins coming from the north would be from local high status populations, since the Brahmins themselves were coming into the region as a priestly elite to serve the rulers of South India and sanctify their domains. Usually I read something about the assimilation of local religious elites, so that’s probably what happened. Also, note that Uttar Pradesh Brahmins consistently receive gene flow from Chamars, a Dalit caste in Uttar Pradesh. I suspect what’s going on here is that the Chamars are representative of the pre-Indo-Aryan population, and the Indo-Aryans amalgamated with local elites as they pushed the Aryavarta beyond the Punjab. There are allusions which can be interpreted this way in the older Hindu texts.

The second set of plots is a little more confused. The positioning of the various groups is a little schizophrenic, and you can see gene flow edges back and forth attempting to make the “fit” of the topology better. The position of the Tamil Brahmins is next to the Chamar here, but they are getting a lot of gene flow (nearly 50%) from the Uttar Pradesh Kshatriya, which again indicates that the group is a composite. The Chamar make direct contributions to both Uttar Pradesh high castes.
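
For anyone who wants to reproduce something like the 20-runs-per-data-set batch, a rough sketch of how it could be scripted follows. The input file name, output prefix, number of migration edges, and block size are all placeholders, not the settings behind the plots described here. The script only builds and prints the command lines; it does not execute TreeMix. It uses the `-i`, `-o`, `-m`, `-k`, and `-bootstrap` options, with `-bootstrap` as one assumed way to get run-to-run variation.

```python
import shlex


def treemix_commands(input_path, out_prefix, n_runs=20, n_migrations=2, block_size=500):
    """Build one TreeMix command line per replicate run (nothing is executed)."""
    commands = []
    for run in range(1, n_runs + 1):
        commands.append([
            "treemix",
            "-i", input_path,          # gzipped allele-count input file
            "-m", str(n_migrations),   # number of migration edges (assumed value)
            "-k", str(block_size),     # SNPs per block (assumed value)
            "-bootstrap",              # resample blocks so runs differ
            "-o", f"{out_prefix}_run{run:02d}",
        ])
    return commands


# Hypothetical file name and prefix, for illustration only.
commands = treemix_commands("tamil_brahmin_set1.treemix.gz", "set1")
for cmd in commands[:2]:
    print(shlex.join(cmd))
```

Each command list could be handed to `subprocess.run` once the paths point at real files. Comparing the resulting trees across runs is what tells you which migration edges are consistent and which are noise.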

A major shortcoming of these analyses is a paucity of good source populations for these gene flow edges. A lot of the public data is from obscure tribal groups who are somewhat inbred, and so often drift into long branches. The 1000 Genomes data has no ethnic labels, so you are pooling a lot of different groups together. For whatever reason we know a lot more about the genetics of the Tharu people or the Kol than we do about the Brahmins of Tamil Nadu or Uttar Pradesh, or the Kayastha of West Bengal.

Finally, I was curious about runs of homozygosity. If the South Indian Brahmins went through a bottleneck of some sort, and have been endogamous, they’d have built up some of these. I have three 23andMe South Indian Brahmin samples, along with a Kayastha from Uttar Pradesh, and myself. I took the HapMap populations and intersected SNPs, which left me with 750,000. Below is a density plot of total kb of runs of homozygosity of HapMap populations, as well as vertical lines which show where some individuals come out. I was struck that the South Indian Brahmins had 24, 25, and 26 runs respectively using default cutoffs. The Kayastha from UP had 19. And I had 11. I think my relative lack is due to two factors. First, the last few generations above me in my pedigree have seen a lot of intermarriage between what in different parts of India would be different jatis (jati doesn’t map totally onto Muslims, but I do have a fair number of Hindu ancestors in the last few hundred years and sort of know their caste by the surname). Second, I’m Bengali, with a lot of East Asian ancestry, so even without inbreeding, population admixture is going to break apart a lot of blocks which might otherwise exist in the genome. If you are curious about the GIH, Gujarati population, there are a lot of Patels in that sample. They’re skewing the distribution up.

CEU = Utah White

GIH = Gujarati

CHB = Han Chinese in Beijing

ASW = African Americans from Oklahoma City
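
The run counts and total kb discussed above come from scanning each genome for long stretches without heterozygous calls. Here is a deliberately simplified sketch of that idea. It is not the sliding-window algorithm that real tools use. The 1,000 kb and 100 SNP minimums are an assumption modeled on PLINK's `--homozyg` defaults, since the tool behind the numbers above isn't named. `find_roh` and `total_roh_kb` are hypothetical helper names, and the genotypes are toy data.

```python
def find_roh(positions, genotypes, min_kb=1000, min_snps=100):
    """Return (start_bp, end_bp, n_snps) for each qualifying homozygous run.

    positions: sorted base-pair positions on a single chromosome.
    genotypes: 0/1/2 allele counts per SNP; 1 is heterozygous, None is missing.
    A heterozygous call ends the current run; missing calls are skipped.
    """
    runs = []
    start = end = None
    n = 0
    for pos, g in zip(positions, genotypes):
        if g in (0, 2):
            if start is None:
                start = pos
            end = pos
            n += 1
        elif g == 1:
            if start is not None and n >= min_snps and (end - start) / 1000 >= min_kb:
                runs.append((start, end, n))
            start = end = None
            n = 0
    # Close out a run that extends to the end of the chromosome.
    if start is not None and n >= min_snps and (end - start) / 1000 >= min_kb:
        runs.append((start, end, n))
    return runs


def total_roh_kb(runs):
    """Sum of run lengths in kilobases."""
    return sum((end - start) / 1000 for start, end, _ in runs)


# Toy chromosome: one SNP every 10 kb, with heterozygous calls at two sites.
positions = [i * 10_000 for i in range(400)]
genotypes = [0] * 400
genotypes[150] = 1
genotypes[201] = 1

runs = find_roh(positions, genotypes)
print(len(runs), "runs,", total_roh_kb(runs), "kb total")
```

The short homozygous stretch between the two heterozygous sites is dropped because it fails both minimums, which is the same reason ordinary background homozygosity doesn't get counted as a run.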