A Cape Coloured family

I've mentioned the Cape Coloureds of South Africa on this weblog before. Culturally they're Afrikaans in language and Dutch Reformed in religion (the possibly related Cape Malay group is Muslim, though also traditionally Afrikaans-speaking). But racially they're a very diverse lot. In this way they can be analogized to black Americans, who are on average ~75% West African and ~25% Northern European, with enough variance in ancestral proportions that ~10% are ~50% or more European in ancestry. The Cape Coloureds, though, are much more complex. Some of their ancestry is almost certainly Bantu African, an element related to the West African affinities of black Americans. They also have a Northern European element, which likely came in via the Dutch, German, and Huguenot settlers (mostly males). But the Cape Coloureds have other contributions to their genetic heritage too. First, they have Khoisan ancestry, whether from Bushmen or Khoi. This is well known in their oral memory. The hinterlands of the Cape of Good Hope are beyond the ecological range of the Bantu agricultural toolkit, so the region was still dominated by the Khoisan when the Europeans arrived. Second, there are suggestions of ancestry from Asia. The existence of the Cape Malays, whose adherence to Islam derives from the Muslim slaves brought by the Dutch, hints at likely relationships to the populations of maritime Southeast Asia. Finally, there are the Indians. This element is not well recalled in cultural memory, but the Dutch brought many slaves from India as well as Southeast Asia. The first Dutch governor of the Cape Colony had a maternal grandmother who was an Indian slave, by various accounts Goan or Bengali (the town of Stellenbosch is named for him).
No doubt the descendants of Indian slaves during the Dutch era were far more likely to be absorbed into the melange of the Coloured population than assimilated into what later became the Afrikaners.

Why is this aspect of Cape Coloured ancestry forgotten? I think part of the reason is that there is a large South African Indian community today, but that community post-dates the Dutch period; it arrived with the British. When South Africans think of Indians, they think of these people. Interestingly, when the new genetic studies confirming Indian ancestry came on the scene, I was "corrected" several times by Indians themselves when reporting this part of the Coloured heritage. They were under the impression that I must be mistaken, as no one was familiar with the Cape Coloureds having Indian ancestry. Unfortunately, pointing to PCA and STRUCTURE plots did not clear up the confusion.

In any case, thanks to the African Ancestry Project I now have three unrelated Coloured samples (I have more, but they are related). Since AAP is Afrocentric, I thought it would be appropriate to run the Coloured samples separately first. So that's what I did.

First, the methodology. I took the Gujaratis, Utah whites, Chinese from Denver, and Luhya (Bantu) from Kenya, and merged them with the Bushmen from the Henn et al. thick-marker data set. I also added the Yemeni Jews from Behar et al., mostly to check that the West Eurasian ancestry of the Cape Coloureds was in fact Northern European. I limited the Gujarati sample to those from "Gujarati_B", which is the "more South Asian" cluster within the HapMap data set. I also reduced the sample sizes for many of the HapMap populations: since I'm looking at inter-continental differences, I assumed that an N of ~20 would suffice. After merging these data sets with the Cape Coloured samples, I pruned all SNPs with missing data. This left me with ~230,000 markers. In my experience this is overkill for ADMIXTURE at this level of genetic distance between the hypothetical parent populations, but better safe than sorry. I also ran the samples through EIGENSOFT to generate PCAs. Note also that I performed a few "trials" with the Sandawe and Hadza from Henn et al., as well as with larger samples from the HapMap. These either added nothing at the margin or just made things confusing (there's not much Sandawe and Hadza ancestry in the Cape Coloureds beyond what the Bantu must have picked up).
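The merging, filtering, and PCA steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: it assumes each data set has already been loaded into a hypothetical dict holding sample IDs, population labels, SNP IDs, and a samples × SNPs matrix of 0/1/2 allele counts, with -1 marking a missing call. The function names are hypothetical, and the PCA only loosely follows the per-SNP normalization used by EIGENSOFT's smartpca.

```python
import numpy as np

MISSING = -1  # assumed sentinel for a missing genotype call


def merge_and_prune(datasets):
    """Merge data sets on their shared SNP IDs, then drop every SNP with a
    missing call in any sample. Each data set is a hypothetical dict with
    keys 'samples', 'pops', 'snps' and 'G' (a samples x SNPs array of
    0/1/2 allele counts, MISSING where no call was made)."""
    shared = sorted(set.intersection(*(set(d["snps"]) for d in datasets)))
    blocks = []
    for d in datasets:
        col = {snp: j for j, snp in enumerate(d["snps"])}
        blocks.append(d["G"][:, [col[s] for s in shared]])
    G = np.vstack(blocks)
    complete = (G != MISSING).all(axis=0)  # True where every sample has a call
    samples = [s for d in datasets for s in d["samples"]]
    pops = [p for d in datasets for p in d["pops"]]
    snps = [s for s, ok in zip(shared, complete) if ok]
    return samples, pops, snps, G[:, complete]


def downsample(samples, pops, G, n_max=20, seed=0):
    """Randomly keep at most n_max individuals per population label."""
    rng = np.random.default_rng(seed)
    keep = []
    for pop in dict.fromkeys(pops):  # unique labels, first-seen order
        idx = [i for i, p in enumerate(pops) if p == pop]
        if len(idx) > n_max:
            idx = sorted(rng.choice(idx, size=n_max, replace=False).tolist())
        keep.extend(idx)
    return [samples[i] for i in keep], [pops[i] for i in keep], G[keep]


def pca(G, k=2):
    """PCA of a complete samples x SNPs matrix of 0/1/2 counts. Each SNP is
    centered and scaled by sqrt(p(1-p)), loosely following smartpca
    (which uses a slightly different estimate of p)."""
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=0)
    p = mu / 2.0
    poly = (p > 0) & (p < 1)  # monomorphic SNPs carry no information
    X = (G[:, poly] - mu[poly]) / np.sqrt(p[poly] * (1 - p[poly]))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]  # per-sample coordinates on the top k axes
```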

After I ran ADMIXTURE up to K = 7, it was clear that the optimal point in terms of informativeness was K = 6. You can see that the Cape Coloured samples have Northern European, Khoisan, Bantu African, Indian, and East Asian ancestry. There is also a Yemeni component in two of the Coloured individuals which needs explaining. This component is too high to be explained by Northern European ancestry alone. It could be explained by slaves from the Muslim Arab world. Alternatively, the Indian reference sample used here was pruned to be very homogeneous, while the slaves from South Asia were almost certainly much more diverse than the Gujarati_B population, which is mostly a group of Patels; some of that more diverse South Asian ancestry may be registering as Yemeni. Finally, when you run ADMIXTURE, combinations of atypical genetic backgrounds (e.g., Khoisan + Chinese) can sometimes generate components which are likely artifacts. This tends to be an issue when two components that aren't normally found together are present, and one is at a far lower level than the other. I've noticed this in particular with people who have low amounts of Sub-Saharan African ancestry on a Eurasian genetic background. They often come out as partly East African, Pygmy, or Bushman when the a priori probability of this is very low. Notice that a few of the Bushmen have the Yemeni component but nothing else besides what you'd expect. To me this increases the likelihood that the light green in the Coloureds is also an artifact of the Khoisan genetic background interacting with one of the other components.
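The likelihood model behind this kind of analysis helps show how such low-level components can arise. Below is a minimal Python sketch, assuming the ancestral allele frequencies F are already known and held fixed: each genotype is treated as a binomial draw whose allele frequency is the individual's ancestry fractions q weighted over those frequencies, and q is fitted with a simple EM update of the kind used by frappe. ADMIXTURE itself estimates q and F jointly with a different, faster optimization scheme, and the function names here are hypothetical.

```python
import numpy as np


def log_likelihood(g, q, F, eps=1e-12):
    """Binomial log-likelihood (up to a constant) of one individual's 0/1/2
    genotypes g, given ancestry fractions q (length K) and ancestral allele
    frequencies F (K x SNPs)."""
    p = np.clip(q @ F, eps, 1 - eps)  # expected allele frequency at each SNP
    return float(np.sum(g * np.log(p) + (2 - g) * np.log(1 - p)))


def estimate_q(g, F, n_iter=500, tol=1e-8):
    """Fit one individual's ancestry fractions by EM with F held fixed
    (a frappe-style update, not ADMIXTURE's own optimizer)."""
    K, J = F.shape
    q = np.full(K, 1.0 / K)  # start from equal ancestry
    prev = log_likelihood(g, q, F)
    for _ in range(n_iter):
        p = q @ F
        # Share of each allele copy attributed to population k:
        # a for copies of the counted allele, b for copies of the other one.
        a = q[:, None] * F / p
        b = q[:, None] * (1 - F) / (1 - p)
        q = (a @ g + b @ (2 - g)) / (2 * J)  # fractions still sum to 1
        cur = log_likelihood(g, q, F)
        if cur - prev < tol:  # EM never decreases the likelihood
            break
        prev = cur
    return q
```

Because every allele copy has to be attributed to some component, an individual whose ancestry is poorly matched by the reference panel can end up with small fractions of a component that simply fits a subset of its alleles better, which is one way the low-level artifacts described above can arise.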

So below is the K = 6 ADMIXTURE plot, along with the informative PCAs. Note that the three Coloureds are labeled with their IDs.

ADMIXTURE plot

Image Credit: Wikimedia Commons.