



I have run ADMIXTURE on the 139-population/2,230-individual dataset, starting from K=3 and increasing K for as long as the Bayesian Information Criterion (BIC) increases. There was a temporary dip in the BIC at K=4, which, surprisingly, I had also encountered when analyzing a worldwide craniometric dataset (for an updated version of that analysis, look here).





Below is a plot of the BIC as a function of K, the number of clusters:









The above plot should explain the question mark in the post's title. The BIC seems to increase in a law-like manner up to K=15, and I have no idea when it will plateau. Certainly, many clusters I've encountered in previous analyses with subsets of these populations have yet to appear, so who knows?
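The post does not spell out how the BIC was computed from the ADMIXTURE runs. The rough sketch below computes a BIC-style score from each run's final log-likelihood. It assumes the usual BIC formula, a parameter count of N(K-1) ancestry proportions plus K·M ancestral allele frequencies, and N·M genotypes as the sample size. These are all assumptions, not the author's stated method. The score is the negative of the conventional BIC, so higher is better, matching the "increasing BIC" reading of the plot.

```python
import math

def bic_score(loglik: float, n_ind: int, n_snps: int, k: int) -> float:
    """Higher-is-better BIC-style score (the negative of the conventional BIC).

    Assumed parameter count: N*(K-1) ancestry proportions plus K*M allele
    frequencies; assumed sample size: N*M genotypes.
    """
    n_params = n_ind * (k - 1) + k * n_snps
    n_obs = n_ind * n_snps
    return 2.0 * loglik - n_params * math.log(n_obs)

def best_k(logliks: dict, n_ind: int, n_snps: int) -> int:
    """Return the K with the highest score over all K tried (not the first peak)."""
    scores = {k: bic_score(ll, n_ind, n_snps, k) for k, ll in logliks.items()}
    return max(scores, key=scores.get)

# Hypothetical log-likelihoods for a tiny made-up dataset, not results from this analysis.
print(best_k({2: -5000.0, 3: -4000.0, 4: -3990.0}, n_ind=10, n_snps=100))
```

Because `best_k` takes the maximum over every K that was run, a temporary dip such as the one at K=4 does not end the search early.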





As long as I have a functional computer and enough RAM, I will continue this analysis and post the updated results here (although after a certain K, I may have to invent a new color palette to represent them, or resort to posting just the numbers).





K=3





At K=3, Sub-Saharan Africans, West Eurasians, and East Eurasians are distinguished.





K=4

At K=4, Native Americans get their own cluster (dark green).





K=5





K=10

At K=10, a cluster (light green) centered on the Koryak, Chukchi, and Greenland samples emerges. Notice that it is also strongly represented among the Athabask, much less so among the Pima and Maya from Mexico, and not at all among the Karitiana and Surui, the southernmost Amerindian groups. This component is probably related to Y-haplogroup C3b-P39.
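Statements like "strongly represented among the Athabask, much less so among the Pima and Maya" can be made concrete by averaging each individual's ADMIXTURE proportions within each population. This is a minimal sketch. It assumes a .Q-style file with one row of K whitespace-separated proportions per individual, plus a separate list of population labels in the same order. The labels and the toy numbers are hypothetical. The component index is whichever column the light green cluster ended up in.

```python
from collections import defaultdict

def read_q(lines):
    """Parse .Q-style text: one row of K whitespace-separated proportions per individual."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]

def mean_component_by_population(q_rows, pop_labels, component):
    """Mean proportion of one component in each population, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        totals[pop] += row[component]
        counts[pop] += 1
    means = {pop: totals[pop] / counts[pop] for pop in totals}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy input (not data from this analysis): 2 components, 4 individuals.
q = read_q(["0.8 0.2", "0.6 0.4", "0.1 0.9", "0.0 1.0"])
pops = ["Pop A", "Pop A", "Pop B", "Pop B"]
for pop, share in mean_component_by_population(q, pops, component=0):
    print(f"{pop}: {share:.2f}")
```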





It would be interesting to consider this in light of the theory of a separate migration of Na Dene speakers (of whom the Athapaskans are a part) into North America, and of their inferred relationship with the Kets. The absence in the Athabask of the "dark green" component, which is present in the Kets, does not really invalidate this hypothesis, as the "dark green" component may postdate the expansion of Na Dene speakers into the New World. Moreover, the presence of the "light green" component in the Athabask and its absence in other Amerindian groups is quite consistent with the two-migration model. No specific genetic relationship with the Kets can be detected, however.





K=11

At K=11, the isolated Kalash of Pakistan get their own cluster, which also occurs at high levels in their neighbors.





K=13





K=14

At K=14, the Karitiana, an Amerindian group from Brazil, get their own cluster (pink), which spills over into other Amerindian groups, but not substantially into the more northern Pima and Athabask.





K=15

At K=15, notice that the Karitiana component that appeared at K=14 has "folded back" into the Amerindian component, while a "West Asian" and a "Red Sea" component have appeared, the latter present in both Arabians and East Africans. As I've mentioned before, as K increases, ADMIXTURE has many roughly equiprobable choices in trying to represent the data.
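Component order is arbitrary from run to run, and ADMIXTURE can settle on different but similarly good solutions. Comparing two runs (say K=14 and K=15) therefore means matching their components first. The post does not describe such a step. The sketch below is one way to do it under stated assumptions. It matches columns of two Q matrices for the same individuals with the Hungarian algorithm (SciPy's `linear_sum_assignment`), using summed squared differences in proportions as the cost. When the second run has more components, any component left unmatched is a candidate for the "new" one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_components(q_ref, q_new):
    """Map each column of q_ref to its best-matching column of q_new.

    Both inputs are (individuals x components) proportion matrices for the
    same individuals in the same order; q_new may have more components.
    """
    q_ref = np.asarray(q_ref, dtype=float)
    q_new = np.asarray(q_new, dtype=float)
    # cost[i, j] = sum over individuals of (q_ref[:, i] - q_new[:, j]) ** 2
    diff = q_ref[:, :, None] - q_new[:, None, :]
    cost = (diff ** 2).sum(axis=0)
    rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one matching
    return {int(r): int(c) for r, c in zip(rows, cols)}

# Toy example (made up): q_new is q_ref with its columns shuffled.
q_ref = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]]
q_new = [[0.0, 0.9, 0.1], [0.5, 0.2, 0.3]]
print(align_components(q_ref, q_new))
```

A component that splits in two between runs can only be matched to one of its descendants this way, so the matching is a rough guide rather than a proof of identity.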









Fst distances between components







As always, you should treat the chosen names for the components as helpful mnemonics; also, if a name used here has been used in a different ADMIXTURE analysis, with another set of populations and/or K, you should not assume that it reflects exactly the same entity. Below is a table of genetic distances between the 15 inferred ancestral components:
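The post does not say which Fst estimator produced the table. As an illustration only, the sketch below applies Hudson's estimator to the inferred ancestral allele frequencies, combining SNPs as a ratio of averages. Other estimators give somewhat different numbers, so this should not be expected to reproduce the table exactly. The input is assumed to be a K x M array with one row per component. ADMIXTURE's .P output has one row per SNP, so it would need transposing first.

```python
import numpy as np

def hudson_fst_matrix(p):
    """Pairwise Hudson Fst between rows of a K x M allele-frequency array.

    For each pair: sum over SNPs of (p_i - p_j)^2 divided by the sum of
    p_i(1 - p_j) + p_j(1 - p_i). Assumes each pair differs from fixation
    at some SNP (otherwise the denominator is zero).
    """
    p = np.asarray(p, dtype=float)
    k = p.shape[0]
    fst = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            num = np.sum((p[i] - p[j]) ** 2)
            den = np.sum(p[i] * (1 - p[j]) + p[j] * (1 - p[i]))
            fst[i, j] = fst[j, i] = num / den
    return fst

# Made-up frequencies for 3 components at 2 SNPs; components 0 and 2 are identical.
toy_p = [[0.2, 0.8], [0.8, 0.2], [0.2, 0.8]]
print(np.round(hudson_fst_matrix(toy_p), 3))
```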

Below is a dendrogram of the hierarchical clustering of these 15 components using complete linkage. Once again, I emphasize that tree-like representations of human variation should be taken as nothing more than a useful visualization of the data, as human populations did not evolve in a strictly tree-like fashion, but have experienced lateral gene flow.



The tree clearly shows the four major divisions of mankind, which are quite distinct from each other. From top to bottom: East Eurasians, West Eurasians, Australo-Melanesians, and Sub-Saharan Africans.
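A dendrogram of this kind can be built from a square Fst matrix with SciPy. This sketch assumes SciPy as the tool, since the post does not say what software was used. Changing `method` from `"complete"` to `"average"` gives the average-linkage variant. The four-component matrix is made up purely to show the mechanics.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def cluster_components(fst, labels, method="complete"):
    """Hierarchically cluster components from a square, symmetric Fst matrix.

    Returns the SciPy linkage matrix and the left-to-right leaf order.
    """
    condensed = squareform(np.asarray(fst, dtype=float))  # upper triangle as a vector
    z = linkage(condensed, method=method)
    # no_plot=True computes the layout without drawing anything
    leaves = dendrogram(z, labels=labels, no_plot=True)["ivl"]
    return z, leaves

# Hypothetical toy distances: A-B and C-D are close pairs.
toy = [[0.0, 0.1, 0.5, 0.5],
       [0.1, 0.0, 0.5, 0.5],
       [0.5, 0.5, 0.0, 0.1],
       [0.5, 0.5, 0.1, 0.0]]
z, leaves = cluster_components(toy, ["A", "B", "C", "D"], method="complete")
print(leaves)
```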

Once again, I emphasize that you should look at the table of Fst distances above, especially for closely related components. For example, the Mediterranean component is joined to the Red Sea component in the dendrogram, but the table of distances shows that it is marginally closest to the North European one (0.057), equidistant from the Red Sea and West Asian ones (0.062), and farther from the Indian one (0.084) and the Kalash isolate (0.092). Do not rely on lossy representations like dendrograms when you can examine the actual distances themselves.
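Reading the distances directly is straightforward. The sketch below uses only the five Mediterranean distances quoted above; the full row of the table has more entries, which are not reproduced here. Sorting them puts North European first, even though the dendrogram pairs Mediterranean with Red Sea.

```python
# Fst distances from the Mediterranean component, as quoted in the text.
mediterranean = {
    "North European": 0.057,
    "Red Sea": 0.062,
    "West Asian": 0.062,
    "Indian": 0.084,
    "Kalash": 0.092,
}

def nearest(distances, n=3):
    """Return the n closest components; ties are broken alphabetically."""
    return sorted(distances.items(), key=lambda kv: (kv[1], kv[0]))[:n]

for name, d in nearest(mediterranean, n=5):
    print(f"{name}: {d:.3f}")
```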

For completeness' sake, here is also a dendrogram of the hierarchical clustering using the average linkage method:

There is some internal re-arrangement of branches within the major races, and the Amerindian population becomes detached from the East Eurasians. Amerindians separated from East Eurasians fairly long ago, but their relationship to them is evidenced by the fact that their closest distances are to East Asians and Siberians.
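The cophenetic correlation is one way to put a number on how "lossy" a dendrogram is. It is the correlation between the original pairwise distances and the distances implied by the tree, where the implied distance between two leaves is the height at which they join. The post does not compute this. The sketch below uses SciPy's `cophenet` to compare the complete and average linkage trees on the same Fst matrix. A value of 1 means the tree reproduces the distances exactly. The toy matrix is made up.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def cophenetic_fit(fst, methods=("complete", "average")):
    """Cophenetic correlation between each linkage tree and the original distances."""
    condensed = squareform(np.asarray(fst, dtype=float))
    scores = {}
    for method in methods:
        z = linkage(condensed, method=method)
        c, _ = cophenet(z, condensed)  # c: correlation; _: tree-implied distances
        scores[method] = float(c)
    return scores

# Made-up distances that a tree can represent exactly (two tight pairs).
toy = [[0.0, 0.1, 0.5, 0.5],
       [0.1, 0.0, 0.5, 0.5],
       [0.5, 0.5, 0.0, 0.1],
       [0.5, 0.5, 0.1, 0.0]]
print(cophenetic_fit(toy))
```

On real Fst matrices with gene flow between branches, the two methods will generally score below 1, and comparing their scores indicates which tree distorts the distances less.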





Finally, here is an MDS plot of the 15 components based on the inter-component Fst distances:

The average Fst between the 15 components is 0.167. Notice that these are Fst distances between inferred ancestral populations, not between extant human populations. As such, they can be expected to be somewhat higher than conventionally reported Fst distances between human populations.

The largest inter-component Fst is the one between the Palaeoafrican ancestral population (Pygmies and San) and the Papuan one (0.346), followed closely by that between the Palaeoafrican and Amerindian components (0.333).
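The post does not say which MDS variant was used. Assuming classical (Torgerson) MDS, the coordinates and the average pairwise Fst can be computed from the 15 x 15 matrix as below. With 15 components there are 15 x 14 / 2 = 105 distinct pairs, and 0.167 is presumably the mean over them. The three-point matrix at the end is a made-up check, not data from this analysis.

```python
import numpy as np

def mean_pairwise(d):
    """Mean of the off-diagonal (upper-triangle) entries of a distance matrix."""
    d = np.asarray(d, dtype=float)
    return d[np.triu_indices_from(d, k=1)].mean()

def classical_mds(d, dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]       # indices of the largest eigenvalues
    # negative eigenvalues (non-Euclidean distances) are clipped to zero
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Made-up check: three points on a line at positions 0, 1 and 3.
line = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(mean_pairwise(line))  # 2.0
coords = classical_mds(line, dims=1)
```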

However, this maximum distance is also reflected in extant populations: guided by this analysis, I carried out a separate ADMIXTURE run using Papuans and Mbuti Pygmies from the HGDP set, arriving at Fst=0.377. This is probably not the limit of genetic differentiation within our species, though, as Australian Aborigines, who are one step further removed from Africa than Papuans, may be even more distant.

Downloads

For anyone interested in exploring this data further, I've made a RAR file of the ADMIXTURE plots at a better resolution, as well as the raw admixture proportions behind them.

This also includes a file of Fst distances between components, and information about the samples (note that ancestral populations are labeled Pop0, Pop1, etc., and 1, 2, etc. in the distance file included in the RAR).

This is part III of my series on human genetic variation; it is based on the same dataset as part I, "Human genetic variation: the first 50 dimensions", and part II, "Human genetic variation: 124+ clusters with the Galore approach".

At K=5, Australoids (Papuans and Melanesians) get their own cluster (pink), which shows some affinity with populations from South Asia.

At K=6, the East Eurasian cluster is split into a North Eurasian/Central Asian (light blue) one and an East Asian (pink) one.

At K=7, a South Asian (light blue) cluster emerges.

At K=8, the Caucasoid cluster is split into European-centered (orange) and West Asian-centered (light blue) components.

At K=9, the Mbuti and Biaka Pygmies and the San get their own cluster (Palaeoafricans), with the Biaka showing some admixture with other Sub-Saharan Africans.

At K=12, a Southeast Asian cluster (red) emerges, highest in the Malay and Cambodians, and well represented in Chinese ethnic minorities such as the Dai and Lahu. Notice that the East Asian component in Melanesians also becomes "red", linking them to the Austronesians.

At K=13, a blue and a purple cluster supplant the previous West Asian cluster, with the blue one spilling into East Africa and the purple one into South Asia.

At K=15, the Papuans and Melanesians are split into beige (?) and yellow population-specific clusters. Hence, the Melanesians (or at least the Nasioi from Bougainville, where the HGDP sample is from), who appeared at lower K to be associated with both Southeast Asians and Papuans, have actually acquired a genetic distinctiveness of their own. Even at K=15, we are far from exhausting the available structure in modern humans.