The origin of the domestic horse in East Asia has long puzzled archaeologists. Based on historical and archaeological records, two competing hypotheses have been proposed: external input and local origin. Multiple lines of evidence from history, archaeology and genetics, drawn from large-scale sampling, should help clarify the scenario of horse domestication in East Asia.

Testing the external input hypothesis

Archaeologists proposed the external input hypothesis because domesticated horses appear to have emerged later in East Asia than in Central Asia. The earliest undisputed domestic horse was found in Kazakhstan, dating to about 5500 BP [4]. In China, domestic horses did not appear widely at archaeological sites before the Late Shang Dynasty (3300–3046 BP) in the middle and lower Yellow River regions [45]. Horse remains excavated in Korea and Japan are even later than those in China [46]. However, the external input hypothesis is challenged by at least three archaeological observations. First, there is no strong evidence that domestic horses were absent before 3300 BP. Horse bones, although scarce and fragmentary, have been discovered at Pleistocene and Neolithic sites across China [20]. It is difficult to classify these bones as domestic or wild on morphological criteria, since horses have changed little physically as a result of domestication [47]. Horses were not offered as sacrifices until the Late Shang era [20], which is one reason why horse remains are sparse relative to other livestock in sacrificial places at sites predating the Late Shang Dynasty. In addition, the horse husbandry system established over 3000 years ago was strictly controlled by the royal regime [48]; as a result, horse remains are not found in typical tombs. Second, horse riding probably preceded driving [49], so dating horse domestication from the carriages excavated in archaeological pits is not an appropriate method. Third, horse remains have been unearthed in a few parts of southern China [50], yet only one study has examined in depth the ancient DNA of remains unearthed in northern China [51].
Because there is no systematic and reliable method for distinguishing domestic from wild horses using bone fragments, Chinese archaeologists have concentrated more on horse gear and carriages [52]. Comprehensive ancient DNA research on horse remains will help provide a full picture of the origin of the East Asian domestic horse.

Genetic research makes it possible to test the external input hypothesis, which has puzzled archaeologists and historians. If the genetic components of domestic horses in East Asia are entirely a subset of those descended from outside populations, the external input hypothesis is supported. Europe has been regarded as an important region for horse domestication because most haplotypes can be traced back to Europe [5]. However, only seven of the 23 native horse breeds in China were analyzed in that study. The Iberian Peninsula served as a refugium for wild horses in the Holocene [6]. The present study agrees with the view that some horse haplogroups originated in Europe and were then introduced eastward. Haplogroups I, L and N are prevalent in Europe (Fig. 4, Additional file 8: Table S8, Additional file 15: Figure S4). The association of mtDNA types with the western part of Eurasia (Central and West Asia and Europe) was extremely significant for haplogroup I (P = 3.96E-12), haplogroup L (P = 3.8E-17) and haplogroup N (P = 6.35E-10) under the Pearson χ2 test (Table 2). The F% value (the ratio of the number of unique haplotypes to the total number of haplotypes) of these three haplogroups is lower in East Asia (42.9–57.14%) than in Europe (52.94–72.7%) (Additional file 9: Table S9). Most ancient haplotypes of haplogroups I and L also trace back to the western part of Eurasia (Fig. 3). It is worth noting that three of the five predominant European haplotypes (sample size >40) are assigned to haplogroup L, whereas none belongs to haplogroup I. Thus, researchers infer a flow of haplogroup L from Europe into East Asia, an inference reinforced by nucleotide diversity following a West-to-East gradient (Additional file 9: Table S9).
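
As an illustration only, the kind of association test reported above can be sketched for a single haplogroup as a 2×2 Pearson χ2 test of region against haplogroup membership. The table layout, the absence of a continuity correction, the function name and all counts below are assumptions made for this sketch; they are not taken from Table 2 or Table S8.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (NOT the paper's data).
# Rows: region (west, east); columns: (in haplogroup, not in haplogroup).
west_in, west_out = 120, 880
east_in, east_out = 60, 940

stat, p = pearson_chi2_2x2(west_in, west_out, east_in, east_out)
print(f"chi2 = {stat:.2f}, P = {p:.3g}")
```

A small P value in such a table indicates only that haplogroup membership and region are not independent; the direction of the association is read from the counts themselves.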

Testing the local origin hypothesis

Historical and archaeological records do not exclude the possibility of a local origin of the horse in East Asia. Paleontological and historical research indicates that the preconditions for horse domestication, such as wild horses and vegetation resources, were sufficient in East Asia during the Pleistocene [24, 25, 45, 53]. This is why the Eastern steppes and the Iberian Peninsula were refugia for horses in the Holocene [6]. Dog and pig domestication in East Asia have been proposed to be influenced by rice-planting culture [26, 28]. By analogy, indigenous horse domestication is likely to have occurred in East Asia, since a flourishing grassland culture is evidenced in this region. Three Neolithic sites representing early grassland culture (Xinglongwa: 8000 BP; Zhaobaogou: 7000 BP; Hongshan: 6000–5500 BP) were discovered in Inner Mongolia, China [54, 55]. In addition, rock art depicting the herding and riding of horses, dated to before 3600 BP, has been found in the northern part of East Asia [20].

The view of a local origin of the horse in East Asia is reinforced by molecular evidence. If the horse was domesticated indigenously in East Asia, haplogroups dominated by East Asian samples, together with their specific genetic components, should be detectable. Central and West Asia, Europe and East Asia are three putative horse domestication regions [7]. As shown in Table S8, the sample sizes of the eastern part of Eurasia (East Asia, 1641 samples) and the western part of Eurasia (Europe and Central and West Asia, 1595 samples) do not differ significantly. Nevertheless, haplogroups OP, Q and R occur frequently in East Asia (Additional file 8: Table S8, Fig. 4, Additional file 15: Figure S4). Many haplotypes unique to East Asia were also observed in haplogroups OP, Q and R (Fig. 3 and Additional file 15: Figure S4). The ratio of the number of unique haplotypes to the total number of haplotypes (F%) of haplogroups OP, Q and R is at least about twice as high in the eastern part of Eurasia (52.94–72.7%) as in the western part of Eurasia (18.75–37.5%) (Additional file 9: Table S9). Moreover, ancient haplotypes of haplogroups Q and R trace back to East Asia (Fig. 3).
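
The F% statistic used above can be made concrete with a short sketch. It assumes that a "unique" haplotype is one observed in a single region only; the function name, the region keys and the haplotype labels are hypothetical and serve purely as illustration.

```python
def unique_haplotype_ratio(haplotypes_by_region, region):
    """F% for `region`: haplotypes seen only in that region, expressed as a
    percentage of all haplotypes seen in that region."""
    own = haplotypes_by_region[region]
    if not own:
        raise ValueError(f"no haplotypes recorded for region {region!r}")
    # Union of the haplotypes observed in every other region
    elsewhere = set().union(
        *(haps for name, haps in haplotypes_by_region.items() if name != region)
    )
    unique = own - elsewhere
    return 100.0 * len(unique) / len(own)


# Hypothetical haplotype labels for one haplogroup (NOT the paper's data)
example = {
    "east": {"Q1", "Q2", "Q3", "Q4"},
    "west": {"Q1", "Q5"},
}

for name in example:
    print(name, unique_haplotype_ratio(example, name))
```

Because the denominator is the number of haplotypes rather than the number of samples, F% is less sensitive to uneven sampling depth than a raw count of unique haplotypes, although it is not free of sampling effects.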

For additional supporting evidence, ancient domestic horse remains should be considered. A previous ancient DNA study of Chinese horses revealed that clade F (corresponding to the present haplogroups OP and Q) was present in samples older than 4000 years [51]. When the remains were assigned to haplogroups according to the specific mtDNA mutation motifs used in this study, ancient domestic horse remains of haplogroup Q were found more frequently in East Asia than in other regions. Although haplogroup OP is frequent in East Asia (Additional file 8: Table S8 and Additional file 9: Table S9), East Asia is more likely a refugium than an origin region for haplogroup OP, because its ancestral haplotypes are shared between West Asia and East Asia (Fig. 3). The study does not deny that horse domestication originated in the western part of the Eurasian steppe [7], but it suggests that haplogroups Q and R originated in East Asia.

To identify the most likely origin region of haplogroup Q, researchers analyzed the 18 unique haplotypes of this haplogroup. In contrast to the sample distribution of haplogroup Q between the SEA and the NEA (Additional file 8: Table S8), unique haplotypes from the SEA are numerically superior: nine of the 18 came from the SEA, six came from the NEA, and the remaining three were found in both. Hence the SEA was presumed to be the origin region of haplogroup Q. The nucleotide diversity of haplogroup R follows an East-to-West gradient (Additional file 9: Table S9), and the F% of haplogroup R is almost three times as high in the eastern part of Eurasia as in the western part. The study also analyzed the eight unique haplotypes of haplogroup R to identify its most likely origin region: five came from the SEA, two came from the NEA, and the remaining one was found in both. Thus researchers infer that haplogroup R originated in the SEA. However, the association of East Asian mtDNA types with haplogroup R was not significant under the Pearson χ2 test (P = 0.0969) (Table 2). One reasonable explanation is that gene flow narrowed the genetic difference between the SEA and NEA populations. More interestingly, one of the 17 haplotypes of early domestic horses reported to have become extinct during the last 5500 years [5] still survives in a horse population of southwest China (Dali, Yunnan Province).
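
For readers unfamiliar with the nucleotide diversity values behind the gradients discussed above, the following minimal sketch computes the statistic as the average number of pairwise differences per site. It assumes aligned sequences of equal length with no gaps or ambiguous bases; the function name and the toy sequences are hypothetical and unrelated to the study's data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of differences per site over all pairs of aligned
    sequences (assumes equal length, no gaps or ambiguity codes)."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    total_diffs = sum(
        sum(x != y for x, y in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy alignment (hypothetical, NOT mtDNA from the study)
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(round(nucleotide_diversity(toy), 4))
```

Computing this value separately for each region and comparing the results along a longitudinal transect is one simple way to look for the kind of East-to-West or West-to-East gradient mentioned in the text.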

Moreover, a few haplotypes unique to East Asia are separated from their ancestral haplotypes by several mutation steps (Fig. 3). These mutation steps more likely reflect introgression from local wild horses than evolution after domestication. Thus, hybridization has probably contributed new genetic components to the maternal lineages of East Asian domestic horses, although not predominantly.

Genetic and phylogeographic structure of the domestic horse in East Asia

The genetic and phylogeographic structure of the domestic horse in the NEA differs from that in the SEA, as supported by the PCA, haplogroup frequency distributions, network, contour maps of haplogroup frequencies, gene flow analyses, Pearson χ2 test and AMOVA. In the region-based principal component analysis, most haplogroups show different distributions in the NEA and the SEA, revealing a genetic difference between northern and southern East Asia (Fig. 2 and Additional file 16: Figure S5). This difference was confirmed by AMOVA: the variation between the NEA and the SEA is significant (P = 0.036), although it explains only a small share of the genetic variation (1.35%), suggesting that the NEA horse populations show some geographic clustering to the exclusion of the SEA horse populations. The haplogroup frequency distribution (Additional file 10: Table S10), the minimum spanning network (Additional file 14: Figure S3) and the contour map (Fig. 4) likewise show the genetic and phylogeographic difference between the NEA and the SEA. The frequencies of haplogroups I, N, OP and Q are higher in the NEA than in the SEA, with haplogroup N especially prevalent in the NEA, whereas the frequencies of haplogroups H, L, M and QR are higher in the SEA. According to the Pearson χ2 test, the test of independence between the NEA and the SEA populations was highly significant for haplogroups EFG, H, I, M, N, OP and R (Table 2). Another interesting result is that one of the 12 universally occurring haplotypes (haplotype14, sample size = 88) cannot be found in the SEA horse populations, which adds support to the genetic differentiation between the NEA and the SEA.
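
To show how an AMOVA can report a significant but small among-region component, the sketch below implements a simplified, single-level variance partition between groups, using the number of pairwise differences as the squared distance. This is an assumption-laden illustration rather than the analysis performed in the study: the actual hierarchy, distance model and permutation test are not reproduced, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def sum_of_squared_deviations(seqs):
    """SSD of a set of sequences, computed from pairwise squared distances."""
    return sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def amova_one_level(groups):
    """Single-level AMOVA. `groups` maps a group name to a list of aligned
    sequences. Returns (sigma2_among, sigma2_within, phi_st)."""
    sizes = [len(seqs) for seqs in groups.values()]
    n_groups, n_total = len(sizes), sum(sizes)
    if n_groups < 2 or min(sizes) < 1 or n_total <= n_groups:
        raise ValueError("need >= 2 non-empty groups and more samples than groups")
    all_seqs = [s for seqs in groups.values() for s in seqs]
    ssd_total = sum_of_squared_deviations(all_seqs)
    ssd_within = sum(sum_of_squared_deviations(seqs) for seqs in groups.values())
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Effective group size for unbalanced designs
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)
    sigma2_within = msd_within
    sigma2_among = (msd_among - msd_within) / n0  # may be negative in practice
    total = sigma2_among + sigma2_within
    if total == 0:
        raise ValueError("no variation among the sequences")
    return sigma2_among, sigma2_within, sigma2_among / total


# Toy data (hypothetical, NOT the study's sequences)
toy_groups = {
    "NEA": ["AAAA", "AAAT", "AAAA"],
    "SEA": ["TTAA", "TTAT", "TTAA"],
}
s2_among, s2_within, phi_st = amova_one_level(toy_groups)
print(f"among-group share of variance: {100 * phi_st:.1f}%")
```

In a full analysis the significance of the among-group component is usually assessed by permuting samples among groups, which is how a component as small as the 1.35% reported above can still differ significantly from zero when sample sizes are large.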

Based on the gene flow analysis (Additional file 7: Table S7), relatively large gene flow values were detected between the NEA populations and most non-EA horse populations, compared with the SEA populations. In addition, the ratio of the number of unique haplotypes to the total number of haplotypes is 44% in the NEA, and the proportion of individuals carrying the 12 universally occurring haplotypes (UT) is 39.3% (Additional file 11: Table S11). However, the NEA populations are relatively far from the non-EA populations in the PCA plots. Although various human activities have influenced horse gene flow in Eurasia for thousands of years [56, 57], researchers speculate that local origin and distinct genetic elements from wild horses were involved in the domestication of the NEA horse populations. A previous study analyzed ancient DNA from multiple remains unearthed in the NEA region [51], which supports this view to some extent.

In contrast to the NEA, gene flow between the SEA populations and most non-EA horse populations is relatively weak. The ratio of the number of unique haplotypes to the total number of haplotypes is 55% in the SEA, and the proportion of individuals carrying the 12 universally occurring haplotypes (UT) is 45.1% (Additional file 11: Table S11). Nevertheless, the SEA populations are relatively close to the non-EA populations in the PCA plots. Researchers consider that local origin and many distinct genetic components from wild horses were also involved in the domestication of the SEA horse populations, and that genetic barriers subsequently arose between the SEA populations and most non-EA horse populations because of the mountainous environment. Another interesting observation is that a sample from Dali (the SEA, Yunnan, China) carries one of the 17 extinct haplotypes mentioned in [5], indicating that gene flow between the SEA and non-EA horse populations took place at an early stage of domestication.