mtDNA control-region and Y-chromosome founder analyses

To investigate the genetic input into ISEA through time, we carried out founder analyses with both mitochondrial DNA (mtDNA) control-region sequences, for the maternal line of descent, and Y-chromosome variation using a 19 Y-STR dataset within SNP-defined lineages, for the paternal line of descent. Founder analysis is a quantitative phylogeographic approach developed to evaluate the diversity of lineage clusters that has arisen within a particular geographic sink region (in this case, ISEA), following migration from a specified (assumed) source region (in this case, MSEA/China/Taiwan). Converted to time depth using the molecular clock, these diversity values serve as a proxy for the minimum arrival age of each founder cluster in the sink (Richards et al. 2000).
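As a rough illustration of the dating step described above, the following sketch computes the ρ statistic (the mean number of mutations separating sampled lineages from their founder type) and converts it to an age with a fixed clock. The function name, the star-like standard error sqrt(ρ/n), the mutation counts and the calibration of 20,000 years per mutation are all assumptions made for illustration; they are not values or procedures taken from this study.

```python
import math


def rho_age(mutation_counts, years_per_mutation):
    """Return (rho, age_years, se_years) for one founder cluster.

    mutation_counts: number of mutations separating each sampled lineage
        from the founder sequence type.
    years_per_mutation: clock calibration, assumed constant here
        (a simplification; the calibration value is hypothetical).
    """
    n = len(mutation_counts)
    if n == 0:
        raise ValueError("need at least one lineage")
    rho = sum(mutation_counts) / n
    # Standard error under the assumption of a star-like genealogy,
    # where every lineage descends independently from the founder.
    se_rho = math.sqrt(rho / n)
    return rho, rho * years_per_mutation, se_rho * years_per_mutation


# Hypothetical cluster of eight lineages.
counts = [0, 0, 1, 0, 1, 0, 0, 2]
rho, age, se = rho_age(counts, years_per_mutation=20_000)
print(f"rho = {rho}, age = {age:.0f} +/- {se:.0f} years")
```

Because new mutations keep accumulating only after arrival, an age obtained this way is best read as a minimum arrival time for the cluster, in line with the interpretation given in the text.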

For maternal lineages, the 200-year scan of founder lineages dispersing into ISEA identified two major coalescence peaks (corresponding to bursts of immigration) under the two criteria we employed, f1 and f2 (Fig. 1a; Table S11), at 4.6–4.8 ka and at 8–10 ka. We also observed a slight hump at ~55 ka with the f2 criterion alone.

Fig. 1 Founder analysis results for ISEA, assuming Taiwan as source, for mtDNA (female lineages) and Y-chromosome variation (male lineages). a Probabilistic distribution of mtDNA founder clusters across migration times scanned at 200-year intervals from 0 to 70 ka, using two criteria for founder identification, f1 and f2; b probabilistic distribution of Y-chromosome founder clusters across migration times scanned at 200-year intervals from 0 to 70 ka, using two criteria for founder identification, f1 and f2; c proportion of founder lineages in a four-migration model for mtDNA and Y-chromosome variation using two criteria for founder identification, f1 and f2; d probabilistic distribution of each individual lineage in mtDNA and Y-chromosome variation in a four-migration model using two criteria for founder identification, f1 and f2. Individual founder clusters with more than 2 % frequency in overall ISEA (sink populations) are indicated at the left-hand side of each plot.

For Y-chromosome variation (Fig. 1b), we obtained very similar peaks with both criteria (one at 4–5 ka and a second at ~8 ka). Rather remarkably, the two main peaks in two different genetic systems with distinct mutation rates and estimated using two distinct founder criteria are consistent across each of these different analyses. In addition, we observed an increment representing very recent migrations and, with the f2 criterion, a further extra peak at 10–11 ka. This peak might signal the second well-defined episodic flood immediately after the Younger Dryas (Pelejero et al. 1999). We did not include it in the migration model, however, for two reasons: it was detected only under a single criterion and with one genetic system; and, in any case, founders at this peak will be included statistically in the ~8 ka migration that overall can be defined as postglacial migrations. The oldest arrivals here date to ~20 ka, largely haplogroup K and C lineages. This may well correspond with the ancient minor peak for mtDNA; we expect ρ dating with STRs to provide severe underestimates for ancient clades because of mutational saturation. However, for the present analysis this is a minor issue, since we are concerned primarily with events in the Holocene. Particularly in the case of K, an older age than this could be expected, considering that K probably evolved in the region since the first settlement as displayed by the high prevalence of K* and K subclades in the ancient Sahul populations, including Aboriginal Australians (Hudjashov et al. 2007).

We then partitioned the founders in ISEA using a migration model informed not only by the scan results in the two genetic systems, but also archaeological and palaeoclimatological evidence, to quantify the contribution of each immigration event to the extant mtDNA and Y-chromosome gene pools in ISEA. The model from mtDNA data here assumes migrations at 4.5, 8 and 50 ka, corresponding to Neolithic immigration, postglacial expansions and first Pleistocene settlement. We assumed a further dispersal at 0.5 ka to allow for any recent/historical gene flow.
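The probabilistic partition described above can be pictured with a minimal sketch, assuming a deliberately simplified model: the total number of mutations observed in a founder cluster is treated as Poisson-distributed with a mean proportional to the number of lineages and the candidate migration time, and each migration time receives equal prior weight. The function name, the Poisson model, the clock value and the example cluster are assumptions for illustration only and are not the exact procedure used in the analysis; only the four migration times come from the text.

```python
import math


def partition_founder(total_mutations, n_lineages, migration_times_ka,
                      ka_per_mutation):
    """Return the probability of each candidate migration time for one cluster.

    Assumes (hypothetically) that the total mutation count is Poisson with
    mean n_lineages * t / ka_per_mutation and a uniform prior over times.
    """
    log_like = []
    for t in migration_times_ka:
        mean = n_lineages * t / ka_per_mutation
        # Poisson log-probability of the observed mutation count.
        log_like.append(total_mutations * math.log(mean) - mean
                        - math.lgamma(total_mutations + 1))
    peak = max(log_like)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return [w / total for w in weights]


# Migration times of the mtDNA model in the text (ka).
times = [0.5, 4.5, 8.0, 50.0]
# Hypothetical cluster: 10 lineages carrying 4 mutations in total,
# with an illustrative clock of 20 ka per mutation.
probs = partition_founder(4, 10, times, ka_per_mutation=20.0)
for t, p in zip(times, probs):
    print(f"{t:>5} ka: {p:.3f}")
```

Summing such per-cluster probabilities, weighted by cluster frequency, would give the kind of overall contribution per migration that is reported for each criterion.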

For Y-chromosome variation, we used a more recent age of 20 ka to cover the more ancient migrations, as mentioned above. However, the matching of peaks at 4–5 ka and 8–10 ka for both the paternal and maternal line of descent is striking. The overall contribution at each proposed migration time for each of the two founder criteria in the mtDNA and Y-chromosome variation is shown graphically in Fig. 1c, d. The mtDNAs coalescing at the time of the first settlement (~50 ka) accounted for ~10 to 20 % of modern mtDNA lineages in ISEA. Note that many lineages from the ancient Sunda continent would very likely be present across both ISEA and MSEA, which were only finally separated by sea-level rise ~8 ka. However, MSEA is a source region in this analysis, so this value in the founder analysis corresponds to ancient lineages private to ISEA only. In the mtDNA analysis, lineages descending directly from the haplogroups carried by the first settlers correspond to M*, N*, R* and possibly haplogroup F3 (Fig. 1d). Although a recently published ancient mtDNA haplogroup E sequence (Ko et al. 2014) was used to suggest a Taiwanese source for this clade, an early origin in ISEA (Soares et al. 2008) remains more likely, as discussed below. At this ancient time-frame, Y-chromosome lineages (with STR ρ dating) are uninformative due to saturation, but haplogroups K* and even C may date to the first colonization at that time. These are above 30 % in the Y-chromosome analysis.

Overall, the migration at ~8 ka contributes the most lineages to the current gene pool of ISEA with a fraction of ~40–50 % in both mtDNA and Y-chromosome variation (Fig. 1c). We stress again that, statistically, this migration time could include lineages entering ISEA throughout the period of sea-level rises, from 14 to 8 ka, covering all three flooding episodes (Pelejero et al. 1999). This partition probabilistically includes major and well-studied haplogroups such as B4a1a (Soares et al. 2011), subclades of haplogroup E (Soares et al. 2008), F1a*, and subclades of haplogroup M shared between ISEA and MSEA, with B4a1a and E the major contributors. In Y-chromosome variation, this migration includes most clusters within haplogroups O2a1 and O3 and a subclade of O1a (Fig. S4), matching to some extent the results of Karafet et al. (2010) indicating that O1a* entered ISEA before the Neolithic. We should note that in our recent Y-chromosome survey (Trejaut et al. 2014), O2 and O3 clades declined in frequency moving north from ISEA towards Taiwan, the opposite of what one might expect from an “out-of-Taiwan” movement. A previous survey (Karafet et al. 2010) also suggested that O3, O2a1 and O1a* entered ISEA from the mainland before the Neolithic period.

The contribution at the time of the Neolithic, at 4–5 ka, varied with the criterion and the genetic system, but 25–35 % is probably the best estimate (Fig. 1c). (The f1 criterion in mtDNA probably overestimates recent migration due to the large size of the source sample used.) Only one major founder presented significant differences between the analyses: B4b appears Neolithic under the f1 criterion but falls within the postglacial migration under the f2 criterion (Fig. 1d). This haplogroup deserves further attention in the future. The widely held model for the spread of the Neolithic in ISEA implicates expanding pre-Austronesian/Austronesian speakers from South China/Taiwan (Bellwood 1997), but in fact not all of the Neolithic founders we identify support this hypothetical “out-of-Taiwan” dispersal. A large fraction of Neolithic mtDNA founder clusters from haplogroups B5a1 and F1a1a (~10 % out of the 25–35 % Neolithic lineages in the analysis) appear to have originated in MSEA, and are rare or absent in both Taiwan and the Philippines.

Our results therefore suggest that mid-Holocene Neolithic immigration into ISEA was in part via MSEA, temporally associated with spread of basket-marked and carved paddle-impressed pottery, which appeared across MSEA as early as red-slipped pottery appeared in Taiwan (Bulbeck 2011), and possibly involving speakers of Austroasiatic languages (i.e. Anderson’s “Neolithic I”) (Anderson 2005). The mtDNA haplogroups M7c3c, Y2, F1a4a, B4c1c and possibly B4b (which shows contrasting patterns under the two criteria) may, however, represent genuine “out-of-Taiwan” clades in ISEA. These founders are all derived from Chinese-mainland source haplogroups, and within Austronesian-speaking populations they have a higher overall frequency in Taiwan and the Philippines (Fig. 2a). This input, at ~20 %, lends support to a modified, small-scale “out-of-Taiwan” model [Anderson’s “Neolithic II” (Anderson 2005; Donohue and Denham 2010)], proposed to explain the appearance of red-slipped pottery in relation to the early dispersal of Austronesian languages.

Fig. 2 Frequency map of probable Neolithic markers (lineages argued to track one or other of the dispersals associated with Neolithic ceramics) in mtDNA and genome-wide data. a Pooled frequency of candidate “out-of-Taiwan”, “Neolithic II” mtDNA haplogroups, based on founder analysis. b Possible “out-of-Taiwan”, “Neolithic II” component in the genome-wide data when considering 10 ancestral populations in the ADMIXTURE analysis. c Pooled frequency of candidate MSEA “Neolithic I” haplogroups in ISEA. d Possible MSEA “Neolithic I” component in the genome-wide data when considering 10 ancestral populations in the ADMIXTURE analysis. The outline map was obtained from http://www.outline-world-map.com

On the male line of descent, the Neolithic contribution is lower (15–20 %) but, since MSEA is not represented in the Y-chromosome dataset, all these Neolithic founders are likely to represent the putative “out-of-Taiwan” dispersal, mirroring closely the ~20 % “out-of-Taiwan” founders for mtDNA. Most of O1a and all of O1a2 likely represent signals of Neolithic migrants from Taiwan, confirming earlier suggestions (Karafet et al. 2010; Trejaut et al. 2014). A portion of O3a (~10 % in the f1 criterion) was also partitioned into the Neolithic in our analysis.

Corroboration of founder analyses with genome-wide evidence

We next compared these results with patterns observed in autosomes, using genome-wide data from the Pan-Asian SNP Genotyping Database (Abdulla et al. 2009; Ngamphiw et al. 2011) and the ADMIXTURE software.

At a more basal level, the first that seems potentially valid both anthropologically and genetically (K = 5, which includes African, South Asian and Near Oceanian components in purple, white and blue) (Fig. 3a), the East Asian autosomal data separate into a Southeast Asian component (green) with a focus on the ancient Sunda continental shelf (MSEA, Sumatra, Java and Borneo) that reaches ~80 % around Borneo and drops in frequency as one moves north, and a Chinese/Northeast Asian component (red), which varies between 100 and 60 % in mainland China. Frequencies of the latter in Taiwan (~30 %) and Southeast Asia (5–30 %) match the mtDNA picture of Neolithic-age Chinese gene flow into ISEA (Fig. S8; cf. Fig. 2a). It is, however, difficult to directly connect a given component in ancestry analysis with a given demographic occurrence. One could calculate the time of admixture, but admixture ages are not necessarily indicative of time of migration (Lipson et al. 2014). In addition, the ages calculated are sometimes dubious and under-estimated, as the estimated time of split between Europeans and New Guineans suggests (Wollstein et al. 2010).

Fig. 3 Reconstruction of ancestry in Asian populations using ADMIXTURE. Considering a five ancestral populations (K = 5) and b 10 ancestral populations (K = 10)

Analyses from K = 6 to K = 15 generate additional components by further sub-dividing these Northeast and Southeast Asian components, whilst maintaining the African, South Asian and Melanesian/Near Oceanian components intact across the analyses. The autosomal estimate with ten ancestral populations, theoretically the best estimate of ancestry for the data as it has the lowest cross-validation error, includes seven components with discernible frequencies in at least one location in Austronesian-speaking populations (Fig. S9). The South Asian component (white) is present at low frequencies only in the Malay Peninsula and Sumatra, matching the historical record (Manguin et al. 2011). The Northeast Asian component (red) is seen at appreciable frequency only in the Philippines (at only very low rates). A Near Oceanian component (pale blue) dominates many of the populations of Eastern Indonesia, as expected. Two minor components (dark green and yellow) are virtually specific to ISEA, mainly in what was western Sundaland (Java/Sumatra/Malay Peninsula), with one (yellow) markedly elevated in Aboriginal Malays.

One important component (grey) is both specific for Austronesian-speaking populations and highly frequent across ISEA (Fig. S9). It reaches 60–70 % in the two aboriginal Taiwanese groups in the sample—the equivalent cluster in Pan-Asian SNP data approaches 100 % (Abdulla et al. 2009)—peaking in our dataset in the Philippines, Sumatra and Sulawesi (70–90 %), and is virtually absent from Continental Asia, suggesting an insular origin. Comparison between the analyses with five and ten ancestral populations also suggests that this was part of the larger Southeast Asian component in the former. Considering the major postglacial signal observed in mtDNA and Y-chromosome variation in both our founder analysis and in earlier analyses (Hill et al. 2007; Karafet et al. 2010; Soares et al. 2008, 2011; Trejaut et al. 2014), and the sharing of many lineages between ISEA and Taiwan (Soares et al. 2008, 2011), this autosomal component may correspond to an ancestral cluster common to both Taiwan and ISEA that was established before the hypothetical dispersal of Austronesian. Even if we consider that there is likely a signal of Austronesian expansion “out-of-Taiwan” in the genome-wide data (see below), this component, which is most frequent in Taiwan, the Philippines, the Mentawai Islands and Sulawesi, disparate islands at opposite extremes of the Sunda shelf, could explain why a maximum likelihood population tree of the Pan-Asian SNP data indicated Taiwan as an offshoot of ISEA diversity (Abdulla et al. 2009). Such population trees only depict broad patterns and, although a minor component could show an ancestry in Taiwan when compared with ISEA, the most frequent component could show the overall opposite ancestry.

Two autosomal components that might signal Neolithic dispersals can be compared with the patterns obtained from Neolithic founder candidates in the mtDNA analysis. One of these components (paler green in Fig. 3b) is frequent in MSEA/Southwest China (up to ~70 %) and varies from 5 to 40 % in Indonesia (Fig. 2d), but is absent from Taiwan and rare in the Philippines. It is probably a relatively recent arrival as it is not evenly distributed across ISEA. The MSEA Neolithic candidates in the mtDNA also show a strong peak of frequency in MSEA and frequencies of 5–30 % in Indonesia, but are rare in the Philippines and Taiwan (Fig. 2c). We can also match these distributions with the presence of basket-marked and carved paddle-impressed pottery: in Sarawak, assemblages at ~4.5 ka with carved cord-wrapped or basketry-wrapped paddle-impressed pottery (Bellwood 1997; Bulbeck 2008) show the influence of an early Neolithic from MSEA in Western Indonesia.

The final component (dark blue in Fig. 3b) has a high frequency in South China (Fig. 2b) and is also seen in Taiwan at ~25–30 %, in the Philippines at ~20–30 % (except in one location which is almost zero) and across Indonesia/Malaysia at 1–10 %, declining overall from Taiwan within Austronesian-speaking populations. The mtDNA candidates for “out-of-Taiwan” markers (Fig. 2a) also show an overall frequency of up to ~35 % in Taiwan and the Philippines, but are almost absent in parts of Borneo, Java and Eastern Indonesia. Sumatra superficially presents a more discordant picture between genome-wide and mtDNA results, but the sampling of the Pan-Asian SNP dataset involves only Batak people whilst our mtDNA sampling involved the wider Sumatran population. We should also bear in mind that the genome-wide sampling lacks major areas of ISEA, including the whole of Borneo.

Therefore, the overall picture from the ADMIXTURE analysis with 10 ancestral populations, where the cross-validation error was lowest, is concordant with the mtDNA and Y-chromosome pattern, with a minor Neolithic input from MSEA, probably immediately preceding a Neolithic input from Taiwan (Anderson 2005) that had a strong demographic impact in the Philippines, but a much more minor genetic input elsewhere in the Indo-Malaysian Archipelago.

Confirmation with whole-mtDNA genome data

Although providing much larger sample sizes, the low phylogenetic resolution of mtDNA HVS-I data can create problems for phylogeographic analyses such as founder analysis, for example by conflating distinct founders. In parallel, we therefore checked the phylogeographic signal with the much better resolved whole-mtDNA genomes for the major “out-of-Taiwan” haplogroup in the founder analysis, M7c3c. In particular, we wished to compare the results for M7c3c with the two putative postglacial signals for haplogroups E and B4a1a (Soares et al. 2008, 2011).

Haplogroup M7 dates to just over 50 ka. An overall mainland Eastern Asian distribution is clear for the M7 phylogeny (Fig. 4; full tree in Supplementary Material 2). There are two basal branches: M7a, which displays a strong Northeast Asian ancestry centred on Japan and Korea, and a second major clade encompassing M7b, M7c, M7d, M7e, M7f and M7g, which we refer to as M7b′c′d′e′f′g. The latter splits into two further major subclades, M7b′d′g and M7c′e′f, both with an East Asian ancestry.

Fig. 4 Schematic tree of haplogroup M7. The tree is scaled using maximum likelihood and a time-dependent molecular clock for whole-mtDNA genomes

The overall phylogenetic and phylogeographic pattern is strikingly clear: both aboriginal Taiwanese and Island Southeast Asian-specific lineages are close to the tips of an overall mainland Eastern Asian distribution. The major subclade of M7b3, M7b3a, is only present in Taiwan and ISEA. It is frequent in Taiwan (at ~10 %) and considering its age (~6 ka) seems likely to have arrived in Taiwan with the rice Neolithic from South China; but it is vanishingly infrequent across ISEA. In M7b1, M7b1d3 is also restricted to Taiwan, and with a similar age may also have arrived from China with the Neolithic, but again it is virtually absent from ISEA.

In M7c′e′f, the three subclades branch from a single node and all show evidence of East Asian ancestry. Within M7c, M7c3 is by far the most frequent and the only one to disperse significantly into Taiwan and ISEA. This clade probably had an origin in South China, with several subclades also present in Taiwan. Its major subclade, M7c3c [M7c1c in Hill et al. (2007)], here re-dated with whole-mtDNA genomes to ~5 ka, is restricted to Austronesian-speaking populations (both Taiwan and ISEA). Given the presence of other subclades of M7c3 in Taiwan and South China, the most probable source for M7c3c is in Taiwan (amongst M7c3 arrivals from China, again perhaps with the rice Neolithic), with subsequent dispersal into ISEA. Several subclades of M7c3c exist throughout Taiwan and ISEA, and there is also one in the Pacific (M7c3c2, found in both Micronesia and the Solomon Islands), dating to less than 3 ka. This pattern confirms M7c3c as a strong candidate for an “out-of-Taiwan” marker, as indicated by the HVS-I founder analysis.

We can contrast this distinctive pattern with the distribution of haplogroups B4a1a and E, both of which are—like M7c3c—largely restricted to insular, Austronesian-speaking populations. For that reason they have been proposed as candidates for “out-of-Taiwan” markers, but neither shows a direct ancestry in South China. We propose here a set of phylogeographic parameters that we expect to see fulfilled in a clear-cut “out-of-Taiwan” marker:

(a) If the haplogroup was carried into Taiwan from South China by rice-agriculturists ~6 to 8 ka, the dispersal’s timing should be bracketed by the age of the ancestral clade seen in South China (upper bound) and the insular Austronesian-specific subclade (lower bound);

(b) the insular and Austronesian-specific subclade should date to after the arrival of rice-agriculturists from China ~5.5 ka, but before the “out-of-Taiwan” migration ~4.5 ka;

(c) the founder age in ISEA for the subclade should date to ~4.5 ka, the time of the “out-of-Taiwan” dispersal;

(d) the founder age from Taiwan/Philippines to the rest of ISEA should be lower than the date of the “out-of-Taiwan” migration, ~4 ka; and

(e) the expansion of the clade in Taiwan should predate the expansion in ISEA.

We evaluated each of these points in turn (Fig. 5; Table S12; note that taking into account mutation-rate uncertainty, as documented in Table S12, does not alter the conclusions). First, we consider the ML ages of key subclades, then founder ages, and finally Bayesian skyline plot (BSP) expansion time estimates. Regarding (a), B4a1a appears in Austronesian-speaking populations between 14.7 [11.0; 18.5] ka, the age of the continental ancestral clade B4a1, and 9.9 [5.5; 14.5] ka, the age of B4a1a; haplogroup E appears between 39.2 [26.9; 52.0] ka, the age of ancestral M9, and 24.0 [14.5; 33.9] ka; and M7c3c appears between 11.8 [3.9; 20.2] ka, the age of M7c3, and 5.2 [4.0; 6.5] ka. Only M7c3c clearly fits an arrival in Taiwan in line with the “out-of-Taiwan” model. B4a1a cannot be completely ruled out from these estimates, given the 95 % confidence interval of the age estimate, but it is nevertheless very unlikely (Fig. 5a, b).

Fig. 5 Phylogeographic patterns in haplogroups M7c3c, E and B4a1a1. a ML ages of key clades in the test for an “out-of-Taiwan” pattern; ρ founder ages from Taiwan into ISEA; ρ founder ages from Taiwan and the Philippines into the rest of ISEA. b Detailed view of the most relevant time-frame for the data in a. c–e Increments in expansion of haplogroups B4a1a (c), E (d) and M7c3c (e), measured from Bayesian skyline plots as effective population size change per 100 individuals per 100 years, in Taiwan and ISEA

Point (b) stipulates that the insular subclade should originate after the hypothetical arrival of rice-agriculturists in Taiwan and before the dispersal “out-of-Taiwan”. M7c3c, at 5.2 [4.0; 6.5] ka, follows this pattern; B4a1a, at 9.9 [5.5; 14.5] ka, and haplogroup E, at 24.0 [14.5; 33.9] ka, both suggest an earlier origin within currently Austronesian-speaking populations.

Taking point (c), an average founder age for M7c3c from Taiwan into ISEA is 4.4 [3.2; 5.7] ka, matching the 4.5 ka prediction of the “out-of-Taiwan” model. Haplogroups E and B4a1a yield 8.8 [6.0; 11.6] ka and 7.3 [5.2; 9.4] ka, respectively, suggesting earlier postglacial expansions. When including the Philippines along with Taiwan as part of the source for the dispersal, as in point (d), the founder for haplogroup M7c3c dated a little lower at 4.2 [2.5; 5.9] ka, a striking match to the hypothetical Austronesian arrival in the Indo-Malaysian archipelago. Haplogroup E, by contrast, yielded 6.4 [4.8; 8.0] ka, and the B4a1a point estimate actually increased to 8.5 [4.8; 12.3] ka, when compared with the previous founder age estimate into ISEA as a whole, clearly indicating that the “out-of-Taiwan” assumption of the founder model in this case is likely to be false.
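The comparisons for points (b) and (c) can be restated as a small check, sketched below. The ages and intervals are copied from the text; the decision rules (a point estimate falling inside the 4.5–5.5 ka window for (b), and a 95 % interval containing 4.5 ka for (c)) are simplifying assumptions introduced here for illustration and are not the formal test applied in the analysis.

```python
# (point estimate, lower bound, upper bound) in ka, as quoted in the text.
SUBCLADE_AGE = {
    "M7c3c": (5.2, 4.0, 6.5),
    "B4a1a": (9.9, 5.5, 14.5),
    "E": (24.0, 14.5, 33.9),
}
FOUNDER_AGE_INTO_ISEA = {
    "M7c3c": (4.4, 3.2, 5.7),
    "B4a1a": (7.3, 5.2, 9.4),
    "E": (8.8, 6.0, 11.6),
}


def meets_b(age, window=(4.5, 5.5)):
    """Assumed rule for (b): the point estimate lies inside the window (ka)."""
    point, _, _ = age
    return window[0] <= point <= window[1]


def meets_c(age, target=4.5):
    """Assumed rule for (c): the interval contains the target age (ka)."""
    _, lower, upper = age
    return lower <= target <= upper


results = {
    name: (meets_b(SUBCLADE_AGE[name]), meets_c(FOUNDER_AGE_INTO_ISEA[name]))
    for name in SUBCLADE_AGE
}
for name, (b_ok, c_ok) in results.items():
    print(f"{name}: (b) {b_ok}, (c) {c_ok}")
```

Under these rules only M7c3c passes both checks. The B4a1a subclade interval touches the upper edge of the window for (b), which is consistent with the remark that it cannot be completely ruled out on the age estimates alone.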

Finally (e), we used BSPs to estimate the expansion time of each haplogroup. Figure 5c–e shows the increment or rate of expansion (corresponding skyline plots in Fig. S10; data in Table S13). The B4a1a data for Taiwan and ISEA (Fig. 5c) suggest a very similar time of expansion, starting ~10 ka (with a second expansion restricted to Taiwan ~2000 years ago). However, haplogroup E expanded in ISEA before Taiwan (Fig. 5d), starting ~8 ka for ISEA and ~7 ka for Taiwan. Finally, for M7c3c (Fig. 5e) we see a first expansion in Taiwan starting ~7.5 ka, peaking at 5.2 ka, while for ISEA the expansion starts later at 5.2 ka with a peak at ~4 ka, corresponding closely to the “out-of-Taiwan” model.

Therefore, haplogroup M7c3c meets all the criteria expected for an “out-of-Taiwan” marker, whereas haplogroups E and B4a1a meet none of them. Yet a haplogroup E lineage recently recovered from human remains in the Strait of Taiwan, dating to ~8 ka, evidently represents a sequence ancestral to the E1 subclade, leading Ko et al. (2014) to suggest an origin of haplogroup E ~10 ka ago in China or Taiwan and a Neolithic migration into ISEA (based on a Bayesian analysis). This compares with our estimate for the age of haplogroup E with the time-dependent clock (Soares et al. 2009) of ~24 ka (Fig. 5). Previous age estimates based on the time-dependent clock and Bayesian ancient DNA calibrations do not differ to this extent (Fu et al. 2013b), despite some claims to the contrary. The authors of one recent estimate based on several ancient sequences claim that their estimated rate is 45 % faster than the one we estimated (Brotherton et al. 2013), but this arises from their comparing their estimated rate with our inter-specific phylogenetic rate rather than the time-dependent rate. For the time-frame of the European Neolithic and Bronze Age with which they were concerned, our curve indicates a mutation rate of 2.307 × 10⁻⁸ substitutions per site per year for the time of 6.15 ka (their oldest sample), only 4 % slower than the one they estimated. The difference would be even less for the age of their other, younger samples.

Here, indeed, we estimate an age for haplogroup E of 29.7 [18.5; 43.9] ka and an average mutation rate of 2.041 × 10⁻⁸ [1.54 × 10⁻⁸; 2.48 × 10⁻⁸] substitutions per site per year using a Bayesian estimate with two additional East Asian ancient DNA sequences. Given that the root of haplogroup E is seven mutations from the root of the “out-of-Africa” haplogroup M (Macaulay et al. 2005; Mellars et al. 2013) which has an average branch length to the present-day (~50,000 years) of ~20 mutations, age estimates for E more recent than ~20 ka seem implausible.
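The branch-length argument above can be made explicit with a back-of-envelope calculation, sketched here under the assumption of a strictly linear clock (the analysis itself uses a time-dependent clock, so this is only an order-of-magnitude check). The function name is hypothetical; the three input numbers are the approximate figures quoted in the text.

```python
def linear_clade_age(root_depth_mutations, total_branch_mutations,
                     root_age_years):
    """Age of a nested clade under an assumed strictly linear clock.

    root_depth_mutations: mutations between the parent root and the clade root.
    total_branch_mutations: average mutations from the parent root to the present.
    root_age_years: age of the parent root.
    """
    remaining = total_branch_mutations - root_depth_mutations
    return root_age_years * remaining / total_branch_mutations


# Haplogroup E root: ~7 mutations below the root of M, which averages
# ~20 mutations to the present over ~50,000 years.
age_e = linear_clade_age(7, 20, 50_000)
print(f"approximate age of haplogroup E: {age_e / 1000:.1f} ka")
```

On these rounded inputs the linear approximation gives an age in the low 30s of ka, which lies inside the Bayesian interval quoted above and well above 20 ka.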

Involving haplogroup E in a wide-scale Neolithic dispersal across and out of mainland China also ignores the evidence that haplogroup E is restricted to the off-shore islands and has never been seen in any extant Chinese populations. Its age of >20 ka and insular distribution rather suggest an origin on the eastern side of the Sunda shelf. Although the early Holocene haplogroup E sequence creates a deeper link within E1, extant haplogroup E diversity nevertheless remains deeper in ISEA, for both E1 and E2 (Soares et al. 2008). Moreover, a large mtDNA survey of aboriginal Taiwanese groups, which probably diverged early in Austronesian history, but were subsequently isolated and experienced drift very differently from other Austronesian populations, failed to detect any novel haplogroup E diversity, finding the same sub-set of ISEA diversity (Ko et al. 2014). The 8-ka age of the sample would place it in a period of intense postglacial expansions, due to huge sea-level changes resulting from global warming, and might be better explained as an offshoot from the south, where many lineages were lost in the postglacial period. We would caution against drawing strong conclusions from a single sample. Nevertheless, regardless of its point of origin, our analyses show that haplogroup E most probably expanded in ISEA well before the Neolithic period.