For each position within these regions, we determined whether the Neandertal, A00, or both differed from the human reference sequence. We then used the corresponding chimpanzee allele as a proxy for the ancestral state in order to assign the mutation to the appropriate branch of the tree relating the four sequences ( Figure 1 A). In doing so, we discarded five sites: two at which the chimpanzee carries a third allele, one for which the chimpanzee carries a deletion, and two that were specific to A00 but only supported by a single read. Excluding these sites had little impact on our analyses.

(B) Counts of SNVs consistent with each branch. Columns refer to sets of coordinates considered (see Materials and Methods ). Incompatible sites are those that cannot be explained by a single mutation on any of the three trees.

(A) A priori, three trees could feasibly have related the Y chromosomes of the chimpanzee (Chimp), the Neandertal (Neander), haplogroup A00, and the human reference (Ref). Mutations on branch a support topology i, with the Neandertal lineage as the outgroup to those of modern humans, whereas mutations on branches b and c support topologies ii and iii, respectively. Branches d, e, and f correspond to mutations private to individual lineages.

Using the blastz file chrY.hg19.panTro4.net.axt.gz,we identified the subset of regions within which the human sequences align to the chimpanzee reference. This yielded a total of 118,643 base pairs (bp). In what follows, we refer to this set of sites as “filter 1.” We also identified a second, more restrictive, set of regions totaling 100,324 bp, “filter 2,” by further requiring that the alignment correspond to the chimpanzee Y chromosome rather than to another chimpanzee chromosome (Tables S1A and S1B).

We then identified overlapping regions and excluded coordinates with unusually high coverage, filtering out sites with coverage greater than the mean plus five times its square root ( Figure S1 ). Under a Poisson model, this cutoff would elicit the loss of less than one genuine site per 10,000. Finally, we removed sites with inconsistent base calls, discarding those with more than two reads differing from the consensus allele and those for which more than one third of the observed bases did not match the consensus. This filter should minimize the effects of postmortem DNA damage and of modern contamination.

We used the Y-chromosome sequences from the exome capture of a Neandertal from El Sidrón, Spain,and we downloaded the complete sequences of two A00 Y chromosomes.The Neandertal data included coding, non-coding, and off-target sequences, and all three sequences were mapped against the GRCh37 reference.Given that the A00 sequences were closely related,we merged them to increase coverage. We called bases for both the Neandertal and A00 sequences by using SAMtools mpileup (v.1.1),specifying input options to count anomalous read pairs (-A), recalculate base qualities (-E), and filter out poor-quality bases (-Q 17) and poorly mapping reads (-q 20).

Estimating TMRCA

NR ), we decomposed this quantity ( AR ) and the time separating the most recent common ancestor of modern humans from its common ancestor with the Neandertal lineage (T NM ): T NR = T AR + T NM = α T AR α ≡ ( 1 + T NM T AR ) .

We then estimated T AR and used two methods to estimate α. Figure 2 Estimating the TMRCA of Neandertal and Modern Y Chromosomes Show full caption NR = T NM + T AR . Branches are labeled as in The quantity of primary interest is T= T+ T. Branches are labeled as in Figure 1 , and “M” denotes the most recent common ancestor of modern human lineages. To estimate the TMRCA of the Neandertal and modern human Y chromsomes (T), we decomposed this quantity ( Figure 2 ) into the sum of the TMRCA of modern humans (T) and the time separating the most recent common ancestor of modern humans from its common ancestor with the Neandertal lineage (T):We then estimated Tand used two methods to estimate α.

AR , we used sequence data from the ancient Ust’-Ishim sample, 9 Fu Q.

Li H.

Moorjani P.

Jay F.

Slepchenko S.M.

Bondarev A.A.

Johnson P.L.F.

Aximu-Petri A.

Prüfer K.

de Filippo C.

et al. Genome sequence of a 45,000-year-old modern human from western Siberia. 19 Poznik G.D.

Henn B.M.

Yee M.-C.

Sliwerska E.

Euskirchen G.M.

Lin A.A.

Snyder M.

Quintana-Murci L.

Kidd J.M.

Underhill P.A.

Bustamante C.D. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. 20 Skaletsky H.

Kuroda-Kawaguchi T.

Minx P.J.

Cordum H.S.

Hillier L.

Brown L.G.

Repping S.

Pyntikova T.

Ali J.

Bieri T.

et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. 21 Rasmussen M.

Anzick S.L.

Waters M.R.

Skoglund P.

DeGiorgio M.

Stafford Jr., T.W.

Rasmussen S.

Moltke I.

Albrechtsen A.

Doyle S.M.

et al. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. AR as well as for the mutation rate and the TMRCA of haplogroup K-M526 ( To estimate T, we used sequence data from the ancient Ust’-Ishim sample,first applying the filters described for the A00 sequences. To reduce the potential impact of postmortem DNA damage, we restricted this analysis to coordinates covered by at least three sequencing reads. We further restricted to the subset of Poznik et al.regions in which the human reference sequence is based on bacterial artificial chromosome clones derived from the RP-11 individual,a known carrier of haplogroup R1b. This left ∼7.83 Mb of sequence within which to assign variants to the appropriate branches ( Figure S2 Appendix A ). Using the known age of the Ust’-Ishim individual and the constrained optimization procedure described in Rasmussen et al.,we obtained parametric bootstrap estimates for Tas well as for the mutation rate and the TMRCA of haplogroup K-M526 ( Appendix A ). Briefly, we sampled from the process that generated the observed tree ( Figure S2 ) by simulating the number of single nucleotide variants (SNVs) on each branch as a Poisson draw with mean equal to the observed number of mutations. To obtain bootstrap samples of the three parameters, we maximized their joint likelihood for each tree replicate.

T a T a + T d + T e = T NM T NM + 2 T AR = ( α − 1 ) T AR ( α − 1 ) T AR + 2 T AR = α − 1 α + 1 .

In our first approach to estimate α, we used the relative numbers of mutations assigned to branches a, d, and e ( Figure 1 ), assigning the four sites that did not fit the consensus topology to the A00 or reference lineages, as appropriate ( Appendix A ). The proportion of time represented by branch a is:

Therefore, assuming a time-homogeneous mutation rate, the number of branch-a mutations is binomially distributed with parameters p = (α – 1) ∕ (α + 1) and n equal to the total number of mutations. Estimating p from the data leads directly to a point estimate and confidence interval (CI) for α. This first method has the appealing property that it is independent of both the mutation rate and the absolute values of the times. However, the estimation error might be suboptimal due to uncertainty in both the numerator and the denominator.

T NM / T AR , making use of the fact that we can estimate T AR with greater certainty than we can T NM . To estimate T NM , we restricted our attention to sequences overlapping the ∼8.8 Mb of sequence analyzed by Karmin et al., 15 Karmin M.

Saag L.

Vicente M.

Wilson Sayres M.A.

Järve M.

Talas U.G.

Rootsi S.

Ilumäe A.-M.

Mägi R.

Mitt M.

et al. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. T ˆ NM = s / ( l r μ ) . Similarly, let L equal the subset of the 8.8 Mb for which the A00 sequence had 3× or greater coverage (also ∼8.8 Mb), and let S equal the number of mutations unique to either the reference sequence or to A00 over the entire 8.8 Mb. We can estimate T AR with T ˆ AR = S / ( 2 L μ ) and α with: α ˆ = ( 1 + T ˆ NM T ˆ AR ) = ( 1 + 2 s L r ˆ S l ) .

In the second method, we estimated α via the ratio, making use of the fact that we can estimate Twith greater certainty than we can T. To estimate T, we restricted our attention to sequences overlapping the ∼8.8 Mb of sequence analyzed by Karmin et al.,leaving 80,420 bp (“filter 3”) or 75,596 bp (“filter 4”) when confining our analysis to those sites that passed filter 2 ( Figure 1 , Tables S2a and S2b). Let l equal the total length of sequence under consideration (e.g., 80.42 kb), let μ equal the mutation rate over the full 8.8 Mb, let r equal the ratio of the mutation rate within the smaller region to that of the larger, and let s equal the number of mutations shared by A00 and the reference sequence within the smaller region. With these, we constructed the estimator. Similarly, let L equal the subset of the 8.8 Mb for which the A00 sequence had 3× or greater coverage (also ∼8.8 Mb), and let S equal the number of mutations unique to either the reference sequence or to A00 over the entire 8.8 Mb. We can estimate Twithand α with:

15 Karmin M.

Saag L.

Vicente M.

Wilson Sayres M.A.

Järve M.

Talas U.G.

Rootsi S.

Ilumäe A.-M.

Mägi R.

Mitt M.

et al. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. We estimated r by comparing the number of mutations unique to a single branch of the Y-chromosome tree of Karmin et al.,both within the full 8.8-Mb region and within the ∼80-kb subset. These numbers, 32,853 and 279 (238 under filter 4), respectively, correspond to a relative mutation rate of 0.93 (95% CI: 0.82–1.04) (0.84 under filter 4 [95% CI: 0.74–0.95]). Because selection has the strongest effect on lower frequency mutations, we also estimated r by using only shared variants, and this yielded nearly identical point estimates.

Finally, to construct a CI for α, we sampled values of s and S from Poisson distributions with means equal to the observed numbers of mutations, and we sampled rl/L as the ratio of two Poisson random variables with means equal to 279 (or 238) and 32,853, respectively.