Sequencing and the single sperm

During meiosis, homologous chromosomes undergo DNA double-strand breaks, some of which are repaired as crossovers that shuffle genetic material. However, not every double-strand break resolves as a crossover. Hinch et al. wanted to determine the rules governing DNA recombination. They developed a method to sequence individual mouse sperm and applied it to mice carrying two different alleles of a protein involved in mammalian crossovers. A high-resolution genetic map revealed the relationships between the distribution of crossovers, proteins involved in recombination, and specific factors determining whether a double-strand break becomes a crossover.

Science, this issue p. eaau8861

Structured Abstract

INTRODUCTION

In diploid organisms, the two chromosomes in each homologous pair act independently of each other during most cellular functions. An exception occurs in meiosis, in which the pair of chromosomes must first locate each other in the cell nucleus and then physically exchange genetic material through recombination and crossing over. This physical exchange is mechanistically essential for proper chromosomal segregation in meiosis. Along with mutation, it also shapes patterns of genetic variation in natural populations, providing the substrate on which natural selection acts. Recombination is initiated by the formation of programmed DNA double-strand breaks (DSBs). Repairing these breaks entails a search for the matching sequence in the homologous chromosome. Although DSBs are predominantly repaired using the homolog as template, only a small proportion result in the formation of crossovers.

RATIONALE

Most DSBs occur in narrow regions called recombination hotspots, defined in mice and many other species by the DNA binding specificities of the protein PRDM9. Despite recent progress, much remains unknown about the molecular processes occurring during meiosis and the factors affecting repair outcomes for individual DSBs. We have developed and applied a method for whole-genome amplification and DNA sequencing of single sperm to provide genome-wide maps of crossovers with unprecedented resolution. We combined this with molecular assays for various meiotic stages: H3K4me3 (which measures PRDM9 binding), SPO11-oligos (which count DSBs), and DMC1 on single-stranded DNA (which measures the number and persistence of DSBs).

RESULTS

We report single-cell sequencing of 217 sperm from a hybrid mouse. We inferred 2649 crossovers genome-wide, resolved to a median resolution of 916 base pairs (bp), with 386 crossovers resolved within 250 bp.

By comparing our high-resolution crossover map with stage-specific molecular measures of recombination, we identify four factors that strongly increase the chance that a particular DSB will resolve as a crossover: (i) whether PRDM9 has bound the uncut template chromosome (the chromosome used for repair) at the site of the DSB, (ii) the proximity of the hotspot to the telomere, (iii) local GC content, and (iv) the Prdm9 allelic type of the hotspot. We show that each of these four factors also consistently decreases homolog engagement time, specifically the time until single-stranded DNA at the DSB site has located and invaded the DNA duplex of its homolog. We show that the precise location of the breakpoint in a crossover (the switchpoint from one homolog to the other) is also affected by whether PRDM9 has bound the template. Crossover breakpoints are modulated by the chromatin environment of the template chromosome, avoiding positions occupied by nucleosomes. We also find that the pseudoautosomal region, which must have a crossover in males, is likely determined by cis-acting factors and has a higher than expected use of hotspots that are activated independently of PRDM9.

CONCLUSION

Our work identifies several additional roles for PRDM9 in meiosis beyond positioning DSBs. We show that the breaks that are fastest to engage their homolog are more likely to repair as crossovers. Each of the contributing factors we identified also suggests mechanisms that could facilitate the otherwise seemingly intractable challenge of homology search.

Identification of crossovers in sperm and factors affecting the repair of double-strand breaks in meiosis. (Top) Meiotic cells undergo programmed double-strand breaks, some of which resolve as crossovers. DNA sequencing of single sperm identifies sites of crossover. (Bottom) A double-strand break is more likely to engage its homolog quickly (and to resolve as a crossover) if PRDM9 binds the same location on the homologous chromosome, if it is near the telomere, or if the local GC content is high.

Abstract

Recombination is critical to meiosis and evolution, yet many aspects of the physical exchange of DNA via crossovers remain poorly understood. We report an approach for single-cell whole-genome DNA sequencing by which we sequenced 217 individual hybrid mouse sperm, providing a kilobase-resolution genome-wide map of crossovers. Combining this map with molecular assays measuring stages of recombination, we identified factors that affect crossover probability, including PRDM9 binding on the non-initiating template homolog and telomere proximity. These factors also influence the time for sites of recombination-initiating DNA double-strand breaks to find and engage their homologs, with rapidly engaging sites more likely to form crossovers. We show that chromatin environment on the template homolog affects positioning of crossover breakpoints. Our results also offer insights into recombination in the pseudoautosomal region.

Recombination is a fundamental component of meiosis, the process that creates gametes in sexually reproducing organisms, and ensures the correct segregation of homologous chromosomes into daughter cells (1). Along with mutation, it shapes patterns of genetic variation in populations, providing the substrate for natural selection.

In many species, recombination events occur mainly in narrow regions of the genome called recombination hotspots (2). In humans, mice, cattle, and likely many other vertebrates (3), an early step in recombination is the binding of DNA by the histone methyltransferase PRDM9 (2). A subset of PRDM9 binding sites are subject to the formation of programmed double-strand breaks (DSBs). These breaks are repaired by a specialized pathway, which involves the meiosis-specific protein DMC1 and uses the homologous chromosome as the template for repairing the break (1). How the correct homologous sequence is located efficiently among the bulk of chromatin-embedded nuclear DNA remains unclear (4). A subset of the breaks repaired via the homolog become crossovers, whereas the majority resolve without a crossover (5). Any remaining DSBs are likely repaired using the sister chromatid as template (6). Despite its fundamental importance, critical aspects of the meiotic recombination process remain poorly understood.

Most mammals make only a few crossovers per chromosome (7), even though the number of DSBs is substantially greater (8). This raises the question of how the cell determines which DSBs will be repaired as crossovers. Although it is clear that not all DSBs are equally likely to resolve as a crossover (9–13), the factors affecting this decision remain largely unknown. Improper crossing over leads to aneuploidy, which affects 20 to 30% of human eggs and 1 to 8% of human sperm (14).

There are currently two major impediments to understanding crossover formation. First, pedigree-based maps in humans and mice only localize crossovers within tens to hundreds of kilobases (15–17). Cytological assays are informative about the staging of meiotic events (18), but crossovers can be placed only within large domains containing dozens if not hundreds of hotspots. An alternative approach, identifying recombinant molecules in sperm at targeted sites, has high precision but is limited to a small number of selected hotspots (10, 19, 20). Whole-genome sequencing of single sperm can identify crossovers genome-wide; however, existing methods have low resolution (21, 22). Second, analyses of genetic maps can be complicated by allelic variation in Prdm9, which leads to distinct sets of hotspots, both within and between populations (17, 23, 24). This makes it difficult to connect the initiation of recombination with the final outcome of crossovers.

A single-cell DNA sequencing method to identify crossovers in individual sperm

We have developed a method for amplifying and sequencing DNA from single cells and have applied it to sperm to identify crossovers with high resolution (25). We isolate a cell mechanically and amplify its DNA by means of RNA random priming and Klenow fragment synthesis (Fig. 1A) (25). Our method achieves uniform coverage, with regions missed randomly, rather than systematically as a result of genomic features (figs. S1 and S2).

Fig. 1. Experimental design for inferring crossovers from single sperm cells. (A) An illustration of the method for whole-genome amplification (WGA) of isolated single sperm cells (25). Random RNA oligonucleotides act as primers for WGA mediated by Klenow fragment, which displaces adjacent synthesized fragments to form overlapping single-stranded DNA copies. These in turn serve as templates for primer annealing and chain extension. The resulting amplicons are converted into double-stranded DNA for sequencing. (B) Sequencing depth and genome coverage achieved for each of 217 sperm. (C) An Integrative Genomics Viewer illustration of how a crossover was called by our method. The horizontal light gray lines show the reads that mapped in a region of chromosome 13 for a particular sperm. Vertical dark gray bars highlight variants found only in B6; orange bars highlight variants found only in CAST. The crossover breakpoint lies within a region of uncertainty (green). This crossover overlapped a PRDM9HUM hotspot, identified by DMC1 ChIP-seq (25), whose center was inferred to be at position 113,864,493. A good match to the PRDM9HUM binding motif occurs in the purple region.

This approach was applied to 217 sperm cells from a single adult F1 hybrid mouse, derived from a cross between the C57BL/6J (henceforth B6) and CAST/EiJ (henceforth CAST) inbred strains.
The B6 mother is Mus musculus domesticus and is genetically altered at the Prdm9 gene to be homozygous for an allele carrying the zinc-finger domain found in many human populations (26), which we refer to as Prdm9HUM. The CAST father is M. musculus castaneus and is homozygous for the mouse wild-type Prdm9 allele we call Prdm9CAST. The F1 mouse (hereafter, hybrid) is thus heterozygous at Prdm9, which allows us to compare the properties of hotspots associated with the two Prdm9 alleles. Because Prdm9HUM is not found in mice, we can separate biological properties of recombination from species-specific evolutionary effects. We also chose this design for the high sequence divergence between the parental strains (27), which improves localization of crossovers and often allows us to assign events specifically to one or the other of the two homologous chromosomes. We sequenced the individual sperm from the hybrid mouse to a median depth of 6.3×, which yielded a median genome coverage of 62% (Fig. 1B). The genome coverage was stable, with 90% of sperm having coverage between 48% and 70% of the genome (fig. S1). All sperm were euploid, and there was no significant difference between the numbers of sperm carrying the X (108) and the Y (109) chromosome. Each chromosome in the sperm is expected to consist of one or more segments of B6 and CAST genomes, with transitions between the haplotypes representing crossovers. We developed a hidden Markov model–based computational approach that maps sequencing reads to the B6 or the CAST haplotype (25, 27) and identifies the most likely haplotypes, taking sequencing error into account (Fig. 1C). We applied this approach to identify 2649 crossovers in our 217 sperm samples. The median resolution of crossovers is 916 base pairs (bp), with 386 crossovers localized within 250 bp (fig. S3). This large study of crossovers localized at a fine scale in a mammal provides a resource for understanding crossover formation genome-wide.
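To make the idea of calling crossovers from haplotype transitions concrete, the following is a minimal sketch of a two-state hidden Markov model decoded with the Viterbi algorithm. It is not the model used in the study (25): the input format (one B6 or CAST allele call per informative SNP), the error rate, and the switch probability are all illustrative assumptions, and the function names are hypothetical.

```python
import math

def viterbi_haplotypes(calls, error_rate=0.01, switch_prob=1e-4):
    """Most likely B6/CAST path along one chromosome of one sperm.

    calls: list of 'B' or 'C', the allele observed at each informative SNP,
           ordered by position (hypothetical input format).
    error_rate, switch_prob: illustrative values, not taken from the paper.
    """
    if not calls:
        return []
    states = ("B", "C")
    # Log emission probability, keyed by whether the call matches the state.
    log_emit = {True: math.log(1 - error_rate), False: math.log(error_rate)}
    log_stay = math.log(1 - switch_prob)
    log_switch = math.log(switch_prob)

    # score[s]: best log-probability of any path ending in state s.
    score = {s: math.log(0.5) + log_emit[calls[0] == s] for s in states}
    backptr = []
    for obs in calls[1:]:
        new_score, ptr = {}, {}
        for s in states:
            other = "C" if s == "B" else "B"
            stay = score[s] + log_stay
            switch = score[other] + log_switch
            if stay >= switch:
                ptr[s], best = s, stay
            else:
                ptr[s], best = other, switch
            new_score[s] = best + log_emit[obs == s]
        backptr.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def crossover_intervals(path):
    """Pairs (i, i+1) of adjacent SNP indices between which the path switches."""
    return [(i, i + 1) for i in range(len(path) - 1) if path[i] != path[i + 1]]

# One isolated CAST call inside a B6 run is treated as an error;
# a sustained change of haplotype is called as a crossover.
calls = list("B" * 10 + "C" + "B" * 10 + "C" * 15)
print(crossover_intervals(viterbi_haplotypes(calls)))  # [(20, 21)]
```

In a sketch like this, the region of uncertainty for a crossover is the interval between the two flanking informative SNPs, which is why high divergence between the parental strains and higher coverage both improve resolution.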

Molecular assays

Recombination is a multistage process (2). We draw attention here to aspects of five of those stages. PRDM9 binds DNA in a sequence-specific manner (stage 1) and places an H3K4me3 (histone H3 Lys4 trimethylation) mark on nearby nucleosomes (stage 2). SPO11 makes double-strand breaks (stage 3), which are resected to form single-stranded DNA (ssDNA) decorated with the meiosis-specific strand-exchange protein DMC1 and other proteins. The ssDNA covered with DMC1 undergoes a search for its homologous sequence (stage 4) and invades the homologous chromosome. This results in the formation of joint molecules, a subset of which are resolved as crossovers (stage 5). To identify the factors affecting this process, we used data measuring H3K4me3 (stage 2) (28) and performed assays for DMC1 bound to ssDNA (stage 4) in testes. We focus on processes after the SPO11-induced DSB. In analyses described below in the B6 mouse, we can compare counts of DSBs, as measured directly using the SPO11-oligos produced with each DSB (29), with downstream properties. However, SPO11-oligo measures require impractically large numbers of mice and are not available in our hybrid. H3K4me3 levels and SPO11-oligos have high biological correlation (r ≈ 0.83) (fig. S4) (25). Therefore, where necessary in the hybrid mouse, we use measures of the H3K4me3 mark at hotspots as a surrogate for DSB counts. Two distinct factors affect chromatin immunoprecipitation sequencing (ChIP-seq) measures for DMC1 on ssDNA: (i) the number of breaks, and (ii) how long the ssDNA remains unpaired, which leads to the persistence of DMC1 near the break site (25). Our peak calling algorithm (26) identified 24,586 peaks for DMC1. We also called peaks in H3K4me3 (25). In addition to hotspots, H3K4me3 is found at transcription start sites and other functional elements because of its role in the regulation of gene expression.

Properties of crossovers

Among the 2649 genome-wide crossovers identified, 2615 crossovers are autosomal, corresponding to an autosomal map length of 12.1 M, similar to previous work (16, 30). We confirmed robust crossover assurance on the autosomes (Fig. 2A). The number of crossovers per cell is compatible with random segregation of homologous chromosomes and sister chromatids, with no evidence of systematic variation in the number of crossovers between gametes (figs. S5 and S6).

Fig. 2. Properties of crossovers and recombination hotspots. (A) Average number of crossovers (±SE) called per chromosome per sperm, showing at least, and in many cases almost exactly, one crossover per chromosome per meiosis (equivalently 0.5 crossovers per haploid sperm). (B) Distribution of H3K4me3 intensity in all autosomal recombination hotspots identified by DMC1 ChIP-seq (blue) after removing hotspots that show evidence of PRDM9-independent H3K4me3 [e.g., transcription start sites (25)]. If crossovers occurred in proportion to the hotspot heat, the distribution of H3K4me3 in hotspots with crossovers should be the corresponding size-biased distribution (green). The observed distribution of H3K4me3 in hotspots with crossovers (red) is skewed further toward hotspots with greater H3K4me3 (P = 10^-90). (C) The most active autosomal hotspot for crossover is on the centromere-distal end of chromosome 19. DMC1 binds the 3′ ssDNA overhangs on either side of the DSB, which leads to a shift between DMC1 coverage on the forward (blue) and reverse (red) strands (200 bp smoothing). Regions containing the crossover breakpoint in each sperm are in black. Crossovers at the same locus in distinct sperm can have different resolution, depending on the actual sequencing coverage achieved in each case. (D) PRDM9 binding at a hotspot is a stochastic event in a cell. In a population of cells, some proportion of cells will have one, both, or neither homolog bound by PRDM9 (sky blue). Here, we show the proportion of times that each of these possibilities occurs at two illustrative symmetric hotspots. In the very active hotspot (top row), PRDM9 binds the B6 (red) and CAST (blue) homologs with probability 80% each. As a result, PRDM9 is bound to both homologs in the majority of cells (64%). In the less active hotspot, the probability of PRDM9 binding each homolog is 40%. The proportion of cells in which PRDM9 is bound to both homologs is lower (16%). (E) As in (D), a comparison of the proportion of cells in which PRDM9 (sky blue) binds one or both homologs, B6 (red) and CAST (dark blue), but at an illustrative asymmetric hotspot. The probability of PRDM9 binding the B6 homolog is ~80% versus only ~4% for the CAST homolog. This is due to a SNP (yellow) in the PRDM9 motif on the CAST homolog, which partially disrupts binding. Only a small minority of cells have PRDM9 bound to both homologs.

Most crossovers overlapped hotspots identified by DMC1 ChIP-seq (92%) or H3K4me3 ChIP-seq (94%). Nearly all crossovers (96%) overlapped at least one of these two sets of hotspots, which is unlikely to happen by chance (25). The expected number and localization of crossovers to known hotspots provide evidence that our single-sperm sequencing approach is effective. Conversely, this also shows that nearly all crossovers happen in hotspots, with little recombination in the rest of the genome. Hotspots with greater H3K4me3 have more crossovers (P < 10^-15, test for Pearson correlation), as expected. Total H3K4me3 in hotspots is also a good predictor of the number of crossovers per chromosome (25) (fig. S7). However, crossovers are seen disproportionately more frequently in hotspots with higher H3K4me3 (Fig. 2B). The most active hotspots have five times as many crossovers as a larger set of less active hotspots with the same total level of H3K4me3 (fig. S8).
Although crossovers overlapped 1634 distinct autosomal hotspots in total, several hotspots showed a high concentration of crossovers (fig. S9). Each of 17 specific hotspots had crossovers in more than 5% of meioses [95% confidence interval (CI) = 2% to 11%]. One subtelomeric hotspot on chromosome 19 exhibited crossovers in more than 9% of meioses (95% CI = 4% to 15%) (Fig. 2C). These appear to be the most active hotspots for crossover identified in a mammal to date (10, 19, 20, 31).
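Two quantities in this section follow from simple arithmetic and can be sketched in a few lines. The first is the map length in Morgans, which is the mean number of crossovers per gamete. The second is the size-biased distribution used as the null expectation in Fig. 2B: if hotspots with intensity x occur with probability p(x) and crossovers land in hotspots in proportion to x, hotspots carrying a crossover should have intensity distribution x·p(x)/E[X]. The function names and the toy intensity values below are hypothetical and chosen only for illustration.

```python
def map_length_morgans(n_crossovers, n_gametes):
    """Genetic map length in Morgans: mean number of crossovers per gamete."""
    return n_crossovers / n_gametes

def size_biased(values, probs):
    """Size-biased version of a discrete distribution.

    values: possible hotspot intensities (toy numbers in the example below).
    probs:  probability of each intensity among all hotspots.
    Returns the probabilities x * p(x) / E[X], in the same order.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    return [v * p / mean for v, p in zip(values, probs)]

# 2615 autosomal crossovers in 217 sperm, as reported in the text.
print(round(map_length_morgans(2615, 217), 1))  # 12.1

# Toy example: weak hotspots are common, strong hotspots are rare.
toy = size_biased([1, 2, 4], [0.5, 0.3, 0.2])
print([round(p, 3) for p in toy])  # [0.263, 0.316, 0.421]
```

In the toy example the strongest class makes up 20% of hotspots but about 42% of hotspots with a crossover under proportional sampling. The observation in Fig. 2B is that the real distribution is shifted even further toward strong hotspots than this null predicts.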

PRDM9 variants exhibit unexpected dominance and often bind asymmetrically to homologous sites in hotspots

Recombination hotspots in the hybrid mouse consist of hotspots that are activated by PRDM9CAST or PRDM9HUM and those that are PRDM9-independent (32). We can identify, in most cases, which PRDM9 variant activates a hotspot (25) and the DNA sequence motif to which it binds (fig. S10) (33). Among autosomal crossovers that overlapped hotspots, 2309 (92%) overlapped a single DMC1 hotspot, of which 1377 (60%) overlapped PRDM9CAST and 784 (34%) overlapped PRDM9HUM hotspots. The remaining 148 crossovers (6%) could not be confidently assigned to an allele, including some (29, or 1.3%) where PRDM9-dependent and independent hotspots overlapped. We saw no instances of crossovers in the autosomes that could definitively be assigned to a PRDM9-independent hotspot. PRDM9CAST is dominant over PRDM9HUM for crossovers (64:36, 95% CI = 62% to 66%), as it is for H3K4me3 (62:38) and DMC1 (68:32) [also observed independently in (28)]. Hotspots in hybrid mice can vary in their activity on the two homologs if sequence differences cause differences in PRDM9 binding affinity at that position (26, 32, 34). Some such sequence differences are random polymorphisms. Others result from degradation of PRDM9-binding sites for evolutionary reasons: As a species evolves with a particular Prdm9 allele, the best binding sites for that allele are lost from the host genome because of meiotic drive favoring hotspot-disrupting mutations (35). As a result, a particular PRDM9 variant will bind chromosomes from another genome in preference to its own. For example, PRDM9CAST binding sites are lost on CAST chromosomes, with no systematic loss on B6 chromosomes. In the hybrid, this leads to a spectrum of “asymmetry” in PRDM9CAST binding, with reduced binding on the CAST chromosome in many hotspots (fig. S11).
Informally, asymmetry is a measure of the extent to which PRDM9 preferentially binds one of the two homologs at a particular hotspot (Fig. 2, D and E). As expected, PRDM9HUM shows asymmetry in a smaller fraction of hotspots, with no overall bias toward either chromosome (fig. S11): Evolutionary loss of binding sites has not occurred for the engineered Prdm9HUM allele, and the observed asymmetry is due to stochastic variation in DNA sequence, which affects both homologs equally on average. The dominance of PRDM9CAST over PRDM9HUM despite loss of its binding sites is surprising. Both alleles appear to have similar overall levels of expression (fig. S12) but different distributions of H3K4me3 across hotspots, with PRDM9CAST hotspots skewed toward greater H3K4me3 (fig. S13). This suggests a functional difference between the alleles, such as a greater affinity of the PRDM9CAST zinc-finger domain for its binding sites.
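The illustrative percentages in Fig. 2, D and E, are consistent with PRDM9 binding the two homologs independently, in which case the fraction of cells with both homologs bound is the product of the two binding probabilities. The sketch below reproduces those numbers under that independence assumption; the function name and dictionary keys are hypothetical.

```python
def binding_states(p_b6, p_cast):
    """Probabilities of the four PRDM9 occupancy states at one hotspot.

    Assumes the B6 and CAST homologs are bound independently
    (an assumption of this sketch, consistent with the figure's numbers).
    """
    return {
        "both": p_b6 * p_cast,
        "b6_only": p_b6 * (1 - p_cast),
        "cast_only": (1 - p_b6) * p_cast,
        "neither": (1 - p_b6) * (1 - p_cast),
    }

# Symmetric hotspots from Fig. 2D.
print(round(binding_states(0.8, 0.8)["both"], 3))   # 0.64
print(round(binding_states(0.4, 0.4)["both"], 3))   # 0.16

# Asymmetric hotspot from Fig. 2E (~80% versus ~4%).
print(round(binding_states(0.8, 0.04)["both"], 3))  # 0.032
```

Under this model the very active symmetric hotspot has both homologs bound in 64% of cells, whereas the asymmetric hotspot with the same B6 binding probability has both bound in only about 3% of cells.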

PRDM9 binding on the non-initiating template homolog boosts resolution as a crossover

Crossovers require engagement of the two homologous chromosomes, so it is natural to ask whether PRDM9 binding on both chromosomes influences crossover formation. To check this, we first compared the asymmetry of hotspots with their crossover resolution probability, informally the probability that a particular DSB resolves as crossover, which we estimated using H3K4me3 as the measure of recombination initiation (25). We find that asymmetry correlates with a decrease in crossover resolution probability (Fig. 3A), an effect also observed independently in a mouse pedigree (28). DSBs in very asymmetric hotspots were only 31% (95% CI = 19% to 52%) as likely to resolve as a crossover as those in symmetric hotspots (P = 5 × 10^-6) (25). We observed comparable effects for PRDM9HUM and PRDM9CAST hotspots (fig. S14). The strong observed excess of DMC1 in asymmetric hotspots (fig. S15) rules out the possibility that these hotspots simply have fewer DSBs.

Fig. 3. PRDM9 binding on the non-initiating template homolog affects crossover resolution. (A) Hotspots were binned into five groups depending on the level of asymmetry in PRDM9 binding of the homologs (25). The crossover resolution probability, which accounts for differences in H3K4me3, in each bin (normalized relative to the bin with the most symmetric hotspots) is plotted against the mean asymmetry of hotspots in that bin. Predicted effects on crossover resolution if PRDM9 binding on the template homolog was irrelevant (black) and if it was essential (red) are shown for comparison; error bars denote SE. (B) Hotspots were grouped into four bins depending on the number of SNP differences between B6 and CAST chromosomes in the central 200 bases of the hotspot. The crossover resolution probability in each bin (blue) was inferred relative to the bin containing hotspots with zero SNPs. Red points show the same quantity after correcting for asymmetry in PRDM9 binding. Error bars denote SE. (C) Crossover resolution probability is significantly higher for DSBs initiated on the “less-bound” homolog than on the “more-bound” homolog in asymmetric hotspots. Crossover resolution probabilities for initiation on the more-bound (red, n = 47) and less-bound homologs (blue, n = 13) are shown after accounting for differences in H3K4me3 on them. Probabilities were normalized against the average for symmetric hotspots (dashed black line); bars show 95% confidence intervals (25). (D) Fraction of crossovers (green), H3K4me3 (blue), and DMC1 (red) originating on the less-bound chromosome in asymmetric hotspots, with dashed line marking the proportion expected from H3K4me3. The fraction of crossovers initiating on the less-bound chromosome is significantly greater than expected from H3K4me3 (P = 5 × 10^-6), whereas the fraction of DMC1 is significantly lower than expected from H3K4me3 (P < 10^-16). Error bars denote SE. (E) Illustration that the probability of PRDM9 having bound the template depends on which homolog is initially cut for the same asymmetric hotspot as in Fig. 2E. A DSB is more likely to occur on the more-bound homolog B6 (red). When it does, fewer than 4% of cells (3/80) will have the template CAST homolog (blue) bound. In the less likely event that the CAST homolog is cut, the B6 homolog will have been bound in 75% of cells (3/4). Note that the likelihood of cells with PRDM9 bound on both homologs to be cut at this hotspot is twice that of cells with only one homolog bound. (F) Crossover resolution is influenced by PRDM9 binding on the template homolog. All potential sites for recombination initiation (i.e., the B6 and CAST homologous sites in each hotspot) were sorted according to the H3K4me3 on their respective template homologs. The initiating sites were then binned into seven bins, such that the total H3K4me3 intensity on the initiating sites in each bin is the same. The proportion of crossovers that initiated in each bin (out of 685 crossovers where the initiating homolog could be inferred) is shown against the average H3K4me3 on the corresponding template homologs (x axis). The dashed red line shows the expected relationship if H3K4me3 on the template were unrelated to crossover outcome.

A previous study (32) noted that hotspots that form crossovers in F1 mice tend to have a lower density of polymorphisms than other hotspots, and proposed that local DNA sequence heterozygosity caused reduced crossover formation. We found a slight effect of heterozygosity (P = 0.06) but a significant effect with asymmetry (P = 2 × 10^-13) (25). Because asymmetry and the presence of polymorphisms are correlated (P = 1.6 × 10^-280, test for Pearson correlation), and because polymorphisms within the PRDM9-binding motif are usually the cause of asymmetry (26), we checked whether the effect of heterozygosity could be due to its impact on asymmetry. We found no effect of heterozygosity once asymmetry is taken into account (P = 1; Fig. 3B and table S1). Asymmetric hotspots are less likely to have both homologs bound by PRDM9 (Fig. 2, D and E). Therefore, a possible explanation may be that PRDM9 binding on the template chromosome (hereafter, template) increases the chance of a DSB being repaired as a crossover. To investigate further, we assessed how frequently crossovers arise on the “less-bound” and “more-bound” homologs in asymmetric hotspots. We selected asymmetric hotspots with the property that the more-bound homolog was 20 times as likely as the less-bound homolog to be bound by PRDM9 on average. We identified 60 crossovers occurring at such asymmetric hotspots for which we could also infer the initiating chromosome (25), that is, the chromosome on which the break occurred.
If the likelihood of repairing a break as a crossover were equal between the homologs, we would simply expect to find that 20 times as many of these crossovers initiated on the more-bound homolog relative to the less-bound homolog. Analysis of these 60 cases, however, revealed that significantly fewer than expected initiated on the more-bound homolog (47/60, P = 5 × 10^-6, binomial test). The crossover resolution probability shows a clear directional effect (Fig. 3C): When a DSB does occur on the less-bound homolog, it is six times as likely to form a crossover relative to a DSB that occurs on the more-bound homolog (P < 10^-4) (25). The more asymmetric the hotspot, the greater is the difference in crossover resolution between the homologs (fig. S16). Differences in DNA sequence between the homologs cannot explain this directional effect within hotspots. We also rule out the possibility that the increase in crossovers on the less-bound homolog is driven by an increase in DSBs, as there is less DMC1 than expected on it (Fig. 3D). Our data strongly suggest a model in which it is PRDM9 binding on the template that promotes crossover formation: When a DSB occurs on the less-bound homolog, it is likely that PRDM9 will have bound the template (Fig. 3E). Conversely, when the more-bound homolog is cut, the template will not often be bound by PRDM9 (Fig. 3E). In fact, DSBs initiating on the less-bound homolog are almost 2.5 times as likely as those at symmetric hotspots to resolve as a crossover, despite the greater heterozygosity at asymmetric hotspots (P = 0.006) (25). If our model is correct, PRDM9 binding on the template should increase crossover resolution in all hotspots, regardless of whether they are asymmetric. Consistent with this, the rate of crossovers originating from a fixed amount of H3K4me3 on the initiating chromosome increases with H3K4me3 on the template chromosome (Fig. 3F).
Hotspots with the greatest PRDM9 binding on the template are four times as likely as hotspots with the lowest binding on the template to have crossovers. This effect also helps to explain the observation that more active hotspots have a disproportionately greater number of crossovers (25). Taken together, these lines of evidence lead us to conclude that PRDM9 binding on the template chromosome increases the chance that a DSB is resolved as a crossover. Our data also imply that PRDM9 binding on the template is probably not essential for crossover formation (Fig. 3A).
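The binomial comparison of crossover initiation on the more-bound and less-bound homologs can be approximated with a short calculation. The sketch below assumes a null in which exactly 20 of every 21 crossovers initiate on the more-bound homolog, taken from the "20 times as likely on average" criterion in the text. The actual test in (25) may use a different null or additional corrections, so the result should be read as an order-of-magnitude check rather than a reproduction of the reported P value.

```python
from math import comb

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_total = 60          # crossovers with an inferred initiating homolog
n_more_bound = 47     # of these, initiated on the more-bound homolog
p_null = 20 / 21      # assumed null: 20:1 in favor of the more-bound homolog

# One-sided test: are there fewer initiations on the more-bound homolog
# than the null predicts?
p_value = binom_lower_tail(n_more_bound, n_total, p_null)
print(f"{p_value:.1e}")

# Observed less-bound : more-bound odds, relative to the 1:20 expected odds.
fold = ((n_total - n_more_bound) / n_more_bound) / (1 / 20)
print(round(fold, 2))  # 5.53
```

The raw fold enrichment of about 5.5 is in line with the roughly sixfold difference in crossover resolution probability reported in the text, which additionally accounts for H3K4me3 on each homolog.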

PRDM9 allele, GC content, and proximity to the distal telomere strongly influence crossover resolution

To identify further independent effects on crossover formation, we created two sets of otherwise-matched hotspots by pairing each hotspot that has a crossover with a hotspot that does not have a crossover (25). The paired hotspots were matched on their PRDM9-dependent H3K4me3 enrichment on both homologs and were chosen to be on the same chromosome (1592 pairs). We then asked whether the hotspot sets differed in additional features. Although PRDM9CAST is dominant over PRDM9HUM overall, PRDM9HUM hotspots have significantly more crossovers than PRDM9CAST hotspots matched for the same level of H3K4me3 and asymmetry (table S2, odds ratio = 1.32, P = 2.7 × 10^-4, Fisher exact test). The difference between the variants is greater for more active hotspots than for less active hotspots (table S3). We then matched hotspots for the allele, in addition to the criteria mentioned above, and found that hotspots with crossovers have significantly greater GC content within 500 bp of the hotspot center (P = 1.2 × 10^-14, paired t test; Fig. 4A). This is true of both PRDM9CAST and PRDM9HUM hotspots separately (fig. S17), so it cannot be explained by historical GC-biased gene conversion in PRDM9CAST hotspots. The possible reasons for this observation are either an increase in DSBs or a greater likelihood for a DSB to resolve as a crossover, in very local regions of higher GC. Previous data for SPO11-oligos in B6 (29) indicate that there is no significant effect of local GC content on the number of breaks in hotspots (fig. S18). Therefore, we conclude that greater GC content is conducive to the repair of DSBs as crossovers.

Fig. 4. Crossover resolution is affected by local GC content and telomere proximity. (A) Each autosomal hotspot with a crossover was paired with another hotspot lacking a crossover for the same PRDM9 variant, on the same chromosome and with very similar H3K4me3 on both homologs (25). The distribution of local GC content (500 bp around the hotspot center) is compared between the two matched sets (n = 1355, P = 1.2 × 10^-14, paired t test). (B) Hotspots were divided into seven bins depending on their distance from the distal telomere of their respective chromosome. Crossover resolution probability (relative to the leftmost bin) is shown for each bin; error bars denote SE. Chromosomes with more than one crossover in an individual sperm were removed to avoid confounding with crossover interference (see figs. S20 and S21 for additional views).

Mouse chromosomes are acrocentric, and a well-known effect in male mice is a greater number of crossovers near distal telomeres (16), although the reason for this is not known. Crossover counts combine two different effects: the rate at which DSBs occur and the probability that a particular DSB resolves as a crossover. Analysis of SPO11-oligos in B6 establishes that the rate of DSBs within hotspots does not show spatial variation along a chromosome (fig. S19), although the number of hotspots over broad scales may vary (36). On the other hand, our data show that the probability of DSB resolution as a crossover depends on the chromosomal location of a hotspot, increasing by a factor of 5 from the centromere to the distal telomere (Fig. 4B and figs. S20 and S21). This chromosome-wide effect, which is strongest near the distal telomere, cannot be explained solely by suppression of crossovers at the centromere (11, 12). The GC and telomere effects are both observed with and without accounting for H3K4me3 (figs. S22 and S23).
Additional analyses show that the four effects—namely, PRDM9 binding on the template, proximity to the distal telomere, the PRDM9 variant activating the hotspot, and local GC content—are distinct (figs. S24 to S26) (25).
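
The allele comparison in matched hotspots reduces to a 2 × 2 table of crossover presence by PRDM9 variant. The following plain-Python sketch computes a sample odds ratio and a two-sided Fisher exact P value for such a table. The function name and the toy counts are illustrative assumptions; they are not the counts behind table S2, and the implementation used for the reported statistics (25) may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Toy table (hypothetical counts):
#                 with crossover   without crossover
# allele 1               8                 2
# allele 2               1                 5
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.4f}")
```

Exact integer binomial coefficients keep the sketch free of external dependencies; for tables with thousands of hotspots a statistics library would be the more practical choice.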

Factors that boost crossover probability also lead to faster homolog engagement

Each DSB results in a pair of long-lived SPO11-oligos; therefore, quantitative sequencing of these oligos provides a direct measure of the number of DSBs (29). In contrast, the assay for DMC1 measures its transient association with the ssDNA near each DSB (37). As a result, it depends both on the number of DSBs and on how long DMC1 remains bound to the ssDNA. Therefore, comparison of DMC1 with SPO11-oligos allows an assessment of the time until DMC1 is no longer associated with ssDNA (36). Because this happens after successful strand invasion takes place and the homologs become locally engaged near the DSB site, we refer to the ratio of DMC1 to SPO11-oligos as a measure of “homolog engagement time” (25). The B6 mouse is the only one in which assays of H3K4me3, SPO11, and DMC1 are all currently available. From these data, we find that across hotspots, homolog engagement time decreases as H3K4me3 levels increase, with DSBs at the most active hotspots engaging the fastest (Fig. 5A). This suggests that homolog engagement time is affected by PRDM9 binding on the template chromosome (fig. S27). Interestingly, even on the non-pseudoautosomal region of the X chromosome, where the sister chromatid is thought to be used as the template (6, 38), repair is faster for hotspots with the highest levels of H3K4me3 (fig. S28).

Fig. 5 Factors affecting homolog engagement time in the repair of DSBs. (A) Hotspots in the B6 mouse were ordered by their H3K4me3 intensity and divided into 10 bins. Average homolog engagement time, the ratio of total DMC1 to total SPO11 per bin, is shown relative to the average H3K4me3 per hotspot in each bin; error bars denote SE. (B) Hotspots in the B6 mouse were divided into eight bins depending on their distance from the distal telomere of their respective chromosome. Average homolog engagement time (ratio of total DMC1 to total SPO11 in each bin) is shown; error bars denote SE. (C) Hotspots in the B6 mouse were divided into six bins depending on their local GC content (±500 bp around the hotspot center). Average homolog engagement time per bin is shown; bars show 95% confidence intervals. (D) Comparison of estimated homolog engagement time for the more-bound homolog (red) and less-bound homolog (blue) in asymmetric hotspots (corresponding to Fig. 3C, 95% confidence intervals) (25). Estimated homolog engagement time (ratio of DMC1 to H3K4me3) is normalized against the average for symmetric hotspots (dashed black line).

It is known that the ratio of DMC1 to SPO11 is lower in the 5 Mb adjacent to the centromere-distal telomere than in the rest of the chromosome in B6 (36). Extending this finding, we show that the average DMC1 per hotspot increases with distance from the centromere-distal telomere (fig. S29), although the rate of DSBs and the width of DMC1 loading near break sites remain stable (figs. S19 and S30). Indeed, we find that homolog engagement time increases continuously as a function of distance from the distal telomere (Fig. 5B), with engagement time for breaks farthest from the distal telomere 25% longer than for the nearest ones. Homolog engagement time also decreases with increasing local GC content in B6 (Fig. 5C). Measurements of SPO11 are not available in our hybrid mouse (nor in any other hybrid mouse). Therefore, to see whether the results from B6 extended to our hybrid, we approximated homolog engagement time by using H3K4me3 in lieu of SPO11-oligos. Recapitulating the B6 findings, estimated homolog engagement time in our hybrid mouse also decreased with increasing H3K4me3 on the template (figs. S15 and S31). In asymmetric hotspots, DSBs on the more-bound homolog took almost four times as long to repair as DSBs on the less-bound homolog (Fig. 5D).
Similar results on the impact of PRDM9 on DMC1 have been seen in several F1 mice (26). As in B6, estimated homolog engagement time in the hybrid increased with distance from the telomere (P = 10⁻³¹) and decreased with local GC content (P = 2 × 10⁻¹⁵) (25). Finally, for PRDM9HUM hotspots, estimated homolog engagement time was 18% lower on average relative to PRDM9CAST hotspots (fig. S32). In summary, four factors influence homolog engagement time, and in each case they strongly and consistently influence crossover probability (25).
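
As an illustration of the binned ratio described for Fig. 5, the sketch below orders hotspots by a covariate (here H3K4me3), splits them into near-equal bins, and reports total DMC1 divided by total SPO11 in each bin. The function name, the tuple layout, and the toy values are assumptions made for this example; normalization and error estimation follow the authors' methods (25) and are not reproduced here.

```python
def engagement_time_by_bin(hotspots, n_bins):
    """Bin hotspots by H3K4me3 and compute a DMC1/SPO11 ratio per bin.

    hotspots: list of (h3k4me3, dmc1, spo11) tuples, one per hotspot.
    Returns a list of (mean H3K4me3, total DMC1 / total SPO11), one entry
    per non-empty bin, ordered from least to most H3K4me3.
    """
    ordered = sorted(hotspots, key=lambda h: h[0])
    n = len(ordered)
    results = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than hotspots
        total_dmc1 = sum(h[1] for h in chunk)
        total_spo11 = sum(h[2] for h in chunk)
        mean_h3k4me3 = sum(h[0] for h in chunk) / len(chunk)
        ratio = total_dmc1 / total_spo11 if total_spo11 else float("nan")
        results.append((mean_h3k4me3, ratio))
    return results


# Toy values (hypothetical): (H3K4me3, DMC1, SPO11) per hotspot.
toy = [(1.0, 30.0, 10.0), (2.0, 50.0, 10.0), (8.0, 40.0, 20.0), (10.0, 60.0, 30.0)]
for mean_h3k4me3, ratio in engagement_time_by_bin(toy, n_bins=2):
    print(f"mean H3K4me3 = {mean_h3k4me3:.1f}, DMC1/SPO11 = {ratio:.2f}")
```

Summing within a bin before dividing, rather than averaging per-hotspot ratios, matches the "ratio of total DMC1 to total SPO11" wording of the figure legend and avoids unstable ratios at hotspots with very few SPO11-oligos.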

Crossover breakpoints are modulated by the chromatin environment on the template chromosome

Crossover breakpoints, which are the points at which sperm DNA switches from one parental chromosome to the other, have been shown to be contained within the extent of H3K4me3 modification (39) and of DMC1 binding (29) in a small number of hotspots. However, detailed knowledge of breakpoints has been elusive. Our data allow a genome-wide examination of the fine-scale distribution of crossover breakpoints. For crossovers in symmetric PRDM9CAST hotspots, we observed a strongly multimodal pattern of breakpoints (Fig. 6A). Breakpoints appear to flank positions occupied by nucleosomes around the PRDM9 binding site, with clear peaks in the first, second, and third nucleosome-depleted regions (NDRs). In asymmetric hotspots, we also saw a multimodal pattern; however, it is shifted from that of symmetric hotspots (Fig. 6B). For hotspots that are not particularly symmetric or asymmetric, the peaks merge into a more continuous distribution (fig. S33), as might be expected from a mix of both situations.

Fig. 6 Positioning of crossover breakpoints is influenced by nucleosome positioning on the template chromosome. (A) Distribution of crossover breakpoints from the motif center (green) for crossovers that overlap symmetric PRDM9CAST hotspots with a well-identified motif site and have breakpoint resolution of ≤250 bp (n = 132). To deal with the uncertainty in crossover breakpoint location in each sperm, we assign equal weight to all possible breakpoint positions in that sperm (25). H3K4me3 ChIP-seq with MNase averaged over PRDM9CAST hotspots is shown in red (20 bp smoothing). Red bars at top show average inferred positions of nucleosomes; black bar shows the PRDM9CAST binding site. (B) As in (A) but for crossovers that overlap asymmetric PRDM9CAST hotspots (n = 33). Average MNase-seq for the less-bound chromosome of asymmetric hotspots (blue, 50 bp smoothing) is shown, with blue bars at top showing average inferred nucleosome positions. This is an estimate of the nucleosome positioning at hotspot sites when PRDM9 is not bound (25). The peak in MNase-seq at the hotspot center is consistent with the presence of a nucleosome in PRDM9CAST hotspots in the absence of PRDM9 binding (fig. S37). (C) Illustration of nucleosome positions when the template homolog is bound by PRDM9CAST. DNA (dark brown) around histones (light brown), with red dots indicating H3K4me3 mark. Nucleosome positions on the DSB-initiating and template homologs are the same. This is more likely in symmetric hotspots (A). (D) Illustration of nucleosome positions when the template homolog is not bound by PRDM9CAST. Colors are as in (C). Typical nucleosome positioning at sites bound by PRDM9 is shifted relative to unbound sites, resulting in a difference between the DSB-initiating and template chromosomes. This is more likely in asymmetric hotspots (B). The shift in crossover breakpoints between (A) and (B) is consistent with the shift in nucleosome positions on the template homolog, as illustrated in (C) and (D).

Nucleosome positions are known to exhibit a phase shift concomitant with PRDM9 binding (39). The homolog on which the DSB occurs is bound by PRDM9, regardless of whether the hotspot is symmetric or asymmetric (Fig. 6, C and D). However, the template is much more likely to have been bound by PRDM9 in symmetric rather than in asymmetric hotspots and thus is likely to have a different nucleosome profile. The shift in crossover breakpoints that we observe between symmetric and asymmetric hotspots is consistent with the shift in nucleosomes between bound and unbound sites, and with a model in which crossover breakpoints avoid nucleosome positions (Fig. 6, A and B, and fig. S34) (25).
We conclude that crossover resolution is modulated by nucleosome positioning on the template chromosome. Crossover breakpoints also avoid nucleosomes in symmetric PRDM9HUM hotspots (fig. S35) although, in contrast with symmetric PRDM9CAST hotspots, there does not seem to be a peak in the first NDR from the motif site (Fig. 6A and fig. S36). This may be due to PRDM9CAST binding the template more strongly or for longer than PRDM9HUM, thereby, for example, creating a greater barrier to Holliday junction migration. Alternatively, this may point to differences in the a priori histone binding energies in regions that each allele prefers to bind, making nucleosomes more or less difficult to evict. Indeed, whereas PRDM9CAST preferentially binds sites that are occupied by a nucleosome a priori, PRDM9HUM preferentially binds sites that are depleted in nucleosomes (fig. S37). The overall differences in crossover breakpoints between the two alleles (fig. S38) reflect differences at symmetric hotspots as well as the different proportions of symmetric and asymmetric hotspots for each allele.
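
The legend of Fig. 6 states that each crossover contributes equal weight to all of its possible breakpoint positions. A minimal sketch of that weighting is shown below: each crossover is an interval of offsets relative to the motif center, and its unit weight is spread uniformly across the interval. The function name, the inclusive-interval convention, and the toy intervals are assumptions for illustration only.

```python
def breakpoint_density(intervals, window=1000):
    """Accumulate uniformly weighted breakpoint positions around a motif.

    intervals: list of (start, end) offsets in bp relative to the motif
        center, inclusive, each bounding one crossover's breakpoint.
    window: half-width of the region reported around the motif center.
    Returns a dict mapping each offset in [-window, window] to its summed
    weight. Each crossover carries total weight 1 across its interval;
    weight falling outside the window is not reported.
    """
    density = {pos: 0.0 for pos in range(-window, window + 1)}
    for start, end in intervals:
        if end < start:
            raise ValueError("interval end must not precede start")
        weight = 1.0 / (end - start + 1)
        for pos in range(max(start, -window), min(end, window) + 1):
            density[pos] += weight
    return density


# Toy intervals (hypothetical): one crossover resolved to 4 bp, one to 2 bp.
density = breakpoint_density([(0, 3), (2, 3)], window=5)
for pos in range(0, 4):
    print(pos, density[pos])
```

Well-resolved crossovers therefore produce sharp contributions, whereas poorly resolved ones are diluted, which is one reason the analysis in Fig. 6 is restricted to crossovers with breakpoint resolution of ≤250 bp.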

Crossovers in the pseudoautosomal region

The pseudoautosomal region (PAR) is a short region of homology between the X and Y chromosomes, which must have a crossover in males for successful segregation of these chromosomes during meiosis. The precise PAR region varies in mouse subspecies; it is ~700 kb long in B6 and 430 kb longer in CAST (40) (fig. S39). A crossover in the PAR is achieved partly by an increased DSB rate, which is thought to be the result of a disproportionately long axis in this region (41). However, it is not known whether these biological properties of the PAR are determined by cis- or trans-acting factors. Specifically, it is not clear whether the PARs on both chromosomes in the hybrid behave differently, retaining the properties of their parental strains, or whether one of the parental strains is dominant. We compared the DMC1 signal in hotspots in the region that is pseudoautosomal in CAST but not B6 (henceforth het-PAR, fig. S39). Most of these hotspots have an excess of DMC1 on the CAST relative to the B6 chromosome, with seven times as much DMC1 on the CAST chromosome on average (Fig. 7A, P < 10⁻⁴) (25). This is not explained by any artifactual differences in sequence mapping between the haplotypes (Fig. 7A, P = 0.85) (25). The effect could be explained by a greater number of DSBs on the CAST chromosome, by DSBs initiating on the CAST chromosome taking longer to engage their homolog, or both. Either way, the CAST and B6 regions behave differently, which suggests that these properties of the PAR are determined by factors that can distinguish between them, likely cis-acting factors.

Fig. 7 Differences in recombination in the pseudoautosomal region. (A) Histogram of the fraction of DMC1 reads on the CAST chromosome across hotspots (red, n = 38). For the same regions, the corresponding histogram for reads from sequencing of bulk sperm is shown (blue) as a control to assess potential mapping artifacts. Whereas DMC1 is significantly biased toward the CAST haplotype (P < 10⁻⁴) (25), there is no significant bias in bulk sequencing (median = 0.51, P = 0.85) (25). (B) The most active hotspot for crossovers in the entire genome is in the het-PAR and is PRDM9-independent. DMC1 coverage (200 bp smoothing) is shown for the forward (blue) and reverse (red) strands. Crossover breakpoints are in black. See fig. S43 for a further het-PAR hotspot.

We identified 34 PAR crossovers in 217 sperm, all of which are in the het-PAR (fig. S40). These crossovers demonstrate that any potential structural differences between the two homologs in this region do not preclude reciprocal exchange between them. The number of crossovers we identified is roughly in line with the proportional size of the het-PAR within the whole PAR (34 out of 108.5, which is expected if there is one crossover per meiosis). We do not have the power to detect crossovers outside the het-PAR because of a lack of adequate sequence assembly. Although previous research has shown the coexistence of PRDM9-dependent and independent hotspots near the PAR (42), their relative importance in crossover formation within the PAR remains unclear (42, 43). We found that 19 crossovers overlapped PRDM9-independent hotspots, 4 overlapped PRDM9CAST, and none overlapped PRDM9HUM hotspots. PRDM9-independent hotspots have 53% of the DMC1 signal among het-PAR hotspots, yet the concentration of crossovers in them is substantially greater (83%, P = 0.016) (25). This suggests differences in the timing or processing of DSBs in PRDM9-independent hotspots. The dominance of PRDM9-independent hotspots over PRDM9HUM hotspots also suggests that this enrichment is not simply a consequence of the evolutionary erosion of PRDM9 binding motifs in this region. Across the genome, the hotspot with the greatest number of crossovers is in the het-PAR and is PRDM9-independent, with crossovers in 11% of meioses (Fig. 7B).
Although the mechanism controlling PRDM9-independent hotspots is not currently known, we note that the number of DSBs in these hotspots is disproportionately elevated genome-wide in Atm−/− mice (fig. S41). This suggests a role, either direct or indirect, for the ATM pathway in modulating the use of PRDM9-independent hotspots.
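
The PAR crossover counts can be checked with simple arithmetic. A crossover involves two of the four chromatids in a meiosis, so one obligate PAR crossover per meiosis is expected to be visible in half of the sperm, which gives the 108.5 quoted above for 217 sperm. The sketch below reproduces that figure and the 83% share of hotspot-overlapping crossovers, and adds a one-sided binomial tail as an illustrative comparison against the 53% DMC1 share. The binomial model is an assumption of this example; the reported P value comes from the authors' own test (25), which need not match it.

```python
from math import comb

n_sperm = 217
par_crossovers = 34

# One obligate crossover per meiosis is transmitted to half of the gametes.
expected_if_obligate = n_sperm / 2
het_par_share = par_crossovers / expected_if_obligate

# Crossovers overlapping hotspots in the het-PAR, by hotspot class.
prdm9_independent = 19
prdm9_cast = 4
prdm9_hum = 0
overlapping = prdm9_independent + prdm9_cast + prdm9_hum
independent_share = prdm9_independent / overlapping


def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative only: chance of 19 or more of 23 crossovers if crossovers
# tracked the 53% DMC1 share of PRDM9-independent hotspots.
tail = binomial_upper_tail(prdm9_independent, overlapping, 0.53)

print(f"expected PAR crossovers if obligate: {expected_if_obligate}")
print(f"het-PAR share of expected crossovers: {het_par_share:.2f}")
print(f"PRDM9-independent share of crossovers: {independent_share:.2f}")
print(f"binomial upper tail: {tail:.4f}")
```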

Discussion

Recombination via the formation of crossovers is a central part of meiosis. We have identified four distinct factors that affect the probability that a particular DSB is resolved as a crossover: (i) whether PRDM9 is bound or has been bound at the same position on the homologous chromosome, (ii) distance from the centromere-distal telomere, (iii) local GC content around the DSB, and (iv) whether PRDM9HUM or PRDM9CAST bound the hotspot where the DSB occurred. Our work uniquely separates upstream effects (numbers of DSBs) from those downstream of the breaks and implicates each of these four factors in an increase in the preferential use of the crossover pathway for DSB repair. The effect of these factors appears to be cumulative, so that hotspots with multiple favorable conditions are most likely to form crossovers (fig. S42). Equally, the effect of an unfavorable condition in one factor may be mitigated by a favorable condition in another. For example, although breaks in asymmetric hotspots are less likely to resolve as crossovers overall, those in telomere-proximal asymmetric hotspots are more likely to do so than breaks in telomere-distal symmetric hotspots (fig. S42). We further show that the same four factors that increase the probability that a particular DSB is resolved as a crossover also decrease homolog engagement time, namely the time until successful strand invasion takes place and DMC1 is no longer associated with ssDNA. The relative impact of these factors is also consistent for both, with the biggest effect being PRDM9 binding on the homolog, followed by telomere proximity, GC content, and PRDM9 variant (25). Note that multiple lines of evidence suggest that it is PRDM9 binding on the homolog, rather than polymorphisms or hotspot asymmetry per se, which affects outcomes for DSBs. The relative effect sizes are consistent with the presence of additional factors affecting the spatial localization of crossovers within a chromosome (25).
Each meiotic cell must solve, for each DSB, the seemingly intractable problem of finding the homologous sequence among billions of bases of DNA (4). The factors we have identified, by virtue of their impact on homolog engagement time, suggest potential mechanisms that affect this process. A natural explanation for the effect of PRDM9 binding on the homolog is that it facilitates homology search, either directly or indirectly. Possible mechanisms for this include its effect on the local chromatin environment (44), a role in bringing the template homolog to the chromosome axis (45) (thereby reducing the search space), or direct interaction between PRDM9 molecules at the DSB site and the template. Telomeres have distinct properties in meiosis that may facilitate homology search: They are physically bound to the nuclear envelope (46) and may thus be closer to each other a priori (47). They also engage in active movements during the phase of meiosis when the search for the homolog is taking place (48). The effect of GC content could be mediated by its influence on the local chromatin environment (49). Why are rapidly engaging breaks more likely to become crossovers? A compelling explanation is that delay in finding and engaging the homolog itself is a causal factor. There are several classes of mechanism, which are not mutually exclusive, with this property. In the first class, the earlier a DSB engages its homolog, the more likely it is to be resolved as a crossover. For example, sites of early-engaging breaks may be more likely to appropriate and stabilize protein complexes that are essential for crossover formation (50). This view is consistent with cytological findings that crossover sites are correlated with those where formation of the synaptonemal complex nucleates (18, 51, 52). A second class of model posits a window of opportunity during which DSB sites can acquire the necessary protein complexes and become crossover-proficient [crossover licensing (53)]. 
Early-engaging breaks may resolve as crossovers more often by virtue of having found their homolog prior to the end of this period. If multiple breaks on a particular chromosome engage their homolog during this window, other factors may determine which will become crossovers (18). One possibility is that several (or all) breaks might initially proceed down the crossover repair pathway (53, 54), but in the event of a surfeit in prospective crossovers, a subset of them could be redesignated down an alternative repair pathway [crossover designation (53, 55)]. A third class of model is that highly delayed breaks, which may have failed to engage the homolog, are repaired from the sister—for example, via a cutoff mechanism after which the cell switches from homolog-mediated to sister-mediated repair of the remaining breaks (6, 38). Previous research has shown the impact of nucleosomes on the initiating chromosome on strand resection (56). We have shown that the distribution of crossover breakpoints differs depending on whether PRDM9 has bound the template and is affected by the template’s chromatin environment. This suggests that PRDM9 often remains bound (and actively maintaining the local nucleosome environment) on the template until at least strand invasion and perhaps until Holliday junction resolution. Finally, our work sheds new light on how crossover is achieved in the PAR.

Methods summary

We harvested and isolated 217 sperm from an adult B6xCAST mouse, which has the Prdm9 alleles Prdm9HUM (26) and Prdm9CAST (32). We developed a protocol for whole-genome amplification and DNA sequencing of single cells (25), which we applied to the sperm. Bulk sperm from the same animal was sequenced at high depth, and the DNA sequence was used to call variants de novo. We developed a computational approach to identify the most likely sequence of CAST and B6 haplotypes in each chromosome in each sperm (25). DMC1 ChIP-seq (37) was performed using testis tissue from the same animal, and hotspots were called using our previously published peak-calling algorithm (26). The PRDM9 variants activating hotspots (25) and the PRDM9-binding DNA sequence motifs in them (33) were identified. We developed a method for assessing the evidence of enrichment in H3K4me3 from ChIP-seq data (25). Micrococcal nuclease sequencing (MNase-seq) and H3K4me3 MNase ChIP-seq were performed in testes from another animal with the same genetic background (25).
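
The haplotype inference step is described here only as finding the most likely sequence of CAST and B6 haplotypes along each chromosome; the details are in (25). One common way to frame such a problem is a two-state hidden Markov model decoded with the Viterbi algorithm, and the sketch below shows that framing as an assumption, not as the authors' method. The function names, the observation encoding, and the error and switch rates are all hypothetical.

```python
from math import log


def viterbi_haplotypes(observations, error_rate=0.05, switch_rate=0.001):
    """Most likely haplotype path for one chromosome of one sperm.

    observations: allele seen at each ordered variant site, 'B' (B6),
        'C' (CAST), or None where the site has no coverage.
    error_rate: assumed chance that a site shows the wrong allele.
    switch_rate: assumed chance of a haplotype switch between sites.
    Returns a list of 'B'/'C' labels, one per site.
    """
    if not observations:
        return []
    states = ("B", "C")
    stay, switch = log(1 - switch_rate), log(switch_rate)

    def emit(state, obs):
        if obs is None:
            return 0.0  # an uncovered site carries no information
        return log(1 - error_rate) if obs == state else log(error_rate)

    score = {s: log(0.5) + emit(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            other = "C" if s == "B" else "B"
            from_same, from_other = score[s] + stay, score[other] + switch
            if from_same >= from_other:
                best, pointers[s] = from_same, s
            else:
                best, pointers[s] = from_other, other
            new_score[s] = best + emit(s, obs)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def crossover_intervals(path):
    """Index pairs (i, i + 1) between which the haplotype label switches."""
    return [(i, i + 1) for i in range(len(path) - 1) if path[i] != path[i + 1]]


# Toy data (hypothetical): a single isolated 'C' call is treated as an
# error, while a sustained run of 'C' calls is treated as a crossover.
calls = ["B", "B", "C", "B", "B", "B"] + ["C"] * 6
path = viterbi_haplotypes(calls)
print("".join(path), crossover_intervals(path))
```

In this framing the resolution of a crossover is set by the distance between the two informative variants that flank the inferred switch, which is why variant density and per-cell coverage limit how finely breakpoints can be placed.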

Supplementary Materials

www.sciencemag.org/content/363/6433/eaau8861/suppl/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S43
Tables S1 to S3
References (57–67)

http://www.sciencemag.org/about/science-licenses-journal-article-reuse This is an article distributed under the terms of the Science Journals Default License.

Acknowledgments: We thank R. Li and C. Green for helpful discussions, and the High-Throughput Genomics team at the Wellcome Centre for Human Genetics for all sequencing work. We are grateful to E. Hatton for analytical work in the early development of our single sperm sequencing protocol. Funding: Wellcome Trust grants 095552/Z/11/Z to P.D. and grants 090532/Z/09/Z and 20314/Z/16/Z as core support for the Wellcome Centre for Human Genetics. R.H. is supported by Wellcome Trust grant 106130/Z/14/Z. Author contributions: P.D. designed the study; B.D. bred the mice; G.Z. developed the single-cell sequencing protocol with assistance from R.B. and performed the single-cell sequencing experiments; P.W.B. performed MNase ChIP-seq; D.M. performed cytological analysis; A.G.H. analyzed the data; R.H. contributed to the H3K4me3 peak caller; and A.G.H. and P.D. wrote the paper with input from G.Z., P.W.B., R.H., and B.D. Competing interests: P.D. is founder and CEO of Genomics plc, and a partner in Peptide Groove LLP. G.Z., R.B., and P.D. are listed as co-inventors on a patent application for the single-cell DNA amplification and sequencing protocol. P.W.B. is now an employee of GeneFirst Ltd., Abingdon, UK. Data and materials availability: Raw and processed data are available on the Gene Expression Omnibus website under SuperSeries accession GSE125327, comprising GSE125326 (sperm sequencing) and GSE124991 (DMC1). Code is available on Zenodo (DOI 10.5281/zenodo.2540356).