Meiotic recombination and de novo mutation are the two main contributions toward gamete genome diversity, and many questions remain about how an individual human’s genome is edited by these two processes. Here, we describe a high-throughput method for single-cell whole-genome analysis that was used to measure the genomic diversity in one individual’s gamete genomes. A microfluidic system was used for highly parallel sample processing and to minimize nonspecific amplification. High-density genotyping results from 91 single cells were used to create a personal recombination map, which was consistent with population-wide data at low resolution but revealed significant differences from pedigree data at higher resolution. We used the data to test for meiotic drive and found evidence for gene conversion. High-throughput sequencing on 31 single cells was used to measure the frequency of large-scale genome instability, and deeper sequencing of eight single cells revealed de novo mutation rates with distinct characteristics.

Here, we describe a single-cell whole-genome analysis method to characterize the genomic changes from gametogenesis. Using this technique, we analyzed the whole genomes of >100 single human sperm cells. Recombination data from 91 single sperm cells presented a comprehensive landscape of personal recombination activity. Genome-wide meiotic drive and gene conversion were also directly tested. Single-cell whole-genome sequencing further revealed primary information about human sperm genome instability and mutation rate.

Using pedigree data and statistical methods, deCODE and the International HapMap Consortium have created high-resolution recombination maps at the population level. However, such maps show only results averaged across a population and accumulated over evolutionary history, and it is unclear how these population maps relate to the personal recombination processes of any given individual, especially because they capture only meiotic products that yielded successful offspring. The 1000 Genomes Project measured the mutation rate in two family trios. However, those results are limited to a single meiosis per individual, and in general, such an approach probes only viable offspring, is limited by the number of offspring per family, and requires access to parental genome data.

Gametogenesis is a biological process by which precursor cells undergo cell division and differentiation to form mature haploid gametes. Human gametogenesis occurs by mitotic division of gametogonia, followed by meiotic division of gametocytes into various gametes. During this process, the gamete genome experiences both programmed and spontaneous changes, among which meiotic recombination shuffles the two haploid somatic genomes to create a unique hybrid haploid genome for each gamete cell, while accumulated replication errors contribute point mutations that may affect the gametes’ functionality. This results in an enormous variety of new genomes being created in the gametes, thereby enabling one’s children to add to the genetic diversity of the human race in a more complex manner than by simply mixing and matching entire parental chromosomes. The genome-wide recombination activity and de novo mutation rate have been directly characterized in many model organisms. However, it has been unclear how an individual human’s genome is edited during gametogenesis.

P0’s mutation rate (2–4 × 10⁻⁸) is higher than that obtained from genome-sequenced pedigree data (∼1 × 10⁻⁸), but it is consistent with evolutionary studies, which have revealed ∼4–5× more mutations in males than in females, possibly due to the larger number of germline cell divisions in males. The pedigree study identified variation in the germline mutation levels transmitted to each offspring but could not identify the source of that variation. Our results from the eight individual sperm cells show a high degree of internal consistency between their respective mutation levels (Figure S4B), which suggests inter- rather than intraindividual variation. Within each cell, most mutations reside in intergenic or intronic regions (Table 2). However, we detected three missense mutations, a category that was not observed in the pedigree genomes. The transition-to-transversion ratio of P0 mutations is 5.6, compared to a population average of 2.1. The excess of transitions over transversions is generally attributed to deamination of methylated cytosine, primarily at CpG and potentially in other sequence contexts. The higher level of transitions we observed is consistent with this, as 21% of C→T mutations correlated with CpApG, though only 8% were at CpG sites.
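The transition-to-transversion ratio quoted above is a simple count ratio. A minimal sketch of that calculation follows, assuming mutations are supplied as (reference base, mutant base) pairs; the function names and the toy input are illustrative and are not data from this study.

```python
# A transition swaps two purines or two pyrimidines; anything else is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if ref -> alt is a transition."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(mutations) -> float:
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in mutations if is_transition(ref, alt))
    transversions = len(mutations) - transitions
    if transversions == 0:
        raise ZeroDivisionError("no transversions in input")
    return transitions / transversions


# Hypothetical toy input: four transitions and two transversions.
toy_mutations = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A"), ("T", "C"), ("G", "T")]
print(ts_tv_ratio(toy_mutations))
```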

A distinct group of loci with 100% discordance with somatic DNA clearly stands out from the amplification error background (Figure S4A). These data are not statistically consistent with any of the measured amplification or sequencing errors and are strong candidates for de novo mutations in the sperm. After excluding signals from repetitive regions or with low alignment confidence, we detected 25–36 candidate point mutations in each sperm cell (Tables 2 and S5). We selected 19 mutations for PCR-Sanger sequencing and were able to obtain PCR products from 16 regions. The Sanger results from these 16 regions all confirmed our original calls, thus ruling out sequencing or alignment errors. Because these loci are inconsistent with the statistical distribution of amplification errors, we conclude that they are de novo mutations.

Sequencing data from the gene conversion study also offered the opportunity to measure de novo germline mutations. The recombination detection performed above demonstrated robust genotyping by single-cell sequencing, and we further evaluated the error rate for mutation detection. We selected high-confidence homozygous positions in the P0 somatic genome based on previous sequencing and genotyping () and calculated the first alternate allele calling frequency in sperm sequencing reads at the same positions. Histogramming these frequency data revealed a decreasing number of positions extending from the perfect agreement side of the discordance axis ( Figure S4 A). This long tail of background noise represented an amplification/sequencing error rate of 2.7 × 10per read per position.
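The discordance measure described above can be sketched as follows, assuming per-position read counts have already been tabulated at positions where the somatic genome is homozygous. The `pileup` layout and the `min_depth` filter are hypothetical choices made for illustration, not parameters reported here.

```python
def discordance_ratio(somatic_reads: int, alt_reads: int) -> float:
    """Fraction of sperm reads carrying the first alternate allele."""
    total = somatic_reads + alt_reads
    if total == 0:
        raise ValueError("position has no reads")
    return alt_reads / total


def fully_discordant(pileup: dict, min_depth: int = 3) -> list:
    """Positions at which every read disagrees with the somatic genotype.

    pileup maps position -> (reads matching the somatic allele,
                             reads carrying the first alternate allele).
    """
    return sorted(
        pos
        for pos, (somatic, alt) in pileup.items()
        if somatic + alt >= min_depth and discordance_ratio(somatic, alt) == 1.0
    )


# Hypothetical toy pileup: position 200 looks like background error,
# position 300 is fully discordant, position 400 is too shallow to call.
toy_pileup = {100: (10, 0), 200: (9, 1), 300: (0, 5), 400: (0, 1)}
print(fully_discordant(toy_pileup))
```

Positions with intermediate ratios, such as position 200 in the toy input, correspond to the background tail of the histogram rather than to the fully discordant group.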

(A) Allele discordance ratio of sperm MDA products against somatic genome (insert as y axis zoom in). The peak at 100% discordance illustrates a distinct group of loci standing out of the amplification errors background tail.

Human reproduction is well known to be inefficient, with monthly fecundity rates of only 30%–40%, and a large number of conceptions fail before the woman is aware of the pregnancy. This early determination of pregnancy fate was further confirmed by results showing the ability to predict embryo development by the four-cell stage, before embryonic genome activation (EGA). The importance of cytokinesis dynamics in embryo development strongly suggests genome integrity as a key factor, as genome instability will induce cell-cycle arrest. Although the aneuploidy rate for oocytes (20%–30%) is higher than that found in sperm (2%–10%), male genome defects are still a significant contributor to conception failure. Even if the embryo does develop correctly, gamete genome abnormality may increase the risk of certain diseases. For example, the large-scale deletion of the long arm of chromosome 13 (13q) found in two of our sperm samples (Figure S3B) may induce 13q deletion syndrome, with malformations of the craniofacial region and skeletal abnormalities.

We chose eight sperm cells that clearly passed the SNP-PCR assay (“normal”) and 23 further sperm cells that had marginal or failing scores on the assay (“abnormal” and not among the 93 samples used for the recombination study) for high-throughput sequencing (Table S1) and obtained 0.02× coverage of the genome. After mapping the sequence reads to the human reference genome, we found a discrete distribution of relative sequencing tag density across chromosomes, in which chromosomes were typically either present at a uniform level or completely absent (Figures 5B and S3). All eight of the “normal” cells and 17 “abnormal” cells exhibited such patterns with one of the two sex chromosomes missing, and another four “abnormal” cells had clear aneuploidy. Two cells displayed complex, continuous distributions of chromosome representation (Figures 5B and S3). Additional genotyping results confirmed the sequencing findings. The results from these six abnormal cells cannot be explained by the known bias mechanisms in MDA, and our previous study on single-chromosome amplification showed no bias for particular chromosomes or sharp coverage drops in any region. Therefore, the most likely source of missing sequencing reads in the present results is genomic abnormality in the individual sperm cells. The six abnormal samples (Figure 5B), together with the two samples from the recombination analysis (Figure 5A), represent ∼7% of the 116 single-cell amplifications with high-resolution analysis, which agrees with literature results on aneuploidy of ∼2%–10% measured with FISH.
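A minimal sketch of how missing chromosomes could be flagged from low-coverage read counts is given below. It assumes reads have already been counted per chromosome; normalizing to the median density and the `cutoff` value are illustrative assumptions, not the procedure used in this study.

```python
import statistics


def relative_tag_density(read_counts: dict, chrom_lengths: dict) -> dict:
    """Reads per base for each chromosome, scaled to the median chromosome."""
    density = {c: read_counts.get(c, 0) / chrom_lengths[c] for c in chrom_lengths}
    reference = statistics.median(density.values())
    if reference == 0:
        raise ValueError("median density is zero; sample has too few reads")
    return {c: d / reference for c, d in density.items()}


def missing_chromosomes(relative: dict, cutoff: float = 0.2) -> list:
    """Chromosomes whose relative density falls below the (assumed) cutoff."""
    return sorted(c for c, r in relative.items() if r < cutoff)


# Hypothetical toy genome with arbitrary lengths and read counts.
toy_lengths = {"chr1": 200, "chr2": 100, "chr3": 100, "chrX": 150}
toy_counts = {"chr1": 2000, "chr2": 1000, "chr3": 20, "chrX": 1500}
toy_relative = relative_tag_density(toy_counts, toy_lengths)
print(missing_chromosomes(toy_relative))
```

The ∼7% figure in the text follows from the counts given there: six abnormal sequenced cells plus two from the recombination analysis, out of 116 amplifications, is 8/116.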

(B) Cells 23 and 27 are shown as normal controls, with 23 chromosomes clustered by normalized tag density and one sex chromosome dropped. Cells 59, 60, 63, and 64 had whole-chromosome aneuploidy. Cells 49 and 61 displayed complex, continuous distributions of chromosome representation.

(A) Whole-genome genotyping results of cell 112. Two columns in each chromosome represent the two haplotypes, and each horizontal bar shows the genotype of a SNP. Chromosome 14 showed very low call rates, suggesting its complete deletion.

For each gene conversion candidate SNP covered by high-throughput sequencing, we compared the genotypes of the same SNP across different single cells as well as P0 genomic DNA sequencing data to confirm genotyping and haplotyping accuracy. From the 568 candidates, we confirmed 90 converted SNPs (Table S4). Most gene conversions presented as single SNPs, whereas five groups of nearby SNPs gave gene conversion regions whose sizes range from 1 to 22 kbp. This size range is comparable to what was found in yeast but not to what was reported in humans. More interestingly, when we aligned the converted SNPs to historical recombination hot spots, only 10 of the 90 SNPs reside in hot spot regions. This is substantially different from the 58% hot spot overlap of P0 recombination events. We did not find a strict relationship between gene conversion and recombination level, but generally, cells with more crossovers had fewer gene conversions, and vice versa (Figure 4C).

As shown in Figure 2D, some SNPs have genotypes that are opposite to the haplotype in which they reside; they therefore serve as good candidates for gene conversion detection. To eliminate potential errors in genotyping, we performed high-throughput sequencing on eight of these cells (Table S1, samples 23, 24, 27, 28, 101, 113, 135, and 136). An average coverage of 6–8× was obtained with Illumina 2 × 100 read pairs from each sample, covering ∼30%–50% of the haploid genome. The physical coverage is lower than expected from Poisson statistics mainly because of amplification bias from MDA. Because the sperm genomes are haploid, one can make highly confident allele calls with substantially lower coverage than the 30× depth that is standard for genome sequencing. To test the accuracy of this genotype calling method, we performed quality control analysis by mapping the sequencing data to the two P0 somatic haplotypes. We correctly detected 184 of the 193 crossover events in these eight cells without false positives, and the nine missing events all reside near the tips of the chromosomes and had low sequencing coverage.
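To make the comparison with Poisson statistics concrete, the sketch below computes the fraction of a genome expected to be covered by at least one read when reads land uniformly at random, which is 1 − exp(−c) for mean depth c. Uniform placement is an idealizing assumption that MDA products do not satisfy.

```python
import math


def expected_fraction_covered(mean_depth: float) -> float:
    """Idealized Poisson expectation: P(at least one read at a position)."""
    return 1.0 - math.exp(-mean_depth)


for depth in (6, 8):
    print(depth, expected_fraction_covered(depth))
```

At 6–8× mean depth this idealized expectation exceeds 99%, so the observed ∼30%–50% reflects strongly nonuniform representation of the genome in the amplification products.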

Meiotic gene conversion is the transfer of information between homologs without reciprocal recombination. Although its contribution to genome diversity is effectively equivalent to that of two closely spaced recombination crossovers, gene conversion is less well studied in humans because converted tracts are small relative to genetic marker density. Gene conversion at specific loci has been studied with sperm typing and population genetics data, but direct whole-genome measurements have not been conducted for humans.

We first investigated whether meiotic drive occurs at the whole-chromosome level. Because of the general absence of recombination near centromeres, we can accurately define the haplotype across these regions, where kinetochores assemble for mechanical segregation. None of the 22 autosomes had a transmission ratio that significantly deviated from an equal distribution (p > 0.7, binomial distribution). Pearson’s correlation test between different chromosomes did not detect any cotransmission of centromere haplotypes. We then divided the whole genome into 100 kb haploblocks and studied whether any block showed meiotic drive. Even though many blocks had some evidence for bias, none of them reached the genome-wide significance level (Figure 4A). Together with the centromere data, our haplotype block results indicate that meiotic drive does not appear at the level of large haplotypes. We then turned to the transmission ratio of individual SNPs, where we found an obvious difference between our data and simulations of equal transmission (Figure 4B). A putative reason for this pattern is gene conversion.
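The test for equal transmission can be sketched as an exact two-sided binomial test on the number of sperm carrying one of the two centromere haplotypes. This is a minimal standard-library illustration; the two-sided definition used here (summing outcomes no more probable than the observed one) is an assumption, and the counts in the example are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p value for k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum every outcome that is no more likely than the one observed.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical example: 46 of 91 sperm carry haplotype A at a centromere.
print(binomial_two_sided_p(46, 91))
```

A near-even split such as this one gives a p value close to 1, whereas a strongly skewed split gives a small p value that would then need correction for the number of chromosomes or blocks tested.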

Mendel’s laws propose that the two alleles at a genetic position are transmitted to offspring with equal probability. However, results from specific regions and the whole genome have suggested transmission biased toward one allele (), an effect that can, in part, be explained by the phenomenon of meiotic drive and that can be directly tested in our data.

Of P0’s recombination events, 135 do not overlap with any HapMap hot spots. Although all are singlets, 38 of these events showed statistical significance relative to the activities measured in the deCODE male data, even after multiple comparison adjustment. Such a set as a whole is likely enriched with new recombination spots that can serve as targets for further analysis with traditional sperm typing methods. To demonstrate this, we selected two further regions for allele-specific PCR sperm typing (Figure S2) and discovered that one of them is a new personal hot spot (Table 1, “Non-Hot-Spot Overlapping”; Chr3:197,249,108–197,250,198 and Chr4:18,404,324–18,406,601).

Among the 2,075 recombination events in P0, 940 overlap with at least one other event. These 940 overlapping events form 324 distinct sets, with 2–14 overlapping events in each set. A simulation based on HapMap activities showed a significantly higher level of self-overlapping in P0 (permutation test, p = 0.001), suggesting that these recombination clusters are new hot spots. To confirm that P0 does have high recombination activity within these regions, we selected two regions of manageable size for allele-specific PCR and two-locus digital haplotyping (Figure S2) and independently verified their high activity in P0 (Table 1, “Self-Overlapping Sets”; Chr16: 7,988,699–7,990,230 and Chr9: 1,864,696–1,868,831). By comparing to the deCODE male data, we found that most of these clusters are also active in the population. However, three regions showed significantly higher activity than in deCODE (Table 1, “Self-Overlapping Sets”). Considering the small number of recombination events that we detected in P0 compared with the pool of historical hot spots, such a high level of overlap demonstrates P0’s preference for only a subset of historical hot spots.
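Grouping recombination intervals into self-overlapping sets can be sketched with a single sorted sweep. The version below chains overlaps transitively (if A overlaps B and B overlaps C, all three share a set), which is an assumption about how sets are defined; the intervals are hypothetical.

```python
def overlapping_sets(intervals):
    """Group (start, end) intervals into sets connected by overlap.

    Only sets with at least two members are returned; singletons are dropped.
    """
    groups = []
    current, current_end = [], None
    for start, end in sorted(intervals):
        if current and start <= current_end:
            # Interval overlaps the running set: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            if len(current) >= 2:
                groups.append(current)
            current, current_end = [(start, end)], end
    if len(current) >= 2:
        groups.append(current)
    return groups


# Hypothetical crossover intervals on one chromosome (arbitrary units).
toy_events = [(1, 5), (4, 8), (10, 12), (20, 30), (25, 26), (29, 40)]
toy_sets = overlapping_sets(toy_events)
print(len(toy_sets), [len(s) for s in toy_sets])
```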

(D) Historical hot spots overlapping ratio for each single cell. The maximum likelihood estimate is shown as a circle and the 95% confidence intervals are shown by the horizontal lines.

(C) Digital haplotyping results of P0 blood DNA from a region on chromosome 16. Upper panel shows results from SNP1-A-FAM and SNP2-A-HEX assays, which detect alleles in coupling phase. Lower panel shows results from SNP1-A-FAM and SNP2-B-HEX assays, which detect alleles in repulsion phase. The two chambers with both alleles detected in the lower panel are due to occupancy by multiple template molecules (1.87 expected from the Poisson distribution).

(B) Scheme of two-locus allele-specific TaqMan PCR. For each SNP, only one allele is detected at a time. The probes for the two PCR amplicons have different colors (red for FAM and blue for HEX). The combination of allele-specific primers from the two SNPs can detect alleles in either coupling (e.g., SNP1-A with SNP2-A) or repulsion (e.g., SNP1-A with SNP2-B) phase.

(A) Somatic or recombined haplotypes were first amplified with different combinations of allele-specific primers (upper panel). ‘Primer A’ and ‘Primer B’ represent the two different allele-specific primers at each locus. Amplified haplotypes were further quantified by digital PCR with a TaqMan assay specific for one allele of one SNP (lower panel).

We then analyzed the reference human genome for the PRDM9 13 bp degenerate DNA sequence motif, which was previously shown to be enriched in HapMap hot spots (). The motif is significantly (p < 10) enriched in P0 recombination regions compared to the genome background. However, 50 out of 162 recombination regions smaller than 30 kb do not contain the motif. When we focused on recombination smaller than 10 kb, the enrichment was not significant (p = 0.29) due to the low motif occurrence. We performed a de novo motif search within those regions without the 13 bp motif. All five hits reside in transposon sequences and are significantly enriched in P0’s recombination regions (p < 0.05 by simulation). This is consistent with the PRDM9 motif, which is also often located in transposon regions. These results suggest that PRDM9 binding may not be directly required for recombination, and other regulatory mechanisms may exist, such as homologous DNA pairing within transposons.

Sanger sequencing showed that P0 has the homozygous A/A PRDM9 genotype (allele naming from), which correlates with the highest historical hot spot usage. We applied the likelihood method from the Hutterite study to the portion of P0’s recombination data that matched their criteria (specifically, the 274 events of 30 kb or smaller) and determined that only 58% of P0’s recombination events coincide with HapMap hot spots. The ten times larger sample size in our data led to higher accuracy than the previous results, as shown by our 95% confidence interval of ±10% for the hot spot overlap fraction. These high-accuracy measurements of P0’s usage of historical hot spots reveal that, even with the most active and hot-spot-correlated variant of PRDM9, an individual still generates a substantial proportion of recombination events outside of historical hot spots.

Both the deCODE and HapMap projects have made extensive catalogs of recombination hot spots at the population level. Previous sperm studies have demonstrated that some particular hot spots are used idiosyncratically among individuals but have not been able to measure genome-wide activity for an individual. Data from a Hutterite pedigree suggested interindividual variation in hot spot usage and supported a hypothesis that the meiosis-specific histone methyltransferase PRDM9 may act as a universal regulator of recombination distribution. Polymorphisms in PRDM9 correlate, to some extent, with the level of historical hot spot usage. However, the small number of meioses that each individual has in the pedigree data, as well as uncertainty from statistical haplotype inference, led to extensive overlap of the hot spot usage percentages between individuals (95% confidence interval of a single measurement covering ±25%–40%). Consequently, the power of PRDM9 to explain variation in hot spot usage is still under debate.

When one compares our results and the population data at higher resolution, differences emerge. The telomere-weighted bias is stronger in our results than in HapMap or deCODE data, resulting in large regions without recombination near the centromere (Figure S1). For example, no recombination was detected within any ∼8 Mb region symmetrically spanning the centromeres of the 17 metacentric chromosomes in P0 (p = 0.028 based on deCODE male data). The relative activities on the p arms of some chromosomes are also higher than population-wide results (Figure S1). These differences suggest potential individual-specific features that may be diluted by population-wide averaging, and we therefore performed a more extensive comparison at a finer scale. A sliding window of 2 Mb was applied to P0’s recombination map with 1 Mb increments, and the resulting windows for which P0’s recombination rate was at least triple the genome-wide average (3 cM/Mb) were compared with deCODE male activity. Of the 66 such windows, 3 showed significantly higher activity than the deCODE male data in the corresponding regions. We refined the boundaries of these regions and summarized the activities in Table 1 (“Sliding Window Scanning”).
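The sliding-window scan can be sketched as follows. The sketch assumes each recombination event is represented by a single position in Mb (for example, the interval midpoint) and converts counts to cM/Mb as 100 × events per meiosis per Mb; both choices, and the reading of 3 cM/Mb as the cutoff, are assumptions for illustration. The positions are hypothetical.

```python
def sliding_window_rates(event_positions_mb, chrom_length_mb, n_meioses,
                         window_mb=2.0, step_mb=1.0):
    """Return (start, end, rate in cM/Mb) for each window along a chromosome."""
    windows = []
    start = 0.0
    while start + window_mb <= chrom_length_mb:
        end = start + window_mb
        n_events = sum(1 for x in event_positions_mb if start <= x < end)
        # 1 cM corresponds to 0.01 crossovers per meiosis.
        rate = 100.0 * n_events / n_meioses / window_mb
        windows.append((start, end, rate))
        start += step_mb
    return windows


def hot_windows(windows, cutoff_cm_per_mb=3.0):
    """Windows whose rate meets or exceeds the (assumed) cutoff."""
    return [w for w in windows if w[2] >= cutoff_cm_per_mb]


# Hypothetical toy chromosome of 6 Mb observed in 10 meioses.
toy_windows = sliding_window_rates([0.5, 1.5, 1.7, 5.2], 6.0, 10)
print([rate for _, _, rate in toy_windows])
```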

p value calculated as the chance of expecting equal or more recombination events from historical data than in P0 from 91 meioses using binomial statistics; further adjusted with Bonferroni correction.
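The footnote's calculation can be sketched as an upper-tail binomial probability followed by a Bonferroni adjustment. The sketch treats each of the 91 meioses as one trial with a per-meiosis recombination probability taken from historical data; that framing, and the numbers in the example, are assumptions for illustration.

```python
from math import comb


def upper_tail_p(k_observed: int, n_meioses: int, p_historical: float) -> float:
    """P(X >= k_observed) for X ~ Binomial(n_meioses, p_historical)."""
    return sum(
        comb(n_meioses, i) * p_historical**i * (1 - p_historical) ** (n_meioses - i)
        for i in range(k_observed, n_meioses + 1)
    )


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical region: historical probability 0.01 per meiosis,
# 6 events observed in 91 meioses, 66 windows tested.
raw = upper_tail_p(6, 91, 0.01)
print(raw, bonferroni(raw, 66))
```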


Each dot represents a recombination event with color code for resolution. Solid black lines connect recombination events from the same sperm cell. Red and blue lines show the cumulative recombination rates from deCODE (male) and HapMap, respectively.

Nonuniformity in the probability of recombination events also occurs within each chromosome. Our data show telomere-weighted distributions that are qualitatively similar to those found in population genetics studies (). With a 5 Mb window size, we detected a correlation of 0.85 between P0 and deCODE male data and 0.76 between P0 and HapMap data, whereas the correlation between deCODE male and HapMap data is 0.85 ( Figures 3 and S1 ). We observed an 87 Mb median distance between adjacent recombination events, comparing with the 49 Mb expected value after we randomly shuffled the recombination events (permutation test, p < 10), which demonstrates positive recombination interference, as has been previously observed (). Taken together, P0’s personal recombination map shows that recombination events within an individual recapitulate the general broad-scale features from population data. Our results experimentally demonstrate general concordance between an individual and the population average, which can be thought of as an analogy to the ergodic principle from statistical physics.
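A permutation test for positive interference can be sketched as below. The text does not spell out the shuffling scheme, so this sketch assumes one plausible version: crossover positions on a chromosome are pooled across cells and redistributed at random while each cell keeps its number of events. The data are hypothetical.

```python
import random
import statistics


def adjacent_distances(cells):
    """Distances between neighboring crossovers within each cell."""
    distances = []
    for positions in cells:
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


def interference_permutation_test(cells, n_perm=1000, seed=0):
    """Return (observed median distance, one-sided permutation p value)."""
    rng = random.Random(seed)
    observed = statistics.median(adjacent_distances(cells))
    pooled = [x for positions in cells for x in positions]
    sizes = [len(positions) for positions in cells]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, i = [], 0
        for n in sizes:
            shuffled.append(pooled[i:i + n])
            i += n
        if statistics.median(adjacent_distances(shuffled)) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical cells, each with two widely spaced crossovers (positions in Mb).
toy_cells = [[10, 110], [20, 120], [30, 130], [15, 115]]
toy_observed, toy_p = interference_permutation_test(toy_cells)
print(toy_observed, toy_p)
```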


At a genome-wide scale, the recombination rate of 22.8 ± 0.4 SE (±3.7 SD) events per cell agrees well with the average male results implied by other methods, such as cytological imaging (49.8 ± 0.4 SE [±4.3 SD] MLH1 loci within the tetraploid spermatocytes) and data inference (24.0 ± 0.2 SE [±2.7 SD] from Caucasian pedigrees). The slightly lower recombination level in P0 is consistent with P0’s RNF212 genotype (T/T at rs3796619), which is associated with a 5% lower recombination level than average. When comparing the number of recombination events within each chromosome, we found discrepancies between chromosome length in base pairs and recombination rate (Table S3) similar to those previously reported by both cytological and pedigree studies.
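The summary statistics above are linked by simple arithmetic, sketched here using only numbers stated in the text. The standard error is the standard deviation divided by the square root of the number of cells, and because each crossover involves only two of the four chromatids, a gamete is expected to carry on average half of the crossovers counted per spermatocyte.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a standard deviation and sample size."""
    return sd / math.sqrt(n)


events_per_cell = 2075 / 91          # autosomal crossovers per sperm cell
se_per_cell = standard_error(3.7, 91)
per_gamete_from_mlh1 = 49.8 / 2      # expected crossovers per gamete from MLH1 counts

print(events_per_cell, se_per_cell, per_gamete_from_mlh1)
```

The per-gamete expectation from the cytological count and the pedigree-based estimate of 24.0 are both slightly above the 22.8 events per cell measured here.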

By mapping the genotyping results from each sperm cell to the two somatic haplotypes obtained by microfluidic direct deterministic phasing (DDP) of single lymphocytes, we detected single-chromosome deletions in two cells (Figure 5A), whereas the other 91 cells gave a total of 2,075 autosomal crossover events (22.8 ± 0.4 SE [±3.7 SD] in each sperm) (Figure 2D and Table S2). The sizes of crossover intervals range from a few hundred base pairs to >1 Mbp, with 59%, 37%, and 13% of the total events localized to intervals of 200 kb, 100 kb, and 30 kb, respectively, compared to 70%, 51%, and 20% from previous Hutterite pedigree data for the same intervals. The low number of heterozygous loci for P0 in the genotyping panel, in combination with the genotype calling rate, contributed to the slightly lower resolution of our data. The collection of all of these recombination events yields a personal recombination map for P0. To our knowledge, this is the first reported high-resolution genome-wide personal recombination map for an individual.
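Crossover detection by mapping sperm genotypes onto the two somatic haplotypes can be sketched as follows. The data layout, the function names, and the `min_run` filter that ignores isolated discordant SNPs are hypothetical choices for illustration rather than the pipeline used in this study.

```python
def haplotype_path(sperm_calls: dict, hap_a: dict, hap_b: dict) -> list:
    """Label each informative SNP call as coming from haplotype 'A' or 'B'.

    All three dicts map position -> allele. SNPs where the two somatic
    haplotypes agree, or where the call matches neither, are skipped.
    """
    path = []
    for pos in sorted(sperm_calls):
        a, b = hap_a.get(pos), hap_b.get(pos)
        if a is None or b is None or a == b:
            continue
        if sperm_calls[pos] == a:
            path.append((pos, "A"))
        elif sperm_calls[pos] == b:
            path.append((pos, "B"))
    return path


def crossover_intervals(path: list, min_run: int = 2) -> list:
    """Return (last SNP before switch, first SNP after switch) per crossover."""
    runs = []  # each run: [haplotype, first_pos, last_pos, n_snps]
    for pos, hap in path:
        if runs and runs[-1][0] == hap:
            runs[-1][2] = pos
            runs[-1][3] += 1
        else:
            runs.append([hap, pos, pos, 1])
    # Drop short runs (possible errors or conversions), then merge neighbors.
    merged = []
    for run in (r for r in runs if r[3] >= min_run):
        if merged and merged[-1][0] == run[0]:
            merged[-1][2] = run[2]
            merged[-1][3] += run[3]
        else:
            merged.append(list(run))
    return [(prev[2], nxt[1]) for prev, nxt in zip(merged, merged[1:])]


# Hypothetical toy chromosome with eight heterozygous SNPs.
toy_hap_a = {p: "G" for p in range(1, 9)}
toy_hap_b = {p: "T" for p in range(1, 9)}
toy_sperm = {1: "G", 2: "G", 3: "T", 4: "G", 5: "G", 6: "T", 7: "T", 8: "T"}
toy_path = haplotype_path(toy_sperm, toy_hap_a, toy_hap_b)
print(crossover_intervals(toy_path))
```

In the toy input, the isolated discordant call at position 3 is ignored when calling crossovers; sites of this kind are the ones that would be examined separately as gene conversion candidates.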

We selected 93 amplification products with high yield and no heterozygous genotyping calls for an additional round of MDA, followed by Illumina Omni1S whole-genome genotyping (Table S1). Each single cell yielded successful calls at ∼30%–50% of the 1.2 million SNPs tested (Figure 2C), of which 83.2% were called as homozygous. The lower call rate on the bead array as compared to genotyping PCR is due to amplification bias from MDA. The abundance variation across different regions of the genome exceeds the dynamic range of the microarray, and the underrepresented loci are not detected. TaqMan PCR, which has a much larger dynamic range, gave a >70% call rate, and this reveals the true extent of coverage of the amplification products. The heterozygous false positive rate is due to similar effects. Within the 0 to ∼3 Illumina signal intensity spectrum, the mean intensity of homozygous calls was 1.27, whereas the mean of heterozygous calls was 0.12, which is barely above the default noise cutoff value of 0.1. These results, together with those from qPCR, reveal that the heterozygous calls are false positives due to low signal intensity. To improve the genotyping accuracy, we applied a stringent noise cutoff to the raw genotyping calls to remove the low-intensity signals and hence eliminate the heterozygous calls.

We collected a sperm sample from a 40-year-old Caucasian individual (P0) whose genome has been sequenced, clinically annotated, and haplotype phased. The donor has healthy offspring and normal clinical semen analysis results. Before the amplification reaction, we verified by optical microscopy which microfluidic chambers held sperm cells (Figure 1). With the products of each of the 125 single-cell amplification attempts, we performed 46-locus TaqMan genotyping PCR to evaluate the amplification performance (a total of 5,750 PCR reactions, a subset of which is shown in Figure 2A). Across the 125 samples, the mean call rate is 76.5% (4,398 out of 5,750), and 98 samples yielded call rates >70%, indicating effective whole-genome amplification (Figure 2B). Eight samples gave signals in <30% of the PCR assays (Figure 2A, chamber 11), suggesting amplification failure or misidentification of sperm cells by imaging. Because of the haploid nature of sperm cells, amplification products from single sperm cells should give only homozygous genotyping results, regardless of the polymorphism status of the diploid genome. As expected, 99.4% of the positive PCR reactions yielded signals from only one allele, and the allele combinations from multiple amplification products at each position match the genomic genotype at that locus. The 26 heterozygous calls (0.6% of 4,398) reside in 11 of the 125 single-cell experiments (ranging from 1 to 7 per cell), and we interpreted these heterozygous calls as the consequence of multiple cells in the chamber or other DNA contamination (Figure 2A, chamber 23). These results show that it is possible to obtain large numbers of high-quality single-cell genome amplification products by using an automated microfluidic device, and the products can be used for downstream genomic analysis (Table S1 available online).
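The percentages in this paragraph follow directly from the stated counts. A short check, using only numbers given in the text (the function name is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Percentage of part in whole."""
    return 100.0 * part / whole


total_reactions = 125 * 46                      # cells x loci
call_rate = percent(4398, total_reactions)      # positive PCR reactions
heterozygous_rate = percent(26, 4398)           # heterozygous calls among positives
single_allele_rate = 100.0 - heterozygous_rate  # calls showing only one allele

print(total_reactions, call_rate, heterozygous_rate, single_allele_rate)
```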

(D) Detection of recombination from a single sperm sample. The two columns in each chromosome represent the two somatic haplotypes, and blue lines show the genotyping calls of heterozygous SNPs from the sample. Each switch of haplotype block indicates a recombination event.

(A) Evaluation of amplification performance using 46-locus PCR. This table shows results from a subset of the amplified sperm cells. Each row represents the content of a microfluidic chamber, and each column represents a locus, with its chromosome number and coordinate (NCBI b36). The genotypes of the genomic DNA control are also shown. The two alleles of a SNP are highlighted in red and green. Heterozygous loci are labeled in blue. Sample 11 shows a genotyping profile similar to the no-template WGA control, indicating misidentification of a sperm cell before amplification. Sample 23 shows heterozygous genotypes on chromosome 14 and the sex chromosomes, suggesting multiple cells during amplification.

We developed a strategy to perform parallel analysis of the haploid genomes of many individual sperm cells by employing single-sperm whole-genome amplification on a microfluidic device ( Figure 1 ). Previously, we used microfluidic automation to perform whole-genome haplotype analysis by amplifying individual chromosomes at a rate of one cell per device () and demonstrated high-fidelity single-chromosome amplification. We have now extended that principle both in parallelization and in complexity of the starting material. The device described here enables the random dispensing of cell aliquots into 48 separate chambers, leading to typically half of them holding exactly one cell. We performed high-fidelity amplification of the entire genome in each chamber, followed by whole-genome genotyping and high-throughput sequencing analyses.

Device layout and operation pipeline are slightly modified from a similar device used to measure haplotype. A single sperm cell highlighted by the red square is recognized microscopically and captured in the cross region. In the overview image of the device, control channels are filled with green dye, and flow channels are filled with red dye.

Discussion

Despite the advances in personal genomics thus far (Ashley et al., 2010; Levy et al., 2007; Pushkarev et al., 2009; Wheeler et al., 2008), gamete genome variation within individuals, especially fine-scale personal recombination activity and germline mutation rates, has remained largely inaccessible. Bulk analysis of sperm cells with PCR offers high resolution and sensitivity (Jeffreys et al., 2005; Webb et al., 2008) and has been used to demonstrate variable usage of historical recombination hot spots but is limited to investigating focused areas within the genome. Cytological approaches can be used to study recombination-related effects in individuals, but these studies use gamete progenitor cells instead of sperm and have several limitations. First, the sample collection requires invasive biopsies. Second, the analysis targets the synaptonemal complexes in spermatocytes, so each progenitor cell analyzed by this method predicts an average result of putative recombination from four future sperm cells. Third, cytological staining does not allow high-resolution molecular analysis such as genotyping or sequencing.

There has been increasing interest in performing single-cell genome analysis in human cancers, and one can compare the methods and results used in cancer with those used here for human gamete genomes. One group used FACS to sort individual nuclei from human breast tumors (Navin et al., 2011). The genomes from these nuclei were amplified in microliter volumes and lightly sequenced to ∼0.2× coverage. These data were sufficient to construct a rough cell lineage map but did not allow calling of individual bases; rather, low-resolution structural variants were used. Another group used mouth pipetting to isolate individual cells from hematopoietic and kidney tumors (Hou et al., 2012; Xu et al., 2012), whose genomes were then amplified in microliter volumes. Rather than performing whole-genome analyses, these samples were then put through exome amplification and sequencing, effectively obtaining 30× coverage of only 1% of the genome. Those data were also used to establish lineage relationships between the cells, this time on the basis of point mutations. Their work reveals one of the challenges of performing single-cell analyses on diploid genomes: only 57% of the diploid calls were correct. Without the ability to examine a significant proportion of the whole genome, the studies mentioned above had to rely on a high mutation rate to distinguish single cells. As a consequence, none of these methods has been applied to samples other than late-stage cancers.

In this study, we applied microfluidics to single-cell whole-genome amplification. This technique not only enables a high degree of parallelization but also improves amplification performance. MDA is sensitive to environmental contamination, and extensive sample purification is required for traditional bench-top whole-genome amplifications (Hou et al., 2012; Woyke et al., 2011; Xu et al., 2012). More sensitive assays have even revealed contamination in the MDA reagents themselves (Blainey and Quake, 2011). By incorporating the amplification into microfluidic chips, we reduced the reaction volume and, hence, the contamination by ∼1,000-fold.

Amplification error has been a concern for single-cell whole-genome analysis. Previous microliter-volume single-cell exome studies have shown false discovery rates of 2–3 × 10⁻⁵ from MDA (Hou et al., 2012; Xu et al., 2012). Using our microfluidic approach on haploid cells, we have reduced the error rate to 4 × 10⁻⁹ with 5× coverage (binomial probability with per-read error rate). An important feature of single-molecule MDA is its repetitive usage of the originating genuine template molecule. Even if an amplification error happens in the initial stage, there will still be a large fraction of products preserving the correct base information from the original template, and the power of statistics from multiple coverage discriminates these errors from true genomic variation.
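
The phrase "binomial probability with per-read error rate" can be illustrated with a minimal sketch. It assumes that reads covering a site err independently with a fixed per-read error rate; the function name `binomial_tail` and the value `PER_READ_ERROR = 0.02` are hypothetical choices for illustration, not parameters reported in this study, and the sketch is not claimed to reproduce the authors' exact calculation.

```python
from math import comb


def binomial_tail(n_reads: int, min_errors: int, per_read_error: float) -> float:
    """Probability that at least `min_errors` of `n_reads` independent reads are wrong."""
    return sum(
        comb(n_reads, k) * per_read_error**k * (1 - per_read_error) ** (n_reads - k)
        for k in range(min_errors, n_reads + 1)
    )


# Hypothetical per-read error rate (an assumption, not a value from the text).
PER_READ_ERROR = 0.02

# Chance that every one of 5 reads covering a site is erroneous.
all_wrong = binomial_tail(5, 5, PER_READ_ERROR)

# Chance that a majority (3 or more of 5 reads) is erroneous.
majority_wrong = binomial_tail(5, 3, PER_READ_ERROR)

print(f"all 5 reads wrong:      {all_wrong:.2e}")
print(f"3 or more reads wrong:  {majority_wrong:.2e}")
```

The point of the sketch is only that requiring agreement across several independent reads drives the per-site false call probability down steeply, which is why multiple coverage can separate amplification or sequencing errors from true variants.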

Using this microfluidic MDA approach, we report the first genome-wide single-cell analysis of human sperm. We were able to create a personal recombination map for an individual and to measure the rate of de novo mutations in this individual’s germline. Sampling a large set of meioses from a single individual for fine-scale analysis allowed us to uncover individual-specific features potentially buried under population data. P0’s preference for a subset of historical hot spots suggests how individual features contribute to population diversity and points to a potential solution for the hot spot paradox. We propose that this partially overlapping feature is also the general pattern in individuals: everyone uses a different subset of the historical hot spots. While some hot spots are dying in some people, new recombination activities evolve to refill the hot spot pool; the partially overlapping patterns of individuals give rise to the population results, with hot spots (still active in many people) and deserts (used by fewer people). Support for this theory comes from single-cell analysis. Whereas P0 has, on average, 58% overlap with the historical hot spots, this ratio ranges from 0 to 100% for his single cells (Figure S2D). The partially overlapping patterns between individual cells produce P0’s personal recombination landscape.
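
The per-cell overlap ratio discussed above can be sketched as the fraction of a cell's crossover intervals that intersect at least one historical hot spot. The sketch below is illustrative only: the coordinates, the closed-interval convention, and the names `overlaps_any` and `overlap_fraction` are assumptions, and the hot spot lists are assumed to be sorted and non-overlapping within each chromosome.

```python
from bisect import bisect_right


def overlaps_any(interval, hotspots):
    """True if a closed interval (start, end) intersects any hot spot.

    `hotspots` must be a sorted list of non-overlapping (start, end) pairs.
    """
    start, end = interval
    # Number of hot spots whose start is <= the interval end.
    i = bisect_right(hotspots, (end, float("inf")))
    # Among those, the last one has the largest end coordinate.
    return i > 0 and hotspots[i - 1][1] >= start


def overlap_fraction(crossovers, hotspots_by_chrom):
    """Fraction of (chrom, start, end) crossover intervals that hit a hot spot."""
    if not crossovers:
        return float("nan")
    hits = sum(
        overlaps_any((start, end), hotspots_by_chrom.get(chrom, []))
        for chrom, start, end in crossovers
    )
    return hits / len(crossovers)


# Hypothetical toy data, not taken from the study.
hotspots = {"chr1": [(100, 200), (500, 600)], "chr2": [(50, 80)]}
cell_a = [("chr1", 150, 300), ("chr1", 250, 400), ("chr2", 10, 40)]
cell_b = [("chr1", 550, 560)]
cell_c = [("chr2", 90, 120)]

for name, cell in [("cell_a", cell_a), ("cell_b", cell_b), ("cell_c", cell_c)]:
    print(name, overlap_fraction(cell, hotspots))

# Pooling all cells gives the individual-level ratio.
print("pooled", overlap_fraction(cell_a + cell_b + cell_c, hotspots))
```

Because each cell carries only a small number of crossovers, per-cell fractions can easily reach 0 or 100% even when the pooled, individual-level value is intermediate.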

McVean G. Estimating meiotic gene conversion rates from population genetic data. Transmission distortion has long been known, but the key factors behind it are not clear. Biased segregation during meiosis, differing ability to achieve fertilization, and differing postzygotic viability can all contribute to this phenomenon. Specifically, if meiotic drive exists, the molecular mechanism is not known. Our data from 91 cells showed that meiotic drive does not generally appear as whole haplotype blocks but may occur at individual SNP loci. The most intuitive explanation for this result would be gene conversion. Indeed, we found 5–15 gene conversions in each genome-sequenced cell. This represents a lower bound for the total number of conversions in each single human sperm because there is a limited heterozygous SNP density. If both crossover events and gene conversion originate from double-strand breaks and share a recombination mechanism, then they should have the same hot spot overlapping ratio. If we match the number of gene conversions at hot spots and further assume that there are 1.5 million heterozygous SNP in the genome, the total number of gene conversions in a single cell would be ∼250–800, which is 10–35× the number of crossovers. Previous sperm typing studies have suggested 4–15× the number of gene conversions over crossovers, based on data from three hot spots (). But this value apparently changes across the genome ().

Evolutionary studies have estimated the germline mutation level (Makova and Li, 2002), but recent results from the 1000 Genomes Project (Conrad et al., 2011) are not consistent with the previous findings. The combination of data from our study and the 1000 Genomes Project suggests that the germline mutation rate can vary greatly among different individuals, but not among different cells from the same individual. This may explain why the male mutation rate is not always higher than the female rate. DNA methylation also affects genome instability (Li et al., 2012) and C→T point mutation levels, but in opposite ways. A fine-tuned methylation level is therefore required for a high-quality sperm genome. The high germline mutation rate at CpA regions (Conrad et al., 2011; Miyoshi et al., 1992) at least suggests a methylation profile that is different from that of the somatic genome. The fact that cytosine deamination is less well repaired at CpA than at CpG also explains our findings (Wang and Edelmann, 2006).

The ability to study a large number of single sperm cells has offered several new insights into meiosis. Studying the germline genome is but one application of single-cell genomics, and we expect that the method described here will find applications in many other fields, including cancer, aging, immunology, and developmental biology.