The extraordinary advances in DNA sequencing technologies over the past few years have led to the elucidation of ∼10,000 individual human genomes (30× or greater base coverage; refs 1–13) from different ethnicities, using different technologies (refs 2–13) and at a fraction of the cost (ref. 10) of sequencing the original human reference genome (refs 14, 15). Although this is a monumental achievement, the vast majority of these genomes have excluded a very important element of human genetics. Individual human genomes are diploid, with half of the homologous chromosomes derived from each parent. The context in which variations occur on each individual chromosome can have profound effects on the expression and regulation of genes and other transcribed regions of the genome (ref. 16). Furthermore, determining whether two potentially detrimental mutations occur within one or both alleles of a gene is of paramount clinical importance.

Almost all recent human genome sequencing has been performed on highly parallelized, short-read (<200 base pairs (bp)) systems starting with hundreds of nanograms of DNA. These technologies are excellent at generating large volumes of data quickly and economically. Unfortunately, short reads, often paired with small mate-gap sizes (500 bases–10 kilobases (kb)), eliminate most single nucleotide polymorphism (SNP) phase information beyond a few kilobases (ref. 8). Population-based genotype data have been used to successfully assemble short-read data into long haplotype blocks (ref. 3), but these methods suffer from higher error rates and have difficulty phasing rare variants (ref. 17). Although using pedigree information (ref. 18) or combining it with population data provides further phasing power, no combination of these methods is able to resolve de novo mutations (ref. 17).

At present, four personal genomes have been sequenced and assembled as diploid: those of J. Craig Venter (ref. 19), a Gujarati Indian (HapMap sample NA20847; ref. 11) and two Europeans (Max Planck One (ref. 13) and HapMap sample NA12878 (ref. 20)). All have involved cloning long DNA fragments in a process similar to that used for the construction of the human reference genome (refs 14, 15). These processes generate long phased contigs, with N50 values (50% of the covered bases are found within contigs longer than this number) of 350 kb (ref. 19), 386 kb (ref. 11) and 1 megabase (Mb; ref. 13), and full-chromosome haplotypes in combination with parental genotypes (ref. 20). However, they require a large amount of initial DNA and extensive library processing, and are currently too expensive (ref. 11) to use in a routine clinical environment. In addition, several reports have recently demonstrated whole-chromosome haplotyping through direct isolation of metaphase chromosomes (refs 21–24). These methods have yet to be used for whole-genome sequencing and require preparation and isolation of whole metaphase chromosomes, which can be challenging for some clinical samples. Here we introduce long fragment read (LFR) technology, a process that enables genome sequencing and haplotyping at a clinically relevant cost, quality and scale.
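
As an illustration of the N50 statistic used above to compare phased contig lengths, the following minimal Python sketch computes it from a list of contig lengths. It assumes the common convention in which contigs are sorted from longest to shortest and N50 is the length of the contig at which the running total first reaches half of all covered bases. The function name `n50` and the example lengths are hypothetical and are not taken from any of the cited studies.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed convention: sort contigs from longest to shortest and
    report the length of the contig at which the cumulative sum
    first reaches at least half of the total covered bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical phased contig lengths, in kilobases.
example_kb = [100, 200, 300, 400, 500]
print(n50(example_kb))
```

Because the statistic weights contigs by the bases they cover, a handful of very long phased contigs raises N50 far more than many short ones, which is why it is a convenient single-number summary of haplotype contiguity.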