Here we describe the sequencing and analysis of a primary human cancer genome using next-generation sequencing technology. Our patient’s tumour genome was essentially diploid, and contained ten non-synonymous somatic mutations that may be relevant for her disease. These mutations affect genes participating in several well-described pathways that are known to contribute to cancer pathogenesis, but most of these genes would not have been candidates for directed re-sequencing on the basis of our current understanding of cancer. Hence, these results justify the use of next-generation whole-genome sequencing approaches to reveal somatic mutations in cancer genomes.

As we demonstrated in our re-sequencing of the genome of the C. elegans N2 Bristol strain14, and again in this study, massively parallel short-read sequencing provides an effective method for examining single nucleotide and short indel variants by comparison of the aligned reads to a reference genome sequence. By sequencing our patient’s tumour genome to a depth of >30-fold coverage, and gauging our ability to detect known heterozygous positions across the genome, we have produced a sufficient depth and breadth of sequence coverage to comprehensively discover somatic genome variants. A slightly lower coverage of the normal genome from this individual helped to identify nearly 98% of potential variants as being inherited, a critical filter that allowed us to more readily identify the true somatic mutations in this tumour. Our results strongly support the notion that hypothesis-driven (for example, candidate gene-based) examination of tumour genomes by PCR-directed or capture-based methods is inherently limited, and will miss key mutations. A further and important consideration is the demand for large amounts of genomic DNA by these techniques; this is a serious limitation when precious clinical samples are being studied. The Illumina/Solexa technology requires only ∼1 μg of DNA per library, enabling the study of primary tumour DNA rather than requiring the use of tumour cell lines, which may contain genetic changes and adaptations required for immortalization and maintenance in tissue culture conditions.
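The tumour-versus-normal filter described above can be sketched as a set difference. This is a minimal illustration only, not the pipeline used in this study: the `Variant` record, the function name `somatic_candidates` and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A hypothetical minimal variant record (not the study's data format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumour: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants seen in the tumour but absent from the matched normal."""
    return tumour - normal


# Toy, made-up calls purely for illustration.
tumour_calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr3", 3000, "G", "A"),
}
normal_calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
}

candidates = somatic_candidates(tumour_calls, normal_calls)
# Share of tumour calls explained by the normal genome (inherited variants).
inherited_fraction = 1 - len(candidates) / len(tumour_calls)
```

In this toy example two of the three tumour calls are removed as inherited. A real filter would also need to confirm that each position is adequately covered in the normal genome before a variant is treated as absent from it.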

A total of ten non-synonymous somatic mutations were identified in this patient’s tumour genome. Two are well-known AML-associated mutations: an internal tandem duplication (ITD) of the FLT3 receptor tyrosine kinase gene, which constitutively activates kinase signalling and portends a poor prognosis5,24,25, and a four-base insertion in exon 12 of the NPM1 gene (NPMc)26,27,28. Both of these mutations are common (25–30%) in AML tumours, and are thought to contribute to progression of the disease rather than to cause it directly29. Notably, the frequency of the mutant FLT3 allele in the primary and relapse tumour samples (35.08% and 31.30%, respectively) was significantly lower than that of the other nine mutations (P < 0.000001 for both the primary and relapse samples). These data suggest that the FLT3 ITD may not have been present in all tumour cells, and further, that it may have been the last mutation acquired.

The other eight somatic mutations that we detected are all single base changes, and none has previously been detected in an AML genome. Four of the genes affected, however, are in gene families that are strongly associated with cancer pathogenesis (PTPRT, CDH24, PCLKC and SLC15A1). The other four somatic mutations occurred in genes not previously implicated in cancer pathogenesis, but whose potential functions in metabolic pathways suggest mechanisms by which they could act to promote cancer (KNDC1, GPR123, EBI2 and GRINL1B). We speculate about the roles of these mutations in the pathogenesis of this patient’s disease in Supplementary Information.

The importance of the eight newly defined somatic mutations for AML pathogenesis is not yet known, and will require functional validation studies in tissue culture cells and mouse models to assess their relevance. Even though we could not detect recurrent mutations in the limited AML sample set that we surveyed, several lines of evidence suggest that these mutations may not be random, ‘passenger’ mutations. First, somatic mutations in this genome are extremely rare. The rarity of somatic variants, and the normal diploid structure of the tumour genome, argue strongly against genetic instability or DNA repair defects in this tumour. Conceptually, this result is further supported by the very small number of somatic mutations discovered in the expressed tyrosine kinases of AML samples4,5; genetic instability does not seem to be a general feature of AML genomes.

Second, on the basis of the equivalent frequencies of the variant and wild-type alleles for the mutations in the tumour genome (except for FLT3 ITD), it is highly probable that all the mutations are heterozygous, and are present in virtually all of the tumour cells (Fig. 3). The latter suggests that these mutations may have all been selected for and retained because they are important for disease pathogenesis in this patient. Alternatively, all may have occurred simultaneously in the same leukaemia-initiating cell, but only a subset of the mutations (or an as-yet undetected mutation) is truly important for pathogenesis (that is, disease ‘drivers’ versus passengers). Although we suggest that the latter hypothesis is very unlikely on the basis of our current understanding of tumour progression, many more AML genomes will need to be sequenced to resolve this issue.
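The link between allele frequency and the fraction of tumour cells carrying a mutation can be made explicit with a small calculation. The sketch below is an idealization rather than the analysis performed in this study: it assumes a diploid locus, a heterozygous mutation, a sample consisting only of tumour cells, and unbiased read sampling, so that the variant allele frequency equals half the fraction of mutated cells. The function name `cell_fraction_from_vaf` is hypothetical.

```python
def cell_fraction_from_vaf(vaf: float) -> float:
    """Estimate the fraction of cells carrying a heterozygous mutation.

    Assumes a diploid locus, a heterozygous mutation, a pure tumour sample
    and unbiased read sampling; under those assumptions VAF = fraction / 2.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    # Cap at 1.0 because sampling noise can push 2 * VAF slightly above 1.
    return min(1.0, 2.0 * vaf)


# FLT3 ITD allele frequencies reported in the text (primary, relapse).
flt3_primary = cell_fraction_from_vaf(0.3508)
flt3_relapse = cell_fraction_from_vaf(0.3130)
print(f"primary: {flt3_primary:.3f}, relapse: {flt3_relapse:.3f}")
```

Under these assumptions, the reported FLT3 ITD frequencies would correspond to roughly 70% of cells in the primary sample and 63% in the relapse sample, whereas a frequency near 50% corresponds to essentially all cells. Any read-sampling or alignment bias would shift these estimates.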

Third, the same mutations were detected in tumour cells in the relapse sample at approximately the same frequencies as in the primary sample. All of these mutations were therefore present in the resistant tumour cells that contributed to the patient’s relapse, further suggesting that a single clone contains all ten mutations. Fourth, seven of the ten genes containing somatic mutations were detectably expressed in the tumour sample. FLT3 and NPM1 messenger RNAs were highly expressed in this tumour sample, as they are in virtually all AML samples. We detected mRNA from the CDH24, SLC15A1 and EBI2 genes on the Affymetrix expression array, whereas expression of GRINL1B and PCLKC was detected by PCR with reverse transcription (RT–PCR; data not shown). Expression of KNDC1, PTPRT and GPR123 was not detected by either approach, but we cannot rule out expression of these genes in a small subset of tumour cells (for example, leukaemia-initiating cells). Furthermore, for the five point mutations where data are available, the mutated base is highly conserved across multiple species (Table 2).

Although we performed whole-genome sequencing on this cancer sample, we restricted our initial validation studies to the 1–2% of the genome that encodes genes. This raises the issue of whether sequencing the complementary DNA transcriptome of this tumour would have been a faster, cheaper and more efficient way of finding the mutations. Although this approach will undoubtedly be an important adjunct to whole-genome sequencing, there are several advantages to the approach we used: (1) coverage models for whole-genome libraries are at present better understood than for cDNA libraries, where transcript abundance can vary over many orders of magnitude; (2) even if the transcriptome had been sequenced, extensive characterization of the normal genome would have been required to distinguish inherited variants from somatic mutations; and (3) relevant non-synonymous mutations could be missed by cDNA sequencing, including mutations that result in RNA instability (splice variants, nonsense mutations), and/or mutations in genes expressed at low levels, or in only a small subset of tumour cells.
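Point (1) can be illustrated with a simple coverage calculation. The sketch below is not the coverage model used in this study: it assumes that read depth at a site is Poisson-distributed, so that reads carrying a heterozygous variant are Poisson with half the mean depth, and it uses a hypothetical detection threshold of two variant reads. The function name and the example depths are likewise illustrative.

```python
import math


def detection_probability(mean_depth: float, min_variant_reads: int = 2) -> float:
    """P(at least `min_variant_reads` reads carry a heterozygous variant).

    Assumes read depth is Poisson with mean `mean_depth`, so variant-allele
    reads at a heterozygous site are Poisson with mean `mean_depth / 2`.
    """
    lam = mean_depth / 2.0
    # Probability of observing fewer variant reads than the threshold.
    below = sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_variant_reads)
    )
    return 1.0 - below


# Hypothetical local depths spanning several orders of magnitude.
for depth in (0.3, 3, 30, 300):
    print(f"depth {depth:>5}: {detection_probability(depth):.4f}")
```

Under these assumptions the detection probability is about 1% at 0.3-fold depth, about 44% at 3-fold, and above 99.99% at 30-fold. A whole-genome library with roughly uniform depth therefore behaves predictably, whereas in a cDNA library the effective depth of each gene depends on its transcript abundance, so weakly expressed genes may fall in the low-probability range.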

The additional non-coding and non-genic somatic variants in this genome (which we presently estimate at 500–1,000 on the basis of our calculated false positive and negative rates for non-synonymous mutations) will provide a rich source of potentially relevant sequence changes that will be better understood as more cancer genomes are sequenced.

In summary, we have successfully used a next-generation whole-genome sequencing approach to identify new candidate genes that may be relevant for AML pathogenesis. We cannot overemphasize the importance of parallel sequencing of the patient’s normal genome to determine which variants were inherited; the identification of the true somatic mutations in this tumour genome would not have been feasible without this approach. Furthermore, until hundreds (or perhaps thousands) of normal genomes and other AML tumours are sequenced, the contextual relevance of the mutations found in this genome will be unknown. Nevertheless, the somatic mutations that we did find were predicted neither by the curation of previously defined cancer genes nor by the study of this tumour using unbiased, high-resolution array-based genomic approaches. For AML and other types of cancer, whole-genome sequencing may therefore be the only effective means for discovering all of the mutations that are relevant for pathogenesis.