We performed whole-exome sequencing (WES) on tumour and germline DNA, with a mean coverage of 97.6× and 95.8×, respectively, as performed previously (ref. 17). The mean somatic mutation rate across the TCGA cohort was 8.87 mutations per megabase (Mb) of DNA (range: 0.5–48, median: 5.78). The non-synonymous mutation rate was 6.86 per Mb. MutSig2CV (ref. 18) identified significantly mutated genes among our 230 cases along with 182 similarly sequenced, previously reported lung adenocarcinomas (ref. 12). Analysis of these 412 tumour/normal pairs highlighted 18 significantly mutated genes (Fig. 1a shows the co-mutation plot of TCGA samples (n = 230); Supplementary Fig. 2 shows the co-mutation plot of all samples used in the statistical analysis (n = 412); and Supplementary Table 4 contains complete MutSig2CV results, which also appear on the TCGA Data Portal along with many associated data files (https://tcga-data.nci.nih.gov/docs/publications/luad_2014/)). TP53 was commonly mutated (46%). Mutations in KRAS (33%) were mutually exclusive with those in EGFR (14%). BRAF was also commonly mutated (10%), as were PIK3CA (7%), MET (7%) and the small GTPase gene RIT1 (2%). Mutations in tumour suppressor genes including STK11 (17%), KEAP1 (17%), NF1 (11%), RB1 (4%) and CDKN2A (4%) were observed. Mutations in the chromatin-modifying genes SETD2 (9%), ARID1A (7%) and SMARCA4 (6%) and the RNA splicing genes RBM10 (8%) and U2AF1 (3%) were also common. Recurrent mutations in the MGA gene (which encodes a Max-interacting protein on the MYC pathway; ref. 19) occurred in 8% of samples. Loss-of-function (frameshift and nonsense) mutations in MGA were mutually exclusive with focal MYC amplification (Fisher’s exact test P = 0.04), suggesting a hitherto unappreciated potential mechanism of MYC pathway activation. Coding single nucleotide variants and indel variants were verified by resequencing at a rate of 99% and 100%, respectively (Supplementary Fig. 3a, Supplementary Table 5).
Tumour purity was not associated with the presence of false negatives identified in the validation data (P = 0.31; Supplementary Fig. 3b).
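
The mutual-exclusivity comparisons above rest on Fisher’s exact test applied to a 2 × 2 table of sample counts. The following is a minimal sketch of a two-sided version of that test using only the Python standard library. The function name `fisher_exact_two_sided` is ours, the choice of a two-sided test is an assumption (the text does not state the sidedness used), and the counts in the example are hypothetical placeholders, not values from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts (NOT from the study):
    # rows    = gene A loss-of-function mutated / not mutated
    # columns = gene B focally amplified / not amplified
    p = fisher_exact_two_sided(0, 12, 40, 178)
    print(f"two-sided P = {p:.3g}")
```

With real data, the four cells would be the numbers of samples carrying both events, only the first, only the second, and neither. A table whose top-left cell is smaller than expected under independence indicates a tendency towards mutual exclusivity.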

Figure 1: Somatic mutations in lung adenocarcinoma. a, Co-mutation plot from whole-exome sequencing of 230 lung adenocarcinomas. Data from TCGA samples were combined with previously published data (ref. 12) for statistical analysis. The co-mutation plot for all samples used in the statistical analysis (n = 412) can be found in Supplementary Fig. 2. Significant genes with a corrected P value less than 0.025 were identified using the MutSig2CV algorithm and are ranked in order of decreasing prevalence. b, c, The differential patterns of mutation between samples classified as transversion-high and transversion-low (b) or between male and female patients (c) are shown for all samples used in the statistical analysis (n = 412). Stars indicate statistical significance using Fisher’s exact test (black stars: q < 0.05, grey stars: P < 0.05) and are adjacent to the sample set with the higher percentage of mutated samples.

Past or present smoking was associated with cytosine to adenine (C > A) nucleotide transversions, as previously described both in individual genes and genome-wide (refs 12, 13). The C > A nucleotide transversion fraction showed two peaks; this fraction correlated with total mutation count (R² = 0.30) and inversely correlated with cytosine to thymine (C > T) transition frequency (R² = 0.75) (Supplementary Fig. 4). We classified each sample (Supplementary Methods) into one of two groups, transversion-high (TH, n = 269) and transversion-low (TL, n = 144). The transversion-high group was strongly associated with past or present smoking (P < 2.2 × 10⁻¹⁶), consistent with previous reports (ref. 13). The transversion-high and transversion-low patient cohorts harboured different gene mutations. Whereas KRAS mutations were significantly enriched in the transversion-high cohort (P = 2.1 × 10⁻¹³), EGFR mutations were significantly enriched in the transversion-low group (P = 3.3 × 10⁻⁶). PIK3CA and RB1 mutations were likewise enriched in transversion-low tumours (P < 0.05). Additionally, the transversion-low tumours were specifically enriched for in-frame insertions in EGFR and ERBB2 (ref. 5) and for frameshift indels in RB1 (Fig. 1b). RB1 is commonly mutated in small-cell lung carcinoma (SCLC). We found that RB1 mutations in transversion-low adenocarcinomas were enriched for frameshift indels versus single nucleotide substitutions compared to SCLC (P < 0.05; refs 20, 21), suggesting a mutational mechanism in transversion-low adenocarcinoma that is probably distinct from smoking in SCLC.
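
To make the transversion fraction concrete, the sketch below computes the share of single nucleotide variants in a sample that are C > A transversions and then assigns the sample to a group. Substitutions are first expressed relative to the pyrimidine base, so that G > T on the opposite strand counts as C > A. The function names and the `cutoff` value are hypothetical. The classification rule actually used is given in the Supplementary Methods and is not reproduced here.

```python
# Complement table used to express every substitution on the strand
# where the reference base is a pyrimidine (C or T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(ref, alt):
    """Return the substitution class with a pyrimidine reference base."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def transversion_fraction(snvs):
    """Fraction of SNVs that are C>A (equivalently G>T) transversions."""
    if not snvs:
        raise ValueError("no SNVs supplied")
    classes = [collapse(ref, alt) for ref, alt in snvs]
    return classes.count("C>A") / len(classes)


def classify(snvs, cutoff=0.3):
    """Assign a sample to a group.

    The cutoff is a hypothetical placeholder, not the study's value.
    """
    if transversion_fraction(snvs) >= cutoff:
        return "transversion-high"
    return "transversion-low"


if __name__ == "__main__":
    # Toy sample: (reference base, alternate base) for each SNV.
    sample = [("C", "A"), ("G", "T"), ("C", "T"), ("T", "C"), ("G", "A")]
    print(transversion_fraction(sample), classify(sample))
```

Because the observed fraction was reported to be bimodal, a single cutoff between the two peaks is one simple way to separate the groups. Other rules, such as fitting a two-component mixture, would also be compatible with the description.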

Gender is correlated with mutation patterns in lung adenocarcinoma (ref. 22). Only a fraction of the significantly mutated genes from the complete set reported in this study (Fig. 1a) were enriched in men or women (Fig. 1c). EGFR mutations were enriched in tumours from the female cohort (P = 0.03), whereas loss-of-function mutations within RBM10, a gene on the X chromosome that encodes an RNA-binding protein (ref. 23), were enriched in tumours from men (P = 0.002). Within the transversion-high group, 16 out of 21 RBM10 mutations were observed in males (P = 0.003, Fisher’s exact test).

Somatic copy number alterations were very similar to those previously reported for lung adenocarcinoma (ref. 24) (Supplementary Fig. 5, Supplementary Table 6). Significant amplifications included NKX2-1, TERT, MDM2, KRAS, EGFR, MET, CCNE1, CCND1, TERC and MECOM, as previously described (ref. 24), as well as 8q24 near MYC and a novel peak containing CCND3 (Supplementary Table 6). The CDKN2A locus was the most significant deletion (Supplementary Table 6). Supplementary Table 7 summarizes molecular and clinical characteristics by sample. Low-pass whole-genome sequencing on a subset (n = 93) of the samples revealed an average of 36 gene–gene and gene–inter-gene rearrangements per tumour. Chromothripsis (ref. 25) occurred in six of the 93 samples (6%) (Supplementary Fig. 6, Supplementary Table 8). Rearrangements detected by low-pass whole-genome sequencing are listed in Supplementary Table 9.