Further information and requests for reagents may be directed to, and will be fulfilled by the Lead Contact, Charles Swanton ( charles.swanton@crick.ac.uk ).

The clinical details of the cohort are described in detail in. In total, 38 patients were female, while 62 were male. The median age at diagnosis was 68 (range, 34-85).

The TRACERx 100 cohort comprises the first 100 patients prospectively analyzed by the lung TRACERx study ( https://clinicaltrials.gov/ct2/show/NCT01888601 , approved by an independent Research Ethics Committee, 13/LO/1546) and mirrors the prospective 100 patient cohort described in

Method Details

LOHHLA (Loss Of Heterozygosity in Human Leukocyte Antigen) algorithm Shukla et al., 2015 Shukla S.A.

Rooney M.S.

Rajasagi M.

Tiao G.

Dixon P.M.

Lawrence M.S.

Stevens J.

Lane W.J.

Dellagatta J.L.

Steelman S.

et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Szolek et al., 2014 Szolek A.

Schubert B.

Mohr C.

Sturm M.

Feldhahn M.

Kohlbacher O. OptiType: precision HLA typing from next-generation sequencing data. As input, LOHHLA requires: a tumor and germline BAM; patient-specific HLA calls, either predicted by an HLA inference tool (e.g., POLYSOLVER [] or Optitype []) or through HLA serotyping; the HLA fasta file location; purity and ploidy estimates. (For implementation of LOHHLA in this manuscript, ASCAT was used to estimate tumor purity and ploidy, while HLA inference was performed using POLYSOLVER, see below.) To call HLA LOH, LOHHLA relies upon five computational steps: Step 1: extract HLA reads First, tumor and germline reads that map to the HLA region of the genome (chr6:29909037-29913661, chr6:31321649-31324964, and chr6:31236526-31239869) as well as chromosome 6 contigs (chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7) are extracted using samtools view. Unpaired mates from this step are removed and the output is converted to FASTQ format. Step 2: create HLA allele specific BAM files Shukla et al., 2015 Shukla S.A.

Rooney M.S.

Rajasagi M.

Tiao G.

Dixon P.M.

Lawrence M.S.

Stevens J.

Lane W.J.

Dellagatta J.L.

Steelman S.

et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. For each of the patient’s heterozygous HLA alleles, a patient-specific reference fasta is created. The FASTQ files generated in the previous step are used to generate HLA specific BAM files,using similar mapping parameters to those previously published that allow for reads to map to multiple HLA alleles (). Post-alignment filtering is subsequently performed such that reads whose mates map to a different allele are discarded, as well as any reads that contain more than one insertion, deletion, or mismatch event compared to the reference HLA allele. For each filtered tumor/germline HLA allele-specific BAM file, coverage is then calculated using samtools mpileup. Step 3: determine coverage at mismatch positions between homologous HLA alleles For each HLA locus, a local pairwise alignment is performed between the two homologous HLA alleles, using the R Biostrings package. From the pairwise alignment, all of the mismatch positions between the two homologs are extracted. The HLA-specific coverage calculated in Step 2 is then used to determine differences in coverage at each of the mismatch positions. An additional file is also generated containing the coverage at every mismatch position, counting each read only once, as to avoid over-counting reads that span more than one mismatch position. Step 4: obtain HLA specific logR and BAF LogR across each HLA gene is then obtained by binning the coverage across both homologous alleles at 150 base pair intervals, for both tumor and normal. For each bin, the tumor/normal coverage ratio is multiplied by the multiplication factor, M, corresponding to number of unique mapped reads in the germline, divided by the number of unique mapped reads in the tumor region. The BAF, corresponding to the coverage of HLA allele 1 divided by the coverage of HLA allele 1 + coverage of HLA allele 2, is subsequently calculated at each polymorphic site. Step 5: determine HLA haplotype specific copy number A l l e l e 1 = ρ − 1 + B A F × 2 logR × ( 2 ( 1 − ρ ) + ρ × ψ ) ρ

A l l e l e 2 = ρ − 1 − 2 ( B A F − 1 ) logR × ( 2 ( 1 − ρ ) + ρ × ψ ) ρ

where ρ = tumor purity and ψ = tumor ploidy, which are input at the start. The logR value from the corresponding bin in which the polymorphic site was found to reside is used as well as the BAF of the polymorphic site. Finally, at each polymorphic site, an estimate of the major and minor allele copy number is obtained using the following equations:where= tumor purity and= tumor ploidy, which are input at the start. The logR value from the corresponding bin in which the polymorphic site was found to reside is used as well as the BAF of the polymorphic site. For each bin, the median Allele 1 and Allele 2 copy number is then determined. To estimate copy number of Allele 1, the median value across bins is calculated. Likewise, to estimate the copy number of Allele 2, the median value across bins is calculated. A copy number < 0.5, is classified as subject to loss, and thereby indicative of LOH. To avoid over-calling LOH, we also calculate a p value relating to allelic imbalance for each HLA gene. This p value corresponds to the pairwise difference in logR values at mismatch sites between the two HLA homologs, adjusted to ensure each sequencing read is only counted once. Allelic imbalance is determined if p < 0.01 using the paired Student’s t-Test between the two distributions.

TRACERx 100 Cohort Jamal-Hanjani et al., 2017 Jamal-Hanjani M.

Wilson G.A.

McGranahan N.

Birkbak N.J.

Watkins T.B.K.

Veeriah S.

Shafi S.

Johnson D.H.

Mitter R.

Rosenthal R.

et al. TRACERx Consortium

Tracking the Evolution of Non-Small-Cell Lung Cancer. TRACERx samples considered were obtained from (). Four patients were excluded due to homozygosity at all three HLA loci or too few mismatch positions between HLA alleles. Lung adenocarcinoma and lung squamous cell carcinoma tumors were considered for downstream analyses. Seven tumors were classified as having a separate histology. Of these, one carcinosarcoma exhibited HLA LOH and three adenosquamous carcinomas, one carcinosarcoma, one large cell carcinoma, and one large cell neuroendocrine tumor did not.

Comparison of ASCAT and LOHHLA In order to compare ASCAT and LOHHLA we treated each tumor region as a separate sample, and ran it through the LOHHLA pipeline with default settings. Note, for this analysis we used all TRACERx samples available, including NSCLCs that were not classified as lung adenocarcinomas or lung squamous cell carcinomas. Given that it was not possible to directly infer the copy number of the HLA alleles using ASCAT, the segment overlapping the HLA locus was used. In twenty-five tumor regions from seven tumors no segment overlapped the HLA locus, and in these cases, the closest genomic segment was used. To compare our allelic imbalance estimates, we considered a tumor region to be concordant if ASCAT predicted allelic imbalance across the locus and at least one HLA gene using LOHHLA was found to harbor allelic imbalance. Likewise, for LOH, we considered ASCAT and LOHHLA estimates to be concordant if ASCAT predicted a minor allele of 0 and this was also predicted for at least one HLA gene. Conversely, allelic imbalance estimates were classified as discordant if allelic imbalance was predicted in any HLA gene using LOHHA and not with ASCAT. Similarly, LOH was classified as discordant if any HLA gene using LOHHLA was classified as exhibiting a minor allele of 0 and no LOH was identified using ASCAT.

Fragment analysis validation of LOHHLA results Allelic imbalance was validated using four polymorphic Sequence-Tagged Site (STR) markers located on the short arm of chromosome 6, close to the HLA locus - (D6S2852, D6S2872, D6S248 and D6S1022). 20ng of patient germline and tumor region DNA was amplified using the PCR. The PCR comprised of 35 cycles of denaturing at 95C for 45 s, followed by an annealing temperature of 55C for 45 s and then a PCR extension at 720C for 45 s. PCR products were separated on the ABI 3730xl DNA analyzer. Fragment length and area under the curve of each allele was determined using the Applied Biosystems software GeneMapper v5. When two separate alleles were identified for a particular marker, the fragments could be analyzed for allelic imbalance using the formula (A tumor /B tumor )/(A normal /B normal ). The output of this formula was defined as the normalized allelic ratio.

HLA Type, HLA Mutations, and Predicted NeoAntigen Binders Shukla et al., 2015 Shukla S.A.

Rooney M.S.

Rajasagi M.

Tiao G.

Dixon P.M.

Lawrence M.S.

Stevens J.

Lane W.J.

Dellagatta J.L.

Steelman S.

et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. The HLA type for each sample was inferred using POLYSOLVER (POLYmorphic loci reSOLVER), which uses a normal tissue BAM file as input and employs a Bayesian classifier to determine genotype (). HLA mutations in each tumor region were also assessed using POLYSOLVER. Jamal-Hanjani et al., 2017 Jamal-Hanjani M.

Wilson G.A.

McGranahan N.

Birkbak N.J.

Watkins T.B.K.

Veeriah S.

Shafi S.

Johnson D.H.

Mitter R.

Rosenthal R.

et al. TRACERx Consortium

Tracking the Evolution of Non-Small-Cell Lung Cancer. 50 binding affinities and rank percentage scores, representing the rank of the predicted affinity compared to a set of 400,000 random natural peptides, were calculated for all peptides binding to each of the patient’s HLA alleles using netMHCpan-2.8 and netMHC-4.0 ( Andreatta and Nielsen, 2016 Andreatta M.

Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Hoof et al., 2009 Hoof I.

Peters B.

Sidney J.

Pedersen L.E.

Sette A.

Lund O.

Buus S.

Nielsen M. NetMHCpan, a method for MHC class I binding prediction beyond humans. Nielsen et al., 2003 Nielsen M.

Lundegaard C.

Worning P.

Lauemøller S.L.

Lamberth K.

Buus S.

Brunak S.

Lund O. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Novel 9-11-mer peptides that could arise from identified non-silent mutations present in the sample () were determined. The predicted ICbinding affinities and rank percentage scores, representing the rank of the predicted affinity compared to a set of 400,000 random natural peptides, were calculated for all peptides binding to each of the patient’s HLA alleles using netMHCpan-2.8 and netMHC-4.0 (). Putative neoantigen binders were those peptides with a predicted binding affinity < 500nM or rank percentage score < 2%.

Mapping HLA LOH to phylogenetic trees and identification of parallel evolution LOH events detected in every tumor region tested were considered to be clonal events and mapped to the trunk of the phylogenetic tree. For heterogeneous LOH events, the regional copy number of the HLA allele lost was used in conjunction with the patient tree structure and subclone cancer cell fractions in a quadratic programming approach, using the R package quadprog, to determine the best placement of the LOH event. min ( − d ∧ T b + 1 / 2 b ∧ T D b )

with the constraints: A ∧ T b > = bvec .

This was achieved by solving a quadratic programming equation:with the constraints: The LOH event was tested at each branch. For each possibility, the phylogenetic tree was broken into two, one containing all clones after the LOH event and the other consisting of the remainder of the tree. A 2xn matrix, where n is the number of regions sampled, was constructed containing the regional sum of the cancer cell fractions for each subclone in the subtree and the regional sum of cancer cell fractions from subclones in the remaining tree. The regional cancer cell fraction matrix was multiplied by the transpose of itself to generate a 2x2 matrix for input (Dmat) into the quadprog function, solve.QP. The vector to be minimized (dvec) was obtained by multiplying the LOHHLA calculated HLA allele copy number for each region by the transpose of the regional cancer cell fraction matrix. Finally, the solve.QP function was called with Dmat and dvec, using a constraint matrix, Amat, such that all results had to be positive and a constraint vector, bvec, such that the estimated copy number of HLA allele for the remaining tree was at least 0.5. The errors between observed and predicted copy number values from placing LOH event on each branch were output and the solution providing the least error was selected. Each mapped event was inspected and events that did not fit the phylogenetic tree or had large error values, either indicating the presence of an additional subclone or multiple independent HLA LOH events, were manually adjusted. Patients CRUK0013, CRUK0061, CRUK0082, and CRUK0084 had HLA LOH events that did not fit the current phylogentic tree, so additional nodes (indicated in gray) were included to contain the HLA LOH event. Patients CRUK003, CRUK0032, CRUK0051, and CRUK0062 had multiple independent HLA LOH events which were manually mapped.

Assessing significance of focal and arm-level LOH In order to assess whether HLA LOH occurred more than expected by chance, we considered whether each LOH event was focal or arm-level in nature. In brief, to classify LOH as arm-level or focal, we focused on the minor allele frequency across the genome. First, any segments (as predicted by ASCAT) with identical minor allele copy numbers were merged. Subsequently, segments that spanned > = 75% the length of a given chromosome arm, were classified as ‘arm-level’, while segments that were < 75% were considered focal. To assess the significance of focal events, for each tumor, the proportion of the genome subject to focal minor allele loss was determined. This value was assumed to reflect the probability for focal minor allele loss in each tumor. Based on this probability, we generated an aberration state (loss or no loss) for each sample separately and determined the proportion of samples exhibiting loss. We repeated this process 10,000 times to obtain a background distribution reflecting the likelihood of observing losses given the probability of loss in each sample. A p value reflecting the likelihood of observing the level of minor allele loss seen at the HLA locus was determined by counting the percentage of simulations showing a higher proportion loss than that observed. The same procedure was conducted for arm-level events, using the observed frequency of arm-level allele specific loss in each tumor.

Mutational signature analysis Rosenthal et al., 2016 Rosenthal R.

McGranahan N.

Herrero J.

Taylor B.S.

Swanton C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Mutational signatures were estimated using the deconstructSigs R package (). Signature 1A, 2, 4, 5, 13 were considered.

Assessing whether neoantigens preferentially bind to loss HLA alleles To assess whether neoantigens preferentially bind to lost HLA alleles, we focused on tumors exhibiting six distinct HLA alleles (i.e., no homozygosity for any allele in the germline) and loss of one HLA haplotype (HLA-A, HLA-B and HLA-C) in at least one tumor region. 50 binding scores. Duplicate mutations were removed to ensure each neoantigen reflected the highest binding score (lowest IC 50 value) for any given mutation. We further filtered the mutation list to only include subclonal mutations (defined as previously described ( Jamal-Hanjani et al., 2017 Jamal-Hanjani M.

Wilson G.A.

McGranahan N.

Birkbak N.J.

Watkins T.B.K.

Veeriah S.

Shafi S.

Johnson D.H.

Mitter R.

Rosenthal R.

et al. TRACERx Consortium

Tracking the Evolution of Non-Small-Cell Lung Cancer. Neoantigens (as defined above), were ranked according to ICbinding scores. Duplicate mutations were removed to ensure each neoantigen reflected the highest binding score (lowest ICvalue) for any given mutation. We further filtered the mutation list to only include subclonal mutations (defined as previously described ()) occurring in the tumor regions harboring loss events (> 5% VAF). The number of subclonal neoantigens binding to each haplotype was then determined for each tumor. A paired wilcoxon test was used to compare the number of subclonal neoantigens binding to the lost haplotype compared to the kept haplotype.

PD-L1 immunohistochemistry Herbst et al., 2014 Herbst R.S.

Soria J.C.

Kowanetz M.

Fine G.D.

Hamid O.

Gordon M.S.

Sosman J.A.

McDermott D.F.

Powderly J.D.

Gettinger S.N.

et al. Predictive correlates of response to the anti-PD-L1 antibody MPDL3280A in cancer patients. Formalin-fixed, paraffin-embedded (FFPE) tissue sections of 4-um thickness were stained for PD-L1 with an anti-human PD-L1 rabbit monoclonal antibody (clone SP142; Ventana, Tucson, AZ) on an automated staining platform (Benchmark; Ventana) with the OptiView DAB IHC Detection Kit and the OptiView Amplification Kit (Ventana Medical Systems Inc.) in a GCP-compliant central laboratory (Targos Molecular Pathology GmbH). PD-L1 expression was evaluated on tumor cells and tumor-infiltrating immune cells. For tumor cells the proportion of PD-L1-positive cells was estimated as the percentage of total tumor cells. For tumor-infiltrating immune cells, the percentage of PD-L1-positive tumor-infiltrating immune cells occupying the tumor was recorded. Scoring was performed by a trained histopathologist [according to previously published scoring criteria ()].