a, Stacked bar chart showing the proportion of reads attributed to the human genome, mouse genome, both, neither, or with ambiguous mapping for the pure mouse fibroblast feeder line (left) and a pure human sample (right), assessed with the Xenome pipeline. b, Clean-up of mutation calls using the Xenome pipeline for one of the samples that was more heavily contaminated by the mouse feeder layer. The Venn diagram on the left shows the overlap in mutation calls before and after removing non-human reads by Xenome. c, Histograms of VAFs for two representative colonies in the sample set. The plot on the left shows a tight distribution around 50%, as expected for a colony derived from a single cell without contamination. The plot on the right shows a bimodal distribution with one peak at 50% (mutations present in the original basal cell) and a second peak at around 25% (probably representing mutations that were acquired in vitro during colony expansion). These second peaks at less than 50% are more evident in colonies from children, owing to the low number of mutations in the original basal cell. d, Histogram of VAFs for a colony seeded by more than one basal cell, leading to a peak well below 50%. e, Estimated sensitivity of mutation calling according to sequencing depth. Heterozygous germline polymorphisms were identified in each subject; for each colony sequenced, we calculated the fraction of these polymorphisms that was recalled by our algorithms. f, Comparison of mutational burden in normal bronchial epithelial cells that neighbour a carcinoma in situ (CIS) versus cells distant from the CIS in five patients. The box-and-whisker plots show the distribution of mutational burden per colony within each subject, with the boxes indicating median and interquartile range and the whiskers denoting the range. The overlaid points are the observed mutational burden of individual colonies.
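
The sensitivity estimate described for panel e can be illustrated with a minimal sketch. It assumes that each variant is represented as a (chromosome, position, alternate allele) tuple and that a germline site counts as recalled when the identical tuple appears among the colony's calls. The function name `recall_sensitivity` and the toy data are hypothetical and are not taken from the study's pipeline.

```python
def recall_sensitivity(germline_het_sites, colony_calls):
    """Fraction of a subject's heterozygous germline sites recovered in one colony.

    Both arguments are iterables of hashable variant keys; here we assume
    (chromosome, position, alt_allele) tuples, which is an illustrative choice.
    """
    germline = set(germline_het_sites)
    if not germline:
        raise ValueError("no heterozygous germline sites supplied")
    # Sites present in both sets are the germline polymorphisms that were recalled.
    recalled = germline & set(colony_calls)
    return len(recalled) / len(germline)


# Hypothetical toy data for one subject and one colony.
germline_sites = [
    ("chr1", 1000, "A"),
    ("chr1", 2000, "T"),
    ("chr2", 500, "G"),
    ("chr3", 42, "C"),
]
colony_calls = [
    ("chr1", 1000, "A"),
    ("chr2", 500, "G"),
    ("chr3", 42, "C"),
    ("chr5", 7, "T"),  # a call that is not a germline site; it does not affect recall
]

sensitivity = recall_sensitivity(germline_sites, colony_calls)
print(f"{sensitivity:.2f}")  # 0.75
```

Plotting this per-colony value against each colony's sequencing depth would give a view comparable to the one the panel describes; the exact matching criteria used by the authors may differ from the exact-tuple match assumed here.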