a, Colour-gradient maps of CNAs and UPDs as detected by SNP-array karyotyping or sequencing-based assays are shown for PNE samples from high-risk and low-risk individuals, as well as dysplastic tissues (n = 12) and cancer samples (n = 51) (middle panels). Fractions of genomes showing copy-number gains (red), losses (blue) and UPD (green) are also plotted (top panels). Information about histology, age of the subject, sample size and risk of developing ESCC is shown in the top panels. b, Box plots of fractions of genomic regions showing CNAs in PNE (n = 188), dysplasia (n = 12) and cancer (n = 45) samples; the median, first and third quartiles, and outliers are indicated, with whiskers extending to the furthest value within 1.5 times the interquartile range. P values for significant differences are from the two-sided Mann–Whitney U-test. DP, dysplasia. c, Effects of lifestyle ESCC risks and age on CNAs. The mean (± 95% confidence interval) of standardized residuals of the total fraction of genomic regions showing CNAs in samples from high-risk individuals, compared with the linear regression model in samples from low-risk individuals (left panel), and the Fisher’s z-transformed correlation of age to the total fraction of genomic regions showing CNAs in samples from low-risk individuals (right panel) are plotted for 0.2-mm², 0.8-mm², 4-mm² and 8-mm² samples, which are combined using a random-effects model. The P value from the random-effects model (two-sided Wald test) is also indicated, together with the weight (in per cent) of each sample size in the model fit (Methods). Numbers of samples from low-risk and high-risk individuals, respectively, are 34 and 19 (0.2-mm² sampling), 19 and 12 (0.8-mm² sampling), 40 and 33 (4-mm² sampling), and 3 and 28 (8-mm² sampling). Numbers of samples from low-risk individuals who are <50 years old and ≥50 years old, respectively, are 20 and 14 (0.2-mm² sampling), 13 and 6 (0.8-mm² sampling), and 7 and 33 (4-mm² sampling).
d, f, LOH maps of chromosomes 9 (d) and 17 (f) in PNE (top) and cancer (bottom) samples. Deletions and UPDs are shown by blue and green lines, respectively. Positions and mutation status of the CDKN2A, NOTCH1 and TP53 genes are indicated. e, g, Bar plots of frequencies of 9q UPD (e) and 17p LOH (g) in samples from high-risk and low-risk individuals. Individuals <50 years old (low risk, n = 40; high risk, n = 11) and those ≥50 years old (low risk, n = 56; high risk, n = 81) were analysed. The number (n) of samples in each group is also indicated. P values are for significant differences between the two risk groups (two-sided Fisher’s exact test). Whiskers indicate the upper bounds of 95% confidence intervals from the binomial distribution.
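As an illustration of the pooling described for panel c, the following is a minimal sketch of how Fisher's z-transformed correlations from several sampling sizes might be combined in a random-effects model with a two-sided Wald test. It assumes the DerSimonian-Laird estimator for the between-group variance, which the legend does not specify (the actual procedure is given in Methods). The function names `fisher_z` and `random_effects_pool` are hypothetical, and the correlation values are placeholders, not results from the study; only the sample counts (34, 19, 40) are taken from the legend.

```python
import math


def fisher_z(r, n):
    """Return Fisher's z-transformed correlation and its approximate variance 1/(n - 3)."""
    return math.atanh(r), 1.0 / (n - 3)


def random_effects_pool(effects, variances):
    """Pool per-group effects with a random-effects model.

    Assumption: the DerSimonian-Laird moment estimator is used for the
    between-group variance (tau^2); the source does not name the estimator.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sws
    se = math.sqrt(1.0 / sws)
    wald_z = pooled / se
    p_value = math.erfc(abs(wald_z) / math.sqrt(2.0))  # two-sided normal P value

    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "p_value": p_value,
        "weights_pct": [100.0 * wi / sws for wi in w_star],
    }


# Sample counts for low-risk individuals are from the legend (0.2, 0.8 and 4 mm^2);
# the correlations below are placeholder values chosen only for illustration.
ns = [34, 19, 40]
rs = [0.30, 0.25, 0.40]

zs, vs = zip(*(fisher_z(r, n) for r, n in zip(rs, ns)))
result = random_effects_pool(list(zs), list(vs))
print("pooled z:", result["pooled"])
print("pooled r:", math.tanh(result["pooled"]))  # back-transform to a correlation
print("weights (%):", result["weights_pct"])
print("two-sided Wald P:", result["p_value"])
```

The percentage weights returned here correspond to the kind of per-sample-size weights mentioned in panel c: groups with more samples have smaller variance and therefore contribute more to the pooled estimate.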