a, Top, sequencing reads encompassing MMs were classified into those containing both mutant alleles (cis reads), one mutant and one reference allele (trans reads), and both reference alleles (reference reads). Bottom, one-sided permutation test (n = 10,000) for the allelic configuration (cis versus trans) of MMs. In this representative case, the observed numbers of cis (left) and trans (right) reads (red) were significantly higher and lower, respectively, than the expected distribution (black); thus, this example is considered to be cis. b, Difference in mutant allele frequency (AF) between MMs across 60 oncogenes and 35 TSGs in the discovery cohort. Each dot represents an MM, coloured by cancer type. Two-sided Brunner–Munzel test. Box plots show medians (lines), interquartile ranges (IQRs; boxes) and ±1.5 × IQRs (whiskers). c, Proportion of MMs (combinations of missense mutations only) showing concordant or discordant allele frequencies in MM+ oncogenes in primary samples from the total cohort. d, Fraction of PIK3CA, EGFR, ERBB2 and PDGFRA copy-number (CN) alterations according to MM status in recurrently mutated cancer types (defined as those with 20 or more hotspot/functional mutations) in primary samples from TCGA. e, Proportion of MMs in cis and trans (with distances between mutations of 25 bp or more) by phasing from RNA-seq or WES/WGS in MM+ oncogenes with and without concurrent CN amplification of the mutated gene. f, Allelic configuration (cis versus trans) assessed by cDNA amplicon sequencing for PIK3CA P539R–H1047R, E545D–E970K and E545K–D549N mutations in BT-20, SUP-T1 and HRT-18 cell lines, respectively. Proportions of mutant and reference alleles are shown. b–f, Examined numbers are shown in parentheses. g, Density plot illustrating the distribution of read length and average read quality for each of three long-read WGS samples. h, Percentage of bases covered by at least ×2, ×5, ×10, ×20 and ×30 sequencing reads for three long-read WGS samples. 
i, Validation rate of SNV calling from long-read WGS according to base quality and/or flanking indels. Examined read numbers are shown in parentheses. d, e, i, Two-sided Fisher’s exact test. j, Density plot showing the correlation between variant allele frequencies in short-read and long-read WGS at positions with coverage of at least ×40 in short-read WGS and at least ×20 in long-read WGS. Two-sided Pearson’s correlation test. k, Phasing of MMs using long-read WGS reads. Positions of MMs (red) and intervening SNVs (blue) are shown according to their genomic position (top), together with the long-read WGS reads spanning them (bottom). Reads supporting both mutant alleles and both reference alleles present in cis are shown in orange and black, respectively; discordant reads are shown in green.
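The read classification and one-sided permutation test described in panel a can be illustrated with a minimal Python sketch. The legend does not specify the null model, so the version below assumes one plausible choice: the alleles observed at the second mutation site are shuffled across reads, keeping the per-site mutant counts fixed, and the permuted cis count is compared with the observed one. The function name `cis_permutation_test`, the boolean read encoding and the add-one P value correction are all hypothetical choices for illustration, not the authors' implementation.

```python
import random


def cis_permutation_test(reads, n_perm=10_000, seed=0):
    """One-sided permutation test for an excess of cis reads.

    reads: list of (mut_at_site1, mut_at_site2) booleans, one tuple per
           sequencing read that spans both mutation sites.
    Returns (observed cis reads, observed trans reads, P value).

    Assumed null model (not taken from the legend): alleles at site 2 are
    randomly reassigned to reads, preserving the mutant counts at each site.
    """
    rng = random.Random(seed)
    site1 = [a for a, _ in reads]
    site2 = [b for _, b in reads]

    # cis read: both mutant alleles; trans read: exactly one mutant allele.
    obs_cis = sum(1 for a, b in reads if a and b)
    obs_trans = sum(1 for a, b in reads if a != b)

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(site2)  # break any linkage between the two sites
        perm_cis = sum(1 for a, b in zip(site1, site2) if a and b)
        if perm_cis >= obs_cis:
            hits += 1

    # Add-one correction so the estimated P value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return obs_cis, obs_trans, p_value


if __name__ == "__main__":
    # Toy data: 30 cis reads, 4 trans reads, 30 reference reads.
    toy = ([(True, True)] * 30 + [(True, False)] * 2
           + [(False, True)] * 2 + [(False, False)] * 30)
    print(cis_permutation_test(toy, n_perm=1000))
```

Under this assumed null, a small P value indicates more cis reads than expected by chance, which corresponds to the situation shown in panel a, where the example is called cis. A test for the trans configuration would instead count permutations in which the trans read count is at least as large as the observed one.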