a, Pan-cancer normalized abundances of Fusobacterium with a one-way ANOVA (Kruskal–Wallis) test for microbial abundances across types of cancer for each sample type. Sample sizes are inset in blue, and box plots show median (line), 25th and 75th percentiles (box), and 1.5 × IQR (whiskers); TCGA study names are listed below. b, SourceTracker2 results for faecal contribution, based on HMP2 data, for TCGA-COAD solid-tissue normal samples (n = 70) and TCGA-SKCM primary tumour samples (n = 122). Only one solid-tissue normal sample was available for TCGA-SKCM (Supplementary Table 4), so primary tumours were used instead as the best proxy of expected skin flora. Colon samples are expected to have a higher faecal contribution than skin samples, so a one-sided Mann–Whitney U-test was used. As SourceTracker2 outputs the mean fractional contributions of each source (that is, HMP2) to each sink (that is, COAD, SKCM samples), the centre value of each bar plot is the mean of these values and the error bars denote the s.e.m. The sample sizes are shown below in blue. c, Pan-cancer normalized abundances of Alphapapillomavirus with a one-way ANOVA (Kruskal–Wallis) test for microbial abundances across types of cancer for each sample type. Sample sizes are inset in blue, and box plots show median (line), 25th and 75th percentiles (box), and 1.5 × IQR (whiskers); TCGA study names are listed below. TCGA studies that clinically tested patients for HPV infection are divided into negative and positive groups. d, Screenshot of the interactive website showing plotting of Alphapapillomavirus normalized microbial abundances using Kraken-derived data. Plotting using SHOGUN-derived normalized microbial abundances is available on another tab of the website (left-hand side). e, Screenshot of the interactive website for ML model inspection. 
Selecting the data type (for example, all likely contaminants removed), cancer type (for example, invasive breast carcinoma), and comparison of interest (for example, tumour versus normal) will automatically update the ROC and PR curves, as well as the confusion matrix (using a probability cutoff threshold of 50%) and the ranked model feature list. Website is accessible at http://cancermicrobiome.ucsd.edu/CancerMicrobiome_DataBrowser.
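
As an illustration of the panel b summary, the following minimal Python sketch computes the mean and s.e.m. of per-sample faecal contribution fractions and a one-sided Mann–Whitney U-test of colon versus skin. It is not the authors' code: the function names and the example values are hypothetical, and the P value uses the normal approximation without tie or continuity correction, which may differ slightly from the output of a statistics package.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of fractions."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def mann_whitney_u_greater(x, y):
    """One-sided Mann-Whitney U-test of H1: x tends to be larger than y.

    Assumption: normal approximation, no tie or continuity correction.
    This is an illustrative simplification, not the authors' procedure.
    """
    # U counts pairs (xi, yj) with xi > yj; tied pairs count as one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Upper-tail probability, because the alternative is "x greater than y".
    p = 1 - NormalDist().cdf((u - mu) / sigma)
    return u, p


# Made-up faecal contribution fractions, for illustration only.
colon = [0.42, 0.35, 0.51, 0.28, 0.47, 0.39]
skin = [0.05, 0.02, 0.11, 0.08, 0.03, 0.07]

colon_mean, colon_sem = mean_and_sem(colon)
skin_mean, skin_sem = mean_and_sem(skin)
u_stat, p_value = mann_whitney_u_greater(colon, skin)

print(f"colon: {colon_mean:.3f} +/- {colon_sem:.3f} (mean +/- s.e.m.)")
print(f"skin:  {skin_mean:.3f} +/- {skin_sem:.3f} (mean +/- s.e.m.)")
print(f"U = {u_stat:.1f}, one-sided P = {p_value:.4f}")
```

The test is one-sided because the direction of the effect (colon above skin) was specified in advance; a two-sided test would double the tail probability under this approximation.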