We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes—wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant—characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field.

Through this approach, we identified and characterized six immune subtypes spanning multiple tumor types, with potential therapeutic and prognostic implications for cancer management. All data and results are provided in Supplemental Tables , at the NCI Genomic Data Commons (GDC, https://portal.gdc.cancer.gov ), and though the Cancer Research Institute iAtlas portal for interactive exploration and visualization ( http://www.cri-iatlas.org ), and are intended to serve as a resource for future studies in the field of immunogenomics.

We integrated major immunogenomics methods to characterize the immune tumor microenvironment (TME) across 33 cancers analyzed by TCGA, applying methods for the assessment of total lymphocytic infiltrate (from genomic and H&E image data), immune cell fractions from deconvolution analysis of mRNA-seq data, immune gene expression signatures, neoantigen prediction, TCR and BCR repertoire inference, viral RNA expression, and somatic DNA alterations ( Table S1 ). Transcriptional regulatory networks and extracellular communication networks that may govern the TME were found, as were possible germline determinants of TME features, and prognostic models were developed.

Contemporaneous with the work of TCGA, cancer immunotherapy has revolutionized cancer care. Antibodies against CTLA-4, PD-1, and PD-L1 are effective in treating a variety of malignancies. However, the biology of the immune microenvironment driving these responses is incompletely understood () but is critical to the design of immunotherapy treatment strategies.

The Cancer Genome Atlas (TCGA) has profoundly illuminated the genomic landscape of human malignancy. Genomic and transcriptomic data derived from bulk tumor samples have been used to study the tumor microenvironment (TME), and measures of immune infiltration define molecular subtypes of ovarian, melanoma, and pancreatic cancer () and immune gene expression in other tumors varies by molecular subtype (). Characterization of the immune microenvironment using gene expression signatures, T cell receptor (TCR), and B cell receptor (BCR) repertoire, and analyses to identify neo-antigenic immune targets provide a wealth of information in many cancer types and have prognostic value ().

In SYGNAL-PanImmune, the increased expression of biclusters enriched with IMs from KIRC, LGG, LUSC, and READ was associated with worse patient survival (CoxPH BH adjusted p value ≤ 0.05). Conversely, the increased expression of biclusters enriched with IMs from SKCM, containing CCL5, CXCL9, CXCL10, HAVCR2, PRF1, and MHC class II genes, were associated with improved patient survival (BH-adjusted p ≤ 0.05).

We identified the putative regulators of immune gene expression within immune subtypes ( Figure 7 E). In these predictions, C1-associated biclusters were regulated by ERG, KLF8, MAFB, STAT5A, and TEAD2. C1 and C2 shared regulation by BCL5B, ETV7, IRF1, IRF2, IRF4, PRDM1, and SPIB, consistent with IFN-γ signaling predominance in these subtypes. C3 was regulated by KLF15 and miR-141-3p. C6-associated biclusters were regulated by NFKB2. C1, C2, and C6 shared regulation by STAT2 and STAT4, implying shared regulation by important immune TF families, such as STAT and IRF, but also differential employment of subunits and family members by the immune milieu.

In SYGNAL-PanImmune, some regulators of IMs, but not upstream somatic mutations, were shared between tumor types, including STAT4, which regulated BTN3A1 and BTN3A2 in both LUSC and UCEC, secondary to implied causal mutations TP53 and ARHGAP35, respectively. Conversely, causal mutations shared across tumor types may associate with different tumor-specific downstream regulators. TP53 was a causal mutation in UCEC acting through IRF7 to regulate many of the same IMs as was seen in LUSC. These differences in causal relationships arise because the different cell types giving rise to each tumor type affect oncogenic paths.

Somatic alterations in AKAP9, HRAS, KRAS, and PREX2 were inferred to modulate the activity of IMs according to both the MR- and SYGNAL-PanImmune, a significant overlap (p = 1.6 × 10, Fisher’s exact test). In MR-PanImmune, MAML1 and HRAS had the highest number of statistical interactions with 26 MRs. This analysis identified complex roles for the RAS-signaling pathway ( Figure 7 D) specifically through connections to lineage factor VAV1 (implicated in multiple human cancers), potentially mediated by MAP2K1. Similarly, MAML1, hypothesized to mediate cross-talk across pathways in cancer (), was associated (p ≤ 0.05) with multiple MRs, including STAT1, STAT4, CIITA, SPI1, TNFRSF4, CD86, VAV1, IKZF1, and IL16.

Seven TFs were shared between the MR- and SYGNAL-PanImmune networks, a significant overlap (p = 4.8 × 10, Fisher’s exact test): PRDM1, SPI1, FLI1, IRF4, IRF8, STAT4, and STAT5A. Additional MRs included the hematopoietic lineage specific factor IKZF1, which may reflect variation in immune cell content, and known IMs, such as IFNG, IL16, CD86, and TNFRSF4. The regulators in SYGNAL-PanImmune were inferred to regulate a total of 27 IM genes ( Figure S7 C). The top two most commonly co-regulated IMs from SYGNAL-PanImmune, BTN3A1 and BTN3A2, are of particular interest as they modulate the activation of T cells () and have antibody-based immunotherapies ().

This resulted in two transcriptional networks. The first one, MR-PanImmune, consisted of 26 MRs that acted as hubs associated with observed gene expression and LF, connected with 15 putative upstream driver events ( Figure 7 D). The second one, SYGNAL-Panimmune, comprised 171 biclusters enriched in IMs and associated with LF.

We next used two complementary approaches, master regulators (MRs) and SYGNAL, to synthesize a pan-cancer transcriptional regulatory network describing the interactions linking genomic events to transcriptional regulators to downstream target genes, and finally to immune infiltration and patient survival. In both approaches, somatic alterations were used as anchors to infer regulatory relationships, in that they can act as a root cause of the “downstream” transcriptional changes mediated through transcription factors (TFs) and miRNAs.

The derived extracellular networks reflect the properties of immune subtypes in terms of cellular propensities and immune pathway activation noted earlier ( Figures 1 B, 1C, 2 A, and S2 A), but also place those properties in the context of possible interactions in the TME that may play a role in sculpting those same properties. The particular associations observed among IMs within distinct subtypes may be important for identifying directions for therapy.

A small sub-network ( Figure 7 A), focused around IFN-γ, illustrates some subtype-specific associations. In both C2 and C3, CD4 T cells, CD8 T cells, and NK cells correlated with expression of IFNG and CCL5, a potent chemoattractant. A second sub-network ( Figure 7 B), centered on TGF-β, was found in the C2, C3, and C6 networks. Across subtypes, different cell types were associated with abundant expression of TGFB1: CD4 T cells and mast cells in C3 and C6, macrophages in C6, neutrophils and eosinophils in C2 and C6, and B cells, NK cells, and CD8 T cells in C2 and C3. The receptors known to bind TGF-β likewise were subtype specific and may help mediate the TGF-β-driven infiltrates, with TGFBR1, 2, and 3 found only in the C3 and C6 networks. These results largely echo findings seen in our TGF-β pathway analysis ( Figure S4 C), which examined the effects of intracellular, rather than extracellular, signaling disruption on immune TME composition across immune subtypes. Finally, a third cytokine subnetwork illustrates variation in T cell ligands and receptors across immune subtypes ( Figure 7 C). CD4 and CD8 receptors fell into two groups, those found in C2, C3, and C6 networks, such as PDCD1, and those absent in C3, such as IL2RA and LAG3. Some T cell-associated ligands were subtype specific, such as CD276 (C2, C6), IL1B (C6), and VEGFB (C4).

(E) Regulators of immune subtypes from SYGNAL-PanImmune Network. Tumor types (octagons) linked through mutations (purple chevrons) to transcription factors (TFs, red triangles) and miRNAs (orange diamonds) that actively regulate the expression of IMs in biclusters associated with a single immune subtype (circles). The network describes predicted causal and mechanistic regulatory relationships linking tumor types through their somatic mutations (yellow edges) which causally modulate the activity of TFs and/or miRNAs (purple edges), which in turn regulate genes (not shown) whose expression is associated with an immune subtype (red edges). For example, RB1 mutations in LIHC (5% of patients) have significant evidence for causally modulating the activity of PRDM1 which in turn regulates genes associated (causal model at least 3 times as likely as alternative models and p value < 0.05) with C1 and C2. Interactions for this path are bolded.

(D) Master Regulator (MR)-Pan-Immune Network. The network diagram shows 26 MR “hubs” (filled orange) significantly associated with 15 upstream driver events (orange rings), along with proteins linking the two. The lineage factor VAV1 (on left) is inferred to be a MR by combining predicted protein activity with data on gene expression, protein interactions, and somatic alterations. VAV1 activity correlates with LF (degree of correlation depicted as degree of orange). Mutations in HRAS (center of network) are statistically associated with changes in LF. The HRAS and VAV1 proteins are in close proximity on a large network of known protein-protein interactions (not shown), as both can lead to activation of protein MAP2K1, (as shown connecting with dotted lines). Mutations in HRAS are associated (p < 0.05) with VAV1 activity, and their link through documented protein interactions implies that HRAS could directly modulate the activity of VAV1. In the diagram, the size of MR nodes represents their ranked activity. Smaller nodes with red borders represent mutated and/or copy-number altered genes statistically associated with one or more MR and LF, with the thickness of the border representing the number of associated MRs; small gray nodes are “linker” proteins.

(A) Immune subtype-specific extracellular communication network involving IFN-γ (IFNG, bottom of the diagram), whose expression is concordant with that of its cognate receptors IFNGR1 and IFNGR2 (bottom right and left, respectively), in C2 and C3 (yellow and green arrows, respectively; line thickness indicates strength of association). NK cells (left), which are known to secrete IFN-γ, could be producing IFN-γ in C2 and C3, as the NK cellular fraction is concordant with IFNG expression in both. CXCR3 is known to be expressed on NK cells and has concordant levels, but only in C3 (green arrow). This is a subnetwork within a larger network constructed by similarly combining annotations of known interactions between ligands, receptors, and particular immune cells types, with evidence for concordance of those components.

Beginning with a large network of extracellular interactions known from other sources, we identified which of those met a specified precondition for interaction, namely that both interaction partners are consistently present within samples in an immune subtype, according to our TME estimates. We focused the network on IMs. Networks in C2 and C3 had abundant CD8 T cells, while C3, C4, and C6 were enriched in CD4 T cells.

The immune response is determined by the collective states of intracellular molecular networks in tumor, immune, and other stromal cells and the extracellular network encompassing direct interaction among cells and communication via soluble proteins such as cytokines to mediate interactions among those cells.

Among IMs under investigation for cancer therapy, expression of VISTA is relatively high in all tumor types and highest in MESO; BTLA expression is high in C4 and C5; HAVCR2 (TIM-3) shows evidence of differential silencing among immune subtypes; and IDO1 is amplified, mostly in C1. The observed differences in regulation of IMs might have implications for therapeutic development and combination immune therapies, and the multiple mechanisms at play in evoking them further highlights their biological importance.

Copy-number alterations affected multiple IMs and varied across immune subtypes. C1 and C2 showed both frequent amplification and deletion of IM genes, consistent with their greater genomic instability, while subtypes C3 and C5 generally showed fewer alterations in IM genes. In particular, IMs SLAMF7, SELP, TNFSF4 (OX40L), IL10, and CD40 were amplified less frequently in C5 relative to all samples, while TGFB1, KIR2DL1, and KIR2DL3 deletions were enriched in C5 ( Figure 6 D), consistent with our observation of lower immune infiltration with TGFB1 deletion ( Figure S4 A). CD40 was most frequently amplified in C1 ( Figure 6 D) (Fisher’s exact p < 10for all comparisons mentioned). Overall, these marked differences in IM copy number may be reflective of more direct modulation of the TME by cancer cells.

Gene expression of IMs ( Table S6 Figure 6 A) varied across immune subtypes, and IM expression largely segregated tumors by immune subtypes ( Figure S6 A), perhaps indicative of their role in shaping the TME. Genes with the greatest differences between subtypes ( Figures 6 B and S6 B) included CXCL10 (BH-adjusted p < 10), most highly expressed in C2 (consistent with its known interferon inducibility) and EDNRB (BH-adjusted p < 10), most highly expressed in the immunologically quiet C5. DNA methylation of many IM genes, e.g., CD40 ( Figure 6 C), IL10, and IDO1, inversely correlated with gene expression, suggesting epigenetic silencing. 294 miRNAs were implicated as possible regulators of IM gene expression; among these, several associated with IMs in multiple subtypes ( Figure S6 C) including immune inhibitors (EDNRB, PD-L1, and VEGFA) and activators (CD28 and TNFRSF9). The immune activator BTN3A1 was one of the most commonly co-regulated IMs from the SYGNAL-PanImmune network (below). Negative correlations between miR-17 and BTN3A1, PDCD1LG2, and CD274 may relate to the role of this miRNA in maturation and activation of cells into effector or memory subsets ().

(A) From left to right: mRNA expression (median normalized expression levels); expression versus methylation (gene expression correlation with DNA-methylation beta-value); amplification frequency (the difference between the fraction of samples in which an IM is amplified in a particular subtype and the amplification fraction in all samples); and the deletion frequency (as amplifications) for 75 IM genes by immune subtype.

IMs are critical for cancer immunotherapy with numerous IM agonists and antagonists being evaluated in clinical oncology (). To advance this research, understanding of their expression and modes of control in different states of the TME is needed. We examined IM gene expression, SCNAs, and expression control via epigenetic and miRNA mechanisms.

Higher TCR diversity only correlated with improved PFI in a few tumor types (BLCA, COAD, LIHC, and UCEC) ( Figure S5 F). Therefore, it may be more important for the immune system to mount a robust response against only a few antigens, than a diverse response against many different antigens.

We evaluated TCR α and β and immunoglobulin heavy and light chain repertoires from RNA-seq. Mean TCR diversity values differed by immune subtype, with the highest diversity in C6 and C2 (p < 10, Wilcoxon, relative to all other subtypes; Figure 5 C) and by tumor type ( Figure S5 D, lower panel). We saw recurrent TCR sequences across multiple samples ( Figure S5 E, Table S5 ), suggesting a common, but not necessarily cancer-related, antigen (the top recurrent TCRs include known mucosal associated invariant T cell sequences). We assessed co-occurrence of complementarity determining region 3 (CDR3) α and β chains, in order to determine the frequency of patients with identical TCRs (a surrogate marker for shared T cell responses). We identified 2,812 α-β pairs present in at least 2 tumors (p ≤ 0.05, Fisher’s exact test with Bonferroni correction; Figure 5 D and Table S5 ). Likewise, testing for co-occurrence of specific SNV pMHC-CDR3 pairs across all patients identified 206 pMHC-CDR3 α pairs and 196 pMHC-CDR3 β pairs ( Figure 5 E, Table S5 ). Thus, a minority of these patients appear to share T cell responses, possibly mediated by public antigens. That said, there is relatively little pMHC and TCR sharing among tumors, highlighting the large degree of diversity in TILs.

Our findings suggest that pMHC burden and viral content impact immune cell composition, while CTAs have inconsistent effects on the immune response. Moreover, the effect of pMHC load on prognosis is disease specific and influenced by immune subtype.

We found human papilloma virus (HPV) in 6.2% of cases, mainly in CESC, GBM, HNSC, and KIRC, whereas hepatitis B virus (HBV) and Epstein-Barr virus (EBV) were mainly found in LIHC and STAD (stomach adenocarcinoma), respectively. In a regression model of all tumors, high load of each virus type associated with immune features ( Figure S5 C, cancer-type adjusted). High EBV content associated strongly with high CTLA4 and CD274 expression and low B cell signatures. High HPV levels associated with increased proliferation and Th2 cells but low macrophage content. In contrast, high HBV levels associated with Th17 signal and γδ T cell content. These findings highlight the diverse effect of different viruses on the immune response in different cancer types.

Cancer testis antigens (CTA) overall expression, and that of individual CTAs, varied by immune subtype with C5 having the highest (p < 10) and C3 the lowest (p = 10) expression values ( Figure 1 C). CEP55, TTK, and PBK were broadly expressed across immune subtypes, with enrichment in C1 and C2. C5 demonstrated high expression of multiple CTAs, illustrating that CTA expression alone is insufficient to elicit an intratumoral immune response.

Most SNV-derived peptides which bind to MHC were each found in the context of a single MHC allele (89.9%). Single mutations generate 99.8% of unique pMHCs while 0.2% result from distinct mutations in different genetic loci yielding identical peptides ( Figure 5 A). The most frequently observed pMHCs were from recurrently mutated genes (BRAF, IDH1, KRAS, and PIKC3A for SNVs, TP53 and RNF43 for indels) ( Figure 5 B, Tables S3 and S4 ). In BRCA and LIHC, worse PFI was associated with higher neoantigen load, while BLCA and UCEC showed the opposite effect ( Figure S5 A). For most tumors, however, there were no clear associations between predicted pMHC count and survival. Within immune subtypes ( Figure S5 B), higher neoantigen load was associated with improved PFI in C1 and C2 and worse PFI in C3, C4, and C5. These results suggest that neoantigen load provides more prognostic information within immune subtypes than based on tissue of origin, emphasizing the importance of overall immune signaling in responding to tumor neoantigens.

(D and E) Co-occurrence of CDR3a-CDR3b (D) and pMHC-CDR3 pairs (E) as a surrogate marker for shared T cell responses. Pairs found in at least two samples and meeting statistical significance are plotted, with jitter. x and y axes indicate how exclusive the pair members are: pairs in the top right typically co-occur, whereas along the axes each member is more often found separately. Size of the circle indicates how many samples that pair was found in.

Peptides predicted to bind with MHC proteins (pMHCs) and induce antitumor adaptive immunity were identified from SNV and indel mutations. The number of pMHCs (neoantigen load) varied between immune subtypes ( Figure 1 C), correlated positively with LF in most immune subtypes ( Figure S4 I), and trended positive in most TCGA tumor subtypes, with some negative correlation seen among GI subtypes, and differential trending seen among individual LUAD, LUSC, OV, and KIRP subtypes ( Figure S4 J). Neoantigen load also associated with higher content of CD8 T cells, M1 macrophages, and CD4 memory T cells, and lower Treg, mast, dendritic, and memory B cells in multiple tumor types ( Figure S4 K).

Immune cell content and expression of PD-L1 varied by gender and genetic ancestry ( Figures 4 E and S4 D). PD-L1 expression was greater (p < 0.05, Kruskal-Wallis test, unadjusted) in women than in men in HNSC, KIRC, LUAD, THCA, and KIRP ( Figure S4 E), while mesothelioma (MESO) showed an opposite trend. PD-L1 expression was lower in individuals of predicted African ancestry (overall p = 5 × 10). This association was consistent across most cancer types and was significant (p < 0.05, unadjusted) in BRCA, COAD, HNSC ( Figure S4 F), and THCA. No single cis-eQTL significantly correlated with PD-L1 expression, although the SNP rs822337, approximately 1 kb upstream of CD274 transcription start, correlated weakly (p = 0.074; 1.3 × 10unadjusted; Figure S4 G). Lymphocyte fractions tended to be lower in people of Asian ancestry, particularly in UCEC (uterine corpus endometrial carcinoma) and BLCA ( Figure S4 H). The significance of these demographic associations remains unclear but provides hypotheses for the efficacy of checkpoint inhibitor therapy based on genetic ancestry.

Since driver mutations in the same pathway had opposing correlations with LF (e.g., BRAF, KRAS, NRAS), we considered the overall effect of somatic alterations (mutations and SCNAs) on eight oncogenic signaling pathways. PI3K, NOTCH, and RTK/RAS pathway disruptions showed variable, tumor type-specific effects on immune factors, while TGF-β pathway disruptions more consistently associated with lower LF (most prominently in C2 and C6; Figure S4 C), higher eosinophils (C2), and increased macrophages. However, in C3, TGF-β pathway disruption associated with higher LF and M1 macrophages and lower memory B cells, helper T cells, and M0 macrophages. Thus, TGF-β pathway disruption has context-dependent effects on LF but may promote increased macrophages, particularly M1. Higher M1/M2 ratio, in turn, may reiterate the local pro-inflammatory state in these patients.

We correlated mutations in 299 cancer driver genes with immune subtypes and found 33 significant associations (q < 0.1) ( Figure 4 C, Table S2 ). C1 was enriched in mutations in driver genes, such as TP53, PIK3CA, PTEN, or KRAS. C2 was enriched in many of these genes, as well as HLA-A and B and CASP8, which could be immune-evading mechanisms (). C3 was enriched in BRAF, CDH1, and PBRM1 mutations, a finding of note since patients with PBRM1 mutations respond particularly well to IM therapy (). C4 was enriched in CTNNB1, EGFR, and IDH1 mutations. C5 was enriched in IDH1, ATRX, and CIC, consistent with its predominance of LGG samples. C6 was only enriched in KRAS G12 mutations. Mutations in 23 driver genes associated with increased LF either in specific tumor types or across them, including TP53, HLA-B, BRAF, PTEN, NF1, APC, and CASP8. Twelve other events were associated with lower LF, including the IDH1 R132H mutation, GATA3, KRAS, NRAS, CTNNB1, and NOTCH1 ( Figure 4 D).

Increased ITH associates with worse clinical outcomes or lower efficacy of IM therapy in a number of cancer types (). ITH correlated (Spearman, Benjamini-Hochberg [BH]-adjusted p < 0.05) with total LF in nine tumor types (LUAD, BRCA, KIRC, HNSC, GBM [glioblastoma multiforme], OV, BLCA, SKCM, and READ; data not shown) and with individual relative immune cell fractions in many tumor types ( Figure S4 B). ITH was highest in C1 and C2 (p < 10relative to all others) and lowest in C3 (p = 3 × 10 Figure 1 C), possibly supporting the link between lower ITH and improved survival.

Specific SCNAs affected LF and immune composition ( Figures 4 B and S4 A). Chromosome 1p (including TNFRS9 and VTCN1) amplification associated with higher LF, while its deletion did the opposite. 19q deletion (including TGFB1) also correlated with lower LF, consistent with the role of TGF-β in immune cell recruitment (). Amplification of chr2, 20q, and 22q (including CTLA4, CD40, and ADORA2, respectively), and deletions of 5q, 9p, and chr19 (including IL13 and IL4, IFNA1 and IFNA2, and ICAM1, respectively) associated with changes in macrophage polarity ( Figure S4 A). IL-13 influences macrophage polarization (), implying a possible basis for our observation that IL-13 deletions associated with altered M0 macrophage fractions.

The immune infiltrate was related to measures of DNA damage, including copy number variation (CNV) burden (both in terms of number of segments and fraction of genome alterations), aneuploidy, loss of heterozygosity (LOH), homologous recombination deficiency (HRD), and intratumor heterogeneity (ITH) ( Figure 4 A). LF correlated negatively with CNV segment burden, with strongest correlation in C6 and C2, and positively with aneuploidy, LOH, HRD, and mutation load, particularly in C3. These results suggest a differential effect of multiple smaller, focal copy number events versus larger events on immune infiltration in certain immune subtypes.

(E) Left: Degree of association between gender for eight selected immune characteristics (rows) within TCGA tumor types (columns). Blue denotes a higher value in women than in men, and red the opposite. Right: Degree of association between the immune characteristics and the first principal component of genetic ancestry in TCGA participants (PC1), reflecting degree of African ancestry. Blue reflects lower values in individuals of African descent.

(D) Volcano plot showing driver genes and OMs associated with changes in LF, across all tumors (“Pancan”) and within specific tumor types as indicated. x axis: Multivariate correlation with LF (B-factor), taking into account tumor type and number of missense mutations. Values > 0 represent positive correlation with LF and vice versa; y axis: −log10(p). Significant events (FDR < 0.1; p < 0.003) are in orange, others in gray.

(B) LF association with copy number (CN) alterations. Left: Differences between observed and expected mean LF in tumors with amplifications, by genomic region. Significant (FDR < 0.01) differences in mean LF are marked with black caps on the profiles. Right: Same, for deletions.

We obtained and validated a survival model using elastic-net Cox proportional hazards (CoxPH) modeling with cross-validation. Low- and high-score tumors displayed significant survival differences in the validation set ( Figure 3 D), with good prediction accuracy ( Figure 3 E). Incorporating immune features into Cox models fit with tumor type, stage, and tumor type + stage ( Figure 3 F) improved predictive accuracy, highlighting the importance of the immune TME in determining survival. Lymphocyte expression signature, high number of unique TCR clonotypes, cytokines made by activated Th1 and Th17 cells, and M1 macrophages most strongly associated with improved OS ( Figure S3 E), while wound healing, macrophage regulation, and TGF-β associated with worse OS, recapitulating survival associations in immune subtypes. Within tumor types, the prognostic implications of immune subtypes seen in univariate analyses were largely maintained, with C3 correlating with better OS in six tumor types and C4 with poor OS in three cancer types ( Figure S3 F).

Immune subtypes associated with overall survival (OS) and progression-free interval (PFI) ( Figures 3 A and S3 A). C3 had the best prognosis (OS HR 0.628, p = 2.34 × 10relative to C1, adjusted for tumor type), while C2 and C1 had less favorable outcomes despite having a substantial immune component. The more mixed-signature subtypes, C4 and C6, had the least favorable outcome. Functional orientation of the TME for tumor and immune subtypes was measured using the concordance index (CI) () and found to have context-dependent prognostic impact ( Figures 3 B, 3C and S3 B). Higher lymphocyte signature associated with improved outcome in C1 and C2. An increased value of any of the five signatures led to worse outcome in C3 ( Figure 3 B), perhaps reflecting a balanced immune response. While increased Th17 cells generally led to improved OS, Th1 associated with worse OS across most immune subtypes, and Th2 orientation had mixed effects ( Figure 3 C). Tumor types displayed two behaviors relative to immune orientation ( Figures 3 B, OS; S3 B, PFI). In the first group including SKCM and CESC, activation of immune pathways was generally associated with better outcome, while in the other, the opposite was seen. The relative abundance of individual immune cell types had complex associations that differed between tumor types ( Figures S3 C and S3D). These analyses extend beyond mere determination of lymphocyte presence to suggest testable properties that correlate with patient outcome in different tumor types and immune contexts.

(E) Prediction versus outcome from elastic net model in validation set data (from D). Top: Patient outcomes for each sample (black, survival; red, death) plotted with vertical jitter, along the sample’s model prediction (x axis). Middle: Fractional density of the outcomes plotted against their model predictions. Confidence intervals were generated by bootstrapping with replacement. Bottom: LOESS fit of the actual outcomes against the model predictions; narrow confidence bands confirm good prediction accuracy.

(D) Risk stratification from elastic net modeling of immune features. Tumor samples were divided into discovery and validation sets, and an elastic net model was optimized on the discovery set using immune gene signatures, TCR/BCR richness, and neoantigen counts. Kaplan-Meier plot shows the high (red) and low (blue) risk groups from this model as applied to the validation set, p < 0.0001 (G-rho family of tests, Harrington and Fleming).

The spatial fraction of tumor regions with tumor-infiltrating lymphocytes (TILs), estimated by analysis of digitized TCGA H&E-stained slides (), varied by immune subtype, with C2 the highest (p < 10 Figure 2 D). Image estimates correlated modestly with molecular estimates of lymphocyte proportion ( Figures S2 C and S2D), in part because the molecular estimate is more similar to cell count, while spatial TIL is a fraction of the area. The relative similarity of the estimates of lymphocytic content between two radically different methodologies reinforces the robustness of individual methods.

The leukocyte proportion of tumor stromal fraction, ρ, varied across tumor types and immune subtypes ( Figures 2 C and S2 B), ranging from >90% in SKCM to <10% in stroma-rich tumors such as PAAD, PRAD, and LGG. Some tumors, e.g., BRCA, showed variation within annotated or immune subtypes. In BRCA, C1 has the lowest ρ (ρ= 0.44) while ρ= 0.61 was 37% higher (p < 0.001) ( Figure S2 B), and there were likewise differences between luminal A and basal BRCA (ρ= 0.45 and ρ= 0.67 [p < 0.001]). For LGG, ρ= 0.28 (p < 0.001), whereas ρ= 0.48 and ρ= 0.50 (p < 0.001) ( Figure S2 B), and in READ, ρ= 0.40 and ρ= 0.78 (p < 0.001).

Leukocyte fraction (LF) varied substantially across immune subtypes ( Figure 1 C) and tumor types ( Figure 2 B). Tumors within the top third LF included cancers most responsive to immune checkpoint inhibitors, such as lung adenocarcinoma (LUAD), LUSC, cutaneous melanoma (SKCM), HNSC, and kidney renal clear cell carcinoma (KIRC), and in particular, the LUSC.secretory, LUAD.6, bladder urothelial carcinoma (BLCA.4), kidney renal papillary cell carcinoma (KIRP.C2a), and HNSCC mesenchymal subtypes. Uveal melanoma (UVM) and ACC had very low LF. Glioma subtypes displayed a greater range in LF than other tumors, which may reflect the presence or absence of microglia.

These six categories represent features of the TME that largely cut across traditional cancer classifications to create groupings and suggest certain treatment approaches may be independent of histologic type. For a complete list of the TCGA cancer type abbreviations, please see https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations

Finally, C6 (TGF-β dominant), which was a small group of mixed tumors not dominant in any one TCGA subtype, displayed the highest TGF-β signature (p < 10 −34 ) and a high lymphocytic infiltrate with an even distribution of type I and type II T cells.

C5 (immunologically quiet), consisted mostly of brain lower-grade gliomas (LGG) ( Figures 1 D and S1 B), exhibited the lowest lymphocyte (p < 10) and highest macrophage (p < 10) responses ( Figure 2 A), dominated by M2 macrophages ( Figure S2 A). Glioma subtypes () CpG island methylator phenotype-high (CIMP-H), the 1p/19q codeletion subtype and pilocytic astrocytoma-like (PA-like) were prevalent in C5, with remaining subtypes enriched in C4. IDH mutations were enriched in C5 over C4 (80% of IDH mutations, p < 2 × 10, Fisher’s exact test), suggesting an association of IDH mutations with favorable immune composition. Indeed, IDH mutations associate with TME composition () and decrease leukocyte chemotaxis, leading to fewer tumor-associated immune cells and better outcome ().

C3 (inflammatory) was defined by elevated Th17 and Th1 genes ( Figure 1 C, both p < 10), low to moderate tumor cell proliferation, and, along with C5, lower levels of aneuploidy and overall somatic copy number alterations than the other subtypes. C3 was enriched in most kidney, prostate adenocarcinoma (PRAD), pancreatic adenocarcinoma (PAAD), and papillary thyroid carcinomas (THCA).

C2 (IFN-γ dominant) had the highest M1/M2 macrophage polarization ( Figure S2 A, mean ratio = 0.52, p < 10, Wilcoxon test relative to next-highest), a strong CD8 signal and, together with C6, the greatest TCR diversity. C2 also showed a high proliferation rate, which may override an evolving type I immune response, and was comprised of highly mutated BRCA, gastric, ovarian (OV), HNSC, and cervical tumors (CESC).

C1 (wound healing) had elevated expression of angiogenic genes, a high proliferation rate ( Figure 1 C), and a Th2 cell bias to the adaptive immune infiltrate. Colorectal cancer (COAD [colon adenocarcinoma], READ [rectum adenocarcinoma]) and lung squamous cell carcinoma (LUSC) were rich in C1, as were breast invasive carcinoma (BRCA) luminal A ( Figures S1 C and S1D), head and neck squamous cell carcinoma (HNSC) classical, and the chromosomally unstable (CIN) gastrointestinal subtype.

To characterize intratumoral immune states, we scored 160 immune expression signatures and used cluster analysis to identify modules of immune signature sets ( Figure 1 A, top). Five immune expression signatures—macrophages/monocytes (), overall lymphocyte infiltration (dominated by T and B cells) (), TGF-β response (), IFN-γ response (), and wound healing ()—which robustly reproduced co-clustering of these immune signature sets, were selected to perform cluster analysis of all 30 non-hematologic cancer types ( Figures 1 A middle, and S1 A). The six resulting clusters “Immune Subtypes,” C1–C6 (with 2,416, 2,591, 2,397, 1,157, 385, and 180 cases, respectively) were characterized by a distinct distribution of scores over the five representative signatures ( Figure 1 A, bottom) and showed distinct immune signatures based on the dominant sample characteristics of their tumor samples ( Figures 1 B and 1C). Immune subtypes spanned anatomical location and tumor type, while individual tumor types and TCGA subtypes ( Figures 1 D and S1 B–S1D) varied substantially in their proportion of immune subtypes.

(A) Expression signature modules and identification of immune subtypes. Top: Consensus clustering of the pairwise correlation of cancer immune gene expression signature scores (rows and columns). Five modules of shared associations are indicated by boxes. Middle: Representative gene expression signatures from each module (columns), which robustly reproduced module clustering, were used to cluster TGCA tumor samples (rows), resulting in six immune subtypes C1–C6 (colored circles). Bottom: Distributions of signature scores within the six subtypes (rows), with dashed line indicating the median.

Discussion

We report an extensive evaluation of immunogenomic features in more than 10,000 tumors from 33 cancer types. Data and results are available as Supplemental Tables , at NCI GDC, and interactively at the CRI iAtlas portal, which is being configured to accept new immunogenomics datasets and feature calculations as they come available, including those derived from immunotherapy clinical trials, to develop as a “living resource” for the immunogenomics community. Meta-analysis of consensus expression clustering revealed immune subtypes spanning multiple tumor types and characterized by a dominance of either macrophage or lymphocyte signatures, T-helper phenotype, extent of intratumoral heterogeneity, and proliferative activity. All tumor samples were assessed for immune content by multiple methods. These include the estimation of immune cell fractions from deconvolution of gene expression and DNA methylation data, prediction of neoantigen-MHC pairs from mutations and HLA-typing, and evaluation of BCR and TCR repertoire from RNA-sequencing data. Immune content was compared among immune and cancer subtypes, and somatic alterations were identified that correlate with changes in the TME. Finally, predictions were made of regulatory networks that could influence the TME, and intracellular communication networks in the TME, based on integrating known interactions and observed associations. Immunogenomic features were predictive of outcome, with OS and PFI differing between immune subtypes both within and across cancer types.

Galon et al., 2013 Galon J.

Angell H.K.

Bedognetti D.

Marincola F.M. The continuum of cancer immunosurveillance: prognostic, predictive, and mechanistic signatures. Punt et al., 2015 Punt S.

Langenhoff J.M.

Putter H.

Fleuren G.J.

Gorter A.

Jordanova E.S. The correlations between IL-17 vs. Th17 cells and cancer patient survival: a systematic review. C4 and C6 subtypes conferred the worst prognosis on their constituent tumors and displayed composite signatures reflecting a macrophage dominated, low lymphocytic infiltrate, with high M2 macrophage content, consistent with an immunosuppressed TME for which a poor outcome would be expected. In contrast, tumors included in the two subtypes displaying a type I immune response, C2 and C3, had the most favorable prognosis, consistent with studies suggesting a dominant type I immune response is needed for cancer control (). In addition, C3 demonstrated the most pronounced Th17 signature, in agreement with a recent systematic review suggesting that Th17 expression is generally associated with improved cancer survival (). C2 was IFN-γ dominant and showed a less favorable survival despite having the highest lymphocytic infiltrate, a CD8 T cell-associated signature, and highest M1 content, suggesting a robust anti-tumor immune response. One explanation for this discrepancy is the aggressiveness of both the tumor types and specific cases within C2 relative to C3. C2 showed the highest proliferation signature and ITH while C3 was the lowest in both those categories. It may be that the immune response simply could not control the rapid growth of tumors comprising C2. A second hypothesis is that tumors in C2 are those that have already been remodeled by the existing robust type I infiltrate and have escaped immune recognition. While signatures biased toward interferon-mediated viral sensing and antigen presentation genes were often associated with higher survival, interferon signatures without increased antigen presentation showed an opposite association. Loss of genes associated with antigen processing and presentation is often found in tumors that have been immune edited. In contrast to the potential immune editing of C2, C3 may represent immunologic control of disease, that is, immune equilibrium.

Ilieva et al., 2014 Ilieva K.M.

Correa I.

Josephs D.H.

Karagiannis P.

Egbuniwe I.U.

Cafferkey M.J.

Spicer J.F.

Harries M.

Nestle F.O.

Lacy K.E.

et al. Effects of BRAF mutations and BRAF inhibition on immune responses to melanoma. Amankulor et al., 2017 Amankulor N.M.

Kim Y.

Arora S.

Kargl J.

Szulzewsky F.

Hanke M.

Margineantu D.H.

Rao A.

Bolouri H.

Delrow J.

et al. Mutant IDH1 regulates the tumor-associated immune system in gliomas. Possible impact of somatic alterations on immune response was seen. For example, KRAS mutations were enriched in C1 and but infrequent in C5, suggesting that mutations in driver oncogenes alter pathways that affect immune cells. Driver mutations such as TP53, by inducing genomic instability, may alter the immune landscape via the generation of neoantigens. Our findings confirmed previous work showing that mutations in BRAF () enhance the immune infiltrate while those in IDH1 diminish it (). Further work is needed to determine the functional aspects of these associations.

Brown et al., 2014 Brown S.D.

Warren R.L.

Gibb E.A.

Martin S.D.

Spinelli J.J.

Nelson B.H.

Holt R.A. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Tumor-specific neoantigens are thought to be key targets of anti-tumor immunity and are associated with improved OS and response to immune checkpoint inhibition in multiple tumor types (). We found OS correlated with pMHC number in only a limited number of tumors, with no clear association in most tumors, including several responsive to immune checkpoint inhibitor therapy. There are some caveats to this finding. The current predictors are highly sensitive but poorly specific for neoantigen identification, and our approach did not include neoantigens from introns or spliced variants. Moreover, it is not possible to fully determine the ability to process and present an epitope or the specific T cell repertoire in each tumor, which impacts the ability to generate a neoantigen response. It is also possible that the role of neoantigens may vary with tumor type, as supported by our per-tumor results.

Integrative methods predicted tumor-intrinsic and tumor-extrinsic regulation in, of, and by the TME and yielded information on specific modes of intracellular and extracellular control, the latter reflecting the network of cellular communication among immune cells in the TME. The resulting network was rich in structure, with mast cells, neutrophils, CD4 T cells, NK cells, B cells, eosinophils, macrophages, and CD8 T cells figuring prominently. The cellular communication network highlighted the role of key receptor and ligands such as TGFB1, CXCL10, and CXCR3 and receptor-ligand pairs, such as the CCL5-CCR5 axis, and illustrated how immune cell interactions may differ depending on the immune system context, manifested in the immune subtype.

Predicted intracellular networks implied that seven immune-related TFs (including interferon and STAT-family transcription factors) may play an active role in transcriptional events related to leukocyte infiltration, and that mutations in six genes (including Ras-family proteins) may influence immune infiltration. Across tumor types, the TFs and miRNAs regulating the expression of IMs tended to be shared, while somatic mutations modulating those regulatory factors tended to differ. This suggests that therapies targeting regulatory factors upstream of IMs should be considered and that they may have a broader impact across tumor types than therapies focusing on somatic mutations. Of note, in these approaches, it is not always possible to fully ascertain whether some particular interaction acts in the tumor, immune, or stromal cell compartments, but this could be improved on by incorporating additional cell-type-specific knowledge. Shared elements of intra- and extracellular network models should also be explored, with particular regard to the IMs and cytokines in both.

There are important caveats to using TCGA data. First, survival event rates and follow-up durations differ across the tumor types. Second, for most tumor types, samples with less than 60% tumor cell nuclei by pathologist review were excluded from study, thus potentially removing the most immune-infiltrated tumors from analysis. The degree to which this biases the results, relative to the general population of cancer patients, is difficult to ascertain. Our analyses were also limited by restriction to data from genome-wide molecular assays, in the absence of targeted classical cellular immunology assays for confirming cell phenotype distribution, as those types of data have not been collected from TCGA patients.

In summary, six stable and reproducible immune subtypes were found to encompass nearly all human malignancies. These subtypes were associated with prognosis, genetic, and immune modulatory alterations that may shape the specific types of immune environments we have observed. With our increasing understanding that the tumor immune environment plays an important role in prognosis as well as response to therapy, the definition of the immune subtype of a tumor may play a critical role in the predicting disease outcome as opposed to relying solely on features specific to individual cancer types.