We analyzed molecular data on 2,579 tumors from The Cancer Genome Atlas (TCGA) of four gynecological types plus breast. Our aims were to identify shared and unique molecular features, clinically significant subtypes, and potential therapeutic targets. We found 61 somatic copy-number alterations (SCNAs) and 46 significantly mutated genes (SMGs). Eleven SCNAs and 11 SMGs had not been identified in previous TCGA studies of the individual tumor types. We found functionally significant estrogen receptor-regulated long non-coding RNAs (lncRNAs) and gene/lncRNA interaction networks. Pathway analysis identified subtypes with high leukocyte infiltration, raising potential implications for immunotherapy. Using 16 key molecular features, we identified five prognostic subtypes and developed a decision tree that classified patients into the subtypes based on just six features that are assessable in clinical laboratories.

Gynecologic and breast (Pan-Gyn) cancers have a projected incidence of more than 350,000 cases in the United States in 2017, with much larger numbers worldwide. Despite recent clinical advances, more comprehensive information on molecular characteristics of the tumors is a priority. As part of The Cancer Genome Atlas (TCGA) Pan-Cancer Atlas project, we present here an integrated analysis of 2,579 patients' Pan-Gyn cancers at the DNA, RNA, protein, histopathological, and clinical levels. We highlight shared characteristics and unique molecular features of the tumors, identifying clinically significant subtypes and suggesting potential therapeutic targets. Finally, we present a practical decision tree with only six laboratory-assessable molecular features, which classifies patient samples into one of five prognostic molecular subtypes.

Taken together, the Pan-Gyn tumor types account for a projected incidence of more than 350,000 cases in the United States in 2017, with many more worldwide. Many of the commonalities and differences among cancer types and subtypes presented here were not identified in the individual TCGA disease-type projects.

The study focuses on the following five TCGA tumor types: high-grade serous ovarian cystadenocarcinoma (OV), uterine corpus endometrial carcinoma (UCEC), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), uterine carcinosarcoma (UCS), and invasive breast carcinoma (BRCA). Although each Pan-Gyn organ site is subject to a variety of uncommon histologic cancer subtypes not studied by TCGA, the most frequent and/or aggressive tumors are represented. Despite impressive recent advances in diagnosis and management, these tumors share unmet needs for effective treatment. The analyses here can provide background biological information and prompt hypotheses about therapeutic choices or provide evidence for pre-existing hypotheses.

Gynecologic cancers share a variety of characteristics: they arise from similar embryonic origins in the Müllerian ducts, their development is influenced by female hormones, and they are managed by a particular medical specialty, gynecologic oncology, as reflected in the departmental organizations of academic medical centers. Recently, similarities at the molecular level have been identified across gynecologic and breast cancers in a comprehensive analysis of all 33 TCGA tumor types. Despite the commonalities, however, the various gynecologic cancer types do differ from each other in a variety of intriguing and important ways. The principal aims of the present study are to highlight both similarities and differences among types and subtypes of gynecologic cancers, in addition to the ways in which they differ from non-gynecologic cancers. Because breast tumors share most of the generic characteristics listed above, we have chosen to include them in the analysis.

We repeated the same type of survival analysis for the clusters predicted by the decision tree as we did for the original clusters (Figure 7E). Log rank test p values for the tumor type-unadjusted and -adjusted methods were both highly significant (p < 0.0001), showing that the decision tree-based clusters retained prognostic value despite the tree's imperfect accuracy. The survival rates were comparable with those of the original clusters, with a 5-year survival rate ranging from 85% (C1) to 39% (C4), and a 10-year survival rate ranging from 67% (C1) to 14% (C4).

Finally, we used dichotomous decision tree methodology to reduce the number of assessed molecular variables needed to classify patients into 1 of the 5 subtypes. The resulting tree required specification of only 6 of the original 16 features (Figure 7D). The tree predicted the original 16-feature-based clusters with an accuracy of 82% and a receiver operating characteristic area under the curve of 0.94.
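To make the idea of a dichotomous (binary) decision tree over a handful of laboratory features concrete, the following Python sketch walks a sample through a small tree. The split order loosely follows the cluster descriptions given in this paper (SCNA load first, then mutation load in the low-SCNA branch, then immune infiltration and hormone receptors in the high-SCNA branch). The field names, the boolean encoding, and the exact branching are assumptions made for illustration only; the actual tree is the one shown in Figure 7D.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical boolean encodings of six features; real assays would
    # need validated cut-offs that are not specified here.
    high_scna_load: bool
    hypermutator: bool   # stands in for "high mutation load"
    high_immune: bool    # stands in for "high immune infiltration"
    er_positive: bool
    pr_positive: bool
    ar_positive: bool


def classify(s: Sample) -> str:
    """Walk an illustrative binary tree and return a label C1..C5."""
    if not s.high_scna_load:
        # Low-SCNA branch: split on mutation load.
        return "C2" if s.hypermutator else "C1"
    # High-SCNA branch: immune infiltration first, then hormone receptors.
    if s.high_immune:
        return "C3"
    if s.er_positive or s.pr_positive or s.ar_positive:
        return "C5"
    return "C4"
```

Each internal node asks a single yes/no question, which is what would make such a tree straightforward to apply with routine laboratory readouts.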

We then performed overall survival analysis on the five clusters and obtained very significant survival differences among them (p < 0.0001, log rank test) ( Figure 7 C). The 5-year survival rate ranged from 83% (C1) to 44% (C4), and the 10-year survival rate ranged from 64% (C2) to 20% (C4). We assessed the statistical significance of the added prognostic value of the 16-feature clusters after accounting for tumor type differences to control for effects that may be due to individual tumor type contributions; the resulting p value was still significant (p = 0.0006, log rank test).
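The 5- and 10-year survival rates quoted above are read off Kaplan-Meier curves. As a reminder of how such an estimate is built, here is a minimal pure-Python sketch of the product-limit estimator. The function names and the toy data in the usage note are illustrative assumptions, not the study's code or data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time per patient (for example, in years)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

For a toy cohort with `times = [1, 2, 3, 4, 6, 8]` and `events = [1, 0, 1, 1, 0, 1]`, `survival_at(kaplan_meier(times, events), 5)` evaluates to 5/12, because censored patients shrink the risk set without lowering the curve.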

We present molecular subtypes that illuminate commonalities and distinguishing features across the Pan-Gyn tumor types, with the potential to inform future cross-tumor-type therapies. We first identified 16 features (listed in the STAR Methods) across 1,956 samples that were either (1) currently used in the clinic for at least 1 of the 5 tumor types, or (2) identified as informative in previous TCGA gynecologic and breast cancer studies. Next, we clustered the feature matrix and obtained 5 clusters (Figure 7A; Tables S4 and S5). SCNA load was the predominant feature and produced the first division. In the low-SCNA-load group, we found two clusters, non-hypermutator (C1) and hypermutator (C2). The non-hypermutator cluster had virtually no hypermutators but had high levels of ER+, PR+, and/or AR+ samples, indicating potential susceptibility to hormone therapies. C2, the hypermutator cluster, could be further subdivided into four subclusters (C2A-C2D). C2A was enriched with POLE mutations, which have previously been associated with “ultramutators” and their extremely high mutation rates (>100 mutations/mbp). C2B showed enrichment with MSI-high samples and C2C showed high immune-infiltration levels. C2D was depleted of hypermutators and showed enrichment with high immune-infiltration and HPV-positive samples. The high-SCNA-load group consisted of three clusters: immune high (C3), AR or PR low (C4), and AR or PR high (C5). The immune high cluster showed low levels of hormone receptors and enrichment with HPV-positive samples. Interestingly, samples with ERBB2 amplification fell into two main clusters; those in clusters C3 (n = 39) and C4 (n = 30) showed high and low immune infiltration levels, respectively (purple and black rectangles in Figure 7A). C3 displayed a tendency toward better survival than C4 (hazard ratio = 2.8), although the difference did not reach statistical significance (p = 0.087) (Figure S6B). C4 showed low levels of AR and PR and had a subcluster with BRCA1 or BRCA2 somatic mutations. C5 had high levels of at least one of the three hormone receptors, again suggesting sensitivity to hormone therapies. Each cluster had varying levels of representation of samples from each disease, mitigating tissue specificity (Figure 7B).

(E) Kaplan-Meier curves showing differences in overall survival among the five decision tree-based predicted clusters (with 5- and 10-year survival rates shown). p < 0.0001 both before (log rank test) and after (chi-square test) adjusting for tumor type differences in overall survival rates. See also Figure S6 and Tables S4 and S5.

(C) Kaplan-Meier curves showing differences in overall survival among the five clusters (with 5- and 10-year survival rates shown). Before adjusting for tumor type differences in overall survival rates, the log rank test p < 0.0001, and after adjusting for tumor type differences, p = 0.0006 (chi-square test).

We used cluster assignments from the six major TCGA platforms (mutations, SCNA, DNA methylation, mRNA, miRNA, and protein) to perform integrated clustering across the Pan-Gyn cohort using the CoCA algorithm ( Figure S6 A). The resulting CoCA clusters were heavily dominated by tumor type because the intrinsic gene expression patterns were lineage dependent. The association with tumor type was especially prominent in the DNA methylation, mRNA, miRNA, and protein clusters. Therefore, we turned to an alternative method (described next) to define subtypes that would span the Pan-Gyn tumor types and emphasize high-level similarities among them.
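As CoCA (cluster-of-cluster-assignments) is generally described, each platform's cluster labels are first recoded as binary indicator columns, and the samples are then re-clustered on the concatenated indicator matrix. The Python sketch below shows only that encoding step plus a simple per-platform disagreement count between two samples. The platform names, cluster labels, and function names are hypothetical, and the consensus clustering step itself is omitted.

```python
def coca_matrix(assignments):
    """Build a binary indicator matrix from per-platform cluster labels.

    assignments -- {platform: {sample: cluster_label}}
    Returns (samples, columns, rows): columns are (platform, label) pairs
    and rows[i][j] is 1 if samples[i] carries that label on that platform.
    Only samples present on every platform are kept.
    """
    samples = sorted(set.intersection(*(set(a) for a in assignments.values())))
    columns = [
        (platform, label)
        for platform in sorted(assignments)
        for label in sorted(set(assignments[platform].values()))
    ]
    rows = [
        [1 if assignments[p][s] == label else 0 for p, label in columns]
        for s in samples
    ]
    return samples, columns, rows


def disagreement(row_a, row_b):
    """Number of platforms on which two samples fall in different clusters.

    Each disagreeing platform flips exactly two indicator bits, hence // 2.
    """
    return sum(a != b for a, b in zip(row_a, row_b)) // 2
```

If most platforms assign clusters that track tissue of origin, samples from the same tumor type will have near-zero disagreement with one another, which is consistent with the lineage-dominated result reported here.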

All PARADIGM clusters had distinct patterns of high or low immune-related signaling, assessed by inferred activation (Figure 6A) and pathway enrichment (Figure 6B), suggesting an important role for immune response in subsets of Pan-Gyn cancers. Interestingly, the two basal-like BRCA subtypes differed in inferred activation of immune-related signaling pathways. Enrichment with adhesion-related proteins, such as the integrins, matrix metalloproteinases, and syndecans, also distinguished the two basal-like subtypes, suggesting distinctive tumor microenvironments. As with basal-like BRCA, UCEC split into two clusters (C2 and C3) that did not correspond to obvious variations in UCEC histology. These clusters were mainly differentiated by proliferation, Notch signaling, and immune activity levels.

(B) Constituent pathways with differential single-sample gene set enrichment analysis (ssGSEA) scores across PARADIGM clusters. A comparison of ssGSEA scores of constituent pathways integrated by the PARADIGM algorithm identified 263 differentially enriched pathways across clusters. Samples are arranged in the same order as in (A), and differentially expressed pathways are arranged based on unsupervised clustering of their ssGSEA scores. Dominant themes within subgroupings of differential pathways across PARADIGM clusters are labeled. Examples of immune-related pathways include interleukin-12 (IL-12), IL-23, IL-27, IFNG, STAT, and T cell receptor signaling pathways. Proliferation and DNA damage repair-related pathways include FOXM1, PLK2, cyclins, MYC, E2F, ATM, ATR, BARD1, and Fanconi anemia pathways. See also Tables S4 and S5

We performed PARADIGM pathway analysis followed by unsupervised consensus clustering of pathway scores, which grouped samples primarily by tissue type, with a few notable exceptions (Figures 6A and 6B; Tables S4 and S5). A subset of basal-like BRCA cancers co-clustered with a subset of UCEC and UCS in C2, whereas the remaining basal-like BRCA samples clustered with non-basal BRCA in C4. In contrast to the transcriptomic analysis, pathway analysis clustered approximately half of the basal-like BRCA samples together with the HER2+ and luminal B samples.

(D) Gene/lncRNA interaction networks in the overall Pan-Gyn lncRNA cohort and each of the four individual disease types. The nodes represent genes (green) or lncRNAs (burgundy), and each edge represents a statistically significant Pearson's correlation between the connected nodes. See also Figures S4 and S5 and Tables S4 and S5.

(C) ERs modulate the TERC-DKC1 complex and its transcriptional activity. Estradiol (E2)-activated ERs bind to cis-regulatory DNA regions of both DKC1 and TERC and regulate their activity. Further, ERs bind to regulatory regions of DKC1-regulated lncRNAs (listed on the right) and modulate their expression.

Previous studies have suggested that estrogen receptors (ERs) regulate BRCA1 expression, dyskerin (DKC1) expression (a binding partner of the lncRNA TERC), and the lncRNA TUG1 (Figure 5B). ERs bind to regulatory regions of DKC1, either to induce or to repress multiple lncRNAs (Figure 5C). In the present study, our analysis revealed significant Pearson's correlations (t test p < 0.05) between key lncRNAs and the transcripts of their regulator genes, ESR1, OIP5, and DKC1, in a context-specific manner (Figure 5D). Using gene set enrichment analysis, we found 12.04% of the 1,537 gene ontology gene sets to be significantly enriched (FDR < 0.05) with TERC-correlated genes across all four cancer types (Figure S5). Included were gene sets associated with TERT and telomere maintenance and packaging as well as gene sets linked to MYC. The latter result supports earlier findings of TERC binding peaks in the MYC promoter region.

We processed raw RNA sequencing data to extract 1,986 lncRNAs that were predicted to regulate the 216 cancer-related proteins profiled by TCGA across 4 of the 5 tumor types (UCS did not have sufficient samples for the lncRNA extraction). An unsupervised consensus clustering of the data revealed six clusters (L1 to L6) that coincided significantly with protein-based clusters (C1 to C5) (p < 0.05, Fisher's exact test) ( Figures 5 A and S4 A; Tables S4 and S5 ). BRCA and CESC had very similar lncRNA profiles and grouped together in clusters L2 and L3. UCEC (in L5) and OV (in L6) each had very distinct lncRNA profiles from those of BRCA and CESC. Portions of the OV (31%) and UCEC (11%) samples were both present in cluster L4.

Unsupervised hierarchical clustering of the 293 most variable miRNAs in 2,417 samples grouped samples largely by disease type ( Figures S4 B–S4D; Tables S4 and S5 ). The miRNA profile for OV, however, was especially distinct from other Pan-Gyn tumor types. Basal-like BRCA samples were more similar to CESC (C6), and UCEC and UCS samples (C4 and C5), than they were to the non-basal BRCA subtypes in C2 and C3.

Unsupervised hierarchical clustering of protein expression data for 1,967 samples across 216 proteins identified 5 clusters (Figure S4A; Tables S4 and S5). C1 principally consisted of non-basal BRCA, C3 was enriched with endometrioid UCEC, and C4 was enriched with OV. Interestingly, C2 and C5 contained a mixture of samples across multiple disease types. C2 had high levels of caveolin1, MYH11, and HSP70 proteins, which have previously been identified as biomarkers for the reactive subtype found in luminal BRCA. In addition to luminal BRCA samples, C2 included some basal-like BRCA, CESC, OV, and UCEC samples (but not UCS). Cluster C5 contained most of the basal-like BRCA, squamous CESC, serous UCEC, UCS, and 10% of the serous OV samples. It had a low hormone receptor pathway score and high levels of cell-cycle and DNA damage-response activity, features that could indicate sensitivity to drugs that target DNA damage repair.

We investigated which genes were differentially expressed among the clusters (Figure 4D). ESR1 and AR were significantly higher in C1 and C2 than in the other clusters, whereas C3 had high expression of SOX2. C3 consisted of cervical cancer samples with squamous histology, characterized by 3q26 amplification (the SOX2 gene locus). C7 had significantly lower expression of the classical epithelial marker CDH1, which is consistent with an EMT signature.

(D) Differential expression of ESR1, AR, SOX2, and CDH1 in different clusters (Kruskal-Wallis test p < 0.0001 for all four genes). The bars represent mean expression of the gene (log scale) in each cluster, together with the upper or lower 95% confidence interval (whiskers above or below the bars, respectively). See also Tables S4 and S5.

Unsupervised hierarchical clustering of 1,860 previously defined cancer genes in 2,296 Pan-Gyn samples resulted in the identification of nine mRNA clusters with distinct clinicopathologic characteristics (Figure 4A; Tables S4 and S5). Both C1 and C2 were BRCA enriched, and C2 contained the majority of HER2+ and normal-like tumors. C2 was also significantly enriched with infiltrating lobular carcinomas, whereas over 95% of cases in C4 were basal-like ductal BRCA. C5 consisted mainly of OV and serous-like UCEC, a similarity noted previously. Over 50% of cases in C7 were UCS; given the high EMT signature of UCS, C7 likely exhibits EMT characteristics. Overall, the Pan-Gyn mRNA subtypes showed prognostic value, even after adjusting for lineage (p < 0.0001, chi-square test) (Figure 4B). UCEC, in particular, appeared in five of the nine clusters and exhibited significant differences in overall survival, depending on cluster membership (Figure 4C).

Unsupervised clustering of 2,586 cancer-specific, hypermethylated loci across all Pan-Gyn tumors revealed heterogeneity of DNA methylation patterns (Figure S3B; Tables S4 and S5). Unsurprisingly, tumor samples from the same tissue of origin (e.g., OV, UCS, or CESC) clustered together with the exception of two major groups, which were found to be highly robust via cluster stability analysis (83% and 90% for left and right branches, respectively) (Figures S3C and S3D). The left branch with lower degrees of hypermethylation consisted of the majority of OV and UCS, normal and basal-like BRCA, and microsatellite-stable UCECs (both endometrioid and serous subtype). The hypermethylator (right) cluster included most CESC tumors, the majority of BRCA, and microsatellite-unstable UCEC. The seven-cluster resolution was retained when perturbing samples across all of the TCGA Pan-Can cohort (Figure S3E), with a small subset of UCEC samples reassigned. C7 (mostly CESC) had the highest degree of hypermethylation across all tumor types in the study, followed by a luminal B BRCA-rich C4, which also included HER2+ and a small fraction of basal-like BRCA. Within tumor subtype (e.g., endometrioid UCEC), the heterogeneity of DNA methylation patterns identified samples that showed greater deficiency in DNA mismatch repair pathways (via MLH1 silencing). Hypermethylation and concomitant downregulation of two genes in the homologous repair pathway, BRCA1 and RAD51C, were observed almost exclusively in OV (12.7% and 3.0%, respectively) and basal-like BRCA cancers (2.8% and 2.6%, respectively).

Unsupervised hierarchical clustering of the Pan-Gyn cohort using in silico admixture removal-corrected segmentation data revealed six clusters with distinct copy-number profiles (Figure 3; Tables S4 and S5). Prominent features that distinguished the clusters included SCNAs in chr 8, 16q, and 1q, among others. OV, serous UCEC, UCS, and basal-like, HER2+, and luminal B BRCA tumors clustered almost exclusively into C4 and C6. Conversely, luminal A BRCA and endometrioid UCEC samples were divided among all clusters, providing evidence for additional tumor subtypes beyond the traditional clinical classifications. C4 and C6 showed a high degree of genomic copy-number instability, consistent with their prevailing TP53 mutation signatures, and contained the highest numbers of advanced-stage cancers (Figure S3A). Unlike other clusters, more than 50% of the samples in C4 and C6 had undergone at least one whole-genome doubling event. C3 accounted for the largest proportion of CESC samples and uniquely exhibited a focal 11q22 amplification containing the oncogene YAP1. C2, with 74% endometrioid UCEC, contained a majority of the POLE-mutant cases and exhibited a quiet SCNA landscape with few broad-level gains or losses. C1 and C5 consisted primarily of endometrioid UCEC and luminal A BRCA tumors, accounting for 85% and 72% of the samples in the two clusters, respectively. Both clusters had similar alteration profiles genome wide, except in the frequencies of 1q and chr 8 gains (p < 2.2 × 10⁻¹⁶, Fisher's exact test); the former occurred twice as frequently in C1 and the latter seven times as frequently in C5. Overall, gain of 1q was the most frequent chromosomal arm-level event, occurring in 49.5% of samples across all five Pan-Gyn cancer types. Other frequently recurring arm-level events included gain of 3q, 8q, and chr 20, and loss of 4p, 13q, 16q, 17p, and 22q.

Unsupervised hierarchical clustering based on the contribution of each signature divided the Pan-Gyn samples into 10 clusters that showed associations with various molecular/clinical features (Figures 2C and S2C; Tables S4 and S5). Cluster C1 was highly enriched with OV samples (and basal BRCA and UCEC to a lesser extent) and contributed strongly to S10, a signature associated with germline and somatic BRCA1 and BRCA2 mutations that correlate with responsiveness to PARP inhibitors and platinum-based therapy. C1 also had samples with frequent TP53 mutations and homozygous deletions, supporting the association with an ineffective DNA double-strand break repair COSMIC signature. C2, which contained BRCA, OV, and UCEC samples, was associated with transcriptional strand bias for T > C substitutions, whereas C3, which contained BRCA and OV samples, was associated with transcriptional strand bias for T > A mutations. C4 consisted principally of breast samples and contributed to S8, the signature most associated with COSMIC 5 (etiology unknown). C5, principally composed of UCEC tumors with high microsatellite instability and mutations in MLH1, MSH2, MSH3, or MSH6, contributed most strongly to signature S6. S6 is correlated with COSMIC signatures 6, 15, and 20, which are associated with defective DNA mismatch repair (suggesting possible sensitivity to immune checkpoint inhibitors). C9 comprised CESC and BRCA samples and represented the AID/APOBEC signatures S1 and S2, providing further evidence for enrichment of APOBEC mutagenesis in these cancers. C10 was associated with POLE-mutant UCEC samples.

Mutation signatures have provided insight into mechanisms underlying tumor development and have informed patient therapy. Analysis by non-negative matrix factorization on the Pan-Gyn dataset suggested that 10 mutation signatures could explain nearly 90% of the variability observed in the original mutation/sample matrix (Figures S2A and S2B). The 10 Pan-Gyn signatures (S1 to S10) variably correlated with the 30 COSMIC signatures (http://cancer.sanger.ac.uk/cosmic/signatures) (Figure 2B). S1 correlated strongly with COSMIC signature 13 (r = 0.99) and S2 correlated with COSMIC signature 2 (r = 0.95); both signatures suggest activity of the AID/APOBEC family of cytidine deaminases. S3 correlated with COSMIC signature 1 (r = 0.94), indicating an endogenous process initiated by spontaneous deamination of 5-methylcytosine. S4 and the ultramutator COSMIC signature 10 were highly correlated (r = 0.97), presumably reflecting altered activity of POLE. A smaller correlation was found between S10 and COSMIC signature 3 (r = 0.58), associated with germline and somatic BRCA1 and BRCA2 mutations. All of the correlations were statistically significant (FDR < 0.05).
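Matching an extracted signature to a reference catalog amounts to correlating two profiles defined over the same mutation channels and keeping the best-scoring reference. The sketch below assumes Pearson's r as the measure (the text reports r values but does not name the measure here) and uses short toy profiles with hypothetical names rather than real COSMIC signatures.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def best_match(signature, references):
    """Return (name, r) for the reference most correlated with signature.

    references -- {name: profile}, each profile on the same channels
                  as the query signature.
    """
    scored = ((name, pearson(signature, ref)) for name, ref in references.items())
    return max(scored, key=lambda pair: pair[1])
```

With real data each profile would typically span the 96 trinucleotide substitution channels, and the resulting p values would be corrected for multiple testing before reporting significance.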

(A) Mutation profiles of 2,029 Pan-Gyn samples (columns) in which at least one somatic mutation occurred in at least one of the 46 significantly mutated genes (SMGs). Top: mutation burdens per sample, divided into synonymous and non-synonymous mutation types. Middle: types of mutations in each of the 46 SMGs per sample. Bottom: covariate bars showing the mutation cluster, genomic alterations in six genes from the DNA damage-response pathway, and tumor type for each sample.

There were 46 SMGs based on the intersection of those genes identified by MutSigCV v.1.4 and those identified by previous methods (Figure 2A). The top five most frequently mutated genes were TP53 (44% of samples mutated), PIK3CA (32%), PTEN (20%), ARID1A (14%), and PIK3R1 (11%). Eleven of the 46 SMGs had not been previously reported in any of the TCGA gynecologic or breast marker papers (Table S3). Among them, ACVR2A, a member of the transforming growth factor β superfamily that functions in pathways implicated in both tumor progression and suppression, was the most frequently mutated (in 4.8% of the cohort). LATS1 was the next most frequently mutated (3.8%) and functions in the Hippo signaling pathway, which controls organ size, restricts proliferation, promotes apoptosis, and has been implicated in multiple cancer types. CCAR1 was mutated at 3.6%; its protein product functions as a p53 coactivator and plays roles in cell proliferation, apoptosis, and, in breast cancer, estrogen-dependent growth. We found 220 patients (10%) who had no detectable mutations in any of the SMGs.

We analyzed 2,258 patient samples with mutation data from TCGA for SMGs and operative mutational processes across the Pan-Gyn tumor types. The types of mutations in the Pan-Gyn cohort are summarized in Table S2. Mutation load varied widely by tumor type, with CESC samples having the highest median frequency (5.3 mutations/mbp). UCEC samples showed a bimodal distribution due to a subset of hypermutators described previously.

We performed bootstrapping-based analyses to investigate whether there were greater numbers of shared mutated or copy-number altered genes among the five Pan-Gyn tumor types versus random sets of five tumor types. The results showed that 23 mutated genes were enriched in the Pan-Gyn tumor types versus only 6 mutated genes expected by random chance (p = 0.10) ( Figure S1 C), whereas 122 SCNA genes were enriched in Pan-Gyn versus 2 by random chance (p < 0.0001) ( Figure S1 D).
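The comparison against "random sets of five tumor types" can be pictured as a resampling test: compute a statistic on the group of interest, recompute it on many randomly drawn groups of the same size, and report how often the random groups do at least as well. The sketch below is one plausible reading of that idea under stated assumptions, not the procedure used in the study. The statistic (genes mutated in every type of the set), the sampling without replacement, and all names are hypothetical.

```python
import random


def shared_genes(types, mutated):
    """Count genes that are mutated in every tumor type of the given set.

    mutated -- {tumor_type: set_of_gene_symbols} (hypothetical input)
    """
    return len(set.intersection(*(mutated[t] for t in types)))


def empirical_p(observed, all_types, statistic, k=5, n_draws=1000, seed=0):
    """Share of random k-type sets whose statistic is >= the observed value."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_draws):
        subset = rng.sample(all_types, k)  # draw k distinct tumor types
        if statistic(subset) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)
```

The smallest p value such a test can report is 1 / (n_draws + 1), so a threshold such as p < 0.0001 requires at least on the order of ten thousand random draws.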

Next, we used GISTIC2.0 to identify statistically significant recurring SCNAs in the Pan-Gyn cohort and, separately, in the non-Gyn cohort. We identified 61 significant regions in the Pan-Gyn tumors, 27 amplifications and 34 deletions, of which 12 amplifications and 6 deletions were not found in the non-Gyn cohort, suggesting a relative specificity for Pan-Gyn tumors (Figures 1B and S1A; Table S1). Two of the 12 uniquely Pan-Gyn amplifications and one of the 6 deletions had not previously been reported in single-disease TCGA studies of the same tumor types. One of the previously unreported amplifications was a focal region in 1q42.3 covering IRF2BP2, which encodes an interferon regulatory factor binding protein that is implicated in cellular differentiation, proliferation, and survival processes. The other unreported amplification, located in 10p15.1, included an intergenic non-coding region downstream of KLF6 that bears striking resemblance to known oncogenic super-enhancer regions and PFKFB3, a gene that is being investigated as a therapeutic target in various cancers. The deletion consisted of a ~7 Mb region in 9q34.3 that contains the tumor suppressor genes TSC1 and NOTCH1.

We identified molecular features that differed in frequency among the five Pan-Gyn tumor types and the remaining 28 TCGA non-gynecologic (non-Gyn) tumor types ( https://tcga-data.nci.nih.gov/docs/publications/tcga/ ). After adjusting for sample size per tumor type, we found 23 genes (including ARID1A, ERBB3, BRCA1, FBXW7, KMT2C, PIK3CA, PIK3R1, PPP2R1A, PTEN, and TP53) that were mutated at higher frequencies across the Pan-Gyn tumor types than across the non-Gyn types (false discovery rate [FDR] < 0.01, Fisher's exact test) ( Figure 1 A). Eighteen of those genes were found to be significantly mutated genes (SMGs) in the Pan-Gyn cohort (as described later).
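A per-gene comparison of mutation frequency between two cohorts, followed by multiple-testing correction, can be sketched as below. The code computes a one-sided Fisher's exact p value from a 2x2 table and then applies a Benjamini-Hochberg adjustment. Both the one-sided alternative and the choice of Benjamini-Hochberg are assumptions, since the text states only "FDR" and "Fisher's exact test", and the sample-size adjustment mentioned above is not modeled.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p value for the table [[a, b], [c, d]].

    a, b -- mutated / unmutated samples in cohort 1 (e.g. Pan-Gyn)
    c, d -- mutated / unmutated samples in cohort 2 (e.g. non-Gyn)
    Tests whether mutation is enriched in cohort 1.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: probability of a or more mutated in cohort 1.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called enriched when its adjusted value falls below the chosen threshold (0.01 in the comparison described here).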

(B) Amplification (red) and deletion (blue) q values from GISTIC2.0 for SCNA peaks of significant copy-number gain and loss plotted for Pan-Gyn versus non-Gyn cohorts. Genes named are the suspected targets of amplification or deletion, if identifiable. Otherwise, peaks are labeled with the nearest cytoband's designation. Peaks found in only one cohort were assigned values of NS (not significant) in the other cohort. See also Figure S1 and Table S1

We used data generated from 2,579 TCGA patient samples (the “Pan-Gyn” cohort; n = 1,087 BRCA, 308 CESC, 579 OV, 548 UCEC, and 57 UCS) using fresh-frozen primary samples prior to any chemotherapy or radiation therapy. All sample collections were approved by local institutional review boards. We analyzed data of multiple types, including clinical, somatic copy-number alterations (SCNAs), mutations, DNA methylation, and expression of mRNA, microRNA (miRNA), long non-coding RNA (lncRNA), and proteins. The data were adjusted for batch effects before further analysis (see the STAR Methods ). Here, we (1) present results that distinguish Pan-Gyn from the rest of the TCGA tumor types, (2) summarize platform-specific analysis results, and (3) propose cross-tumor type subtypes with potential prognostic and therapeutic value.

Discussion

We performed an integrative, multi-platform analysis of the TCGA Pan-Gyn tumors based on 2,579 clinical cases. In addition to confirming the robustness of many observations cited in previous TCGA publications on the individual tumor types, our approaches also provided a considerable number of additional findings: (1) multiple genomic and epigenomic features that help to distinguish gynecologic and breast tumors from the other 28 TCGA tumor types; (2) 61 somatic copy-number peaks in the Pan-Gyn cohort, 11 not previously reported by TCGA; (3) 3 somatic copy-number alterations (containing genes of potential therapeutic relevance) unique to gynecologic cancers among the 33 TCGA tumor types; (4) 46 SMGs in the Pan-Gyn cohort, 11 not previously reported by TCGA; (5) 10 predominant mutation signatures, with 10% of the samples lacking identified SMGs; (6) analyses of the 10 mutation signatures in relation to the 30 COSMIC signatures, demonstrating relationships between the two sets of signatures; (7) similar miRNA profiles shared among most of the Pan-Gyn tumor types; the exception, OV, was extremely different from the rest, and, unexpectedly, the miRNA profiles of basal-like BRCA cancers closely resembled those of CESC cancers; (8) some OV and UCEC samples exhibited the “reactive” proteomic signature previously identified and shown to be prognostically relevant in BRCA; (9) identification of a subtype with low protein expression of ERs and AR (important markers for hormone therapy) that spanned all five tumor types; (10) large-scale lncRNA analysis not performed previously for any of the TCGA gynecologic or breast marker papers (our findings included several ER-regulated lncRNAs and an ER-TERC/DKC1-NEAT1/OIP5-AS1-TUG1 gene/lncRNA network); (11) similar lncRNA profiles in BRCA and CESC, in contrast to the very distinct profiles in UCEC and OV; (12) lineage-specific gene expression patterns and lineage-related (but not always cancer type-specific) features revealed by multi-platform clustering of tumor samples; (13) pathway analyses that revealed subsets of BRCA, OV, and UCEC samples with high levels of leukocyte infiltration, a primary marker of immune response and possible susceptibility to immunotherapy (most of the CESC samples, but virtually none of the UCS samples, showed high leukocyte infiltration); (14) roughly half of the basal-like BRCA samples resembled luminal/HER2+ BRCA samples at the pathway level (but not the gene expression level; this pattern suggests convergence of independent gene expression changes to drive a limited number of pathway outputs and could prove useful with respect to development and selection of therapies across BRCA subtypes); (15) five cross-Pan-Gyn subtypes defined by multi-platform clustering of 16 molecular features; these five clusters have possible clinical implications and predictive value for survival beyond that of tumor type alone; (16) reduction of the 16 molecular features to six in the form of a binary decision tree that retained prognostic value.

From a potential therapeutic perspective, two of the Pan-Gyn clusters (C1 and C5) in (15) showed high levels of hormone receptors (ERs, PR, and/or AR), suggesting possible responsiveness to hormone therapy. C3 showed high levels of immune markers, warranting further exploration for possible value in selecting patients for immunotherapy. C2 included hypermutators and ultramutators, which have been associated with relatively good survival on conventional therapy. A subset of C4 showed ERBB2 amplification, suggesting possible responsiveness to HER2-targeted therapy. ERBB2 mutation and amplification were mutually exclusive, but both sets of tumors might benefit from HER2-targeted therapy.

The decision tree we propose could potentially enable clinicians to classify patients more easily into one of the five Pan-Gyn subtypes. The tree is based on six features, three of which (ER, PR, and AR status) are already routinely used in the clinic. Widely available CLIA-certified gene-panel assays can estimate SCNA and mutation loads, and immune infiltration can be assessed by standard immunohistochemistry or new imaging technologies. Therefore, after further study and validation, our decision tree might aid in the assignment of patients to treatment groups. It should be understood, however, that all of the clinically interesting possibilities illuminated by a project like Pan-Gyn should be regarded as hypothesis-generating, yielding clues to be tested and, if possible, validated in follow-up studies.
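To make concrete how a binary decision tree assigns a sample to a subtype, the following minimal sketch encodes a tree as nested split and leaf nodes and walks it for one sample. The split features, thresholds, and leaf labels in `TOY_TREE` are placeholders chosen for illustration; they do not reproduce the six-feature tree described here, whose actual splits are not given in this section. The class and function names are likewise assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    subtype: str              # label assigned to samples reaching this leaf


@dataclass
class Split:
    feature: str              # name of the measured feature to test
    threshold: float          # samples with value >= threshold go "high"
    low: "Node"
    high: "Node"


Node = Union[Leaf, Split]


def classify(node: Node, sample: Dict[str, float]) -> str:
    """Follow one root-to-leaf path, testing a single feature per split."""
    while isinstance(node, Split):
        node = node.high if sample[node.feature] >= node.threshold else node.low
    return node.subtype


# Placeholder tree: two splits and three toy labels, NOT the published tree.
TOY_TREE = Split(
    feature="mutation_load", threshold=10.0,
    low=Split(feature="ER", threshold=0.5,
              low=Leaf("subtype_A"), high=Leaf("subtype_B")),
    high=Leaf("subtype_C"),
)

label = classify(TOY_TREE, {"mutation_load": 2.0, "ER": 1.0})
```

A tree over all six features would be encoded the same way, with one `Split` per decision and each of the five subtypes appearing at one or more leaves. Because every split tests a single laboratory-assessable value, classification requires only the features along one path.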

DNA methylation data revealed large high- and low-methylation clusters. CESC, as well as luminal B and HER2+ BRCA tumors, showed high levels of DNA methylation, suggesting epigenetics as a driving force in those tumor types. Clustering based on DNA methylation separated MLH1-silenced (i.e., hypermutator) endometrioid UCEC samples from the non-MLH1-silenced ones, suggesting that MLH1 may not be specifically targeted for epigenetic silencing but, instead, may be silenced by a more generic mechanism that silences multiple genes.

(Chiu et al., 2018). Gene sets associated with myeloid and stem cell development suggest that TERC activity, initially identified in zebrafish, might play a role in human development as well. In the present study, CESC and OV showed positive correlation of TERC with MYC, TERT, telomere maintenance targets, miR-21, and CTNNB1 gene targets. However, serous UCEC showed a unique pattern of negative correlation with TERT targets, positive correlation with miR-21 targets, and no correlation with MYC, CTNNB1, or telomere maintenance targets. In luminal A BRCA, miR-21 targets were positively correlated with TERC.

Pathway and subtype analyses revealed an important role for immune markers. OV, basal-like BRCA, luminal BRCA, and HER2+ BRCA cancer samples split into immune-high and immune-low subtypes. Immune-high HER2+ tumors showed a trend toward longer survival than their immune-low counterparts, but the difference did not reach statistical significance at the available sample size. Most of the CESC samples showed high immune marker signatures, likely reflecting the nearly 100% prevalence of HPV in that tumor type. In contrast, most of the UCEC and UCS samples showed little immune infiltration. The high-immune subsets might potentially benefit from immunotherapy.
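The survival comparison between immune-high and immune-low groups is described only qualitatively, and the test used is not named in this section. As a sketch of how such a comparison is commonly made, the two-group log-rank test below compares observed and expected event counts at each event time and converts the resulting chi-square statistic (1 degree of freedom) into a p-value. The function name and the input layout (parallel lists of follow-up times and event indicators) are assumptions of this example.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]
                 ) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0   # group A: observed minus expected events
    variance = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square with 1 df equals the two-sided normal tail.
    return chi2, erfc(sqrt(chi2 / 2.0))
```

Because the variance term grows with the number of events, the same separation between survival curves yields a smaller p-value in a larger cohort. This is one way a visible trend can fail to reach significance in a small subgroup.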

Pathway analysis unexpectedly showed that approximately half of the basal-like BRCA cancers clustered together with the HER2+ and luminal B samples, whereas the other half did not, suggesting pathway-level similarities not detected at the level of single RNAs. The similarities included higher inferred activation of AR signaling and lower enrichment of FOXA1, FOXA2, and XBP1/2, as well as the WNT and SHH pathways. Those observations are consistent with convergence of diverse transcriptional events on a limited number of functional pathways. Additional study will be required to test the robustness of those observations.

In summary, this integrative, multi-platform Pan-Gyn analysis has confirmed similarities previously identified across the five tumor types and identified relationships not observed in previous studies of the individual diseases. A number of the observations have possible prognostic and/or therapeutic relevance. Our capture of major molecular information content using a simple six-parameter binary decision tree could facilitate the clinical use of Pan-Gyn molecular subtypes and may help in selection for and administration of therapeutic trials across the Pan-Gyn spectrum. However, all of the clinical possibilities illuminated by this study will require extensive additional research, particularly functional validation (which is beyond the intended scope of TCGA studies), before they would be ready for practical application. In addition to its particular observations, this study presents a broad-based, curated atlas of Pan-Gyn molecular features that we believe will be useful as a starting point for many researchers in the field.