An individual's sex has been long recognized as a key factor affecting cancer incidence, prognosis, and treatment responses. However, the molecular basis for sex disparities in cancer remains poorly understood. We performed a comprehensive analysis of molecular differences between male and female patients in 13 cancer types of The Cancer Genome Atlas and revealed two sex-effect groups associated with distinct incidence and mortality profiles. One group contains a small number of sex-affected genes, whereas the other shows much more extensive sex-biased molecular signatures. Importantly, 53% of clinically actionable genes (60/114) show sex-biased signatures. Our study provides a systematic molecular-level understanding of sex effects in diverse cancers and suggests a pressing need to develop sex-specific therapeutic strategies in certain cancer types.

For many cancer types, men and women are very different in terms of susceptibility, survival, and mortality. But our knowledge about the differences between male and female cancer patients at the molecular level is very limited. This is a fundamental issue for cancer prevention and therapy but has not been investigated systematically. Through a rigorous, multidimensional analysis of sex-affected genes, we revealed a two-group molecular classification of cancer types (weak sex-effect group versus strong sex-effect group) and demonstrated that >50% of clinically actionable genes showed sex-biased molecular signatures in certain cancer types. Our study helps elucidate the molecular basis for sex disparities in cancer and lays a critical foundation for the future development of precision cancer medicine.

The availability of high-throughput molecular data over large, well-characterized patient sample cohorts of multiple cancer types through The Cancer Genome Atlas (TCGA) project provides an unprecedented opportunity to address this question (). Using these TCGA data, we performed a comprehensive, rigorous, pan-cancer analysis in order to address if and what the molecular-level differences are between the male and female cancer patients that have otherwise similar clinical and tumor characteristics.

An individual's sex is a key factor affecting the risk of cancer development and management during his or her lifetime. This is not only because some cancer types are sex-specific (e.g., ovarian cancer in women and prostate cancer in men), but there are significant sex disparities in the incidence of cancer, tumor aggressiveness, prognosis, and treatment responses for many other cancer types (). However, the molecular basis for these observed disparities remains poorly understood. Previous studies have reported some sex-related molecular patterns. For example, an elevated mutation rate of EGFR in female patients with non-small-cell lung cancer may contribute to enhanced response rates among female patients (); and H3K27me3 demethylase UTX has been identified as a sex-specific tumor suppressor in T cell acute lymphoblastic leukemia (). However, these studies on the sex effect have been limited to individual genes, single molecular data types, and single cancer lineages. Furthermore, previous cohort analyses of sex-affected molecular traits have been based on simple statistical tests without explicitly accounting for potentially confounding factors such as patient age, histological subtype, and tumor stage, which may introduce significant bias and misinterpretation. So far, a comprehensive characterization of molecular differences between male and female patients and related mechanisms across a broad range of human cancer types using a rigorous statistical approach has not been performed.

To investigate the clinical implications of the sex-biased molecular signatures, we focused on a set of clinically actionable genes, which includes 86 therapeutic targets of FDA-approved drugs or their associated predictive marker genes. Across the various molecular dimensions we examined ( Figure S3 ), we found that 60 genes are associated with at least one type of sex-biased signature, almost all of which were identified in seven of the eight cancer types in the strong sex-effect group, ranging from two in LUAD/LUSC to 32 in KIRC ( Figure 5 ). Among these genes, quite a few showed sex-biased signatures in the cancer type for which their relevant drugs are being used in clinical practice, which is of particular importance. For example, EGFR, arguably the most important therapeutic target in LUAD, shows female-biased mRNA expression, which may contribute to a higher response rate in female patients (). TOP2B shows male-biased DNA methylation in BLCA, and the relevant drug valrubicin is being used as an intravesical therapy for BCG-refractory carcinoma in situ of the urinary bladder. Varlrubicin can increase the risk of heart failure, but this side effect can be suppressed by tamoxifen, an agonist of estrogen (), which suggests that the innately distinct background level of sex hormone (e.g., estrogen levels) between male and female patients may lead to different drug responses or efficacy. Across cancer types, the two kidney cancers (KIRC and KIRP) contain the most clinically actionable genes, with a KIRP signature including FBXW7, FGFR1, RET, and TSC2, and a KIRC signature including FGFR3, AKT, TSC2, and KIT. These results highlight the clinical importance of sex-biased molecular signatures.

The mapping between FDA-approved drugs and their related clinically actionable genes (left) and the observed sex bias of these clinically actionable genes across cancer types (right). Different symbol shapes indicate different types of molecular signatures, and the filled shapes indicate that the gene is a therapeutic target of clinical practice in the corresponding cancer type. See Figure S3

As for protein expression, we found abundant sex-biased protein expression signals in HNSC and KIRC. Interestingly, 12 (of the 15) sex-biased proteins identified in HNSC or 18 (of the 25) sex-biased proteins identified in KIRC form well-connected regulatory networks ( Figures 4 C and 4D); SRC and MAPK proteins take a central position in both networks. SRC plays a key role in regulating a variety of cellular-signaling transduction pathways, and the frequent activation of the SRC kinase pathway has been observed in many caner types, especially in metastatic diseases (). Thus, targeted inhibition of SRC signaling has been suggested as an effective therapeutic strategy and has been under intensive clinical investigation in several cancers, including renal cell cancer (). Indeed, the factor of female sex has been reported to predict the response of imatinib, an inhibitor of BRC-ABL tyrosine kinase that also affects the SRC/MAPK pathway, in patients with chronic myeloid leukemia (). This finding may be relevant to the female-biased expression pattern observed here.

Combining the analyses on mRNA and DNA methylation, we identified several themes among the sex-affected pathways. The first group relates to the immune response, including allograft rejection, IL2 and STAT5 signaling, IL6, JAK, and STAT3 signaling, inflammatory responses, interferon alpha response, interferon gamma response, and TNF-α signaling and complement. These enrichments are well aligned with the long-standing observation that females and males often mount significantly different immune responses (). The second group relates to apoptosis and the cell cycle, including E2F targets, the G2/M checkpoint, mitotic spindle, and Myc targets. The targets of E2F transcription factors play a major role during the G1/S transition and, interestingly, some target genes encode differentiation factors that are transcribed in developmentally regulated and sex-specific patterns (). The third group is several metabolism-related pathways such as angiogenesis, bile acid metabolism, fatty acid metabolism, glycolysis, and xenobiotic metabolism. Notably, sex-related metabolic differences and hormonal regulation have been reported in several studies (). Finally, DNA repair and P53 pathways also show sex-biased expression signatures in several cancer types including HNSC, KIRC, and LIHC.

Focusing on the eight cancer types in the strong sex-effect group, we further examined the genes identified by RNA expression (FDR ≤ 0.05), and found that, in all the cancer types, the sex bias observed at the mRNA level of a gene tended to be the opposite of that at its DNA methylation level. This is consistent with the established role of DNA methylation in gene regulation: hypermethylation leads to gene silencing, while hypomethylation results in the up-regulation of gene expression ( Figure 4 A ). To gain insight into the global patterns of the sex effect on gene expression, we performed a gene-set-enrichment analysis (GSEA) given the gene ranks according to the sex bias, and identified the affected pathways. In general, we observed biologically sensible, contrasting sex-bias patterns at the mRNA and DNA methylation levels ( Figure 4 B). We obtained similar results after excluding the genes in the sex chromosomes ( Figure S2 B). Thus, both gene-based and gene-rank-based pathway-enrichment analyses indicated that the sex-biased mRNA expression patterns in the strong sex-effect cancers are partially the result of the corresponding sex-biased DNA methylation.

To characterize the sex-biased gene expression signatures in a comprehensive manner, we performed analyses on RNA expression (∼20,000 genes, including ∼17,000 protein-coding genes and ∼3,000 noncoding genes), DNA methylation (∼16,000 protein-coding genes), miRNA (∼500), and protein expression (191 proteins and phosphorylated proteins). For RNA expression, the number of sex-biased genes in the weak sex-effect group was very limited ( Figure 1 C); while the number of sex-biased genes in the strong sex-effect group was much higher (ranging from 79 in BLCA to 2,819 in KIRC, FDR ≤ 0.05), up to 14% of the whole gene set under survey. As expected, we found that the sex-biased genes were significantly enriched in the sex chromosomes (i.e., chrX and chrY); and, in particular, the vast majority (88%) of the sex-biased genes in at least four cancer types come from these two chromosomes ( Figure S2 A, Fisher's exact test, p = 2.2 × 10). For comparison, we performed a similar analysis of the mRNA expression data from related normal tissue samples of the five cancer types in the strong sex-effect group ( Table S4 ). Although much fewer sex-biased genes were detected in the normal samples (likely due to the much smaller sample sizes), we observed the same enrichment of sex-biased genes in the sex chromosomes. One of the most commonly identified genes is XIST, a major effector of chromosome X inactivation; and its role in cancer has been extensively studied (). In parallel, we found many more genes with a sex-biased DNA methylation pattern in the strong sex-effect group than in the weak sex-effect group ( Figure 1 C). We also identified sex-biased miRNA genes in six cancer types, five of which are in the strong sex-effect group.

To identify sex-biased SCNAs, we focused on the most significant SCNAs identified by GISTIC () in each cancer type. The number of region-based SCNAs (including both focal and arm-level amplifications/deletions) we surveyed ranged from 68 in KIRP to 122 in LUAD. At FDR = 0.05, we identified sex-biased SCNAs in LUSC, KIRP, and KIRC, all of which were in the strong sex-effect group. Figure 3 provides an overview of the statistical significance of sex-biased focal amplifications and deletions in these three cancer types, showing a total of 21 significant peaks (FDR ≤ 0.1). Notably, these sex-biased SCNAs cover quite a number of clinically actionable genes (as highlighted in Figure 3 ). Among them, two gene groups are of particular clinical interest. One group is related to the phosphoinositide 3-kinase (PI3K) pathway, which represents the signaling pathway most commonly activated in human cancer and has been under intensive clinical investigation (), and related genes include PIK3CA, MTOR, PTEN, NF1, and FBXW7. In LUSC, an SCNA (17q11.2) harboring NF1 is more frequently deleted in females, and the inactivation of this gene has been associated with sensitivity to mTOR inhibitors and resistance to MEK inhibitors (). In KIRP, the 4q34.3 deletion containing FBXW7 occurs more frequently in females, and the deletion of this gene may affect the sensitivity to rapamycin treatment and antitubulin chemotherapeutics (). In KIRC, the amplicon 3q26 containing PI3KCA occurs more frequently in females, and PI3KCA activation has been reported to predict the sensitivity to PI3K/AKT/mTOR inhibitors (); and the deletions of 1p36.23 (harboring MTOR) and 10q23.31 (harboring PTEN) are more prevalent in male patients. Another group is several therapeutic targets for cancer immunotherapy, which were detected in KIRC. TNFRSF8 (CD30) and CD52 are more frequently lost in males, and these two genes are the targets of Food and Drug Administration (FDA)-approved drugs for lymphoma and B cell chronic lymphocytic leukemia, respectively (). The deletion involving PDCD1 (PD-1) shows a similar bias; this gene represents an immune checkpoint and has been a major focus in the development of immunotherapy ().

(A–C) The genome-wide, sex-biased focal amplification/deletion patterns in (A) LUSC, (B) KIRP, and (C) KIRC. The male-biased SCNA peaks are shown in blue, and the female-biased ones are in red. The significant SCNA regions (FDR ≤ 0.1, indicated by the vertical green dotted lines) harboring important cancer genes are annotated, and the clinically actionable genes are highlighted in purple.

To identify sex-biased mutation patterns, we focused on highly mutated genes in each cancer type (≥5% mutation frequency). The number of mutated genes under survey ranged from two in THCA to 650 in LUSC ( Experimental Procedures ). At FDR = 0.05, we identified 11 sex-biased genes in LUAD and one in LIHC ( Figures 2 A and 2B ). The most striking gene identified in LUAD was STK11 (also known as LKB1), which encodes a major upstream kinase that activates the energy-sensing AMPK pathway and is frequently mutated in a variety of cancers (). Clinically, inactivating mutations in this gene may predict sensitivity to mTOR inhibitors, SRC inhibitors, and the metabolism drug phenformin in lung cancer (). Consistent with a previous analysis (), we found this gene to be more frequently mutated in males than in females, even after correcting for potential confounders (male versus female: 22.8% versus 11.3%, p < 6.9 × 10, FDR = 0.033). Another gene of interest in LUAD is DMD, which encodes a protein called dystrophin and is presumably responsible for Duchenne and Becker muscular dystrophies (). The mutations in this gene were highly biased toward female patients (8.4% versus 19.6%, p < 3.7 × 10, FDR = 0.029). In particular, compared with the mutations in males, those in females had a greater tendency to be loss-of-function truncating mutations (Fisher's exact test, p < 0.026, Figure 2 C). EGFR, a major therapeutic target in LUAD, showed a higher mutation frequency in female patients but did not reach the FDR cutoff (9.8% versus 15.8%, p < 0.042, FDR = 0.28), which is consistent with previous reports (). The only sex-biased mutation gene we identified in LICH was CTNNB1, the activation of which could affect the sensitivity to EGFR inhibitors, PI3K inhibitors, AKT inhibitors, and WNT inhibitors (). This gene is more frequently mutated in males (37.9% versus 12.2%, p < 1.2 × 10, FDR = 7.4 × 10), and we further confirmed this pattern in an independent sample cohort () (22.9% versus 12.5%, p < 0.044, Figure 2 C).

(A and B) Overview of the genes with a sex-biased mutation signature in (A) LUAD and (B) LIHC (FDR ≤ 0.05). Samples are displayed as columns with the sex label on the top, and different colors indicate different types of somatic mutations. The bar plots next to the heatmaps show the recalibrated mutation frequencies after propensity score weighting.

Strikingly, compared with those in the weak sex-effect group, the cancer types in the strong sex-effect group show a higher incidence sex-bias index (defined on the basis of the ratio of new cases of female and male patients, Figures 1 D and Table S2 ) and a higher mortality sex-bias index (defined on the basis of the ratio of the number of deaths among female and male patients, Figures 1 E and Table S3 ). Furthermore, according to the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines in Oncology, a patient's sex has been suggested as a prognostic factor in five of the eight cancer types in the strong sex-effect group (i.e., LUSC, LUAD, HNSC, KIRC, and KIRP), but not in any cancer in the weak sex-effect group. We observed similar patterns based on other statistical cutoffs (e.g., FDR = 0.1 and 0.2, Figures S1 E and S1F). Taken together, these results provide an overview of molecular differences between male and female cancer patients, and the distinct patterns of the two sex-effect groups are well aligned with the empirical observations of disease behaviors across cancer types.

To identify molecular differences related to sex with appropriate controls for other factors that may bias findings (e.g., age, race, disease stage, and tumor purity, see potential confounders surveyed in Figure S1 A), we employed an analytic approach based on the propensity score. Introduced in the early 1980s (), the propensity score is an important statistical tool for adjusting for confounding factors in observational studies, and has been widely used in clinical research, economics, and social sciences (). Importantly, samples with the same propensity score have the same distribution of measured confounders, so balancing the confounders can be achieved by simply balancing the propensity score. As outlined in Figure 1 A , we calculated the propensity scores, “reweighted” the samples in a cohort, and then compared the molecular features between the two balanced sex groups ( Experimental Procedures ). Our statistical simulations further confirmed that the propensity score method outperformed alternative methods in terms of both sensitivity and specificity ( Figure S1 B). With this approach, we identified significantly differential molecular features (genes) between female and male patients in the 13 cancer types (false discovery rate [FDR] ≤ 0.05). For the significant feature set identified for a given data type and cancer type, we further confirmed its statistical significance by using permutation tests of randomly shuffling the sex labels of the patients ( Experimental Procedures Figure S1 C). Focusing on the significant feature sets confirmed by the permutation tests, we examined the global patterns of sex-biased genes across different molecular types and found a clear separation among the cancer types under survey. One group includes LGG, GBM, COAD, READ, and LAML, each of which shows a relatively small number of genes (44–104, mean 67) with a sex-biased pattern, which we therefore labeled the weak sex-effect group. The other group includes THCA, HNSC, LUSC, LUAD, LIHC, BLCA, KIRP, and KIRC, each of which shows much more extensive sex-biased molecular signatures (240–3,521, mean 1,112); we therefore labeled it the strong sex-effect group ( Figures 1 B and Table S1 ). Indeed, no sex-biased somatically mutated genes or SCNAs were identified in any cancer of the weak sex-effect group; and the numbers of sex-biased genes at the mRNA, DNA methylation, and miRNA expression levels in this group were much lower than those in the strong sex-effect group ( Figure 1 C, Wilcoxon rank test, DNA methylation p < 0.015; mRNA p < 0.0022; miRNA p < 0.074). Importantly, the sample sizes included in the analysis between these two cancer groups are similar, and so the observed distinct patterns cannot be attributed to the power to detect differences ( Figure S1 D).

(C) The distribution of significant feature counts in the weak sex-effect group versus the strong sex-effect group (from left to right: DNA methylation, mRNA expression, and miRNA expression). The boundaries of the box mark the first and third quartile, with the median in the center, and whiskers extending to 1.5 interquartile range from the boundaries.

(B) Relative abundance of multidimensional sex-biased molecular signatures identified by the propensity score algorithm across cancer types (FDR ≤ 0.05). The fraction of significant features over total features was first calculated in each cancer type and then normalized across all cancer types. A gray box indicates that the specific data are not available for that cancer type. The bar plot shows the total number of significant features (by aggregating across all platforms) for each cancer type. The weak sex-effect and strong sex-effect groups are marked in orange and purple, respectively.

We focused on 13 major TCGA cancer types with sufficient sample sizes (≥30 for both male and female patients) for at least five out of six molecular data types (somatic mutations, somatic copy-number alternations [SCNAs], mRNA expression, DNA methylation, miRNA expression, and protein expression, Table 1 ). These cancer types include bladder urothelial carcinoma (BLCA), colon adenocarcinoma (COAD), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), rectal carcinoma (READ), and thyroid carcinoma (THCA). It is important to note that these TCGA patient cohorts were not designed to study the sex effect, thus male and female patient groups in a cancer type are frequently different in other patient and tumor characteristics.

For each data type and each cancer type, the numbers of male and female patients available in the analysis are shown. NA, the data type in that cancer type was not included in the analysis. TCGA high-throughput characterization platforms are as follows. Somatic mutations: exome-sequencing data based on IlluminaGA/HiSeq automated DNA sequencing platform and SOLiD sequencing platform. Somatic copy-number alterations (SCNAs): Affymetrix Genome-Wide Human SNP Array 6.0. DNA methylation: Illumina Infinium Human DNA Methylation 450K Array. RNA expression: Illumina HiSeq 2000 RNA Sequencing V2. miRNA expression: Illumina Genome Analyzer/HiSeq 2000 miRNA sequencing platform. Protein expression: MD Anderson reverse-phase protein arrays. BLCA, bladder urothelial carcinoma; COAD, colon adenocarcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukemia; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; READ, rectal carcinoma; and THCA, thyroid carcinoma.

Discussion

Although the significance of the sex effect in cancer incidence, prognosis, and treatment responses has long been recognized in the literature, its molecular basis has largely remained elusive. Our study represents a comprehensive and well-controlled analysis that focuses on the molecular differences between male and female patients across a broad range of cancer types, and systematically catalogues the molecular signatures related to the sex effect from DNA to RNA to protein. Using the propensity score algorithm, we controlled the potential confounders (both patient characteristics such as age and smoking status, and tumor characteristics such as stage, histology subtype, and tumor purity) whenever possible in the analysis. Thus, for the TCGA datasets we assessed, the detected molecular differences between the two sex-effect groups could not be attributed to these potential confounders. Based on multidimensional molecular signatures, we defined two distinct cancer groups (weak sex-effect versus strong sex-effect): the cancer types in the weak group contain a very limited number of sex-biased genes and are associated with more balanced incidence and mortality ratios; whereas cancers in the strong group show extensive sex-biased molecular signatures and are associated with more distorted incidence and mortality ratios. This molecular classification of cancer types we put forward will help to achieve a molecular-level understanding of how the sex factor affects the behavior of different cancer types.

Given the widespread sex-biased gene expression signatures in the strong sex-effect group, it would be interesting to identify to what extent the observed sex bias is specific to cancer samples and elucidate the contributing factors. Our analysis based on the mRNA expression data of related TCGA normal samples detected much fewer sex-biased genes, suggesting the sex bias might be amplified during the tumorigenesis process. However, this observation should be interpreted with caution, because (1) the so-called normal tissues actually consist of quite distinct cell types from the corresponding tumor samples (e.g., the proportion of epithelial cells), which may confound the observed tumor-normal differences; and (2) the sample size of normal tissues is much smaller than that of the corresponding tumor samples, which may limit the detection power. Thus, further efforts are required to elucidate the relative contributions of various factors (e.g., sex chromosomes, hormones, and tumorigenesis) to the observed sex-biased gene expression signatures in cancer samples.