Data collected from 30 AML patients

We measured genome-wide gene expression (Supplementary Note 1) and in vitro drug sensitivity (Methods section) to a panel of 160 chemotherapy drugs and targeted inhibitors across 30 AML patient samples (Supplementary Data 1). The customized drug panel we used contained 62 drugs approved by the U.S. Food and Drug Administration (FDA) and encompassed a broad range of drug action mechanisms (Supplementary Data 2). The other drugs, investigational agents, have been studied in cancer patients. We chose 53 drugs that exhibited activity (cell viability ≤50%) against at least half of the patient samples (Supplementary Table 1). As was done previously in the CCLE study3, we processed the drug sensitivity data by curve fitting and then extracting summary statistics. We used the area under the curve (AUC) throughout the paper because it represents an average of drug sensitivity across a range of drug concentrations; indeed, AUC showed by far the strongest association with gene expression levels (Supplementary Note 2). Supplementary Data 3 describes usual evaluation (including risk group category and cytogenetic features), response to treatment, and duration of remission. Supplementary Note 3 summarizes the clinical information and describes our analysis on the consistency between clinical data and our in vitro drug sensitivity data. In brief, we showed a statistically significant association between FLT3 mutation status and 12 drugs known to have a FLT3 inhibitory role. Statistical significance of the association between the complete remission (CR) status and the AUC across all 53 drugs.

Drug sensitivity assayed in 14 AML cell lines

We utilized cell lines to intensify our focus on the hypotheses for which we could provide additional experimental evidence, since it is easier to perform overexpression or knockout experiments on cell lines than on primary patient tissues. However, we noted a very small overlap between our 160 drugs and the drugs tested on the 14 AML cell lines in the CCLE data (two drugs overlapping, each tested on three AML cell lines). Thus, for effective computational and experimental validation of the significant gene-drug associations discovered in the patient data, we measured in vitro drug sensitivity of 14 AML cell lines to the same set of 160 drugs in our high-throughput assay (Supplementary Data 1) while we used publicly available expression data of the 14 AML cell lines from CCLE. We observed a statistically significant overlap (Fisher’s exact test p-value = 3 × 10−6) of gene-drug pairs with significant association p-values between our discovery data from 30 AML patient samples and the CCLE validation data. Our unique validation setting let us measure the testability of discovered associations. We also surmised that in vitro drug sensitivity data that we measured on the AML cell lines, besides on the AML patient samples, would provide a valuable resource to the broader research community to generate or test hypotheses relevant to personalized medicine in AML.

The MERGE algorithm provides a new way to prioritize genes

Our MERGE algorithm provides a new way to prioritize gene-drug associations by incorporating prior information on genes’ relevance to AML and potential to drive it (Methods section). MERGE learns a priority score for each gene, called a MERGE score, based on the gene’s driver features: (1) mutation, (2) expression hubness, (3) whether the gene has a known Regulatory role, (4) genomic CNV, and (5) methylation. These driver features were extracted from publicly available sources, such as TCGA AML study12, AML expression studies14, and gene annotation databases. The details on the sources and the preprocessing of the driver features are included in Supplementary Notes 4–6.

The MERGE score represents a prior probability that the gene is a reliable (i.e., likely driven by biological mechanisms, not confounders) molecular marker for response to drugs, modeled as a weighted combination of the gene’s MERGE features (Fig. 1b). The MERGE algorithm jointly learns these driver feature weights and how the MERGE score of genes explains the observed gene-drug associations. Genes with high MERGE scores (i.e., high marker potentials) tended to have many observed associations with drugs. The learned driver feature weights provide new insights into what kind of molecular data is most informative of a gene’s potential to be a reliable marker for therapeutic response. After the MERGE score of each gene is estimated, we considered the top N genes based on MERGE scores as a prioritized subset and then selected the gene-drug pairs with significant association p-values (genome-wide FDR corrected p-value <0.1) (Supplementary Note 7).

Expression hubness significantly determines the MERGE scores

The driver feature weights learned by the MERGE algorithm (Methods section) indicate the relative importance of each driver feature on the MERGE scores (Fig. 1b). As described in Supplementary Note 8, the MERGE score of gene i is defined as \(\left( {\mathop {\sum }\limits_{k = 1}^5 v_kd_{ik}} \right),\) where \(v_k\) is the kth driver feature weight and \(d_{ik}\) is the kth driver feature of gene i. Expression hubness (i.e., number of neighbors in a gene network estimated based on publicly available AML expression data) has the highest weight (Fig. 2a) and makes the largest contribution to the MERGE scores (Fig. 2b, Supplementary Data 4).

Fig. 2 Importance of each driver feature in predicting the drug response based on the MERGE algorithm. a Learned driver feature weight values. The methylation feature has a negative weight, consistent with our prior knowledge that when DNA is methylated in the promoter region, the corresponding genes are inactivated and silenced. b We sort genes based on the MERGE score (x-axis) and plot the sum of the contribution of all driver features to the MERGE score (i.e., weighted combination of driver features) (y-axis). We decomposed the weighted combination into the five driver features and indicated the magnitude of the contribution of each feature (driver feature weight × driver feature value) with different colors. Expression hubness contributes the most to the score, followed by regulatory function and (lack of) methylation Full size image

This implies that gene expression data provide more information about a gene’s potential to predict drug response than other types of data, perhaps for the following reasons: (1) Gene expression data can reflect downstream effects of genetic or epigenetic changes that may not have been detected by existing mutation, CNV and methylation profiles. (2) Expression hubness was estimated from a larger number of patients because expression data are the most common type of molecular data from disease studies. (3) Expression hubness has been considered likely to indicate selective pressure in tumor genome evolution and to drive events11.

Identifying expression hubs has therefore been considered a powerful complementary way to identify candidate tumor drivers that are hard to detect from sequence data due to a large number of passenger mutations. The importance of the expression hubness feature (Fig. 2) suggests that these candidate tumor drivers might prove promising markers for drug response. The methylation feature is negatively weighted, consistent with prior knowledge that methylation in a promoter region silences the corresponding gene15.

Overview of our statistical and biological findings

We evaluated our MERGE algorithm in four different ways and we observed that the results were very promising in all four categories of evaluation we performed. Following is a summary of those categories: (1) Consistency of significant gene-drug associations in left-out test data. We compared MERGE to four other methods in terms of consistency of significant gene-drug associations, where we used all 30 patient samples for discovery and validated the discovered associations using (i) 14 cell lines, (ii) independent patient data from 12 new AML specimens. (2) Consistency of significant gene-drug associations within drug functional classes. If a gene X were associated with a drug Y, X would also be likely associated with another drug Y’ with the same mechanism of action (e.g., sunitinib and tandutinib are in the ‘Flt3 inhibitor’ class) (Supplementary Data 2). We compared MERGE to four other methods in terms of within-drug class consistency—the extent to which the gene-drug association was conserved across drugs in the same functional category. (3) Prediction of patient drug response. We compared MERGE to three other methods for evaluating consistency of patient rankings based on actual vs. predicted drug sensitivity. We tested in two ways: (ii) We used one batch containing 12 patient samples for training and a different batch containing 12 patient samples for validation (and vice versa). (ii) We used a leave-one-out cross validation (LOOCV) test to obtain the predicted drug sensitivity across 30 patient samples. (4) Biological interpretation and experimental validation. We discussed top-ranked genes for several drug classes based on our MERGE prioritization method and their associations with the corresponding drugs. Finally, we described results of the experimental validation on one of the top-ranked genes, SMARCA4, whose expression is significantly associated with increased sensitivity to mitoxantrone and etoposide.

Consistency of gene-drug associations in left-out data

We compared MERGE to four conventional methods: (1) Pearson’s P-value of correlation. For each gene-drug pair, we computed the t statistics and the associated p-value, measuring the correlation in a univariate linear regression model. We then selected the gene-drug pairs with the smallest association p-values. (2) Spearman P-value of correlation. For each gene-drug pair, we measured the rank association using the Spearman correlation and selected the gene-drug pairs with the smallest association p-values. (3) ElasticNet. For each drug, we solved the elastic net optimization problem4 using gene expression as input, as done by Barretina et al.3 and Garnett et al.2, and selected the gene-drug associations with the strongest weight. (4) Multi-task learning. We used the multi-task learning method16 implemented by Pong et al.17, which considers each drug as a different task. We then selected the gene-drug associations with the strongest weight.

We discovered gene-drug associations within the data from all 30 samples, and tested them on two independent data sets: (a) 14 CCLE cell lines (Fig. 3a, b) 12 additional AML patients who had relapsed or were refractory to at least two (up to six) prior regimens (Fig. 3b). Supplementary Note 9 shows the clinical information on the additional 12 patients, and Supplementary Data 5 presents the gene expression (processed RNA-seq) and the drug sensitivity (AUC) data from the same patients.

Fig. 3 Comparison of MERGE with four alternative methods in terms of the percentage of the significant associations replicated in the left-out test data. We discovered gene-drug associations within the data from all 30 samples, and we tested on a the 14 cell line samples, and b the data from the additional 12 refractory patient samples. Each gray line corresponds to a random ordering of all genes. We note that the x-axis in b contains a lower total number of genes; some of the genes existing in the microarray gene expression data from the 30 patient and cell line samples did not exist in the RNA-seq data from the 12 refractory patient samples Full size image

Each method prioritized gene-drug pairs differently (Supplementary Note 10). For evaluation, we computed the true discovery rate (that is, how many significant associations were replicated in the left-out test data) when considering the top N genes per drug on average (i.e., 53 × N gene-drug pairs in total). Specifically, we computed the consistency rate (y-axis)—defined as the number of significant gene-drug associations replicated in the left-out test data divided by the total number of significant gene-drug associations within the selected 53 × N gene-drug pairs—for varying values of N (x-axis) from one to all genes. In both settings (Fig. 3a, b), MERGE showed a much higher consistency rate for high-scoring genes (small N values) than the other two methods and the random ordering of genes (gray lines). As N increased, methods had more similar consistency rates because their top N genes became more similar.

Unlike the initial 30 samples, the additional 12 samples were highly refractory, and many of them exhibited extremely poor risk features. This could potentially account for the lower consistency rate in Fig. 3b even though both the discovery and validation samples were obtained from patient specimens. We did not expect that highly refractory patients (refractory after 2–6 prior regimens) would exhibit the same drug sensitivity patterns as the 30 patients with newly diagnosed disease or first or second relapses. In addition, we surmised that the refractory patients may have activated other survival pathways and mechanisms of drug resistance. Still, even in this challenging validation setting, MERGE outperformed four alternative methods.

Three reasons accounted for the poor performance of the ElasticNet and Multi-task learning methods in Fig. 3a, b. First, these multiple regression methods, intended to solve a prediction problem, were not specifically designed to capture robust gene-drug associations. Conversely, MERGE was designed to aggressively decrease the number of false positive gene-drug associations by incorporating prior knowledge about the genes’ potential to drive the disease. Second, since this problem was ultra-high-dimensional (30 samples and ~17 K variables), multiple regression methods were likely to learn models too complex to identify robust gene-drug associations, even with regularization (e.g., elastic net penalty). Finally, since multiple regression methods model each response as an aggregated effect of multiple features, highly correlated features would share a fixed amount of weight. This would assign a small magnitude of weight to each of many correlated features. Many robust gene-drug pairs whose associations would have been replicated in validation data were likely to have been eliminated in this way.

MERGE uses a strong prior and this might have been an important factor for MERGE’s strong feature consistency result. To show that the training data are also a critical factor for the high consistency rate of MERGE, we performed MERGE on 100 different permutations of the data where the training samples in the drug response data are shuffled. We then tested the learned models on both 14 cell lines and the 12 refractory samples, similarly to Fig. 3. As shown in Supplementary Fig. 1, the MERGE run using the original sample ordering achieved a higher consistency rate than most of the permutation runs, and many of the permutation runs perform worse than the competing methods, which shows that MERGE makes use of the information in the training data and the prior information alone is not very helpful when the training data is random.

Drugs in the same classes exhibited similar response pattern

The 53 drugs we considered were classified into 24 broad classes based on their mechanism of action; 15 classes contained >1 drug (Supplementary Data 2). Sixteen of the 53 drugs were shared across two classes, while the other 37 drugs were in a single class. It was easier to define the class of older, more classic drugs. It became more difficult for inhibitors that had differential actions for different targets, so we kept them in a more general class. Different drugs with similar mechanisms of action were expected to show similar response across patients and show similar patterns of gene-expression association. To test this hypothesis in an unbiased way, we first applied an agglomerative hierarchical clustering approach to compare the AUC values of different drugs across 30 patient samples. Figure 4a shows that drugs in at least 10 of 15 classes (with >1 drug) were clustered, which indicates that drugs with the same mechanism of action tended to have similar response patterns (Supplementary Note 11).

Fig. 4 Drug class specificity (DCS) of the associations prioritized based on the MERGE algorithm. a The result of the agglomerative hierarchical clustering that is applied to the AUC values of all drugs across 30 patient samples. AUC values are standardized before applying the hierarchical clustering, and the color bar at the top represents the standardized AUC values. For several branches in the resulting dendrogram, we report the drug classes that have significant enrichment with the drugs in that branch; we also report the corresponding Fisher’s exact test p-values. The result shows that drugs in at least 10 of 15 classes (that contain >1 drug) are expectedly grouped together in the dendrogram; for each group, Fisher’s exact test p-value associated with the overlap between the drug class and the dendrogram group is ≤0.06. b A QQ (quantile–quantile) plot of the observed DCS score p-values from 1000 random permutation tests for all genes with at least one significant gene-drug association. In each permutation test, drug labels were shuffled. Dots falling below the diagonal imply that the overlap with known drug classes is more significant than would be expected by random chance (permutation test p-value = 0.029). c For varying N (x-axis), we plotted the average DCS score over the N genes (y-axis) associated with the top (53 × N) gene-drug pairs based on the five methods in comparison Full size image

For each gene, we also examined whether the drugs with which the gene was significantly associated tended to be clustered into the drug mechanism classes. We computed each gene’s specificity measure, which we referred to as its drug class specificity (DCS) score (Supplementary Note 12). For each gene significantly associated with at least one drug, we then estimated the significance of its DCS score by comparing it with 1000 random DCS scores for the same gene, each from a permutation test where we shuffled the class labels of all drugs. Figure 4b shows a QQ plot of the empirical DCS score p-values from 1000 random permutation tests (y-axis) against the Uniform(0,1) distribution quantiles (x-axis) for each gene. The empirical DCS p-values were significantly lower than random (permutation test p-value: 0.029), indicating the statistical significance of the DCS in gene expression dependency of the drugs overall.

Specificity of gene-drug associations to drug classes

If a gene’s expression level were specifically associated with drugs in the same mechanism class, we would have higher confidence in the observed associations resulting from the underlying biological mechanisms. Genes with expression levels associated with too many drugs in a diverse set of classes would be less likely to be true markers, and the observed associations would more likely be due to confounders. Here, we used the specificity of gene-drug associations to a drug mechanism class as an evaluation metric, and we used the DCS score as the evaluation metric for each gene (as described above). We took the average DCS score (y-axis) over the genes associated with the top gene-drug pairs (i.e., 53 × N gene-drug pairs in total) for varying N values (x-axis) (Fig. 4c). MERGE showed a much higher degree of drug mechanism class specificity than the alternative methods.

Drug sensitivity prediction performance

We next compared MERGE to the following methods that predict patient drug response: (1) ElasticNet4, (2) multi-task learning16,17, and (3) Bayesian multi-task multiple kernel learning (MKL), the winner of the NCI-DREAM Drug Sensitivity Prediction Challenge18. Like MKL, we compared these methods in terms of the consistency among the ranking of patients based on their actual versus predicted drug sensitivity.

We considered the following two settings: (a) we trained a prediction model based on 12 samples in one batch and tested on the 12 samples in another batch, and vice versa; then we averaged prediction accuracy results (Fig. 5a, b). We measured the prediction performance via LOOCV using all 30 patient samples (Fig. 5b).

Fig. 5 Comparison of prediction performance of MERGE to the prediction performances of three other methods: ElasticNet, Bayesian multi-task MKL and multi-task learning. a One batch of the samples is used for training, and a different batch is used for validation. b LOOCV setting. Performance is measured in terms of the Spearman correlation of the predicted response with the actual response. This evaluation metric was used in the NCI-DREAM Drug Sensitivity Prediction Challenge. Each dot corresponds to a different drug, and each color to a different method’s prediction on the x-axis compared to the MERGE prediction on the y-axis. The mean correlation from each of the methods in comparison and the associated p-values from a one-sided Wilcoxon signed-rank test are reported in the legend Full size image

In Fig. 5a, b, we compared MERGE (y-axis) to the three other methods (x-axis) in terms of prediction performance measured by rank correlation between predicted and actual drug response in the test set across patients. Each dot corresponds to one of the 53 drugs, and each color to one of the methods compared to MERGE. In both experimental settings (a and b), MERGE performed competitively with the alternative methods in terms of prediction performance averaged over all drugs. We performed a one-sided Wilcoxon signed-rank test to show the significance of the outperformance of MERGE relative to the methods in comparison. In fact, the p-values are significant at a p ≤ 0.007 level for four out of six comparisons and at a p ≤ 0.1 level for the other two. We show the p-values in the legend of Fig. 5a, b for each comparison, and Supplementary Fig. 2a, b show the detailed performance for each of the drugs. As shown in the legend of Supplementary Fig. 2a, b, MERGE achieves the best prediction performance for a higher number of drugs than each of the three alternative methods. Indeed, in the LOOCV test (Supplementary Fig. 2b), MERGE achieved the best prediction performance for 62% of the drugs.

MERGE identifies the roles of several drug response markers

We now interpret the gene-drug associations identified by MERGE, which offer the potential to make novel discoveries about molecular markers (Fig. 6). We seek to derive hypotheses likely to lead to discoveries or experimental validation targets. Therefore, we further narrowed significant gene-drug associations to: (1) those that were consistently significant in cell lines so we could perform experimental validations, and (2) those that showed a high degree of specificity for a drug mechanism class. For each of the 24 drug mechanism classes, we considered the high MERGE-scoring genes whose associations with drugs were specific to that class (Fisher’s exact test p-value <0.05) and whose associations to that class were conserved in cell lines. Supplementary Table 2 lists these genes for the 20 drug classes that had at least one class-specific gene. Supplementary Data 6 shows the entire list of the genes for each class and the corresponding results in detail. Figure 6a depicts a heat map that shows the level of specificity of each gene (row) to each drug class (column), measured by \(- \log _{10}\) [Fisherʹs exact test p-value] for the top three MERGE-scoring genes in each drug class. Figure 6b highlights the drugs associated with each gene. In Fig. 6b, the red color indicates a negative association between gene expression and drug AUC measure (i.e., high expression indicates low AUC and hence sensitivity), while green indicates a positive association (i.e., high expression indicates resistance). We show in Supplementary Fig. 3 the heat maps for the four alternative methods with which we compared MERGE (in Figs 2, 3c). Supplementary Fig. 4 shows the amount of contribution of each driver feature on the MERGE score for the genes shown in Fig. 6, b.

Fig. 6 The 44 genes in total each of which was identified, by the MERGE approach, as being one of the top three important marker genes for a drug mechanism class. a A heat map that shows the level of specificity of each of the 44 genes (row) to each drug class (column) measured by −log 10 (Fisherʹs exact test p-value). For clarity, we considered only Fisher’s exact test p-value <0.05 to be significant; other values are indicated in yellow. The drug classes that are not assigned by MERGE any genes with associations specific to the class and consistent in the cell line data are not shown. We highlighted the genes whose biological significance, we discussed in the Results section with black-colored boxes. b A heat map that shows the gene-drug association for genes and drug classes shown in a. Yellow indicates that the corresponding gene-drug pair does not have a statistically significant association (genome-wide FDR corrected p-values <0.1), while green indicates a positive and red a negative association. The drugs are grouped by blue lines based on their classes, and the class names for each group are written on top of the heat map. Drugs that are members of more than one drug class (e.g., sunitinib) are shown multiple times for each class to which the drug belongs. The list on the right shows the genes whose biological significance we discussed in the Results section, and the drug classes they are specific to Full size image

The following sections summarize the eight top-ranked genes in some of the major drug classes (FLT3, CASP8AP2, L2HGDH, MNT, BAZ2B, MZF1, BEX2, and SMARCA4) that are highly likely to have notable biological significance in leukemia. We observed that for five of these eight genes (SMARCA4, FLT3, MNT, BAZ2B, and MZF1), MERGE provided a unique path toward identifying their roles that the other four methods simply could not identify (Supplementary Table 3). Except for ElasticNet, the alternative methods identified only one of these eight genes, L2HGDH.

FLT3 expression predicts response to Flt3 inhibitors

FLT3 (FMS like tyrosine kinase 3), the top-ranked gene in the Flt3 inhibitor class (Supplementary Table 2), is a significant expression hub and regulator that is highly mutated (Supplementary Fig. 4). FLT3 mutations are associated with a poor prognosis in AML19,20. The principal type that occurs in about 25% of AML patients is internal tandem duplication (ITD). After treatment with some Flt3 inhibitors21, these patients often develop FLT3 D835 (kinase domain) mutations, which is present in 8% of patients. One study shows a correlation between FLT3 expression levels and prognosis among patients with wild-type FLT3 22.

Our method led to a new finding: the high expression of FLT3 is associated with increased sensitivity to the drugs sunitinib, ponatinib (previously AP24534), midostaurin (PKC412), and tandutinib (Fig. 6a, b). Sunitinib is a multi-targeted receptor tyrosine kinase inhibitor FDA-approved for renal cell carcinoma and imatinib-resistant gastrointestinal stromal tumor. Ponatinib, another multi-targeted receptor tyrosine kinase inhibitor, is FDA-approved for chronic myelogenous leukemia that has failed to respond to first-line inhibitors. Tandutinib is an inhibitor of the type III receptor tyrosine kinases FLT3, platelet-derived growth factor receptor (PDGFR), and KIT. Midostaurin was studied in a prospective, randomized trial in newly diagnosed patients with FLT3 mutations undergoing induction, consolidation, and maintenance therapy, and the group that received midostaurin exhibited prolonged event free (p = 0.0044) and overall survival (p = 0.007) as compared to placebo23. Midostaurin recently received breakthrough therapy designation from the FDA for newly diagnosed FLT3-mutated AML.

Because FLT3 mutation status is an important prognostic indicator in AML with the potential to guide therapy, we compared FLT3 mRNA expression and FLT3 mutation status in terms of significance of correlation with response to 53 drugs. For (a) patients and (b) cell lines, Supplementary Fig. 5 shows the significance of association achieved by FLT3 expression level (y-axis) vs. by FLT3 mutation status (x-axis) for each drug. More dots appear above the diagonal in both Supplementary Fig. 5a, b, which implies that mRNA level achieved a more significant association than FLT3 mutation status for a larger number of drugs (36 vs. 17 in patients; 31 vs. 22 in cell lines). For fair comparison, to generate Supplementary Fig. 5a for both FLT3 mRNA and mutation status, we used only 27 patients for whom mutation status was known. Further, we used Quentmeier et al. [24] as the source for the FLT3 mutation status for the cell lines.

CASP8AP2 expression predicts sensitivity to Bcl2 inhibitors

CASP8AP2 (caspase 8 associated protein 2) is the top-ranked gene in the Bcl2 inhibitor class (Supplementary Table 2). Overexpression of BCL2 lets cancer cells evade apoptosis, the process of programmed cell death. The first Bcl2 inhibitor to gain approval for patients, venetoclax, was approved by the FDA on 11 April 2016 to treat the subset of relapsed patients with chronic lymphocytic leukemia that have the deletion of chromosome 17p, which contains the TP53 gene. The drug is also undergoing evaluation in early phase AML clinical trials (e.g. NCT02203773, NCT02287233 on ClinicalTrials.gov). We found an association between an increased expression of CASP8AP2 with an increased sensitivity to Bcl2 inhibitors (Fig. 6b). Caspases mediate the activation of apoptosis, and low expression of CASP8AP2 has been associated with a poor prognosis in pediatric acute lymphoblastic leukemia25 and an increased risk of relapse26. Moreover, a single report notes a translocation of the poor risk MLL (mixed lineage leukemia) gene to the CASP8AP2 gene in an AML patient with a t(6;11)(q15;q23) translocation27. It is therefore not surprising that expression of a gene in the apoptotic pathways (e.g., a caspase associated protein) would be associated with sensitivity to inhibiting the activity of one of the main antagonists of apoptosis. Restoring apoptosis by inhibiting BCL2 may enhance cell death by increasing the level of expression of apoptotic pathway proteins, such as CASP8AP2.

L2HGDH expression is the predictor for CDK inhibitors

Cyclin-dependent kinase (CDK) inhibitors prevent proliferation of cancer cells. L2HGDH (L-2-hydroxyglutarate dehydrogenase) is an enzyme that catalyzes conversion of L-2-hydroxyglutarate to alpha ketoglutarate. We found that the increased expression of L2HGDH, the only gene associated with the CDK inhibitor class (Supplementary Table 2), was associated with increased sensitivity to CDK inhibitors. Elevated serum 2-hydroxyglutarate (2-HG) levels have been associated with isocitrate dehydrogenase (IDH1 and IDH2) mutations in AML28,29, as well as clinical outcomes28. In addition, elevated 2-HG inhibits the function of TET2, resulting in DNA hypermethylation30. In addition, histone demethylases are competitively blocked by 2-HG31. The elevated expression of L2HGDH would be expected to lower levels of 2-hydroxyglutarate and thus abrogate the deleterious secondary effects of DNA and histone methylation. CDK inhibitor genes were the most frequently downregulated genes in IDH1 mutants32 with high 2-HG levels, thus linking the association we observed with susceptibility to CDK inhibitors with L2HGDH.

MNT and BAZ2B are expression markers for HDAC inhibitors

We found a correlation between susceptibility to HDAC inhibitors and increased expression levels of MNT and BAZ2B (Fig. 6b), top two genes in the HDAC inhibitor class (Supplementary Table 2). MNT encodes the protein Mnt, a member of the Myc/Max/Mad network of transcription factors that controls cell proliferation, differentiation, and death. It is known to be critical for normal myeloid differentiation in AML cell lines, and its loss leads to proliferation33. Mnt has a Sin3-interaction domain (SID) that lets it interact with mSin3A and mSin3B, which recruit histone deacetylases (HDACs), resulting in transcriptional repression34. Therefore, a high expression of MNT would be predictive of whether inhibiting HDACs has an effect.

BAZ2B (Bromodomain Adjacent to Zinc Finger Domain, 2B), is a bromodomain containing protein. The sole function of bromodomain is to recognize acetyl-lysine on histones and non-histone proteins35. Histone deacetylases remove acetyl groups from these sites. This suggests that high expression levels of such genes are associated with chemotherapy drug sensitivity for HDAC inhibitors, bringing a new insight into mechanisms that govern individual patient response to chemotherapy involving regulation of chromatin state. Additionally, a popular class of bromodomain inhibitors has shown potential for treating aggressive leukemia36.

Histone deacetylases impair myeloid differentiation. This is why HDAC inhibitors are potentially useful in therapy and have long been long investigated to treat myelodysplastic syndrome and AML37. Various publications address the efficacy of some HDAC inhibitors in clinical trials in AML, including vorinostat, panobinostat, and romidepsin38,39,40.

BEX2 expression is a potential marker for NFkB inhibitors

NFkB (Nuclear factor kappa B), a transcription factor, is constitutively active in many cancers and inhibits both apoptosis and drug resistance. BEX2, one of the top genes in the NFkB inhibitor class (Supplementary Table 2), is overexpressed in breast cancer and glioma41. In KIT-driven AML, NFkB binds to Sp1 and transactivates KIT42. In addition, inhibiting both NFkB and JNK is effective in AML expressing the tumor necrosis factor43. Lastly, BEX2 expression occurs in AML with translocations involving the MLL gene44.

SMARCA4 and MZF1 are markers for topoisomerase II inhibitors

SMARCA4 and MZF1 are the top two genes in the anthracycline/topoisomerase inhibitor/DNA intercalator class (Supplementary Table 2), and both genes are correlated with sensitivity to the drugs in that class (Fig. 6a). The drugs SMARCA4 is associated with are mitoxantrone, etoposide, and topotecan (Fig. 6b). Mitoxantrone and etoposide are topoisomerase II inhibitors; topotecan is a topoisomerase I inhibitor. SMARCA4 (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 4) is a component of one of the adenosine triphosphate (ATP)-dependent SWI/SNF-like Brg/Brm-associated factor (BAF) chromatin remodeling complexes. Mutations of SMARCA4 (BRG1) occur in 10–35% of non-small cell lung carcinoma, 15% of Burkitt’s lymphoma, 5–10% of childhood medulloblastoma, and less frequently in other cancers45. Further, SMARCA4 mutations characterize small cell carcinoma of the ovary of the hypercalcemic type46. Furthermore, one recent study demonstrated a direct role for SMARCA4 in facilitating decatenation of DNA by topoisomerase II47. In leukemia, the core ATPase subunits are BRG/SMARCA4 and BRM/SMARCA2, where BRG/SMARCA4 is essential for the proliferation of both normal hematopoietic stem cells and leukemia stem cells48.

MZF1 is a member of the SCAN-zinc finger family of transcription factors, frequently mutated in many types of cancer49. Initially, it was believed to have a role in promoting myeloid malignancy50, but because knockout animals exhibited proliferation of hematopoietic progenitors, it was also thought to potentially suppress hematopoietic malignancy51. MZF1 localizes in the PML-NBs (promyelocytic leukemia nuclear bodies)52, which are protein complexes involved in post-translational modification of nuclear proteins and response to DNA damage53. It was shown that the topoisomerase II inhibitors etoposide and doxorubicin, known to create double-stranded DNA breaks54, increased the number of PML-NBs by a fission mechanism, and then the number became regulated by cell cycle checkpoint control53. Thus, topoisomerase inhibitors directly affect the PML-NBs that incorporate MZF1.

Experimental validation of the marker role of SMARCA4

Given the high level of specificity of the association between SMARCA4 and the topoisomerase inhibitor class and prior knowledge of the relationship between SMARCA4 and topoisomerase II described above, we experimentally validated the association between SMARCA4 and the topoisomerase II inhibitors (etoposide and mitoxantrone) by overexpressing SMARCA4. The U937 leukemia cell line expressed high levels of SMARCA4 protein, while the KG1 leukemia cell line expressed very low levels of the protein, as analyzed by western blot (Fig. 7e, Supplementary Fig. 6) and flow cytometry (Fig. 7g). In vitro high-throughput drug sensitivity testing showed that U937 was much more sensitive to mitoxantrone and etoposide than KG1 (Supplementary Data 1). After selection of stably transfected KG1 cells overexpressing SMARCA4, confirmed by western blot of whole-cell lysates (Fig. 7e, Supplementary Fig. 6) and flow cytometry (Fig. 7g), in vitro chemo sensitivity testing demonstrated enhanced cytotoxicity (reduced viability) compared to non-transfected cells, more pronounced at the 72 h times (Fig. 7a, b) than 48 or 24 h (Supplementary Fig. 7) after addition of mitoxantrone or etoposide. Transfected U937 cells did not show significantly more sensitivity to these drugs compared to non-transfected cells (Fig. 7c, d), likely because U937 already showed a high expression of SMARCA4. Supplementary Table 4 summarizes the AUC and IC 50 values of transfected/non-transfected cells for each cell line and each drug. This result supports the hypothesis that SMARCA4 expression drives sensitivity to etoposide and mitoxantrone.

Fig. 7 SMARCA4 plasmid transfection experiments on cell lines KG1 and U937 for comparison of response to etoposide and mitoxantrone between original and transfected cells. a, b Comparison of the 72-h dose-response curves between KG1 cells (blue) and transfected KG1 cells (red) when cells are treated with (a) etoposide, and (b) mitoxantrone. c, d Comparison of the dose-response curves between U937 cells (blue) and transfected U937 cells (red) when cells are treated with (c) etoposide and (d) mitoxantrone. Three triangular marks at each point on the line indicate individual data points in duplicates and the average among them. The line connects averages of duplicates in each concentration measured. e Representative cropped western blot of control and transfected AML cell lines: KG1, U937, HL60, and MV4.11. Uncropped version is shown in Supplementary Fig. 6. f Quantifications of the SMARCA4 protein expression pattern of each of AML cell lines in (e). g Flow cytometry of SMARCA4 surface expression data confirm the overexpression for KG1 (blue) vs. transfected KG1 (red). h Flow cytometry of SMARCA4 surface expression data for U937 (blue) vs. transfected U937 (red). We note that U937 already strongly expresses SMARCA4, while KG1 exhibits minimal expression until after transfection. Abbreviations in d and f are as follows: PE-A, P-phycoerythrin area; MFI, mean fluorescence intensity; D anti Ri, Donkey anti-Rabbit Full size image

We repeated these experiments using two more cell lines–HL60 and MV4.11–that show higher SMARCA4 expression levels than KG1 but lower levels than U937 (Fig. 7f). HL60 shows lower SMARCA4 expression than MV4.11; thus, we would expect to see more increased sensitivity in HL60 than in MV4.11 when overexpressing SMARCA4. As expected, we saw increased sensitivity to etoposide when HL60 was transfected to overexpress SMARCA4 (Supplementary Fig. 8i) and no sensitivity change for MV4.11 (Supplementary Fig. 8k, l). HL60 had already been very sensitive to mitoxantrone before transfection to overexpress SMARCA4; thus, it was not surprising to see no change in sensitivity after transfection (Supplementary Fig. 8j). HL60 had been as sensitive to mitoxantrone as U937, which is why overexpressing SMARCA4 did not lead to a change in mitoxantrone response of HL60.