Current machine learning techniques enable robust association of biological signals with measured phenotypes, but these approaches are incapable of identifying causal relationships. Here, we develop an integrated “white-box” biochemical screening, network modeling, and machine learning approach for revealing causal mechanisms and apply this approach to understanding antibiotic efficacy. We counter-screen diverse metabolites against bactericidal antibiotics in Escherichia coli and simulate their corresponding metabolic states using a genome-scale metabolic network model. Regression of the measured screening data on model simulations reveals that purine biosynthesis participates in antibiotic lethality, which we validate experimentally. We show that antibiotic-induced adenine limitation increases ATP demand, which elevates central carbon metabolism activity and oxygen consumption, enhancing the killing effects of antibiotics. This work demonstrates how prospective network modeling can couple with machine learning to identify complex causal mechanisms underlying drug efficacy.

Here, we integrate biochemical screening, network modeling, and machine learning to form a white-box machine learning approach to reveal drug mechanisms of action. We apply this approach to elucidating metabolic mechanisms of action for bactericidal antibiotics. We discover that metabolic processes related to purine biosynthesis, driven by antibiotic-induced adenine limitation, participate in antibiotic lethality. We show that adenine limitation increases ATP demand via purine biosynthesis, resulting in elevated central carbon metabolism activity and oxygen consumption, thereby enhancing the killing effects of antibiotics. This work demonstrates how network models can facilitate machine learning activities for biological discovery and provide insights into the complex causal mechanisms underlying drug efficacy.

Antibiotics, a cornerstone of modern medicine, are threatened by the increasing burden of drug resistance, which is compounded by a diminished antimicrobial discovery pipeline. Although the primary targets and mechanisms of action for conventional antibiotics are well studied, there is growing appreciation that secondary processes, such as altered metabolism, actively participate in antibiotic efficacy and that extracellular metabolites may either potentiate or suppress the lethal activities of bactericidal antibiotics. Although features of central metabolism and cellular respiration are implicated in antibiotic lethality across diverse microbial species, the biological mechanisms underlying antibiotic-induced changes to metabolism remain unclear. A deeper understanding of how bacterial metabolism interfaces with antibiotic lethality has the potential to open new drug discovery paradigms, making antibiotic-induced cellular death physiology an attractive topic to investigate with white-box machine learning.

Chemical and genetic screens are workhorses in modern drug discovery but frequently suffer from poor (1%–3%) hit rates. Such low hit rates often underpower the bioinformatic analyses used for causal inference because of limitations in biological information content. Experimentally validated network models possess the potential to expand the biological information content of sparse screening data; however, biological screening experiments are typically performed independently from network modeling activities, limiting subsequent analyses to either post hoc bioinformatic enrichment from screening hits or experimental validation of existing models. Therefore, there is a need to develop biological discovery approaches that integrate biochemical screens with network modeling and advanced data analysis techniques to enhance our understanding of complex drug mechanisms. Here we develop one such approach and apply it to understanding antibiotic mechanisms of action.

Recent advances in high-throughput experimental technologies and data analyses have enabled unprecedented observation, quantification, and association of biological signals with cellular phenotypes. Data-driven machine learning activities are poised to transform biological discovery and the treatment of human disease; however, existing techniques for extracting biological information from large datasets frequently encode relationships between perturbation and phenotype in opaque “black-boxes” that are mechanistically uninterpretable and, consequently, can only identify correlative as opposed to causal relationships. In natural systems, biological molecules are biochemically organized in networks of complex interactions underlying observable phenotypes; biological network models may therefore harbor the potential to provide mechanistic structure to machine learning activities, yielding transparent “white-box” causal insights.

The metabolic modeling simulations further predicted that decreases in oxidative phosphorylation under adenine supplementation lead to decreases in cellular oxygen consumption (Figure 6F, left). We tested these modeling predictions using a Seahorse XF analyzer and measured changes in the oxygen consumption rate (OCR) following antibiotic treatment with or without adenine or uracil supplementation. Antibiotic treatment with AMP, CIP, or GENT increased the cellular oxygen consumption rate (Figure 6F, black), in contrast to control conditions (Figure S5E), supporting previous observations that cellular respiration is important for antibiotic lethality. Importantly, adenine supplementation significantly repressed changes in cellular oxygen consumption under antibiotic treatment (Figure 6F, red), consistent with model predictions, whereas uracil enhanced cellular oxygen consumption (Figure 6F, blue). These results directly support the hypothesis that central carbon metabolism activity and cellular respiration are increased under antibiotic stress to satisfy the elevated ATP demand resulting from purine biosynthesis. Collectively, our data and simulations indicate that adenine limitation resulting from antibiotic treatment drives purine biosynthesis, which increases ATP demand, fueling the redox-associated metabolic alterations involved in antibiotic lethality (Figure 7).

In addition to the lethal effects of inhibiting their primary targets, bactericidal antibiotics disrupt the nucleotide pool, depleting intracellular purines and inducing adenine limitation. Adenine limitation triggers purine biosynthesis, increasing ATP demand, which drives increased activity through central carbon metabolism and cellular respiration. Toxic metabolic byproducts generated by this increased metabolic activity damage DNA and exacerbate antibiotic-mediated killing. Futile cycles and other stress-induced phenomena may also elevate ATP demand.

We tested these metabolic modeling predictions by quantifying the intracellular concentrations of central carbon metabolism and energy currency metabolites from E. coli cells grown in MOPS minimal medium and supplemented with either adenine or uracil (Figure 6B; Table S8). Under these conditions, cell growth did not significantly change in the first hour of supplementation (Figure S5A), but intracellular adenine nucleotides did accumulate under exogenous adenine addition (Figure S5B). Consistent with model predictions that adenine supplementation would inhibit succinate dehydrogenase activity, intracellular succinate increased, whereas intracellular fumarate decreased (Figure 6C). Model simulations additionally predicted that ATP synthesis would decrease under adenine supplementation (Figure 6D, left). Consistent with this, we observed a modest decrease in the adenylate energy charge (Figure 6D, right), an index for the energy state of a cell. We also examined the relative changes in intracellular nicotinamide adenine dinucleotides under adenine or uracil supplementation (Figure S5C) and observed a modest decrease in the NADPH:NADP+ ratio, but not the NADH:NAD+ ratio, following exogenous adenine addition (Figure 6E). Together, these results support the model predictions that adenine supplementation decreases central carbon metabolism activity (decreased adenylate energy charge) and cell anabolism (decreased NADPH:NADP+ ratio) without significantly changing cell catabolism (unchanged NADH:NAD+ ratio) (Figure S5D).
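The two indices used above can be written down compactly. The following is a minimal sketch, assuming the standard definition of the adenylate energy charge, (ATP + 0.5 ADP) / (ATP + ADP + AMP), and a plain reduced-to-oxidized ratio for the nicotinamide cofactors. The function names and the example concentrations are hypothetical and are not taken from the study.

```python
def adenylate_energy_charge(atp: float, adp: float, amp: float) -> float:
    """Adenylate energy charge: (ATP + 0.5 * ADP) / (ATP + ADP + AMP).

    Inputs are intracellular concentrations in any consistent unit.
    The result lies between 0 (all AMP) and 1 (all ATP).
    """
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


def redox_ratio(reduced: float, oxidized: float) -> float:
    """Reduced-to-oxidized cofactor ratio, e.g. NADPH:NADP+ or NADH:NAD+."""
    if oxidized <= 0:
        raise ValueError("oxidized cofactor concentration must be positive")
    return reduced / oxidized


# Hypothetical concentrations (arbitrary units), for illustration only.
control_aec = adenylate_energy_charge(atp=3.0, adp=1.0, amp=0.5)
adenine_aec = adenylate_energy_charge(atp=2.5, adp=1.2, amp=0.8)
print(f"control AEC = {control_aec:.3f}, adenine AEC = {adenine_aec:.3f}")
print(f"NADPH:NADP+ = {redox_ratio(reduced=1.2, oxidized=0.4):.2f}")
```

Because the energy charge is a ratio of pooled adenylates, it can be compared across conditions even when the total adenine nucleotide pool changes, which is relevant here because adenine supplementation itself enlarges that pool.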

Data are represented as mean ± SEM from n ≥ 3 independent biological replicates. Significance is reported as FDR-corrected p values in comparison with the control: † p ≤ 0.1, ∗ p ≤ 0.05, ∗∗ p ≤ 0.01, ∗∗∗ p ≤ 0.001.

(E) Cellular respiration following adenine or uracil supplementation in the absence of antibiotic treatment.

Purine biosynthesis is energetically expensive, costing eight ATP molecules to synthesize one adenine molecule from one glucose molecule. To better understand the mechanistic basis for the observed differences in antibiotic lethality under adenine or uracil supplementation, we examined the simulated metabolic network states corresponding to these perturbations (Table S5). Model simulations predicted that adenine supplementation would decrease purine biosynthesis and, consequently, decrease ATP utilization by nucleotide synthesis and salvage reactions, whereas uracil supplementation would not (Figure 6A). Model simulations also predicted that, as a result of these changes, overall flux through central carbon metabolism pathways would decrease, reducing the activity of enzymes involved in cellular respiration and oxidative phosphorylation, such as succinate dehydrogenase (Figure S4). These modeling results are consistent with previous observations that glycolytic flux is controlled by ATP demand.

Data are represented as mean ± SEM from n = 3 independent biological replicates. Significance is reported as FDR-corrected p values in comparison with the control: †p ≤ 0.1, ∗ p ≤ 0.05, ∗∗ p ≤ 0.01, ∗∗∗∗ p ≤ 0.0001.

(F) Cellular respiration following adenine or uracil supplementation during antibiotic treatment. Metabolic modeling simulations predict a decrease in oxygen consumption following adenine supplementation (left), reported by the oxygen exchange reaction. Adenine supplementation (red) reduces respiratory activity, whereas uracil (blue) increases respiratory activity. Changes in the oxygen consumption rate following treatment with AMP, CIP, or GENT and adenine or uracil supplementation were measured using the Seahorse extracellular flux analyzer.

(E) NADPH:NADP+ and NADH:NAD+ ratios following adenine or uracil supplementation. Metabolomic measurements of intracellular NADPH, NADP+, NADH, and NAD+ (Figure S5C) reveal modest decreases in the NADPH:NADP+ ratio following adenine supplementation (left), indicating reduced anabolic metabolism. The NADH:NAD+ ratio is largely unchanged (right), indicating preserved catabolic metabolism.

(D) ATP synthesis following adenine or uracil supplementation. Metabolic modeling simulations predict a decrease in ATP synthesis following adenine supplementation (left), reported by the ATP synthase reaction. Metabolomic measurements of intracellular ATP, ADP, and AMP (Figure S5B) reveal a similar decrease in adenylate energy charge following adenine supplementation (right).

(C) Intracellular succinate or fumarate concentrations following adenine or uracil supplementation. Adenine supplementation increases intracellular succinate and decreases intracellular fumarate, consistent with model predictions for inhibited succinate dehydrogenase activity (A, right).

(B) Intracellular adenine or uracil concentrations following adenine or uracil supplementation. Intracellular metabolite concentrations were measured by targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS).

(A) Metabolic modeling predictions. Adenine supplementation decreases activity through purine biosynthesis, consequently decreasing ATP utilization by purine biosynthesis, central carbon metabolism, and oxidative phosphorylation (Figure S4) in comparison with the simulated control (CTL). E. coli metabolism under adenine (ADE) or uracil (URA) supplementation was simulated by parsimonious flux balance analysis (pFBA) in the iJO1366 metabolic model, with exchange reactions for adenine or uracil opened, respectively. Nucleotide biosynthesis activity was computed by summing fluxes through reactions in the purine and pyrimidine biosynthesis subsystem (left). ATP consumption was summed across all reactions in the purine and pyrimidine biosynthesis and nucleotide salvage pathway subsystems (center left). Central carbon metabolism activity was computed by summing fluxes through reactions in the glycolysis and TCA cycle subsystems (center right). Oxidative phosphorylation is proxied by the succinate dehydrogenase reaction (right); additional oxidative phosphorylation reactions are depicted in Figure S4. All fluxes were normalized by the biomass objective function.
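The subsystem-level summaries described in this legend can be illustrated with a short sketch. It assumes that a simulated flux distribution is available as a mapping from reaction ID to flux, that each reaction carries a subsystem annotation, and that signed fluxes are summed after dividing by the biomass flux. The reaction IDs, subsystem labels, and flux values below are hypothetical placeholders, and whether signed or absolute fluxes were summed is not stated in the text.

```python
from collections import defaultdict


def subsystem_activity(fluxes, subsystems, biomass_id):
    """Sum biomass-normalized fluxes per annotated subsystem.

    fluxes:     dict of reaction ID -> simulated flux
    subsystems: dict of reaction ID -> subsystem label
    biomass_id: ID of the biomass objective reaction used for normalization
    """
    biomass = fluxes[biomass_id]
    if biomass == 0:
        raise ValueError("biomass flux is zero; cannot normalize")
    totals = defaultdict(float)
    for reaction, flux in fluxes.items():
        if reaction == biomass_id:
            continue  # the objective itself is not part of any summary
        label = subsystems.get(reaction)
        if label is not None:
            # Assumption: signed fluxes are summed as reported.
            totals[label] += flux / biomass
    return dict(totals)


# Hypothetical toy flux distribution, for illustration only.
toy_fluxes = {"RXN_PUR_1": 2.0, "RXN_PUR_2": 4.0, "RXN_TCA_1": 1.0, "BIOMASS": 0.5}
toy_subsystems = {
    "RXN_PUR_1": "Purine and Pyrimidine Biosynthesis",
    "RXN_PUR_2": "Purine and Pyrimidine Biosynthesis",
    "RXN_TCA_1": "Citric Acid Cycle",
}
print(subsystem_activity(toy_fluxes, toy_subsystems, "BIOMASS"))
```

Normalizing by the biomass flux makes the summaries comparable between simulated conditions that differ in predicted growth rate.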

Bactericidal antibiotics significantly alter bacterial metabolism as part of their lethality, increasing the abundance of intracellular central carbon metabolites and disrupting the nucleotide pool. Nucleotide pool disruptions include rapid depletion of free intracellular adenine, guanine, and cytosine and marked accumulation of intracellular uracil (Figure S3). Additionally, nucleotide biosynthesis pathways auto-regulate, with internal feedback inhibition driven biochemically by their nucleotide end products (Figure 5A). Based on the predictions from our white-box machine learning approach and the above observations, we hypothesized that purine supplementation would rescue antibiotic-induced purine depletion and, consequently, decrease the demand for purine biosynthesis, reducing antibiotic lethality. Of note, supplementation with adenine (Figure 5B, red), but not guanine, decreased antibiotic lethality in wild-type cells; these results suggest that adenine limitation rather than guanine limitation drives purine biosynthesis activity under antibiotic stress. We also hypothesized that pyrimidine supplementation would inhibit pyrimidine biosynthesis and promote purine biosynthesis activity via prpp accumulation and, consequently, increase antibiotic lethality. Indeed, supplementation with uracil or cytosine potentiated antibiotic lethality (Figure 5C, blue). Collectively, these data support the hypothesis that purine biosynthesis participates in antibiotic lethality and suggest that antibiotic-induced purine biosynthesis is driven by adenine limitation.

(A) Feedback inhibition in the purine and pyrimidine biosynthesis pathways. Purine and pyrimidine biosynthesis auto-regulate through internal feedback inhibition by nucleotide end products.

Purine nucleic acid bases (A: adenine, G: guanine) are depleted (red), while pyrimidine nucleic acid bases (C: cytosine, T: thymine, U: uracil) accumulate (blue) in E. coli cells treated with ampicillin (AMP), norfloxacin (NOR) or kanamycin (KAN). Data reanalyzed from. Data are represented as mean ± SEM from n = 3 independent biological replicates.

We further hypothesized that stimulation of purine biosynthesis would elicit effects on antibiotic lethality opposite to those of inhibition by these genetic and biochemical perturbations. Indeed, biochemical supplementation with the purine biosynthesis substrates phosphoribosyl pyrophosphate (prpp) and glutamine (gln) (Figure 4A, blue) led to increased AMP and CIP lethality and decreased GENT lethality (Figure 4E). Collectively, these data support the model-driven hypothesis that purine biosynthesis participates in antibiotic lethality and demonstrate how model-guided machine learning can provide reductive, hypothesis-driven mechanistic insights into drug efficacy.

Cells deficient for glyA (serine hydroxymethyltransferase), which participates in producing tetrahydrofolate co-factors through the folate cycle, also exhibited decreased AMP and CIP lethality but increased GENT lethality (Figure 4D). Similar phenotypes were observed under combination treatment with trimethoprim, a potent biochemical inhibitor of FolA (dihydrofolate reductase) (Figure S2B), consistent with previous findings.

Motivated by the above model-guided machine learning predictions, we sought to test whether perturbations to purine biosynthesis would alter antibiotic lethality. From the predictions, we hypothesized that genetic deletion of enzymes involved in purine metabolism would exert differential effects on AMP and CIP lethality compared with GENT lethality. Indeed, E. coli mutants deficient for purD (glycinamide ribonucleotide synthetase), purE (N5-carboxyaminoimidazole ribonucleotide mutase), purK (5-(carboxyamino)imidazole ribonucleotide synthase), or purM (phosphoribosylformylglycinamide cyclo-ligase), which catalyze early steps in purine biosynthesis (Figure 4A), exhibited significant decreases in AMP and CIP lethality but increased GENT lethality compared with the wild type (Figure 4B). Similarly, biochemical inhibition of purine biosynthesis with 6-mercaptopurine, a PurF (amidophosphoribosyltransferase) inhibitor, decreased AMP and CIP lethality but increased GENT lethality (Figure 4C). These effects appear to be specific to purine metabolism because genetic deletion of enzymes involved in pyrimidine biosynthesis did not elicit significant differences in AMP, CIP, or GENT lethality (Figure S2A).

(B) Biochemical disruption of the folate cycle by trimethoprim (TRI) decreases AMP and CIP lethality, but increases GENT lethality.

(A) Antibiotic lethality in pyrimidine biosynthesis deletion mutants. Genetic inhibition of pyrimidine biosynthesis by pyrC (dihydroorotase) or pyrE (orotate phosphoribosyltransferase) deletion does not significantly change ampicillin (AMP), ciprofloxacin (CIP) or gentamicin (GENT) lethality.

(E) Antibiotic lethality following enhanced purine biosynthesis. Substrate-level stimulation of purine biosynthesis with prpp and glutamine (gln) supplementation increases AMP and CIP lethality but decreases GENT lethality.

(D) Antibiotic lethality in a glyA (serine hydroxymethyltransferase) deletion mutant. Genetic inhibition of glycine (gly) and N10-formyl-tetrahydrofolate (10fthf) by glyA deletion decreases AMP and CIP lethality but increases GENT lethality.

(C) Antibiotic lethality following biochemical inhibition of purine biosynthesis. Biochemical inhibition of PurF (amidophosphoribosyltransferase) by 6-mercaptopurine (6-MP) decreases AMP and CIP lethality but increases GENT lethality.

(B) Antibiotic lethality in purine biosynthesis deletion mutants. Genetic inhibition of purine biosynthesis by purD (glycinamide ribonucleotide synthetase), purE (N5-carboxyaminoimidazole ribonucleotide mutase), purK (5-(carboxyamino)imidazole ribonucleotide synthase), or purM (phosphoribosylformylglycinamide cyclo-ligase) deletion decreases AMP and CIP lethality but increases GENT lethality.

Interestingly, a second cluster appeared, possessing purine biosynthesis pathways (“superpathway of histidine, purine, and pyrimidine biosynthesis” and “superpathway of purine nucleotides de novo biosynthesis II”) with shared directionality between AMP and CIP and opposite directionality for GENT. To our knowledge, purine biosynthesis has not been implicated previously as a mechanism of antibiotic lethality from any biochemical or chemogenomic screen. To better understand these differences in pathway directionality, we examined the regression coefficients for each reaction and computed a reaction score by log-transforming their magnitudes. These analyses identified early steps in the purine biosynthesis pathway as being primarily responsible for the predicted differences for AMP and CIP from GENT (Figure S1). These findings illustrate how white-box machine learning can reveal new mechanisms of action with high biochemical specificity.

Differences in purine biosynthesis pathway scores for ampicillin (AMP) and ciprofloxacin (CIP) from gentamicin (GENT) are primarily explained by early reactions in the purine biosynthesis pathway (gray box).

Because our white-box machine learning approach yields pathway mechanisms, we can quantify the relative contributions of each metabolic pathway to the lethal mechanisms of each antibiotic. We computed pathway scores for each pathway and antibiotic by performing least-squares regression on the changes in antibiotic IC50 and then log-transforming the average non-zero regression coefficients for all reactions in each pathway. Identified pathways primarily clustered into three groups based on their pathway scores (Figure 3). One cluster possessed central carbon metabolism pathways (“superpathway of glycolysis, pyruvate dehydrogenase, tricarboxylic acid [TCA], and glyoxylate bypass”; “superpathway of glyoxylate bypass and TCA”; and “TCA cycle I (prokaryotic)”) with similar pathway directionality for AMP, CIP, and GENT (indicated by the sign of the pathway score). These findings are consistent with several studies demonstrating the TCA cycle to be a shared mechanism in antibiotic lethality and validate the fidelity of our white-box machine learning approach.
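To make the pathway score concrete, the following is a minimal sketch of one way such a score could be computed from per-reaction regression coefficients. The text states only that the average non-zero coefficient is log-transformed and that the sign of the score carries the pathway directionality; the specific sign-preserving transform used here, sign(m) · log10(1 + |m|), is an assumption chosen so that negative averages remain defined, and the function name and toy coefficients are hypothetical.

```python
import math


def pathway_score(coefficients, pathway_reactions):
    """Signed log-transform of the mean non-zero coefficient in a pathway.

    coefficients:      dict of reaction ID -> regression coefficient
    pathway_reactions: iterable of reaction IDs annotated to the pathway
    Returns 0.0 when no reaction in the pathway has a non-zero coefficient.
    """
    nonzero = [
        coefficients[r]
        for r in pathway_reactions
        if coefficients.get(r, 0.0) != 0.0
    ]
    if not nonzero:
        return 0.0
    mean = sum(nonzero) / len(nonzero)
    # Assumed transform: keep the sign, compress the magnitude.
    return math.copysign(math.log10(1.0 + abs(mean)), mean)


# Hypothetical coefficients for two antibiotics on the same toy pathway.
toy_pathway = ["RXN_A", "RXN_B", "RXN_C"]
coef_drug_1 = {"RXN_A": 0.8, "RXN_B": 0.0, "RXN_C": 0.4}
coef_drug_2 = {"RXN_A": -0.6, "RXN_B": -0.2, "RXN_C": 0.0}
print(pathway_score(coef_drug_1, toy_pathway))  # positive directionality
print(pathway_score(coef_drug_2, toy_pathway))  # negative directionality
```

Under this convention, two antibiotics with scores of opposite sign for the same pathway would be read as having opposite pathway directionality, as described for the purine biosynthesis pathways.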

Shown are pathway scores for metabolic pathways identified by white-box machine learning. Identified pathways include several central carbon metabolism and nucleotide biosynthesis pathways, and these cluster into three groups based on pathway score. Central metabolism pathways primarily exhibit a similar pathway directionality for AMP, CIP, and GENT, whereas purine biosynthesis pathways exhibit a different pathway score directionality for GENT than for AMP or CIP. Pathway scores were computed for each antibiotic by log-transforming the average regression coefficient for all non-zero reactions annotated in a given pathway.

For each antibiotic, metabolic pathway mechanisms were identified by first conducting a dimension-reducing machine learning regression task and then performing hypergeometric statistical testing on the metabolic reactions comprising the resulting predictive model using pathway-reaction sets curated by Ecocyc. The measured changes in antibiotic IC50 were jointly learned on the set of simulated metabolic network states using a multitask elastic net, yielding 477 reactions predicted to alter antibiotic lethality. For each antibiotic, reactions with coefficients whose magnitudes were less than or equal to half the SD of all coefficients were removed to exclude spurious reactions selected by joint learning. For AMP, CIP, and GENT, this yielded 189, 208, and 204 reactions, respectively (Table S6). Next, hypergeometric statistics were performed on Ecocyc-curated pathways. Of the 431 metabolic pathways curated by Ecocyc, only 13 were found to be statistically significant, with less than a 5% FDR for at least one antibiotic (Table S7).
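The filtering and enrichment steps described above can be sketched in a few functions. This is an illustrative outline under stated assumptions rather than the analysis code used in the study: the population standard deviation is used for the half-SD cutoff, enrichment is scored with a one-sided (upper-tail) hypergeometric test, and the false discovery rate is controlled with the Benjamini-Hochberg procedure, which the text does not name. All function names are hypothetical.

```python
import math
import statistics


def filter_coefficients(coefficients):
    """Drop reactions whose |coefficient| is <= half the SD of all coefficients."""
    # Assumption: population SD; the text does not say which estimator was used.
    cutoff = 0.5 * statistics.pstdev(list(coefficients.values()))
    return {r: c for r, c in coefficients.items() if abs(c) > cutoff}


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement.

    population: total number of reactions considered
    successes:  reactions annotated to the pathway
    draws:      reactions retained in the predictive model
    k:          retained reactions that belong to the pathway
    """
    total = math.comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy example: 4 coefficients, then one pathway test.
kept = filter_coefficients({"R1": 0.0, "R2": 0.1, "R3": -2.0, "R4": 2.0})
print(sorted(kept))
print(hypergeom_upper_tail(k=5, population=10, successes=5, draws=5))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In this outline, a pathway would be reported for an antibiotic when its adjusted p value falls at or below 0.05, mirroring the 5% FDR threshold in the text.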

We next applied our white-box machine learning approach and prospectively modeled metabolic network states corresponding to supplementation with each metabolite used in the screen. For each metabolite, metabolic states were simulated by first adding exchange reactions to the E. coli metabolic model, which enabled uptake of each metabolite from the extracellular environment. We then performed parsimonious flux balance analysis (pFBA) under conditions simulating MOPS minimal medium and optimized for the biomass objective function (Table S5). Although this approach does not explicitly model contributions by gene expression toward changes in metabolism, benchmarking studies demonstrate that principles of growth maximization and parsimony are sufficient for accurately predicting metabolism in defined metabolic environments.
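For readers less familiar with pFBA, the optimization can be summarized in its commonly used two-step form. The notation here is introduced for illustration and does not appear in the original text: S is the stoichiometric matrix, v the vector of reaction fluxes, c a vector selecting the biomass reaction, and ℓ and u the lower and upper flux bounds. Under the usual sign convention, in which negative exchange flux denotes uptake, enabling uptake of a supplemented metabolite m corresponds to setting the lower bound of its exchange reaction below zero; the bound values used in the study are not given here.

```latex
\begin{aligned}
\text{Step 1 (growth maximization):}\quad
  & \mu^{*} = \max_{v}\; c^{\top} v
  && \text{s.t. } S v = 0,\;\; \ell \le v \le u \\[4pt]
\text{Step 2 (parsimony):}\quad
  & \min_{v}\; \sum_{j} \lvert v_j \rvert
  && \text{s.t. } S v = 0,\;\; \ell \le v \le u,\;\; c^{\top} v = \mu^{*} \\[4pt]
\text{Metabolite } m \text{ supplied:}\quad
  & \ell_{\mathrm{EX}_m} < 0
\end{aligned}
```

The second step selects, among all flux distributions that achieve the optimal growth rate, the one with the smallest total flux, which yields a single flux estimate per reaction for each simulated condition.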

To test the capabilities of conventional bioinformatic analyses to yield mechanistic insights into how the screened metabolites alter antibiotic lethality, we first performed an enrichment analysis of metabolites that elicited a 2-fold or more change in IC50, a conventional definition for a screening “hit” (Table S3). For each antibiotic, a metabolite set enrichment analysis was performed in Ecocyc. For AMP (2 metabolites ≥ 2-fold change in IC50) and GENT (8 metabolites ≥ 2-fold change in IC50), no pathways were enriched with less than a 5% false discovery rate (FDR) (q ≤ 0.05). For CIP (19 metabolites ≥ 2-fold change in IC50), several non-specific pathways related to protein translation were identified, with top enrichments including “aminoacyl-tRNA charging” (p = 1.98e−6), “proteinogenic amino acid biosynthesis” (p = 2.50e−6), and “amino acid degradation” (p = 1.27e−5) (Table S4). These findings are consistent with previous observations that protein translation inhibitors generally exert antagonistic effects on antibiotic lethality. Collectively, these results illustrate two common weaknesses in conventional bioinformatic approaches for analyzing biochemical screens: statistical power limitations and low-specificity associations.
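The conventional hit definition used above can be expressed in a few lines. This sketch assumes that each metabolite's effect is summarized as the ratio of the IC50 with supplementation to the IC50 of the control, and that a 2-fold change in either direction (ratio ≥ 2 or ≤ 0.5) counts as a hit; the function name, metabolite names, and values are hypothetical.

```python
def call_hits(ic50_fold_change, threshold=2.0):
    """Return metabolites whose IC50 changes by at least `threshold`-fold.

    ic50_fold_change: dict of metabolite -> IC50(metabolite) / IC50(control)
    A change in either direction qualifies (assumption).
    """
    if threshold <= 1.0:
        raise ValueError("threshold must be greater than 1")
    return {
        metabolite: ratio
        for metabolite, ratio in ic50_fold_change.items()
        if ratio >= threshold or ratio <= 1.0 / threshold
    }


# Hypothetical fold changes for one antibiotic.
toy_screen = {"met_a": 2.3, "met_b": 1.4, "met_c": 0.45, "met_d": 0.9}
hits = call_hits(toy_screen)
print(sorted(hits))
print(f"hit rate = {len(hits) / len(toy_screen):.0%}")
```

With only a handful of metabolites passing such a cutoff per antibiotic, the input to any downstream enrichment test is small, which is the statistical power limitation noted in the text.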

Changes in antibiotic IC50 values were modest; in most cases, less than 2-fold (Figure 2B; Table S2). Hierarchical clustering of the measured IC50 values revealed that the metabolite response profiles differed between AMP, CIP, and GENT, highlighting their different biochemical targets. However, several metabolites appeared to commonly potentiate or inhibit efficacy across multiple antibiotics, indicating shared metabolic mechanisms of action. Interestingly, many nitrogen, phosphorus, and sulfur metabolites increased antibiotic IC50 values, whereas many carbon metabolites decreased IC50 values, similar to previous observations. These raw data indicate that the measured antibiotic lethality responses to metabolite perturbations occurred through specific metabolic pathways rather than generically as a response to medium enrichment.

Input-output relationships between E. coli metabolism and antibiotic lethality were systematically quantified by measuring antibiotic IC50 values following supplementation with metabolites known to participate in E. coli metabolism (Figure 2A). To avoid the potentially confounding effects of stationary-phase physiology on antibiotic tolerance, we performed experiments using exponentially growing E. coli MG1655 cells. These cells were grown in 3-(N-morpholino)propanesulfonate (MOPS)-defined minimal medium and systematically screened with an unbiased and semi-comprehensive library of metabolites against AMP, CIP, and GENT. Screened metabolites were derived from the Biolog phenotype microarrays (PMs) 1–4, which comprise diverse carbon, nitrogen, phosphorus, and sulfur species. These PMs contain 206 unique amino acids, carbohydrates, nucleotides, and organic acids that are included in the iJO1366 genome-scale model of E. coli metabolism. Antibiotic responses to these 206 metabolites were used for subsequent analyses (Table S1).

(B) Antibiotic IC50 responses to metabolite supplementation. Metabolically induced sensitivity profiles differ by antibiotic, but several metabolites commonly protect (red) or sensitize (blue) cells to multiple antibiotics. Carbon metabolites were screened using Biolog PMs 1 and 2, nitrogen metabolites were screened using Biolog PM 3, and phosphorus and sulfur metabolites were screened using Biolog PM 4.

(A) Overall experimental design for measuring metabolite effects on antibiotic lethality. Overnight cultures of E. coli MG1655 were inoculated into MOPS minimal medium, grown to early exponential phase, and back-diluted to an optical density (OD) of 0.1. Cells were dispensed into Biolog phenotype microarray (PM) plates 1–4 with different concentrations of ampicillin (AMP), ciprofloxacin (CIP), or gentamicin (GENT) added. OD was measured after 4 h of incubation at 37°C and shaking at 900 rpm. Antibiotic IC50 values were estimated for each antibiotic-metabolite combination.

Here we applied this integrated screening-modeling-learning approach to investigate metabolic mechanisms of antibiotic lethality, demonstrating the ability of this workflow to reveal new mechanistic insights (Figure 1C). Specifically, we designed biochemical screens to measure the effects of diverse metabolite supplementations on the lethality of three bactericidal antibiotics: ampicillin (AMP, a β-lactam), ciprofloxacin (CIP, a fluoroquinolone), and gentamicin (GENT, an aminoglycoside). We screened combinations of these antibiotics and metabolites in Escherichia coli, measuring their antibiotic half-maximal inhibitory concentrations (IC50) after 4 h of treatment. Next we prospectively simulated metabolic network states corresponding to each metabolite perturbation using the iJO1366 genome-scale model of E. coli metabolism with quantitative information from the biochemical screens as modeling constraints. These simulations comprehensively yield flux estimates for each metabolic reaction in E. coli under each screening condition. For each antibiotic, we applied machine learning regression analyses to train a predictive model that could reveal pathway mechanisms underlying differences in antibiotic lethality measured in our screen. These pathways were identified by regularizing the simulated metabolic network states, regressing the measured IC50 values, and performing enrichment analyses from metabolic pathway annotations curated in Ecocyc v.22.0.
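The regularized regression step can be stated more explicitly. The formulation below is the widely used form of the multitask elastic net objective (for example, as parameterized in scikit-learn) and is offered as an illustration; the notation and the hyperparameters α and ρ are assumptions, and their values in the study are not given here. X is the n × p matrix of simulated fluxes (n = 206 metabolite conditions, p reactions), Y is the n × 3 matrix of measured IC50 changes for AMP, CIP, and GENT, and W is the p × 3 coefficient matrix.

```latex
\begin{aligned}
\hat{W} &= \arg\min_{W}\;
  \frac{1}{2n}\,\lVert Y - XW \rVert_{F}^{2}
  \;+\; \alpha\,\rho\,\lVert W \rVert_{2,1}
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert W \rVert_{F}^{2},
\\[4pt]
\lVert W \rVert_{2,1} &= \sum_{j=1}^{p} \sqrt{\sum_{t=1}^{3} W_{jt}^{2}}
\end{aligned}
```

The mixed-norm penalty acts on whole rows of W, so a reaction is either retained for all three antibiotics or excluded for all three. This joint selection is why an additional per-antibiotic cutoff on coefficient magnitude is useful afterward.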

Our approach integrates biochemical screening with prospective network modeling to provide mechanistically linked training data for machine learning (Figure 1B). In contrast to existing data-driven approaches, which generate predictive models from only the variables or perturbations available in a screen, we first use prospective network modeling to quantitatively transform screening perturbations into biologically enriched network states. Biological information from experimental screens is applied as boundary conditions to the network simulations, computing a network representation for each perturbation in the screen (e.g., metabolic fluxes following metabolite perturbations). These network representations are then used as input data to train predictive models with the empirical screening measurements (e.g., quantified cellular phenotypes in response to screening perturbations) as output data. Because biological networks are mechanistically constructed, the features comprising the predictive models trained by machine learning are, by definition, mechanistically causal and represent tangible biochemical species that can be directly tested experimentally.

Machine learning aims to generate predictive models from sets of training data; such activities typically comprise three parts: input data, output data, and the predictive model trained to compute output data from input data (Figure 1A). Although modern machine learning methods can assemble high-fidelity input-output associations from training data, the functions comprising the resulting trained models often do not possess tangible biochemical analogs, rendering them mechanistically uninterpretable. Consequently, predictive models generated by such (black-box) machine learning activities are unable to provide direct mechanistic insights into how biological molecules are interacting to give rise to observed phenomena. To address this limitation, we developed a white-box machine learning approach, leveraging carefully curated biological network models to mechanistically link input and output data.

(A) Machine learning activities typically comprise three parts: input data (blue), output data (red), and a predictive model trained to compute output data from input data (purple).

(B) An overall framework for white-box machine learning. Input screening perturbations (e.g., metabolite conditions; gray) are first transformed into enriched biological network states by prospective network modeling (e.g., metabolic fluxes; blue). These network simulations are then used as machine learning inputs to train a predictive model (purple), revealing pathway mechanisms underlying the output data (e.g., antibiotic lethality measurements; red). Because biological networks are mechanistically constructed, features comprising the predictive models trained by machine learning are, by definition, mechanistically causal.

(C) E. coli MG1655 cells were treated with three bactericidal antibiotics at 13 or more different concentrations. Antibiotic IC50 values were quantified following supplementation with 206 diverse metabolites and normalized by their on-plate controls. Metabolic network states corresponding to each metabolite were prospectively simulated using the iJO1366 model of E. coli metabolism. For each antibiotic, changes in IC50 were regressed on the simulated fluxes, and pathway mechanisms were identified by hypergeometric testing on metabolic pathways curated by EcoCyc. Identified pathways were validated experimentally.
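The hypergeometric pathway test mentioned in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the reaction identifiers, and the pathway dictionary format are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a population
    of N items, K of which belong to the pathway of interest."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return hits / total


def pathway_enrichment(selected, pathways, universe):
    """Return an over-representation p-value for each pathway.

    selected: reactions picked out by the regression (assumed input).
    pathways: hypothetical mapping of pathway name to member reactions.
    universe: all reactions considered in the analysis.
    """
    universe = set(universe)
    selected = set(selected) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(selected & members)
        results[name] = hypergeom_sf(
            overlap, len(universe), len(members), len(selected)
        )
    return results
```

A small p-value indicates that the selected reactions fall in a pathway more often than expected by chance. In practice the p-values across pathways would normally be adjusted for multiple comparisons before any pathway is carried forward for experimental validation.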

Discussion

Network modeling has long provided a foundation for systems biology (Ideker et al., 2001), and researchers are now beginning to integrate machine learning with retrospective network modeling for improving the fidelity of genotype-to-phenotype predictions (Ma et al., 2018). Such activities demonstrate how hierarchically organized prior knowledge can deconvolve complex biological data; however, these efforts rely on post hoc analyses of experimental data and can only perform inductive association of phenotypes with perturbations rather than deductive identification of the causal mechanisms driving phenotypes. Here we present a complementary approach, combining machine learning with prospective network modeling to infer biological mechanisms based on their combined information content.

White-box machine learning can be broadly extended across diverse biological systems and, as demonstrated here, be impactful for revealing drug mechanisms of action for treating human diseases. For instance, cell metabolism is increasingly recognized as being important in cancer pathogenesis (Vander Heiden and DeBerardinis, 2017), and histidine metabolism was recently demonstrated to participate in the efficacy of some cancer therapeutics (Kanarek et al., 2018). Similar to the present work on antibiotics, cancer drugs may be counter-screened against a library of metabolites in human cancer cells and coupled with network simulations using models of human metabolism (Brunk et al., 2018) to discover metabolic mechanisms of action for existing cancer drugs. Insights gained by such an approach may help guide the design of cancer treatment regimens, accounting for a tumor’s local metabolic microenvironment and leveraging metabolic perturbations to optimize treatment outcomes.

Moreover, our integrated screening-modeling-learning approach is agnostic to the experimental datasets and network models used to train machine learning models. NIH Common Fund programs such as the Library of Integrated Network-Based Cellular Signatures (LINCS) and Big Data to Knowledge are providing increasingly comprehensive measurements of cellular physiology in response to genetic or small-molecule perturbations (Keenan et al., 2018). Our white-box machine learning approach could be extended to such datasets to reveal molecular mechanisms mediating cellular responses to biochemical stimuli. For instance, simulations may be performed on human signaling networks to transform LINCS small-molecule perturbations into signaling network configurations that can be utilized as input data to learn signaling mechanisms of epigenetic regulation from measured chromatin signatures (Litichevskiy et al., 2018). Similarly, prospective network simulations may be performed on gene-regulatory networks to interpret CRISPR screening perturbations (Wang et al., 2014) and reveal transcriptional programs underlying screened phenotypes.