(a) Basic workflow for identifying drug-induced perturbed pathways and linking them to their indication. (b) Detailed workflow for the genetic algorithm: 1) Inputs to the algorithm are a set of response variables for each gene expression set (either MetCHANGE scores or gene expression changes), a binary presence/absence vector indicating whether each sample was treated with a drug that has the side effect or indication, and the maximum number of predictor variables allowed. The latter was set based on the number of treated gene expression sets in order to minimize the potential for overfitting. 2) At initiation, the genetic algorithm generates a ‘population’ of random guesses at the predictor variables, termed ‘individuals’, assigning each variable a value of -1, 0, or 1. For each individual, all gene expression samples are scored as the response variables (MetCHANGE or gene expression changes) multiplied by the candidate signature. 3) The gene expression samples are then ranked by score, a receiver operating characteristic (ROC) curve is generated, and the area under the curve (AUC) is calculated using the input presence/absence vector for the side effect or indication. The resulting AUC is the maximization objective of the genetic algorithm. 4) The genetic algorithm subroutines are then used to generate a new population, biasing towards higher AUCs. The best solutions are maintained without modification, and lower-scoring individuals are combined (‘crossed over’) and modified (‘mutated’) to search the solution space in a heuristic fashion. The termination criterion is typically a number of generations without improvement; however, we applied a simple maximum-time termination criterion, as obtaining a global optimum was not deemed essential to gain biological insight. 5) The signature yielding the highest prediction AUC is considered the best predictor set. In the example case, the resultant AUC is 1.0, a perfect predictor for the sample set.
6) To assess overfitting, and hence the predictive potential of the metabolic signature, 10-fold cross-validation is performed: the data are split into 10 partitions, and in each fold signatures are trained on 90% of the data and used to predict the held-out 10%. To find signatures that have consistent predictive power, the cross-validation signatures were summed, and high-scoring metabolites were considered the conserved metabolic response signature for the side effect or indication.
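Steps 2 to 5 of panel (b) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, population size, number of elites, uniform crossover, single-point mutation, and the `repair` step that enforces the predictor limit are all assumptions chosen for illustration, and a fixed generation count stands in for the maximum-time termination criterion so that the run is reproducible.

```python
import random

def score_samples(responses, signature):
    """Score each sample as its response vector multiplied by the signature."""
    return [sum(r * s for r, s in zip(row, signature)) for row in responses]

def auc(scores, labels):
    """Rank-based AUC: fraction of (treated, untreated) pairs in which the
    treated sample scores higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def random_signature(n_vars, max_predictors, rng):
    """Random individual with 1..max_predictors nonzero entries in {-1, 1}."""
    sig = [0] * n_vars
    for i in rng.sample(range(n_vars), rng.randint(1, max_predictors)):
        sig[i] = rng.choice([-1, 1])
    return sig

def repair(sig, max_predictors, rng):
    """Hypothetical helper: zero random entries until the predictor limit holds."""
    nonzero = [i for i, v in enumerate(sig) if v != 0]
    rng.shuffle(nonzero)
    for i in nonzero[max_predictors:]:
        sig[i] = 0
    return sig

def evolve(responses, labels, max_predictors,
           pop_size=30, n_elite=2, generations=50, seed=0):
    """Toy genetic algorithm that maximizes AUC over signatures in {-1, 0, 1}."""
    rng = random.Random(seed)
    n_vars = len(responses[0])

    def fitness(sig):
        return auc(score_samples(responses, sig), labels)

    pop = [random_signature(n_vars, max_predictors, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [list(s) for s in pop[:n_elite]]        # best kept unmodified
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)      # bias towards high AUC
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n_vars)] = rng.choice([-1, 0, 1])  # mutation
            next_pop.append(repair(child, max_predictors, rng))
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

# Made-up toy data: 6 samples x 4 response variables; first 3 samples are treated.
responses = [[2, 0, 1, 0], [3, 1, 0, 0], [2.5, 0, 0, 1],
             [0, 1, 1, 0], [-1, 0, 0, 1], [0.5, 1, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]
best_signature, best_auc = evolve(responses, labels, max_predictors=2)
```

Computing the AUC directly from pairwise comparisons avoids building the ROC curve explicitly; for binary labels the two give the same value. The cross-validation in step 6 would wrap `evolve` in a loop over folds and sum the resulting signatures element-wise.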