Late-stage or post-market identification of adverse drug reactions (ADRs) is a significant public health issue and a source of major economic liability for drug development. Thus, reliable in silico screening of drug candidates for possible ADRs would be advantageous. In this work, we introduce a computational approach that predicts ADRs by combining the results of molecular docking and leverages known ADR information from DrugBank and SIDER. We employed a recently parallelized version of AutoDock Vina (VinaLC) to dock 906 small molecule drugs to a virtual panel of 409 DrugBank protein targets. L1-regularized logistic regression models were trained on the resulting docking scores of a 560 compound subset from the initial 906 compounds to predict 85 side effects, grouped into 10 ADR phenotype groups. Only 21% (87 out of 409) of the drug-protein binding features involve known targets of the drug subset, providing a significant probe of off-target effects. As a control, associations of this drug subset with the 555 annotated targets of these compounds, as reported in DrugBank, were used as features to train a separate group of models. The Vina off-target models and the DrugBank on-target models yielded comparable median area-under-the-receiver-operating-characteristic-curves (AUCs) during 10-fold cross-validation (0.60–0.69 and 0.61–0.74, respectively). Evidence was found in the PubMed literature to support several putative ADR-protein associations identified by our analysis. Among them, several associations between neoplasm-related ADRs and known tumor suppressor and tumor invasiveness marker proteins were found. A dual role for interstitial collagenase in both neoplasms and aneurysm formation was also identified. These associations all involve off-target proteins and could not have been found using available drug/on-target interaction data. This study illustrates a path forward to comprehensive ADR virtual screening that can potentially scale with increasing number of CPUs to tens of thousands of protein targets and millions of potential drug candidates.

Introduction

Adverse drug reactions (ADRs) are detrimental, rare and complex perturbations of biological pathways by pharmacologically active small molecules. Each year ADRs cause 100,000 fatalities in the US [1]. One cost estimate of drug-related morbidity and mortality is $177 billion annually [2], which is comparable to the public health burden of chronic illnesses like diabetes ($245 billion in 2012 [3]). A systematic and accurate capability for reliably ruling out severe ADRs early in the drug development process currently does not exist. As a result, billions of research and development dollars are wasted as drugs present with serious ADRs either in late stage development or post-market approval. Highly publicized examples of phase IV failures include rosiglitazone (“Avandia”) [4] and rofecoxib (“Vioxx”) [5]. Early identification of serious ADRs would be ideal.

Although many ADRs are multi-factorial and depend on patient- and treatment-specific factors (e.g. genetic polymorphisms and medical history of the patient, treatment dosages, environmental exposures, dynamics and kinetics of the relevant systems biology, etc.), all ADRs are initiated by the binding of a drug molecule to a target, whether these binding events are intended, on-target binding or promiscuous binding to one or more off-target proteins. Currently, pharmaceutical companies commonly employ experimental in vitro toxicity panels to assay small molecule binding to potentially critical protein receptors [6]. Unfortunately, these panels probably do not include all of the proteins and receptors needed for high-accuracy prediction of serious ADRs [7]. Even if it were known how to augment toxicity panels to include a minimally complete set of receptors relevant for serious ADRs, there is uncertainty about how efficiently it could be screened.

An in silico platform that could accurately predict serious ADRs prior to costly in vitro screening panels and clinical safety trials is highly desirable and has been the focus of several recent studies.

A popular approach is to data-mine the publicly available databases for experimentally elucidated interrelationships between the chemical structures of drugs, their known interactions with proteins (most often their intended targets), and their known ADR profiles. An early study by Fliri and co-workers [8] clustered drugs based on their ability to inhibit a selected set of proteins. They showed that similar inhibition profiles indicate a similar set of side effects. More recently, Cobanoglu and co-workers [9] performed probabilistic matrix factorization on a 1,413 drug×1,050 known target protein matrix to learn a latent variable correlation structure between drugs and proteins. Drugs were then clustered in this latent variable space, and it was found that drugs with similar therapeutic actions clustered together, independent of similarities in chemical structure. A highly cited effort by Campillos et al. [10] indicated that drugs with similar side effects have a correspondingly similar profile of protein targets. Another series of studies applied statistical machine learning approaches like support vector machines and sparse canonical correlation analysis (SCCA) to publicly available datasets to train models for ADR prediction. Pauwels et al. [11] used SCCA to relate PubChem [12] chemical substructure fingerprints of 888 approved drugs to 1385 side effects in SIDER. Yamanishi and co-workers [13] used a similar approach to integrate drug-protein target data found in DrugBank and Matador with PubChem fingerprints to predict 969 SIDER side effects, applying both SCCA and a kernel regression method. They used the models to predict side effects in 730 previously uncharacterized small molecules in DrugBank, where side-effect information was not available in SIDER. Finally, Liu et al. [14] found that adding phenotypic data on the drug (i.e. the presence or absence of side effects, excluding the one being predicted) to a similar feature representation to that considered in [13] greatly enhances prediction of the ADR of interest, obtaining AUCs>0.9. However, since their approach relies on health outcomes data on the drug compound, the method is unsuitable for ADR prediction in the early-stage development of nascent drug compounds, prior to in vitro studies or clinical trials. In all of the cases listed above, only global quality-of-performance metrics, aggregated across all considered side effects, are reported, making it difficult to assess how the models performed on individual side effects or classes of side effects.

There is another group of studies that more fully exploit the network structure of drug, protein, and ADR entity relationships. A network-oriented approach by Cami [15] analyzed a dataset consisting of 809 drug feature vectors (consisting of drug features from DrugBank and PubChem) and proprietary data on the drug side effect profiles. A unique aspect of the dataset is that the time ordering of when specific side effects appeared is reported. Starting with side effect profiles on the drugs from 2005, they trained a logistic regression model that could predict the side effects that manifested between 2006–2010, preserving the temporal order of how they manifest. The preservation of the time-ordering of the side effect appearance is appealing, but it is unclear how their approach would generalize to a different dataset. Mizutani [16] applied SCCA to find relationships between the drug-protein interaction network of 658 drugs from DrugBank and 1368 proteins extracted from DrugBank and Matador [17] databases to 1339 side effects associations as found in SIDER [18]. They found significant enrichment in most of the correlated protein-side effect sets for proteins involved in the same KEGG [19] and Gene Ontology biological pathways [20]. Similarly, Kuhn [21] constructed an explicit network to predict and characterize proteins that cause side effects by drawing statistical inferences between drug-target and drug-ADR links. Their method is able to reveal causal relationships between targets and ADRs but is highly sensitive to outliers. For instance, there was insufficient statistical power to associate side effects to proteins that were an off-target of only a small number of drugs.

Indeed, the main weakness of these QSAR-like studies is their reliance on what is present in experimental data, which will tend to feature a strong bias towards approved drugs (i.e. little representation of serious ADRs) and on-target or intended effects. It is difficult to see how analysis of drug-intended target binding data could be applied to explore correlations between off-target drug-protein binding and possibly rare ADRs.

Recently, systems biology approaches have been used to predict ADRs by viewing ADRs as perturbations of biological pathways. These approaches seek to transcend the “one drug-one target” paradigm used in traditional drug design, which ignores system-wide effects that cause a drug to have unforeseen pharmacological effects [22]. Scheiber et al. [23] integrated several chemical and biological databases by comparing perturbed and unperturbed pathways in a set of compounds that have a common toxicity phenotype. They use this analysis to link pathways with particular ADRs. Huang and co-workers [24] combined clinical observation data with drug-target data and the gene ontology (GO) annotations of the target proteins to predict ADRs. They find a significant improvement in the quality of their models by incorporating features from the protein-protein interaction (PPI) network of the targets. Similarly, Huang et al. [25] increased the median AUCs of their support vector machine models from 0.591 to 0.700 by adding both PPI network and small molecule structural features to their feature set.

In all of these cited cases, the efforts to solve the ADR prediction problem have focused on integrating publicly available and (in some cases proprietary) biological (e.g. physical and chemical small molecule properties, drug-protein associations, protein-protein interaction networks, biological pathway and gene annotations, etc.) and epidemiological data on side effect-related health outcomes (e.g. FDA package label data, clinical trial data) to train statistical models to predict ADRs with various degrees for success.

A key drawback of using experimental data is that the type and quality of data that exists is influenced as much by the financial limitations of experimental drug development as by the relevant biological science. The drug-protein associations aggregated from DrugBank and Matador can be represented as a Boolean matrix where ‘1’s (‘0’s) would indicate the presence (absence) of an association. This matrix has been used for some of the previous efforts, as noted above, and is highly sparse with ‘0’s indicating both negative results of assays and unperformed assays. ADR-protein associations derived from these data limit us to patterns in known, intended “on-target” associations and limit the ability to find novel off-target associations. Also, data on lead compounds that have failed in the development pipeline are typically regarded as proprietary information and are generally unavailable for inclusion in analysis. Clearly, the majority of publicly available data is biased in ways which are difficult to correct.

An alternative approach is to leverage ever-growing databases of high-resolution, experimentally-solved, protein structures, such as the Protein Data Bank (PDB) [26], and use molecular modeling to infer putative off-target interactions of drugs with known ADRs. Technical advances in drug-protein binding modeling, protein sequencing, and homology modeling allow high-throughput virtual screening early in the drug discovery process. Vast libraries of small molecules can be docked to a large array of protein structures in order to simultaneously predict putative drug targets and ancillary off-target binding interactions that may have associations to serious ADRs. Yang et al. [27] used virtual docking to propose possible interactions between a set of 845 proteins and a set of 162 drugs that induced at least one of four ADRs. Lounkine et al. [28] predicted the activity of 656 marketed drugs on 73 targets from the Novartis in vitro safety panel using the similarity ensemble approach (SEA). This was not a true docking study per se, in that SEA calculates the chemical similarity of each drug with each of the native ligands of the 73 targets.

Two previous efforts, in particular, are similar to our current study. First, Wallach and co-workers [29] applied multiple stages of logistic regression to docking scores involving 730 drugs, 830 human protein targets and then applied multiple stages of logistic regression to these data and data on 506 ADRs, producing 32 ADR-pathway associations supported by the scientific literature (i.e. PubMed). Second, Xie et al. [30] developed a methodology that identified 3D protein structures in the PDB that had similar ligand binding sites to those of the primary targets of Cholesteryl Ester Transferase Protein (CETP) inhibitors. Subsequently, they applied molecular docking to help rank order the atomic-level interactions of the drugs with the putative off-targets. This analysis led to 204 structures with binding sites similar to CETP. This set of off-targets was then integrated into a network that included multiple metabolic signal transduction and gene regulation pathway constituents, drugs, and clinical outcomes. From this network, they were able to elucidate several ADRs known to be associated with the CETP inhibitors: the negative effect of Torcetrapib on blood pressure observed in Phase III clinical trials and the increased death rates from infection and cancer.

These studies used the “first principles” approach to circumvent the bias issues in experimental data outlined above, but none of these previous efforts describe computational frameworks scalable to the data sizes required for a high-accuracy, high-throughput ADR screening panel for nascent compounds.

More recently, Reardon [31] reported on a computational effort that uses publicly available profiles of 600,000 chemical compounds and assesses their ability to bind to ∼7000 chemical pockets on 570 human proteins. The known expression profiles of the proteins and receptors on human organs is then used to predict where in the body a given drug will most likely have effects. While these efforts certainly operate at the necessary scale, they do not report a method to statistically associate the docking scores with ADR phenotypes, which is precisely the goal of our work here.

Our working hypothesis is that it is valuable to predict ADRs as early in the lead identification phase as possible. Structure-based, high throughput, virtual screening is already widely applied in the early stages of drug discovery because of its low cost and high efficiency in identifying putative drug-candidate/drug-target interactions. Molecular docking-based screening studies involve fitting a large library of N small molecules into the active sites of M target protein structures, to calculate estimates of binding affinities. M and N can be quite large. Currently, the PDB has M>90 K protein structures, increasing at a rate of over 7500 per year [26]. The combinatorics of the possible chemical structural space occupied by small molecules is immense, i.e. N ≈ 1060 possible drug compounds [32].

These numbers, combined with the complexities of conformational sampling to find the best fit of the small molecule (i.e. “pose”) in the target, and the computational cost of the scoring function itself, make high-throughput ADR screening ideal for high-performance computing.

Zhang et al. [33] implemented a mixed parallel scheme using Message Passing Interface (MPI) and multithreading in a parallel molecular docking program, called VinaLC, by modifying the existing AutoDock Vina molecular docking program. One million flexible docking calculations took about 1.4 hours to finish on ∼15 K CPUs. The docking accuracy of VinaLC has been validated against the DUD (Directory of Useful Decoys) database [34] by the re-docking of X-ray ligands and an enrichment study. The statistical results presented in their study [33] show VinaLC is one of the better performing docking codes on the DUD set of decoys/ligands, having a mean receiver operator characteristic area-under the curve (ROC AUC) of 0.64 (95th CI: 0.60-0.68). VinaLC identified 64.4% of the top scoring poses with an RMSD under the 2.0 Å cutoff, while that for the best poses is 70.0%. For the best poses, all the targets have RMSD values within 10 Å and about half of the targets have RMSD values less than 1 Å. Overall, the VinaLC docking program performed well for re-docking the X-ray ligands back into the active site of the X-ray structures with the default setting for the grid sizes and exhaustiveness = 8. To improve the enrichment of the docking results, Zhang et al. [35] have also developed a massively parallel virtual screening pipeline using Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) rescoring and have shown improvements in the docking benchmark AUC to 0.71, on average. Overall, the results demonstrate that MM/GBSA rescoring has higher AUC values and consistently better early recovery of actives than VinaLC docking alone.

A significant fraction of these molecules (e.g. drugs approved by regulatory agencies like the U.S. Food and Drug Administration) are annotated with known associated ADRs in public databases, such as SIDER. As in the prior work we cited, machine learning methods can identify statistical associations between these ADR outcomes and patterns in drug-protein binding as revealed by our VinaLC docking scores. The results can be used to build predictive models so the probabilities of certain ADRs can be predicted for a nascent or theoretical small molecule drug candidate that may not have undergone in vitro or clinical trial testing.

This study potentially provides a technological and methodological path forward to large-scale, high-throughput, in silico, comprehensive ADR screening. Our results indicate that molecular docking performed with sufficiently detailed docking models on high performance computers (HPC) may provide reliable, cost-effective, comprehensive high-throughput screening of a drug candidate for binding across many known on- and off-targets to predict clinically important ADRs.