A graphical overview of DIPULSE can be found in Fig. 1. The individual steps of the pipeline corresponding to each panel of the figure are described in detail below. Briefly, we used AE reporting frequencies for individual drugs to identify an AE fingerprint for increased risk of TdP. We then applied this model to a test data set of AE reporting frequencies for drug pairs. We filtered for high-confidence predictions and validated these putative QT-DDIs in the EHR by comparing the QTc (heart rate-corrected QT) intervals of patients prescribed the flagged drug pair with those of patients prescribed either drug alone. Finally, we performed a confounder analysis to remove any associations that could be explained by co-prescribed medications, and generated a final candidate list of novel QT-DDIs.

Fig. 1 Overview of the DIPULSE pipeline, which combines mining of FAERS and EHRs to flag novel QT-prolonging DDIs. FAERS: We generate an AE reporting frequency table (dimensions, N drugs by M AEs) for single drugs in FAERS. The value at a row and column represents the fraction of reports for drug i containing AE k (F_ik). We label a drug as a positive example (shown in red) if it has a known risk of TdP (obtained from http://www.CredibleMeds.org). All drugs not found in CredibleMeds are labeled as negative examples (shown in green). We use machine learning to generate an AE fingerprint model that identifies the most predictive subset of features (AE reporting frequencies, F_ik) as latent evidence for predicting whether a drug does or does not prolong the QT interval (gray boxes). We then apply this fingerprint model to an independent test data set consisting of a matrix (with AE reporting frequencies F_ijk) for drug pairs. We send pairs receiving high classifier probabilities (but where neither individual drug is known to prolong the QT interval) for EHR validation (in this case pairs (D_{N−1}, D_{N−2}) [purple-blue] and (D_{N−1}, D_N) [purple-orange]). EHR: We validate putative interactions using electrocardiogram laboratory results in the EHRs by determining whether patients prescribed a predicted interacting drug pair have increased QTc intervals compared with patients taking either drug alone. In this example, patients prescribed the drug pair (D_{N−1}, D_{N−2}) have a significantly increased QT interval compared with patients on either drug alone. This is not observed for drug pair (D_{N−1}, D_N), so it is filtered out. Finally, we perform a confounder analysis to confirm that the significant increase observed in QTc interval is not due to other co-prescribed medications.
DIPULSE Drug Interaction Prediction Using Latent Signals and EHRs, EHRs electronic health records, FAERS FDA Adverse Event Reporting System, DDIs drug–drug interactions, AE adverse event, TdP torsades de pointes, QTc heart rate-corrected QT interval

In developing the pipeline, our rationale was to prioritize high precision over high recall to obtain a final list of high-confidence interactions; therefore, the choices we made in designing the filtering steps described below reflect this conservative approach. We implemented the method using Python 2.7.9 and R 3.1.0.

Primary Data Sources

We downloaded a snapshot of the FAERS database containing 1,851,171 reports (corresponding to the first quarter of 2004 to the first quarter of 2009). Each report in FAERS contains the drugs prescribed to the patient, the drug indications, and the observed AEs. We included suspected, interacting, and concomitant drugs on the reports.

As positive controls, we downloaded a list of 180 drugs with known (n = 47), possible (n = 75), conditional (n = 31), or congenital (n = 27) risk of TdP from CredibleMeds, an online compendium of drugs associated with LQTS [8]. We also obtained a list of 2856 critical and significant DDIs from the Veteran Affairs Hospital [24].

To validate our DDI predictions, we used EHR data from Columbia University Medical Center (CUMC). In addition to patient demographics, drugs prescribed, and diagnosis codes, we also used QTc (heart rate-corrected QT interval) values obtained from ECG laboratory results. The study was approved by the CUMC Institutional Review Board.

Generating Adverse Event (AE) Reporting Frequency Tables

We pre-processed the reports from FAERS to generate the intermediate AE reporting frequency tables in the Offsides (single drug) and Twosides (drug pair) databases [25]. Offsides and Twosides were created by training propensity score matching models to match patients exposed to a single drug or drug pair to unexposed controls on the basis of co-prescribed medications and drug indications; an advantage of this approach is that only patients for whom controls could be matched are used for drug safety prediction [25].

An intermediate step in this process is the assembly of AE reporting frequency tables for both single drugs and drug pairs, as seen in Fig. 1, with each row representing a drug (or drug pair) and each column representing one of the AEs in FAERS. For single drugs, the value at a given row and column represents the reporting frequency F_ik, defined as the fraction of reports for drug i containing the AE k. Similarly, for drug pairs, the reporting frequency F_ijk corresponds to the fraction of reports for drug pair (i, j) containing the AE k. We used the former matrix to train the fingerprint model, and the latter for DDI prediction.
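The definition of the single-drug reporting frequency can be illustrated with a minimal Python 3 sketch. It is not the code used in the study: the `reports` structure and the function name are hypothetical, and the sketch omits the propensity score matching step described above.

```python
from collections import defaultdict

def reporting_frequencies(reports):
    """Return {drug: {ae: fraction of that drug's reports mentioning ae}}.

    `reports` is a hypothetical list of (drugs, adverse_events) tuples,
    one per report; it is not the actual FAERS schema.
    """
    n_reports = defaultdict(int)                        # reports per drug i
    n_with_ae = defaultdict(lambda: defaultdict(int))   # reports per (i, k)
    for drugs, events in reports:
        for drug in set(drugs):
            n_reports[drug] += 1
            for ae in set(events):
                n_with_ae[drug][ae] += 1
    return {
        drug: {ae: count / total for ae, count in n_with_ae[drug].items()}
        for drug, total in n_reports.items()
    }

# Toy reports, invented for illustration.
reports = [
    (["drugA"], ["nausea", "syncope"]),
    (["drugA", "drugB"], ["nausea"]),
    (["drugB"], ["rash"]),
    (["drugA"], ["dizziness"]),
]
F = reporting_frequencies(reports)
print(F["drugA"]["nausea"])   # 2 of the 3 drugA reports mention nausea
```

The drug-pair table follows the same pattern if each report is keyed by pairs of co-reported drugs instead of single drugs.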

Training AE Fingerprint Model

We used the AE reporting frequencies (F_ik) in the frequency table for single drugs as features to train a logistic regression classifier. The binary classifier models the log odds of a drug prolonging the QT interval as a linear combination of the AE reporting frequencies in the model, each multiplied by a weight (known as a β coefficient); a drug whose predicted probability exceeds the chosen threshold is classified as increasing the risk of TdP, and a drug below the threshold is classified as safe. Training the model requires both positive and negative examples. As positive examples, we used the subset of the 47 drugs with a known risk of TdP in CredibleMeds that were also in FAERS (n = 23). As negative examples, we selected all drugs in FAERS that did not appear in CredibleMeds (i.e. drugs with no known, possible, conditional, or congenital risk of TdP; n = 530).
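The scoring rule of such a classifier can be sketched as follows. The weights, intercept, and AE names are invented for illustration and are not fitted coefficients from the study; the 0.5 default threshold is likewise an assumption.

```python
import math

def tdp_probability(frequencies, weights, intercept):
    """Logistic model: the log odds are a weighted sum of AE reporting frequencies."""
    log_odds = intercept + sum(
        weights[ae] * frequencies.get(ae, 0.0) for ae in weights
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(frequencies, weights, intercept, threshold=0.5):
    """Flag a drug when its predicted probability reaches the threshold."""
    return tdp_probability(frequencies, weights, intercept) >= threshold

# Hypothetical coefficients and reporting frequencies, for illustration only.
weights = {"syncope": 4.0, "rash": -1.0}
intercept = -2.0
drug = {"syncope": 0.5, "nausea": 0.3}   # AEs absent from the model are ignored

p = tdp_probability(drug, weights, intercept)
print(p, classify(drug, weights, intercept))
```

Raising the threshold trades recall for precision, which matches the conservative design goal stated earlier.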

Because the number of features (11,305 AEs) is much greater than the number of examples (553 drugs), overfitting of the model to the training data is a concern. To ensure the model generalized to our test data set (drug pairs), we reduced the number of features by using L1 (lasso) regularization [26]. Unlike L2 (ridge) regularization, which penalizes the squares of the feature weights, L1 regularization penalizes their absolute values; we preferred it because it yields sparse models (i.e. most of the feature weights are driven to zero). We generated five models, each containing between 5 and 20 features, by varying the regularization strength. We evaluated these models using 10-fold cross-validation, and then re-fit the classifier using only the selected features. The features for each of these models constitute an AE fingerprint that represents latent evidence for QT interval prolongation.
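To illustrate why an L1 penalty produces exact zeros, the following sketch fits an L1-penalized logistic regression by proximal gradient descent on toy data. This is a swapped-in, dependency-free technique chosen for clarity; the study does not state which solver was used, and the data, step size, and penalty values here are assumptions.

```python
import math

def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(X, y, penalty, step=0.5, n_iter=1000):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes mean log-loss + penalty * sum(|w_j|); the intercept b is unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.3], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w_weak, _ = fit_l1_logistic(X, y, penalty=0.1)    # keeps only feature 0
w_strong, _ = fit_l1_logistic(X, y, penalty=0.2)  # removes every feature
print(w_weak, w_strong)
```

Sweeping the penalty in this way changes how many features survive, which is the mechanism behind generating several models of different sizes.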

As a control, we generated a logistic regression model built solely using direct evidence of QT interval prolongation (standardized Medical Dictionary for Regulatory Activities [MedDRA] query for ‘Torsade de Pointes/QT prolongation’). There were only six AEs corresponding to QT interval prolongation or TdP (electronic supplementary Table 1), and therefore feature selection was not necessary.

Predicting Novel Drug–Drug Interactions (DDIs) Using the Fingerprint Model

We next applied the QT fingerprint model to an independent test data set consisting of the AE reporting frequencies (F_ijk) in the frequency table for drug pairs. The model outputs a probability for a given drug pair to prolong the QT interval. We assessed model performance using two references. In the first, we labeled each drug pair containing a drug known to increase the risk of TdP as a positive example. While these may not be bona fide DDIs, they demonstrate the ability of the fingerprint model to ‘re-discover’ drugs known to prolong the QT interval within the drug pair data. We used this validation to select the optimal fingerprint model. We also performed an additional validation using a list of critical and significant DDIs from the Veteran Affairs Hospital. For both of these evaluations, we compared the performance of the ‘latent’ AE fingerprint model with the ‘direct evidence’ control model using DeLong’s test [27].
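DeLong's test compares areas under receiver operating characteristic curves (AUCs), so the quantity being compared for the two models can be sketched as below. The sketch computes the AUC only and does not implement DeLong's variance estimate or test statistic; the scores and labels are invented.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier probabilities for five drug pairs (1 = positive example).
labels = [1, 0, 1, 0, 0]
latent_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
direct_scores = [0.6, 0.7, 0.2, 0.2, 0.1]

auc_latent = roc_auc(latent_scores, labels)
auc_direct = roc_auc(direct_scores, labels)
print(auc_latent, auc_direct)
```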

To obtain a candidate list of novel DDIs predicted by the fingerprint model, we first removed all drug pairs containing a drug in the CredibleMeds list. We then retained all novel predictions whose classifier probability corresponded to a false positive rate below 4% according to the CredibleMeds evaluation. We chose this false positive rate threshold by modeling the expected increase in false discovery rate as a function of false positive rate (see electronic supplementary Fig. 1 and accompanying legend for a description of the analysis). Finally, to remove drug pairs that would receive high classifier scores regardless of the features used in the model, we generated 100 logistic regression models using randomly chosen features and estimated an empirical p value for each drug pair. We removed any drug pair receiving an empirical p value ≥ 0.01.
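The random-feature filter can be sketched as follows. The text does not give the exact formula for the empirical p value, so this sketch assumes it is the fraction of random-feature models that score a pair at least as high as the fingerprint model does; the function names and scores are hypothetical.

```python
def empirical_p_value(observed_score, random_scores):
    """Fraction of random-feature models scoring the pair at least as high (assumed definition)."""
    hits = sum(score >= observed_score for score in random_scores)
    return hits / len(random_scores)

def keep_pair(observed_score, random_scores, alpha=0.01):
    """Keep a pair only when its empirical p value is below alpha."""
    return empirical_p_value(observed_score, random_scores) < alpha

# Hypothetical scores from 100 random-feature models for one drug pair.
random_scores = [i / 100 for i in range(100)]   # 0.00, 0.01, ..., 0.99

print(keep_pair(0.995, random_scores))  # no random model scores this high
print(keep_pair(0.50, random_scores))   # half of the random models do
```

With 100 random models, a pair survives under this definition only if no random model matches its fingerprint score, since a single match already gives p = 0.01.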

Validating Novel DDIs Using Electronic Health Records

While the novel DDIs predicted using our signal detection algorithm each contain latent evidence for prolonging the QT interval, ECG values in EHRs allow us to retrospectively evaluate the effect of these drug pairs (our cases) on QT interval duration compared with either drug alone (our controls). Because QT interval durations differ between males and females [28], we evaluated the effects of a given drug pair on each sex separately.

To obtain cases, we selected patients at New York-Presbyterian Hospital/Columbia University Medical Center who were prescribed each drug in a given drug pair within a 7-day period. Patients were also required to have an ECG laboratory result, and a corresponding QTc value, within 36 days of the second drug prescription. We chose this limit to minimize the potential for new confounding drug prescriptions or interventions; additionally, because follow-up visits are often scheduled in units of weeks, we allowed for 5 weeks plus 1 day for laboratory tests to be performed [22]. For patients with multiple QTc values within this time period, we used the maximum value.
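The case definition can be sketched for a single patient as below. Two interpretive assumptions are made that the text does not spell out: "within a 7-day period" is read as prescriptions at most 7 days apart, and the ECG must fall on or after the second prescription. The function name and record layout are hypothetical, not the CUMC schema.

```python
from datetime import date, timedelta

def case_qtc(rx_dates_a, rx_dates_b, ecgs, rx_window_days=7, ecg_window_days=36):
    """Return the maximum qualifying QTc for one patient, or None if not a case.

    rx_dates_a / rx_dates_b: prescription dates for each drug in the pair.
    ecgs: list of (date, qtc_ms) tuples.
    """
    qualifying = []
    for day_a in rx_dates_a:
        for day_b in rx_dates_b:
            if abs((day_a - day_b).days) > rx_window_days:
                continue   # the two drugs were not co-prescribed closely enough
            second_rx = max(day_a, day_b)
            window_end = second_rx + timedelta(days=ecg_window_days)
            qualifying += [qtc for when, qtc in ecgs
                           if second_rx <= when <= window_end]
    return max(qualifying) if qualifying else None

# Toy patient: second drug prescribed on 5 January, so the window ends 10 February.
ecgs = [(date(2010, 1, 3), 480), (date(2010, 1, 20), 455),
        (date(2010, 2, 9), 470), (date(2010, 3, 1), 500)]
qtc_case = case_qtc([date(2010, 1, 1)], [date(2010, 1, 5)], ecgs)
qtc_not_case = case_qtc([date(2010, 1, 1)], [date(2010, 2, 1)], ecgs)
print(qtc_case, qtc_not_case)
```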

To obtain controls, we selected patients taking whichever individual drug in the pair yielded the greatest median QTc within a 36-day period from drug prescription; we call this drug the ‘control’ drug. We then compared QTc values between cases and controls and assessed significance using a Mann–Whitney U test, correcting for multiple hypothesis testing using Bonferroni’s method.
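The case-control comparison can be sketched with a dependency-free Mann–Whitney U test and a Bonferroni cutoff. Several details are assumptions, not statements about the study: the test is taken as one-sided (cases greater than controls), the p value uses a normal approximation without continuity or tie correction, and the 0.05 family-wise level is a placeholder. The QTc values are invented.

```python
import math

def mann_whitney_greater(cases, controls):
    """One-sided Mann-Whitney U test (cases > controls), normal approximation.

    Returns (U, p). Ties get half credit in U; the variance has no tie
    correction, so p is only approximate when many values are tied.
    """
    n1, n2 = len(cases), len(controls)
    u = sum((a > b) + 0.5 * (a == b) for a in cases for b in controls)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail normal probability
    return u, p

def bonferroni_significant(p_values, alpha=0.05):
    """Flag p values below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

# Hypothetical QTc values in milliseconds.
cases = [470, 480, 465, 490, 475]
controls = [440, 450, 445, 455, 460, 435]
u_stat, p_value = mann_whitney_greater(cases, controls)
flags = bonferroni_significant([0.001, 0.02, 0.04])
print(u_stat, p_value, flags)
```

For small groups an exact test from a statistics library would be preferable to the normal approximation used here.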

To demonstrate that the predictions being sent for EHR validation were enriched for drug interactions that actually prolonged the QT interval, we ran the above EHR case-control analysis on a set of drug pairs equal in number to that generated by the latent signal detection but randomly chosen from the frequency table for drug pairs. To generate a more representative comparison, we required that each pair consist of a randomly chosen drug paired with a ‘control’ drug (i.e. the drug with the greatest QTc interval alone from the latent evidence pairs). Additionally, to ensure equivalent statistical power, we matched the number of patients in the case groups of the randomly chosen pairs to the case group sizes of the pairs prioritized by the latent signal detection. We counted the number of random pairs that had significant increases in QT interval, and repeated this sampling procedure 1000 times to build an empirical distribution of how many significant results would be expected after EHR analysis by chance alone.
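The resampling loop can be sketched as follows. The `is_significant` callback is a hypothetical stand-in for the full EHR case-control analysis, and the sketch omits the matching of case group sizes and the details of how control drugs are paired; the candidate pairs are invented.

```python
import random

def null_significant_counts(candidate_pairs, n_pairs, is_significant,
                            n_resamples=1000, seed=0):
    """Count significant pairs in each of n_resamples random draws of n_pairs pairs."""
    rng = random.Random(seed)   # seeded so the sketch is reproducible
    counts = []
    for _ in range(n_resamples):
        sample = rng.sample(candidate_pairs, n_pairs)
        counts.append(sum(is_significant(pair) for pair in sample))
    return counts

# Hypothetical pool: one 'control' drug paired with 50 randomly chosen partners,
# of which two would pass the (stand-in) EHR significance test.
candidate_pairs = [("control_drug", "drug%d" % i) for i in range(50)]
truly_significant = {"drug0", "drug1"}

counts = null_significant_counts(
    candidate_pairs, n_pairs=10,
    is_significant=lambda pair: pair[1] in truly_significant,
)
print(min(counts), max(counts))
```

The number of significant pairs observed among the latent-signal predictions can then be compared against this distribution of counts.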

Finally, we adjusted for confounders by confirming that the elevated QTc interval on the drug pair was not due to other co-prescribed medications. For each of our sets of cases (patients on a given drug pair) and controls (patients on an individual drug in the pair), we identified possible confounder drugs by counting the number of exposures to each drug prescribed up to 36 days prior. We evaluated each potential confounder by confirming that it was correlated both with the exposure condition and with QTc values. For the former, we determined whether the covariate was more likely to be prescribed with the drug pair than with the single drug using Fisher’s exact test; for the latter, we compared the QTc values for patients exposed to the covariate versus those unexposed using a Mann–Whitney U test. Both of these evaluations were performed using a Bonferroni correction for multiple hypothesis testing. We collected all drug covariates that passed these two requirements and assessed their significance (for males and females separately) using an analysis of covariance (ANCOVA). To obtain the final list of validated novel DDIs, we kept only those results (drug pairs for a given sex) receiving significant ANCOVA p values (p < 0.05) for the DDI.
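The first screening criterion can be sketched with a one-sided Fisher's exact test on a 2x2 table (rows: drug pair vs. single drug; columns: exposed vs. unexposed to the covariate). The one-sided direction, the 0.05 level, the function names, and the counts are assumptions made for illustration; the ANCOVA step is not shown.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Tests whether the top-left cell is larger than expected by chance,
    summing hypergeometric probabilities for tables at least as extreme.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, row1)

def is_candidate_confounder(p_exposure, p_qtc, n_tests, alpha=0.05):
    """A covariate must pass both Bonferroni-corrected tests to be kept."""
    cutoff = alpha / n_tests
    return p_exposure < cutoff and p_qtc < cutoff

# Hypothetical counts: 12 of 20 drug-pair patients and 5 of 40 single-drug
# patients were also exposed to the covariate.
p_exposure = fisher_exact_greater(12, 8, 5, 35)
print(p_exposure)
print(is_candidate_confounder(p_exposure, p_qtc=0.0004, n_tests=10))
```

Covariates that pass both tests would then enter the ANCOVA as adjustment terms.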