Leave-one-out cross validation

Leave-one-out cross validation (LOOCV) was performed on the known miRNA-disease associations obtained from HMDD51 to evaluate the predictive performance of WBSMDA. For a given disease d, each known disease-related miRNA was left out in turn as the test miRNA, and the other known disease-related miRNAs were used as training miRNAs. All miRNAs without known evidence of association with disease d were taken as candidate miRNAs, and the rank of the test miRNA among these candidates was then obtained. If the test miRNA was ranked higher than a given threshold, WBSMDA was considered to have correctly predicted this miRNA-disease association. The receiver operating characteristic (ROC) curve was drawn by plotting the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1-specificity) at different thresholds. Here, sensitivity refers to the percentage of test miRNA-disease associations ranked higher than the given threshold, and specificity (also called the true negative rate) refers to the percentage of negative miRNA-disease pairs ranked below the threshold. Varying the threshold yields the corresponding TPR and FPR values, from which the ROC curve can be drawn and the area under the ROC curve (AUC) calculated to evaluate the performance of WBSMDA. An AUC of 1 indicates perfect performance, and an AUC of 0.5 indicates random performance. WBSMDA achieved a reliable AUC of 0.8031 (see Fig. 1).
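The rank-based evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `scorer` callable is a hypothetical stand-in for any model that scores every miRNA for one disease, the function names are illustrative, and resolving ties against the test miRNA is an assumption. For each left-out miRNA the sketch records its rank among the candidates, then sweeps the rank threshold to obtain TPR and FPR and integrates the ROC curve with the trapezoid rule.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical scorer interface (not the WBSMDA model itself): given the
# training miRNAs of one disease, return a score for every miRNA.
Scorer = Callable[[Set[str]], Dict[str, float]]


def loocv_ranks(known: Set[str], all_mirnas: Sequence[str],
                scorer: Scorer) -> List[Tuple[int, int]]:
    """Leave each known miRNA out in turn; return (rank, pool size) pairs."""
    results = []
    for test in sorted(known):
        train = known - {test}
        scores = scorer(train)
        # Pool = candidate miRNAs (no known association) plus the test miRNA.
        pool = [m for m in all_mirnas if m not in train]
        # Rank 1 is best; ties are resolved against the test miRNA (assumption).
        rank = 1 + sum(1 for m in pool
                       if m != test and scores[m] >= scores[test])
        results.append((rank, len(pool)))
    return results


def roc_auc(results: List[Tuple[int, int]]
            ) -> Tuple[List[Tuple[float, float]], float]:
    """Sweep the rank threshold k and integrate the ROC curve (trapezoid rule)."""
    n_neg = sum(n - 1 for _, n in results)  # negatives across all test cases
    points = [(0.0, 0.0)]
    for k in range(1, max(n for _, n in results) + 1):
        tp = sum(1 for r, _ in results if r <= k)
        # Items in the top k that are not the test miRNA are false positives.
        fp = sum(min(k, n) - (1 if r <= k else 0) for r, n in results)
        points.append((fp / n_neg, tp / len(results)))
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Toy data: fixed scores regardless of the training set (illustration only).
toy_scores = {"a": 0.9, "b": 0.3, "c": 0.5, "d": 0.2, "e": 0.1}
ranks = loocv_ranks({"a", "b"}, list(toy_scores), lambda train: toy_scores)
_, auc = roc_auc(ranks)
print(ranks, round(auc, 4))  # [(1, 4), (2, 4)] 0.8333
```

In the toy run, miRNA "a" is ranked first and "b" second among four pooled miRNAs each, which gives an AUC of 5/6. A real evaluation would pool the ranks over all diseases before sweeping the threshold.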

Figure 1 Comparison between WBSMDA and three previous methods (RLSMDA, RWRMDA and HDMP), demonstrating the superior performance of WBSMDA over previous computational models.

Comparison with other methods

We further compared WBSMDA with the following three classical methods, which have been confirmed to achieve excellent prediction accuracy based on the previous version of known miRNA-disease associations in HMDD51: 1) RLSMDA35, which predicts disease-related miRNAs based on the framework of regularized least squares; 2) RWRMDA37, which implements a random walk on the miRNA functional similarity network to predict novel miRNA-disease associations; and 3) HDMP43, which predicts potential disease-related miRNAs based on the weighted k most similar neighbors. The comparison between WBSMDA and these three methods is shown in Fig. 1 and demonstrates the superior performance of WBSMDA over previous computational models. In particular, WBSMDA significantly improved on the performance of RLSMDA, with an AUC increase of 0.11. RWRMDA and HDMP cannot be applied to diseases without any known associated miRNAs or to miRNAs without any known related diseases. Therefore, in addition to improving on the performance of these two computational models, WBSMDA effectively overcomes this important limitation.

Furthermore, we implemented 5-fold cross validation to evaluate miRNA-disease association prediction. All known miRNA-disease associations were divided into five groups of equal size; four groups were used as training samples for model learning, and the remaining group was used for model evaluation. To minimize the performance differences resulting from the sample division, the randomized division of known associations was repeated 100 times. WBSMDA obtained reliable performance, with a mean AUC of 0.8185 and a standard deviation of 0.0009.
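A minimal sketch of the repeated 5-fold procedure is given below, under stated assumptions. The `fold_auc` callable is hypothetical and stands for training a model on four groups and returning the AUC on the held-out group. Taking the AUC of one division as the mean over its five folds, and using the sample standard deviation, are assumptions of this sketch; the text does not specify either choice.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (miRNA, disease)
# Hypothetical evaluator: train on `train`, return the AUC obtained on `test`.
FoldAuc = Callable[[List[Pair], List[Pair]], float]


def k_fold_splits(pairs: Sequence[Pair], rng: random.Random,
                  k: int = 5) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Shuffle the known associations and yield (train, test) for each fold."""
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


def repeated_cv(pairs: Sequence[Pair], fold_auc: FoldAuc,
                repeats: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Return mean and sample standard deviation of the per-division AUCs."""
    rng = random.Random(seed)
    division_aucs = []
    for _ in range(repeats):
        aucs = [fold_auc(train, test)
                for train, test in k_fold_splits(pairs, rng)]
        # Assumption: one AUC per division, taken as the mean over its folds.
        division_aucs.append(statistics.mean(aucs))
    return statistics.mean(division_aucs), statistics.stdev(division_aucs)
```

Seeding the random generator makes the 100 divisions reproducible, which matters when a standard deviation as small as the one reported is being compared across runs.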

Case studies

WBSMDA was applied to predict potential miRNA-disease associations for all the diseases investigated in this paper. To further demonstrate the prediction ability of WBSMDA, case studies of Colon Neoplasms, Lymphoma and Prostate Neoplasms were carried out. The prediction results were validated against two other important miRNA-disease association databases, miR2Disease52 and dbDEMC53. It must be pointed out that only associations not recorded in the HMDD database were regarded as validation data. Therefore, the validation datasets are completely independent of the datasets used for prediction.
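The validation step can be sketched as follows. This is an illustrative outline rather than the authors' pipeline: the function name and the placeholder miRNA identifiers are hypothetical, and the databases are represented as plain sets of miRNA names for a single disease. Candidates already recorded in the prediction dataset are dropped first, so that any confirmation comes only from the independent sources.

```python
from typing import Dict, List, Sequence, Set, Tuple


def confirmed_in_top_k(ranked: Sequence[str], hmdd_known: Set[str],
                       sources: Dict[str, Set[str]],
                       k: int) -> Tuple[List[str], Dict[str, List[str]]]:
    """Check the top-k candidates of one disease against independent databases."""
    # Keep only miRNAs with no association recorded in the prediction dataset.
    candidates = [m for m in ranked if m not in hmdd_known]
    top = candidates[:k]
    # For each top candidate, list the databases that report the association.
    evidence = {m: [name for name, hits in sources.items() if m in hits]
                for m in top}
    confirmed = [m for m in top if evidence[m]]
    return confirmed, evidence


# Placeholder identifiers, not real prediction results.
ranked = ["mir-A", "mir-B", "mir-C", "mir-D"]
sources = {"miR2Disease": {"mir-B"}, "dbDEMC": {"mir-B", "mir-D", "mir-A"}}
confirmed, evidence = confirmed_in_top_k(ranked, {"mir-A"}, sources, k=2)
print(len(confirmed), evidence)
```

Here "mir-A" is excluded because it is already known, so the top two candidates are "mir-B" (confirmed by both sources) and "mir-C" (unconfirmed).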

Colon Neoplasms (CN) are a major threat to people’s lives, with a low detection rate at early stages54,55. There is an increasing need for novel sensitive biomarkers that could help improve the detection of CN56. For example, hsa-mir-145 can inhibit the growth of CN cells by targeting insulin receptor substrate-1, and hsa-mir-126 can suppress the growth of CN cells by targeting phosphatidylinositol 3-kinase signaling57,58. Taking CN as a case study, WBSMDA was used to prioritize candidate miRNAs (see Table 1 and Supplementary Table 1). As a result, nine of the top ten potential CN-related miRNAs were confirmed to be associated with CN. Furthermore, forty-five of the top fifty potential CN-associated miRNAs predicted by WBSMDA were confirmed to be associated with CN. Among the predicted CN-associated miRNAs, hsa-mir-20a (1st in the prediction list) was confirmed to be up-regulated in three or more types of solid cancers, including CN24. Studies have found that mir-18a (2nd in the prediction list) may function as a tumor suppressor by targeting K-Ras in CN59. In addition, hsa-mir-19b and hsa-mir-19a (3rd and 4th in the prediction list, respectively) were confirmed to be differentially expressed between CN and normal colorectal tissue60.

Table 1 WBSMDA was applied to Colon Neoplasms, Lymphoma and Prostate Neoplasms to identify their potential associated miRNAs. As a result, 9, 10 and 8 of the top 10 predicted miRNAs for these diseases, respectively, have been confirmed based on recent experimental literature.

Lymphoma can be divided into two main categories: Hodgkin lymphomas (HL) and non-Hodgkin lymphomas (NHL). HL is a frequently occurring lymphatic cancer, with three to four new cases per 100,000 individuals every year in the Western population. Furthermore, HL is difficult to diagnose at early stages61,62. NHL is a heterogeneous group of malignancies that originate in lymphatic hematopoietic tissue. NHL is treated mainly with chemotherapy and local radiotherapy and can be further classified into B-cell lymphomas and T-cell lymphomas63. Recent experimental studies showed that the down-regulation of mir-16, mir-101 and mir-138 in the t(14;18)-negative follicular lymphoma (FL) subset was connected to profound mRNA expression changes of potential target genes involved in cell cycle control and apoptosis64. In canine B-cell lymphomas, hsa-mir-19a showed an increased expression level compared with normal canine peripheral blood mononuclear cells (PBMC) and normal lymph nodes (LN)65. Taking lymphoma as a case study for potential miRNA-disease association prediction with WBSMDA, the top ten potential lymphoma-associated miRNAs in the prediction list were all verified based on recent experimental reports (see Table 1 and Supplementary Table 2). Furthermore, forty-two of the top fifty lymphoma-associated miRNAs predicted by WBSMDA are supported by evidence in the experimental literature. For example, the up-regulation of hsa-mir-183 (1st in the prediction list), hsa-mir-215 (2nd in the prediction list), hsa-mir-9 (3rd in the prediction list) and hsa-mir-34a (5th in the prediction list) and the down-regulation of hsa-mir-30b (4th in the prediction list) are all related to the development of lymphoma.

Prostate Neoplasms (PN) are the second leading cause of cancer-related death among men in developed countries66,67. About 29,720 patients died of PN in 2013 in the USA, and it is estimated that there will be about 220,800 new cases in 201566,67,68. The initial treatment for most patients with PN is generally effective, but PN then progresses to castration-resistant prostate cancer (CRPC), which is difficult to treat66. mir-145, which targets the proto-oncogene ERG, was deregulated in PN69. It was also reported that androgen represses the mir-99a/let7c/125b-2 cluster through the androgen receptor (AR), which can stimulate and repress gene expression to promote the initiation and progression of PN70. Taking PN as a case study for WBSMDA, eight of the top ten and forty of the top fifty predicted PN-associated miRNAs were verified based on experimental reports (see Table 1 and Supplementary Table 3). For example, the expression of hsa-mir-143 (1st in the prediction list) and hsa-mir-199a (4th in the prediction list) differs between PN and benign prostatic hyperplasia samples71. Studies also found that hsa-mir-126 (2nd in the prediction list) was one of the up-regulated miRNAs in PN with perineural invasion (FDR 10%)72. Ectopic hsa-mir-34a (4th in the prediction list) expression could induce apoptosis of PN cells and result in cell cycle arrest, growth inhibition and attenuated chemoresistance to the anticancer drug camptothecin, suggesting that hsa-mir-34a could serve as a potential choice for the treatment of p53-defective PN73.