Preparation of MicroRNA dataset

First of all, we prepare the list of miRNAs whose function we attempt to characterize (Figure1). For the whole human miRNAome, we could retrieve the complete list from miRBase Release 19 (http://www.mirbase.org), as described previously[17]. For the selection of focused miRNAome, we could download microRNA expression profiling datasets from Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo). They are derived from experimental data performed on microarray, quantitative RT-PCR (qPCR), and high-throughput sequencing. In the next step, we extract a set of differentially expressed miRNAs (DEMs), either upregulated or downregulated among distinct samples and/or different experimental conditions, following statistical evaluation with Bioconductor on R statistical package (http://www.r-project.org), and so on.

Figure 1 The workflow of molecular network analysis of microRNA targetome. First, differentially expressed miRNAs (DEMs) among distinct samples and experimental conditions are extracted from microRNA expression profiling datasets based on microarray, qPCR, and next-generation sequencing (NGS) experiments by the standard statistical evaluation. Next, predicted targets and/or validated targets for DEMs are obtained by using target prediction programs, such as TargetScan, PicTar, MicroCosm and Diana-microT 3.0, or by searching them on databases of experimentally validated targets, such as miRTarBase, miRWalk, and miRecords. The expression of DEM targets in the cells and tissues examined is verified by searching them on UniGene, BioGPS, and HPRD. Molecular networks and pathways relevant to DEM targets are identified by using pathway analysis tools, such as KEGG, IPA, and KeyMolnet. Finally, the functionally inverse relationship between miRNAome and targetome is validated by loss-of-function or gain-of-function experiments in an in vitro and/or in vivo model. Full size image

MicroRNA target prediction

In general, miRNAs could form an energetically stable Watson-Crick base pair with target mRNAs[2]. In most occasions, the seed sequence located at positions 2 to 8 from the 5′ end of the miRNA serves as an essential scaffold for recognizing the target mRNA in the condition of a perfect seed match with miRNA recognition element (MRE) sequences of mRNA. Target sites often avoid the sequences immediately after the stop codon, which have the possibility of falling into the ribosome shadow[19]. The thermodynamic rule and the evolutional conservation of MRE sequences make it possible to fairly accurately predict miRNA target mRNAs by computational approaches[2]. Open source miRNA target prediction programs, including TargetScan version 6.2 (http://www.targetscan.org), PicTar (pictar.mdc-berlin.de), MicroCosm version 5 (http://www.ebi.ac.uk/enright-srv/microcosm), miRanda (http://www.microrna.org), and Diana-microT version 3.0 (diana.cslab.ece.ntua.gr/microT), are mostly armed with unique algorithms that survey MRE sequences in the 3′UTR of target mRNAs. As a result, the predicted targets vary greatly among the distinct programs utilized[20]. Increasing evidence suggests that MRE sequences are located occasionally in the 5′UTR or coding sequences (CDS)[21, 22], both of which are ignored by the conventional prediction programs. Furthermore, predicted targets are usually cell- and tissue-type non-specific. These drawbacks confer a substantial risk for detecting numerous false positive and negative ones. The integration of the results from several prediction programs, along with examination of tissue-specific interactions, might provide an advantage for reducing unreliable targets to some extent[23, 24].

Recently, several databases of experimentally validated miRNA targets are established to overcome the unreliability of target prediction (Figure1). The miRecords database (mirecords.biolead.org) includes 2,286 records of experimentally validated interactions between 548 miRNAs and 1,579 target genes derived from 9 species extracted after thorough literature curation, accompanied with the storage of predicted targets collected from datasets of 11 established miRNA target prediction programs[25]. The miRTarBase (mirtarbase.mbc.nctu.edu.tw) represents the collection of 4,270 manually curated miRNA-target interactions validated experimentally between 669 miRNAs and 2,533 target genes among 14 species[26]. It is followed by a clear description of experimental methods for target validation on each interaction, such as luciferase reporter assay, western blot, quantitative RT-PCR, and microarray experiments. The miRWalk database (http://www.umm.uni-heidelberg.de/apps/zmf/mirwalk) contains both predicted and validated information on miRNA-target interactions focused on 449 human biological pathways and 2,356 disorders of Online Mendelian Inheritance in Man (OMIM)[27]. Predicted targets are originated based on its own algorithm that covers MRE sequences located both inside and outside the 3′UTR of target mRNAs, and are also collected from datasets of 8 established miRNA target prediction programs. Validated targets are identified by an automated text-mining search on PubMed to extract experimentally validated miRNA-target interactions, including those involved in miRNA processing, followed by PubMed article identifiers (PMID).

In silico validation of tissue-specific expression of MicroRNA target mRNAs

Although experimentally validated targets represent a source of reliable candidates, it is worthless when they are not expressed in the cells and tissues examined. Most simply, we could verify the expression of target mRNAs in specified tissues and cells by analyzing them on UniGene (http://www.ncbi.nlm.nih.gov/unigene), an organized view of the transcriptome that evaluates semi-quantitatively the expression sequence tag (EST) calculated as the number of transcripts per million (TPM) (Figure1)[18]. We could investigate mRNA expression levels based on microarray data in specified tissues and cells by searching them on a gene annotation resource named BioGPS (biogps.org)[28]. Similarly, H-Invitational Database (H-InvDB) (http://www.h-invitational.jp) includes the Human Anatomic Gene Expression Library (H-ANGEL) that provides gene expression data from microarray experiments and EST profiles determined on a panel of normal adult human tissues[29]. Human Protein Reference Database (HPRD) (http://www.hprd.org) linked to NCBI Entrez is also useful to identify the tissue-specific expression of proteins and their subcellular location.

Genome-wide analysis of MicroRNA target mRNAs

The simultaneous assessment of miRNA and mRNA expression profiles provides a rational approach to identify a set of miRNAs whose expression levels are negatively correlated with the levels of their target mRNAs[30–32]. However, it is often difficult to determine the optimum experimental time required for miRNA-induced degradation of target mRNAs, because time lags exist in expression changes between miRNAs and target mRNAs. Time course-dependent profiles of miRNA-mRNA expression make it possible to more exactly identify the inverse relationship between relevant miRNAs and mRNAs[33]. However, the interaction of a miRNA with a target mRNA does not always cause mRNA degradation. Instead, it often leads to reduction in protein expression levels by translational repression.

Recently, the methods of quantitative proteomics are established to overcome the difficulties attributable to the dissociation of miRNA and mRNA dynamics. They include stable isotope labeling with amino acids in culture (SILAC), isobaric tag for relative and absolute quantitation (iTRAQ), and two-dimensional difference gel electrophoresis (2D-DIGE), all of which are combined with miRNA expression profiling[34–36]. Nevertheless, these techniques could not exclude indirect alterations of protein expression. To enrich a class of mRNAs directly bound to the RISC complex, the method designated as ribonucleoprotein immunoprecipitation (IP) followed by GeneChip (RIP-Chip) has been established. By this advanced technique, a previous study has characterized miRNA target mRNAs recovered from the Ago2-IP fraction of Hodgkin lymphoma cells[37]. They found that approximately 40% of miRNA target transcripts are derived from targets for abundantly co-expressed miRNAs in the cells, although this technique could not specify the exact pair of miRNAs and their target mRNAs.

A recent progress in the next-generation sequencing (NGS) technology has revolutionized the field of genomic research. Currently, we could efficiently identify endogenous miRNAs and target mRNAs on a genome-wide scale at one time by using the method named as high-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation (HITS-CLIP-Seq) or alternatively by the improved version designated as the photoactivatable-ribonucleoside-enhanced crosslinking immunoprecipitation (PAR-CLIP-Seq)[38, 39]. In both of them, the RISC complex components comprising miRNAs, mRNAs, and RISC proteins are crosslinked by ultraviolet (UV) prior to immunoprecipitation with an antibody specific for the RISC component protein. Then, deep sequencing data are processed for target prediction programs to identify interaction sites between miRNAs and target mRNAs. By these techniques, a previous study showed that MRE sequences are located in 3′UTR (40%), 5′UTR (1%), CDS (25%), intron (12%), intergenic regions (6%), and non-coding RNA (4%), respectively, in the postnatal mouse neocortex[38]. A different study revealed that the GCACUU motif, enriched in 3′UTR and CDS of target mRNAs, matches the seed of a miRNA family that constitutes 68% of entire miRNAs in mouse embryonic stem cells (mESCs)[40].

Molecular network analysis of MicroRNA target mRNAs

To identify biologically relevant molecular networks and pathways extracted from high-throughput data, we could analyze them by using a battery of bioinformatics tools for analyzing molecular interactions on the comprehensive knowledgebase (Figure1). They include Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.kegg.jp), Panther (http://www.pantherdb.org), Reactome (http://www.reactome.org), Ingenuity Pathways Analysis (IPA) (Ingenuity Systems,http://www.ingenuity.com), and KeyMolnet (Institute of Medicinal Molecular Design,http://www.immd.co.jp). KEGG, Panther, and Reactome are open sources, whereas IPA and KeyMolnet are commercial ones, all of which are updated frequently. After July 1, 2011, the KEGG FTP site for academic users has been transferred from GenomeNet at Kyoto University to NPO Bioinformatics Japan. Therefore, the FTP access is currently available only to paid subscribers, although the publicly funded domain is freely accessible at GenomeNet. This review focuses on the application of KEGG, IPA, and KeyMolnet to molecular network analysis.

KEGG includes manually curated reference pathways that cover a wide range of metabolic, genetic, environmental, and cellular processes, and human diseases[41]. Currently, KEGG contains 198,560 pathways generated from 428 reference pathways. When importing of Entrez Gene IDs into the Functional Annotation tool of Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.7 (david.abcc.ncifcrf.gov), DAVID identifies the most relevant KEGG pathway and gene ontology (GO) categories, composed of the genes enriched in the given set, followed by an output of statistical significance evaluated by the modified Fisher’s exact test[42].

IPA is a knowledgebase that contains approximately 3,000,000 biological and chemical interactions and functional annotations with definite scientific evidence, curated by expert biologists. By uploading the list of Gene IDs and expression values into the Core Analysis tool, the network-generation algorithm identifies focused genes integrated in a global molecular network. IPA calculates the score p-value that reflects the statistical significance of association between the genes and the networks by the Fisher’s exact test.

KeyMolnet contains knowledge-based contents on 150,500 relationships among human genes and proteins, small molecules, diseases, pathways and drugs, curated by expert biologists[12, 17]. They are categorized into the core contents collected from selected review articles and textbooks with the highest reliability or the secondary contents extracted from PubMed abstracts and Human Reference Protein database (HPRD). By importing the list of Gene IDs and expression values, KeyMolnet automatically provides corresponding molecules as a node on networks. The neighboring network-search algorithm selects one or more molecules as starting points to generate the network of all kinds of molecular interactions around starting molecules, including direct activation/inactivation, transcriptional activation/repression, and the complex formation within the designated number of paths from starting points. The generated network is compared side by side with 484 human canonical pathways, 892 diseases, and 219 pathological events of the KeyMolnet library (the April 2012 version). The algorithm counting the number of overlapping molecular relations between the extracted network and the canonical pathway makes it possible to identify the canonical pathway showing the most significant contribution to the extracted network[12, 17].

Experimental validation of biological implications

Molecular network analysis enables us to characterize the most relevant networks and pathways involved in the miRNA targetome in silico. When the expression of DEMs is downregulated, theoretically, the targetome is predicted to be upregulated, and presumably hyperactivated under pathological conditions. In contrast, when the expression of DEMs is upregulated, the targetome is predicted to be downregulated, and possibly hypoactivated under disease conditions. The functionally inverse relationship between miRNAs in the miRNAome and mRNAs in the targetome should be validated by loss-of-function or gain-of-function experiments by introducing antagomirs (anti-sense miRNAs) or premir oligonucleotides in an in vitro and/or in vivo model in an adequate setting (Figure1). This step is highly important but often labor intensive. For example, a recent study by using microarray and qPCR showed that the expression of a set of miRNAs, most robustly miR-206, are upregulated in Tg2576 AD transgenic mice and human AD brain samples[43]. Importantly, intraventricular injection of a miR-206 antagomir restored decreased levels of brain-derived neurotrophic factor (BDNF), a highly likely target of miR-206, followed by a remarkable improvement of memory function.