Abstract

Background
MicroRNAs (miRNAs) are highly conserved, short (18–22 nt), non-coding RNA molecules that regulate gene expression by binding to the 3′ untranslated regions (3′UTRs) of mRNAs. While numerous cellular microRNAs have been associated with the progression of various diseases including cancer, miRNAs associated with retroviruses have not been well characterized. Herein we report identification of microRNA-like sequences in coding regions of several HIV-1 genomes.

Results
Based on our earlier proteomics and bioinformatics studies, we have identified 8 cellular miRNAs that are predicted to bind to the mRNAs of multiple proteins that are dysregulated during HIV infection of CD4+ T-cells in vitro. In silico analysis of the full-length and mature sequences of these 8 miRNAs and comparisons with all the genomic and subgenomic sequences of HIV-1 strains in global databases revealed that the first 18 nucleotides of the mature hsa-miR-195 sequence (including the short seed sequence) matched perfectly (18/18, 100%), or with one nucleotide mismatch, within the envelope (env) genes of five HIV-1 genomes from Africa. In addition, we have identified 4 other miRNA-like sequences (hsa-miR-30d, hsa-miR-30e, hsa-miR-374a and hsa-miR-424) within the env and the gag-pol encoding regions of several HIV-1 strains, albeit with reduced homology. Mapping of the miRNA homologues of env within HIV-1 genomes localized these sequences to the functionally significant variable regions of the env glycoprotein gp120 designated V1, V2, V4 and V5.

Conclusions
We conclude that microRNA-like sequences are embedded within the protein-encoding regions of several HIV-1 genomes. Given that the V1 to V5 regions of HIV-1 envelopes contain specific, well-characterized domains that are critical for immune responses, virus neutralization and disease progression, we propose that the newly discovered miRNA-like sequences within the HIV-1 genomes may have evolved to self-regulate survival of the virus in the host by evading innate immune responses and therefore influencing persistence, replication and/or pathogenicity.

Citation: Holland B, Wong J, Li M, Rasheed S (2013) Identification of Human MicroRNA-Like Sequences Embedded within the Protein-Encoding Genes of the Human Immunodeficiency Virus. PLoS ONE 8(3): e58586. https://doi.org/10.1371/journal.pone.0058586
Editor: Fatah Kashanchi, George Mason University, United States of America
Received: August 22, 2012; Accepted: February 5, 2013; Published: March 8, 2013
Copyright: © 2013 Holland et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was supported by the University of Southern California-Rasheed Research Endowment Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Discussion

The potential involvement of miRNAs in HIV-1 proliferation and life cycle is the subject of much research. While the dysregulation of miRNA expression in HIV-1 infected cells has been known since 2005 [12], studies have now identified HIV-TAR and Nef-LTR regulating miRNAs [21], [22]. In addition, target sites for cellular miRNAs have been predicted in HIV-1 genomes, for example in the nef-LTR region [10], indicating that human cellular miRNAs can modulate HIV-1 expression and replication [13]. Yeung et al. [22] first suggested that small RNA molecules in HIV-infected cells may represent products of Dicer cleavage. Subsequently, a high-throughput deep sequencing study of siRNA from HIV-infected cells suggested that the viral dsRNA intermediates may be processed by Drosha and Dicer [23]. However, the small RNAs found in these studies are non-coding, and preliminary transfection of these “viral microRNA” clones did not show significant changes in virus production, although some small RNAs showed some inhibition [23]. Thus, the miRNA-like sequences we have discovered in HIV genomes are not related to those reported in any published studies. HIV-1 genomes can interact with miRNAs in two ways: directly, through binding of a cellular miRNA to a viral transcript or the RNA genome itself, or indirectly, through a host cellular factor that is required for HIV-1 infection and the viral life cycle. While most miRNAs regulate gene expression by suppressing their target mRNAs, a cellular miRNA could target host factors that either suppress or enhance HIV-1 infection. Also, a single cellular miRNA can target as many as 100 transcripts. This makes it likely that any given cellular miRNA involved in HIV-1 infection serves as both an activator and a suppressor of HIV-1 at the same time.
While the multitude of interactions inside a cell is too complex and dynamic to define the exact role of a given cellular miRNA as purely an up- or down-regulator of HIV-1 infection, there is evidence for both direct and indirect regulation of cellular and viral gene expression by cellular miRNAs [31]. We have conducted a thorough literature search for the possible biological functions of cellular hsa-miR-195, miR-30d, miR-424 and miR-374a as they relate to HIV infection. A total of 17 papers were found to report on a wide range of functionalities of miR-195, 6 citations were associated with miR-30d function, 8 papers discussed miR-424 gene function and only 1 citation described miR-374a function. Most papers associate these microRNAs with cancer, apoptosis, Alzheimer’s disease and signal transduction [32]–[35], and none was related to any of the miRNAs expressed during HIV infection. The miRNA-like sequences we have identified in HIV-1 are unique in that they do not seem to be derived from cellular miRNA, nor do they appear to represent viral miRNAs or their targets. The hsa-miR-195-like sequence corresponds to the first 18 nucleotides of the mature hsa-miR-195, which has a length of 21 nucleotides. Any functional similarity of this sequence to the cellular hsa-miR-195 remains speculative. However, it should be noted that there are miRNAs as short as 17 nt, and that HIV-1 has been reported to encode a viral miRNA, designated TAR-3p, whose cloned length is 17 nt [20]. Further, the potential action of a miRNA is mostly dependent on base pairing between the miRNA seed sequence and its target; positions 13–16 of the miRNA may aid in pairing as well [5], [36]. The miR-195-like sequence we have identified in #GU216763 contains both the seed region and positions 13–16 of hsa-miR-195 and is 100% conserved in these regions.
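The conservation check described above (seed region plus positions 13–16) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline. It assumes the seed spans miRNA positions 2–8, a common convention that the text does not state explicitly. The function names and both sequences are hypothetical toy values, not real miRNA or HIV-1 data.

```python
def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def region_identity(mirna: str, candidate: str, start: int, end: int) -> float:
    """Fraction of identical bases over 1-based positions start..end (inclusive).

    Both sequences are assumed to be aligned from position 1 without gaps.
    """
    a = to_rna(mirna)[start - 1:end]
    b = to_rna(candidate)[start - 1:end]
    if len(a) != end - start + 1 or len(a) != len(b):
        raise ValueError("both sequences must cover the requested positions")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def conserved_regions(mirna: str, candidate: str) -> dict:
    """Identity in the assumed seed (2-8) and in positions 13-16."""
    return {
        "seed_2_8": region_identity(mirna, candidate, 2, 8),
        "supplementary_13_16": region_identity(mirna, candidate, 13, 16),
    }


# Toy inputs (assumptions for illustration only).
TOY_MIRNA = "UACGGAUCCAGUAACGUUAGC"  # hypothetical 21-nt mature miRNA
TOY_VIRAL = "TACGGATCCAGTAACGTT"     # hypothetical 18-nt viral match, DNA alphabet

report = conserved_regions(TOY_MIRNA, TOY_VIRAL)
print(report)
```

A value of 1.0 for both keys corresponds to the "100% conserved in these regions" situation described for the 18-nucleotide match; a substitution inside positions 2–8 would lower only the seed value.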
It has also been predicted computationally that the cellular hsa-miR-195 may interact with the HIV-1 Nef in the 3′ LTR region based on perfect complementarity of a 7-nucleotide seed sequence with its viral target [31]. Whether the microRNA-like sequences are of viral origin or products of provirus integration millions of years ago remains to be explored. However, the fact that this sequence is part of a functional coding region of a vital viral gene makes the integration event scenario seem unlikely. Our data suggest that the miR-like sequences present in the HIV-1 envelope region are viral RNA sequences that emulate a cellular miRNA. The implications of a viral miRNA mimicking a cellular miRNA would be speculative. However, there is precedent for a viral miRNA outcompeting its cellular competitor. HIV-1 TAR RNA has been reported to act as a sort of miRNA ‘decoy’, decreasing the host cell’s RNAi activity by binding and sequestering TRBP, a TAR RNA-binding protein and an essential Dicer cofactor [37]. We therefore propose that the hsa-miR-like sequences we have identified in the env genes of several viruses may similarly be titrating out the related cellular miRNA targets. A major point of distinction between miRNAs and our newly identified miRNA-like sequences is that while cellular miRNAs are derived from non-coding regions of the DNA, the miRNA-like sequences we have identified are located in the coding regions of vital HIV-1 genes. The HIV-1 envelope glycoprotein contains 5 variable regions (V1–V5) interspersed with conserved regions C1–C5. The miRNA-like sequences we discovered have been mapped to the V1, V2, V4 and V5 regions of the HIV-1 envelope and are integral components of the HIV-1 gene that codes for a functional envelope gp120 (Figure 3).
The finding of several viral sequences homologous to cellular miR-30d, miR-30e, miR-374a, miR-424 and miR-195 in different regions of the HIV-1 genome, but primarily in the envelope regions of several HIV-1 strains, indicates that the phenomenon of cellular miRNA-like sequences in the HIV-1 genome may be widespread. Changes in the length of amino acid sequences or glycosylation patterns in the variable V-regions are critical to HIV-1 infection because they can affect not only cellular tropism but also sensitivity to virus neutralization and disease progression [38]. Some of the major determinants that contribute to biological activities of HIV-1 strains, including replication, viral tropism (ability to infect T-cells versus macrophages or other cell types), sensitivity to neutralization, modulation of the CD4 antigen, and cytopathogenicity, are localized in the V1 to V5 regions of the HIV-1 envelope glycoprotein gp120 [39]–[41]. These sequences are critical for effective humoral responses and virus neutralization. The V1 to V5 regions of the HIV-1 envelope have been associated with the rate of replication, virus neutralization and pathogenesis of HIV-1 strains [42], [43]. While the primary virus replicates in the body, it becomes resistant to neutralization because some of the specific V1 to V5 domains are either lost or have been modified by increased numbers of N-linked glycosylation sites, and therefore cellular immune responses to HIV-1 infection are compromised [39]–[41]. Finally, our findings of four different miRNA-like sequences within the V1, V2, V4 and V5 regions of the HIV-1 env gene may provide new insights that will contribute to a better understanding of the molecular complexities of HIV-1 infection and pathogenesis. The miRNA-like sequences may emulate a non-coding cellular miRNA and therefore could represent the first examples of human cellular homologues of miRNAs in HIV-1 coding regions.
These sequences may therefore play a role in HIV-1 replication, immunity, and virus neutralization, and thus may influence pathogenesis in HIV-infected individuals. Detailed studies, including construction of vectors containing the human microRNA-like sequences that we have discovered, could allow us to develop a humanized mouse model system to test the effects of these vectors in vivo. Such experiments would help determine the influence of microRNA-like sequences on virus replication, as well as on antibody and antigen production in vitro and in vivo.

Materials and Methods

Source of Data
Previous proteomics and bioinformatics research in our laboratory had identified >200 differentially expressed, functionally relevant proteins in an HIV-1 infected CD4+ T-cell line (RH9) analyzed sequentially over a period of approximately 2 years [25], [26]. In this study, we used GeneSet2miRNA [44] to identify potential microRNAs that could impact the activities of our differentially regulated proteins. Using a cutoff of 0.05 for the adjusted p-value of the enrichment (adjusted for multiple testing by Monte-Carlo simulations), we identified 7 miRNAs whose predicted binding to multiple mRNA targets was significantly enriched. We also selected one other miRNA because it was identified as the best single-model match. These 8 miRNAs are listed in Table 1.

Identification of Homologous Sequences
To identify miRNA sequences that could be homologous to HIV-1, we downloaded from the miRBase database (http://mirbase.org/) the full-length and mature sequences of the 8 human microRNAs that had been shown to be significantly associated with the proteins modulated by HIV infection of CD4+ T-cells. Each of the 8 miRNAs was used as a separate query, utilizing both the mature and full-length versions of each miRNA. The BLAST (Basic Local Alignment Search Tool) [45] program was used to search the complete HIV-1 databases (HIV taxid: 11676) at the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/) and the Los Alamos HIV databases (http://www.hiv.lanl.gov/content/sequence/BASIC_BLAST/basic_blast.html). All full-length and partial HIV-1 genome sequences, representative of all HIV-1 clades and strains, were used for the analyses. These sequences have been identified by the International Committee on Taxonomy of Viruses (ICTV) and are available in the global public databases.
The outputs from both database searches were compared and the best matches from all microRNA query searches were selected based on the length of the match, percentage identity of the match, lack of gaps or deletions, and inclusion of the seed sequence.

Clustal Analyses and Mapping of Newly Identified Sequences
The Clustal algorithm was used for multiple sequence alignments [46], [47] (http://www.ebi.ac.uk/Tools/msa/clustalw2/). We used the five HIV-1 sequences most homologous to hsa-miR-195, as identified by our BLAST searches and shown in Table 2, as well as the sequences of 17 other representative HIV-1 strains from 6 clades, to generate alignments using the Clustal algorithm. Complete genome sequences from each of the representative HIV-1 strains were used to perform alignments of the different clades with our best matches. Results from the Clustal algorithm were then checked against the Los Alamos HIV Compendium (http://www.hiv.lanl.gov/content/sequence/HIV/COMPENDIUM/compendium.html) to verify that the alignments from both sources were in agreement. In addition to defining the specificity of sequence alignments, we used the TreeDyn software program (http://www.treedyn.org/) to construct a sequence-based relational tree from the alignment data generated by the Clustal algorithm. The target regions of the alignment were then mapped to the HXB2 strain gene map in the Los Alamos National Laboratory HIV genome database (http://www.hiv.lanl.gov/), because this is one of the most complete reference sequence data maps available for HIV-1.
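The selection criteria used for the BLAST matches (match length, percentage identity, absence of gaps, and inclusion of the seed sequence) can be illustrated with a small Python sketch. This is not BLAST and not the authors' procedure: it is a naive, exhaustive, ungapped sliding-window scan, swapped in here only to make the criteria concrete. The names `Hit` and `scan_ungapped`, the seed definition (positions 2–8), the one-mismatch threshold, and all sequences are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    start: int         # 0-based offset of the match in the target
    mismatches: int    # number of substituted bases (no gaps allowed)
    seed_intact: bool  # True if no mismatch falls inside the seed


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def scan_ungapped(query: str, target: str, max_mismatches: int = 1,
                  seed: Tuple[int, int] = (2, 8)) -> List[Hit]:
    """Report every ungapped window of the target within max_mismatches of the query.

    Hits are ranked by fewest mismatches, then intact seed, then position.
    """
    q, t = to_rna(query), to_rna(target)
    hits = []
    for i in range(len(t) - len(q) + 1):
        window = t[i:i + len(q)]
        bad = [k for k, (a, b) in enumerate(zip(q, window)) if a != b]
        if len(bad) <= max_mismatches:
            # seed positions are 1-based and inclusive
            intact = not any(seed[0] - 1 <= k <= seed[1] - 1 for k in bad)
            hits.append(Hit(i, len(bad), intact))
    return sorted(hits, key=lambda h: (h.mismatches, not h.seed_intact, h.start))


# Toy data (hypothetical): an 18-nt query and a target holding one perfect
# copy and one copy with a single mismatch outside the seed.
TOY_QUERY = "UACGGAUCCAGUAACGUU"
TOY_TARGET = "GGCC" + "TACGGATCCAGTAACGTT" + "AATT" + "TACGGATCCAGCAACGTT" + "CC"

hits = scan_ungapped(TOY_QUERY, TOY_TARGET)
for h in hits:
    identity = 100.0 * (len(TOY_QUERY) - h.mismatches) / len(TOY_QUERY)
    print(h, f"{identity:.1f}% identity")
```

Because the window length is fixed to the query length and gaps are never opened, every reported hit automatically satisfies the "lack of gaps or deletions" criterion; a real search over full HIV-1 databases would still rely on BLAST for speed and statistics.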

Acknowledgments

We thank Zisu Mao and Jane M.C. Chan for technical assistance with our proteomics studies using two-dimensional gel electrophoresis and mass spectrometry, respectively.

Author Contributions

Conceived and designed the experiments: SR BH. Performed the experiments: SR BH JW. Analyzed the data: SR BH JW ML. Contributed reagents/materials/analysis tools: SR BH. Wrote the paper: SR BH.