Identification of human-specific miRNAs

To identify miRNAs specific to the human genome, we searched for orthologs of all 1,733 annotated mature human miRNAs (miRBase11 version 17) in the genomes of 11 species: chimpanzee, gorilla, orangutan, rhesus macaque, marmoset, mouse, rat, dog, cow, opossum and chicken. To do so, we mapped miRNA precursors to each genome using reciprocal BLAST12 or reciprocal LiftOver13. For 1,412 out of 1,426 annotated human miRNA precursors (99%) there was at least one ortholog in at least one species (Supplementary Data 1). We next extracted mature miRNA orthologs from the precursor sequence alignment made using the Muscle sequence alignment algorithm14. On the basis of these data, we identified 10 mature human miRNAs with no detectable orthologs in any of the 11 species and 12 mature miRNAs with sequence changes in seed region that took place in the human lineage after the split with chimpanzee (Supplementary Table S1).

Expression pattern of human-specific miRNAs

To estimate functional roles of newly emerged or newly mutated human miRNAs, we examined expression levels of these miRNAs in two brain regions, the prefrontal cortex and the cerebellum, of humans, chimpanzees and rhesus macaques using high-throughput RNA sequencing (RNA-seq). In agreement with previous observations in flies10, more ancient miRNAs, such as those conserved among mammals, tended to have higher expression levels than more recently emerged miRNAs, such as primate-specific miRNAs (Fig. 1a,b). Accordingly, all but one human-specific miRNA were expressed at extremely low levels in the human brain or not expressed at all (Fig. 1a,b). The only exception was miR-941. In both brain regions it was expressed higher than other human-specific or primate-specific miRNAs. Furthermore, miR-941 expression in the brain was comparable to the median level of conserved mammalian miRNAs (Fig. 1a,b). No miR-941 expression was observed in brains of chimpanzees and macaques.

Figure 1: miR-941 expression features. Expression levels of miR-941 and other human-specific miRNA, primate-specific miRNA and miRNA conserved among mammals in the human prefrontal cortex (a) and cerebellum (b). Expression of miR-941 in human tissues (green), human tonsillar B-cell populations (TBCs) (purple), human cell lines (orange), AGO co-immunoprecipitations in THP-1 cells (yellow) and human ESC and EB cells (blue) (c). miR-941 expression levels were estimated based on RNA-Seq data as Transcripts Per Million reads (TPM): number of reads mapped to the transcript normalized by the number of total mapped reads. Northern blot analysis of miR-941 expression in human prefrontal cortex (PFC), kidney, cerebellum (CB) and visual cortex (VC). U6 RNA (RNU6) was used as a loading control (d). Sequence heterogeneity of 5′ termini of miR-941 and other human miRNA. Lower sequence heterogeneity corresponds to a more defined seed region sequence, characteristic of functional miRNA (e). Cytoplasmic enrichment of miR-941 and other human miRNA in THP-1 cells. Enrichment of mature miRNA in the cytoplasm rather than in the nucleus is characteristic of the majority of functional miRNA (f). Co-immunoprecipitation with AGO proteins of miR-941 and other human miRNA in THP-1 cells (right panel) and Jurkat cells (left panel). Association with AGO proteins, the key components of the RISC complex, is characteristic of functional miRNA (g). Full size image

Using published RNA-seq data from 23 tissues and cell lines, we further assessed miR-941 expression across human tissues and cell lines to obtain information for its tissue specificity. Besides the prefrontal cortex and the cerebellum6, miR-941 was expressed in liver, prostate, endometrium and six human tonsillar B-cell populations15,16,17, as well as in a wide range of human cell lines18,19 (Fig. 1c; Supplementary Table S2). Notably, miR-941 expression levels were substantially higher in cancer-derived cell lines and human embryonic stem cells (hESCs) than in normal tissues or differentiated hESCs (embryoid body cells) (Fig. 1c).

Is miR-941 a bona fide miRNA? By conducting northern Blot experiments, we confirmed the presence of mature miR-941 in human prefrontal cortex, cerebellum and kidney (Fig. 1d, see Methods). Further, our analysis of sequence variations in miR-941 reads indicated reduced heterogeneity of the mature miRNA 5′ terminus—a sequence feature associated with functional miRNA20 (Fig. 1e). Using RNA-seq data from THP-1 (human acute monocytic leukaemia cell line) nucleus and cytoplasm21, we further found that miR-941, like most functional miRNAs, is enriched in the cytoplasm (Fig. 1f). Finally, miR-941 was associated with AGO proteins, the key components of the RISC complex, in multiple AGO immunoprecipitation experiments conducted using various sequencing platforms-454, Illumina and SOLiD-in a number of human cell lines: hESCs, hNSCs, THP-1 and Jurkat cells22,23,24 (Supplementary Table S2, Fig. 1c,g). Notably, miR-941 was associated with AGO proteins at levels compatible to or exceeding those observed for conserved functional miRNAs (Fig. 1g). Thus, miR-941 displays all features of a functional miRNA.

miR-941 sequence evolution

In humans, miR-941 resides in the first intron of the DNAJC5 gene in chr20 q13.33. According to miRBase annotation, this region contains three copies of pre-miR-941, all capable of forming canonical stable hairpin structures (Fig. 2a). Remapping miR-941 precursor sequences to the human reference genome, we found not three, but seven copies of putative pre-miR-941 (Supplementary Fig. S1). Each of the seven precursor copies contained a stable hairpin structure including mature miR-941 and miR-941-star sequences (Fig. 2b,c). Mature miR-941 and miR-941-star sequences complement each other, leaving two-nucleotide overhangs—a feature indicative of processing by Drosha and Dicer enzymes7 (Fig. 2b). Reads corresponding to miR-941 and miR-941-star sequences could be identified in human (Fig. 2c), but not in chimpanzee or rhesus macaque RNA-seq data.

Figure 2: miR-941 sequence evolution. Alignment of the genomic regions containing miR-941 precursors between the human, chimpanzee, rhesus macaque (Indian and Chinese) and Denisova genome (a). For the Denisova genome, sequence read coverage is shown. miR-941 precursor locations in the human genome are drawn based on the miRBase annotation. Secondary structure of transcripts corresponding to the miR-941 precursor sequence from the human, Denisova, chimpanzee, rhesus macaque (Indian and Chinese) genomes (b). Locations of the mature miR-941 (red) and miR-941-star (blue) sequences in the human precursor sequence and their corresponding locations in the other species' precursors are shown. RNA-seq read coverage of the mature miR-941 (red) and miR-941-star (blue) in human tissues (prefrontal cortex and cerebellum), AGO co-immunoprecipitations experiments and human cell lines (c). The complete list of data sets used is listed in Supplementary Table S3. Full size image

In the human and macaque genomes, the miR-941 precursor region are composed of tandem repeats displaying greater interspecies than intraspecies variation, indicating rapid locus evolution (Supplementary Fig. S2a-e). Correspondingly, almost the entire repeat region is lost in the chimpanzee genome (Fig. 2a). One of the repeat copies present in the macaque genome differs from the rest and more closely resembles the human variant of the tandem repeats. It is therefore likely that tandem repeats present in the human genome were derived from this repeat variant, which has undergone copy number expansion and replaced other repeat variants in the human lineage (Supplementary Fig. S2f). It takes two copies of the human version of tandem repeats to form pre-miR-941, with the apex of the precursor stem loop structure coinciding with the boundary between repeats (Supplementary Fig. S2g). As a consequence, corresponding genomic regions in chimpanzees and macaque could not form stable miRNA precursor hairpins (Fig. 2a,b). To confirm the validity of the reference genome sequences, we amplified and sequenced the pre-miR-941 locus in one human, eight chimpanzees and six rhesus macaques (Supplementary Table S3). The sequences matched the reference genome sequences (Supplementary Fig. S3). These results demonstrate that miR-941 precursor sequence has evolved in humans, most likely after the human–chimpanzee split, through tandem repeat replacement and expansion.

To obtain more precise estimates of the miR-941 precursor emergence in the human evolutionary lineage, we examined the genome of Denisova—an extinct hominid species that diverged from the human lineage approximately one million years ago. Although overall genome sequencing coverage was relatively low (1.9-fold), we found that the corresponding genomic locus in the Denisova genome contains at least two copies of the miR-941 precursor sequence (Fig. 2a, see Methods). Thus, pre-miR-941 formation, as well as copy-number increase, took place between the chimpanzee and the Denisova bifurcations: between six to seven million and one million years ago (Fig. 3a).

Figure 3: miR-941 sequence copy number variation among human populations. Phylogenetic tree showing miR-941 precursor copy numbers in humans, Denisova, chimpanzees and rhesus macaques (a). Distribution of miR-941 copy numbers in human populations from the HGDP-CEPH Human Genome Diversity Cell Line Panel (b). Each circle represents a population, circle size is proportional to the number chromosomes sampled, colours represent proportions of copy miR-941 precursor copy numbers in each population. The number next to each circle indicates population identity, as listed in Supplementary Table S7. Average miR-941 precursor copy numbers differences among populations (c) as well as geographical regions (d). miR-941 copy number variation among geographical regions (e). Variation of the average copy number estimates and variation estimates was calculating by bootstrapping sequenced precursor loci 1,000 times. The labels indicate: AF: Africans, WA: Western Asians, EU: Europeans, CA: central and Southern Asians, EA: Eastern Asians, OC: Oceanians, NA: native Americans. Full size image

Interestingly, pre-miR-941 copy number might continue to change after human and Denisova split. In the human genome, pre-miR-941 is located in a genomic region displaying copy-number variation among four contemporary human populations: Yoruba, Caucasian, Chinese and Japanese25. This is not unexpected, given general instability of genomic regions formed by tandem repeats. To examine this further, we amplified and sequenced the pre-miR-941 locus in 558 individuals from 38 populations from the HGDP-CEPH Human Genome Diversity Cell Line Panel26. We found a large degree of variation in pre-miR-941 copy number among contemporary humans, ranging from 2 to 11 copies (Fig. 3b). This variation was not caused by PCR amplification artifacts, as indicated by replicate amplifications from six individuals of African descent. Further, both pre-miR-941 copy number and copy-number variation differed significantly among populations from different geographical regions (Kruskal–Wallis test for copy number difference, P=0.000065, Bartlett's test and Levene's test for copy number variation difference, P<0.000073). The average pre-miR-941 copy number decreased from the west to the east: from eight copies in sub-Saharan Africans to six copies in Eastern Asians (Fig. 3c,d). miR-941 precursor copy number variation was also significantly higher in sub-Saharan Africans compared with 'out of Africa' populations, with the exception of Oceanians and native Americans (Bartlett's test, P=0.00064, Levene's test, P=0.00078) (Fig. 3e).

Identification of miR-941 target genes

The seed sequence of miR-941 differs from seed sequences of other human miRNAs, suggesting specific regulatory effects. To identify potential targets and potential functions of miR-941, we transfected three human cell lines, 293T, HEK and HSF2, with miR-941 duplex or mock duplex. We then measured gene expression changes in each cell line 24 h after transfection, using Affymetrix Human Genome U133 Plus 2.0 microarrays. In all three cell lines, we observed significant overrepresentation of gene expression inhibition among miR-941 targets predicted by TargetScan27 or other five miRNA target prediction algorithms28,29,30,31,32 (Fig. 4a-c; Supplementary Fig. S4 and Supplementary Table S4). Because of the evolutionary novelty of miR-941, target site conservation was not required during the target prediction.

Figure 4: miRNA-941 targets identification and functional analysis. Cumulative distribution plots of log2-transformed gene expression fold-changes (LFC) for genes containing miR-941 target sites predicted by TargetScan (red) and all other expressed genes (grey) after transfection with miR-941 duplex or mock duplex in the three human cells lines: HSF (a), 293T (b) and HEK (c). Prevalence of the negative LFC measurements among predicted miR-941 targets indicates inhibitory effect of miR-941 duplex transfection, the y axis shows cumulative distribution function (CDF) of LFC distribution. Experimentally verified miR-941 target genes (pink) in hedgehog signalling pathway (d) and insulin signalling pathway (e). Full size image

We then classified predicted miR-941 targets downregulated after transfection with miR-941 duplex, but not the negative control, in all three human cell lines, as experimentally verified miR-941 target genes (Supplementary Data 2, see Methods). Compared with other genes expressed in the cell lines, experimentally verified miR-941 target genes showed significant enrichment in two KEGG pathways: hedgehog-signalling pathway and insulin-signalling pathway (hypergeometric test, Bonferroni corrected P<0.032). Notably, in both pathways, miR-941 targets some of the key annotated pathway components, including SMO, SUFU and GLI1 in the hedgehog-signalling pathway33 and IRS1, PPARGC1A and FOXO1 in the insulin-signalling pathway34 (Fig. 4d,f). To further test whether these experimentally verified miR-941 targets represent direct targets of miR-941, immunoprecipitation by Ago2 (Ago2-IP) was conducted in the human 293T cell line. We then used Affymetrix microarrays to compare concentrations of transcripts captured in Ago2-IP in cells overexpressing miR-941 and in negative controls. We found that genes containing predicted miR-941-binding sites and enriched in Ago2-IP in cells overexpressing miR-941, showed significant downregulation in miR-941 transfection experiments (Supplementary Fig. S5a and Supplementary Data 2). Repeating our analyses based on these targets we confirmed significant enrichment of these Genes in insulin signalling pathway (hypergeometric test, Bonferroni corrected P=0.041).

Evolution of miRNA-941-guided regulation

The emergence of miR-941 in humans might have led to the downregulation of genes containing corresponding binding sites in their 3′ UTRs. To test this, we examined expression of miR-941 target genes in human, chimpanzee and rhesus macaque brains. mRNA expression was measured in the prefrontal cortex of five human, five chimpanzee and two rhesus macaque adult individuals and in the cerebellum of five human, five chimpanzee and one rhesus macaque adult individuals using Affymetrix Gene1.0 ST arrays6. Using macaque as an out-group, we assigned expression changes to either the human or the chimpanzee lineages assuming maximum parsimony. For miR-941 target genes verified by miR-941 transfection, we found a significant excess of transcriptional inhibition in the human but not the chimpanzee lineage (Binomial test, P<0.004, Fig. 5a; Supplementary Table S5 and Supplementary Data 2). The excess of transcriptional inhibition in the human linage could also be observed using putative direct miR-941 target genes identified in Ago2-IP (Supplementary Data 2 and Supplementary Fig. 5b). Further, the inhibitory effects of miR-941 in the human brain largely overlapped between the two brain regions (binomial test, P=0.0000045). Notably, among the miR-941 target genes showing human-specific downregulation in both brain regions is the host gene of miR-941, DNAJC5, containing three candidate miR-941-binding sites in its 3′ UTR.

Figure 5: Evolution of miRNA-941-guided regulation. Transcriptional inhibition of experimentally verified miR-941 target genes in the prefrontal cortex (PFC) and cerebellum (CB) (a). Inhibition ratio was calculated as the ratio between the numbers of experimentally verified miR-941 target genes showing lineage-specific expression decrease and not showing such as a decrease on the human (black) and the chimpanzee (white) evolutionary lineages. The proportions of experimentally verified miR-941 target genes, showing lineage-specific expression decrease on the human and the chimpanzee lineages were compared using Binomial test. The asterisks show significance of transcriptional inhibition excess on the human lineage (*P<0.05, **P<0.01, ***P<0.001). Human-specific loss (HSL) (b) and human-specific gain (HSG) (c) of binding sites for miR-941 (blue) and for all annotated human miRNA conserved between humans, chimpanzees and macaques (grey). The inserts show numbers of miR-941 target gene gains (red) and losses (green) on the human and the chimpanzee evolutionary lineages. Regulatory effect of miR-941 on genes containing miR-941-binding sites lost on the human evolutionary lineage (d). Genes containing miR-941 predicted binding sites in the rhesus macaque and chimpanzee genomes, but not in the human genome, (red) were significantly downregulated compared with non-target genes (black) after miR-941 transfection in the three macaque cell lines (one-sided Wilcoxon signed-rank test, P=0.042) (right), but not in the three human cell lines (one-sided Wilcoxon signed-rank test, P=0.46) (left). The y axis shows log2-transformed gene expression fold-changes (LFC) in the cell lines after transfection. Full size image

Taken together, our results demonstrate that miR-941 is highly expressed compared with other newly emerged human-specific miRNA in the prefrontal cortex and cerebellum, as well as in multiple human cell lines, and has detectable regulatory effects on gene expression in the human brain. Although some of these regulatory effects might be beneficial, introduction of a new miRNA into an established regulatory network might also result in deleterious expression changes. In this case, natural selection would lead to rapid elimination either of miRNA-941 itself or of miR-941-binding sites responsible for deleterious regulatory effects. As miR-941 was not eliminated, we predict that more miR-941-binding sites would be lost in the human than in the chimpanzee lineage, as the latter was not exposed to miR-941 expression. Using macaque as an outgroup, we indeed found greater loss of predicted miR-941-binding sites in human than in chimpanzee. Specifically, eight miR-941-binding sites were lost in the human lineage and three in the chimpanzee lineage (Fig. 5b; Supplementary Table S6, see Methods). Compared with the predicted loss of binding sites based on all annotated miRNAs present in the human, chimpanzee and rhesus macaque genomes, this excessive loss of miR-941 sites is not expected to occur by chance (P=0.03, Fig. 5b). In contrast, we observed no difference in miR-941-binding site gains between human and chimpanzee (P=0.54, Fig. 5c).

Although these results are based on predicted miRNA-binding sites, we set out to test them further using experimentally verified targets of miR-941. We transfected two macaque kidney cell lines (LLCMK2 and FrhK-4) and one macaque skin fibroblast cell line with miR-941 duplex or a mock duplex. We then measured gene expression changes in each cell line 24 h after the transfection using microarrays. In all three cell lines we observed significant overrepresentation of gene expression inhibition among miR-941 targets predicted by TargetScan (Kolmogorov–Smirnov test, P<0.000037) (Supplementary Fig. S6). Furthermore, for genes containing miR-941 targets sites in both macaques and humans, there was a significant overlap of experimentally verified miR-941 targets between human and macaque transfection experiments (binomial test, P=0.0027). Importantly, genes containing predicted miR-941-binding sites in rhesus macaque and chimpanzee, but not in human, were significantly downregulated in the three macaque cell lines upon miR-941 transfection (P=0.042), but not in the three human cell lines (P=0.46) (Fig. 5d). Furthermore, genes that lost miR-941 target sites in the human lineage were significantly overrepresented among miR-941 targets experimentally verified in the three macaque cell lines (P=0.036). Taken together, these results show that miR-941 emergence and copy number increase in the human lineage indeed resulted in accelerated loss of miR-941-binding sites.

miRNA-induced downregulation could be avoided by other mechanisms that do not involve binding sites loss, such as competitive binding of RNA-binding proteins preventing target incorporation into the RISC complex35. To test whether any putative miR-941 targets might avoid downregulation by such indirect mechanisms, we looked for genes showing downregulation in miR-941 transfection experiments in rhesus macaque cell lines, but not in human cell lines. We identified 49 genes that contained miR-941-binding sites in both the human and the macaque genomes and showed no detectable downregulation after miR-941 transfection in the three human cell lines. Out of these 49 genes, 19 were downregulated after miR-941 transfection in macaque cell lines. Testing the abundance of these 19 genes in Ago2-IP conducted in human cells, we found significant under-representation of these transcripts in Ago2-IP compared with other predicted miR-941 target genes expressed in the brain (Wilcoxon signed-rank test, P=0.00067) (Supplementary Fig. S7). This result indicates that some of the predicted miR-941 targets might avoid downregulation by escaping incorporation into the RISC complex.