Imprinted small nucleolar RNAs (snoRNAs) are only found in eutherian genomes and closely related to brain functions. A complex human neurological disease, Prader-Willi syndrome (PWS), is primarily attributed to the deletion of imprinted snoRNAs in chromosome 15q11-q13. Here we investigated the snoRNA repertoires in the PWS locus of 12 mammalian genomes and their evolution processes. A total of 613 imprinted snoRNAs were identified in the PWS homologous loci and the gene number was highly variable across lineages, with a peak in Euarchontoglires. Lineage-specific gene gain and loss events account for most extant genes of the HBII-52 (SNORD115) and the HBII-85 (SNORD116) gene family, and remarkable high gene-birth rates were observed in the primates and the rodents. Meanwhile, rapid sequence substitution occurred only in imprinted snoRNA genes, rather than their flanking sequences or the protein-coding genes located in the same imprinted locus. Strong selective constraints on the functional elements of these imprinted snoRNAs further suggest that they are subjected to birth-and-death evolution. Our data suggest that the regulatory role of HBII-52 on 5-HT 2C R pre-mRNA might originate in the Euarchontoglires through adaptive process. We propose that the rapid evolution of PWS-related imprinted snoRNAs has contributed to the neural development of Euarchontoglires.

Funding: This research was supported by the National Natural Science Foundation of China (No. 30830066) to Liang-Hu Qu. National Natural Science Foundation of China (No. 81301431), China Postdoctoral Science Foundation (No. 2012M511866 and No. 2013T60824) to Yi-Jun Zhang. The National Natural Science Foundation of China (No.31370791); the funds from Guangdong Province (No. S2012010010510, S2013010012457); the project of Science and Technology New Star in ZhuJiang Guangzhou city (No. 2012J2200025) to Jian-Hua Yang. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Although an increasing number of imprinted snoRNAs have now been identified, they are mainly limited to a few model species, including human, mouse and rat. Since the PWS-related imprinted snoRNAs are closely associated with mammalian brain functions, it is of great interest to investigate the evolution of these highly dynamic non-coding RNAs. Nahkuri et al. suggest that the HBII-52 gene family originates from a non-imprinted snoRNA, SNORD119 [33] . Ogorelkova et al. revealed that the HBII-52 family was subjected to positive selection in Europeans populations, suggesting these gene are still undergoing evolution in human populations [34] . However, these studies did not fully reveal the evolutionary process of these imprinted snoRNA genes along with the differentiation of the eutherian or its significance to mammalian neural development. In this study, using a genome-wide snoRNA identification program developed by our group, snoSeeker [35] , we performed comprehensive identification and evolutionary analyses to reveal the pattern and process underlying the evolution of mammalian imprinted snoRNAs.

SnoRNA genes have traditionally been regarded as conservative components of the genome. SnoRNAs have been identified in a wide spectrum of organisms spanning eukaryotes to archaebacteria, suggesting that they represent an evolutionary ancient group of non-coding RNAs [25] . Homologues of almost all yeast snoRNAs can be found in the human genome [26] , [27] , and their modification sites in rRNAs are also highly conserved among distantly related eukaryotes [1] , [28] . However, the discovery of a large number of eutherian-specific imprinted snoRNAs led us to reexamine this supposedly ancient group of RNAs from a new perspective. Further investigations failed to find homologs of these imprinted snoRNAs in any other non-eutherian mammals or vertebrates [29] – [31] . In addition, a number of lineage-specific snoRNA genes were identified only in rodents rather than humans, and a rat-specific imprinted snoRNA cluster, RBII-36, containing 74 member genes, was not detected in mice (see the review of [32] ). These observations suggest that imprinted snoRNA genes represent a class of newly derived small non-coding RNAs that might be subjected to rapid evolution.

The top panel depicts the gene distribution across the entire PWS locus. Pink represents maternally expressed genes, blue represents paternally expressed genes, red represents imprinted snoRNA clusters, and the dashed wavy line represents non-coding transcripts. In the lower panels, each snoRNA gene is represented by a vertical bar. The distribution of snoRNA genes was scaled to their genomic location, and the colors indicate their respective gene families. Clusters are scaled relative to their size within the chromosome.

Small nucleolar RNAs (snoRNAs) belong to a large group of small, metabolically stable RNAs that primarily direct the post-transcriptional modification of ribosomal RNA (rRNA) nucleotides. Based on common sequence motifs involved in the assembly of sno-ribonucleoprotein (snoRNP) particles, snoRNAs fall into two major classes, the C/D box and H/ACA box groups, which account for the 2′-O-ribose methylation and pseudouridylation modifications respectively of rRNA or snRNA [1] , [2] . These snoRNAs are called ‘guide’ snoRNAs, and there is a fraction of snoRNAs, the so called ‘orphan’ snoRNAs, which do not target to any rRNA or snRNA. In recent years, hundreds of snoRNAs have been discovered in imprinted regions of the eutherian genomes (imprinted snoRNAs) [3] – [8] . The expressions of imprinted genes are depending on the parent of origin. Unlike other snoRNAs, which show broad expression profiles, imprinted snoRNAs are mainly expressed in the brain and are closely associated with brain function [9] , [10] . Another feature of imprinted snoRNAs is that they usually fall within large gene clusters containing several gene families. One of such family usually contains dozens of snoRNA genes. Two large paternally expressed C/D box snoRNA families, HBII-85 (SNORD116) and HBII-52 (SNORD115), are located in the 15q11-q13 imprinted region of the human genome. Large interstitial deletions of this region underlie ∼70% of cases of Prader-Willi syndrome (PWS, [MIM176270]), a complex human neurological disease [11] . Duplication of the same region is the only recurrent cytogenetic aberration associated with autism, occurring in up to 5% of autism cases [12] – [14] . The clinical symptoms of PWS include neonatal hypotonia, feeding difficulties and failure to thrive during infancy, excessive weight gain after 18 months, hyperphagia, hypogonadism, global developmental delay and equivocal facial features. Deletion of the HBII-85 snoRNA cluster results in an exhibition of all major clinical symptoms of PWS in humans [15] – [17] , but the role of the HBII-52 cluster in PWS has been difficult to define [11] , [18] . The neuronal-specific HBII-52 snoRNAs have been reported to participate in the post-transcriptional regulation of the pre-mRNA encoding the 5-hydroxytryptamine 2C receptor (5-HT 2C R), an important neurotransmission protein, including A-to-I RNA editing and alternative RNA splicing, with in vitro experiments [19] – [21] . The biological importance of the regulatory role of HBII-52 in vivo, if any, remains to be demonstrated. Knockout of MBII-52 (murine homologues of HBII-52) in a murine model led to an increase in the editing of 5-HT 2C R pre-RNA and alterations in a number of 5HT 2C R-related behaviors, including impulsive responses, locomotor activity and reactivity to palatable food [22] . Although the molecular targets of HBII-85 are as yet unknown, an increasing number of studies have investigated the regulatory roles of HBII-85 snoRNAs and their connection to PWS phenotypes [15] – [17] , [23] , [24] . In addition, the PWS locus contains several singleton snoRNA genes of unknown function, e.g., HBII-438, HBII-13 and HBII-436 ( Figure 1 ).

To investigate the targeting pattern of HBII-52 gene families in different eutherians species, homologous sequence to human 5-HT 2C R mRNA of different species were got by BALSTP program from ensembl genome web site ( www.ensembl.org ). The binding pattern between ASE2 sequences and 5-HT 2C R mRNA in same species was analyzed by BLAST method with some modifications. Two major characters, the length of perfect binding and the distance of perfect binding from D box, were analyzed.

The K a /K s or d n /d s test is one of the most frequently used tests of selection. The test relies on the simple fact that the rate of evolution can exceed that of a neutral sequence only if there is positive selection [51] . Ka/Ks test for protein-coding gene was performed with MEGA 4.0 software [52] .

Tajima’s D test was performed with DnaSP software version 5.10.01 [49] . For the Tajima’s neutrality test, significant deviations of Tajima’s D values from zero suggest rejecting the standard neutral model and negative values of Tajima’s D indicate positive selection. All the gene sequences of HBII-52 or HBII-85 gene family from one species were used for the test. The statistical significance was obtained assuming that D follows the beta distribution [50] .

According to the secondary structure we divided box C/D snoRNs gene into 5 parts: the stem region, C/D boxes, antisense element 1 (ASE1), antisense element 2 (ASE2) and the remaining sequence. To investigate the evolutionary rates of different parts of the imprinted snoRNA genes, we created an alignment for HBII-52 or HBII-85 family and defined different structural components in the alignment. The evolutionary distances (K) between humans and each of the other species were calculated for each of the five parts of the snoRNAs using the Kimura 2-parameter model.

To test whether the rapid rate of evolution was specific to the imprinted snoRNAs, we compared the evolutionary rate between imprinted snoRNA genes and their flanking intergenic sequences. We obtained 500 bp of both upstream and downstream flanking sequences for each analyzed imprinted snoRNA gene from the UCSC genome website. The substitution rates of both imprinted snoRNA genes and their flanking sequences were calculated as described above. Similar analyses were also performed using non-imprinted snoRNA genes as the reference.

For each imprinted gene family, the substitution rates between humans and each of the other species were calculated using the “between-group average calculation” method with the Kimura 2-parameter model implemented with MEGA ver. 3.1 [37] , [48] . A “group” corresponds to a multi-gene family from one species, which contains all gene members in this family. For each between-group average, an arithmetic average is computed for all valid inter-group pairwise comparisons. The standard error for each calculation was estimated by bootstrap analysis with 500 replicates. To calculate the substitution rates of non-imprinted snoRNA genes, we first identified all of the families that were conserved between humans and each of the other species ( Table S2 , the sequences were provided in data S1 and S2 ). The guide snoRNAs and orphan snoRNAs were analyzed separately. The orphan snoRNA genes were identified by the absence of targeting sites in rRNA or snRNA according the information in snoRNA-LBME-db ( https://www-snorna.biotoul.fr/index.php ) [27] . These orphan snoRNAs were listed in Table S2 . For multi-gene families, one gene member and its orthologous in the other species s was selected for the analysis. Then gene sequences from one species were linked together in order of gene name and an aliment was built with this integrated sequence of human and each of the other species. Next, the average substitution rates of the conserved families were calculated with the Kimura 2-parameter model. All of the calculations were performed using MEGA version 3.1 [37] .

The number of new genes was calculated as follows. For imprinted snoRNAs, the number of newly derived genes was estimated according to the evolutionary change in gene number described above, whereas the number of newly derived non-imprinted snoRNAs was estimated by determining the difference in gene number between closely related species. Birth rates of new genes were calculated by dividing the number of newly derived genes by the lapsed time since the species diverged from their most recent common ancestor. The time scale was set according to Hedges et al. [47] , and the weighted average data were selected.

To investigate the expansion process of imprinted snoRNAs, a combined physical-phylogenetic map for the HBII-52 or HBII-85 gene family of human or mouse was constructed. The phylogenetic trees were built with the Bayesian method as described above, each Markov chain ran for 1×10 7 generations. The clades receiving >50% posterior probabilities were kept and the derived consensus trees were reconciled with gene physical locations in the chromosomes.

To estimate the numbers of gene in ancestral species and gene gains and losses, we used the reconciled tree method [42] – [44] , in which the topology of a gene tree is reconciled with that of a species tree. A simple example is shown by Niimura and Nei [45] and Nam and Nei [44] . To apply this method to imprinted snoRNA genes, we requested the computer program described in Niimura and Nei [45] , [46] from Y. Niimura. We first constructed a phylogenetic tree using all genes of the two multi-gene families (HBII-52 and HBII-85) together with outgroup genes chosen from other close related families. The reconciled tree method was applied to these two phylogenetic trees and the Euarchontoglires tree topology. Given the relative low bootstrap value caused by short sequence of snoRNA genes, the 50% bootstrap condensed trees were used here.

To further investigate the phylogenetic relationship within the HBII-52 and HBII-85 multi-gene families, the alignment of each family was built as described above. The phylogenetic relationships were reconstructed using MrBayes ver. 3.1 [36] with a HKY+G model suggested by MrModeltest (ver. 2.2) [40] . Every analysis consisted of two runs each containing four Markov chains that ran for 2×10 6 generations, sampled every 1000 generations, with a random starting tree, default priors and equal branch lengths. The consensus trees were edited with iTOL [41] . The sequence alignments are provided in Data S2 .

To reconstruct the phylogenetic relationship of imprinted snoRNA genes in the PWS locus, a dataset containing gene sequences from all families in this locus (i.e. HBII-436, HBII-13, HBII-438, HBII-85 and HBII-52) was generated. For gene families containing more than 20 members, 4–6 representative genes were selected based on a gene tree constructed using the neighbor-joining method implemented in MEGA ver. 3.1 [37] ; for gene families containing less than 20 members, all sequences were added to the dataset. The sequences were aligned with ClustalX [38] . The sequence alignment is provided in data S1 and S2 . Then a phylogenetic network was constructed using the NeighborNet method implemented in SplitsTree ver. 4.10 [39] .

For imprinted snoRNA gene family classification, BLAST search was first conducted to identify homologs to know human or rodents imprinted snoRNA genes, the cutoff is set to 10 −3 . Nevertheless, due to algorithmic characteristics and sensitivity constraint, this classification method will lead to ambiguities when dealing with the species that are distantly related to the reference. In addition, the genomes of some species were poorly assembled, e.g. the elephant and the tenrec genome, which could not provide accurate gene mapping information to validate the classification results. Thus, phylogenetic methods were further exploited to classify imprinted snoRNA candidates into known gene families. For this purpose, phylogenetic trees containing imprinted snoRNA candidates of each species were constructed with the human imprinted snoRNA genes as the reference for family classification. The threshold value of branch support for the assignment of a candidate to known gene families was >50%. The phylogenetic trees were constructed with MrBayes ver. 3.1 [36] with a HKY+G model.

Two independent strategies were employed to search imprinted snoRNA candidates in the PWS loci: (1) The comparative genomics identification strategy with WGAs was performed as described in Yang et al. [35] . (2) The single-genome strategy was performed by searching the PWS homologous regions with snoSeeker program ( http://202.116.74.192/snoSeeker/downloads.php ) under the cutoff value of >24.0. The non-redundant candidates from these two strategies were combined for further gene family classification. The sequences of non-imprinted snoRNA genes of 15 vertebrates (with the addition of opossums, platypus and chicken) were identified from the WGAs of human and each of other species as described in Yang et al. [35] . All identified imprinted and non-impritned snoRNA gene sequences were provided in the Data S1 .

The genome sequences and related whole genome alignments (WGAs) of 12 placental mammals (Human, Chimp, Rhesus, Rat, Mouse, Dog, Cat, Horse, Cow, Armadillo, Tenrec and Elephant) were downloaded from the UCSC genome website ( http://genome.ucsc.edu/ and http://genome-test.cse.ucsc.edu/ ). These species represent all major lineages of extant placental mammals. The range of human PWS locus was identified by the gene pair of MKRN3 and ATP10A. The homologous region of PWS in other species was obtained using the “covert” tool of the UCSC genome website and the gene content was used as an index when the result of covert tool is poor. The information of PWS loci in different species is listed in Table S1 .

Results

Identification and Characterization of Imprinted snoRNAs in the PWS Homologous Locus of 12 Eutherian Genomes Both single genomic and comparative genomic strategies were applied to identify imprinted snoRNA genes from 12 placental mammals to achieve high sensitivity and accuracy simultaneously (Table S1). To classify the snoRNA gene families, BLAST searches and phylogenetic analyses were combined to avoid the misassignment of individual gene members to known gene families. A total of 613 imprinted snoRNA genes were identified in the PWS homologous loci (the 15q11-q13 region of the human genome) of 12 eutherian genomes (Table 1, all snoRNA gene sequences were provided in Data S1). In general, primates and rodents have many more imprinted snoRNAs than other mammals, mainly reflected by the big volume of the HBII-52 and the HBII-85 multi-gene families. On average, primates have 77 PWS-related imprinted snoRNAs, rodents have 112, while Laurasiatheri and Xenarthra have only 17. The number of imprinted snoRNAs in elephants was comparable to that of primates. Certain gene families are missing from the ancestral lineages of placental mammals, such as Xenarthra and Afrotheria. For example, the HBII-85 gene family was missing from the tenrec genome, whereas HBII-52 was absent from the cow and the armadillo genomes. These observations may have two explanations: the first is due to the incomplete assembly of these genomes, the second one attributes to particular gene loss events occurred in these lineages. Conversely, extensive expansions of particular gene families have occurred in the mouse and elephant, leading to an excess of HBII-52 genes (130 and 70, respectively). The gene numbers of the multi-gene families (HBII-52 and HBII-85) predicted in this study were slightly lower than those of previous studies on the model species (e.g. human and mouse) [4], [5], [53]. It may attribute to the more stringent criteria for snoRNA identification used in the snoSeeker program [35]. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Table 1. Statistics of imprinted snoRNA genes in the PWS locus of mammalians. https://doi.org/10.1371/journal.pone.0100329.t001 Based on the genomic location information of each snoRNA gene, we illustrated the precise distribution of the imprinted snoRNA genes in the PWS homologous loci of representative species (Figure 1). The size of the PWS imprinted snoRNA cluster was notably expanded in rodents. The cluster sizes of both mouse and rat were approximately two-fold that of the human. In both the rat and mouse genomes, the PWS-related snoRNA clusters were extended by the insertion of a long heterogeneous fragment containing no snoRNA genes. However, the insertion sites were different, occurring within the HBII-52 gene family in the rat genome and in the HBII-85 gene family in mouse. Interestingly, the size of the imprinted snoRNA cluster does not appear to determine the gene number. Despite similar cluster sizes, the dog has only 25% of the snoRNA genes presented in human. These results imply that the PWS imprinted snoRNA cluster is located in a highly active genomic region, a hot spot for chromosomal rearrangements and gene duplications as observed in other imprinted loci [54].

Phylogenetic Analyses The phylogenetic network indicated that each gene families could form a single phylogenetic clade (Figure. S1). The phylogenetic relationship among the gene families exhibits consistence with their physical distribution in the chromosome. For example, HBII-436 and HBII-13, which are closely distributed in the locus, showed close phylogenetic relationship. Interestingly, the HBII-438 family, consisting of two gene members HBII-438A and HBII-438B, was separated by the large gene cluster of HBII-52 and HBII-85. This suggests these two gene cluster might origin from an insertion of ancestral gene(s) within the HBII-438 family and followed by extensive expansions [33]. Figure 2 shows the intra-family gene relationship of the HBII-52 and the HBII-85 family. A remarkable feature exhibited by the phylogenetic trees is that gene members from same species always form a single clade, especially in non-primates species, suggesting their close relationship. In primates, an intermixed pattern among species was observed, implying of a progressive evolutionary process of these gene families. This pattern was also suggested by the phylogenetic network shown in Figure S1. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 2. Phylogenetic analysis of HBII-52 and HBII-85 gene family. Abbreviations: HS, human; PT, chimpanzee; RM, rhesus; MM, mouse; RN, rat; CF, dog; FC, cat; EC, horse; BT, cow; DN, armadillo; LA, elephant. https://doi.org/10.1371/journal.pone.0100329.g002

Recent Creation of PWS-related Imprinted snoRNA Genes through a Birth-and-death Process To further explore the evolutionary history of the PWS-related imprinted snoRNA genes, we estimated the number of imprinted snoRNA genes in all ancestral organisms and their increases and decreases at different stages during the evolution of eutherian. Considering that their functional importance and gene volume, we focused this analysis on the two multi-gene families, HBII-52 and HBII-85. To estimate the gene number in ancestral species as well as gene gains and losses, we used the reconciled tree method [42]–[44], in which the topology of a gene tree is reconciled with that of a species tree. The analysis was based on the “Euarchontoglires tree” proposed by Murphy et al. [55], which suggested that primates and rodents are sister groups forming a clade named Euarchontoglires. Figure 3 summarizes the gene numbers estimated for each ancestral species and the associated changes along the different lineages. The numbers of PWS-related imprinted snoRNA genes were extremely low in ancestral organisms as compared with extant species. Interestingly, the evolutionary history of PWS-related snoRNA families is dominated by gene increases (Figure 3). In both the HBII-52 and HBII-85 families (Figure S2 and S3), most (55–98%) of the existing family members arose from lineage-specific gene gain events. In the human genome, 31 of the 41 HBII-52 genes were gained after the species diverged from chimpanzees; for the mouse, 127 of the 130 HBII-52 genes were gained after the species diverged from rats. In contrast, gene loss events were very rare. Similar results were observed for the HBII-85 cluster. Figure 3 provided an overview of the evolutionary dynamics of PWS-related imprinted snoRNA genes in mammals, which reflects the rapid gene turnover in these families. It also suggeststhat the extant genes in each species primarily arose through a lineage-specific gene gaining process occurring in each species’ recent history. This trend can also be intuitively inferred by their grouping patterns, i.e., gene members from the same species were always clustered together (Figure S1 and Figure 2). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 3. Gene numbers and changes in PWS-related imprinted snoRNA genes with respect to eutherian divergence. Numbers in black indicate the current gene number in each species, whereas red numbers indicate the ancestral gene number, blue numbers denote gained genes, and green numbers indicate lost genes. The major diversification events of eutherians are indicated on the timescale. MYA, a million years ago. https://doi.org/10.1371/journal.pone.0100329.g003 Since the results suggest rapid intra-species gene expansion occurred during eutherian evolution, especially in primates and rodents, we want to ask how this happened. For this purpose, combined physical-phylogenetic maps of the HBII-52 and HBII-85 gene families in human and mouse were constructed to investigate gene duplication processes (Figure S4). There are several models proposed for gene duplication [56]. The tandem duplication model predicts a close phylogenetic relationship for physically neighboring genes. This pattern was observed in all the investigated gene families, i.e. HBII-52, HBII-85, MBII-52 and MBII-85. However, in the HBII-52 and MBII-52 family, a lot of physically distant genes show phylogenetic connections, which seems like the duplicated genes have jumped or trans-located from their original sites. This pattern could be explained with a ‘regional ectopic duplication’ model, which usually refers to duplication of individual or a small group of genes to an unlinked locus [57], [58]. The mechanism behind this may be recombination during the gametogenesis. These results also suggest different duplication processes were favored by different gene families. The tandem duplication plays a major role in the recent expansion of the HBII-85 and MBII-85 family; while the regional ectopic duplication dominates the expansion process of the HBII-52 and MBII-52 families. The birth-and-death model of gene evolution predicts the death of genes, or “gene degradation”. Consistent with this idea, we found pseudogenes in all species (Table S3, Figure S5). Gene degradation mainly occurs through the loss-of-function mutations that impair snoRNA processing or snoRNP formation, e.g., in the C/D boxes. Primates and rodents generally exhibit much lower pseudogene ratios than other species (Table S3). For example, in the HBII-52 family, the average pseudogene ratios were 0.138 and 0.158 for primates and rodents, respectively, but 0.667 for other species. This observations suggest that a trend toward gene expansion was favored in both primates and rodents and explains why these groups have more PWS-related imprinted snoRNA genes.