Long noncoding RNAs (lncRNAs) have been described in cell lines and various whole tissues, but lncRNA analysis of development in vivo is limited. Here, we comprehensively analyze lncRNA expression for the adult mouse subventricular zone neural stem cell lineage. We utilize complementary genome-wide techniques including RNA-seq, RNA CaptureSeq, and ChIP-seq to associate specific lncRNAs with neural cell types, developmental processes, and human disease states. By integrating data from chromatin state maps, custom microarrays, and FACS purification of the subventricular zone lineage, we stringently identify lncRNAs with potential roles in adult neurogenesis. shRNA-mediated knockdown of two such lncRNAs, Six3os and Dlx1as, indicates roles for lncRNAs in the glial-neuronal lineage specification of multipotent adult stem cells. Our data and workflow thus provide a uniquely coherent in vivo lncRNA analysis and form the foundation of a user-friendly online resource for the study of lncRNAs in development and disease.

Here, we leveraged the SVZ-OB system to develop a greater understanding of lncRNA expression and function. First, we used Illumina-based complementary DNA (cDNA) deep sequencing (RNA-seq) and ab initio reconstruction of the transcriptome to generate a comprehensive lncRNA catalog inclusive of adult NSCs and their daughter cell lineages. This lncRNA catalog informed a subsequent RNA CaptureSeq approach, which increased the read coverage and read length for our SVZ cell analysis, validating the transcript structure and expression of many of these novel lncRNAs. Gene coexpression analysis identified sets of lncRNAs associated with different neural cell types, cellular processes, and neurologic disease states. In our analysis of genome-wide chromatin state maps, we identified lncRNAs that, like key developmental genes, demonstrate chromatin-based changes in a neural lineage-specific manner. Using custom lncRNA microarrays, we found that lncRNAs are dynamically regulated in patterns reminiscent of known neurogenic transcription factors. To define lncRNA expression changes throughout the SVZ neurogenic lineage in vivo, we acutely isolated the major cell types of the SVZ with fluorescence-activated cell sorting (FACS) and probed lncRNA expression with our custom microarrays. We integrated these diverse experimental approaches to develop an online resource useful for the identification of lncRNAs with potential roles in SVZ neurogenesis ( http://neurosurgery.ucsf.edu/danlimlab/lncRNA ). Furthermore, expression and shRNA-mediated knockdown experiments confirmed functional roles for lncRNAs identified by our integrative approach. Overall, our study demonstrates a generalizable workflow that integrates genome-wide bioinformatic strategies with experimental manipulations for the identification of lncRNAs that regulate development.

The subventricular zone (SVZ) of the adult mouse brain represents an ideal system for the study of lncRNAs in vivo. Throughout life, SVZ neural stem cells (SVZ-NSCs) generate large numbers of neuroblasts that migrate to the olfactory bulb (OB), where they differentiate into interneurons ( Figure 1 A). In addition, SVZ-NSCs are multipotent, capable of generating astrocytes and oligodendrocytes, the other major cell types of the CNS. In contrast to the embryonic brain, wherein multipotent precursor cells are inherently transient, continually changing their developmental potential and location over time and with organ morphogenesis, the adult SVZ retains its NSC population in a stable, spatially restricted niche, producing neurons and glia throughout life (). This enduring population of multipotent stem cells and its well-characterized daughter cell lineages make the SVZ a particularly tractable in vivo model for molecular genetic studies of development. The SVZ has been used to elucidate key principles of neural development including the role of signaling molecules, transcription factors, microRNAs, and chromatin modifiers (). We have previously shown that the Mixed-lineage leukemia 1 (Mll1) chromatin-modifying factor is required for the SVZ neurogenic lineage (), and recent studies indicate that MLL1 protein can be targeted to specific loci by lncRNAs ().

(A) Schematic of sagittal section of adult mouse brain. SVZ neural stem cells give rise to migratory neuroblasts (red). These neuroblasts travel along the rostral migratory stream (curved arrow) before terminally differentiating and integrating into olfactory bulb (OB) neuronal circuits. Numbered schematics correspond to coronal brain sections highlighting dissected regions (yellow) used for RNA collection.

Emerging studies suggest that lncRNAs play critical roles in CNS development. For instance, in embryonic stem cells (ESCs), specific lncRNAs repress neuroectodermal differentiation (), and during in vitro differentiation of ESC-derived neural progenitor cells (ESC-NPCs), lncRNA expression is dynamic (). In the mouse brain, some lncRNAs are regionally expressed (), including among the six layers of the adult cortex (). In vivo functional data is limited, but mice null for the lncRNA Evf2 have abnormal GABAergic interneuron development and function (), and morpholino inhibition of two CNS-specific lncRNAs in zebrafish affects brain development ().

The mammalian genome encodes thousands of long noncoding RNAs (lncRNAs), and it is becoming increasingly clear that lncRNAs are key regulators of cellular function and development. Loss-of-function studies performed in cell culture indicate that lncRNAs can regulate gene transcription through the targeting and recruitment of chromatin-modifying complexes (). While it is now evident that lncRNAs have important cellular and molecular functions, how they participate in development in vivo is poorly understood.

The Dlx1/2 bigene cluster encodes lncRNA Dlx1as (), and this locus was also bivalent in ESCs and H3K27me3 monovalent in MEFs. We used SVZ-NSC monolayer cultures to further investigate the chromatin state of the Dlx1as TSS. In self-renewal conditions, Dlx1as was bivalent, and after 30 hr of differentiation, H3K27me3 decreased, correlating with the start of Dlx1as upregulation ( Figures S7 D and S7E). Interestingly, we also found enrichment of the H3K27me3-specific demethylase JMJD3 () at the Dlx1as TSS during differentiation ( Figure S7 F), suggesting that this chromatin-modifying factor plays a role in the activation of this lncRNA. Consistent with the transcriptional upregulation of Dlx1as during SVZ neurogenesis in vitro, we observed robust Dlx1as expression in SVZ regions with migratory neuroblasts and the OB core ( Figure 7 D). We designed two knockdown constructs targeting the splice junction of Dlx1as and verified that these constructs target Dlx1as and not the full-length Dlx1 transcript ( Figure S7 G). Knockdown of Dlx1as caused a decrease in expression of Dlx1 and Dlx2 after 2 days of differentiation compared to control ( Figure S7 H), suggesting that this lncRNA can regulate expression of its protein-coding gene neighbors. After 7 days of differentiation, we found a nearly 3-fold decrease in Tuj1+ neuroblasts and an ∼60% increase in the number of GFAP+ astrocytes. In contrast to knockdown of Six3os, the production of OLIG2+ cells was unaffected by Dlx1as knockdown ( Figures 7 B and 7E).
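The rationale for targeting a splice junction can be illustrated with a small sketch. A sequence that straddles an exon-exon junction is contiguous only in the spliced RNA, so target sites drawn from that window are not contiguous in an unspliced or differently spliced overlapping transcript. The functions below enumerate fixed-length windows that cross a junction and apply a GC-content filter; the exon and intron sequences, the 21-nt window, the minimum overlap, and the 30 to 60% GC range are all hypothetical illustration parameters, not the constructs or design rules used for sh-Dlx1as-4 and sh-Dlx1as-7.

```python
"""Sketch: candidate knockdown target sites spanning an exon-exon junction.

All sequences and parameters are hypothetical illustration values.
"""


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def junction_windows(exon_a: str, exon_b: str, length: int = 21, min_overlap: int = 5):
    """Return (start, window) pairs of `length` nt crossing the exon_a|exon_b junction.

    Every window contains at least `min_overlap` bases from each exon, so it is
    contiguous only in the spliced sequence exon_a + exon_b.
    """
    spliced = exon_a + exon_b
    junction = len(exon_a)
    # A window [s, s + length) crosses the junction with enough overlap when
    # junction - s >= min_overlap and s + length - junction >= min_overlap.
    first = max(0, junction - length + min_overlap)
    last = min(junction - min_overlap, len(spliced) - length)
    return [(s, spliced[s:s + length]) for s in range(first, last + 1)]


def filter_by_gc(windows, low: float = 0.30, high: float = 0.60):
    """Keep windows whose GC fraction lies in [low, high] (an assumed heuristic)."""
    return [(s, w) for s, w in windows if low <= gc_fraction(w) <= high]


if __name__ == "__main__":
    exon_a = "ACGTTGCAAGCTTGACCTGAAGGCTCCAGT"  # hypothetical 30-nt exon
    exon_b = "GATCCTGGCATTCAGGACTCGTAACTGCTT"  # hypothetical 30-nt exon
    intron = "GTAAGTCTTACCTTTAG"               # hypothetical intron
    unspliced = exon_a + intron + exon_b
    for start, window in filter_by_gc(junction_windows(exon_a, exon_b)):
        # Also report whether the window happens to occur in the unspliced sequence.
        print(start, window, window in unspliced)
```

In a real design, candidate windows would additionally be screened against the full transcriptome for off-target matches; the final print only checks the local unspliced sequence.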

Like lnc-pou3f2, lncRNA Six3os was also monovalent for H3K4me3 in SVZ-NSCs and downregulated in neuroblasts. Consistent with these observations, Six3os transcripts were detected in the SVZ but not the OB core ( Figure 7 A). To further investigate the role of Six3os in SVZ-NSCs, we used lentiviruses to separately introduce two different short hairpin RNA (shRNA) sequences to knock down Six3os (LV-sh-Six3os-GFP) in monolayers of SVZ-NSCs. After confirming Six3os knockdown in proliferating NSCs ( Figure S7 C), we assessed neuronal and glial lineages from LV-sh-Six3os-GFP-infected cells in comparison to controls infected with LV-sh-Luci-GFP. After 7 days of differentiation, there were 2-fold fewer Tuj1-positive cells and 3-fold fewer cells expressing OLIG2, a marker of the oligodendrocyte lineage (). These decreases were accompanied by an increase in the number of GFAP+ cells ( Figures 7 B and 7C).

(E) Analysis of Dlx1as knockdown after 7 days of differentiation. Two unique targeting sequences were used (sh-Dlx1as-4, sh-Dlx1as-7). Top: ICC for Tuj1 (red) and GFP (green); middle: ICC for GFAP (red) and GFP (green); bottom: ICC for OLIG2 (red) and GFP (green). Quantification of data is presented at right. Scale bars represent 10 μm. Error bars represent SEM, five to six replicates for control group, three per experimental group. ∗ p < 0.05, ∗∗ p < 0.01, compared to sh-Luci, two-tailed t test. See also Figure S7 and Table S5

(C) Analysis of Six3os knockdown in SVZ-NSCs after 7 days of differentiation. Two different constructs were used (sh-Six3os-1, sh-Six3os-2). Top: ICC for Tuj1 (red) and GFP (green); middle: ICC for GFAP (red) and GFP (green); bottom: ICC for OLIG2 (red) and GFP (green) after infection with a vector expressing shRNAs targeting Six3os (LV-sh-Six3os-GFP). Quantification of data is presented at right. Scale bars represent 10 μm. Error bars represent SEM, five to six replicates for control group, two to three per experimental group. ∗ p < 0.05, ∗∗ p < 0.01, compared to sh-Luci, two-tailed t test.

(B) Control (LV-sh-Luci-GFP) lentiviral infections in SVZ-NSC cultures after 7 days of differentiation. Top: immunocytochemistry (ICC) for Tuj1 (red) and GFP (green); middle: ICC for GFAP (red) and GFP (green); bottom: ICC for OLIG2 (red) and GFP (green).

(A) In situ hybridization (ISH) for Six3os using branched DNA probes. Positive signal is revealed by Fast Red alkaline phosphatase substrates, which appear as highly fluorescent, punctate deposits (left); DAPI nuclear counterstain is shown at the right. Blue boxes in SVZ and OB coronal schematics at left indicate regions shown at right. Scale bars represent 10 μm. V, ventricle; STR, striatum.

To facilitate the identification of lncRNAs with potential roles in SVZ neurogenesis, we constructed an online resource that allows the user to easily filter the lncRNA catalog for multiple variables including locus chromatin-state, expression in FACS-isolated SVZ cells, and regulation during in vitro neurogenesis ( http://neurosurgery.ucsf.edu/danlimlab/lncRNA ). Using this resource, we filtered for those lncRNAs that were bivalent in ESCs and H3K27me3 repressed in MEFs. Of this set, which includes lnc-pou3f2, 100 lncRNAs were monovalent for H3K4me3 in SVZ-NSCs ( Figures 5 A and 5B), which would predict expression in the adult SVZ; indeed, ISH ( Figures S7 A and S7B) revealed lnc-pou3f2 expression in the SVZ, and, as predicted by the FACS microarray data, this transcript was not detected in the OB ( Figure S7 B).

Similar to the cell culture data ( Figure 6 B), we found sets of lncRNAs that showed transient increases in transit-amplifying cells, repression throughout differentiation, and significant induction in the terminally differentiated neuroblast population ( Figure 6 E). By integrating our chromatin state maps with this microarray expression data, we were able to begin to define an “lncRNA signature” for each stage of neurogenesis in vivo ( Table S5 ).

To confirm the separation of SVZ cell populations, we examined differential mRNA expression. We found that 12,812 protein-coding probe sets were differentially expressed in a comparison between activated SVZ-NSCs and migratory neuroblasts ( Table S4 ). As SVZ-NSCs become activated and differentiate into transit-amplifying cells, Dlx1/2 and Ascl1 become upregulated (), and this was reflected in our transcriptional profiles ( Table S4 ). As expected, the transcriptome of migratory neuroblasts was enriched for Dlx1/2 downstream targets, including Dlx5 and Arx, as well as markers of young neurons, including Tubb3 ( Table S4 ). Thus, these transcriptomes represent distinct stages of SVZ neurogenesis and also distinguish niche astrocytes from NSCs.

The adult SVZ contains three major cell types that represent a developmental continuum: (1) activated NSCs, which express glial fibrillary acidic protein (GFAP) and the epidermal growth factor receptor (EGFR), (2) transit-amplifying cells, which are EGFR positive but GFAP negative, and (3) migratory neuroblasts, which are CD24 positive ( Figure 6 C). In addition, the SVZ contains GFAP+ cells that do not express EGFR, and these have been termed “niche” astrocytes (). We used these cell-specific characteristics to acutely isolate, by FACS, cell populations representing each stage of this neurogenic lineage and the niche astrocytes ( Figure 6 D). cDNA libraries for each SVZ cell type were generated and hybridized to our custom lncRNA and standard gene expression microarrays. Expression levels of both protein-coding genes and lncRNAs were visualized by heat maps organized by k-means clustering (transcripts) and unsupervised hierarchical clustering (cell types) ( Figures 6 E and S6 C).
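Grouping transcripts by the shape of their expression profile across cell types, rather than by absolute level, is what k-means with a Pearson correlation distance (as in the Figure 6 legends) achieves. The sketch below is a minimal pure-Python version under stated assumptions: each profile is z-scored, the distance is 1 - Pearson r, and centroids are initialized by a deterministic farthest-first rule, which is our choice for reproducibility and not necessarily the initialization used for the heat maps. The toy profiles are hypothetical.

```python
import math


def zscore(row):
    """Center and scale a profile (population SD); constant rows map to zeros."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in row]


def pearson(a, b):
    """Pearson r as the mean product of z-scores (0.0 if either row is constant)."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def kmeans_pearson(rows, k, max_iter=100):
    """Cluster profiles by shape, assigning each to the most correlated centroid."""
    z = [zscore(r) for r in rows]
    # Farthest-first initialization: start from the first profile, then add the
    # profile whose best correlation with the chosen centroids is lowest.
    centroids = [z[0]]
    while len(centroids) < k:
        best = min(range(len(z)), key=lambda i: max(pearson(z[i], c) for c in centroids))
        centroids.append(z[best])
    labels = None
    for _ in range(max_iter):
        # Minimizing 1 - r is the same as maximizing r.
        new_labels = [max(range(k), key=lambda c: pearson(row, centroids[c])) for row in z]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(z, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical profiles across four sorted populations (e.g., NSC, TAC, NB, niche).
    profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [0, 1, 3, 4], [4, 3, 2, 1], [9, 6, 4, 2], [5, 4, 1, 0]]
    print(kmeans_pearson(profiles, k=2))
```

Because z-scored rows satisfy squared Euclidean distance = 2n(1 - r), this is closely related to ordinary k-means run on z-scored data.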

We next sought to define the dynamic changes in lncRNA expression during SVZ neurogenesis. SVZ-NSCs cultured as a monolayer can efficiently recapitulate key aspects of in vivo neurogenesis as assessed by immunocytochemistry (ICC, Figures 6 A and S6 A) (). We generated cDNA libraries from SVZ-NSC cultures in self-renewal conditions and after 1, 2, and 4 days of differentiation and hybridized them to both custom lncRNA arrays (see Experimental Procedures ) and standard gene expression arrays. As expected, the upregulated transcripts included genes related to SVZ neurogenesis (e.g., Dlx1/2 and Dlx5/6) ( Table S4 ), and genes expressed at higher levels early in the SVZ lineage (e.g., Egfr and Nestin) were among the downregulated transcripts ( Table S4 ). Like mRNAs, lncRNAs exhibited patterns of induction and repression ( Figures 6 B and S6 B) over this 4-day differentiation time course.

(E) Heatmap of lncRNAs differentially expressed throughout the SVZ lineage in vivo. Genes differentially expressed >2-fold between activated NSCs and neuroblasts were k-means clustered using the Pearson correlation metric, k = 5. Color bars at the right (dark blue, light blue, orange, peach, and green) represent gene clusters resulting from k-means clustering. See also Figure S6 and Table S4

(D) FACS plots for isolation of the SVZ lineage. Cells were dissociated from freshly dissected SVZ tissue from the hGFAP-GFP mouse and stained with EGF conjugated to the A647 fluorophore and a CD24 antibody conjugated to PE.

(B) Heat map representing expression of lncRNAs that were changed >4-fold from proliferation conditions to 4 days of differentiation. Color bars (orange, peach, light blue, and dark blue) at the right represent gene clusters resulting from k-means clustering, k = 4, Pearson distance metric.

(A) Immunocytochemistry (ICC) of SVZ-NSC differentiation in vitro. In proliferation conditions, the culture is composed of neural precursor cells including GFAP+ (green) NSCs. After growth factor withdrawal, cells in these cultures differentiate into Tuj1+ neuroblasts (red, increasing numbers at 2 days and 4 days).

Tissue-specific stem cells also retain bivalency at key loci, possibly reflecting retained gene expression plasticity (). Protein-coding genes that are bivalent in ESCs and NSCs were highly enriched for GO terms related to neurogenesis (e.g., neuron differentiation, axonogenesis; Figure S5 E). For instance, Dlx1 and Dlx2, transcription factors required for interneuron development, were bivalent in adult SVZ-NSCs ( Figure S5 F). Thus, selecting lncRNAs that are bivalent in both ESCs and NSCs might enrich for those involved in neuronal differentiation. These criteria were met by 583 lncRNAs ( Figure 5 C), such as three splice variants encoded from an lncRNA locus located ∼50 kb upstream of protein-coding gene Odf3l1 ( Figure 5 D). We propose that lncRNAs bivalent in SVZ-NSCs are enriched for those that function in neuronal lineage specification.

One hundred lncRNAs have a similar pattern of chromatin-based changes ( Figure 5 A). Furthermore, 76% of this set of lncRNAs were also monovalent for H3K4me3 in ESC-NPCs, suggesting that these lncRNAs belong to an early neural developmental transcriptional program. An example is lnc-pou3f2: this lncRNA locus is 2 kb upstream of the locus for known neurogenic transcription factor POU3F2. Both the lnc-pou3f2 and the Pou3f2 loci were bivalent in ESCs, monovalent for H3K4me3 in NSCs, and H3K27me3 repressed in MEFs ( Figure 5 B). Given the known relationship between chromatin modifications and the expression of key developmental regulators, we propose that this set of lncRNAs is enriched for those that play roles in early neural commitment in the adult SVZ.

(D) A novel lncRNA locus ∼50 kb downstream of protein-coding gene Odf3l1. The promoter was bivalent in both SVZ-NSCs and ESCs. See also Figure S5

(C) Venn diagram demonstrating the number of lncRNAs that were bivalent in both ESCs and SVZ-NSCs.

(B) The Pou3f2 promoter and the promoter (yellow boxes) of a nearby lncRNA demonstrated a similar pattern of histone modifications (bivalent in ESCs, repressed in MEFs, and activated in SVZ-NSCs).

In ESCs, bivalent domains identify key developmental genes. As ESCs differentiate into lineage-specific cell populations, many of these bivalent genes become activated (monovalent H3K4me3) or repressed (monovalent H3K27me3), reflecting the lineage specification and restriction of developmental potential (). Thus, genes that are more likely to play roles in the neural identity of SVZ NSCs would be those that are bivalent in ESCs, activated in SVZ-NSCs, and also repressed (H3K27me3 monovalent or bivalent) in a nonneural cell type (MEFs). We found 302 protein-coding genes that meet these criteria, and analysis revealed that the most statistically significant GO terms for these activated genes pertain to early brain development ( Figure S5 C). For example, proneural Ascl1, Pou3f3, and Pou3f2 were bivalent in ESCs, H3K4me3 monovalent in SVZ-NSCs, and H3K27me3 repressed in MEFs ( Figure S5 D).
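The selection criteria above amount to a filter over per-cell-type promoter calls. The sketch below assumes each gene (or lncRNA) has already been assigned one of four TSS states ("bivalent", "H3K4me3", "H3K27me3", "unmarked") in ESCs, SVZ-NSCs, and MEFs; it encodes the ESC-bivalent, NSC-activated, MEF-repressed pattern and, for comparison, the bivalent-in-both-ESCs-and-NSCs criterion. Pou3f2 carries the calls described in the text; gene_a and gene_b are hypothetical.

```python
"""Sketch: select genes whose promoter chromatin follows a neural activation pattern.

States are per-cell-type TSS calls: "bivalent", "H3K4me3", "H3K27me3", or "unmarked".
"""


def neural_activation_pattern(esc: str, nsc: str, mef: str) -> bool:
    """Bivalent in ESCs, H3K4me3-only in SVZ-NSCs, H3K27me3-marked (monovalent or bivalent) in MEFs."""
    return esc == "bivalent" and nsc == "H3K4me3" and mef in ("H3K27me3", "bivalent")


def bivalent_in_both(esc: str, nsc: str) -> bool:
    """Poised (bivalent) in both ESCs and SVZ-NSCs."""
    return esc == "bivalent" and nsc == "bivalent"


# (ESC, SVZ-NSC, MEF) calls. Pou3f2 follows the pattern described in the text;
# gene_a and gene_b are hypothetical examples.
states = {
    "Pou3f2": ("bivalent", "H3K4me3", "H3K27me3"),
    "gene_a": ("bivalent", "bivalent", "H3K27me3"),
    "gene_b": ("H3K4me3", "H3K4me3", "H3K4me3"),
}

activated = sorted(g for g, (esc, nsc, mef) in states.items() if neural_activation_pattern(esc, nsc, mef))
poised = sorted(g for g, (esc, nsc, _) in states.items() if bivalent_in_both(esc, nsc))
```

The same two predicates can be applied unchanged to the lncRNA catalog once its TSS calls are available.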

In SVZ-NSCs, 3,671 (40.8%) lncRNAs were marked by either H3K4me3 or H3K27me3, and 928 (10.3%) were bivalent ( Figure S5 A). As has been described for protein-coding genes, these TSS chromatin modifications correlated strongly with lncRNA expression levels: lncRNAs monovalent for H3K4me3 exhibited higher expression levels than those marked by only H3K27me3 or by bivalent chromatin domains (p < 0.0001, Mann-Whitney U, Figure S5 B). These data suggest that transcription of both lncRNAs and protein-coding genes uses similar chromatin-based regulatory mechanisms.
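The group comparison above relies on the Mann-Whitney U test (equivalently, the Wilcoxon rank-sum test). For readers who want to see the computation, the sketch below implements a two-sided test with midranks for ties and a tie-corrected normal approximation, without continuity correction. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used; the exact options behind the reported p-value are not stated here, and any example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(x + y)
    # Midranks: tied values share the average of the 1-based ranks they occupy.
    rank_of = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_x = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a shift
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

For example, `mann_whitney_u(k4_levels, k27_levels)` would compare hypothetical expression lists for H3K4me3-monovalent and H3K27me3-monovalent lncRNAs; with thousands of lncRNAs per group the normal approximation is accurate.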

Methylation of histone lysine residues is a critical determinant of transcriptional activity (). In previous work, lncRNA loci have been identified in part by the presence of H3K4me3 at the TSS (). For protein-coding genes, H3K4me3 enrichment at the TSS correlates with active transcription, whereas H3K27me3 is associated with a repressed state. Genes that are “bivalent” for both H3K4me3 and H3K27me3 are generally silenced but remain transcriptionally “poised” for activation or repression (). To investigate whether lncRNA loci exhibit a similar correlation between promoter histone modifications and transcription, we performed chromatin immunoprecipitation sequencing (ChIP-seq) for H3K4me3 and H3K27me3 in SVZ-NSC cultures and included sequencing data from ChIP-seq and RNA-seq studies of mouse ESCs, ESC-NPCs, and mouse embryonic fibroblasts (MEFs) ().
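The four promoter categories used in this analysis can be assigned from ChIP-seq peak calls with a simple overlap rule. The sketch assumes peaks are half-open (chrom, start, end) intervals and that a TSS counts as marked when a peak overlaps a ±1 kb window around it; the window size and the function names are hypothetical, not the parameters of the ChIP-seq pipeline used here.

```python
def overlaps(peaks, chrom, lo, hi):
    """True if any half-open (chrom, start, end) peak overlaps [lo, hi) on chrom."""
    return any(c == chrom and start < hi and end > lo for c, start, end in peaks)


def tss_state(chrom, tss, k4_peaks, k27_peaks, flank=1000):
    """Classify a TSS as bivalent, H3K4me3-only, H3K27me3-only, or unmarked."""
    lo, hi = tss - flank, tss + flank
    has_k4 = overlaps(k4_peaks, chrom, lo, hi)
    has_k27 = overlaps(k27_peaks, chrom, lo, hi)
    if has_k4 and has_k27:
        return "bivalent"
    if has_k4:
        return "H3K4me3"
    if has_k27:
        return "H3K27me3"
    return "unmarked"


if __name__ == "__main__":
    k4 = [("chr1", 4500, 5200)]   # hypothetical H3K4me3 peak
    k27 = [("chr1", 5800, 6100)]  # hypothetical H3K27me3 peak
    print(tss_state("chr1", 5000, k4, k27))             # window reaches both peaks
    print(tss_state("chr1", 5000, k4, k27, flank=500))  # narrower window
```

A linear scan is enough to show the rule; for genome-scale peak sets an interval index would replace `overlaps`.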

The enrichment and longer reads provided by RNA CaptureSeq enabled the identification of rare lncRNAs as well as uncommon splice isoforms in the SVZ transcriptome, yielding more than 3,500 lncRNAs that could not be detected by the short-read sequencing technology. For example, CaptureSeq identified an lncRNA transcript with an intron overlapping Pou3f3, a known neurogenic transcription factor ( Figure 4 B). In addition to this discovery of an lncRNA locus downstream of Pou3f3, RNA CaptureSeq also identified splice isoforms that include exons of a previously annotated lncRNA (2620017I09Rik) that lies upstream of the Pou3f3 locus. Some lncRNAs are transcribed from multiple TSSs, which can be a challenge for transcript assembly. Adjacent to the locus of neurogenic transcription factor Nr2f1, CaptureSeq identified a series of lncRNAs originating from four unique TSSs. This organization of protein-coding gene and multiple lncRNAs is conserved in humans, hinting at an evolutionarily conserved functional significance ( Figure 4 C). Thus, RNA CaptureSeq, in addition to providing a genome-wide validation of our SVZ lncRNA analysis, demonstrated previously underappreciated complexity to the structure of lncRNA loci. A complete annotation of CaptureSeq-derived transcripts is available at http://neurosurgery.ucsf.edu/danlimlab/lncRNA

For our RNA CaptureSeq probe library, we tiled probes across 100 Mb of putative lncRNA loci and 30 Mb of protein-coding regions as a control. We used this library to capture SVZ cDNA for sequencing (5,882,293 reads, median length of 356 bases per read). As expected, de novo assembly of sequences accurately reconstructed protein-coding transcripts and previously annotated lncRNAs (median identity of 90% for protein-coding RefSeq genes and median identity of 95% for annotated noncoding RefSeq RNA). As an example, Evf1 and Evf2, lncRNAs with roles in neural development, have overlapping genomic structures (), and RNA CaptureSeq identified and distinguished both transcripts in the SVZ ( Figure S4 A). RNA CaptureSeq also eliminated sequencing bias related to transcript abundance ( Figures S4 B and S4C), and measured expression values were well correlated between CaptureSeq and conventional RNA-seq strategies ( Figure S4 D and Supplemental Experimental Procedures ).
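Tiling a probe library across target loci can be sketched as follows. The function lays fixed-length, overlapping probes across each half-open target region, keeping the last probe flush with the region end, and a helper reports the total target size in megabases. The 120-nt probe length and 60-nt step are hypothetical values chosen for illustration; the actual probe design parameters for this library are not given here.

```python
def tile_probes(regions, probe_len=120, step=60):
    """Tile fixed-length probes across half-open (chrom, start, end) target regions."""
    probes = []
    for chrom, start, end in regions:
        if end - start <= probe_len:
            probes.append((chrom, start, end))  # short region: a single (shorter) probe
            continue
        pos = start
        while pos + probe_len < end:
            probes.append((chrom, pos, pos + probe_len))
            pos += step
        probes.append((chrom, end - probe_len, end))  # final probe flush with the region end
    return probes


def target_size_mb(regions):
    """Total targeted sequence in megabases (Mb)."""
    return sum(end - start for _, start, end in regions) / 1e6


if __name__ == "__main__":
    targets = [("chr1", 0, 300), ("chr2", 0, 100)]  # hypothetical target regions
    for probe in tile_probes(targets):
        print(probe)
    print(target_size_mb(targets), "Mb")
```

With a step smaller than the probe length every base is covered by at least one probe; a larger step would leave gaps.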

Because many lncRNAs have not been previously annotated and are expressed at low levels, we employed a targeted RNA capture and sequencing strategy (CaptureSeq) to more robustly identify and characterize lncRNAs in the adult SVZ. With RNA CaptureSeq, cDNAs are hybridized to probe libraries tiled against the genomic regions of interest, eluted, and then sequenced ( Figure 4 A). Through this enrichment, the read coverage of targeted transcripts is dramatically increased (). Furthermore, by using a 454 GS FLX Titanium instrument for sequencing, we obtained longer reads, which improve the delineation of rare splice isoforms.

(C) CaptureSeq-derived reads correctly assembled known protein-coding gene Nr2f1 and identified four distinct TSSs for an lncRNA transcribed divergently from the Nr2f1 promoter. The syntenic region in human reveals a similar organization of CpG islands and divergent transcriptional start sites for noncoding transcripts. Genes derived from RefSeq are colored purple, and genes from Ensembl are red. See also Figure S4

(B) Isotigs assembled at the Pou3f3 locus revealed a distal transcriptional start site for a transcript that can be spliced into known noncoding RNA 2610017I09Rik.

(A) Schematic of RNA CaptureSeq procedure. We used the Cufflinks-based lncRNA assembly to define putative lncRNA loci and designed tiled probe libraries against these loci. The cDNA library was then hybridized to this biotin-labeled probe library, and after streptavidin purification, the enriched population of lncRNAs was sequenced by 454 (Roche) long-read chemistry.

Interestingly, some modules were also associated with human disease, notably Huntington’s disease (), Alzheimer’s disease, convulsive seizures, major depressive disorder, and various cancers () ( Figures S3 A–S3F). For instance, the striatal neuron module (salmon) correlated with a gene expression set misregulated in Huntington’s disease mouse models, suggesting that the 88 lncRNAs in this module may play a role in this neurodegenerative condition. Taken together, our coexpression analysis provides a comprehensive resource that annotates lncRNAs to specific neural cell types in vivo and to neurological disease states.

The dark red module ( Figure 3 E) was enriched for glial markers but also had a large number of known early neurogenic factors as prominent members ( Table S3 ). This module was specifically associated with the ventricular zone of the embryonic brain, which contains radial glia, the stem cells of the developing brain and precursors of the adult SVZ-NSCs. We additionally identified a module (red, Figure 3 F) specifically associated with the “stemness” transcriptional program and the cell cycle.

Using RNA-seq data from 22 samples ( Figure 2 A and Table S2 ), we constructed transcript coexpression networks composed of both mRNAs and lncRNAs. For the 56 modules of coexpressed transcripts, we performed enrichment analysis using gene sets from the Molecular Signatures Database () and other sources () to relate modules (described by “color,” Figures 3 A–3F) to specific adult neural cell types including cortical neurons (purple), striatal neurons (salmon), ependymal cells (green), and oligodendrocytes (grey60) ( Table S3 ).
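Each module is summarized by its eigengene, the first principal component of the module's standardized expression matrix. The pure-Python sketch below computes it as the unit-length first right singular vector (one value per sample) by power iteration on XᵀX, then flips its sign so it agrees with the module's average standardized expression. This is an illustrative implementation under those assumptions, not the network-analysis software used here, and eigengene scaling conventions differ between tools, so values match only up to scale.

```python
import math


def standardize(row):
    """Z-score a transcript's profile across samples (population SD)."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
    if sd == 0:
        raise ValueError("constant profile cannot be standardized")
    return [(x - mean) / sd for x in row]


def module_eigengene(module_rows, n_iter=200):
    """Unit-length first right singular vector (over samples) of a standardized module.

    Computed by power iteration on X^T X; the sign is aligned with the module's
    average standardized expression.
    """
    X = [standardize(row) for row in module_rows]  # transcripts x samples
    n_rows, n_samples = len(X), len(X[0])
    # Start inside the row space: an all-ones start would be annihilated because
    # every standardized row sums to zero.
    v = list(X[0])
    for _ in range(n_iter):
        u = [sum(a * b for a, b in zip(row, v)) for row in X]                          # X v
        w = [sum(u[g] * X[g][s] for g in range(n_rows)) for s in range(n_samples)]     # X^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    average = [sum(col) / n_rows for col in zip(*X)]
    if sum(a * b for a, b in zip(v, average)) < 0:
        v = [-x for x in v]
    return v


if __name__ == "__main__":
    # Hypothetical module of three transcripts across five samples.
    print(module_eigengene([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]))
```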

(A–F) Top of each panel: heat maps depicting expression levels for six modules of coexpressed transcripts (rows) in 22 samples (columns) representing various brain regions and cell lines. Samples are labeled as in Figure 2 . Red, increased expression; black, neutral expression; green, decreased expression. Middle of each panel: barplots of the values of the module eigengenes (), which correspond to the first principal component obtained by singular value decomposition of each module. Modules were characterized by performing enrichment analysis with known gene sets (see Table S3 and Supplemental Experimental Procedures ). Bottom of each panel: pie charts indicating the abundance of lncRNAs within each module. Module members are defined as all transcripts that were positively correlated with the module eigengene at p < 2.61 × 10 (see Supplemental Experimental Procedures ). See also Figure S3 and Table S3

To begin to infer functions for lncRNAs, we investigated the relationship between mRNA and lncRNA transcription by using gene coexpression analysis (GCA) to identify groups of transcripts, or “modules,” whose expression levels covary across different brain regions and developmental time points. For mRNAs, module membership distinguishes sets of genes that correspond to specific cell types and biological processes (), and a similar “guilt-by-association” approach has been used to assign putative functions to lncRNAs based on their coexpression with protein-coding genes ().
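Guilt-by-association membership can be sketched as follows: a transcript is assigned to every module whose eigengene it correlates with positively and significantly. The exact significance threshold and test behind the Figure 3 membership calls are not reproduced here; this sketch assumes a one-sided Fisher z-transform approximation for the p-value and a hypothetical alpha, and the profiles and module names are toy data.

```python
import math


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def one_sided_p(r, n):
    """Approximate P(R >= r) under zero correlation via the Fisher z-transform (needs n > 3)."""
    if r >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 0.5 * math.erfc(z / math.sqrt(2))


def module_members(profiles, eigengenes, alpha):
    """Map each transcript to the sorted list of modules it positively and significantly tracks."""
    membership = {}
    for name, profile in profiles.items():
        hits = []
        for module, eigengene in eigengenes.items():
            r = pearson(profile, eigengene)
            if r > 0 and one_sided_p(r, len(profile)) < alpha:
                hits.append(module)
        membership[name] = sorted(hits)
    return membership


if __name__ == "__main__":
    eigengenes = {"red": [1, 2, 3, 4, 5, 6, 7, 8], "blue": [8, 7, 6, 5, 4, 3, 2, 1]}  # toy
    profiles = {"lnc1": [1, 2, 3, 5, 4, 6, 7, 9], "lnc2": [1, -1, 1, -1, 1, -1, 1, -1]}  # toy
    print(module_members(profiles, eigengenes, alpha=0.01))
```

Allowing multiple memberships mirrors the legend's definition of members as all transcripts positively correlated with an eigengene, rather than a single hard assignment.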

To explore lncRNA expression patterns in multiple adult brain regions and embryonic forebrain development, we analyzed RNA-seq data of the six layers of the adult cortex (), adult whole prefrontal cortex (PFC), adult preoptic area (POA), whole embryonic day 15 (E15) brain (), and specific regions of the developing E14.5 cortex (ventricular zone, intermediate zone, and cortical plate) () ( Figure 2 A and Table S2 ). Unsupervised hierarchical clustering of expression profiles revealed region-specific and temporally related expression of both mRNAs and lncRNAs ( Figures 2 B and 2C). We calculated a specificity score for each transcript () and found that the mean score was 0.57 (SD 0.21) for lncRNAs, while it was 0.45 (SD 0.17) for mRNAs (p < 10, Wilcoxon rank-sum test); thus, lncRNAs exhibit greater brain region and temporal specificity than mRNAs, suggesting that they play important roles in the determination and/or function of specific neural cell types.
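A specificity score of this kind can be computed from each transcript's expression profile across samples. The sketch below assumes the Jensen-Shannon-based definition used in earlier lncRNA catalogs: the profile is normalized to a distribution, compared with a perfectly specific (one-hot) profile for each sample, and the score is the maximum over samples of 1 - sqrt(JSD), with base-2 logarithms so that it lies in [0, 1]. The precise definition applied here is given in the cited work, so treat this as an illustration rather than a reproduction of the reported values.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def specificity(expression):
    """Max over samples of 1 - sqrt(JSD(profile, one-hot profile for that sample))."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive total")
    p = [x / total for x in expression]
    best = 0.0
    for i in range(len(p)):
        one_hot = [1.0 if j == i else 0.0 for j in range(len(p))]
        jsd = max(0.0, js_divergence(p, one_hot))  # clamp tiny negative rounding error
        best = max(best, 1.0 - math.sqrt(jsd))
    return best


if __name__ == "__main__":
    # Hypothetical profiles across four samples.
    for profile in ([0, 0, 5, 0], [10, 1, 1, 1], [1, 1, 1, 1]):
        print(profile, round(specificity(profile), 3))
```

A transcript restricted to one sample scores 1, and uniform expression gives the minimum for that number of samples, which is what allows lncRNA and mRNA score distributions to be compared with a rank-sum test.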

(C) Hierarchical clustering results of lncRNAs expressed across all samples. The Pearson correlation coefficient was used as the distance metric. DG, dentate gyrus; STR, striatum; SVZ, subventricular zone; STR/SVZ, mixed dissection including both SVZ and striatal regions; OB, olfactory bulb; CTXA, cortical dissection layer 2/3; CTXB, cortical dissection layer 4; CTXC, cortical dissection layer 5; CTXD, cortical dissection layer 5; CTXE, cortical dissection layer 6; CTXF, cortical dissection layer 6b; POA, preoptic area; PFC, prefrontal cortex; E15, whole embryonic day 15 brain; VZ, ventricular zone of E14.5 cortex; SVZ/IZ, subventricular zone/intermediate zone of E14.5 cortex; CP, cortical plate of E14.5 cortex; ESC, cultured embryonic stem cells; NPCs, ESC-derived neural progenitor cells. See also Figure S2 and Table S2

To verify that the cDNA libraries of the SVZ and OB together represent a transcriptome enriched for adult neurogenesis, we first analyzed mRNA expression in the RNA-seq data. Differential gene expression identified 1,621 genes enriched >2-fold in the SVZ cDNA library as compared to the cDNAs from cells in the adjacent nonneurogenic striatum (76.4 million reads). As the primary site where NSCs and transit-amplifying cells proliferate, the SVZ was enriched for GO terms related to cell cycle and mitosis ( Figures S2 A and S2B). Neuroblasts migrate through the SVZ and into the OB, and, as expected, transcripts related to this migratory neuroblast stage of neurogenesis were enriched in these regions (). The SVZ/OB expression profile included transcription factors known to play key roles in adult neurogenesis, such as Dlx1, Dlx2, Ascl1, and Pax6 (). Furthermore, in situ hybridization (ISH) data from the Allen Brain Atlas () confirmed the regional expression of many of these SVZ/OB-enriched genes ( Figures S2 C and S2D), and the SVZ/OB transcriptional profile (923 genes) was enriched for GO terms related to cell migration, development, and neurogenesis ( Figure S2 E).
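The >2-fold enrichment criterion can be written as a simple ratio filter. The sketch assumes normalized expression values (e.g., FPKM) per gene for two regions and adds a pseudocount so that genes with very low or zero expression in the comparison region do not produce extreme ratios; the pseudocount of 1 and the gene names and values are hypothetical, and the differential expression method actually used is described in the Experimental Procedures.

```python
def enriched(region_a, region_b, min_fold=2.0, pseudocount=1.0):
    """Sorted genes whose expression in region_a exceeds region_b by more than min_fold.

    Inputs map gene -> normalized expression; genes absent from region_b are
    treated as zero there.
    """
    hits = []
    for gene, a in region_a.items():
        b = region_b.get(gene, 0.0)
        if (a + pseudocount) / (b + pseudocount) > min_fold:
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    svz = {"gene_a": 50.0, "gene_b": 100.0, "gene_c": 9.0}       # hypothetical values
    striatum = {"gene_a": 5.0, "gene_b": 90.0, "gene_c": 4.0}    # hypothetical values
    print(enriched(svz, striatum))
```

The choice of pseudocount matters most for lowly expressed transcripts such as many lncRNAs: a larger pseudocount makes the filter more conservative for them.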

The transcriptional start sites (TSSs) of some lncRNAs are proximal (<10 kb) to the promoters of protein-coding genes (), and we found that the TSSs of 2,265 lncRNAs (25.2%) in our catalog were located within 5 kb of a protein-coding gene promoter ( Figure S1 D). Gene ontology (GO) analysis with the genomic regions enrichment of annotations tool (GREAT) () revealed that these protein-coding neighbors are enriched for homeodomain-containing transcription factors, genes expressed in the brain, and genes that are typically repressed by Polycomb Repressive Complex 2 in ESCs ( Figure S1 E). While some lncRNAs were strongly coexpressed with their protein-coding neighbors, as a group they showed no obvious correlation ( Figure S1 F), indicating that expression of this subset of lncRNAs is not likely to be driven by local transcriptional activity of protein-coding genes.
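Finding lncRNA TSSs within 5 kb of a protein-coding promoter is a nearest-neighbor query on sorted coordinates. The sketch below indexes promoter positions per chromosome, uses binary search to find the closest promoter to each lncRNA TSS, and keeps those within the distance cutoff. It treats each promoter as a single TSS coordinate and ignores strand, which are simplifying assumptions; the coordinates and lncRNA names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_promoter_index(promoters):
    """Map chromosome -> sorted promoter (TSS) coordinates."""
    index = defaultdict(list)
    for chrom, pos in promoters:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def nearest_promoter_distance(index, chrom, tss):
    """Distance (bp) from tss to the closest promoter on the same chromosome, or None."""
    positions = index.get(chrom)
    if not positions:
        return None
    i = bisect.bisect_left(positions, tss)
    # The nearest promoter is either just before or just after the insertion point.
    neighbors = positions[max(0, i - 1):i + 1]
    return min(abs(tss - p) for p in neighbors)


def near_promoter(lncrna_tss, index, max_dist=5000):
    """Names of lncRNAs whose TSS lies within max_dist bp of a protein-coding promoter."""
    hits = []
    for name, chrom, tss in lncrna_tss:
        d = nearest_promoter_distance(index, chrom, tss)
        if d is not None and d <= max_dist:
            hits.append(name)
    return hits


if __name__ == "__main__":
    idx = build_promoter_index([("chr1", 10000), ("chr1", 50000), ("chr2", 2000)])
    lncs = [("lncA", "chr1", 14000), ("lncB", "chr1", 30000), ("lncC", "chr2", 2500)]
    print(near_promoter(lncs, idx))
```

Sorting once and searching with `bisect` keeps each query logarithmic in the number of promoters per chromosome.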

To substantiate the noncoding nature of our lncRNA candidates, we used the coding potential calculator () and found that over 80% of these transcripts have essentially no protein-coding potential ( Figure S1 A). Consistent with previous studies, lncRNAs were expressed at lower levels than protein-coding genes (2.49-fold difference; Mann-Whitney U, p < 0.0001) ( Figure S1 B), and their exons were less strongly conserved than protein-coding exons by PhastCons scores ( Figure S1 C).

We used Illumina-based sequencing to obtain paired-end reads of these cDNA libraries from the SVZ (229 million reads), OB (248 million reads), and DG (157 million reads). To broaden our lncRNA catalog, we also included RNA-seq data from embryonic stem cells (ESCs) and ESC-derived neural progenitor cells (ESC-NPCs) (). With this collection of over 800 million paired-end reads, we used Cufflinks () to perform ab initio transcript assembly. This method reconstructed a total of 150,313 multiexonic transcripts, of which 140,118 (93%) overlapped with known protein-coding genes. Our lncRNA annotation pipeline (see Figure 1 B and Experimental Procedures ) identified 8,992 lncRNAs encoded from 5,731 loci (see Table S1 available online). Of these, 6,876 (76.5%) were novel relative to RefSeq genes, 5,044 (56.1%) relative to UCSC known genes, and 3,680 (40.9%) relative to all Ensembl genes. Interestingly, 2,108 transcripts (23.4%) were uniquely recovered from our SVZ/OB/DG reads.

Because lncRNAs exhibit tissue-specific expression, previous mouse lncRNA databases were not likely comprehensive for lncRNAs involved in adult neurogenesis. Thus, we identified lncRNAs expressed in the adult brain neurogenic niches by employing an RNA-seq and ab initio transcriptome reconstruction approach. First, we generated cDNA libraries of polyadenylated RNA extracted from microdissected adult SVZ tissue, which contains NSCs, transit-amplifying cells, and young migratory neuroblasts. To include the transcriptome of later stages of neurogenesis and neuronal function, we also generated cDNA libraries from the OB. Furthermore, we generated cDNA libraries from microdissected adult dentate gyrus (DG), the other major adult neurogenic niche, which locally contains all cell types of an entire neuronal lineage. Figure 1 A shows a schematic of regions used for the cDNA libraries.
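The promoter-proximity criterion applied to the catalog (a lncRNA TSS lying within 5 kb of a protein-coding gene promoter, which held for 2,265 lncRNAs) can be illustrated with a minimal Python sketch. It assumes that each promoter is represented by a single protein-coding TSS coordinate and it ignores strand; the function names and toy coordinates are hypothetical and are not the study's annotation pipeline.

```python
from bisect import bisect_left

# Minimal sketch (assumption): each promoter is represented by a single
# protein-coding TSS coordinate, and strand is ignored.
PROXIMITY_BP = 5_000  # the 5 kb window used in the text


def nearest_distance(sorted_positions, pos):
    """Return the distance (bp) from pos to the closest sorted position, or None."""
    if not sorted_positions:
        return None
    i = bisect_left(sorted_positions, pos)  # first index with value >= pos
    candidates = []
    if i < len(sorted_positions):
        candidates.append(sorted_positions[i] - pos)
    if i > 0:
        candidates.append(pos - sorted_positions[i - 1])
    return min(candidates)


def promoter_proximal(lnc_tss, coding_tss, max_dist=PROXIMITY_BP):
    """IDs of lncRNAs whose TSS lies within max_dist of a coding TSS on the same chromosome."""
    by_chrom = {}
    for chrom, pos in coding_tss:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()  # bisect requires sorted input
    hits = []
    for lnc_id, chrom, pos in lnc_tss:
        d = nearest_distance(by_chrom.get(chrom, []), pos)
        if d is not None and d <= max_dist:
            hits.append(lnc_id)
    return hits


# Hypothetical toy coordinates (not data from this study).
coding = [("chr1", 10_000), ("chr1", 50_000), ("chr2", 7_000)]
lncs = [("lncA", "chr1", 12_500), ("lncB", "chr1", 30_000),
        ("lncC", "chr2", 1_500), ("lncD", "chr3", 100)]
print(promoter_proximal(lncs, coding))  # ['lncA']
```

Sorting each chromosome's coding TSSs once and using binary search keeps each lookup logarithmic, which matters when thousands of lncRNAs are scanned against a genome-wide annotation. A real analysis would more likely use strand-aware promoter windows and an interval tool such as `bedtools closest`.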

Discussion

We performed an in-depth analysis of lncRNA expression of adult SVZ-OB neurogenesis, an excellent in vivo model system for the study of multipotent stem cells and neural development. Our use of two high-throughput sequencing-based approaches for the study of the lncRNA transcriptome (RNA-seq and RNA CaptureSeq) provided complementary data sets that together allowed the identification of thousands of novel lncRNAs, confirmation of rare transcripts, and resolution of previously unappreciated complexity of lncRNA loci.

Like the loci of genes encoding key developmental transcription factors, a subset of lncRNA loci showed changes in chromatin state during lineage specification. By integrating these chromatin state maps with data from custom microarrays and FACS purification of the SVZ lineage, our online resource (http://neurosurgery.ucsf.edu/danlimlab/lncRNA) and files (Table S5) facilitate the identification of lncRNAs with potential roles in adult NSCs and in neural development. Notably, we found that Dlx1as is required selectively for the SVZ neuronal lineage, whereas Six3os appears to play a role in both neuronal and oligodendrocyte differentiation. These data indicate that lncRNAs can play key roles in the glial-neuronal lineage specification of multipotent adult stem cells.

Guttman, M., Donaghey, J., Carey, B.W., Garber, M., Grenier, J.K., Munson, G., Young, G., Lucas, A.B., Ach, R., Bruhn, L., et al. (2011). lincRNAs act in the circuitry controlling pluripotency and differentiation.

Lim, D.A., Huang, Y.-C., Swigut, T., Mirick, A.L., Garcia-Verdugo, J.M., Wysocka, J., Ernst, P., and Alvarez-Buylla, A. (2009). Chromatin remodelling factor Mll1 is essential for neurogenesis from postnatal neural stem cells.

Bertani, S., Sauer, S., Bolotin, E., and Sauer, F. (2011). The noncoding RNA Mistral activates Hoxa6 and Hoxa7 expression and stem cell differentiation by recruiting MLL1 to chromatin.

Wang, K.C., Yang, Y.W., Liu, B., Sanyal, A., Corces-Zimmerman, R., Chen, Y., Lajoie, B.R., Protacio, A., Flynn, R.A., Gupta, R.A., et al. (2011). A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression.

A recent model of lncRNA action suggests that lineage-specific lncRNAs become activated during differentiation and guide histone modifications that create cell type-specific transcriptional programs (Guttman et al., 2011). MLL1 is a trithorax group (trxG) chromatin-modifying factor that is enriched at Dlx2 during SVZ NSC differentiation and is required for proper Dlx2 expression (Lim et al., 2009); however, how MLL1 is targeted to Dlx2 is not known. Interestingly, in mouse ESCs, the lncRNA Mistral directly binds MLL1 and recruits it to Hoxa6 and Hoxa7 (Bertani et al., 2011), and in human fibroblasts the lncRNA HOTTIP recruits MLL1 to distal HOXA genes through an interaction with WDR5 (Wang et al., 2011). Our work provides a useful resource for identifying lncRNAs that might similarly target chromatin modifiers to neurogenic loci. For instance, lncRNAs that immunoprecipitate with chromatin modifiers could be identified by hybridization to the lncRNA microarray and then filtered online by multiple other criteria (e.g., enrichment in neuroblasts, upregulation during neurogenesis, bivalency in ESCs, and repression in MEFs).
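Filtering candidates by several criteria at once, as proposed for lncRNAs that immunoprecipitate with chromatin modifiers, amounts to intersecting per-lncRNA boolean annotations. The following Python sketch illustrates that logic only; the record fields (`ip_enriched`, `neuroblast_enriched`, and so on) and the toy entries are hypothetical and do not reflect the schema or contents of the online resource.

```python
from dataclasses import dataclass


# Hypothetical record layout: these boolean fields mirror the filter criteria
# named in the text but are NOT the actual schema of the online resource.
@dataclass
class LncRecord:
    name: str
    ip_enriched: bool           # enriched when immunoprecipitated RNA is hybridized to the array
    neuroblast_enriched: bool
    up_in_neurogenesis: bool
    bivalent_in_esc: bool
    repressed_in_mef: bool


def filter_candidates(records, **required):
    """Keep records whose named boolean fields all equal the requested values."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in required.items())]


# Hypothetical toy records (not results from this study).
records = [
    LncRecord("lnc_A", True, True, True, True, True),
    LncRecord("lnc_B", True, False, True, True, False),
    LncRecord("lnc_C", False, True, True, True, True),
]
hits = filter_candidates(records, ip_enriched=True, neuroblast_enriched=True,
                         bivalent_in_esc=True)
print([r.name for r in hits])  # ['lnc_A']
```

Passing criteria as keyword arguments lets any subset of filters be combined without writing a separate function for each query; an unknown field name raises `AttributeError` rather than being silently ignored.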

Our analysis of chromatin state maps and transcript expression suggests that histone modifications correlate with lncRNA expression in a manner similar to that of protein-coding genes. Some lncRNA loci were bivalent in both ESCs and SVZ-NSCs, and many of these loci became transcriptionally active in SVZ neuroblasts, supporting their candidacy as key determinants of neurogenesis. In SVZ-NSC monolayer cultures, the Dlx1as locus was bivalent, and H3K27me3-mediated repression decreased during neuronal differentiation (Figure S7D), correlating with the upregulation of Dlx1as transcription (Figure S7E). We also found enrichment of the H3K27me3-specific demethylase JMJD3 at the Dlx1as locus (Figure S7F), suggesting that active removal of repressive histone modifications contributes to lncRNA expression. Overall, our data raise the possibility that lncRNA loci, like protein-coding genes, are targeted by chromatin-modifying factors that have critical roles in development.
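The chromatin-state reasoning above (a locus that carries both H3K4me3 and H3K27me3, i.e., is bivalent, and then loses H3K27me3 as it becomes active) can be sketched as a simple classifier. This is a minimal illustration assuming per-locus fold enrichment over input for each mark and an arbitrary 2.0-fold cutoff; the function names and values are hypothetical and do not represent the study's ChIP-seq analysis.

```python
from enum import Enum


class ChromatinState(Enum):
    ACTIVE = "H3K4me3 only"
    BIVALENT = "H3K4me3 and H3K27me3"
    REPRESSED = "H3K27me3 only"
    UNMARKED = "neither mark"


def call_state(k4me3, k27me3, threshold=2.0):
    """Classify a locus from fold enrichment over input for each mark.

    The fixed 2.0-fold threshold is an illustrative assumption; real analyses
    typically rely on peak callers and replicate statistics.
    """
    k4 = k4me3 >= threshold
    k27 = k27me3 >= threshold
    if k4 and k27:
        return ChromatinState.BIVALENT
    if k4:
        return ChromatinState.ACTIVE
    if k27:
        return ChromatinState.REPRESSED
    return ChromatinState.UNMARKED


def resolves_bivalency(profile, before, after):
    """True if a locus is bivalent at `before` and H3K4me3-only (active) at `after`."""
    return (call_state(*profile[before]) is ChromatinState.BIVALENT
            and call_state(*profile[after]) is ChromatinState.ACTIVE)


# Hypothetical (H3K4me3, H3K27me3) fold enrichments for one locus.
profile = {"undifferentiated": (4.0, 3.5), "differentiated": (5.5, 1.1)}
print(resolves_bivalency(profile, "undifferentiated", "differentiated"))  # True
```

Applied across many loci and time points, a rule like `resolves_bivalency` flags candidates whose H3K27me3 is lost as differentiation proceeds, which is the pattern described for Dlx1as; the actual calls in the study come from its ChIP-seq data, not from this sketch.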

Although this study aimed to be as comprehensive as possible, some lncRNAs important for SVZ neurogenesis may not have been identified. The initial sequencing experiments were performed on microdissected tissues that contain several cell types, so even at our sequencing depth, transcripts expressed at low copy number in a small number of cells might not be detected. Despite this potential shortcoming, we were still able to identify thousands of previously unannotated lncRNA transcripts. Furthermore, our initial catalog proved sufficient for our primary objective, which was to integrate complementary data analysis strategies and experimental methods to identify lncRNA expression patterns that are coherent within an in vivo experimental model system.

The role of lncRNAs in development and disease is in the early stages of investigation, and our analysis of the SVZ lineage provides a resource for moving this research into in vivo studies. More broadly, this work presents a generalizable workflow for the identification and categorization of novel transcripts, both coding and noncoding.