F0 (hs119, hs121, hs122) or F1 (hs123) generation embryos at E12.5 were screened for fluorescent reporter transgene expression using an MZ16 stereomicroscope (Leica Microsystems) fitted with a SOLA SM 365 light source (Lumencor). Whole forebrain tissue was dissected from all embryos whose fluorescent reporter activity matched the previously described enhancer activity pattern. For each enhancer, forebrain tissue from at least two independent founders or founder lines was pooled. Room-temperature Accumax Cell Dissociation Solution (Innovative Cell Technologies) was added to the tissue, which was then briefly pipetted to generate a single-cell suspension. Cells were immediately passed through a 40 μm strainer to remove undissociated tissue. Samples were kept on ice throughout collection and preparation to prevent RNA degradation. For each experiment, human HEK293T/17 cells (ATCC) were spiked in to a final concentration of 2.5% to serve as an internal quality control for doublet frequency. The final cell concentration was then adjusted to 50 cells/μl.

Single-cell RNA sequencing was performed using the Drop-Seq method (Macosko et al., 2015) according to protocol version 3.1 from http://mccarrolllab.com/dropseq/. Briefly, cells and ChemGenes beads (Lot 011416B) were captured in aqueous droplets containing lysis buffer using a microfluidic device (Nanoshift). Droplets were recovered and broken by adding 6X SSC and perfluorooctanol (Sigma), followed by mixing and centrifugation. Beads released from the droplets were recovered, washed, and suspended in reverse transcriptase buffer. Captured mRNAs were reverse transcribed using Maxima H Minus Reverse Transcriptase (Thermo Fisher Scientific). Beads were then washed, and excess bead primers were removed with Exonuclease I (NEB). Beads were washed again and counted. Template switching PCR was performed using KAPA HiFi HotStart ReadyMix (Kapa Biosystems). cDNAs were captured and purified using AMPure XP beads (Beckman Coulter), and library quality was assessed using a BioAnalyzer High Sensitivity Assay (Agilent). Sequencing libraries were generated using the Nextera XT kit (Illumina), purified with AMPure beads, and sequenced paired-end on an Illumina HiSeq2500.
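The spike-in and dilution arithmetic above can be illustrated with a short calculation. This is only a sketch: the function name `spike_in_plan` and the example cell count are hypothetical, and it assumes that the 2.5% refers to the fraction of human cells in the final mixed suspension.

```python
def spike_in_plan(mouse_cells, spike_fraction=0.025, target_cells_per_ul=50.0):
    """Return (human cells to add, total cells, final volume in microliters).

    Assumption: spike_fraction is the share of human cells in the FINAL mix,
    so human / (human + mouse) = spike_fraction.
    """
    if mouse_cells < 0:
        raise ValueError("mouse_cells must be non-negative")
    if not 0.0 <= spike_fraction < 1.0:
        raise ValueError("spike_fraction must be in [0, 1)")
    if target_cells_per_ul <= 0:
        raise ValueError("target_cells_per_ul must be positive")

    # Solve human / (human + mouse) = f for the number of human cells.
    human_cells = spike_fraction * mouse_cells / (1.0 - spike_fraction)
    total_cells = mouse_cells + human_cells
    # Volume at which the pooled suspension reaches the target concentration.
    final_volume_ul = total_cells / target_cells_per_ul
    return human_cells, total_cells, final_volume_ul


# Hypothetical example: 390,000 dissociated mouse forebrain cells.
human, total, volume = spike_in_plan(390_000)
print(f"add {human:.0f} HEK293T/17 cells, dilute {total:.0f} cells to {volume:.0f} ul")
```

With these hypothetical inputs, roughly 10,000 human cells bring the mix to 2.5% human, and the 400,000 pooled cells would be diluted to about 8 ml to reach 50 cells/μl.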

Macosko E.Z., Basu A., Satija R., Nemesh J., Shekhar K., Goldman M., Tirosh I., Bialas A.R., Kamitaki N., Martersteck E.M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.

Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. (2013). STAR: ultrafast universal RNA-seq aligner.

Wang B., Zhu J., Pierson E., Ramazzotti D., Batzoglou S. (2017). Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning.

Ester M., Kriegel H.P., Sander J., Xu X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise.

Finak G., McDavid A., Yajima M., Deng J., Gersuk V., Shalek A.K., Slichter C.K., Miller H.W., McElrath M.J., Prlic M., et al. (2015). MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data.

Benjamini Y., Hochberg Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing.

Sing T., Sander O., Beerenwinkel N., Lengauer T. (2005). ROCR: visualizing classifier performance in R.

Motenko H., Neuhauser S.B., O’Keefe M., Richardson J.E. (2015). MouseMine: a new data warehouse for MGI.

Chen Y.-J.J., Friedman B.A., Ha C., Durinck S., Liu J., Rubenstein J.L., Seshagiri S., Modrusan Z. (2017). Single-cell RNA sequencing identifies distinct mouse medial ganglionic eminence cell types.

Kiselev V.Y., Kirschner K., Schaub M.T., Andrews T., Yiu A., Chandra T., Natarajan K.N., Reik W., Barahona M., Green A.R., Hemberg M. (2017). SC3: consensus clustering of single-cell RNA-seq data.

Raw data analysis and digital expression quantification were carried out using the published Drop-Seq pipeline, version 1.11. DetectBeadSynthesisErrors and DigitalExpression were run with NUM_CORE_BARCODES set to 10,000. Alignment was performed using STAR version 2.4.1d (Dobin et al., 2013). Custom genomes and transcriptomes were generated using iGenomes annotations for mm10 and hg19, along with the sequences of the transgenes mCherry and GFP (SUN1-sfGFP). All downstream analyses were performed in the statistical computing environment R v.3.3.1 (www.r-project.org).

First, for each of the five libraries, expression counts for each transcript were retrieved for the top 10,000 STAMPs (single-cell transcriptomes attached to microparticles). Species purity was then determined from the spiked-in human cells using a 90% cutoff: a STAMP was assigned to human or mouse if 90% or more of its transcripts came from that species, and was otherwise considered a doublet. Next, starting from the 50 STAMPs with the highest number of detected transcripts, the doublet rate was estimated (Macosko et al., 2015). The 51st STAMP was then added and the doublet rate recalculated, and so forth up to the 10,000th STAMP. A threshold of 10% doublet rate was applied to the resulting curve, which identified a set of high-quality STAMPs. A further threshold on the minimum number of detected transcripts was applied, so that STAMPs with fewer than 1,000 detected molecules were discarded. Results from the five libraries were then merged. Digital expression for each STAMP was normalized to transcripts per million (TPM). Expression of the two transgenes (mCherry, GFP) was excluded from all subsequent clustering steps. Genes with detectable expression in 10 or fewer STAMPs were also discarded. Dimensionality reduction was performed using the single-cell interpretation via multikernel learning approach (SIMLR; Wang et al., 2017).
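The STAMP filtering steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: all function names are hypothetical, the doublet rate is taken to be the observed fraction of mixed-species STAMPs among the top n (the published estimate may additionally account for same-species doublets), the 10% threshold is applied at the first point where the cumulative curve exceeds it, and TPM is computed as a simple per-million scaling of molecule counts.

```python
def classify_stamp(human_count, mouse_count, purity_cutoff=0.90):
    """Label one STAMP as 'human', 'mouse', or 'doublet' by species purity."""
    total = human_count + mouse_count
    if total <= 0:
        raise ValueError("STAMP has no detected transcripts")
    if human_count / total >= purity_cutoff:
        return "human"
    if mouse_count / total >= purity_cutoff:
        return "mouse"
    return "doublet"


def rank_stamps(stamps):
    """Sort (human_count, mouse_count) pairs by total transcripts, highest first."""
    return sorted(stamps, key=lambda s: s[0] + s[1], reverse=True)


def doublet_rate_curve(stamps, start=50):
    """Cumulative doublet rate for the top n STAMPs, for n = start, start+1, ..."""
    curve = []
    doublets = 0
    for n, (human, mouse) in enumerate(rank_stamps(stamps), start=1):
        if classify_stamp(human, mouse) == "doublet":
            doublets += 1
        if n >= start:
            curve.append((n, doublets / n))
    return curve


def high_quality_count(curve, max_rate=0.10):
    """Largest n reached before the curve first exceeds max_rate (assumed rule)."""
    kept = 0
    for n, rate in curve:
        if rate > max_rate:
            break
        kept = n
    return kept


def select_stamps(stamps, start=50, max_rate=0.10, min_transcripts=1000):
    """Apply the doublet-rate cutoff, then the minimum-transcript filter."""
    n_keep = high_quality_count(doublet_rate_curve(stamps, start), max_rate)
    return [s for s in rank_stamps(stamps)[:n_keep] if s[0] + s[1] >= min_transcripts]


def to_tpm(counts):
    """Scale a {gene: molecule count} dict so that the values sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: count * 1e6 / total for gene, count in counts.items()}


# Hypothetical toy library: 48 clean mouse STAMPs followed by 12 mixed STAMPs.
toy = [(0, 5000 - i) for i in range(48)] + [(2000, 2000)] * 2 + [(1500, 1500)] * 10
print(len(select_stamps(toy)), "STAMPs retained")
```

In the toy library, the cumulative doublet rate is 4% for the top 50 STAMPs and first exceeds 10% at the 54th, so the top 53 are retained before the minimum-transcript filter is applied.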
The STAMPs were then clustered in the resulting two-dimensional space using density-based clustering (DBSCAN; Ester et al., 1996). The dbscan R package was used, with eps set to 3 and minPts set to 25. MAST (Finak et al., 2015) was then run to detect differentially expressed genes. The two-sample likelihood ratio test implemented in the LRT function of the MAST R package was used to identify marker genes for each cluster. Briefly, for a given cluster, each STAMP was flagged as either belonging or not belonging to it. Genes identified as upregulated in the cluster at an FDR ≤ 0.05 (Benjamini-Hochberg correction; Benjamini and Hochberg, 1995) were classified as markers for the cluster. More stringent sets of marker genes were determined by filtering these lists on the area under the curve (AUC), an estimate of how well a given gene predicts that a cell belongs to a given cluster. AUCs were calculated using the ROCR R package (Sing et al., 2005). For a given cluster, stringent markers were defined as the genes identified by MAST (see above) that also showed an AUC ≥ 0.6. Enriched pathways and gene ontologies were identified using MouseMine (Motenko et al., 2015). Cell type cluster identities were determined by examining the enriched gene ontology lists and by comparing the marker gene set for each cell type to marker genes used previously for forebrain (such as Chen et al., 2017). The identification of the cell types in which each enhancer is active was robust to the choice of clustering algorithm, as similar analyses using the k-means based SC3 method (Kiselev et al., 2017) resulted in identical conclusions.
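The marker-selection logic described above (FDR ≤ 0.05 followed by AUC ≥ 0.6) can be sketched as follows. This is a minimal Python illustration and not the R code used in the study: the per-gene p-values are assumed to come from an upstream test such as the MAST likelihood ratio test, "upregulated" is approximated here by a higher mean expression inside the cluster, the AUC is computed with a rank-based pairwise comparison rather than with ROCR, and all function and gene names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auc(inside, outside):
    """Probability that an in-cluster cell outranks an out-of-cluster cell.

    Ties count as one half, which matches the area under an ROC curve
    built from the same expression values.
    """
    if not inside or not outside:
        raise ValueError("both groups must contain at least one cell")
    wins = 0.0
    for a in inside:
        for b in outside:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(inside) * len(outside))


def stringent_markers(gene_stats, fdr_cutoff=0.05, auc_cutoff=0.6):
    """Keep genes that pass the FDR filter, are upregulated, and pass the AUC filter.

    gene_stats maps gene -> {"p": p-value, "inside": [...], "outside": [...]}.
    """
    genes = list(gene_stats)
    fdr = benjamini_hochberg([gene_stats[g]["p"] for g in genes])
    markers = []
    for gene, q in zip(genes, fdr):
        inside = gene_stats[gene]["inside"]
        outside = gene_stats[gene]["outside"]
        upregulated = sum(inside) / len(inside) > sum(outside) / len(outside)
        if q <= fdr_cutoff and upregulated and auc(inside, outside) >= auc_cutoff:
            markers.append(gene)
    return markers


# Hypothetical toy input: only GeneA is significant, upregulated, and discriminative.
toy_stats = {
    "GeneA": {"p": 0.001, "inside": [5, 6, 7], "outside": [0, 1, 2]},
    "GeneB": {"p": 0.001, "inside": [1, 2, 3], "outside": [1, 2, 3]},
    "GeneC": {"p": 0.5, "inside": [5, 6, 7], "outside": [0, 1, 2]},
}
print(stringent_markers(toy_stats))
```

The pairwise AUC is quadratic in the number of cells, which is acceptable for a sketch; a rank-sum formulation would be preferable for thousands of STAMPs.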