Somatic LINE-1 (L1) retrotransposition during neurogenesis is a potential source of genotypic variation among neurons. As a neurogenic niche, the hippocampus supports pronounced L1 activity. However, the basal parameters and biological impact of L1-driven mosaicism remain unclear. Here, we performed single-cell retrotransposon capture sequencing (RC-seq) on individual human hippocampal neurons and glia, as well as cortical neurons. An estimated 13.7 somatic L1 insertions occurred per hippocampal neuron and carried the sequence hallmarks of target-primed reverse transcription. Notably, hippocampal neuron L1 insertions were specifically enriched in transcribed neuronal stem cell enhancers and hippocampus genes, increasing their probability of functional relevance. In addition, bias against intronic L1 insertions sense oriented relative to their host gene was observed, perhaps indicating moderate selection against this configuration in vivo. These experiments demonstrate pervasive L1 mosaicism at genomic loci expressed in hippocampal neurons.

Despite extensive evidence of somatic retrotransposition in the brain, many fundamental aspects of the phenomenon remain unclear. The rate of L1 mobilization in the neuronal lineage is, for instance, a major unresolved issue. Estimates range from <0.1 to 80 somatic L1 insertions per neuron. Experiments using engineered L1 reporter systems have shown that L1 mobilization is likely to occur via TPRT in neuronal precursor cells and may be altered by neurological disease. However, it is unknown whether endogenous L1 retrotransposition in hippocampal neurons adheres to these predictions. Most importantly, it is unclear whether somatic L1 insertions influence neuronal phenotype or endow carrier neuronal progenitor cells with a selective advantage or disadvantage in vivo. To address these questions, we applied single-cell retrotransposon capture sequencing (RC-seq) to hippocampal neurons and glia, as well as cortical neurons, and found that L1 retrotransposition is a major endogenous driver of somatic mosaicism in the brain.

Of approximately 500,000 LINE-1 (L1) copies present in the human genome, only ∼100 members of the L1-Ta and pre-Ta subfamilies remain transposition-competent. L1 mobilization primarily occurs via target-primed reverse transcription (TPRT), a process catalyzed in cis by two proteins, ORF1p and ORF2p, translated from the bicistronic 6 kb L1 mRNA. L1 ORF2p encodes endonuclease (EN) and reverse transcriptase (RT) activities essential to L1 retrotransposition and also responsible for trans mobilization of Alu and SVA retrotransposons. A typical TPRT-mediated L1 insertion involves a degenerate L1 EN recognition motif (5′-TT/AAAA), an L1 poly-A tail and, crucially, produces target site duplications (TSDs). Various host defense mechanisms suppress L1 activity, including via methylation of the CpG-rich L1 promoter. Neural progenitors and other multipotent cells can nonetheless permit L1 promoter activation, a pattern accentuated in the hippocampus, likely due to its incorporation of the neurogenic subgranular zone. This coincidence of neurogenesis, L1 activity, and mosaicism has elicited speculation that L1 mobilization could impact cognitive function rooted in the hippocampus.

The extent to which the genome of one cell differs from that of any other cell from the same body is unclear. DNA replication errors, mitotic recombination, aneuploidy, and transposable element activity can cause somatic mosaicism during ontogenesis and senescence. In humans, the consequences of somatic mosaicism are most apparent in disease, including cancer and developmental syndromes. The impact of mosaicism among normal cells is relatively undefined beyond the notable exception of V(D)J recombination and somatic hypermutation intrinsic to lymphocyte antigen recognition. Reports of retrotransposition and other genomic abnormalities in animal neurons may therefore be important given that, as for immune cells, mosaicism is a plausible route to neuron functional diversification.

De novo germline L1 insertions can be highly deleterious to gene function, and commonly undergo purifying selection. The L1 ORF2 segment of sense oriented intronic L1 insertions particularly hinders RNA polymerase processivity. Hence, while sense and antisense intronic L1 insertions are assumed to occur with equal frequency in the germline, sense insertions are selected against more strongly and tend to be eliminated from the population. It follows that an estimated 43.3% of recent intronic L1-Ta insertions are sense oriented, versus only 34.1% of fixed L1-Ta insertions and 39.7% of all polymorphic L1-Ta insertions. By contrast, sense oriented intronic L1 insertions are not depleted in tumors. Among the control individuals examined here, we found that, as expected, 42/101 (41.6%) of intronic, polymorphic germline L1 insertions were sense oriented to their host gene. Surprisingly, 406/1,024 (39.6%) of intronic somatic L1 insertions detected in hippocampal neurons by single-cell RC-seq were also sense oriented, significantly less than the expected 50% (p < 0.0001, exact binomial test). This proportion was 47/136 (34.6%) and 166/503 (33.0%) for glia and cortical neurons, respectively. Adhering to the prevailing germline model of L1 evolutionary selection, we concluded that some somatic L1 insertions may arise sufficiently early in neurogenesis to impact neural progenitor cell fitness, as indicated by a depletion of sense oriented events in mature neurons and glia.
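The exact binomial comparison of 406/1,024 sense oriented insertions against an expected 50% can be illustrated with a short sketch. It assumes a two-sided test against a null proportion of exactly 0.5; the function name is illustrative, and this is not necessarily the software or tail convention used in the study.

```python
from fractions import Fraction
from math import comb


def exact_binomial_two_sided(k: int, n: int) -> float:
    """Two-sided exact binomial p value against a null proportion of 0.5."""
    # Under p = 0.5 every outcome j has probability C(n, j) / 2**n, so the
    # distribution is symmetric about n/2. The two-sided p value sums all
    # outcomes at least as far from n/2 as the observed count.
    distance = abs(2 * k - n)  # twice the distance of k from n/2
    tail = sum(comb(n, j) for j in range(n + 1) if abs(2 * j - n) >= distance)
    return float(Fraction(tail, 2 ** n))


# Counts taken from the text: sense oriented / all intronic somatic insertions.
sense, intronic = 406, 1024
p_value = exact_binomial_two_sided(sense, intronic)
print(f"sense fraction = {sense / intronic:.3f}, p = {p_value:.2e}")
```

The same function could be applied to the glial (47/136) and cortical neuron (166/503) counts reported above.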

Noting that euchromatin is also a signature of active enhancer elements, we intersected our list of somatic L1 insertions detected by hippocampus bulk RC-seq with an extensive FANTOM5 catalog of transcribed constitutive and cell-type specific enhancers defined by histone modifications and CAGE-delineated transcriptional activity. Globally, no substantial difference was observed in the rate of L1 insertions in all enhancers versus random expectation. However, of 47 cell-type specific enhancer sets, only neuronal stem cell enhancers were significantly enriched for somatic L1 insertions, compared with random expectation (p < 0.01, Fisher’s exact test, Bonferroni correction) and compared with the union of the remaining 46 cell-type specific enhancer sets (Figure 7C; p < 1.0 × 10⁻⁴, Fisher’s exact test). This enrichment was highest for L1 insertions within 100 nt of an enhancer, and was observed up to 500 nt from defined enhancer boundaries (Figure 7D). No enrichment was observed for astrocytes or for other cells not of the neuronal lineage, such as hepatocytes (Figure 7D). The smaller cohorts of somatic L1 insertions detected by single-cell RC-seq and liver bulk RC-seq were insufficient to perform meaningful statistical analyses of L1 insertional preference with regard to enhancers. Nonetheless, hippocampus bulk RC-seq indicated that neuronal stem cell-specific enhancers were the most highly enriched genome functional element in absolute terms (1.8-fold) for somatic L1 insertions. This reinforced the view that L1 mobilization during neurogenesis impacts regulatory and protein-coding loci specifically active in the hippocampus.

Open chromatin is a typical prerequisite for efficient transcription. With this in mind, we used single-molecule cap analysis of gene expression (CAGE) transcriptome profiling data from the FANTOM5 consortium to test whether genes strongly transcribed in the hippocampus were specifically enriched for somatic L1 insertions in hippocampal neurons. We first identified genes differentially upregulated in hippocampus, cortex, caudate nucleus, liver, or heart tissue surveyed by CAGE and then intersected these gene lists with the cohort of intragenic somatic L1 insertions detected by single-cell RC-seq applied to hippocampal neurons. Only those genes upregulated in hippocampus versus heart, and hippocampus versus liver, were significantly enriched (p < 0.05, Fisher’s exact test, Benjamini-Hochberg correction) for insertions (Figure 7A, Table S6). Somatic L1 insertions in hippocampal glia were also most enriched in genes upregulated in the hippocampus (p < 0.07). No enrichment was observed for cortical neurons while, intriguingly, the liver-specific L1 insertion cohort exhibited enrichment (p < 0.11) in genes upregulated in liver versus hippocampus (Figure 7A). Finally, we calculated the significance of enrichment for hippocampal neuron L1 insertions in genes upregulated in hippocampus while incrementally introducing putative artifacts described in Figure 5B. We found that statistical significance was no longer achieved once the dataset contained 15% or more artifacts (Figure 7B), hence demonstrating how experimental noise reduced in single-cell RC-seq analyses would otherwise obscure genome-wide enrichment. These experiments altogether reveal context-dependent, preferential L1 mobilization into strongly transcribed loci.
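For readers unfamiliar with the Benjamini-Hochberg correction applied to the pairwise tissue comparisons, a minimal sketch of the standard step-up adjustment follows. The raw p values are hypothetical placeholders and are not taken from Figure 7 or Table S6; the function name is likewise illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four tissue comparisons (illustration only).
raw = [0.004, 0.012, 0.20, 0.65]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

In this toy case only the first two comparisons remain below 0.05 after adjustment.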

(C) Of the transcribed cell-type specific enhancers defined by FANTOM5, only those of neuronal stem cells were enriched (observed/expected) for somatic L1 insertions detected by bulk hippocampus RC-seq, compared with other enhancers (p < 1.0 × 10⁻⁴, Fisher’s exact test, Bonferroni correction).

(B) Hippocampal somatic L1 insertions were statistically enriched in genes upregulated in hippocampus versus liver (black) or hippocampus versus heart (gray), as shown in (A). However, as previously filtered molecular chimeras (see Figure 5 B) were re-introduced into this dataset, enrichment rapidly became no longer significant.

(A) Somatic L1 insertions detected by single-cell RC-seq in hippocampal neurons and glia were enriched in genes differentially upregulated in hippocampus. Liver-specific L1 insertions detected by bulk RC-seq were moderately enriched in genes upregulated in liver. No enrichment was observed for cortical neurons. Color intensity is based on the absolute log2-transformed p value determined by Fisher’s exact test (Benjamini-Hochberg correction), with blue and orange colors representing depletion and enrichment, respectively. Note: in each matrix pairwise comparison, the more highly expressed tissue is on the y axis.

Substrate DNA chromatinization modulates L1 EN target site nicking in vitro (). As such, dynamic changes to chromatin state during neurogenesis may impact the associated genome-wide pattern of L1 mobilization. An intersection of somatic L1 insertion sites detected by hippocampus bulk RC-seq with RefSeq gene coordinates revealed significant (p < 1.0 × 10, Fisher’s exact test, Bonferroni correction) depletion for insertions in exons and promoters versus random sampling and significant (p < 3.8 × 10) enrichment for introns versus polymorphic insertions ( Table S3 ). Exons and introns carrying gene ontology (GO) terms relevant to neurobiology were however enriched for somatic L1 insertions ( Tables S4 and S5 ) compared with random sampling performed by gene identifier or by genomic coordinate (p < 4.5 × 10and p < 0.03, respectively, Fisher’s exact test, Benjamini-Hochberg correction). The latter result indicated enrichment for L1 insertions in genes expressed in the brain, despite taking into account that their length is on average >50% greater than that of other genes. By considerable margin, the most enriched GO term found ( Table S5 ) was “regulation of synapse maturation” (p < 1.7 × 10, Fisher’s exact test, Benjamini-Hochberg correction). Genome-wide patterns for somatic L1 insertions detected in glia and neurons by single-cell RC-seq typically corroborated those found by bulk RC-seq, including enrichment in introns and depletion from promoters and exons ( Table S3 ) and even stronger enrichment in neurobiology genes annotated by GO term ( Tables S4 and S5 ). Intriguingly, in AGS-1 hippocampal neurons we did not observe enrichment for L1 insertions in neurobiology genes ( Table S4 ), whereas enrichment was observed for control hippocampal neurons, even if each individual was analyzed separately. 
As a control experiment, from the liver bulk RC-seq data we identified a set of 175 potential liver-specific L1 insertions (see Extended Experimental Procedures ) that collectively presented a clear L1 EN consensus motif ( Figure S6 D) and, owing to the sensitivity of bulk RC-seq, were unlikely to represent incorrectly annotated polymorphic L1 insertions ( Table S1 ). Notably, these liver-specific L1 insertions exhibited no enrichment for neurobiology genes ( Table S4 ). We concluded that somatic L1 retrotransposition in neural cells preferentially occurs into the euchromatic regions of the genome contributing to neurobiology.

As the 13 total somatic L1 insertions detected by single-cell RC-seq and validated by PCR generally followed the TPRT model, we next assessed whether somatic L1 insertions detected by bulk RC-seq also carried TPRT signatures. RC-seq separately applied to DNA extracted from the four control hippocampus samples elucidated 318,866 putative somatic L1 insertions (Table S1). Again exploiting L1-genome junction resolution by RC-seq reads (Figures 6A, 6B, and S2), we found a strong enrichment for the L1 EN motif (Figure 6C), a typical TSD size range of 5–35 nt (Figures 6D and S6) and a median L1 poly-A tail length of 33 nt for somatic L1 integration sites identified by bulk RC-seq. We also identified a substantial group of insertions with TSDs >40 bp in length (Figure S6). Thus, single-cell RC-seq and RC-seq applied to bulk DNA both elucidated the hallmark sequence features of TPRT-mediated retrotransposition.

Expected values for (A) and (B) were calculated by randomizing sense and antisense RC-seq read cluster genomic coordinates, to ascertain how many overlapping clusters in the opposing orientation and detecting opposite ends of an L1 insertion were found, using the same bioinformatics process as used for observed clusters. Expected values for (C) were calculated by random sampling of genomic coordinates and searching for the nearest upstream L1 EN motif, again following the same string matching process as for observed values. Note: the corresponding TSD size distribution for liver somatic L1 insertions detected at only their 5′ L1-genome junction contained insufficient data (n = 7) to make a meaningful comparison with hippocampal somatic L1 insertions.

(A) A 6 kb L1-Ta element incorporates 5′ and 3′ UTRs and two ORFs. ORF2p presents EN and RT domains. Methylation of a CpG island present in the 5′ UTR regulates L1 promoter activity. The locations of two capture probes used by RC-seq are indicated below the L1. Note: TSDs and probes are not drawn to scale. See also Figure S2

Recent qPCR-based estimates of L1 CNV in human tissue, as well as in vitro L1 reporter assays, indicate L1 mobilization may be pronounced in a range of neurodevelopmental and psychiatric diseases, including Aicardi-Goutières syndrome (AGS). AGS is a rare, severe neurodevelopmental condition, characterized by mutations in several genes thought to inhibit reverse transcription, including SAMHD1. To address whether SAMHD1 deficiency in AGS patients increases neuronal L1 mobilization, we first applied bulk RC-seq to the post-mortem hippocampus and fibroblasts of an AGS patient (identifier AGS-1) carrying two loss-of-function SAMHD1 mutations. We then performed single-cell RC-seq upon 21 neuronal nuclei from AGS-1 hippocampus and identified 373 putative somatic L1 insertions (Figures 4C and S5), leading to a true positive mean estimate of 8.0 insertions per AGS-1 neuron. This figure was significantly (p < 0.03, two-tailed t test, df = 112) lower than the 13.7 somatic L1 insertions found for control hippocampal neurons. A more significant difference was observed when AGS-1 neurons were compared only with the age (18 years) and gender (female) matched hippocampal neurons of CTRL-36 (p < 0.0001, two-tailed t test, df = 44). As a corollary, L1 qPCR also indicated significantly lower (p < 0.002, two-tailed t test, df = 23) L1 copy number in AGS-1 hippocampus versus controls (Figure 4D). Finally, the results of the L1 CNV assay were strongly correlated (R = 0.93) with the mean somatic L1 insertion frequencies estimated by single-cell RC-seq (Figure 4E). We therefore concluded that L1 mobilization was unlikely to be elevated in AGS-1 hippocampus.

PCR validation including TSD discovery underpins accurate calculation of L1 mobilization frequency and reflects experimental veracity independent of methodology. It is therefore notable that, at this stringency, Evrony et al. reported a PCR validation rate of 1/96 and a consequential paucity of L1 activity. Two key technical considerations may explain our discrepant findings. First, RC-seq reads fully span L1-genome junctions (Figure S2), enabling bioinformatic identification of molecular chimeras before PCR validation. The earlier work by contrast followed a design that typically did not resolve L1-genome junctions, prohibiting computational removal of chimeric reads. Instead, the authors maintained that artifacts, including those generated by WGA and Illumina library preparation, should present lower read depth than genuine L1 insertions, and essentially adhered to the same principle in a very recent study applying WGS to a smaller number of neurons. This assumption is crucial as, at least in single-cell RC-seq libraries, putative chimeras are disproportionately likely to amplify efficiently and accrue high read depth (Figures 5A and 5B). Second, Evrony et al. selected candidates for PCR validation effectively as a function of high read count and not at random (Figure 5C). This approach would strongly enrich for artifacts if applied to single-cell RC-seq data (Figure 5B). It follows that, without the capacity to filter artifacts a priori, the previous study resolved numerous molecular chimeras after PCR and capillary sequencing of putative L1 insertions, substantially reducing the reported validation rate. By contrast, we selected PCR validation candidates at random (Figure 5D). These factors plausibly explain why our validation rate of 9/20 (45.0%) was significantly higher than the rate of 1/96 (1.0%) reported by the earlier work (p < 1 × 10, chi-square test, df = 1), as well as the disparate estimates of somatic L1 retrotransposition made by each study.
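The comparison of the two validation rates (9/20 versus 1/96) can be sketched as a Pearson chi-square test on a 2 × 2 table with one degree of freedom. This sketch assumes no continuity correction, which the text does not specify, so the resulting p value is indicative only; with expected counts this small, the chi-square approximation is also rough. The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and p value (df = 1) for [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shortcut formula for a 2 x 2 table, without continuity correction.
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p


# Rows: this study (9 validated, 11 not); earlier work (1 validated, 95 not).
stat, p = chi_square_2x2(9, 11, 1, 95)
print(f"chi-square = {stat:.2f}, p = {p:.1e}")
```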

(C) Distribution of read peak height for L1 insertions selected for validation by Evrony et al. The L1 insertion successfully validated by TSD discovery is colored black. The remaining insertions not validated to this standard are colored red.

(B) As for (A), except for all single-cell RC-seq data presented here. Peaks were annotated as chimeric or as likely genuine L1 insertions by sequence analysis of RC-seq reads.

A recent single-cell genomic analysis of 300 cortex and caudate nucleus pyramidal neurons elucidated <0.1 somatic L1 insertions per cell, and concluded that L1 was not a major driver of neuronal diversity. However, the biological or technical reasons for such disparate results compared with prior data from the hippocampus were unclear. We therefore performed single-cell RC-seq upon 35 NeuN+ nuclei isolated from CTRL-42, CTRL-45 and CTRL-55 cortex tissue, including seven pyramidal neurons, and identified 1,262 putative somatic L1 insertions (Figures 4B and S5). This provided a true positive mean estimate of 16.3 insertions per cortical neuron, a figure higher than for hippocampal neurons, but not significantly different. An estimated 10.7 insertions occurred per cortex pyramidal neuron, a rate substantially lower than for the remaining cortical neurons, although the difference fell short of statistical significance (p < 0.16, two-tailed t test, df = 33). These data elucidate L1 mosaicism in cortical neurons and exclude a biological explanation for inconsistency with the previous study.

Prior in vitro experiments based on an engineered L1 reporter indicated that glia may support far less L1 mobilization than neurons. To evaluate glial lineage endogenous L1 retrotransposition in vivo, we performed single-cell RC-seq upon 22 glial nuclei (NeuN/Ki67) isolated from CTRL-42, CTRL-45, and CTRL-55 hippocampi, and detected 316 putative somatic L1 insertions (Figures 4A and S5). This produced a mean true positive estimate of 6.5 insertions per glial cell, based on the PCR validation rate determined for hippocampal neurons (45.0%). This rate was 52.6% lower than the estimated 13.7 insertions for hippocampal neurons, a significant difference (p < 0.005, two-tailed t test, df = 112). Interestingly, four insertions were found in both glial and neuronal cells by single-cell RC-seq, with one of these instances detected at both its 5′ and 3′ L1-genome junctions, revealing a 12 bp TSD (Table S2). We concluded that L1 insertions can arise in proliferating neural stem cells prior to glial or neuronal commitment, while glia otherwise support less L1 mobilization than neurons.

Single-cell RC-seq identified mean somatic L1 insertion counts of 48.4, 27.5, 30.5, and 14.8 per hippocampal neuron in CTRL-36, CTRL-42, CTRL-45, and CTRL-55, respectively, yielding an overall mean count of 30.4 (Figure 2F). To estimate the overall true positive mean, we incorporated the PCR validation rate (45.0%) calculated above, leading to a conservative rate calculation of 13.7 somatic L1 insertions per hippocampal neuron. If, more conservatively, only L1 insertions detected at a 3′ L1-genome junction were considered, the true positive mean was 9.9. Conversely, if all L1 insertions were considered, the maximum PCR validation rate calculated above (90%) was generously applied, and assay sensitivity in terms of polymorphic L1 insertions detected (49.0%) was corrected for, the estimated true positive mean increased greatly to 55.8. Thus, given a true positive mean of 13.7 somatic L1 insertions per neuron, and the detection of at least one event in every neuron (Figure 2F), we concluded that L1 mosaicism was ubiquitous among the hippocampal neurons studied.
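The arithmetic behind the conservative and upper estimates can be reproduced with a brief sketch. The formula (raw mean count multiplied by validation rate and divided by assay sensitivity) is inferred here from the reported figures rather than quoted from the methods, and the function name is illustrative.

```python
def true_positive_mean(raw_mean: float, validation_rate: float,
                       sensitivity: float = 1.0) -> float:
    """Scale a raw per-neuron insertion count by validation rate and sensitivity."""
    # Inferred relationship: fewer validated calls lower the estimate,
    # while imperfect assay sensitivity raises it.
    return raw_mean * validation_rate / sensitivity


raw_mean = 30.4  # overall mean putative insertions per hippocampal neuron

# Conservative estimate: observed PCR validation rate, no sensitivity correction.
conservative = true_positive_mean(raw_mean, 0.45)

# Upper estimate: maximum validation rate and correction for 49.0% sensitivity.
upper = true_positive_mean(raw_mean, 0.90, sensitivity=0.49)

print(round(conservative, 1), round(upper, 1))
```

Under these assumptions the two values round to the 13.7 and 55.8 insertions per neuron reported in the text.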

Nearly 75% of somatic L1 insertions found by single-cell RC-seq were detected only at a 3′ L1-genome junction (Figure S2). Given this preponderance, we sought to ascertain why the matching 5′ L1-genome junction could not be identified by PCR for 11/20 selected examples of this type. PCR amplification failure was potentially due to RC-seq false positives, structurally exotic L1 insertions or, alternatively, WGA inconsistently amplifying the 5′ L1-genome junctions of insertions detected at a 3′ L1-genome junction by single-cell RC-seq. To model the latter possibility, we randomly selected 12 polymorphic L1 insertions detected by bulk RC-seq and confirmed as heterozygous by genotype PCR. We performed PCR using bulk DNA to confirm each insertion was detectable at its 5′ L1-genome junction and then selected 100 random examples in individual neurons where these polymorphic L1s were detected at only a 3′ L1-genome junction by single-cell RC-seq (Table S2). We attempted PCR amplification of the corresponding 5′ L1-genome junction for each neuron, hence recapitulating the validation process for somatic L1 insertions, and confirmed 50/100 examples. This assay indicated the maximum PCR validation rate (50.0%) for somatic L1 insertions detected at only a 3′ L1-genome junction by single-cell RC-seq and, given the validation rate reported above (9/20, 45%), implied a true positive rate potentially as high as 9/10 (90.0%).
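The step from a 45% observed validation rate to a 90% implied true positive rate follows from dividing by the validation ceiling measured on known polymorphic insertions. A small worked sketch, using exact fractions and illustrative variable names, makes the ratio explicit.

```python
from fractions import Fraction

# Observed: 9 of 20 somatic candidates validated at the 5' junction.
observed_rate = Fraction(9, 20)

# Ceiling: 50 of 100 known polymorphic insertions validated by the same assay.
validation_ceiling = Fraction(50, 100)

# If genuine insertions validate only half the time, the observed rate
# understates the true positive rate by that same factor.
implied_true_positive_rate = observed_rate / validation_ceiling

print(implied_true_positive_rate, float(implied_true_positive_rate))
```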

These experiments showed that nearly half of somatic L1 insertions detected by single-cell RC-seq at a 3′ L1-genome junction could be confirmed as genuine TPRT-mediated retrotransposition events. By contrast, PCR validation for 10 randomly selected exonic L1 insertions detected at a 5′ L1-genome junction by single-cell RC-seq failed to find the opposing 3′ L1-genome junction in all cases (Table S2). This was consistent with the L1 poly-A tail obstructing PCR amplification of somatic L1 insertion 3′ ends and arguably did not resolve whether L1 insertions detected only at a 5′ L1-genome junction were false positives. Finally, we selected four L1 insertions found at both their 5′ and 3′ L1-genome junctions by single-cell RC-seq; all four were confirmed by PCR and presented TPRT hallmarks, including one with a 92 bp TSD (Table S2).

To determine the true positive rate of single-cell RC-seq, we randomly selected 20 somatic L1 insertions detected at only a 3′ L1-genome junction and PCR amplified the opposing 5′ L1-genome junction. This enabled detection of TPRT sequence hallmarks that distinguish WGA artifacts from most genuine L1 integration sites; specifically a TSD, an L1 EN target motif and an L1 poly-A tail. Through PCR and sequencing, 5′ L1-genome junctions were identified for nine insertions and, when combined with the corresponding 3′ L1-genome junctions described by RC-seq, indicated TSDs and poly-A tails in all cases, and plausible L1 EN motifs for 7/9 (77.8%) examples (Table S2 and Data S1). PCR validated insertions included full-length (Figure 3A) and variably 5′ truncated (Figures 3B–F) L1s. Intronic L1 insertions were found sense oriented to two genes expressed in brain, ZFAND3 (Figure 3B) and USP33 (Table S2). One L1 insertion incorporated a 3′ transduction and was detected by PCR in two neurons of CTRL-42 (Figure 3D). Further, PCR applied to the full panels of analyzed neurons from each individual revealed that two other L1 insertions were present in 10/21 and 2/21 neurons, respectively (Figures 3E and 3F). Three of the validated L1 insertions generated TSDs >40 bp in length.

(A–F) Validated examples from hippocampal neuron single-cell RC-seq data included: (A) a full-length L1 insertion in neuron CTRL-42-HN-#13; (B) a truncated L1 insertion in neuron CTRL-42-HN-#11; (C) a heavily truncated L1 insertion in neuron CTRL-55-HN-#15; and (D) a very heavily truncated L1 insertion yielding a 3′ transduction in neuron CTRL-42-HN-#4, also validated in neuron CTRL-42-HN-#3, and traced to a donor L1-Ta on chromosome 3; (E) a very heavily truncated L1 insertion detected in CTRL-42-HN-#13 and validated in 10/21 CTRL-42 hippocampal neurons tested. Asterisks denote neurons where validation succeeded; (F) a very heavily truncated L1 insertion detected in CTRL-42-HN-#4 and also validated in CTRL-42-HN-#22. Note: in (A–F) the 3′ L1-genome junction was detected by single-cell RC-seq, while the 5′ L1-genome junction was identified by insertion-site PCR (using primers indicated by α and β) and sequencing. Green triangles indicate TSDs. Numbers below the 5′ L1-genome junction indicate the equivalent L1-Ta consensus position. See also Table S2 and Data S1

Single-cell RC-seq applied to each of the 92 libraries analyzed by WGS detected 61.3% of reference genome L1-Ta copies (Figure 2D, Table S1) and 49.0% of polymorphic L1-Ta insertions in each neuron (Figure 2E), as defined by the earlier bulk RC-seq experiments. The latter figure provided a provisional estimate of assay sensitivity for somatic L1 insertions. A total of 2,782 putative somatic L1-Ta and pre-Ta insertions (Figure 2F, Table S2) were identified in at least one hippocampal neuron, were not detected in any bulk liver RC-seq library or more than one hippocampus by single-cell or bulk RC-seq, and were absent from existing L1 polymorphism databases. Of these insertions, 1,024 (36.8%) and 34 (1.2%) were found in introns and exons, respectively. Twelve (0.4%) somatic L1 insertions were detected at both their 5′ and 3′ L1-genome junctions, 760 (27.3%) at only a 5′ junction, and 2,010 (72.3%) at only a 3′ junction. Notably, nine somatic L1 insertions detected by single-cell RC-seq were also detected and annotated as somatic in the corresponding hippocampus bulk RC-seq library, and 13 were detected by single-cell RC-seq in more than one neuron from the same hippocampus. Of somatic L1 insertions, 98.2% belonged to the L1-Ta subfamily, and 1.8% were annotated as pre-Ta. Although at 5′ L1-genome junctions RC-seq captures only full-length and very heavily truncated L1s (Figure S2), we found 123 full-length L1 insertions, representing 4.4% of all events and including two instances of 5′ transduction. Of those insertions detected at their 3′ L1-genome junction, 151 (7.5%) carried a putative transduced 3′ flanking sequence. This L1 3′ transduction rate was lower than reported for germline L1 retrotransposition, likely due to assay design not encompassing 3′ transductions longer than ∼100 bp, as reported elsewhere.

Next, 92 individual neuronal nuclei were isolated from the aforementioned hippocampi, subjected to WGA and analyzed by WGS. Globally, WGS revealed that 4,226/4,232 (99.9%) chromosomes amplified (Figure 2A) with recurring WGA bias largely limited to telomeres (Figures S3, S4A, and S4B). Higher-resolution copy-number variation (CNV) analysis based on the division of the genome into adjustable-width “bins” with an average size of ∼600 kb revealed five non-telomeric deletions larger than ∼5 Mb. The largest and third largest of these occurred on chromosome 6 of CTRL-45 hippocampal neuron 2 (CTRL-45-HN-#2) and were 16.2 Mb and 9.4 Mb in length (Figure 2B). An alternative CNV analysis using ∼60 kb bins indicated the presence of numerous subregions in the 16.2 Mb example where chromosomal copy number was ≥2 (Figure S4C), depicting a region of highly variable WGA performance and, arguably, contraindicative of a genuine deletion in vivo. Genome-wide, allelic dropout (AD) and locus dropout (LD) respectively affected 8.0% and 0.7% of bins at 600 kb resolution (Figure 2C, Table S1), indicating efficient amplification across >90% of the genome. Importantly, we optimized WGA parameters to not deplete L1-Ta copies from amplified DNA, with the mean ratio of WGS reads aligned to reference L1-Ta 5′ or 3′ L1-genome junctions at 0.81 and 1.28 of expected values, respectively (Figures S4D and S4E; Table S1). These results show robust WGA for individual neurons, without significant loss of reference genome L1-Ta copies.

(C) High-resolution analysis of two localized AD regions on chromosome 6 of CTRL-45 hippocampal neuron 2 (CTRL-45-HN-#2), also presented at 600 kb resolution in Figure 2B. Copy number is displayed for ∼60 kb bins (black diamonds). Bins with absolute log2(copy number) ≥ 5 are colored in purple. Dropout regions are indicated by red bars.

For each sample, sequence alignments were binned by alignment start position into 600 kb intervals across the human genome, excluding unplaced contigs, extra haplotypes, and the mitochondrial genome. Counts were quantile normalized across all samples before plotting. Chromosomes are indicated on the vertical axis. Sample brain region location (cortex, hippocampus), cell type (glial, neuron) and individual ID are indicated for groups of columns on the horizontal axis. For each individual, single cells are ordered numerically. Note: low and high coverage bins are indicated in yellow and blue, respectively.
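The binning step described in this legend can be sketched as follows. The sketch assumes alignments are available as (chromosome, 0-based start position) pairs and omits both the exclusion of unplaced contigs and the quantile normalization across samples; the function and variable names are illustrative.

```python
from collections import Counter

BIN_SIZE = 600_000  # 600 kb intervals, as in the legend


def bin_alignments(alignments, bin_size: int = BIN_SIZE) -> Counter:
    """Count alignments per (chromosome, bin index), keyed by start position."""
    counts = Counter()
    for chrom, start in alignments:
        # Integer division assigns each start position to a fixed-width bin.
        counts[(chrom, start // bin_size)] += 1
    return counts


# Toy alignments (hypothetical coordinates, for illustration only).
toy = [("chr6", 10), ("chr6", 599_999), ("chr6", 600_000), ("chr1", 1_250_000)]
counts = bin_alignments(toy)
print(dict(counts))
```

Per-sample count vectors built this way would then be normalized across samples before plotting.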

(B) WGS indicated 16.2 Mb and 9.4 Mb regions of localized AD (indicated by red bars) on chromosome 6 of neuron CTRL-45-HN-#2. Each blue diamond corresponds to a 600 kb “bin”. One bin with log2 copy number < −5 is colored purple.

(A) Chromosome copy number in each amplified genome, assessed by WGS. Box-and-whisker plots indicate median chromosomal copy number and quartiles across all neurons. Empty circles represent chromosomes with copy number >1.5 IQR from the median. Sex chromosomes for CTRL-36 (female, ♀) and CTRL-42, CTRL-45, and CTRL-55 (male, ♂) are presented separately. Six autosomes, marked in red, had copy number ≤ 1. Two sex chromosomes with log2 copy number < −2 are colored purple.

Prior to single-cell RC-seq, we performed deep coverage (∼80×) RC-seq on bulk DNA extracted from the post-mortem hippocampus and matched liver samples of four individuals (identifiers CTRL-36, CTRL-42, CTRL-45, and CTRL-55) without evidence of neurological disease (Table S1). Bulk RC-seq on average detected 97.5% of 960 annotated reference genome L1-Ta copies, indicating high assay sensitivity. As expected, we detected ∼210 polymorphic L1-Ta insertions absent from the reference genome, per individual (Tables S1 and S2). This defined the polymorphic (germline) L1-Ta insertion cohort for each individual and provided a positive control for subsequent single-cell RC-seq analyses.

RC-seq utilizes sequence capture to enrich DNA for the junctions between retrotransposon termini and adjacent genomic regions, followed by paired-end sequencing, alignment, and clustering, to reveal L1 insertions absent from the reference genome. Here, we replaced previous RC-seq sequence capture pools () with two locked nucleic acid (LNA) probes respectively targeting the extreme 5′ and 3′ ends of L1-Ta. These probes capture typical L1 insertions at a 3′ L1-genome junction, and full-length or heavily 5′ truncated L1 insertions at a 5′ L1-genome junction (Figure S2), and delivered a 15-fold improvement in L1 enrichment compared with previous RC-seq applied to brain (). Assembly of each overlapping read pair into a “contig” enabled computational identification of molecular chimeras and removal of PCR duplicates, and provided single-nucleotide resolution of L1 integration sites by fully spanning L1-genome junctions (Figure S2).
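The contig assembly step can be sketched in simplified form. This is an illustration under stated assumptions, not the software used in the study: overlaps must match exactly, duplicates are collapsed by identical contig sequence rather than by alignment coordinates, and the function names and the `min_overlap` default are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge an overlapping read pair into one contig, or return None.

    read2 is expected as sequenced (from the opposite strand), so it is
    reverse-complemented first. The longest exact overlap between the end
    of read1 and the start of the reoriented mate is used.
    """
    mate = reverse_complement(read2)
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


def unique_contigs(pairs, min_overlap=10):
    """Assemble read pairs and keep one copy of each distinct contig."""
    seen = set()
    contigs = []
    for read1, read2 in pairs:
        contig = merge_pair(read1, read2, min_overlap)
        if contig is not None and contig not in seen:
            seen.add(contig)
            contigs.append(contig)
    return contigs
```

Because each contig covers the sequenced fragment from end to end, a contig spanning an L1-genome junction places the integration site at a single nucleotide, and fragments with identical sequence can be treated as candidate PCR duplicates.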

(A) A full-length L1-Ta structure indicates the positions of two RC-seq probes designed to detect the 5′ or 3′ L1-genome junction of a given L1 insertion. Three categories of RC-seq reads are therefore generated, namely those that detect: the 5′ L1-genome junction of a full-length L1, the 5′ L1-genome junction of a heavily truncated L1 and the 3′ L1-genome junction of any L1.

(B) L1 detection scenarios as outlined in (A). Insertions are either 1) full-length or heavily truncated and detected at only a 5′ L1-genome junction, 2) of any length and detected at only a 3′ L1-genome junction, 3) full-length or heavily truncated and detected at both L1-genome junctions. Note the percentages given in brackets, indicating the relative occurrence of each scenario in the single-cell RC-seq data presented.

Several biological and technical factors hinder accurate calculation of somatic L1 mobilization frequency using bulk DNA extracted from tissue, as well as subsequent PCR validation and structural characterization of individual somatic L1 insertions (). We therefore developed a single-cell RC-seq protocol to detect somatic L1 insertions in individual neurons. Briefly, NeuN+ hippocampal nuclei were purified by fluorescence-activated cell sorting (FACS) (Figures 1A and S1), with single nuclei isolated using a self-contained microscope and micromanipulator (Figure 1B). Whole-genome amplification (WGA) was achieved through an extensively optimized version of the quasi-linear Multiple Annealing and Looping Based Amplification Cycles (MALBAC) protocol () and was followed by Illumina library preparation (Figures 1C and 1D). Libraries were then subjected to low-coverage (0.35×) whole-genome sequencing (WGS) as a quality control step to assess amplification bias and, in parallel, hybridized and processed by RC-seq (Figures 1E and 1F).

(C) DNA was extracted from nuclei and subjected to linear WGA, followed by exponential PCR in two separate reactions for each nucleus, using different enzymes.

Discussion

Our experiments firmly establish that L1-driven mosaicism pervades the hippocampus and is mediated by TPRT. That we found 13.7 somatic L1 insertions per hippocampal neuron was unexpected given a prior estimate of <0.1 insertions per cortical neuron (Evrony et al., 2012). By discovering here a myriad of L1 insertions in cortical neurons, we exclude a biological explanation for this discrepancy and instead propose that the process by which the earlier work selected insertions for validation led to a significant underestimate of L1 retrotransposition frequency. Indeed, the mobilization rate reported here much more closely resembles an earlier estimate of 80 somatic L1 insertions per brain cell, calculated via L1 qPCR (Coufal et al., 2009).

Beyond this, our data demonstrate that L1 insertions in hippocampal neurons and glia are preferentially found in protein-coding genes highly transcribed in the hippocampus. Transcribed enhancers active in neuronal stem cells are also enriched for somatic L1 insertions, indicating likely L1 perturbation of regulatory elements. L1 insertions in cortical neurons were, however, not significantly enriched in genes highly transcribed in the cortex. We speculate that this could be due to cortical neurogenesis primarily occurring during fetal development (Spalding et al., 2005), which presents a genome-wide transcriptional profile different from that of the adult cortex. Although L1 mobilization was not increased in AGS-1 hippocampal neurons, the pattern of L1 insertions was potentially different from that of controls, for reasons that are presently unclear. The most obvious caveat of this analysis is that, due to the extreme rarity of the disease, only one AGS patient hippocampus was studied. Nonetheless, this experiment serves as a proof-of-principle demonstration that single-cell RC-seq could be used in the future to assess abnormal L1 mobilization in neurological disease. Finally, we noted that somatic L1 insertions in neurons bore substantially longer TSDs on average than polymorphic L1 insertions, corroborated by structural characterization of L1 integration sites found by single-cell RC-seq. Unusually long TSDs have previously been identified using an engineered L1 reporter system in HeLa cells (Gilbert et al., 2005). As also hypothesized in that context, pervasive euchromatinization in neural progenitor cells may promote the formation of long TSDs.

The predominant developmental timing of endogenous L1 mobilization in the brain remains unclear. Although the vast majority of somatic L1 insertions detected by single-cell RC-seq were found in one cell each, a small proportion of L1s were detected in multiple cells, including examples found in both glia and neurons, indicating L1 mobilization in a common multipotent progenitor cell. Three somatic L1 insertions were validated by PCR in multiple neurons, including one example found in nearly 50% of the neurons assayed. Thus, although most L1 insertions may occur in one or a handful of neurons, a substantial number appear to arise during early neurogenesis. Indeed, the signature of potential selection against somatic L1 insertions sense oriented to host gene introns suggests that many retrotransposition events precede terminal neural cell maturation. We speculate that depletion of these events could be explained by preferential L1 integration into neurogenesis genes, thereby impacting the survival or differentiation potential of neural progenitor cells. It also cannot be excluded that somatic L1 integration primarily occurs antisense to host gene introns, though we currently lack a mechanistic explanation for this preference.

Neuronal genome mosaicism may not be restricted to somatic L1 insertions. Alu and SVA retrotransposons trans-mobilized by L1 may also contribute mosaic insertions. Other than transposable element activity, recent studies have reported localized and chromosome-wide CNV in normal neurons (Cai et al., 2014; Gole et al., 2013; McConnell et al., 2013). We find no definitive evidence of these events in our data, though it must be noted that our CNV analyses were expressly geared to discern genomic deletions caused by WGA failure or variability. We did, however, find consistent WGA inefficiency at telomeres, while others have reported that most apparent small genomic deletions occur close to telomeres (McConnell et al., 2013).

L1 mosaicism may also occur outside of the brain, for instance during early embryogenesis (Garcia-Perez et al., 2007; Kano et al., 2009) or, as we previously reported for a single L1 insertion, in the liver (Shukla et al., 2013). However, some cell types present practical and technical challenges not posed by neural cells. For example, hepatocytes are frequently multinucleated and sustain aneuploidy and polyploidy, greatly complicating single-cell genomic analysis. Thus, although the liver-specific L1 insertions detected here by bulk RC-seq consistently bore L1 EN motifs and were enriched in genes differentially upregulated in liver, we were unable to corroborate these findings with single-cell RC-seq or downstream PCR validation. Future methodological advances will therefore likely be required to elucidate L1 mosaicism in the liver, and elsewhere in the body.

The capacity to locate somatic L1 insertions in individual neural cell genomes is a major step toward determining whether mosaicism impacts neurobiological function. Limitations in assaying the transcriptome and genome of the same cell, however, currently prohibit functional assays of individual somatic L1 insertions. Nonetheless, given the frequency of these events, their mutagenic potential for protein-coding and regulatory regions, and an apparent preference for euchromatic DNA linked to neurobiological function, it is not unreasonable to predict that L1-driven somatic mosaicism may alter the functional properties of the brain.