BioTASQ selectively captures G4 targets in vitro

To support affinity purification and identification of functional transcriptomic G4-RNA targets, we added a biotin tag to the biomimetic quadruplex ligands known as TASQ (Fig. 1a)14,15, known for their high G4-selectivity and, for some of them, their ability to track G4-RNAs in live cells (N-TASQ)12,16. Detailed synthesis and characterization of the biotinylated TASQ, or BioTASQ, can be found in Supplementary Figure 1 and Methods. We first evaluated the G4-interacting properties of BioTASQ via a fluorescence resonance energy transfer (FRET)-melting assay (Supplementary Figure 2)17 and electrospray ionization mass spectrometry (ESI-MS) analyses (Supplementary Figure 3)18. FRET-melting experiments were performed with BioTASQ against a panel of dual-labeled nucleic acid sequences that included: (a) three G4-DNAs (F-Myc-T and F-kit-T, found in the promoter regions of MYC and KIT gene, respectively, and F21T, the human telomeric sequence);19,20 (b) one G4-RNA (F-TERRA-T, the human telomeric transcript);21,22 and (c) one duplex-DNA as a control (F-DS-T) (Supplementary Figure 2). Introduction of a biotin tag negatively impacted G4-affinity of BioTASQ (further confirmed by ESI-MS measurements) compared to the non-biotinylated parent PNADOTASQ. However, it did not affect the ability of BioTASQ to discriminate G4s over duplexes14,23.

Fig. 1 Characterization of G4-specific affinity of BioTASQ. a Structure of BioTASQ displaying a biotin affinity tag (red circles), and schematic representation of its open (left) and closed, quadruplex-associated conformation (right), in which the intramolecular G-quartet is formed. Schematic representation of a guanine-rich RNA sequence (guanines as gray squares) in its unfolded, random-coil and folded G4 structure. b Fluorescence analysis of pull-down experiments carried out with (1) FAM-labeled oligonucleotides (1 µM): either G4-DNA (F-MYC, F-SRC, and F-22AG), duplex-DNA (F-duplex) or G4-RNA (F-TERRA, F-TRF2, and F-RAS); (2) BioTASQ (20 µM); and (3) streptavidin-coated magnetic beads. c Competitive pull-down experiments performed with F-SRC and F-NRAS (1 µM), BioTASQ (20 µM) in the absence or presence of duplex-DNA competitors (ds17 or ds26, 20 µM) or DNA extracts (calf thymus DNA, ctDNA, 100 µM), or of molecular competitors (biotin (or Biot.), 80 µM or PNADOTASQ (or TASQ), 10 µM). All experiments were done in triplicates. Error bars represent SD Full size image

Next, we assessed whether BioTASQ could efficiently capture G4s from solutions in vitro24. We used fluorescein (F)-labeled nucleic acids since fluorescence signal measurements allow for convenient and sensitive detection of the ligand/G4 assemblies. We performed these experiments with: (a) three G4-DNAs (F-Myc and F-SRC, two sequences found in the promoter regions of MYC and SRC gene, respectively25,26, and F-22AG, the human telomeric sequence); (b) three G4-RNAs (F-TRF2 and F-NRAS, two sequences found in the mRNA of TRF2 and NRAS gene, respectively;27 and F-TERRA (the human telomeric transcripts); and (c) one duplex-DNA as control (F-Duplex). Labeled oligonucleotides (1 μM) were incubated with BioTASQ (20 μM) and streptavidin-coated magnetic beads (MagneSphere®, 25 μL). After overnight incubation at 25 °C, streptavidin beads were precipitated, the supernatant removed, and the beads resuspended in denaturing buffer before thermal denaturation (8 min at 90 °C). After separation from the beads (via centrifugation and magnet immobilization), fluorescent signals from the supernatant solutions were measured to quantify nucleic acid capture. Our results confirmed the efficiency of BioTASQ-mediated G4 capture (Fig. 1b) and showed that the level of G4s recovered was dependent on both the G4 nucleic acid type and topology. G4-DNA was enriched 4.1–20.7-fold, whereas G4-RNA was enriched between 10.9- and 23.8-fold when compared to controls, while duplex-DNA was not enriched. BioTASQ appeared to have stronger preferences for certain G4 topologies, as type I (or ‘parallel’)28 G4s displayed better enrichment than type II (or “mixed-hybrid”) G4s (with 16.6-, 20.7-, 20.6-, 10.9-, and 23.8-fold enhancement for F-Myc, F-SRC, F-TERRA, F-TRF2, and F-NRAS, respectively, versus 4.1-fold enhancement for F-22AG). The preference of BioTASQ for type I G4, which displays accessible external G-quartets, is expected, given that TASQs are sterically demanding ligands that require accessible, loopless G-quartets for binding G4 targets efficiently. This property represents a limitation to the use of BioTASQ for G4 detection, especially for G4-DNAs, which have higher conformational diversity than G4-RNAs. We have not yet tested the affinity of BioTASQ on the recently reported antiparallel G4-RNA29. While the cellular prevalence of G4-RNA with antiparallel topology remains to be established, they may provide key insights into the topological preference of TASQ ligands. We confirmed that the streptavidin bead/BioTASQ system did not extract duplex-DNA (0.7-fold). We further confirmed G4 selectivity of BioTASQ via competitive pull-down experiments, which we performed with F-SRC and F-NRAS (1 μM) in the presence of an excess of duplex-DNA (either ds17 or ds26, 20 μM) or DNA extracts (calf thymus DNA, ctDNA, 100 μM, expressed in base pairs). The capture efficiencies of the fluorescently labeled G4-RNA were not significantly affected by an excess of synthetic duplexes (90–112% with F-SRC, 74-82% with F-NRAS) or with DNA extracts (76% and 90% with F-SRC and F-NRAS, respectively). We also tested BioTASQ/streptavidin association with an excess of either biotin (80 μM, to compete with streptavidin interaction) or PNADOTASQ (10 μM, to compete with G4 interaction) (Fig. 1c). Together, these results show the strong ternary interaction between G4s, BioTASQ, and streptavidin (beads), which provided the basis for the development of our G4RP protocol (described below).

G4RP isolates G4-RNA targets from human cell extracts

After confirming that BioTASQ could interact with and capture G4s in vitro, we then assessed whether it could capture G4 targets from human cell extracts. For this, we developed the G4-RNA-specific precipitation (G4RP) protocol, a modified version of the commonly used RNA-immunoprecipitation (RIP) protocol30. MCF7 cells were first crosslinked with formaldehyde to halt biological processes and stabilize transient structural interactions. Harvested cells were then sonicated briefly to release cellular content. Cell lysates were incubated with a high concentration of BioTASQ (100 µM) overnight (Supplementary Figure 3C) before affinity purification with magnetic streptavidin beads.

We first used RT quantitative PCR (RT-qPCR) with gene-specific primers to confirm the efficiency of the G4RP protocol31. G4RP-qPCR analysis of RNAs extracted from the BioTASQ-enriched fractions showed that non-specific binding was negligible (black bars) (Fig. 2a, b) while demonstrating the enrichment of two known G4-forming mRNAs, i.e., VEGFA and NRAS27 (gray bars) (Fig. 2a, b).

Fig. 2 Isolation of G4 targets from human cell extracts using G4RP. (left) Schematic representation of G4RP protocol. G4RP signals of biotin control versus BioTASQ through RT-qPCR quantification of a VEGFA and b NRAS mRNA levels in untreated MCF7 cells. c Changes induced by BRACO-19 and RHPS4 measured by the fold change of G4RP signal in VEGFA, NRAS, TERF2, and HPRT1 mRNA. Values are normalized to their individual untreated controls. Three biological replicates were used for the quantifications. Student's t-test and two-way ANOVA were performed. p-Values: *p < 0.05, ** p < 0.01, and ***p < 0.001. Error bars represent SEM Full size image

To confirm bona fide G4 formation in target mRNA sequences, we collected MCF7 cell extracts following treatments with two well-established G4-stabilizing ligands, BRACO-19 and RHPS432,33. We chose the treatment ligand doses to be between the respective IC 15 and IC 25 (doses that are growth inhibitory in 15–25% of the cells; BRACO-19: 5 µg/mL and RHPS4: 1.5 µM), as determined by growth kinetics profiling of MCF7 in the presence of G4 ligands, compared with vehicle controls (Supplementary Figure 4). Treatment with G4 ligands significantly increased G4RP-qPCR signals by 6.5-fold, 3.6-fold, and 4.4-fold in BRACO-19-treated cells and by 10.2-fold, 4.2-fold, and 2.5-fold in RHPS4-treated samples in three selected G4-rich regions, VEGFA, NRAS, and TERF2, respectively, compared to the untreated control (Fig. 2c; Supplementary Figure 5), demonstrating that BioTASQ can specifically enrich for G4-containing RNA sequences. HPRT1 was selected as an unstructured RNA control, as this housekeeping mRNA is expected to have low G4-forming potential34. Neither BRACO-19 nor RHPS4 treatment induced a significant change in BioTASQ-captured HPRT1 signals, confirming that these G4 ligands were selective for G4-rich targets. Collectively, these results indicate that the G4RP protocol is suitable for the purification and identification of G4-containing RNAs from human cell extracts, as well as the quantification of the G4 ligand-induced changes.

We performed G4RP in samples that were crosslinked to preserve the transiently formed cellular G4s while preventing the induction of G4 formation in vitro, as crosslinked nucleic acids would be in an immobilized state. We used the reported formaldehyde concentration and crosslinking conditions where over 90% of nuclear DNA are immobilized35 and anticipated that the crosslink efficiency for cellular RNA to be similar. As controls and to illustrate the importance of this crosslinking step, G4RP was performed in non-crosslinked samples. We selected three targets from the top- and bottom-ranked transcripts obtained with our G4RP-seq results (see below). BioTASQ enrichment of these targets was quantified using RT-qPCR and compared between the non-crosslinked and crosslinked samples (Supplementary Figure 6). We observed a loss of difference in BioTASQ enrichment in the non-crosslinked samples, despite the overall higher signals. These increased signals in non-crosslinked samples were likely G4s that were formed in vitro, due to the high concentration of BioTASQ, arguing that the crosslinking step is necessary to immobilize cellular RNA structures and to minimize the effects of in vitro G4 formation and destabilization through the biochemical evaluation steps. As naked, non-crosslinked RNA targets are susceptible to the in vitro G4 stabilization effects of high concentration of BioTASQ during the incubation steps, the true differences between the in vivo transient levels of the top and bottom ranking targets would be masked.

G4RP-Seq identifies transcriptome-wide transient G4-RNAs

To survey the baseline in vivo G4 transcriptomic landscape, we performed G4RP followed by sequencing (G4RP-seq) in human breast cancer cells harvested at log-phase growth. Due to the crosslinking step in the G4RP protocol, we expected to capture global levels of transient G4s in the transcriptome. Notably, as G4 ligand treatment resulted in gene expression changes in human cells36, we needed to account for these input differences; therefore, an internal input control was included for each treatment condition. To enrich for diverse G4-forming RNAs, we elected to remove ribosomal RNA targets at the cDNA library preparation step, as rRNA constitute over 80% of total cellular RNAs. Ribodepletion instead of poly-A selection was used, as we anticipated that non-poly-A RNAs, including many non-coding RNAs (ncRNAs), not only harbor but contribute to a substantial proportion of cellular G4s37. For the sequencing analysis, two comparisons were required: BioTASQ versus input (which gives normalized global levels of transient G4s for a specific transcript as Enrichment Score (ES)), and G4 ligand-treated versus untreated cells (given as Enrichment Score Change (ΔES)) (Supplementary Figure 7). After mapping reads to the hg19 reference human genome assembly with HISAT238, normalization and differential gene expression analyses were performed with HTSeq/DESeq2 pipeline39,40, by comparing the BioTASQ-enriched samples with the corresponding inputs41 (Supplementary Data 1). Of note, the relative enrichment levels by BioTASQ (given as ES) were not direct quantitative readouts of G4 formation but instead indicate the relative propensity of a specific transcript to fold into G4. ES values allow ranking of transcripts by relative G4-folding status under specific experimental conditions.

We observed BioTASQ enrichment of many gene transcripts, suggesting the existence of widespread G4-RNAs, at least transiently, in live human cells. This observation was expected since we were capturing a snapshot of the G4-RNA landscape in which some G4-rich sequences were in folded states, while others were in unfolded states. While the sequencing depth was not high enough to detect subtle changes in individual G4-forming sequences, we were able to confidently determine gene-level changes by focusing on a subset of highly abundant transcripts, filtered by high normalized mean read counts (as an estimate of relative expression level). We compared ES for each condition to generate initial lists of gene transcripts filtered by a minimal abundance threshold (>50 normalized mean read counts) (Supplementary Data 1). To ensure high-confidence hits with substantive changes, we further filtered the list and included only those with high transcript read abundance (≥500 mean read counts; approximately the top 5% of the list), which we used for downstream analyses. Gene transcripts in this filtered list were then ES-ranked for downstream analyses (Supplementary Data 1D).

ES, a gene-specific ratio of the BioTASQ signal normalized to the corresponding input signal, moderately positively correlated with their respective G/C content (Pearson correlation = 0.43, p < 0.0001) (Fig. 3b). ES was uncorrelated to gene length (Pearson correlation = 0.02, p = 0.89) (Fig. 3c). To evaluate whether the ES is related to the density of potential G4 sequences, Quadbase242 was then used to assess the number of predicted G4 (pG4) motifs (i.e. sequences with the canonical G3L1–7 G4 sequence) in each gene transcript. The ratios of the number of pG4s to gene length were calculated and then plotted against the transcripts ranked by their ES (Fig. 3d). Comparison of the ratios between the top and bottom 100 ES-ranked transcripts showed significantly higher values (2.2-fold difference) for top-ranked transcripts. Results from gene ontology analysis of the top and bottom 100 ES-ranked transcripts are summarized in Supplementary Table 143. Together, our bioinformatics analyses confirmed that transcripts with higher ES tend to have higher G/C content and higher pG4 density.

Fig. 3 Characterization of the baseline level of G4-RNA landscape using G4RP-seq. a Top 10 highly abundant transcripts (filtered by at least 500 base-read counts) with the lowest BioTASQ enrichment (blue bars, normalized to the input in the untreated sample) or with the highest BioTASQ enrichment (red bars) ranked by Enrichment Scores. b Regression plot of BioTASQ-Enrichment Score for each transcript versus its corresponding G/C content (R2 = 0.187, p < 0.001, significant non-zero relationship). c Regression plot of BioTASQ-Enrichment Score for each transcript versus its corresponding gene length (R2 = 0.00005, p = 0.89, non-significant relationship). d Number of pG4 motifs (calculated by Quadbase2 using mid stringency G3L1–7) to gene length ratio plotted against the subset of highly abundant transcripts ranked by their BioTASQ enrichment. The bar graph (inset) shows the average pG4 motif/gene length ratio between the top 100 ranked transcripts versus the bottom 100 ranked transcripts (p < 0.001, significant difference, Student's t-test) Full size image

We also found that highly expressed lncRNAs had some of the lowest relative G4 levels, as shown by their low ES (Fig. 3a). These bottom-ranked transcripts included well-known lncRNAs such as MALAT1, RPPH1, RMRP, and XIST. The presence of MALAT1 on this list was unexpected, as it has been previously reported as a G4-forming lncRNA in vitro9,10. Interestingly, residual rRNAs (that escaped from the ribodepletion step during library preparation) were also among the most depleted within this list.

G4RP-seq identifies ligand-induced changes in G4-RNA profiles

We evaluated ligand-induced changes in the G4-RNA landscapes by applying G4RP-seq to samples treated with G4 ligands and calculating an Enrichment Score Change (ΔES) from the ratio of treated versus non-treated samples. By filtering the initial list ΔES > 1.75, we found BioTASQ enrichment in 251 and 463 gene transcripts to be highly induced by BRACO-19 and RHPS4, respectively (Supplementary Data 1B–C). Results from gene ontology analysis performed on the lists of top genes are summarized in Supplementary Table 243.

Among the list of gene transcripts with a high ligand-induced increase (ΔES > 1.75) and at least 500 mean read counts, the lncRNAs MALAT1, RPPH1, and XIST were highly ranked in both BRACO-19- and RHPS4-induced gene lists (Fig. 4a). Some RNAs previously reported to harbor G4s in vitro were also identified, including NEAT1 and NRAS9,27. We found no correlation between the ligand-induced ΔES and the read counts of the transcripts (Pearson correlation = 0.02, p = 0.69) (Supplementary Figure 8). The ligand-induced ΔES of each transcript was compared to the corresponding G/C content of the transcripts, which interestingly showed a negative correlation (Fig. 4b). When the transcripts were ordered by ΔES, the pG4 density appeared to be distributed toward lower scores (1.7-fold and 2.7-fold difference between the average ratio of top and bottom 100 ranked transcripts for BRACO-19 and RHPS4, respectively) (bottom panel) (Fig. 4b). Overall, our observations suggest that transcripts with higher pG4 density were more likely to be captured in a folded state in the absence of ligands, resulting in ΔES being lower due to the higher baseline level of G4 formation. In contrast, transcripts with lower pG4 density were more likely to be unfolded in the absence of ligands and to have their G4 structures stabilized in the presence of ligands, leading to a higher ΔES.

Fig. 4 Characterization of the ligand-induced changes in the G4-RNA landscape. a Top 10 highly abundant transcripts with highest fold increase in BioTASQ enrichment (ranked by Enrichment Score Change) for BRACO-19 (red) and RHPS4 (green). Common targets that were ranked highly for both ligands are indicated in brown dashed lines. b (Top) Regression plot of BioTASQ-Enrichment Score Change for each transcript versus its corresponding G/C content (BRACO-19: R2 = 0.16, p < 0.0001; RHPS4: R2 = 0.16, p < 0.0001, significant non-zero relationship). (Bottom) Number of G4 motifs (calculated by Quadbase2 using mid stringency G3L1–7) to gene length ratio plotted against the subset of highly abundant transcripts ranked by their BioTASQ-Enrichment Score Change. Left panel is BRACO-19-induced changes and right panel is RHPS4-induced changes. c Absolute number of pG4 motifs of highly abundant gene transcripts ranked by BioTASQ-Enrichment Score for the three conditions: non-treated (black), BRACO-19 (red), and RHPS4 (green). (Bottom) Quantification of average number of pG4 motifs for top 100 and bottom 100 ranked genes for the three sets of conditions. d Venn diagram comparing top filtered BioTASQ-enriched gene lists for BRACO-19 and RHPS4. e G4RP-qPCR controls of the top lncRNA hits MALAT1, RPPH1, and XIST, in G4 ligand-treated (BRACO-19 or RHPS4) samples normalized to corresponding untreated controls. Three biological replicates were used for quantification. Two-way ANOVA was performed. p-Values: *p< 0.05, **p < 0.01, and ***p < 0.001. Error bars represent SEM Full size image

When we compared the absolute number of pG4 motifs (i.e. without normalization to gene length) between the three treatment conditions ranked by their respective ES, we observed differential changes in pG4 profiles between the two G4 ligand treatments (Fig. 4c). pG4 scores generated using different stringency of searches (G2L1–10, G3L1–5, and G3L1–7) showed similar trends (Supplementary Figure 9). BioTASQ-captured targets generated from BRACO-19-treated samples exhibited higher levels of pG4-dependent enrichment regardless of the search stringency, conceivably due to the broader range of intramolecular G4s (longer loops, 2-quartet G4s, etc.) stabilized by this ligand. On the other hand, targets generated from RHPS4-treated samples showed lower levels of pG4-dependent enrichment and a pG4-dependency could only be observed when the plots were obtained using G4 motif searches with the lower stringency. We reasoned that RHPS4-binding preference could be selective towards sequences with lower numbers of G4 motifs or highly specific G4 sequences that are less prevalent within the transcriptome. However, we cannot rule out the possibility of intermolecular G4s, as computational algorithms are currently unable to predict these structures. Given this caveat, the lack of pG4-dependent enrichment in samples treated with RHPS4 could be partially explained by a preferential ligand-induced stabilization of intermolecular G4s. The overlap between the gene lists for the two G4 ligand treatments demonstrated that they have differential G4-induction profiles, in agreement with their differential in vitro G4-structure-specific binding profiles (Fig. 4d)44.

We further validated our findings from G4RP-seq by performing qPCR on separate G4RP samples obtained with biological repeat experiments, using primers specific for the top three common lncRNA targets: MALAT145, XIST46, and RPPH147. Fold change differences between G4RP-qPCR and G4RP-seq were seen and expected due to the small qPCR region amplified. Nevertheless, the qPCR results were consistent with those obtained from sequencing, confirming that these lncRNAs were targets of G4 ligands (Fig. 4e; Supplementary Figure 10). Circular dichroism (CD) and thermal differential spectra (TDS) analyses48 of the three selected pG4 regions of MALAT1, XIST, and RPPH1 were consistent with the formation of parallel-type G4 structures in vitro in the selected pG4 motif sequences extracted from these genes (Supplementary Figure 11). In summary, we observed that the BRACO-19 and RHPS4 treatments in MCF7 cells similarly induced G4 stabilization in several highly expressed lncRNA targets, but the treatments also displayed distinct ligand specificity towards other RNA targets.