About half of the RBPs covered by RBDpeps harbor well-established RBDs and have known functions in RNA biology, as reflected by a strong and significant enrichment of RNA-related protein domains and biological processes, comparable to that of the HeLa RNA interactome (Figures 1I and S1J). Note that the reduced RBP coverage of RBDmap compared to RNA interactome capture equally affects both well-established and unorthodox RBPs (Figures 1I and S1J).

Analysis of the RNA-bound and released fractions by quantitative proteomics shows high correlation of the resulting peptide intensity ratios between independent biological replicates. These ratios follow a bimodal distribution with one mode representing the released peptides (gray) and the other the RNA-bound ones (red; Figures 1E and S1F). We detected 909 and 471 unique N-link peptides as significantly enriched in the RNA-bound fractions of LysC- and ArgC-treated samples, respectively (1% false discovery rate, FDR) (Figure S1G). Notably, computed RNA-bound/released peptide intensity ratios also correlate between the LysC and ArgC data sets (Figure 1F), supporting the robustness of the workflow. Due to their different specificities, each protease also contributes unique 1% FDR RBDpeps to the complete peptide superset (Figure S1G), covering 529 RBPs that highly overlap with human RNA interactomes (Figure 1G). Proteins within the RBDmap data set range from low to high abundance (Figure S1H), following a similar distribution as the input fraction and the HeLa RNA interactome. Thus, RBDmap is not selective for highly abundant proteins. The identification of 154 additional RBPs was likely helped by the reduction of sample complexity and of experimental noise afforded by the additional proteolytic step and the second oligo(dT) capture. In agreement with this explanation, the relative abundance of corresponding RBDpeps is higher in the RNA-bound fractions than in the “input” samples (Figures 1H and S1I). Thus, RBDmap detects RNA-binding regions within hundreds of RBPs in one approach, even if it does not cover all RBPs identified by RNA interactome capture (Figure 1G).
Proteins will be missed by RBDmap when (1) binding to non-polyadenylated RNAs, (2) displaying low crosslinking efficiency, (3) interacting with the phospho-sugar backbone, but not the nucleotide bases, or (4) lacking suitable cleavage sites for trypsin within the LysC and ArgC proteolytic fragments and hence lacking MS-identifiable N-link peptides. Thus, the distribution of arginines (R) and lysines (K) will influence whether a given RBP can be studied by RBDmap, and we used two different proteases to maximize the identification of RBDpeps.

We collected an input sample aliquot after UV irradiation, oligo(dT) selection, and protease digestion, which in principle should reflect the RNA interactome (Figure 1A). When compared to a non-irradiated specificity control, the resulting high-confidence RBPs show 82% overlap with the previously published human RNA interactomes. This high concordance shows that LysC and ArgC treatments are fully compatible with the RNA interactome capture protocol. The remaining two thirds of the LysC- or ArgC-treated samples were subjected to a second round of oligo(dT) purification, leading to two peptide pools (Figure 1A): (1) peptides released from the RNA into the supernatant, and (2) peptides remaining covalently bound to the RNA, representing the RNA-binding sites of the respective RBPs. Importantly, subsequent tryptic digestion of the RNA-bound LysC/ArgC fragments yields two classes of peptides: the portion that still remains crosslinked to the RNA (X-link) and its neighboring peptides (N-link) (Figure 1A). While the directly crosslinked peptides (X-link) are difficult to identify due to the heterogeneous mass shift induced by the residual nucleotides, the native peptides adjacent to the crosslinking site (N-link) can be identified by standard MS and peptide search algorithms. The original RNA-bound region of the RBP (i.e., RBDpep; Figure 1A), which includes both the crosslinked peptide (X-link) and its unmodified neighboring peptides (N-link), is then re-derived in silico by extending the MS-identified peptides to the two nearest LysC or ArgC cleavage sites.
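The in silico re-derivation of an RBDpep described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `extend_to_rbdpep` and the toy sequence are hypothetical, and the sketch assumes simplified protease rules (LysC cleaves C-terminal to K, ArgC C-terminal to R, with no exceptions) and a peptide that occurs once in the protein.

```python
def extend_to_rbdpep(protein: str, peptide: str, residue: str) -> str:
    """Extend an MS-identified (N-link) peptide to the two nearest cleavage
    sites of a protease that cuts C-terminal to `residue`
    ("K" for LysC, "R" for ArgC). Simplified rule, assumed for illustration."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)  # exclusive end index

    # Walk upstream until the preceding residue is a cleavage residue
    # (or the protein N terminus is reached).
    left = start
    while left > 0 and protein[left - 1] != residue:
        left -= 1

    # Walk downstream until the fragment ends with a cleavage residue
    # (or the protein C terminus is reached).
    right = end
    while right < len(protein) and protein[right - 1] != residue:
        right += 1

    return protein[left:right]


# Toy example: the tryptic peptide "GHIK" is extended to its parent fragment.
toy_protein = "MAKDEFRGHIKLMNRPQK"
lysc_rbdpep = extend_to_rbdpep(toy_protein, "GHIK", "K")  # "DEFRGHIK"
argc_rbdpep = extend_to_rbdpep(toy_protein, "GHIK", "R")  # "GHIKLMNR"
```

In this toy case the same N-link peptide yields different RBDpeps for the two proteases, which is consistent with the use of both LysC and ArgC to obtain complementary maps.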

Proteins crosslinked to poly(A) RNA are isolated using oligo(dT) magnetic beads and purified by stringent washes that include 500 mM LiCl and chaotropic detergents (0.5% LiDS), efficiently removing non-covalent binders. After elution, RBPs are proteolytically digested by either LysC or ArgC. These proteases were selected as best suited for RBDmap by an in silico simulation of their predicted cleavage patterns of known HeLa RBPs and their compatibility with subsequent tryptic digestion (Figure S1B). Analysis by mass spectrometry (MS) of LysC- and ArgC-treated samples revealed an excellent match with the in silico predictions, as reflected by the low number of missed cleavages (Figures 1B and 1C). The extensive proteolysis of HeLa RBPs is achieved without compromising RNA integrity (Figures 1D and S1C–S1E). The average peptide length after LysC and ArgC treatment is ∼17 amino acids, which defines the resolution of RBDmap (Figure 1C). Note that the extensive protease treatment disrupts protein integrity, and thus protein-protein complexes that might have withstood the experimental conditions will be released into the supernatant.
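As an illustration of what an in silico digestion and a missed-cleavage count involve, the following Python sketch applies a simplified cleavage rule (cut C-terminal to every K for LysC, or every R for ArgC). The function names and the toy sequence are hypothetical, and the actual simulation used in the study may apply additional rules.

```python
import re


def digest(protein: str, residue: str) -> list[str]:
    """Complete in silico digestion: cut C-terminal to every `residue`.
    A complete digest corresponds to 0% missed cleavages."""
    pattern = rf"[^{residue}]*{residue}|[^{residue}]+$"
    return re.findall(pattern, protein)


def missed_cleavages(peptide: str, residue: str) -> int:
    """Count internal cleavage residues in an observed peptide.
    The C-terminal residue is excluded because it marks a used site."""
    return peptide[:-1].count(residue)


def mean_length(fragments: list[str]) -> float:
    """Average fragment length, i.e., the nominal mapping resolution."""
    return sum(len(f) for f in fragments) / len(fragments)


toy_protein = "MAKDEFRGHIKLMNRPQK"
lysc_fragments = digest(toy_protein, "K")  # ["MAK", "DEFRGHIK", "LMNRPQK"]
argc_fragments = digest(toy_protein, "R")  # ["MAKDEFR", "GHIKLMNR", "PQK"]
```

Comparing observed peptides against such a complete digest gives the fraction of missed cleavages, and the mean fragment length gives a rough estimate of the resolution of the map.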

To define how RBPs bind to RNA in living cells, we extended RNA interactome capture by addition of an analytical protease digestion step followed by a second round of oligo(dT) capture and mass spectrometry (Figure 1A). First, UV light is applied to cell monolayers to covalently stabilize native protein-RNA interactions taking place at “zero” distance. While UV exposure using dosages exceeding those used here can potentially promote protein-protein crosslinking, we could not detect such crosslinks under our conditions, evidenced by the lack of UV-dependent, high molecular weight complexes in RNase-treated samples (Figures S1A and S4A).

(I) Number of proteins harboring recognizable or unknown RBDs in the HeLa mRNA interactome (left) and in RBDmap dataset (right).

(H) Comparison of the peptide intensity ratios from three biological replicates between UV-irradiated and non-irradiated inputs (x axis) and between RNA-bound and released fractions (y axis) (color code as above).

(G) Venn diagram comparing the proteins within the RBDmap data set and the HeLa, HEK293, and Huh-7 RNA interactomes.

(F) Peptide intensity ratios between LysC and ArgC experiments computed from three biological replicates. The dots represent released peptides (blue), RBDpeps (red), candidate RBDpeps (salmon), and background peptides (gray).

(E) Scatter plot comparing the peptide intensity ratios between RNA-bound and released fractions. The peptides enriched in the RNA-bound fraction at 1% (RBDpep) and 10% FDR (candidate RBDpep) are shown in red and salmon, respectively (Pearson correlation coefficient: r).

(B) LysC- and ArgC-mediated proteolysis was monitored without trypsin treatment. The protease digestions under RBDmap conditions or in buffers typically used in MS studies (optimal) were compared to in silico digestions defining 0% miscleavage. The missed cleavages were calculated and plotted.

Unexpectedly, helicase domains are underrepresented in the RNA-bound fraction (Figure 2C). However, the high number of released helicase peptides likely reflects (1) the transitory and dynamic interactions that helicases establish with RNA, (2) the large protein segments of the domain situated far from the RNA, and (3) the predominance of interactions with the phospho-sugar backbone over nucleotide bases (Figures S2C–S2E). Nevertheless, high-confidence RBDpeps are found at the exit of the helicase tunnel, as discussed below (Figures S2C–S2E).

Validating the RBDmap data, classical RBDs such as RRM, KH, cold shock domain (CSD), and Zinc finger CCHC, are strongly enriched in the RNA-bound fraction ( Figure 2 C). This enrichment can also be appreciated at the level of individual protein maps ( Figures 2 D and S2 B–S2D). To evaluate the capacity of RBDmap to identify bona fide RBDs, we focused on RBPs that harbor at least one classical RBD (as listed in). MS-identified peptides from these proteins were classified as “within” or “outside” a classical RBD, according to their position within the proteins’ architecture ( Figure 2 E). The relative fraction of peptides within versus outside of the RBD was then plotted for each possible RNA-bound/released intensity ratio ( Figure 2 F). Correct re-identification of classical RBDs would lead to an ascending line (i.e., within/outside ratios should grow in parallel to the RNA-bound/release ratios; Figure 2 E), while a random distribution of peptides within and outside of classical RBDs would yield a horizontal line (i.e., within/outside ratios do not vary in accordance with the RNA-bound/released ratios; Figure 2 E). As shown in Figure 2 F, the relative fraction of peptides mapping within classical RBDs increases in parallel with the RNA-bound/released ratios. Thus, RBDmap correctly assigns RNA-binding activity to well-established RBDs.
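The validation logic described above can be sketched in Python as follows. This is only an illustrative reading of the analysis, not the authors' code: it assumes that each peptide is represented by a hypothetical pair of (log2 RNA-bound/released ratio, within-RBD flag), and that the fraction is computed cumulatively over all peptides at or above each ratio threshold.

```python
def within_fraction_by_threshold(peptides, thresholds):
    """For each ratio threshold, return the fraction of peptides at or above
    that threshold that map within a classical RBD.

    peptides: iterable of (log2_ratio, within_rbd) pairs (assumed layout)
    thresholds: iterable of log2 ratio cutoffs
    """
    peptides = list(peptides)
    fractions = {}
    for cutoff in thresholds:
        flags = [within for ratio, within in peptides if ratio >= cutoff]
        # None marks thresholds above which no peptide remains.
        fractions[cutoff] = sum(flags) / len(flags) if flags else None
    return fractions


# Toy data: higher RNA-bound/released ratios coincide with RBD membership.
toy_peptides = [(-2, False), (-1, False), (0, True),
                (1, False), (2, True), (3, True)]
profile = within_fraction_by_threshold(toy_peptides, [-2, 0, 2])
```

An ascending profile, as in this toy example, corresponds to correct re-identification of classical RBDs, whereas a flat profile at the overall within-RBD proportion would correspond to a random distribution.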

Interestingly, RNA-bound and released proteolytic fragments display distinct chemical properties. Released peptides are rich in negatively charged and aliphatic residues, which are generally underrepresented in RNA-binding protein surfaces ( Figures 2 A, 2B , and S2 A). Conversely, RBDpeps are significantly enriched in amino acids typically involved in protein-RNA interactions, including positively charged and aromatic residues. These data show that the chemical properties of the RBDpeps resemble those expected of bona fide RNA-binding surfaces. As a notable exception, glycine (G) is enriched in RBDpeps, but depleted from protein-RNA interfaces derived from available structures ( Figures 2 A and 2B). Flexible glycine tracks can contribute to RNA binding via shape-complementarity interactions as described for RGG boxes (). Hence, lack of glycine at binding sites of protein-RNA co-structures reflects the technical limitations of crystallographic studies regarding disordered protein segments.

(F) Computed ratio of peptides mapping within known RBDs versus outside RBDs, as a function of their RNA-bound/released ratios. The horizontal line represents the baseline for uncorrelated data (i.e., the proportion of peptides mapping to classical RBDs in the whole validation set in absence of enrichment; see E, bottom).

(E) Schematic representation of RBDpeps mapping within or outside of classical RBDs (left). The idealized outcome of a perfect correlation between RBDpeps and classical RBDs (top right) and random distribution are shown (bottom right).

(D) Distribution of RBDpeps and released fragments in a classical RBP. The x axis represents the protein sequence from N to C terminus, and the y axis shows the RNA-bound/released peptide intensity ratios. The protein domains are shown in boxes under the x axis (LysC: L and ArgC: A).

(C) Bar plot showing the odds ratio of the most enriched known RBDs.

(B) Amino acid enrichment within RNA-binding protein surfaces (≤4.3 Å to the RNA) over distant regions (>4.3 Å from the RNA) extracted from protein-RNA co-structures.

This analysis also successfully assigned correct RNA-binding sites to KH, DEAD-box helicase, and CSD, as shown in Figures 3E–3J, S3G, and S3H. The DEAD-box helicase domain establishes interactions primarily with the phospho-sugar backbone of the RNA, while nucleotide bases project away from the protein core (Figure S3I). X-link peptide coverage of RBDmap for the DEAD-box domain identifies one alpha helix in the helicase tunnel exit that coincides with the only position in RNA-protein co-crystals where multiple amino acids establish direct contacts with nucleotide bases. Interestingly, different binding orientations of the double-stranded RNA-binding motif (DSRM) have been observed in structural studies (Figure S3J). The X-link peptide coverage analysis of the DSRM domain highlights the loop separating the second and third β strands as the interaction partner of the double-stranded RNA (Figures S3J and S3K). Note that this loop is shown in several RNA-protein co-structures to project into the minor groove of the double-stranded RNA helix, establishing numerous interactions with the Watson-Crick paired bases. In summary, RBDmap faithfully re-identifies the protein surfaces of canonical RBDs that contact nucleotide bases.

To test whether RNA-binding assignments of RBDmap can reach near single-amino acid resolution, we collected the complete set of RBDpeps and released peptides mapping to a given RBD class (e.g., RRM) and assessed their relative position within the domain (from 0 to 1) as well as its adjacent upstream (from −1 to 0) and downstream regions (from 1 to 2) ( Figure 3 B). The MS-identified part (N-link) of each RBDpep was then subtracted to infer the RNA-crosslinked (X-link) moiety(s), which cannot be identified by conventional MS due to their nucleotide remnant ( Figures 1 A and 3 B). The X-link/released peptide ratio was calculated for each position in the domain, where high prevalence of X-link over released peptides will indicate RNA binding ( Figure 3 B). The high accuracy of this analysis is illustrated by the example profile obtained for RRMs. As shown in Figures 3 C, 3D, and S3 E, the highest X-link/released peptide ratio points to β strand 1, 2, and 3 as partners in the interaction with RNA, in agreement with the dozens of RNA-RRM co-structures available. Note that the LysC and ArgC proteases dissected the RRM in a differential manner: while LysC points to β strand 1 and 3, ArgC identifies β strand 2 as RNA-binding site, reflecting that the mapping capacity by these proteases depends on the distribution of lysines and arginines. Moreover, these data support the complementarity of the LysC and ArgC data sets to build accurate and comprehensive RNA-binding maps. Unexpectedly, we observed two discrete peaks of high X-link/released peptide ratio within the α helices placed at the back of the RRM. These peaks coincide with amino acids projected from the α helix to the RNA in several structures ( Figure S3 F) () and hence confirm the accuracy of RBDmap.
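The two bookkeeping steps of this analysis, inferring the X-link moiety and placing residues on the relative domain axis, can be sketched in Python. The function names and coordinates are hypothetical, spans are half-open residue index ranges, and the sketch assumes that each flanking region is as long as the domain itself, which the text does not specify.

```python
def infer_xlink(rbdpep_span, nlink_spans):
    """Return residue indices of an RBDpep not covered by any MS-identified
    N-link peptide; these are the candidate X-link positions.
    Spans are (start, end) with an exclusive end."""
    start, end = rbdpep_span
    covered = set()
    for n_start, n_end in nlink_spans:
        covered.update(range(n_start, n_end))
    return [i for i in range(start, end) if i not in covered]


def relative_position(residue_index, domain_start, domain_end):
    """Map a residue to the relative axis: 0 to 1 within the domain,
    -1 to 0 upstream, 1 to 2 downstream (flank length = domain length,
    an assumption made for this sketch)."""
    return (residue_index - domain_start) / (domain_end - domain_start)


# Toy example: RBDpep spans residues 3-10, N-link peptide covers 7-10.
xlink_residues = infer_xlink((3, 11), [(7, 11)])  # [3, 4, 5, 6]
positions = [relative_position(i, 0, 20) for i in xlink_residues]
```

Counting X-link and released residues per relative position across all domains of one class, and taking their ratio, would then give a profile of the kind described in the text.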

RBDmap also correctly assigns RNA-binding regions within large protein complexes such as the nuclear cap-binding complex. The small nuclear cap-binding protein (NCBP) 2 (or CBP20) directly contacts mRNA via the cap structure (m7GpppG), while the larger NCBP1 (CBP80) interacts with NCBP2. In agreement, RBDmap defines the RNA-binding region of NCBP2 within the m7GpppG-binding pocket, and no RBDpep is assigned to the large NCBP1 (Figure S3A). Moreover, RBDmap defines the corresponding RNA-binding sites within NCBP2 and its cytoplasmic counterpart eIF4E (Figure S3B), in spite of their low sequence identity. The glutamyl-prolyl-tRNA synthetase (EPRS) represents a large non-canonical RBP that harbors two tRNA synthetase domains separated by three WHEP motifs (Figures S3C and S3D). The first and second WHEP motifs bind the GAIT RNA element present in the 3′ UTRs of a number of pro-inflammatory mRNAs, in complete agreement with the RBDmap data.

For direct validation of the RBDmap data, we selected all those RBPs for which protein-RNA co-structures are available within the Protein Data Bank (PDB) repository. These were “digested” in silico with either LysC or ArgC, and the predicted proteolytic fragments were considered as “proximal” to RNA when the distance to the closest RNA molecule is 4.3 Å or less; otherwise, they were categorized as non-proximal (Figure 3A). About half of all LysC and ArgC fragments are proximal to RNA by this criterion, reflecting that many RBP structures are incomplete and focused on the RBDs (average protein coverage ∼50%). By contrast, 70.3% (LysC) and 81% (ArgC) of RBDpeps qualify as proximal, showing that RBDmap enriches highly significantly for peptides in close proximity to the RNA (Figure 3A). Several factors suggest that the pool of peptides classified as proximal in the analyzed structures even underestimates the performance of RBDmap: (1) In several structures of RBPs that harbor two or more RBDs, only one of the RBDs displays the interaction with RNA (e.g., PDB 3NNC). At least in some of these cases, structures lack RNA contacts of RBDs that likely occur in vivo. (2) Proteins are normally co-crystallized with short nucleic acids (5 to 8 nucleotides), and their physiological RNA partners likely establish additional interactions with the RBP. (3) RNA-protein co-structures usually reflect one interaction state, while protein-RNA interactions are typically more dynamic in vivo.
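The proximity criterion can be illustrated with a short Python sketch. It assumes that the distance between a fragment and the RNA is taken as the minimum distance between any fragment atom and any RNA atom, which the text does not state explicitly; the function names and the toy coordinates (in Å) are hypothetical, and parsing of PDB files is left out.

```python
import math

CUTOFF_ANGSTROM = 4.3  # threshold given in the text


def min_distance(fragment_atoms, rna_atoms):
    """Smallest Euclidean distance between any fragment atom and any RNA
    atom. Atoms are (x, y, z) tuples in angstroms."""
    return min(math.dist(a, b) for a in fragment_atoms for b in rna_atoms)


def is_proximal(fragment_atoms, rna_atoms, cutoff=CUTOFF_ANGSTROM):
    """Classify a proteolytic fragment as proximal (True) or non-proximal."""
    return min_distance(fragment_atoms, rna_atoms) <= cutoff


# Toy coordinates: one fragment lies 4.0 A from the RNA, the other 5.0 A.
rna = [(0.0, 0.0, 0.0)]
near_fragment = [(0.0, 0.0, 4.0), (0.0, 0.0, 9.0)]
far_fragment = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
labels = [is_proximal(f, rna) for f in (near_fragment, far_fragment)]
```

The proportion of proximal fragments among all in silico fragments then serves as the random expectation against which the proportion of proximal RBDpeps is compared.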

(J) As in (D), but with the PDB 3TS2 as a model for a CSD.

(I) As in (C), but for the CSD.

(H) As in (D), but using the PDB 4B8T as a model for a KH domain.

(G) As in (C), but for the KH domain.

(F) As in (D), but using the PDB 2J0S as a DEAD-box helicase model.

(E) As in (C), but for the DEAD-box domain.

(D) The ratio of X-link over released peptides was plotted in a representative RRM-RNA structural model (PDB 2FY1 ) using a heatmap color code.

(C) The x axis represents the relative position within the RRM (from 0 to 1) and its upstream (−1 to 0) and downstream (1 to 2) regions. The ratio of the X-link over released peptides at each position of the RRM and surrounding regions using the LysC data set was plotted (top). The secondary structure prediction for each position of the RRM and flanking regions is shown (bottom).

(A) Schematic representation of proximal and non-proximal peptides (left). The proteins within protein-RNA co-structures were digested in silico with LysC or ArgC and predicted fragments aligned with the RBDpep supersets. The left bars represent the proportion of proximal and non-proximal LysC/ArgC fragments in the complete structure superset (random probability). The right bars show the % of aligned RBDpeps that are RNA proximal or non-proximal ( ∗∗∗ p < 0.001).

Identification of Non-canonical RBDs

We also noticed clusters of RBDpeps within enzymes. Peptidyl prolyl cis/trans isomerases are classified based on their domain architecture into two groups: PPI and FKBP. This protein superfamily has close links to RNA metabolism, and two members, PPIE and PPIL4, harbor classical RRMs (Mesa et al., 2008). However, RNA interactome studies found 11 additional members of this family that lack RRMs as RBPs, suggesting the existence of a still unknown mechanism of RNA binding (Castello et al., 2012). RBDmap reveals this RNA-binding activity within both the PPI and FKBP folds (Tables S1 and S2). Although peptide coverage was insufficient to perform an X-link peptide analysis, we noticed two clusters of RBDpeps at the N- and C-termini of the FKBP fold that are located far apart in primary sequence, but close in 3D structure (Figures S4B and S4C). The mapped candidate RBD opposes the catalytic site.

Furthermore, we noticed clusters of RBDpeps in six chaperones of the heat shock protein (HSP) 90 and 70 families (Figure S4D). HSPs are induced by cellular stress and prevent protein misfolding and subsequent aggregation, which typically occur in disordered regions of RBPs in health and disease (Weber and Brangwynne, 2012). Indeed, HSPs have been functionally linked to RNA metabolism and translation (Iwasaki et al., 2010; Willmund et al., 2013). Chaperone domain binding to RNA may help to increase the local concentration of the chaperone machinery at ribonucleoprotein complexes to avoid the accumulation of pathological aggregates.

Apparently, numerous enzymes of intermediary metabolism bind RNA through regions in close proximity to their substrate-binding pockets. Specifically, the di-nucleotide binding domain (or Rossmann fold) and mono-nucleotide binding folds emerge as bona fide RBDs with 12 proteins mapped by RBDmap (Table S3), extending earlier observations (Cieśla, 2006; Nagy and Rigby, 1995). RBDpeps mapping to Aldolase (ALDO) A and C delimit the fructose 1,6-bisphosphate interacting domain (Figures S4E and S4F), suggesting that RNA and metabolite may compete for this binding pocket. Overall, the RBDpeps identified within metabolic enzymes show that the few well-characterized examples such as aconitase 1 (iron regulatory protein 1, IRP1), glyceraldehyde-3-phosphate dehydrogenase, and thymidylate synthase may represent the tip of the iceberg of a more general engagement of metabolic enzymes with RNA (reviewed in Castello et al., 2015).

RBDmap also uncovers RNA-binding activities within PDZ, 14-3-3, ERM, and the tubulin-binding domains, which are involved in protein-protein interactions and protein localization ( Figures 4 F, 4G, and S4 G–S4I). Due to the high peptide coverage of the PDZ domain, we could generate an X-link analysis ( Figures 4 F and 4G). This map shows a discrete RNA-binding site within a basic cavity formed by a short α helix and two β strands.

RBDmap also identifies RNA-binding sites within domains of unknown function such as NDR and DZF. N-myc downstream-regulated genes (NDRGs) represent a family of proteins with unknown function. Although NDRG1 is a metastasis suppressor relevant for cancer progression and prognosis (Chang et al., 2014), its exact molecular function has remained unknown. RBDmap resolves a conserved RNA-binding region within the NDR domain of NDRG1, NDRG2, and NDRG4. RBDpeps reproducibly map to the helix-loop-β strand structure at the C terminus of the NDR fold (Figures S4J and S4K). DZF is predicted to harbor nucleotidyltransferase activity (Kuchta et al., 2009) and to promote protein dimerization (Wolkowicz and Cook, 2012). The X-link peptide coverage analysis maps the RNA-binding region to a deep, basic cleft between two symmetrical domain subunits (Figures 4H and 4I). The RNA-binding activity of the DZF domain is compatible with its proposed nucleotidyltransferase function.

To independently assess RNA binding of PDZ and DZF domains, we used the T4 polynucleotide kinase (PNK) assay as an orthogonal approach. In brief, cells are irradiated with UV light and, after lysis, RNA is trimmed with RNase I. Proteins of interest are immunoprecipitated under stringent conditions, and the presence of RNA is revealed by 5′ end phosphorylation with PNK and [γ-32P]-ATP, followed by SDS-PAGE and autoradiography. We generated Tet-inducible HeLa cell lines expressing the PDZ domain of β-1-syntrophin (SNTB) 1 and SNTB2, as well as the DZF domains of Zinc finger RNA-binding protein (ZFR) and interleukin enhancer-binding factor (ILF) 2 and ILF3, all fused to a FLAG-HA tag. As positive controls, we used the full-length ILF3 (FL), its DSRM domain alone, and hnRNPC, while actin (ACTB) was used as a negative control. The PNK assay shows radioactive bands of the expected molecular weight for all tagged PDZ and DZF domains, and only when UV light was applied to the cultured cells (Figures 4J and 4K). By contrast, no signal is detectable for the control ACTB. As expected, the DSRM domain of ILF3 also displays RNA-binding activity. Taken together, these data corroborate the RBDmap assignment of PDZ and DZF domains as RBDs.