RNA editing, a post-transcriptional process, allows the diversification of proteomes beyond the genomic blueprint; however it is infrequently used among animals for this purpose. Recent reports suggesting increased levels of RNA editing in squids thus raise the question of the nature and effects of these events. We here show that RNA editing is particularly common in behaviorally sophisticated coleoid cephalopods, with tens of thousands of evolutionarily conserved sites. Editing is enriched in the nervous system, affecting molecules pertinent for excitability and neuronal morphology. The genomic sequence flanking editing sites is highly conserved, suggesting that the process confers a selective advantage. Due to the large number of sites, the surrounding conservation greatly reduces the number of mutations and genomic polymorphisms in protein-coding regions. This trade-off between genome evolution and transcriptome plasticity highlights the importance of RNA recoding as a strategy for diversifying proteins, particularly those associated with neural function.

Cephalopods are diverse and can be divided into the behaviorally complex coleoids, consisting of squid, cuttlefish, and octopus, and the more primitive nautiloids. In this paper we show that in neural transcriptomes extensive A-to-I RNA editing is observed in the behaviorally complex coleoid cephalopods but not in nautilus. The edited transcripts are translated into protein isoforms with modified functional properties. By comparing editing across coleoid taxa, we found that, unlike the case for mammals, many sites are highly conserved across the lineage and undergo positive selection, resulting in a sizable slow-down of coleoid genome evolution.

Recently, we reported an apparent exception: squid contain an unusually high level of recoding, with the majority of mRNAs in the nervous system harboring at least one event (). This intriguing but anecdotal result raised fundamental questions about the nature of recoding in these organisms. Does the massive RNA-level recoding translate into proteome diversification? Is it simply a neutral byproduct of a promiscuous ADAR tasked with another function or adaptive, providing a functional advantage? Finally, is it related to behavioral sophistication?

It is generally assumed that genetic information passes faithfully from DNA to RNA to proteins. Proteome complexity, however, depends on a diverse set of post-transcriptional processes that modify and enrich genetic information beyond the genomic blueprint. RNA editing is one such process. Adenosine deamination to inosine by the ADAR family of enzymes (adenosine deaminases acting on RNA) is the most common form of editing among animals (). Because inosine is recognized as guanosine during translation (), this process has the capacity to recode codons and fine-tune protein function. However, it seldom does so. Transcriptome-wide screens have revealed that only ∼3% of human messages and 1%–4% of those from Drosophila harbor a recoding site (). Even more surprising is the limited extent to which this process is conserved. There are only about 25 human transcripts that contain a recoding site that is conserved across mammals (), and only about 65 recoding sites conserved across the Drosophila lineage (). In C. elegans, only a few putative recoding sites have been identified, some of which were not validated (). These data support the hypothesis that recoding by RNA editing is mostly neutral or detrimental and only rarely adaptive ().

One may also quantify the effect of purifying selection in these regions by studying the fraction of inter-species mutations that were avoided, presumably due to maintaining the dsRNA structures required for editing. We analyzed the inter-species mutation rates (in orthologous parts of the respective transcriptomes) as a function of the distance to the closest conserved recoding site and again found that the rates are considerably lower in the vicinity of editing sites than the baseline rate observed far from any editing site (Figure S5A). Attributing the difference between the observed mutation rate and the baseline to effects of editing on genome evolution, and integrating this difference over the entire transcriptome, we estimate that 3%–15% of all transcriptomic inter-species mutations are purified (numbers vary among the species pairs), apparently due to constraints imposed by editing. Similarly, we find that the actual number of SNPs in cephalopod coding sequences is 10%–26% lower than what would be seen in the absence of SNP suppression in the vicinity of recoding sites (Figure S5B). Thus, the purifying selection against inter-species mutations and intra-species genomic polymorphisms residing in proximity to recoding sites results in a sizable reduction in the global number of mutations and polymorphisms in these species, revealing an unanticipated genome rigidity required to maintain the extensive transcriptome recoding.

(B) Similar to the analysis presented in (A), we looked at the SNP density (number of SNPs per bp) as a function of the distance of the genomic base to the closest recoding site. The baseline SNP density (red) is the density observed at distances of > 500 bp from a recoding site (including SNPs in ORFs harboring no editing site). Integrating the difference between the baseline and the observed rate for genomic locations at a distance shorter than 500 bp provides an estimate of the number of avoided SNPs, apparently due to the constraints imposed by editing. Here too, we estimate the fraction of avoided SNPs as the number of avoided SNPs divided by (#avoided+#observed), and find this fraction (marked as A in the different panels) to be 10%–26%. This is an underestimate, as the contributions of sites in non-conserved transcriptome regions are not taken into account.

(A) Mutation rates (averaged over all orthologous parts of the transcriptome) are presented as a function of the distance of the genomic base to the closest conserved recoding site. The baseline mutation rate (red) represents the rate measured for all sites at distances > 1,000 bp from a recoding site (including mutations in ORFs harboring no editing site). Integrating the difference between the baseline and the observed rate for genomic locations at a distance shorter than 1,000 bp provides an estimate of the number of avoided mutations, apparently due to the constraints imposed by editing. We estimate the fraction of avoided mutations as the number of avoided mutations divided by (#avoided+#observed), and find this fraction (marked as A in the different panels) to be 3%–15%. This is an underestimate, as sites not conserved between the two species, or sites in transcriptome regions not conserved between the two species, are not taken into account.
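The avoided-fraction estimate described in these two panels can be illustrated with a short sketch. The function name, the per-distance input format, and the toy numbers are assumptions made for illustration and are not taken from the study. In particular, the sketch assumes that #observed counts all observed events, both inside and outside the window around recoding sites.

```python
def avoided_fraction(observed_rate, n_bases, baseline_rate, observed_far):
    """Estimate the fraction of mutations (or SNPs) avoided near recoding sites.

    observed_rate[d] -- observed events per base at distance d from the closest site
    n_bases[d]       -- number of bases at distance d (inside the window only)
    baseline_rate    -- events per base measured far from any recoding site
    observed_far     -- events observed outside the window (hypothetical input)
    """
    observed_near = sum(r * n for r, n in zip(observed_rate, n_bases))
    expected_near = baseline_rate * sum(n_bases)
    # Discrete counterpart of integrating (baseline - observed) over distance.
    avoided = expected_near - observed_near
    observed = observed_near + observed_far
    return avoided / (avoided + observed)


# Toy numbers, for illustration only.
fraction = avoided_fraction(
    observed_rate=[0.01, 0.02, 0.03],
    n_bases=[1000, 1000, 1000],
    baseline_rate=0.04,
    observed_far=200,
)
print(f"avoided fraction: {fraction:.4f}")
```

Because the denominator includes the avoided events themselves, the result reads as the share of events that would have occurred without the constraint, rather than a ratio to the observed events alone.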

The cumulative effect of this evolutionary constraint is considerable. Due to the large number of recoding sites and the extended range of the associated genomic rigidity, the local constraints observed in the vicinity of the recoding sites translate into a substantial global effect on genome evolution. These 200 nt windows around recoding sites cover a sizable fraction of all protein coding sequences: 23%–41%, depending on the coleoid species.
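As an illustration of how such a coverage figure could be computed, the sketch below merges overlapping windows around recoding sites and reports the covered fraction of a coding sequence. The function name, the 0-based coordinates, and the toy positions are hypothetical. Including the site itself makes each window 201 nt rather than exactly 200 nt.

```python
def window_coverage(cds_length, sites, half_width=100):
    """Fraction of a coding sequence lying within half_width nt of a recoding site.

    sites holds 0-based positions; overlapping windows are counted only once.
    """
    covered = 0
    prev_end = 0  # exclusive end of the region counted so far
    for site in sorted(sites):
        start = max(site - half_width, 0, prev_end)
        end = min(site + half_width + 1, cds_length)
        if end > start:
            covered += end - start
            prev_end = end
    return covered / cds_length


# Toy example: three sites in a 1,000 nt coding sequence, two with overlapping windows.
coverage = window_coverage(1000, [50, 120, 800])
print(f"covered fraction: {coverage:.3f}")
```

Counting overlaps once matters here: with tens of thousands of sites, many windows overlap, so summing window lengths would overstate the constrained fraction.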

To edit a specific adenosine within an RNA, ADAR enzymes require surrounding dsRNA structures. These structures are often large, spanning hundreds of nucleotides (). If editing is under positive selection, maintaining these structures would require elevated sequence conservation in the vicinity of editing sites (). As this sequence conservation stems from constraints related to RNA structure, rather than its coding capacity, it should affect synonymous and non-synonymous changes equally. Indeed, we see a marked depletion of inter-species mutations ( Figures 7 A and S4 A–S4F ) and intra-species genomic polymorphisms ( Figures 7 B and S4 G), synonymous and non-synonymous alike, up to ∼100 nt on each side of a recoding site. These regions show an elevated GC content ( Figures 7 C and S4 H), consistent with the requirement for the formation of stronger secondary structures.

(H) Higher GC content in the vicinity of editing sites. GC content is elevated near editing sites. This effect persists up to ∼100 bp, and is therefore unlikely to be explained by the local ADAR preferences. Rather, it may facilitate formation of stronger and more stable double-stranded RNA structures required for ADAR binding and editing. Blue: all editing sites; green: all recoding sites; red: conserved recoding sites; dashed black line: GC content averaged over all ORFs. Error bars represent the SEM. The effect is not seen for AG mismatches found in nautilus, which are expected to be mostly false positives.

(G) Intra-species genomic polymorphisms are suppressed in genomic loci surrounding editing sites. Genomic polymorphisms are depleted near editing/recoding/conserved-recoding sites, attesting to a decrease in genome plasticity. The effect is stronger for recoding sites, and even more so for the conserved recoding sites.

(A–F) Species-to-species mutations are suppressed within ∼100 bp of a recoding site shared by the two species (left panels). No effect is seen when taking random non-edited sites from conserved regions within the same transcripts (right panels) (A) Oct.bim.-Oct.vul. (B) Oct.bim.-Sepia (C) Oct.bim.-Squid (D) Sepia-Oct.vul. (E) Squid-Oct.vul. (F) Squid-Sepia. (yellow – synonymous change, light green – non-synonymous; dark green – deletions; Mutations density is the number of mutations found at a given distance from an editing site divided by the number of such sites, i.e., mutations per base-pair).

(C) GC content is elevated near editing sites in squid, allowing for more stable double-stranded RNA structures. The effect is even stronger in conserved sites. Dashed line represents the baseline GC level in the entire ORFome, and error bars represent the SEM.

(B) Genomic polymorphisms are depleted near editing/recoding/conserved-recoding sites in squid, attesting to reduced genome plasticity. Effect is stronger for recoding sites, and even more so for the conserved recoding sites.

(A) Inter-species mutations are purified from genome loci surrounding conserved recoding sites (data shown for sites shared by squid and sepia). Depletion of mutations extends up to ∼100 bp of shared recoding sites (left). As a control, we show the mutation density (mutations/bp) around random non-edited adenosines from the same transcripts (right). Yellow—synonymous change; light green—non-synonymous; dark green—deletions.

To measure the effects of editing on functional properties, we expressed all channels in Xenopus oocytes and studied them using the Cut-Open Oocyte Vaseline Gap Voltage Clamp technique (). The unedited versions of the channels open over a similar range of voltages but have different opening, closing, and inactivation kinetics (Figures S3 and 6B). To examine the effects of editing, we first looked at the sepia-specific editing site I529V. Figure 6Ai shows superimposed current traces, obtained in response to a voltage step from −80 mV to +40 mV, for the unedited and edited (I529V) versions of sepia Kv2. Clearly, the edited channel inactivates more quickly, at all voltages tested (Figure 6Aii; there is also a very modest slowing of channel closure upon bringing the voltage back to −80 mV). Editing had no effect on voltage sensitivity and channel opening (data not shown). We next looked at a common editing site (squid I579V, sepia I630V, and octopus I632V) and found that it predominantly affects the channels’ closing rates. Interestingly, the direction of the effect is species dependent. First we analyzed the tail currents in the unedited squid, sepia, and octopus channels by recording currents at a negative membrane voltage of −80 mV following a brief activating pulse to a positive potential (Figure 6Bi). Each closes at a distinct, species-specific rate, with squid the fastest and octopus the slowest. However, upon introduction of the common editing event, the channels converge on a similar rate (Figure 6Bii); in squid, editing this site slows closing, whereas in octopus and sepia, it speeds it. This effect on closing kinetics was consistent at all voltages tested (Figures 6Biii and 6Biv). Based on these data from Kv2 orthologs, and the fact that editing is exceptionally abundant in ion channels and proteins involved in synaptic vesicle release and recycling, the overall influence of RNA editing on neurophysiology is likely profound and complex.

(B) (i) Tail currents measured at a voltage (Vm) of −80 mV, following an activating pulse of +20 mV for 25 ms. Traces are shown for the WT Kv2.1 channels from squid, sepia, and Octopus vulgaris. (ii) Tail currents for the same channels edited at the shared I-to-V site in the 6th transmembrane span, following the same voltage protocol. (iii) Time constants from single exponential fits to tail currents obtained at various negative voltages (Vm) (following an activating pulse to 20 mV for 25 ms) show that the unedited channels close at distinct rates, (iv) but the edited versions close at similar rates. N = 5 ± SEM for all data plotted in this figure.
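For readers unfamiliar with how a time constant is extracted from a decaying current, the sketch below fits a single exponential, I(t) = I0·exp(−t/τ), by least squares on the logarithm of the current magnitude. This is a simple log-linear substitute chosen for illustration; the fitting procedure actually used for these traces is not described here. The function name and the synthetic trace are hypothetical, and the sketch assumes the current decays to a zero baseline.

```python
import math


def fit_single_exponential(times, currents):
    """Fit |I(t)| = I0 * exp(-t / tau) by a straight line through (t, ln|I|).

    Returns (I0, tau). Assumes non-zero currents that decay toward zero.
    """
    logs = [math.log(abs(i)) for i in currents]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx          # equals -1 / tau
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free tail current with tau = 3 ms (illustrative values only).
times_ms = [0.5 * k for k in range(20)]
trace = [2.0 * math.exp(-t / 3.0) for t in times_ms]
amplitude, tau_ms = fit_single_exponential(times_ms, trace)
print(f"I0 = {amplitude:.3f}, tau = {tau_ms:.3f} ms")
```

On noisy recordings a nonlinear fit with an explicit baseline term is usually preferable, since the log transform overweights the small, noisy points at the end of the trace.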

(A) (i) Current traces resulting from a voltage step from −80 mV to 40 mV for the WT Sepia Kv2.1 and the same construct containing the sepia-specific I529V edit, lying within the 4th transmembrane domain (green), showing that I529V accelerates the rate of slow inactivation. (ii) Time constants for slow inactivation determined by fitting single exponentials to traces similar to those in panel (i) at different activating voltages (Vm).

Unedited (WT) and singly edited versions of the voltage-dependent K+ channels of the Kv2 subfamily were studied under voltage clamp (see Table S9).

(D) The half-times for 150 ms activating pulses were measured from traces similar to those presented in panel A. They measure the time it takes for the currents to reach one half of their maximum values following an activating voltage step. N = 6 ± SEM for all graphs.
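The half-time measurement defined above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the sample trace are hypothetical, time zero is taken as the start of the activating step, the baseline current is zero, and linear interpolation is used between samples.

```python
def half_time(times, currents):
    """Time at which the current first reaches half of its maximum value.

    Linear interpolation is used between the two samples that bracket the
    half-maximum crossing.
    """
    half = max(currents) / 2
    for i, current in enumerate(currents):
        if current >= half:
            if i == 0:
                return times[0]
            t0, t1 = times[i - 1], times[i]
            c0 = currents[i - 1]
            return t0 + (half - c0) * (t1 - t0) / (current - c0)
    raise ValueError("current never reaches half of its maximum")


# Toy trace (arbitrary units), for illustration only.
t_half = half_time([0, 1, 2, 3, 4], [0, 1, 3, 4, 4])
print(f"half-time: {t_half}")
```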

(C) Inactivation time constants (τ) taken from single exponential fits to current traces measured during a 1 s activation step of Vm (similar to the traces presented in panel A).

(A) Examples of current traces recorded on fast (top) and slow (bottom) time-bases following activating pulses to +60 mV from a holding potential of −80 mV. These traces illustrate the variable activation and inactivation rates of the genomically encoded channels.

We next tested whether species-specific and conserved recoding events can affect protein function. We studied sepia, squid, and Octopus vulgaris Kv2 potassium channel orthologs, whose messages are abundantly edited (34–55 sites per species; 5 sites shared between all species; Figure S2 and Table S9). Voltage-dependent potassium channels of the Kv2 subfamily, also known as “delayed rectifiers,” are expressed across the metazoa. In the mammalian central nervous system, they regulate excitability, action potential duration, and repetitive firing (). As with most voltage-dependent potassium channels, they are predominantly closed at negative membrane potentials and open at positive ones. When switched between negative and positive potentials, they open or close with characteristic rates. At positive potentials, channels will also spontaneously close after opening, a process known as “inactivation.” The kinetics of these three processes play a vital role in determining how the channels regulate electrical signaling.

Orthologous cephalopod Kv2.1 channel sequences are abundantly edited. Editing sites predicted by our analysis are indicated by small boxes around specific amino acids. Conserved transmembrane spans S1–S6 are enclosed in large boxes. The unique editing site for Sepia in S4 (I529V) and the common editing site in S6 (Squid I579V, Sepia I630V and Octopus vulgaris I632V) were those studied using electrophysiology in Figure 6 of the main text. The alignment was generated using Vector NTI software.

Overall, the non-synonymous to synonymous (N/S) ratio for cephalopod edits is 65/35 = 1.9, as expected under neutrality taking into account the ADAR target motif (). However, the N/S ratio increases to much higher values as editing levels increase (Figure 5A), signaling positive selection for the highly edited sites. Conserved sites show an even stronger pattern (Figure 5B), where almost all highly edited, conserved sites are non-synonymous. Consistently, and in stark contrast with mammals, the higher the editing levels, the more sites are conserved (Figures 5C and 5D). Furthermore, editing is over-represented in highly conserved regions of the transcriptome (>95% identity between species) (Figure 5E). Taken together, these results suggest that recoding by RNA editing is commonly adaptive in coleoid cephalopods, with many thousands of recoding sites under positive selection.
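The comparison of the recoding fraction across editing levels can be sketched as a simple binning exercise. The function name, the bin edges, and the toy site list below are hypothetical and serve only to show the shape of the calculation; only the 65/35 split comes from the text.

```python
def recoding_by_level(sites, edges):
    """Count non-synonymous (N) and synonymous (S) sites per editing-level bin.

    sites -- iterable of (editing_level, is_nonsynonymous), with levels in [0, 1]
    edges -- ascending bin edges; the last bin includes its upper edge
    Returns a list of (N, S, N / (N + S)) tuples, one per bin (fraction is None
    for an empty bin).
    """
    n_bins = len(edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]
    for level, nonsyn in sites:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= level < edges[i + 1] or (is_last and level == edges[-1]):
                counts[i][0 if nonsyn else 1] += 1
                break
    return [(n, s, n / (n + s) if n + s else None) for n, s in counts]


# Overall ratio quoted in the text: 65% non-synonymous vs. 35% synonymous.
overall_ns_ratio = 65 / 35
print(f"overall N/S = {overall_ns_ratio:.1f}")

# Toy sites, for illustration only.
toy_sites = [
    (0.02, True), (0.03, False), (0.05, True), (0.08, False),
    (0.30, True), (0.40, True), (0.45, False),
    (0.90, True), (1.00, True),
]
bins = recoding_by_level(toy_sites, edges=[0.0, 0.1, 0.5, 1.0])
for n, s, frac in bins:
    print(n, s, frac)
```

Under neutrality the recoding fraction would stay near the overall 65% in every bin; a fraction that climbs with editing level is the signature the text interprets as positive selection.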

(D) Highly conserved regions of the transcriptome are enriched in editing sites, further attesting to positive selection of RNA editing. Density of editing sites (number of AG sites normalized by length) is higher for 112 recoding regions that are highly conserved across the four species (> 95% identity; average length 1382 bp), compared with all other, less conserved regions (Wilcoxon p value < 0.001 for all species). Error bars represent the SEM.

(C) In contrast with the case in humans, highly edited sites tend to be more conserved: the fraction of conserved sites rises with the editing level for all species pairs but more dramatically for the closely related octopuses and the sepia-squid pair.

(B) Editing levels are higher in conserved recoding sites. Distributions of editing levels in four groups of putative A-to-I editing sites: recoding and conserved (Rec + , Cons + ), recoding and non-conserved (Rec + , Cons − ), conserved sites that cause a synonymous change (Rec − , Cons + ), and non-conserved synonymous sites (Rec − , Cons − ). Horizontal red lines mark the median level, and yellow diamonds mark the mean. Conservation and non-synonymity are both positively correlated with higher editing levels, as well as their interaction (ANOVA, p value < 1.0e−162). Data presented here for squid (conserved sites are conserved in sepia), but the results are similar and significant for all species.

(A) The fraction of recoding sites among all editing sites in coding region increases with editing levels (top), as well as the fraction of recoding sites among all conserved sites (bottom). Red horizontal dashed line represents the recoding fraction expected assuming neutrality.

To conduct a comparable analysis of the recoding repertoire in cephalopods, we first identified the non-synonymous editing sites. About 65% of cephalopod edits within coding sequences are non-synonymous, leading to 54,287–86,230 recoding sites in 6,688–8,537 ORFs (Table S7), orders of magnitude more recoding than in any other species. In sharp contrast with mammals, thousands of recoding sites are shared between species (Figures 4A and 4B). As expected, the fraction of conserved sites is higher for species that are evolutionarily closer (Figure 4C), but unlike the picture observed in other evolutionary lineages (Figure 4D), editing in coding sequences is, to a large extent, conserved. Interestingly, 1,146 editing sites (in 443 proteins) are conserved and shared by all four coleoid cephalopod species (Figure 4E). A large fraction of proteins are recoded at multiple sites, and many proteins harbor multiple conserved and highly edited (>10% in at least one species) recoding sites (Figure 4F; Table S8). Notably, even the editing levels in the shared sites are conserved and exhibit significant and sizable correlations between evolutionarily distant species (Figures 4G and S1).

Editing levels in 887 recoding sites shared by all species are highly, positively and significantly correlated in all pairs of coleoid cephalopod species (p < 1.0e-75 for all pairs). Correlation is higher the closer the species are to each other in evolutionary terms, with Pearson rho = 0.95 for the two octopus species.

(G) Not only are the locations of editing sites conserved, but their editing levels are correlated as well. Editing levels in 887 recoding sites shared by all species are highly, positively, and significantly correlated in all pairs of coleoid cephalopod species (p < 1e−75 for all pairs; see Figure S1 for three additional pairs). Correlation is higher the closer the species are to each other in evolutionary terms, with Pearson rho = 0.95 for the two octopus species.

(F) Some proteins include multiple highly edited recoding sites (see Table S8 ). Of note are uromodulin, α-spectrin (previously reported to harbor the highest number of recoding sites in squid;), and calcium-dependent secretion activator 1 (CAPS1) with 14, 8, and 7 strong shared recoding sites, respectively. Recoding in CAPS1 was found to be conserved in vertebrate species from human to zebrafish ().

(E) Interestingly, 1,146 AG modification sites (in 443 proteins) are conserved and shared by all four coleoid cephalopod species. Of these, 887 are recoding sites and 705 are highly edited (≥10% editing) recoding sites (in 393 proteins).

(D) In contrast, only 36 human recoding sites (1%–2% of human recoding sites) are shared by mouse, and a similar number are shared between Drosophila melanogaster and D. mojavensis () (diverged at later times than squid-sepia).

(C) The majority of editing sites are conserved between the two octopus species, and even the most distant species share a sizable fraction of their sites.

(B) Virtually all (97.5%–99%) mismatches conserved across species are A-to-G, resulting from A-to-I editing. Manual inspection of the few non-A-to-G mismatches appearing in multiple species suggests that they either result from systematic erroneous alignments or are actually editing sites that were mistakenly identified as G-to-A mismatches due to insufficient DNA coverage.

(A) Tens of thousands of sites are conserved across species (see Table S7 ). The closer the species are evolutionarily, the higher the number of conserved sites.

Mammalian editing events in the coding region (and the editing levels) are negatively correlated with the importance of a site or gene—essential genes, and genes under strong functional constraints, tend to harbor lower numbers of editing sites and exhibit lower editing levels (). Furthermore, non-synonymous editing sites are suppressed, compared with synonymous ones, and the fraction of editing sites that are conserved across mammals is minute. These and other observations have led to the conclusion that although a few mammalian recoding sites are clearly beneficial, overall recoding by RNA editing is non-adaptive in mammals, presumably resulting from tolerable promiscuous targeting by the ADAR enzymes ().

An intriguing result from the recently reported Octopus bimaculoides genome was that the protocadherin gene family was greatly expanded (). In the mammalian brain these proteins are important for mediating combinatorial complexity in neuronal connections and play a role in diversifying neural circuitry (). We found a large number of protocadherins in the assembled transcriptomes for the four coleoid species (127–251 open reading frames [ORFs]), but not in nautilus (28 ORFs) ( Figure 3 E). Interestingly, protocadherins are significantly enriched in editing sites and are edited at higher levels in all four coleoid species, but not in nautilus ( Figures 3 F and 3G).

We analyzed RNA from 12 different tissues, including non-neural ones, and found 903,742 editing sites in the transcriptome (Table S5), 12% of which reside in coding regions (Figures 3B and 3C). In mammals, editing mostly occurs within genomic repeats (). In primates specifically, most RNA-editing sites are found in Alu repeats, whose sequence facilitates the creation of a double-stranded RNA structure that promotes ADAR binding. Similarly, editing in Octopus bimaculoides is enriched in repeat regions (303,414/903,742 sites, 34%; 159,005 of them in annotated repeats). The “editing index,” a robust measure of editing activity () defined as the editing level averaged over all adenosines (edited and unedited) weighted by expression level, is calculated to be 0.21% in octopus repeats for the panel of 12 tissues studied, which is comparable to the index observed in human Alu repeats (). Unlike in primates, though, no single repeat family was found to contain the majority of sites, and SINEs are not edited more than other repeats (Figure 3D). Therefore, as the repeat editing index in octopus is calculated over all repeats (∼1.3 Gbp), and editing in repeats accounts for only 21%–38% of all editing events in octopus mRNAs (compared to >95% in primates), overall the number of editing events reflected in mRNA sequencing data is roughly an order of magnitude higher in Octopus bimaculoides compared to primates. Furthermore, in neural tissues ∼11%–13% of these events result in amino-acid modification, compared with <1% in mammals (). RNA editing is known to be important in neural function (), and abnormal editing patterns or ADAR function have been shown to underlie several neural conditions (). Indeed, we find that editing in non-neural tissues of Octopus bimaculoides is roughly 2-fold lower, and recoding events are even more strongly suppressed (Figure 3B). Consistently, GO analysis of edited transcripts shows enrichment of neuronal and cytoskeleton functions in all four species (Table S6).
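One way to read the definition of the editing index is as a pooled ratio: the number of reads supporting G at genomic adenosines divided by the total number of A and G reads at those positions, so that highly expressed positions carry more weight. The sketch below follows that reading; it is an assumption about how the expression weighting is realized, and the function name and read counts are hypothetical.

```python
def editing_index(adenosines):
    """Expression-weighted editing level over all adenosines in a region.

    adenosines -- list of (a_reads, g_reads), one pair per genomic adenosine,
                  whether or not it shows any editing
    """
    g_total = sum(g for _, g in adenosines)
    covered = sum(a + g for a, g in adenosines)
    return g_total / covered


# Toy read counts: one edited adenosine among four (illustration only).
toy = [(90, 10), (100, 0), (50, 0), (200, 0)]
index = editing_index(toy)
unweighted = sum(g / (a + g) for a, g in toy) / len(toy)
print(f"editing index = {index:.4f}, unweighted mean level = {unweighted:.4f}")
```

Pooling reads rather than averaging per-site levels keeps the measure stable when many positions have low coverage, which may be part of why the index is described as robust.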

For most organisms, A-to-I editing is markedly depleted from the protein-coding regions of the transcriptome. The question then arises of whether the extensive recoding in cephalopods is accompanied by extraordinary editing of the non-coding transcriptome. Recently, a genome was published for Octopus bimaculoides, the first from a cephalopod (), allowing us to use genome-dependent methods () to study the full editome, including editing in non-coding sequences, as well as to compare with the genome-free method for the coding regions. Analyzing RNA-seq data of the same four neural tissues studied in the transcriptome-based approach resulted in 800,941 editing sites, 105,380 of them in annotated coding sequences (compared to 76,862 sites in coding sequences identified by the genome-free pipeline); 49,483 of these were also found using the transcriptome-based genome-free approach (Figure 3A). Differences between the two methods are due to the different de novo transcriptomes used and the different methods employed to filter out random mismatches (see STAR Methods). These results suggest that the genome-free method provides a reasonable coverage of the editing signal in coding sequence, and that the number of editing sites outside the coding region is likely to be an order of magnitude higher than the number within the coding sequences for the other cephalopods studied here.

(F and G) Protocadherins contain significantly higher numbers of AG sites (F) and are edited at higher levels (editing level summed over all sites and normalized by ORF length), in all four coleoid species but not in nautilus (G). Error bars represent the SEM.

(E) Protocadherins is a gene family known to be principally expressed in the brain, important for mediating combinatorial complexity in neuronal connections and thought to play a role in diversifying neural circuitry (). It was impressively expanded in Octopus bimaculoides (). A large number of protocadherins are found in the assembled transcriptomes for the four coleoid species (127–251 open reading frames), but not in nautilus (28 open reading frames).

(D) Unlike the case in mammals, editing is not exceptionally enriched in specific repeat families in Octopus bimaculoides, as measured by the editing index (here defined as the editing level averaged over all, edited and unedited, adenosines in each specific repeat family).

(C) The number of editing sites in coding region is comparable to the number found in introns.

(B) RNA-editing levels, measured across the whole transcriptome (see Table S5 ) by the editing index (weighted average of editing levels over all editing sites identified in the transcriptome, see STAR Methods ). Levels vary across tissues and are highest for neural tissues (see Table S6 ). Unlike mammals, a sizable fraction of editing events (11%–13% in neural tissues) result in recoding events. Annotation of transcripts and repeats is based on. (CNS = central nervous system; ANC = axial nerve cord; OL = optic lobe; Sub = subesophageal ganglia; Supra = supraesophageal ganglia; PSG = posterior salivary gland; ST15 = stage 15 embryo.)

(A) A-to-I editing sites were found within coding sequences of Octopus bimaculoides using three methods: the genome-free method (alignment to de novo transcriptome), the genome-dependent approach using REDItools (), and identification of hyper-edited reads (). Overall, the three methods identified 170,825 unique AG sites in Octopus bimaculoides coding sequences (38,066 hyper-editing sites do not overlap those found by the other methods). See STAR Methods for analysis of the differences between the results of the first two methods.

Note that the shotgun proteomics method used here provides only partial coverage of the tryptic peptides generated by the proteolysis (). This is demonstrated by the fact that ∼90% of the recoded amino acids are completely missing from our data, regardless of their editing state. Accordingly, lack of peptide evidence for an edited or unedited form of a given site cannot be considered as evidence that this isoform is absent. However, it is possible that some of the editing sites are not translated or do not produce a stable protein, possibly due to a deleterious effect of editing on the protein structure.

Sanger-sequencing validation of the sites detected by the present scheme was previously reported (). Here, we employed mass-spectrometry analysis to further test whether the multitude of novel RNA isoforms created by extensive RNA editing are translated into proteins, resulting in extensive proteome diversification by recoding. We analyzed squid giant axon and stellate ganglion samples, looking for peptides translated from RNA that include editing sites. To simplify the analysis, we considered only peptides that include a single non-synonymous (recoding) editing site and checked whether the edited, non-edited, or both versions of the peptide were observed. For squid stellate ganglion, a total of 74,146 unique peptides were detected, 4,115 of which harbor 5,617 recoding sites, and 3,204 peptides included a single predicted site. Of these, 320 sites (10.0%) were shown to be edited (174 cases where both the pre-edited and edited versions are observed, and 146 found only in the edited version), including most of the sites predicted to be edited at high levels. Similarly, for squid giant axon 58,403 unique peptides were detected, 3,579 of which harbor 4,956 predicted recoding sites, and 2,741 peptides included a single predicted site. Of these, 283 sites (10.3%) were shown to be edited (160 cases where both pre-edited and edited versions are observed, and 123 found only in the edited version). Altogether, this experiment validated 432 protein-recoding sites. The fraction of sites validated correlated very well with the editing level predicted from RNA-seq data ( Figure 2 ).
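The percentages quoted above follow directly from the peptide counts, as the short check below shows. The dictionary layout and function names are illustrative. The final step, which infers how many sites were validated in both tissues, assumes that the total of 432 counts each site once; the text does not state this explicitly.

```python
# Counts quoted in the text for peptides covering a single predicted recoding site.
tissues = {
    "stellate ganglion": {"single_site": 3204, "both_versions": 174, "edited_only": 146},
    "giant axon": {"single_site": 2741, "both_versions": 160, "edited_only": 123},
}


def edited_sites(counts):
    """Sites with peptide evidence for the edited version."""
    return counts["both_versions"] + counts["edited_only"]


def edited_percent(counts):
    """Edited sites as a percentage of single-site peptides."""
    return 100 * edited_sites(counts) / counts["single_site"]


for name, counts in tissues.items():
    print(f"{name}: {edited_sites(counts)} edited sites ({edited_percent(counts):.1f}%)")

# Assumption: the 432 validated sites are unique across the two tissues, so the
# overlap follows by inclusion-exclusion.
total_unique = 432
shared = sum(edited_sites(c) for c in tissues.values()) - total_unique
print(f"sites validated in both tissues (under this assumption): {shared}")
```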

We analyzed peptides identified by mass-spectrometry analysis of two squid tissues, looking for evidence of recoding. For each site covered by one or more peptides, we marked whether the edited, non-edited, or both versions of the peptide are observed. The distribution is presented, binned by the predicted RNA-editing level (as measured from RNA-seq data). In parentheses are the numbers of recoding sites analyzed in each editing-level bin. The proteomic recoding level follows closely the predicted RNA-editing level. Altogether, this experiment validated protein recoding in 432 sites in two tissues: stellate ganglion and giant axon.

For sepia, squid, and the two octopus species, most mismatches (>80%) detected by the above approach were A-to-G mismatches, and the noise level, estimated by the number of G-to-A mismatches, is rather low at 2%–3% (Figure 1B). Furthermore, the residues surrounding the detected A-to-G sites exhibit a sequence pattern consistent with the known preferences for ADARs () (Figure 1C). We thus attribute these mismatches to A-to-I RNA-editing events and obtain 80–130 thousand editing sites in protein-coding regions (Tables S3 and S4). Remarkably, results from nautilus and Aplysia are in sharp contrast. First, we found only 1,150 and 933 A-to-G mismatches for these species, far fewer than for the octopus, squid, and sepia. Moreover, there is no excess of A-to-G mismatches over other events (Figure 1B; Tables S3 and S4), and the residues surrounding the detected A-to-G sites do not exhibit any sequence preference (Figure 1C). Thus, the A-to-G mismatches found in nautilus and Aplysia are likely to be (mostly) noise, with very few, if any, editing sites. Accordingly, editing within the coding sequence of these species is orders of magnitude lower than for the octopus, squid, and sepia. These data suggest that extensive recoding through RNA editing evolved along the coleoid lineage. As all of the cephalopod groups that separate coleoids and nautiloids are now extinct (e.g., belemnites and ammonoids), it will be difficult to pinpoint a more exact time for the emergence of extensive RNA editing.
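The two summary quantities used in this comparison, the share of A-to-G events among all mismatches and the G-to-A noise level, can be illustrated with a short sketch. The function name `mismatch_summary` is hypothetical, the counts are invented for the example, and expressing the noise level as the ratio of G-to-A to A-to-G mismatches is an assumption about how the percentage is defined.

```python
def mismatch_summary(counts):
    """Summarize DNA-to-RNA mismatch counts keyed by (dna_base, rna_base)."""
    total = sum(counts.values())
    a_to_g = counts.get(("A", "G"), 0)
    g_to_a = counts.get(("G", "A"), 0)
    return {
        # Share of all detected mismatches that are A-to-G.
        "a_to_g_fraction": a_to_g / total if total else 0.0,
        # Assumed noise definition: G-to-A count relative to A-to-G count.
        "noise_ratio": g_to_a / a_to_g if a_to_g else float("inf"),
    }

# Hypothetical counts for illustration only (not taken from the paper).
example = {("A", "G"): 9000, ("G", "A"): 200, ("C", "T"): 400, ("T", "C"): 400}
summary = mismatch_summary(example)
print(summary)
```

With these invented counts, A-to-G events make up 90% of mismatches and the noise ratio is about 2%, which is the kind of profile reported for the coleoids; a species with no A-to-G excess would show a ratio near 1.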

A full genome sequence is not available for the cephalopod species used in this study (except for Octopus bimaculoides; see below). Thus, to detect editing sites, we used a genome-independent method () that focuses specifically on the coding regions of the transcriptome. Briefly, RNA-seq data (174–366 million reads per species; Table S1, also see) were used to assemble a de novo transcriptome (), and the coding sequences were identified by comparison with Swiss-Prot () open reading frames (Table S2). RNA and DNA reads were then aligned to the assembled transcriptome (using Bowtie2 [] with local alignment configuration and default parameters). To detect editing events, we looked for systematic mismatches between RNA and DNA reads within the coding part of the transcriptome, filtering out those that stem from sequencing errors or genomic polymorphisms (see STAR Methods for more details). The A-to-G DNA-to-RNA mismatches that are identified by this process could result from A-to-I RNA editing, whereas other types of mismatches provide an estimate of our false-detection rate.
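The core comparison in this scheme, DNA reads that agree on one base while RNA reads systematically show another, can be sketched as below. This is a deliberately simplified illustration under stated assumptions, not the published pipeline: the function name `call_candidate_sites`, the per-position count dictionaries, and the thresholds `min_cov` and `min_level` are all hypothetical, and the actual statistical filters are those described in STAR Methods.

```python
def call_candidate_sites(dna_counts, rna_counts, min_cov=10, min_level=0.1):
    """Return (position, dna_base, rna_base, level) for DNA-to-RNA mismatches.

    `dna_counts` and `rna_counts` map a transcript position to a dict of
    base -> read count (hypothetical layout). Thresholds are illustrative.
    """
    sites = []
    for pos, dna in dna_counts.items():
        rna = rna_counts.get(pos)
        if rna is None:
            continue
        dna_cov, rna_cov = sum(dna.values()), sum(rna.values())
        if dna_cov < min_cov or rna_cov < min_cov:
            continue  # too little evidence at this position
        ref = max(dna, key=dna.get)
        if dna[ref] != dna_cov:
            continue  # DNA reads disagree: possible genomic polymorphism
        for base, n in rna.items():
            level = n / rna_cov
            if base != ref and level >= min_level:
                sites.append((pos, ref, base, level))
    return sites

# Toy input: position 5 looks edited, position 9 looks polymorphic,
# position 12 shows no mismatch.
dna = {5: {"A": 20}, 9: {"A": 12, "G": 8}, 12: {"C": 15}}
rna = {5: {"A": 10, "G": 10}, 9: {"A": 10, "G": 10}, 12: {"C": 15}}
print(call_candidate_sites(dna, rna))
```

Mismatches returned with `dna_base == "A"` and `rna_base == "G"` are the editing candidates; every other mismatch type gives a handle on the false-detection rate, as the paragraph above describes.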

To assess the level of recoding via A-to-I RNA editing in cephalopods, we analyzed matching DNA and RNA samples of individual animals from species that span the cephalopod evolutionary tree. We studied four members of the coleoid cephalopod subclass (soft-bodied cephalopods): two octopuses (Octopus vulgaris and Octopus bimaculoides), a squid (Doryteuthis pealeii), and a cuttlefish (Sepia officinalis), as well as a nautiloid (Nautilus pompilius) and a gastropod mollusk (Aplysia californica), as an evolutionary outgroup. Cephalopods emerged in the late Cambrian period, roughly 530 million years ago (mya), and the divergence of nautiloids from coleoids is estimated to have occurred at 350–480 mya (). The coleoids diverged into the Vampyropoda (octopus lineage) and the Decabrachia (squid and cuttlefish lineage) at ∼200–350 mya (). Divergence of squid from Sepiida is estimated to have occurred at 120–220 mya (). The two octopus species used in this study, Octopus vulgaris and Octopus bimaculoides, have been shown to be closely related using mitochondrial DNA and are in some cases even indistinguishable, depending on the geographical origins of the specimens (). The divergence time between the gastropod species Aplysia californica and cephalopods is estimated to be 520–610 mya (). A general representation of the phylogenetic relations between the species is shown in Figure 1A.

(C) The nucleotides neighboring the detected editing sites show a clear pattern consistent with known ADAR preference () for the extensively recoded coleoid species—squid, sepia, and the two octopus species—but not in nautilus or sea hare. The motif is characterized by under-representation of G upstream to the editing site (relative location −1) and over-representation of G in the downstream base (the height of the entire stack of letters represents the information content in bits; the relative height of each letter represents its frequency).
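The stack heights in such a motif plot can be reproduced with a short calculation. The sketch below uses the standard sequence-logo convention for a four-letter alphabet (information content = 2 bits minus the Shannon entropy of the column) without any small-sample correction; whether the figure applied such a correction is not stated here, and the function name `information_content` and the toy sequences are assumptions for illustration.

```python
import math
from collections import Counter

def information_content(sequences):
    """Per-column information content in bits for equal-length sequences.

    Uses 2 - H(column) for a four-letter alphabet, with no small-sample
    correction (an assumption of this sketch).
    """
    length = len(sequences[0])
    result = []
    for i in range(length):
        column = Counter(seq[i] for seq in sequences)
        total = sum(column.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in column.values()
        )
        result.append(2.0 - entropy)
    return result

# Toy windows centered on an edited A (relative positions -1, 0, +1).
windows = ["AAG", "CAG", "TAG", "GAG"]
print(information_content(windows))
```

In this toy set the upstream position is uniform over the four bases and carries no information, whereas the edited A and the downstream G are invariant and carry the full 2 bits. The height of each letter within a stack is its frequency multiplied by the column's information content.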

(B) Tens of thousands of A-to-I editing sites (identified as A-to-G DNA-RNA mismatches) are detected in squid, sepia, and the two octopus species (see Tables S1–S4 for more details). The noise level (estimated by the number of G-to-A mismatches) is rather low. In contrast, in nautilus and sea hare, no enrichment of A-to-G mismatches is observed (inset).

(A) The species studied span the cephalopod evolutionary tree, as well as sea hare (Aplysia californica), as an outgroup (top). For comparison, a representative tree for vertebrates is shown (bottom), constructed based on divergence times estimated in

Cephalopod origin and evolution: A congruent picture emerging from fossils, development and molecules: Extant cephalopods are younger than previously realised and were under major selection to become agile, shell-less predators.


Discussion

Seminal studies on RNA editing focused on recoding events and their functional outcomes (Burns et al., 1997; Higuchi et al., 1993). Later, with the advent of deep-sequencing technologies and accompanying computational advances, transcriptome-wide screens showed that recoding is extremely rare. For example, there are millions of editing sites in the human transcriptome, but almost all of these reside in untranslated regions (Bazak et al., 2014a). This distribution implies some fundamental principles about RNA editing by ADARs. First, there is an active mechanism for excluding editing sites from coding regions; otherwise they would be far more common. Second, although there are clear exceptions for individual editing sites, the overall purpose of editing is not to recode (Liddicoat et al., 2015; Mannion et al., 2014). This point is reinforced by the fact that most mammalian recoding sites are neutral at best (Xu and Zhang, 2014). The abundant recoding in coleoids reported here runs contrary to these ideas.

We presented evidence that high-level recoding was invented by coleoids, or an extinct ancestor, after the divergence of the nautiloids. It is plausible that protein recoding may not be the primary function of editing in cephalopods. Perhaps there are other purposes for robust ADAR activity, such as its potential use in innate immunity (Liddicoat et al., 2015; Mannion et al., 2014). As with any mutation, promiscuous “off-target” edits would sometimes be advantageous and therefore selected. However, many other organisms, such as humans, edit abundantly, producing multiple promiscuous edits. What is unique about coleoid cephalopods is that they appear not to exclude editing from protein-coding regions, leading to many thousands of recoding sites being recruited and conserved across distant species. Regardless of the primary motivation for editing, this unique phenomenon clearly has an enormous effect on the proteome.

The extensive recoding activity in cephalopods might suggest that there are underlying mechanistic novelties in their editing process, compared with other organisms. For example, cephalopod ADARs may have evolved to increase their catalytic activity or decrease their specificity. Previous studies have shown that squid express a splice variant of ADAR2 with an extra dsRNA-binding domain, and this feature increases the variant’s affinity for dsRNA, leading to higher activity (Palavicini et al., 2009, 2012). Although cephalopods do express ADAR1 orthologs (Albertin et al., 2015; Alon et al., 2015), no functional studies have been conducted on them, nor on any invertebrate ADAR1 for that matter. They too may possess unique activities. Finally, one might expect the introduction of thousands of editing sites to be accompanied by undesirable side effects. For example, messages that contain so many mutations might often translate into dysfunctional, or even toxic, proteins. To accommodate this burden, cephalopods may have evolved unique mechanisms for protein folding and quality control. These ideas require further study.

Recoding in coleoid cephalopods is something of an enigma. Unlike the case for mammals, inter-species conservation and the higher-than-expected frequencies of non-synonymous changes suggest that a sizable fraction of events were recruited during the course of cephalopod evolution. Why would the coleoids choose to alter genetic information within RNA rather than hardwire the change in DNA? There are several potential advantages to making changes within RNA. First of all, the changes are transient. Thus an organism can choose to turn them on or off, providing phenotypic flexibility, a quality that is particularly useful for environmental acclimation (Garrett and Rosenthal, 2012; Rieder et al., 2015). In addition, RNA-level changes can better augment genetic diversity. With DNA, an organism is limited to two alleles. With RNA, all messages need not be edited, and thus the pool of mRNAs can include edited or unedited versions at given sites. When a message contains more than one site, complexity can increase exponentially. Future proteomic experiments will be necessary to determine whether the combinatorial complexity is realized in neural proteins, and whether editing contributes to neuron-specific diversity or the ability of the nervous system to respond to environmental cues. If the thousands of editing sites do indeed lead to independent functional outcomes, then the regulation of the editing process would be necessarily complex.
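The exponential growth mentioned above follows from simple counting: a message with n editing sites can exist in up to 2^n edited/unedited combinations. The sketch below enumerates these combinations and assigns each a probability under the assumption that sites are edited independently, which is precisely what the text says remains to be tested experimentally. The function name `isoform_distribution` and the editing levels are hypothetical.

```python
from itertools import product

def isoform_distribution(levels):
    """Probability of each edited/unedited combination of sites.

    `levels` holds per-site editing levels in [0, 1]. Sites are treated
    as independent, an assumption made only for this illustration.
    """
    dist = {}
    for combo in product((0, 1), repeat=len(levels)):
        p = 1.0
        for edited, level in zip(combo, levels):
            p *= level if edited else 1.0 - level
        dist[combo] = p
    return dist

# Three sites with hypothetical editing levels.
dist = isoform_distribution([0.5, 0.2, 0.9])
print(len(dist))  # prints 8, i.e. 2**3 possible isoforms
```

Under independence the fully edited isoform here would make up 0.5 × 0.2 × 0.9 = 9% of messages; coupled editing between sites would shift these proportions and reduce the effective diversity.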

Among invertebrates, the nervous system of coleoids is uniquely large and complex. For example, with half a billion neurons, Octopus vulgaris has ∼5 times the number of a mouse (Young, 1971). Coleoids have brain lobes dedicated to learning and memory (Hochner et al., 2003; Shomrat et al., 2008, 2015; Young, 1961) and exhibit a range of complex and plastic behaviors. Nautiloid brains are simpler, containing fewer neurons, and lack specific lobes dedicated to learning and memory (Young, 1965). The association of massive recoding with the nervous system, and the fact that it is unique to the coleoids and not observed in nautilus, hint at its relationship with the exceptional behavioral sophistication of the coleoids. This idea is reinforced by the high density of editing in transcripts that encode proteins directly involved in excitability.