Enzymes responsible for RNA editing

Vertebrate genomes are known to contain at least three ADAR genes (ADAR1, ADAR2 and ADAR3) and two of these (ADAR1 and ADAR2) have been confirmed to be also present in invertebrate genomes, including a sea anemone (Nematostella vectensis), an oyster (Crassostrea gigas), a sea urchin (Strongylocentrotus purpuratus) and a tunicate (Ciona intestinalis)33. However, the genome of A. echinatior encodes only a single ADAR gene, but with two dsRNA-binding domains and one deaminase domain it appears more similar to ADAR2 than ADAR1 in other animals (Fig. 1a and Supplementary Fig. 1a). This finding is consistent with previous observations in other insects33 and supports the view that the common ancestor of the insects lost ADAR1. Gene structures and protein sequences of ADAR genes were highly conserved among 11 different ant species for which sequenced genomes were available, indicating that selection on this gene has been severely constrained (Fig. 1b and Supplementary Fig. 1b). We found that ADAR was widely expressed across the different castes of A. echinatior and two other ants (Camponotus floridanus and Harpegnathos saltator), and across different developmental stages of C. floridanus and H. saltator (Fig. 1c and Supplementary Fig. 2a,b), indicating that A-to-I RNA editing is likely to be operational throughout the life cycles of ants. We further observed that the expression levels of ADAR in small A. echinatior workers were higher than in gynes and large workers (Fig. 1c and Supplementary Fig. 2c,d).

Figure 1: ADAR in the leaf-cutting ant A. echinatior and other sequenced genomes. (a) Variation in the number of ADAR genes across animals, showing that the insect lineage, including A. echinatior, has lost ADAR1 but preserved ADAR2. (b) Domain organization and conservation scores of the ADAR protein sequences in ants, showing that ADAR2 is highly conserved during ant evolution. Conservation scores were calculated from the multiple sequence alignments of the ADAR protein sequences across 11 ant species with sequenced genomes using the Jensen–Shannon divergence method65: A. echinatior, Atta cephalotes, C. floridanus, Pogonomyrmex barbatus, Linepithema humile, H. saltator ( http://hymenopteragenome.org/ant_genomes/) and five other attine species (Atta colombica, Trachymyrmex septentrionalis, Trachymyrmex cornetzi, Trachymyrmex zeteki and Cyphomyrmex costatus), whose genome sequences were available but not yet published (Nygaard et al, in preparation). (c) Expression levels of ADAR denoted by real-time qPCR in the three female castes of A. echinatior. The housekeeping gene Ef1-beta was chosen as internal reference gene. Error bars are s.d. across the three replicates. Full size image

In addition to A-to-I editing, a much rarer type of cytosine-to-uracil (C-to-U) RNA editing has also been documented in mammals34 and fruit flies35, and our data also imply the existence of a small number of C-to-U editing sites in A. echinatior (see below). C-to-U RNA editing results from cytosine deamination catalysed by APOBECs (an apolipoprotein B mRNA-editing enzyme with polypeptide-like catalytic function) in mammals34, but we have not been able to identify orthologues of these genes in ants or other insects, suggesting that insects use other enzymes with cytidine deaminase activity to perform C-to-U RNA editing.

Caste-specific RNA editomes

To assess the RNA-editing patterns across different castes, we carried out strand-specific RNA-Seq36 on polyA+ RNA isolated from adult head tissues of the three female castes of A. echinatior: gynes, large workers and small workers. We sampled three sympatric colonies and sequenced them separately (Supplementary Table 1). To help filtering out possible DNA–RNA differences caused by heterozygous single-nucleotide polymorphisms (SNPs), we also resequenced the DNA isolated from the remaining tissues of exactly the same individuals as used for RNA-Seq (see Methods). On average, we obtained ca. 11 Gb RNA-Seq and ca. 12 Gb DNA-Seq data for each of the nine samples, representing a genome-wide coverage depth of ca. 37 × and 39 × , respectively. After mapping and rigorous filtering (see Methods), we obtained an average usable depth of ca. 11 × for RNA samples and ca. 24 × for DNA samples for further analysis (Supplementary Table 2).

To identify candidate RNA-editing sites, we designed a statistical framework to detect sites that were homozygous for genomic DNA (gDNA), but heterozygous for transcripts (see Methods). The orientation information provided by strand-specific RNA-Seq then allowed us to determine the RNA–DNA base differences independent of existing genomic annotations. We identified an average of 10,715 editing sites (range 8,146–13,823) per sample (Supplementary Table 3), with an average of 969 genes (range 812–1,125) being edited in one or more sites (Supplementary Table 4). As observed in humans17,18,19,20, mice16 and fruit flies14,15, most (up to 97%) of the editing sites represented A-to-I editing (Fig. 2a and Supplementary Table 3), indicating that non-A-to-I RNA editing is rare in A. echinatior.

Figure 2: RNA-editing sites identified in samples of gynes, large workers and small workers of A. echinatior. (a) Distribution of the mean number of editing sites across the 12 possible types of RNA editing in each caste-specific sample. (b) Frequency distribution of editing levels for all editing sites with RNA coverage ≥20 × after pooling the editing sites from all nine samples (gynes, large workers and small workers from three colonies; arrow is the median editing level of 12.6%). (c) Percentages of A-to-I editing sites across different genomic elements. Black asterisks indicate the background expectations, which were calculated as the number of transcribed ‘A’s (that is, ‘A’s with RNA-Seq read coverage ≥1 × ) in each genomic element divided by the total number of transcribed ‘A’s in the genome. The ‘Gene’ category is the sum of untranslated regions (5′-UTR and 3′-UTR), CDSs and introns, while the ‘Intergenic’ category represents all genomic regions that are neither genes nor TEs. (d) Percentages of A-to-I editing sites occurring in clusters (≥3 sites within a 50-bp window) across different genomic elements. The same number of transcribed ‘A’s as the identified A-to-I editing sites in each genomic element was randomly selected and used for calculating background expectations (indicated by black asterisks representing values close to zero). (e) RNA editing of the gene Mdh1 in large workers from colony 322, involving 84 A-to-I editing sites (blue points) in the 3′-UTR of this gene where the editing sites always occur in clusters. (f) Venn diagram of edited genes showing the genic positions of their editing sites. The numbers presented were counted in large workers from colony 322, but the other samples showed similar patterns. Error bars in a,c and d are s.d. across the three colonies. Full size image

The median editing level was rather low (12.6%) and only ca. 8% of the editing sites had editing levels (the ratio of RNA-Seq reads with editing versus total RNA reads) over 50% (Fig. 2b, also see Supplementary Fig. 3 for separating A-to-I and non-A-to-I editing sites). Consistent with the known properties of ADAR enzymes22, A-to-I editing sites were concentrated in potentially dsRNA regions, relative to random controls in A. echinatior (Supplementary Fig. 3c and Supplementary Methods). Similar to observations in humans18,19 and fruit flies15, the A. echinatior bases in upstream and downstream positions adjacent to A-to-I editing sites (−1 and +1 positions) had a deficiency and excess of guanines, respectively (Supplementary Fig. 3e and Supplementary Methods). In addition, A-to-I editing sites clearly occurred in clusters (Fig. 2d,e, Supplementary Table 5 and Supplementary Methods) with around half located within 20-bp of neighbouring editing sites (Supplementary Fig. 3d).

To further confirm the accuracy of our methods and data sets, we randomly selected 116 editing sites, including 106 A-to-I editing sites and 10 non-A-to-I editing sites, for experimental validation using PCR amplification, TA cloning and Sanger sequencing (see Methods). Overall, 110 of these 116 sites (95%) were confirmed, but validation percentages differed between A-to-I editing sites (99%; 105/106) and non-A-to-I editing sites (50%; 5/10) (Supplementary Data 1). As non-A-to-I editing is rare and apparently more difficult to validate, we decided to use only the A-to-I editing sites in further analyses. Our PCR and cloning validations further showed that the editing levels calculated by our initial pipeline were consistent with the levels obtained by TA-clonal sequencing (Pearson’s r=0.76 and P<10−15 for all 116 editing sites; Pearson’s r=0.78 and P<10−15 for the 106 A-to-I sites; Supplementary Fig. 4).

On average, ca. 42% of the A-to-I editing sites targeted genic regions, particularly the 3′-untranslated regions (3′-UTRs) (obs./exp.=21%/7%), whereas protein-coding regions (coding sequences (CDSs)) tended to be depleted of RNA-editing sites (obs./exp.=1%/19%) (Fig. 2c). Similar depletion of A-to-I editing sites in CDSs has been documented in humans17, mice16 and C. elegans13, but fruit flies have substantial enrichment of A-to-I editing sites in their CDSs (28% of total, data from ref. 15). Our results for a fungus-growing ant thus imply that enrichment of CDS RNA editing in Drosophila may be the exception rather than the rule for insects. However, 77% of the CDS-editing sites in A. echinatior had the potential to change protein sequences, mainly by recoding amino acids (ca. 99%), but in a few cases by removing stop codons (Supplementary Table 6). In addition, CDS-editing sites were less likely to be clustered compared with editing sites in other genomic elements (Fig. 2d), which is unlikely to be simply due to the low number of CDS-editing sites, as 5′-UTRs harbour similarly low numbers (Fig. 2c). We also found that simultaneous editing of genes in UTRs, CDSs and introns was rare, and that only a few genes were in fact edited in more than one of these genic regions (Fig. 2f).

Up to 32% of A-to-I editing sites occurred in transposable elements (TEs), a figure that was four times higher than the background expectation of 8% (Fig. 2c). The fact that TE transcripts are another major target of RNA editing in A. echinatior is interesting, because only one case of RNA editing in TE transcripts has been reported in insects: A-to-I, U-to-C and C-to-U editing in read-through transcripts of the KP elements of Drosophila melanogaster35. In humans, A-to-I editing is pervasive in Alu repeats19, a primate-specific short interspersed element that represents about 10% of the entire human genome37. Our data thus provide the first evidence for substantial RNA editing in insect TEs, and may imply that the activity of a significant fraction of A. echinatior TEs is regulated by RNA editing.

Functional editing sites shared across colonies

Comparing the A-to-I editing sites among samples of the same caste from different colonies, we found that ca. 60% (range 43–76%) of the caste-specific editing sites were shared by the three sampled colonies (Fig. 3a–c). This indicates a marked disparity of RNA-editing events among sympatric colonies of the same species, consistent with the limited reproducibility of editing events among different human individuals17,20. This variability may be due to colony-specific habitat conditions, colony size, colony age or even random expression noise. However, as our focus was on the editing sites most likely to contribute to caste-specific functions, we continued our analyses using only those sites where editing was confirmed in samples of a caste across all three colonies. This allowed us to identify 4,682, 5,049 and 9,120 caste-specific A-to-I editing sites for gynes, large workers and small workers, respectively (Fig. 3a–c), representing a total of 10,282 sites that were consistently edited in at least one caste. The high number of small worker-specific editing sites could not be explained by variation in sequencing depth among samples (Supplementary Fig. 5), but corresponded with the higher expression level of ADAR in small workers (Fig. 1c).

Figure 3: Functional editing sites in the leaf-cutting ant A. echinatior. (a–c) Venn diagrams of A-to-I editing sites across the three different colonies Ae322, Ae356 and Ae363. (d) Over-represented Gene Ontology (GO) terms for the 835 genes that were consistently edited in at least one caste, showing the percentages of edited genes relative to all transcribed genes annotated to corresponding GO terms. The first number after each bar indicates the number of edited genes annotated to this GO term, and the second number indicates the number of all transcribed genes (RPKM >1 in at least one of the nine sequenced samples) annotated to this GO term. Full size image

We identified 533, 538 and 779 genes that harboured at least one consistently edited site in gynes, large workers and small workers, respectively, corresponding to a total of 835 genes (Supplementary Data 2). The most highly edited gene was Aech_13462, with >70 consistently edited sites in its 3′-UTR in all three castes (Fig. 2e and Supplementary Data 2). This gene is an orthologue of Mdh1 (malate dehydrogenase 1), which has a conserved role in aerobic energy production38. The high expression level of this gene in the heads of Acromyrmex leaf-cutting ants (mean RPKM (reads per kilobase of transcript per million reads mapped) across 9 samples: 264; Supplementary Methods) probably implies that the aerobic energy requirements for brain activity are extremely high, and extensive RNA editing of its 3′-UTR may reflect a continuous need for fine-tuning gene expression related to energy metabolism. Another gene with a considerable number of consistently edited sites was Aech_00116, an orthologue of Hsc70-4 (heat shock 70-kDa protein cognate 4) in Drosophila, which is involved in nerve-evoked neurotransmitter exocytosis, a process critical for signal transmission across synapses39. It had >50 consistently edited sites in its 3′-UTR in all 3 castes of A. echinatior (Supplementary Data 2), which may offer a special mechanism for post-transcriptional regulation of this gene.

Gene Ontology (GO) enrichment analysis for the 835 genes, which were consistently edited in at least 1 caste (using all expressed genes as background), showed that 5 functional categories were over-represented among the RNA-edited genes, including neurotransmission, circadian rhythm, response to temperature stimulus, RNA splicing and carboxylic acid biosynthesis (Fig. 3d; see Methods). Functions in neurotransmission, including GO terms of synaptic growth at neuromuscular junctions, synaptic transmission and cell signalling, are consistent with the known functions of RNA editing in animal nervous systems10. For example, we found RNA editing in a series of genes encoding voltage-gated ion channel proteins, such as Shab (K+ channel), Shaw (K+ channel), Hk (K+ channel), NaCP60E (Na+ channel) and Ca-alpha1T (Ca+ channel), as well as some neurotransmitter secretion-related proteins, such as Snap25 (Synapse protein 25), n-syb (n-synaptobrevin), Sytbeta (Synaptotagmin beta) and Syt12 (Synaptotagmin 12) (Supplementary Data 2). Circadian rhythms have rarely been reported to be affected by RNA editing, although the previous discovery of RNA editing of the qvr and Sh transcripts in Drosophila is consistent with this possibility40. Yet, in A. echinatior we observed editing of many circadian rhythm genes: including qvr (quiver)41, Shaw (Shaker cognate w)42, lark43, nocte (no circadian temperature entrainment)44, Atf-2 (Activating transcription factor-2)45, cyc (cycle)46, ctrip (circadian trip)47, CkIIα (casein kinase IIα)48 and Rdl (Resistant to dieldrin)49 (Supplementary Data 2). Extensive RNA editing of circadian rhythm genes suggest that ants may possess a novel mechanism for adjusting circadian sleep/wake cycles with very short response times.

We only identified 27, 38 and 53 CDS-editing sites with amino acid recoding potential that were consistently edited in gynes, large workers and small workers, repectively, representing a total of 57 sites targeting 34 genes (Supplementary Data 3). Thirty-three per cent (19/57) of these recoding sites were located within or near (within 20 amino acids) known protein domains (Supplementary Data 3). The majority of these recoded genes were implicated in neural system functions such as synaptic transmission (for example, Caps and qvr), neurogenesis (for example, ctrip and Ars2), neuromuscular junction development (for example, TBPH and wah) and circadian rhythm (for example, qvr and ctrip). There were also some genes involved in mitotic processes (for example, RpL7 and Mtor), RNA splicing (for example, CG7564 and CG2926) and organism development (for example, rhea and scra) (Supplementary Data 3).

Conservation of RNA editing in ants

RNA editing may be conserved at two levels: the conservation of genomic sites corresponding to RNA-editing sites (DNA level) and the conservation of the RNA-editing events themselves (RNA level). We recently sequenced and assembled the genomes of another five attine fungus-growing ant species (Nygaard et al., in preparation) which, together with the published genomes of A. echinatior32 and Atta cephalotes50, span the most recent ca. 25 MY of the attine ant lineage that originated ca. 50 MYA30. Combining these data with the published genomes and transcriptomes of two non-fungus-growing ants, C. floridanus (Florida carpenter ant) and H. saltator (Jerdon’s jumping ant) that diverged from A. echinatior ca. 80 and 100 MYA8,51,52, we had the opportunity to estimate the degree of conservation of RNA editing in ants over long evolutionary time scales.

Whole-genome alignment analysis (see Methods) showed that 67% of all transcribed ‘A’s (that is, ‘A’s covered by more than a single RNA read in at least one caste) in A. echinatior were highly conserved (that is, displaying the same DNA base) among the 7 attine ant species, while 57% and 46% were conserved when comparing A. echinatior with C. floridanus and H. saltator, respectively. However, when we assessed the editing sites that appeared in at least one caste across all three colonies, we found that only 23% of these RNA-editing sites were conserved across the seven attine genomes and about 11% and 8% between A. echinatior and C. floridanus or H. saltator, respectively. The latter conservation percentages were all significantly lower than the former genome background percentages (Monte Carlo test P<10−4). Similar patterns were observed when we extended these comparisons to the separate genomic elements (Fig. 4a, Supplementary Fig. 6 and Supplementary Table 7).

Figure 4: Conservation of RNA editing among ant species. (a) Percentage of conserved genomic sites that display the same DNA bases among seven attine ant species. The ‘Gene’ category is the sum of 5′-UTRs, 3′-UTRs, CDSs and introns, and the ‘Intergenic’ category represents all genomic regions that are neither genes nor TEs. The red bars (genome background) were calculated from all transcribed ‘A’s (that is, ‘A’s covered by ≥1 RNA reads) in each category, and the blue bars from genomic sites corresponding to A-to-I RNA editing sites in each category. (b) The number of conserved editing events (sites displaying the same RNA base substitutions among species) between A. echinatior and C. floridanus and between A. echinatior and H. saltator. (c) Correlation of RNA-editing levels for the conserved editing events between heads of small workers of A. echinatior and brains of minor workers of C. floridanus. Only sites with RNA read coverage ≥10 × in both samples were used. Editing levels of A. echinatior small workers were calculated from the combined data of the three small worker samples. The blue line is a linear regression that is virtually identical to the diagonal. (d) Editing of the qvr gene in A. echinatior. The upper part shows the qvr gene model and the two editing sites, as well as their protein recoding potential. The red arrow in the sixth exon indicates the positions of the two neighbouring editing sites. The lower part shows the editing levels of the two sites in the nine head samples (G, L, S stand for gynes, large workers and small workers, respectively, and are followed by colony number). (e,f) Editing levels of the corresponding sites of qvr in C. floridanus and H. saltator. Full size image

We also estimated the conservation status of the genomic sites with RNA editing by using phastCon scores (base-by-base conservation scores ranging from 0 to 1) derived from the multiple genome alignments for the seven attine ant species (see Methods), and observed that the mean phastCon scores of genomic sites corresponding to RNA-editing sites were significantly lower than those of random genomic sites (Supplementary Table 8). Taken together, these results suggest that genomic sites with RNA editing are fast evolving, and imply that most RNA-editing sites in A. echinatior are species-specific and thus likely to have emerged recently.

To further investigate why genomic sites corresponding to RNA-editing sites appear to evolve relatively fast, we checked the evolutionary rate of the 835 genes, which contained at least 1 caste-specific editing site that consistently appeared in 3 colonies by calculating the non-synonymous (dn) and synonymous (ds) mutation rate among the 7 attine ant species (Supplementary Methods). Except for the small proportion of 5′-UTR-edited genes, we generally found that the coding regions of RNA-edited genes evolved at equal or lower rates than the background genes (Supplementary Fig. 7). This suggests that the fast evolutionary rate of RNA-editing sites cannot be explained by hitchhiking effects connected to neighbouring genes. It thus seems more probably that the fast evolution of RNA-editing sites may relate to gene expression responses to changes in the specific social environments of different fungus-growing ant species.

We next examined whether RNA editing of the conserved A. echinatior genomic sites also occurred in the transcriptomes of other ant species by comparing the RNA editomes of A. echinatior with previously published transcriptome data from C. floridanus and H. saltator (Supplementary Table 9)8,9,51. We identified 113 and 84 A. echinatior sites that were also edited and displayed the same RNA base substitutions in at least one sample of C. floridanus and H. saltator, respectively. We identified 45 sites that displayed the same RNA base substitutions in all three ant species (Fig. 4b). The number of conserved RNA-editing events that this analysis identified is likely to be an underestimate because the transcriptome sequencing depth in C. floridanus and H. saltator was lower than in A. echinatior, and samples were obtained in different ways8,9,51. Yet, 33 of the 45 sites (73%) with conserved editing events were targeting 20 genes of A. echinatior and 18 of these sites were edited in protein coding regions with amino acid recoding potential (Supplementary Data 4).

Functional analysis based on orthologues in Drosophila revealed that 8 of the 20 A. echinatior genes with conserved editing (40%) were involved in neural activity or circadian rhythm. For example, qvr (also called sleepless), a gene participating in the regulation of circadian sleep/awake cycles in Drosophila41, harbours two A-to-I editing sites in the last exon in all three castes of A. echinatior and in most samples of C. floridanus and H. saltator, causing S101G and H106R amino acid recoding in the protein sequence (Fig. 4d–f). Multiple sequence alignments indicated that genomic sites corresponding to these two editing sites are highly conserved in many insect species (Supplementary Fig. 8). Surprisingly, the qvr A-to-I RNA editing that we discovered occurs on exactly the same two sites in Drosophila40, indicating that both editing events have been conserved for >300 MY of insect evolution. Other examples include the following: Rdl and lark, two genes involved in circadian rhythm43,49; TBPH (also called TDP-43), a gene associated with neuromuscular junction development53; Caps (calcium-activated protein for secretion), a gene involved in synaptic transmission54; and tipE (temperature-induced paralytic E), a gene encoding voltage-gated sodium channel auxiliary subunits (Supplementary Data 4). These results once more confirm the importance of RNA editing in the regulation of nervous system function in A. echinatior.

Finally, the conserved editing events between head samples of A. echinatior and brain samples of C. floridanus allowed us to estimate the correlation of editing levels in similar tissues between the two ant species. We found that editing levels in A. echinatior were positively correlated with editing levels in C. floridanus at the same sites (Fig. 4c,e,f), implying that not only the editing events but also the degrees of editing have probably been under similar selection for many of these sites.

RNA editing and behavioural caste differentiation

In general, we observed that genome-wide editing patterns were quite similar across samples of the same caste from different colonies, with hierarchical clustering analysis separating samples by caste (Fig. 5a), which indicates that RNA-editing profiles differ more across castes than across colonies. Moreover, consistent with the ADAR expression pattern across the three castes (Fig. 1c), RNA-editing patterns of large workers were closer to those of gynes than small workers (Fig. 5a).

Figure 5: Caste-specific RNA editing in A. echinatior. (a) Heatmap and hierarchical clustering based on editing levels of all the editing sites identified in the nine samples (G, L, S stand for gynes, large workers and small workers, respectively, and are followed by colony number). Sites for which editing was confirmed in at least one of the nine samples and that had RNA read depth ≥20 × in all the nine samples were used (n=2,944). (b) Percentages of the 137 consistently differentially edited sites across different genomic elements. The ‘Gene’ category is the sum of 5′-UTRs, 3′-UTRs, CDSs and introns, and the ‘Intergenic’ category represents all genomic regions that are neither genes nor TEs. (c) RNA editing of wah (Aech 02313), with the upper part showing the wah gene model, the two neighbouring editing sites (indicated by the red arrow below the 9th exon) and their recoding potential. The two black vertical arrows above the gene model indicate the exon regions that code for the PEHE domain. The histograms show the editing levels of the editing sites in the nine head samples. P-values of Fisher’s exact tests between any two of the three castes within the same colony are given above the bars (*P<0.05, **P<0.01, ***P<0.001. P-values between 0.05 and 0.1 are given in precise figures). Full size image

We used Fisher’s exact tests to identify sites that displayed significantly different levels of RNA editing between any two of the three female castes within each colony (see Methods) and detected an average of 276, 599 and 338 significantly differentially edited sites between gynes/large workers, gynes/small workers and large/small workers, respectively (Supplementary Table 10). To focus on sites where editing changes are likely to be persistently associated with caste status rather than colony age, colony size or habitat, we restricted our further analyses to target differentially edited sites that were shared across the three sampled colonies (see Methods) and identified 10, 124 and 15 sites with consistently different editing levels across the three colonies for gyne/large worker, gyne/small worker and large/small worker comparisons, respectively, which involved a total of 137 sites targeting 48 genes.

Of these 137 consistently caste-differentially edited sites, 43, 13 and 13 were conserved for their corresponding genomic sites among the 7 attine ants, between A. echinatior and C. floridanus, and between A. echinatior and H. saltator, respectively. This encompassed a total of 55 sites, representing 40% of the 137 differentially edited sites (Supplementary Data 5). Three of these sites were located in protein coding regions, with two of them targeting the ninth exon of the gene Aech_02313, which had significantly higher editing levels across colonies in small workers compared with gynes, and resulted in Q966R and Q968R amino acid recoding in the protein sequences (Fig. 5c). This gene is an orthologue of the wah gene (also known as NSL1, CG4699) in D. melanogaster and has been proposed to participate in neuromuscular junction development55. We found that editing of these two sites was always linked (Supplementary Table 11), and domain annotation revealed that they were close to the PEHE domain (histone acetyltransferase binding activity) of the wah gene (Fig. 5c), implying their potential to affect protein activity. Both of these two recoding sites were highly conserved in ants (Supplementary Fig. 9a,b) and Q968R was also edited in some samples of C. floridanus and H. saltator (Supplementary Fig. 9c). Another site in which small workers showed significantly higher editing levels than gynes across colonies, was targeting the fifth exon of Aech_08485: a gene encoding a Zinc finger protein (Supplementary Fig. 10). This editing site can cause K432R recoding in the protein sequence downstream of the C2H2 zinc finger domain and is also conserved in all other ant species and edited in some samples of C. floridanus (Supplementary Fig. 11), but no function for this gene is known.

Most of the consistently differentially edited sites were targeting the non-coding regions of genes, mainly in 3′-UTRs (Fig. 5b), some of which are known to function in neural systems (Supplementary Data 5). For example, Tap42, pUf68 and Droj2 are known to participate in neurogenesis in Drosophila56, and all of them harboured consistently differentially edited sites in their 3′-UTRs that displayed higher editing levels in worker castes than in gynes (Supplementary Fig. 12 and Supplementary Data 6). Another gene associated with neuromuscular junction development, TBPH (also called TDP-43)53, also harboured a consistently differentially edited site in its 3′-UTR with higher editing levels in small workers than in gynes (Supplementary Fig. 12 and Supplementary Data 6). The neuromuscular junction connects the nervous system to the muscular system via synapses between nerves and muscle fibres, and is critical in locomotory behaviour. Differential editing of TBPH and wah may thus imply that RNA editing is involved in modulating caste-specific adult locomotory behaviour. RNA editing in the 3′-UTR of genes cannot change protein sequences, but may affect the stability of mRNA or change the targets of miRNAs25 to regulate the final protein levels. Moreover, it has been proposed that A-to-I RNA editing may cross-talk with RNA-interference pathways, which involve dsRNAs; thus, RNA editing may play a role in gene silencing (reviewed in ref. 57). The differential patterns of RNA editing among A. echinatior castes thus appear to uncover a suite of novel potential mechanisms for differentiating caste phenotypes in eusocial insects.