APC11 cDNA in GenBank is mis-annotated

APC11 (At3g05870) is a single-copy gene in the Arabidopsis thaliana genome [21]. The current annotation predicts three exons and two introns, with a coding sequence (CDS) of 261 nucleotides that encodes a polypeptide of 87 amino acids (AAs) [21]. However, sequencing of APC11 cDNA in this study identified a single CDS of 252 nucleotides (highlighted in red; Fig. 1), encoding a polypeptide of 84 AAs. The discrepancy arose partly because the previous annotation assigned 10 nucleotides of the first intron to the exon (highlighted in blue; Fig. 1).
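The lengths quoted above can be checked with a few lines of arithmetic. The following is only a bookkeeping sketch: the constant and function names are illustrative, and the amino-acid counts are taken as nucleotides divided by three, which is how the figures in the text relate to each other.

```python
# Lengths stated in the text, in nucleotides.
ANNOTATED_CDS = 261          # CDS length in the existing annotation
SEQUENCED_CDS = 252          # CDS length found by cDNA sequencing
MISANNOTATED_INTRONIC = 10   # intronic nucleotides annotated as exon


def codons(cds_length: int) -> int:
    """Number of codons in a CDS; fails if the length is not a multiple of 3."""
    if cds_length % 3 != 0:
        raise ValueError(f"{cds_length} nt is not a whole number of codons")
    return cds_length // 3


# Removing the 10 mis-annotated nucleotides alone gives 251 nt.
corrected_only = ANNOTATED_CDS - MISANNOTATED_INTRONIC
# Nucleotides still unaccounted for relative to the sequenced cDNA.
unexplained = SEQUENCED_CDS - corrected_only

print(codons(ANNOTATED_CDS), codons(SEQUENCED_CDS))  # codon counts
print(corrected_only, corrected_only % 3, unexplained)
```

Removing the 10 intronic nucleotides on its own leaves a length that is not a multiple of three and is one nucleotide short of the sequenced CDS, which is why the mis-annotated segment explains only part of the discrepancy.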

Figure 1 The genomic sequence of APC11. The coding region of APC11 is shown. Putative exons are in red capitals, introns are in black lower case, and the putative branch point “a” is highlighted in purple. A333 is the putative single-nucleotide exon. Conserved intron-exon splicing sequences “gt” and “ag” are underlined and in lower case. Start and stop codons are underlined and in capitals. The exonic sequence mis-annotated in GenBank is highlighted in blue.

Furthermore, alignment of the cDNA with the APC11 genomic sequence revealed an additional single A in the cDNA. This A is not contiguous with the neighbouring coding sequence in the genomic region, yet it is required for in-frame translation of APC11. Re-sequencing of APC11 genomic DNA from both the Col-0 and Ler ecotypes confirmed that the genomic sequence in GenBank at the National Center for Biotechnology Information (NCBI) is correct, whereas the annotated cDNA is wrong. We therefore speculated that the extra A originates from a single-nucleotide exon located in the intron between the previously annotated first and second exons. Within the annotated 422-nucleotide intronic sequence we identified a candidate A (designated A333 in Fig. 1), flanked by GT and AG splice-site dinucleotides and located 333 nucleotides downstream of the upstream exon-intron junction. A putative branch point A was detected 44 nucleotides upstream of A333 (highlighted in purple; Fig. 1).
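The kind of search described above can be illustrated with a naive motif scan. This is a sketch under stated assumptions, not the procedure used in this study: it assumes the canonical GT-AG rule, so that a single-nucleotide exon sits immediately after an acceptor "ag" and immediately before a donor "gt". The function name and the toy sequence are hypothetical, and the toy sequence is not the APC11 intron.

```python
import re


def candidate_single_nt_exons(intron: str) -> list[tuple[int, str]]:
    """Return (1-based position, base) for every base flanked by 'ag' and 'gt'.

    A hit only marks a candidate; it says nothing about whether the
    site is actually used by the spliceosome.
    """
    seq = intron.lower()
    hits = []
    # A zero-width lookahead lets overlapping motifs all be reported.
    for match in re.finditer(r"(?=ag[acgt]gt)", seq):
        index = match.start() + 2          # 0-based index of the flanked base
        hits.append((index + 1, seq[index]))
    return hits


# Hypothetical toy intron (starts with gt, ends with ag).
TOY_INTRON = "gtatcttcctaacttttcagagtatgcttgcag"

for position, base in candidate_single_nt_exons(TOY_INTRON):
    print(f"candidate exon base '{base}' at intron position {position}")
```

Because the five-nucleotide motif is short, such a scan will usually return several candidates in a real intron; additional evidence, such as the cDNA sequence and a plausible upstream branch point, is needed to choose among them.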

A333 is a functional single-nucleotide exon

To test whether A333 is indeed a single-nucleotide exon, we made six constructs expressing nucleus-localized APC11-SV40-GFP fusion proteins under the control of the cauliflower mosaic virus (CaMV) 35S promoter: 1) gAPC11-nGFP: the 839-nucleotide APC11 genomic sequence, with its stop codon deleted, fused in frame with an SV40-GFP reporter gene; 2) cAPC11-nGFP: a 252-nucleotide APC11 cDNA, with its stop codon deleted, fused in frame with the same SV40-GFP; 3) gAPC11(A > T)-nGFP: the same as gAPC11-nGFP except that A333 was substituted by a T, which is expected to produce a cDNA with T at this position if A333 is indeed a single-nucleotide exon; 4) gAPC11(A > G)-nGFP: A333 in gAPC11-nGFP was substituted by a G to determine whether the identity of the nucleotide affects splicing; 5) gAPC11(A > TT)-nGFP: A333 in gAPC11-nGFP was substituted by TT, which should produce a TT substitution in the APC11 cDNA and a frame-shift in APC11 translation, leading to loss of GFP fluorescence; and 6) gAPC11(-A)-nGFP: A333 in gAPC11-nGFP was deleted, which should produce a cDNA without A333, leading to a frame-shift in APC11 translation and loss of GFP fluorescence (Fig. 2a). These constructs were introduced individually into A. thaliana mesophyll protoplasts by polyethylene glycol (PEG)-mediated transfection [22] for in vivo transcriptional and translational assays.
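The expected outcome for each genomic construct follows from the reading frame alone. The sketch below assumes that whatever replaces A333 is spliced into the mRNA as the middle exon, which is the hypothesis the constructs are designed to test; the dictionary and function names are illustrative.

```python
# Sequence occupying the A333 position in each genomic construct.
CONSTRUCTS = {
    "gAPC11-nGFP": "A",
    "gAPC11(A > T)-nGFP": "T",
    "gAPC11(A > G)-nGFP": "G",
    "gAPC11(A > TT)-nGFP": "TT",
    "gAPC11(-A)-nGFP": "",
}


def gfp_in_frame(middle_exon: str, wild_type: str = "A") -> bool:
    """True if the downstream GFP stays in frame with APC11.

    The frame is preserved only when the length change relative to
    the wild-type exon is a multiple of three nucleotides.
    """
    return (len(middle_exon) - len(wild_type)) % 3 == 0


for name, exon in CONSTRUCTS.items():
    status = "in frame" if gfp_in_frame(exon) else "frame-shift"
    print(f"{name}: {status}")
```

Single-nucleotide substitutions change one codon but keep the frame, whereas the TT substitution (+1 nucleotide) and the deletion (-1 nucleotide) shift it, so only the latter two are predicted to abolish GFP fluorescence.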

Figure 2 In vivo transcriptional and translational assays in Arabidopsis and rice protoplasts. (a) Constructs generated for transient assays. The CaMV 35S promoter was used to drive the expression of APC11 cDNA, genomic or substitution constructs fused with a nucleus-localized SV40-GFP reporter gene. Boxes in orange, cyan and grey indicate the three previously identified exons in APC11. Black lines indicate introns and A333 is shown as red vertical bars. (b) Alignment of APC11 cDNAs produced in transfected Arabidopsis or rice protoplasts. Identical nucleotides are shaded. (c) GFP fluorescence in Arabidopsis protoplasts transfected with the constructs illustrated in (a). Note that GFP signals are detected only in protoplasts transfected with nGFP, cAPC11-nGFP, gAPC11-nGFP, gAPC11(A > T)-nGFP or gAPC11(A > G)-nGFP. Scale bar = 10 μm for all photos in (c).

To examine splicing patterns, cDNAs were prepared from RNA extracted from protoplasts transfected with the different fusion constructs. APC11-nGFP cDNAs were then amplified from each cDNA preparation by polymerase chain reaction (PCR) using a forward APC11 primer and a reverse GFP primer (Supplementary Table S1) and sequenced. Sequencing showed that both cAPC11-nGFP and gAPC11-nGFP produced a sequence identical to the APC11 cDNA. Notably, substitution of A333 by T [gAPC11(A > T)-nGFP], G [gAPC11(A > G)-nGFP] or TT [gAPC11(A > TT)-nGFP] led to T, G or TT substitutions in the cDNA, respectively (Fig. 2b). Furthermore, deletion of A333 in gAPC11(-A)-nGFP led to production of a cDNA lacking the A.

GFP fluorescence was used to assess translation of the different fusion constructs. When examined under a confocal microscope after 12 h of incubation, nucleus-localized GFP fluorescence was observed in protoplasts transfected with cAPC11-nGFP, gAPC11-nGFP, gAPC11(A > T)-nGFP or gAPC11(A > G)-nGFP, suggesting that GFP was translated in frame from these constructs. In contrast, no GFP fluorescence was detected with gAPC11(A > TT)-nGFP or gAPC11(-A)-nGFP (Fig. 2c), indicating that substitution of A333 by TT or deletion of A333 impaired translation of these fusion constructs. Together, these results confirm that A333 in APC11 is a functional single-nucleotide exon.

Splicing of the single-nucleotide exon is mostly conserved in rice

We then asked whether the ability to process the single-nucleotide exon is conserved in rice (Oryza sativa, var. Zhonghua 11), a distantly related monocotyledonous species. APC11 in rice has two paralogs, OsAPC11-1 (Os03g0302700) and OsAPC11-2 (Os07g0411101), both of which lack introns. Protoplasts prepared from 14-day-old etiolated rice seedlings were used for in vivo transcriptional assays with the six constructs described above (Fig. 2a). Sequencing of APC11-nGFP cDNAs amplified from rice protoplasts showed that transfection with either cAPC11-nGFP or gAPC11-nGFP yielded the intact APC11 cDNA, with the same splicing pattern as in Arabidopsis protoplasts (Fig. 2b). Similarly, T or G substitutions were detected in cDNAs isolated from protoplasts transfected with gAPC11(A > T)-nGFP or gAPC11(A > G)-nGFP, respectively (Fig. 2b). A cDNA lacking A333, and consequently carrying a frame-shift, was detected in protoplasts transfected with gAPC11(-A)-nGFP. These results suggest that rice protoplasts can splice the single-nucleotide exon as accurately and effectively as Arabidopsis protoplasts. However, when gAPC11(A > TT)-nGFP was used, splicing was incorrect: an additional 56 nucleotides from the first intron were incorporated into the cDNA, leading to a frame-shift in the translation of gAPC11(A > TT)-nGFP. This suggests that substitution of A333 by TT altered the splicing pattern in rice, an effect not observed in Arabidopsis.