Soft-bodied cephalopods such as the octopus (Fig. 1a) show remarkable morphological departures from the basic molluscan body plan, including dexterous arms lined with hundreds of suckers that function as specialized tactile and chemosensory organs, and an elaborate chromatophore system under direct neural control that enables rapid changes in appearance1,8. The octopus nervous system is vastly modified in size and organization relative to other molluscs, comprising a circumesophageal brain, paired optic lobes and axial nerve cords in each arm2,3. Together these structures contain nearly half a billion neurons, more than six times the number in a mouse brain2,9. Extant coleoid cephalopods show extraordinarily sophisticated behaviours including complex problem solving, task-dependent conditional discrimination, observational learning and spectacular displays of camouflage1,10 (Supplementary Videos 1 and 2).

Figure 1: Octopus anatomy and gene family representation analysis. a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues sampled for transcriptome analysis: viscera (heart, kidney and hepatopancreas), yellow; gonads (ova or testes), peach; retina, orange; optic lobe (OL), maroon; supraesophageal brain (Supra), bright pink; subesophageal brain (Sub), light pink; posterior salivary gland (PSG), purple; axial nerve cord (ANC), red; suckers, grey; skin, mottled brown; stage 15 (St15) embryo, aquamarine. Skin sampled for transcriptome analysis included the eyespot, shown in light blue. b, C2H2 and protocadherin domain-containing gene families are expanded in octopus. Enriched Pfam domains were identified in lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides (light blue). For a domain to be labelled as expanded in a group, at least 50% of its associated gene families need a corrected P value of 0.01 against the outgroup average. Some Pfams (for example, Cadherin and Cadherin_2) may occur in the same gene; however, multiple domains in a given gene were counted only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre, Danio rerio; Gga, Gallus gallus; Hsa, Homo sapiens; Hro, Helobdella robusta; Lch, Latimeria chalumnae; Lgi, Lottia gigantea; Mmu, Mus musculus; Obi, O. bimaculoides; Pfu, Pinctada fucata; Xtr, Xenopus tropicalis.

To explore the genetic features of these highly specialized animals, we sequenced the Octopus bimaculoides genome by a whole-genome shotgun approach (Supplementary Note 1) and annotated it using extensive transcriptome sequence from 12 tissues (Methods and Supplementary Note 2). The genome assembly captures more than 97% of expressed protein-coding genes and 83% of the estimated 2.7 gigabase (Gb) genome size (Methods and Supplementary Notes 1, 2, 3). The unassembled fraction is dominated by high-copy repetitive sequences (Supplementary Note 1). Nearly 45% of the assembled genome is composed of repetitive elements, with two bursts of transposon activity occurring ∼25-million and ∼56-million years ago (Mya) (Supplementary Note 4).

We predicted 33,638 protein-coding genes (Methods and Supplementary Note 4) and found alternative splicing at 2,819 loci, but no locus showed an unusually high number of splice variants (Supplementary Note 4). A-to-G discrepancies between the assembled genome and transcriptome sequences provided evidence for extensive mRNA editing by adenosine deaminases acting on RNA (ADARs). Many candidate edits are enriched in neural tissues7 and are found in a range of gene families, including ‘housekeeping’ genes such as the tubulins, which suggests that RNA edits are more widespread than previously appreciated (Extended Data Fig. 1 and Supplementary Note 5).
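The logic behind using A-to-G discrepancies as a signature of ADAR editing can be illustrated with a minimal sketch: ADARs convert adenosine to inosine, which sequencing reads as guanosine, so edited sites appear as A in the genome and G in the transcript. The code below is not the pipeline used in this study (see Supplementary Note 5 for that). It assumes two pre-aligned, equal-length sense-strand sequences, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def count_mismatches(genome_seq: str, transcript_seq: str) -> Counter:
    """Count substitution types between two pre-aligned, equal-length sequences.

    Keys are (genomic_base, transcript_base) tuples. Positions holding a gap,
    an N, or any other non-ACGT symbol in either sequence are skipped.
    """
    if len(genome_seq) != len(transcript_seq):
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter()
    for g, t in zip(genome_seq.upper(), transcript_seq.upper()):
        if g in "ACGT" and t in "ACGT" and g != t:
            counts[(g, t)] += 1
    return counts


def a_to_g_fraction(counts: Counter) -> float:
    """Fraction of all observed mismatches that are genomic A read as G."""
    total = sum(counts.values())
    return counts[("A", "G")] / total if total else 0.0


# Made-up toy alignment, for illustration only (not octopus data).
genome = "ATGAAACCGTTAGA"
transcript = "ATGGAGCCGTTGGC"

mismatches = count_mismatches(genome, transcript)
print(dict(mismatches))
print(a_to_g_fraction(mismatches))
```

In real data, a strong excess of A-to-G over the other eleven substitution types is what distinguishes candidate editing from sequencing error or polymorphism, which would be expected to spread more evenly across types. For genes on the reverse strand, the same signal appears as T-to-C relative to the reference.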

Based primarily on chromosome number, several researchers proposed that whole-genome duplications were important in the evolution of the cephalopod body plan4,5,6, paralleling the role ascribed to the independent whole-genome duplication events that occurred early in vertebrate evolution11. Although this is an attractive framework for both gene family expansion and increased regulatory complexity across multiple genes, we found no evidence for it. The gene family expansions present in octopus are predominantly organized in clusters along the genome, rather than distributed in doubly conserved synteny as expected for a paleopolyploid12,13 (Supplementary Note 6.2). Although genes that regulate development are often retained in multiple copies after paleopolyploidy in other lineages, they are not generally expanded in octopus relative to limpet, oyster and other invertebrate bilaterians11,14 (Table 1 and Supplementary Notes 7.4 and 8).

Table 1: Metazoan developmental control genes

Hox genes are commonly retained in multiple copies following whole-genome duplication15. In O. bimaculoides, however, we found only a single Hox complement, consistent with the single set of Hox transcripts identified in the bobtail squid Euprymna scolopes with PCR16. Remarkably, octopus Hox genes are not organized into clusters as in most other bilaterian genomes15, but are completely atomized (Extended Data Fig. 2 and Supplementary Note 9). Although we cannot rule out whole-genome duplication followed by considerable gene loss, the extent of loss needed to support this claim would far exceed that which has been observed in other paleopolyploid lineages, and it is more plausible that chromosome number in coleoids increased by chromosome fragmentation.

Mechanisms other than whole-genome duplications can drive genomic novelty, including expansion of existing gene families, evolution of novel genes, modification of gene regulatory networks, and reorganization of the genome through transposon activity. Within the O. bimaculoides genome, we found evidence for all of these mechanisms, including expansions in several gene families, a suite of octopus- and cephalopod-specific genes, and extensive genome shuffling.

In gene family content, domain architecture and exon–intron structure, the octopus genome broadly resembles that of the limpet Lottia gigantea17, the polychaete annelid Capitella teleta17 and the cephalochordate Branchiostoma floridae14 (Supplementary Note 7 and Extended Data Fig. 3). Relative to these invertebrate bilaterians, we found a fairly standard set of developmentally important transcription factors and signalling pathway genes, suggesting that the evolution of the cephalopod body plan did not require extreme expansions of these ‘toolkit’ genes (Table 1 and Supplementary Note 8.2). However, statistical analysis of protein domain distributions across animal genomes did identify several notable gene family expansions in octopus, including protocadherins, C2H2 zinc-finger proteins (C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein-coupled receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3; Extended Data Figs 4, 5, 6 and Supplementary Notes 8 and 10).

Figure 2: Protocadherin expansion in octopus. a, For a larger version of panel a, see Extended Data Fig. 11. Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600 contain the two largest clusters of protocadherins, with 31 and 17, respectively. Clustered protocadherins vary greatly in genomic span and are oriented in a head-to-tail manner along each scaffold. c, Expression profiles of 161 protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were not detected in the tissues sampled. Cells are coloured according to number of standard deviations from the mean expression level. Protocadherins have high expression in neural tissues. Cadherins generally show a similar expression pattern, with the exception of a group of sucker-specific cadherins.

Figure 3: C2H2 ZNF expansion in octopus. a, Genomic organization of the largest C2H2 cluster. Scaffold 19852 contains 58 C2H2 genes that are transcribed in different directions. b, Expression profile of C2H2 genes along Scaffold 19852 in 12 octopus transcriptomes. Neural and developmental transcriptomes show high levels of expression for a majority of these C2H2 genes. In a and b, arrow denotes scaffold orientation. c, Distribution of fourfold synonymous site transversion distances (4DTv) between C2H2-domain-containing genes.
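The 4DTv measure in panel c of Figure 3 can be sketched as follows. At a fourfold degenerate codon position any base encodes the same amino acid, so transversions (purine to pyrimidine or the reverse) at these sites accumulate roughly neutrally and serve as a proxy for time since duplication. This is a minimal illustration, assuming the standard nuclear genetic code and two codon-aligned, gap-free coding sequences. It reports the raw proportion without the multiple-substitution correction that is often applied, and it is not the implementation used for the figure. All names and sequences are hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families in the
# standard genetic code: Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN),
# Thr (ACN), Ala (GCN), Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def four_dtv(cds_a: str, cds_b: str) -> float:
    """Raw transversion proportion at shared fourfold degenerate third positions."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Both codons must belong to the same fourfold degenerate family.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        if codon_a[2] not in "ACGT" or codon_b[2] not in "ACGT":
            continue
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (codon_a[2] in PURINES) != (codon_b[2] in PURINES):
            transversions += 1
    return transversions / sites if sites else float("nan")


# Made-up toy pair: GCT/GCA and CCA/CCC are transversions, GGA/GGG is a
# transition, and ATG is not a fourfold degenerate codon.
print(four_dtv("GCTCCAGGAATG", "GCACCCGGGATG"))
```

Computing this value for every pair of paralogues and plotting the distribution gives a histogram of the kind shown in panel c, in which a peak indicates a burst of duplications of similar age.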

The octopus genome encodes 168 multi-exonic protocadherin genes, nearly three-quarters of which are found in tandem clusters on the genome (Fig. 2b), a striking expansion relative to the 17–25 genes found in Lottia, Crassostrea gigas (oyster) and Capitella genomes. Protocadherins are homophilic cell adhesion molecules whose function has been primarily studied in mammals, where they are required for neuronal development and survival, as well as synaptic specificity18. Single protocadherin genes are found in the invertebrate deuterostomes Saccoglossus kowalevskii (acorn worm) and Strongylocentrotus purpuratus (sea urchin), indicating that their absence in Drosophila melanogaster and Caenorhabditis elegans is due to gene loss. Vertebrates also show a remarkable expansion of the protocadherin repertoire, which is generated by complex splicing from a clustered locus rather than tandem gene duplication (reviewed in ref. 19). Thus both octopuses and vertebrates have independently evolved a diverse array of protocadherin genes.
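A simple way to quantify statements such as "nearly three-quarters of which are found in tandem clusters" is to group family members that lie close together on the same scaffold. The sketch below does this with a distance threshold. The threshold, the minimum cluster size, the coordinates, and the scaffold and gene names are all hypothetical, and the criteria actually used in this study are given in the Supplementary Notes, not here.

```python
from itertools import groupby


def tandem_clusters(genes, max_gap=100_000, min_size=2):
    """Group genes of one family into clusters of neighbours on a scaffold.

    genes: iterable of (scaffold, start, end, name) tuples.
    A gene joins the current cluster when its start lies within max_gap
    base pairs of the previous gene's end on the same scaffold.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    for _, scaffold_genes in groupby(ordered, key=lambda g: g[0]):
        current = []
        for gene in scaffold_genes:
            if current and gene[1] - current[-1][2] > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(gene)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Made-up coordinates and names, for illustration only.
toy_genes = [
    ("scafA", 1_000, 5_000, "pcdh1"),
    ("scafA", 20_000, 30_000, "pcdh2"),
    ("scafA", 45_000, 60_000, "pcdh3"),
    ("scafA", 900_000, 910_000, "pcdh4"),
    ("scafB", 500, 4_000, "pcdh5"),
]

clusters = tandem_clusters(toy_genes)
clustered_fraction = sum(len(c) for c in clusters) / len(toy_genes)
print([[g[3] for g in c] for c in clusters], clustered_fraction)
```

A distance-based rule is only one option. An alternative is to allow a fixed number of intervening unrelated genes, which is less sensitive to variation in gene and intron length.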

A search of available transcriptome data from the longfin inshore squid Doryteuthis (formerly, Loligo) pealeii20 also demonstrated an expanded number of protocadherin genes (Supplementary Note 8.3). Surprisingly, our phylogenetic analyses suggest that the squid and octopus protocadherin arrays arose independently. Unlinked octopus protocadherins appear to have expanded ∼135 Mya, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence, due either to more recent duplications or to gene conversion, as found in clustered protocadherins in zebrafish and mammals21.

The expression of protocadherins in octopus neural tissues (Fig. 2) is consistent with a central role for these genes in establishing and maintaining cephalopod nervous system organization as they do in vertebrates. Protocadherin diversity provides a mechanism for regulating the short-range interactions needed for the assembly of local neural circuits18, which is where the greatest complexity in the cephalopod nervous system appears2. The importance of local neuropil interactions, rather than long-range connections, is probably due to the limits placed on axon density and connectivity by the absence of myelin, as thick axons are then required for rapid high-fidelity signal conduction over long distances. The sequence divergence between octopus and squid protocadherin expansions may reflect the notable differences between octopuses and decapodiforms in brain organization, which have been most clearly demonstrated for the vertical lobe, a key structure in cephalopod learning and memory circuits2,22. Finally, the independent expansions and nervous system enrichment of protocadherins in coleoid cephalopods and vertebrates offer a striking example of convergent evolution between these clades at the molecular level.

As with the protocadherins, we found multiple clusters of C2H2 ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4). The octopus genome contains nearly 1,800 multi-exonic C2H2-containing genes (Table 1), more than the 200–400 C2H2 ZNFs found in other lophotrochozoans and the 500–700 found in eutherian mammals, in which they form the second-largest gene family23. C2H2 ZNF transcription factors contain multiple C2H2 domains that, in combination, result in highly specific nucleic acid binding. The octopus C2H2 ZNFs typically contain 10–20 C2H2 domains but some have as many as 60 (Supplementary Note 8.4). The majority of the transcripts are expressed in embryonic and nervous tissues (Fig. 3b). This pattern of expression is consistent with roles for C2H2 ZNFs in cell fate determination, early development and transposon silencing, as demonstrated in genetic model systems23.

The expansion of the O. bimaculoides C2H2 ZNFs coincides with a burst of transposable element activity at ∼25 Mya (Fig. 3c). The flanking regions of these genes show a significant enrichment in a 70–90 base pair (bp) tandem repeat (31% for C2H2 genes versus 4% for all genes; Fisher’s exact test P value < 1 × 10⁻¹⁶), which parallels the linkage of C2H2 gene expansions to β-satellite repeats in humans24. We also found an expanded C2H2 ZNF repertoire in amphioxus (Table 1), showing a similar enrichment in satellite-like repeats. These parallels suggest a common mode of expansion of a highly dynamic transcription factor family implicated in lineage-specific innovations.
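The enrichment comparison above rests on a 2 × 2 contingency table (C2H2 genes versus other genes, with versus without the flanking repeat). As a hedged illustration of how such a table is tested, the sketch below implements a one-sided Fisher's exact test from the hypergeometric distribution using only the standard library. The text reports percentages rather than raw counts, so the counts in the second example are hypothetical placeholders chosen only to match 31% and 4%. Whether the reported test was one-sided or two-sided is not stated here, and one-sidedness is an assumption of this sketch.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left cell with all row and column totals held fixed.
    """
    row1 = a + b            # e.g. genes in the family of interest
    col1 = a + c            # e.g. genes with the flanking repeat
    n = a + b + c + d
    denominator = comb(n, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Small check case that can be verified by hand: (16 + 1) / 70.
p_toy = fisher_exact_greater(3, 1, 1, 3)
print(p_toy)

# HYPOTHETICAL counts (not from the paper): 31 of 100 family genes versus
# 40 of 1,000 other genes carry the repeat.
p_hypothetical = fisher_exact_greater(31, 69, 40, 960)
print(p_hypothetical)
```

Exact integer arithmetic via `math.comb` avoids the overflow and rounding problems that a factorial-based floating-point implementation would encounter at genome-scale counts.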

To investigate further the evolution of gene families implicated in nervous system development and function, we surveyed genes associated with axon guidance (Table 1) and neurotransmission (Table 2), identifying their homologues in octopus and comparing numbers across a diverse set of animal genomes (Supplementary Notes 8, 9, 10). Several patterns emerged from this survey. The gene complements present in the model organisms D. melanogaster and C. elegans often showed striking departures from those seen in lophotrochozoans and vertebrates (Table 2 and Supplementary Note 10). For example, D. melanogaster encodes one member of the discs large (DLG) family, a key component of the postsynaptic scaffold. In contrast, mammals have four DLGs, which (along with other observations) led to suggestions that vertebrates possess uniquely complex synaptic machinery25. However, we found three DLGs in both octopus and limpet, suggesting that vertebrate and fly gene number differences are not necessarily diagnostic of exceptional vertebrate synaptic complexity (Supplementary Note 10.6).

Table 2: Ion channel subunits

Overall, neurotransmission gene family sizes in the octopus were very similar to those seen in other lophotrochozoans (Table 2 and Supplementary Note 10), except for a few strikingly expanded gene families such as the sialic acid vesicular transporters (sialins) (Supplementary Note 10.2). We did find variations in the sizes of neurotransmission gene families between human and lophotrochozoans (Table 2 and Supplementary Note 10), but no evidence for systematic expansion of these gene families in vertebrates relative to octopus or other lophotrochozoans. Although some gene families were larger in mammals or absent in lophotrochozoans (for example, ligand-gated 5-HT receptors), others were absent in mammals and present in invertebrates (for example, anionic glutamate and acetylcholine receptors). The complement of neurotransmission genes in octopus may be broadly typical for a lophotrochozoan, but our findings suggest it is also not obviously smaller than that found in mammals.

Among the octopus complement of ligand-gated ion channels, we identified a set of atypical nicotinic acetylcholine receptor-like genes, most of which are tandemly arrayed in clusters (Extended Data Fig. 7). These subunits lack several residues identified as necessary for the binding of acetylcholine26, so it is unlikely that they function as acetylcholine receptors. The high level of expression of these divergent subunits within the suckers raises the interesting possibility that they act as sensory receptors, as do some divergent glutamate receptors in other protostomes27. In addition, we identified 74 Aplysia-like and 11 vertebrate-like candidate chemoreceptors among the octopus GPCR superfamily of ∼330 genes (Extended Data Fig. 6).

We found, amid extensive transcription of octopus transposons, that a class of octopus-specific short interspersed nuclear element sequences (SINEs) is highly expressed in neural tissues (Supplementary Note 4 and Extended Data Fig. 8). Although the role of active transposons is unclear, elevated transposon expression in neural tissues has been suggested to serve an important function in learning and memory in mammals and flies28.

Transposable element insertions are often associated with genomic rearrangements29 and we found that the transposon-rich octopus genome displays substantial loss of ancestral bilaterian linkages that are conserved in other species (Supplementary Note 6 and Extended Data Fig. 9). Interestingly, genes that are linked in other bilaterians but not in octopus are enriched in neighbouring SINE content. SINE insertions around these genes date to the time of tandem C2H2 expansion (Extended Data Fig. 9d), pointing to a crucial period of genome evolution in octopus. Other transposons such as Mariner show no such enrichment, suggesting distinct roles for different classes of transposons in shaping genome structure (Extended Data Fig. 9c).

Transposable element activity has been implicated in the modification of gene regulation across several eukaryotic lineages29. We found that in the nervous system, the degree to which a gene’s expression is tissue-specific is positively correlated with the transposon load around that gene (r² values ranging from 0.49 in the optic lobe to 0.81 in the subesophageal brain; Extended Data Fig. 8 and Supplementary Note 4). This correlation may reflect modulation of gene expression by transposon-derived enhancers or a greater tolerance for transposon insertion near genes with less complex patterns of tissue-specific gene regulation.
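The relationship described above can be illustrated by scoring each gene for tissue specificity and correlating that score with the transposon content of its flanking sequence. The specificity measure used in this study is described in Supplementary Note 4. The sketch below substitutes the widely used tau index as an assumption, and all expression values and transposon loads are made up for illustration.

```python
def tau(expression):
    """Tau tissue-specificity index over a gene's per-tissue expression values.

    Returns 0.0 for uniform expression and 1.0 for expression confined to a
    single tissue.
    """
    peak = max(expression)
    if peak <= 0 or len(expression) < 2:
        return 0.0
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Made-up per-gene expression across four tissues (not octopus data).
toy_expression = [
    [10.0, 9.0, 11.0, 10.0],   # broadly expressed
    [20.0, 5.0, 4.0, 6.0],     # moderately specific
    [0.0, 0.0, 30.0, 1.0],     # highly specific
    [0.0, 0.0, 0.0, 12.0],     # single tissue
]
# Made-up fraction of flanking sequence annotated as transposon-derived.
toy_transposon_load = [0.10, 0.25, 0.40, 0.45]

specificity = [tau(gene) for gene in toy_expression]
print(specificity)
print(r_squared(toy_transposon_load, specificity))
```

An r² computed this way describes only the strength of a linear association. As the text notes, it cannot by itself distinguish transposon-derived regulatory input from a greater tolerance for insertions near genes with simpler regulation.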

Using a relaxed molecular clock, we estimate that the octopus and squid lineages diverged ∼270 Mya, emphasizing the deep evolutionary history of coleoid cephalopods8,30 (Supplementary Note 7.1 and Extended Data Fig. 10a). Our analyses found hundreds of coleoid- and octopus-specific genes, many of which were expressed in tissues containing novel structures, including the chromatophore-laden skin, the suckers and the nervous system (Extended Data Fig. 10 and Supplementary Note 11). Taken together, these novel genes, the expansion of C2H2 ZNFs, genome rearrangements, and extensive transposable element activity yield a new landscape for both trans- and cis-regulatory elements in the octopus genome, resulting in changes in an otherwise ‘typical’ lophotrochozoan gene complement that contributed to the evolution of cephalopod neural complexity and morphological innovations.