Dinoflagellates are microbial eukaryotes that have exceptionally large nuclear genomes; however, their organelle genomes are small and fragmented and contain fewer genes than those of other eukaryotes. The genus Amoebophrya (Syndiniales) comprises endoparasites with high genetic diversity that can infect other dinoflagellates, such as those forming harmful algal blooms (e.g., Alexandrium). We sequenced the genome (~100 Mb) of Amoebophrya ceratii to investigate the early evolution of genomic characters in dinoflagellates. The A. ceratii genome encodes almost all essential biosynthetic pathways for self-sustaining cellular metabolism, suggesting a limited dependency on its host. Although dinoflagellates are thought to have descended from a photosynthetic ancestor, A. ceratii appears to have completely lost its plastid and nearly all genes of plastid origin. Functional mitochondria persist in all life stages of A. ceratii, but we found no evidence for the presence of a mitochondrial genome. Instead, all mitochondrial proteins appear to be lost or encoded in the A. ceratii nucleus.

Alveolates are a highly diverse group of eukaryotes, comprising three major phyla (dinoflagellates, apicomplexans, and ciliates) as well as a growing number of less-studied lineages, such as colponemids, chromopodellids, and perkinsids (1, 2). Dinoflagellates include phototrophs, heterotrophs, mixotrophs, and parasites and are characterized by chromosomes that remain permanently condensed in a liquid-crystalline state throughout the cell cycle. Recently, genes encoding histone-like proteins (3) and a non-nucleosomal DNA packaging system involving unique proteins (with closest similarity to viral proteins) (4) have been discovered in dinoflagellates. Dinoflagellate genomes are usually 10 to 100 times larger than the human genome (5) and exhibit several unusual features whose evolutionary origins are unclear. In addition, dinoflagellate genes are typically expressed with a conserved short spliced leader (SL) sequence that is added to the 5′ end of mRNAs by trans-splicing (6).

The ancestor of dinoflagellates and apicomplexans was photosynthetic (7); however, only the apicomplexan relatives Chromera and Vitrella and approximately half of the known core dinoflagellates currently maintain photosynthesis (8). Even photosynthetic dinoflagellates have highly reduced and fragmented plastid genomes (14 genes, compared with more than 100 genes in a typical plastid genome), because most plastid genes have been transferred to the nucleus (7). Dinoflagellate and apicomplexan mitochondrial genomes are even more reduced, typically harboring only three protein-coding genes and fragments of ribosomal RNA (rRNA) genes (9, 10); these are the most reduced mitochondrial genomes known in aerobic species (11). Recent examination of the respiratory chain in the photosynthetic Chromera velia, however, showed a further reduction: oxidative phosphorylation complexes I and III have been lost, leaving only two protein-coding genes (coxI and coxIII) and fragments of the rRNA genes encoded in the mitochondrion (11).

Several dinoflagellate species produce potent toxins and can form harmful algal blooms (HABs) that have an enormous impact on ecosystem functions (12). Species of the genus Alexandrium cause prominent HABs that persist for extended periods under favorable abiotic and biotic conditions (12). Alexandrium species produce saxitoxin and its derivatives, potent neurotoxins that are associated with paralytic shellfish poisoning (12), can cause serious human disease, and create economic problems for fisheries.

The dynamics of HABs can be strongly affected by parasites, most commonly parasitic syndinians and perkinsids (13). Morphological features and molecular phylogenies place both lineages outside the core dinoflagellates, together with the free-living genera Oxyrrhis and Psammosa (1). Genome sequencing of the deep-branching syndinian Hematodinium revealed that this parasite has likely secondarily lost its plastid (14). The Amoebophryidae (Syndinea) is an exclusively endoparasitic family that comprises a large and diverse group known primarily from environmental sequences, often referred to as marine alveolate group II (MALV-II). The family includes a single genus, Amoebophrya, with seven described species that exhibit high genetic diversity (15). Amoebophrya species can infect a high proportion of cells in blooming Alexandrium populations (13, 16), and this infection directly affects HAB formation and persistence (13).

The life cycle of Amoebophrya was described more than 40 years ago and was recently examined in detail by electron microscopy (17). The infective free-living stage, the dinospore, has two flagella (Fig. 1). The dinospore attaches to the host cell and enters its cytoplasm, losing its flagella in the process and becoming enclosed in a parasitophorous membrane. In most cases, the parasite then crosses the host nuclear envelope, losing the parasitophorous membrane (17). The growing parasite digests its host, increases in size, and, through several consecutive mitotic divisions, eventually forms the so-called beehive structure. The host cell wall then breaks down and releases a short-lived vermiform stage of the parasite, which divides into hundreds of infective dinospores (18). Maturation of the parasite within the host takes 2 to 3 days and is characterized by phases of differential gene expression (19).

Here, we present the complete genome of Amoebophrya ceratii, a parasite of the toxin-producing species Alexandrium catenella. Examining the structure and metabolism of the A. ceratii genome sheds new light on the early evolution of unusual genomic characteristics in dinoflagellates and suggests that the parasite has lost both its plastid and its mitochondrial genome, despite maintaining an otherwise normal aerobic mitochondrion.

RESULTS AND DISCUSSION

Genomic characteristics and phylogenomics

Total genomic DNA (gDNA) from dinospores of A. ceratii clone AT5.2 was sequenced and assembled, and contaminant sequences were removed on the basis of sequence identity, coverage, and GC content (Materials and Methods). The resulting assembly comprises 2351 A. ceratii scaffolds totaling 87.7 Mb, with an average coverage of 110-fold. This is smaller than the genome size estimated by flow cytometry (~120 Mb; fig. S1), a difference that is likely due to repetitive elements collapsing into small contigs during assembly. The A. ceratii genome is substantially smaller than those of other dinoflagellates such as Hematodinium (about 50 times larger; ~4800 Mb) (14) and Symbiodinium (about 15 times larger; ~1100 to 1500 Mb) (20–22). To the best of our knowledge, this is the smallest dinoflagellate genome reported so far. The mean GC content of the A. ceratii genome is 55.9%, which is within the range of published dinoflagellate transcriptomes (23) but higher than the values reported for the draft genomes of Symbiodinium spp. (43.6 to 50.5%) (20–22) and Hematodinium sp. (approximately 47%) (14) (Table 1). Gene prediction identified 19,925 protein-coding genes and 39 transfer RNAs (tRNAs). We also mapped previously obtained transcript data (16, 19) to the scaffolds; 12,200 transcripts mapped to this assembly. Despite its small size by dinoflagellate standards, the assembly appears to be largely complete, containing 89.1% of CEGMA (core eukaryotic gene mapping approach) conserved proteins (24), slightly more than the Hematodinium genome (85.9%) (14).

The predicted A. ceratii proteins were clustered into 4879 families with OrthoMCL. Of these, 499 clusters belonged to 12 transposon domain families (table S1), suggesting substantial transposon activity in the A. ceratii genome. We further searched the A. ceratii genome and found 60 general transcription factors and 46 proteins with domains corresponding to specific transcriptional regulatory factors, numbers similar to those in other dinoflagellates and alveolates (table S2). Although transcription factors are not abundant in dinoflagellates, they likely play an indispensable role in adapting to changing conditions, as they do in other eukaryotes.

Many dinoflagellate transcripts are trans-spliced to a 22–base pair (bp) SL sequence (6), and individual mRNAs may be processed from larger precursors by trans-splicing and polyadenylation. Because the SL is normally added to mRNAs by trans-splicing, an SL sequence found at a gene-coding locus indicates that an mRNA may have been reverse-transcribed and reintegrated into the genome as an intronless gene (25). We therefore examined the A. ceratii genome for SLs and traces of such reintegration events. None of the predicted gene models was associated with a full-length SL motif; however, five gene models had truncated motifs (fig. S2A and table S3A). This low frequency of SL motifs in gene models suggests that mRNA reintegration events are rare. Fifty-three orphan full-length SL motifs were identified across 50 scaffolds (table S3B), along with another 713 truncated SL motifs with 73 to 100% identity (table S3C). In the transcriptome dataset, 70 transcripts carried a single SL motif (fig. S2B). Only one contig contained a second, truncated SL repeat (60% identity to the consensus sequence), and no third or fourth SL repeats were identified.

Table 1. Features of A. ceratii and other dinoflagellate genomes. CDS, coding regions. N50 measures assembly quality as a length-weighted median of contig length; higher N50 values denote greater contiguity.

A. ceratii contained 51,066 introns in 15,016 predicted genes (fig. S3). In total, 28.4% of predicted genes were intronless, more than in the Symbiodinium genomes, in which 61.1 to 98.3% of genes have introns (Table 1).
This higher proportion of intronless genes probably reflects stronger streamlining forces maintaining a small genome in A. ceratii than in the core dinoflagellates; in addition, a parasitic lifestyle that depends on coevolution with the host may not favor large gene family expansions. Mapping of RNA sequencing (RNAseq) reads onto the A. ceratii genome revealed that genes without introns were expressed at levels similar to those of genes with introns (table S4). Taken together, our data suggest that gene reintegration by retroposition is rare in Amoebophrya and that such events may be more common in the more complex genomes of core dinoflagellates.

The phylogenetic position of Amoebophrya relative to other dinoflagellates has been debated, in part because only a few genes have been available for phylogenetic analysis [see (1)]. We used a concatenated set of 100 conserved nuclear proteins from three Amoebophrya species isolated from different hosts, 15 other dinoflagellates, and 13 outgroup species to compute maximum likelihood and Bayesian phylogenies (Fig. 1). In these analyses, Amoebophrya branched after Oxyrrhis marina and Perkinsus marinus but before the core dinoflagellates, as the sister group of Hematodinium (another Syndiniales parasite). This placement agrees with previously reported results based on concatenated ribosomal proteins (1).
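
The SL survey above depends on scanning genomic sequence for matches to an SL consensus at a given percent identity. As a rough illustration only, the Python sketch below performs an ungapped, full-length scan of both strands. The consensus string is assumed here to be the commonly reported 22-nt dinoflagellate SL (IUPAC code D = A, G, or T); the 73% cut-off merely mirrors the lowest identity quoted above; and the function names are hypothetical. Detecting truncated or gapped motifs would require a proper alignment tool rather than this simple window scan.

```python
"""Ungapped scan for spliced-leader (SL)-like motifs on both DNA strands.

Illustrative sketch: SL_CONSENSUS is assumed to be the commonly reported
22-nt dinoflagellate SL, and the 73% cut-off only mirrors the text.
"""

SL_CONSENSUS = "DCCGTAGCCATTTTGGCTCAAG"  # assumed consensus; D = A, G or T
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "AGT"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def identity(window: str, motif: str = SL_CONSENSUS) -> float:
    """Percent of motif positions whose IUPAC code accepts the window base."""
    matches = sum(base in IUPAC[code] for base, code in zip(window, motif))
    return 100.0 * matches / len(motif)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_sl(seq: str, motif: str = SL_CONSENSUS, min_identity: float = 73.0):
    """Yield (strand, start, percent identity) for every full-length window
    at or above min_identity. Minus-strand starts are 0-based positions in
    the reverse-complemented sequence."""
    seq = seq.upper()
    k = len(motif)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(s) - k + 1):
            pct = identity(s[start:start + k], motif)
            if pct >= min_identity:
                yield strand, start, round(pct, 1)


if __name__ == "__main__":
    # Toy sequence: one SL copy (first base A, allowed by D) flanked by filler.
    demo = "AAAA" + "ACCGTAGCCATTTTGGCTCAAG" + "TTTT"
    for hit in scan_sl(demo):
        print(hit)
```

A full-identity hit at position 4 of the plus strand is reported for the toy sequence; lowering `min_identity` admits more divergent copies at the cost of more spurious matches.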
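
Table 1 reports N50 as a measure of assembly contiguity. A minimal Python sketch of the standard definition follows: contigs (or scaffolds) are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name is illustrative, and assembly-statistics tools may differ in details such as how gaps within scaffolds are counted.

```python
def n50(lengths):
    """Return the N50 of an assembly given its contig or scaffold lengths.

    N50 is the length L such that contigs of length >= L together contain
    at least half of all assembled bases (a length-weighted median).
    """
    lengths = list(lengths)
    if not lengths or min(lengths) <= 0:
        raise ValueError("lengths must be a non-empty collection of positive integers")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first contig that brings the running sum to >= 50% is the N50.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns for positive lengths")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For contig lengths of 2 to 10 bp (54 bp in total), the running sum first reaches half of the total (27 bp) at the contig of length 8, so the function returns 8.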

Metabolic features and dependence on the host

Pathogens and parasites frequently use host resources to obtain compounds required for their own metabolism and reproduction, a relationship that often leads to the loss or modification of biosynthetic pathways in the parasite. In the A. ceratii genome, many genes encoding enzymes of various metabolic pathways are present in multiple copies, a common feature of dinoflagellate genomes (table S5) (26). Genes involved in amino acid, purine, and pyrimidine biosynthesis are present in particularly high copy numbers (fig. S4 and table S5). Other prominently expanded orthologous groups include proteins involved in protein-protein or protein-carbohydrate interactions (107 proteins), carbohydrate degradation (50 proteins), and detoxification (44 proteins), which may be associated with the utilization of host-derived compounds during the infection phase of Amoebophrya (table S1).

Fatty acids are building blocks of cell membranes; they also act as lipid anchors that target proteins to membranes and serve as energy sources and signaling molecules, all of which are important to a parasite. In photosynthetic eukaryotes, fatty acid synthesis in the plastid is carried out by a cyanobacterium-derived type II fatty acid synthase (FAS) multienzyme system (27), whereas heterotrophic eukaryotes typically rely on a cytosolic multidomain type I FAS. Some apicomplexans contain both type I and type II FAS, whereas others have lost one or the other (27). A. ceratii has a type I FAS complex (g12138.t1) that is closely related to those of Hematodinium (14) and apicomplexans, but no plastid type II FAS enzymes were found in the A. ceratii genome (fig. S5). A. ceratii and Hematodinium also contain a type I polyketide synthase (PKS) complex (scaffold1619_size12283) (fig. S5), which is likely involved in the production of secondary metabolites and possibly in host interactions (27).

Enzymes for the synthesis of most amino acids are present in the A. ceratii genome (table S5), with the exception of a few individual enzymes that have likely been functionally substituted (28). This suggests that A. ceratii depends only to a limited extent on its host. The shikimate pathway, required for the synthesis of tyrosine, phenylalanine, and tryptophan, consists of seven broadly conserved enzymes, five of which (AroB, AroA, AroK, AroD, and AroE) are fused into a single protein in some eukaryotes (29). In A. ceratii, this five-domain protein is additionally fused to AroC (chorismate synthase; g6770; Fig. 2A), so that all six domains are encoded by a single cotranscribed gene, an arrangement not observed in any other organism to date. Moreover, the seventh enzyme of the shikimate pathway, AroG, is fused to a multifunctional tryptophan synthetase gene (g13589; Fig. 2B), which we confirmed by polymerase chain reaction (PCR) using both gDNA and complementary DNA (cDNA) templates. Gene fusions can provide a simple mechanism for coordinated expression in eukaryotes. However, some of the enzymes that convert chorismate (the final product of the shikimate pathway) to tyrosine, phenylalanine, or tryptophan were not found in the A. ceratii genome (table S5). In vascular plants, approximately 20% of the carbon fixed by photosynthesis is directed into the shikimate pathway, which produces precursors not only for aromatic amino acid biosynthesis but also for various secondary metabolite pathways (30).

Fig. 2. Shikimate (g6770) and tryptophan (g13589) synthesis pathway multidomain genes. (A) Individual domains of the shikimate pathway are illustrated by colored boxes, and domains of the tryptophan pathway are represented by differently shaded gray boxes. (B) Schematic view of the biosynthetic pathway for tryptophan in A. ceratii. Circles represent intermediates that can be synthesized in A. ceratii, and arrows indicate the respective enzymatic activities. Arrows without circles indicate missing pathway components in A. ceratii. The colors for the shikimate enzymatic activities are as in (A). For simplicity, all tryptophan pathway steps are depicted in gray.
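
The domain content described above can be checked against the seven-step shikimate pathway with a simple tally. The Python sketch below lists the steps with their standard Escherichia coli gene symbols and maps them to the two A. ceratii fusion genes, using only the domains named in the text. The data structures are hypothetical; the sets imply no domain order within g6770, and the tryptophan-pathway domains of g13589 are omitted because they are not listed individually here. Under these assumptions, the tally indicates that the two fusion genes together cover every step up to chorismate.

```python
# The seven canonical shikimate-pathway steps (E. coli gene symbols), in order
# from erythrose 4-phosphate + phosphoenolpyruvate to chorismate.
SHIKIMATE_STEPS = [
    ("AroG", "DAHP synthase"),
    ("AroB", "3-dehydroquinate synthase"),
    ("AroD", "3-dehydroquinate dehydratase"),
    ("AroE", "shikimate dehydrogenase"),
    ("AroK", "shikimate kinase"),
    ("AroA", "EPSP synthase"),
    ("AroC", "chorismate synthase"),
]

# Shikimate-pathway domains per A. ceratii fusion gene, as named in the text.
# Sets imply no domain order; g13589's tryptophan-pathway domains are omitted.
FUSION_GENES = {
    "g6770": {"AroB", "AroA", "AroK", "AroD", "AroE", "AroC"},
    "g13589": {"AroG"},
}


def map_steps(genes):
    """Return ({step: sorted genes encoding it}, [steps with no encoding gene])."""
    coverage, missing = {}, []
    for symbol, _name in SHIKIMATE_STEPS:
        hits = sorted(gene for gene, domains in genes.items() if symbol in domains)
        if hits:
            coverage[symbol] = hits
        else:
            missing.append(symbol)
    return coverage, missing


coverage, missing = map_steps(FUSION_GENES)
for symbol, name in SHIKIMATE_STEPS:
    print(f"{symbol:5} {name:30} {', '.join(coverage.get(symbol, ['-']))}")
print("steps without a gene:", missing or "none")
```

The same function reports gaps when a domain is absent, which is how the missing chorismate-to-tryptophan steps in Fig. 2B could be tabulated once the relevant enzymes are added to the step list.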

Analysis of metabolic pathways points to a loss of the plastid organelle

The ancestor of dinoflagellates and apicomplexans was an alga, and most of their current representatives are either still photosynthetic or metabolically dependent on a reduced, nonpigmented plastid (31). Plastid loss has been demonstrated only in Cryptosporidium and Hematodinium, which have circumvented the need for plastid-derived metabolites by salvaging compounds of host origin (14). We investigated whether A. ceratii, which belongs to the same lineage as Hematodinium, also lacks a relict plastid (17). We first searched the A. ceratii genome (scaffolds, contigs, and gene models) for plastid metabolic genes using comprehensive homology searches, but we could not identify orthologs of any enzymes found in apicomplexan or dinoflagellate plastids (Materials and Methods). The synthesis of isoprenoid units is missing altogether, suggesting that A. ceratii, like Hematodinium, obtains these compounds from its host. The synthesis of tetrapyrroles, fatty acids, and iron-sulfur clusters is predicted to take place in the cytosol and mitochondria (fig. S6), and in single-gene phylogenies, only one enzyme (HemD) appears to be derived from the plastid endosymbiont (fig. S6). Unlike plastid-targeted proteins, however, A. ceratii HemD lacks the N-terminal extension carrying the signal and transit peptides characteristic of plastid targeting, as confirmed by transcriptomic analysis of the 5′ end of the gene (Materials and Methods). This strongly indicates that HemD in A. ceratii has been relocalized to the cytosol, much like its ortholog in Hematodinium (14). Because all other enzymes for tetrapyrrole synthesis are predicted to be in the cytosol or mitochondria (fig. S6) and no other pathway requires a plastid, the metabolism of A. ceratii poses no apparent barrier to plastid loss.

To examine whether additional endosymbiont-derived genes are present in A. ceratii, we classified all of its predicted proteins by using an automated phylogenetic pipeline (32). Proteins clustering with red algae or green plants in phylogenetic trees built from a local database of representative eukaryotic and prokaryotic sequences were manually inspected for potential plastid functions (Materials and Methods). No additional putative endosymbiont-derived proteins were identified. Overall, there is no evidence for a plastid in A. ceratii, and we conclude that it has lost the organelle altogether, presumably in its common ancestor with Hematodinium.
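
The screening step described above, in which proteins grouping with red algae or green plants are set aside for manual inspection, can be sketched as a sister-clade test on each gene tree. The Python example below is a hypothetical simplification, not the pipeline of (32): it parses a Newick tree with a minimal parser (discarding branch lengths and support values), finds the clade that shares a parent node with the A. ceratii query, and flags the protein if every leaf in that clade belongs to Rhodophyta or Viridiplantae. The "Group|taxon" leaf-naming scheme and the example tree are invented for illustration; a real analysis would also weigh support values and tree rooting.

```python
"""Flag gene trees in which the query's sister clade is all red algae/plants.

Hypothetical simplification of a phylogenomic screen: leaf names are assumed
to follow a 'Group|taxon' scheme, and only the query's sister clade is used.
"""

PLASTID_DONOR_GROUPS = {"Rhodophyta", "Viridiplantae"}  # red algae, green plants


def parse_newick(text):
    """Parse Newick into nested lists of leaf names.

    Branch lengths, support values and internal labels are discarded, and
    whitespace is removed, so leaf names must not contain spaces.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_token():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            read_token()  # skip internal label / support / branch length
            return children
        return read_token().split(":")[0]  # leaf name without branch length

    return node()


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sister_leaves(tree, query):
    """Leaves of the clade(s) sharing a parent node with the query leaf."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == query:
            return [leaf for j, other in enumerate(tree) if j != i
                    for leaf in leaves(other)]
        found = sister_leaves(child, query)
        if found is not None:
            return found
    return None


def needs_manual_check(newick, query):
    """True if every leaf in the query's sister clade is a red alga or plant."""
    sisters = sister_leaves(parse_newick(newick), query)
    if not sisters:
        return False
    return all(name.split("|")[0] in PLASTID_DONOR_GROUPS for name in sisters)


if __name__ == "__main__":
    tree = ("((Query|g1:0.1,(Rhodophyta|Porphyra:0.2,Viridiplantae|Arabidopsis:0.3)"
            "90:0.05):0.1,Apicomplexa|Toxoplasma:0.4);")
    print(needs_manual_check(tree, "Query|g1"))
```

Restricting the test to the immediate sister clade keeps the sketch short; it also means that an unrooted tree with the query attached directly to the root is judged against the rest of the tree, which a real pipeline would need to handle explicitly.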