Although viruses are well-characterized regulators of eukaryotic algae, little is known about those infecting unicellular predators in oceans. We report the largest marine virus genome yet discovered, found in a wild predatory choanoflagellate sorted away from other Pacific microbes and pursued using integration of cultivation-independent and laboratory methods. The giant virus encodes nearly 900 proteins, many unlike known proteins, others related to cellular metabolism and organic matter degradation, and 3 type-1 rhodopsins. The viral rhodopsin that is most abundant in ocean metagenomes, and also present in an algal virus, pumps protons when illuminated, akin to cellular rhodopsins that generate a proton-motive force. Giant viruses likely provision multiple host species with photoheterotrophic capacities, including predatory unicellular relatives of animals.

Giant viruses are remarkable for their large genomes, often rivaling those of small bacteria, and for having genes thought exclusive to cellular life. Most isolated to date infect nonmarine protists, leaving their strategies and prevalence in marine environments largely unknown. Using eukaryotic single-cell metagenomics in the Pacific, we discovered a Mimiviridae lineage of giant viruses, which infects choanoflagellates, widespread protistan predators related to metazoans. The ChoanoVirus genomes are the largest yet from pelagic ecosystems, with 442 of 862 predicted proteins lacking known homologs. They are enriched in enzymes for modifying organic compounds, including degradation of chitin, an abundant polysaccharide in oceans, and they encode 3 divergent type-1 rhodopsins (VirR) with distinct evolutionary histories from those that capture sunlight in cellular organisms. One (VirR DTS) is similar to the only other putative rhodopsin from a virus (PgV) with a known host (a marine alga). Unlike the algal virus, ChoanoViruses encode the entire pigment biosynthesis pathway and cleavage enzyme for producing the required chromophore, retinal. We demonstrate that the rhodopsin shared by ChoanoViruses and PgV binds retinal and pumps protons. Moreover, our 1.65-Å resolved VirR DTS crystal structure and mutational analyses exposed differences from previously characterized type-1 rhodopsins, all of which come from cellular organisms. Multiple VirR types are present in metagenomes from across surface oceans, where they are correlated with and nearly as abundant as a canonical marker gene from Mimiviridae. Our findings indicate that light-dependent energy transfer systems are likely common components of giant viruses of photosynthetic and phagotrophic unicellular marine eukaryotes.

Viruses are increasingly recognized as key participants in the marine carbon cycle, short circuiting the classical flow of carbon through food chains to higher trophic levels (1–3). Much is known about how marine phages alter bacterial metabolism, such as supplementing photosynthetic machinery during infection (4, 5), and about viruses that infect protists (unicellular eukaryotes), especially photosynthetic taxa, and the auxiliary metabolic genes (AMGs) that they possess (6–8). Over the last 15 y, there has also been the remarkable discovery of viruses with large genomes (>300 Kb) that infect eukaryotes, the so-called giant viruses (9–13). Giant viruses encode numerous functions previously considered exclusive to cellular life, such as transfer RNA (tRNA) synthetases, translation initiation and elongation factors, and tRNAs. Those described so far primarily infect predatory protists that live in soils, wastewater, and freshwater, especially members of the Amoebozoa and Excavata eukaryotic supergroups, and have genomes that range up to 2.4 Mb (Fig. 1A) (9–13). The 6 isolated from the ocean water column, an environment where both viruses and protists have massive ecological importance (14–17), infect 3 haptophyte algal species (Phaeocystis globosa, Emiliania huxleyi, and Chrysochromulina ericina), 1 green alga (Tetraselmis sp.), 1 stramenopile alga (Aureococcus anophagefferens), and 1 nonphotosynthetic predatory stramenopile (Cafeteria roenbergensis) (18–23). These marine viruses have smaller genomes, ranging from 370 to 670 Kb, than many other giant viruses, and all belong to the nucleocytoplasmic large DNA viruses (NCLDV) family, which houses smaller eukaryotic marine viruses as well (24) (Dataset S1). Nevertheless, the marine giants encode a number of AMGs that connect to how they alter host metabolism during infection, such as fermentation-related genes (20) and sphingolipid-biosynthesis genes (6) in algal viruses, essential information for considering downstream biogeochemical processes and modeling the impacts of virus–host interactions on ecosystem processes.

Fig. 1. A giant virus infects a predatory protist that is considered to be among the closest living unicellular relatives of metazoans. (A) Schematic tree of eukaryotes, with supergroups indicated by colors or gray branches if in contentious positions. Lineages with giant viruses (pink) known (circles) or discovered here (star) are indicated. (B) Locations of single-cell sorting where ChoanoV1 and its host, B. minor, were recovered (Station M2), where ChoanoV2 (Station 67-70) was found, and where metatranscriptomes were sequenced from unmanipulated seawater (M1, M2, 67-70; Station 67-155, 785 km from shore, not displayed on map for scale reasons). (C) Histogram showing the population (circled) of sorted choanoflagellate cells (blue dots), including the viral-infected cell (pink), based on index sorting and V4 18S rRNA gene amplicon sequencing. Other data points reflect unsorted particles in the stained seawater analyzed. The box (green) indicates the position of YG bead standards run before and after sorting at the same settings. (D) Categorized summary of the top 10 BLASTp matches for 862 ChoanoV1 proteins (e-value < 10⁻⁵) in cellular organisms and NCLDV.
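The hit selection summarized in panel D (top 10 BLASTp matches per protein below an e-value cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes BLASTp results in the default 12-column tabular format (`-outfmt 6`), and the function name `top_hits`, the toy protein and subject identifiers, and the tie-breaking rule (lowest e-value, then highest bit score) are hypothetical choices made for this example. The taxonomic categorization of the retained hits is not covered.

```python
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines, max_evalue=1e-5, top_n=10):
    """Keep, for each query, up to top_n hits with e-value below max_evalue.

    Returns {query_id: [(subject_id, evalue, bitscore), ...]} sorted from
    best to worst. Queries with no qualifying hit are absent from the result.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue < max_evalue:
            hits[row["qseqid"]].append(
                (row["sseqid"], evalue, float(row["bitscore"]))
            )
    # Sort by ascending e-value, then descending bit score (assumed tie-break).
    return {
        query: sorted(found, key=lambda h: (h[1], -h[2]))[:top_n]
        for query, found in hits.items()
    }


# Toy rows with hypothetical identifiers; these are not data from the study.
example = [
    "protA\tsubj1\t45.0\t200\t110\t2\t1\t200\t5\t204\t1e-30\t150",
    "protA\tsubj2\t30.0\t180\t126\t3\t1\t180\t9\t188\t1e-3\t40",
    "protB\tsubj3\t28.0\t90\t65\t1\t1\t90\t2\t91\t0.5\t25",
]
best = top_hits(example)
no_homolog = {"protA", "protB"} - set(best)  # queries with no hit under the cutoff
```

In this toy input, only one hit for `protA` passes the cutoff and `protB` ends up in `no_homolog`, which is the kind of tally behind a statement such as "442 of 862 predicted proteins lacking known homologs".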

The paucity of giant viruses isolated from marine ecosystems likely results from dependence of classical viral isolation methods on cultured hosts, such as the bacterivorous stramenopile Cafeteria, for recovering CroV (21). Unfortunately, many marine protists remain uncultured (15, 25) and hence, are not available for use as viral bait. This is especially so for predatory protists, in part because the natural consortia that constitute their food base are outcompeted by a few copiotrophic, relatively large bacterial taxa once in enriched medium in the laboratory (25). In some cases, metagenomics has been used to recover genome-level information while obviating cultivation. In particular, giant virus genomes have been assembled from metagenomic data acquired from low-diversity, simplified ecosystems [e.g., wastewater (12) and a hypersaline lake in Antarctica (26)]. However, these approaches are less successful in high-diversity environments, unless the biological entity has high abundance, and they fail to directly link virus to host (13), an important factor for understanding ecological impacts. To overcome these challenges, we integrated multiple culture-independent and laboratory methods to perform this cross-scale study, in which we first sorted individual wild predatory protists and used single-cell metagenomics to examine these eukaryotes and coassociated entities. With a resulting genome from an uncultured giant virus in hand, we asked how its predicted functional attributes differed from the marine giant virus genomes characterized previously, all of which come from cultivation-based isolation and sequencing, and from the plethora of giant viruses from nonmarine habitats. Furthermore, we identified conserved attributes and established the distribution and biochemical function of a viral rhodopsin that thus far seems unique to giant viruses in the marine biosphere.

Results and Discussion

A Viral Chromophore Biosynthesis Pathway. Demonstration of VirR DTS proton-pumping activity on illumination raises questions regarding the natural source of the carotenoids needed to produce the light-harvesting chromophore, retinal (50, 51), especially in a nonphotosynthetic host, like Bicosta. Most algae, including PgV’s host Phaeocystis, biosynthesize the required pigment, β-carotene (and related carotenoids), as well as the retinal-producing carotenoid cleavage oxygenase (Blh) (Fig. 4). However, most heterotrophic eukaryotes, including animals, do not biosynthesize β-carotene, instead acquiring carotenoids through diet. As expected, cultured genome-sequenced choanoflagellates encode only early steps that overlap between sterol and carotenoid biosynthesis and a final cleavage enzyme (Dataset S5). Likewise, BLASTx searches against the Bicosta 4-well partial genome assembly failed to recover carotenoid biosynthesis enzymes. Remarkably, the ChoanoVirus genome analyses exposed both the β-carotene biosynthesis pathway and Blh, with 4 proteins being adjacent to one another, similar to the pathway in bacteria (76) (Fig. 4, SI Appendix, Fig. S12A, and Dataset S5). Eastern North Pacific metatranscriptomes confirmed expression of all components (Fig. 4). Thus, while the algal virus relies on its host to biosynthesize the pigment used in light-energy transfer, ChoanoViruses encode the complete rhodopsin-based photosystem.

Fig. 4. Functional attributes of ChoanoViruses include chromophore biosynthesis. Shown are carotenoid pathway components and final retinal-forming cleavage step in genome data from haptophytes (Phaeocystis antarctica and Chrysochromulina representing P. globosa, which lacks genome data), choanoflagellates (M. brevicollis and S. rosetta), and relevant viruses and in metatranscriptomes. The stars indicate the two ChoanoVirus genomes and a metatranscriptome from the station where ChoanoV1 was recovered. The circle indicates the only cultured virus with a rhodopsin. *These taxa lack Blh but have RPE65 used for retinal production (e.g., in vertebrates and relatives). Detection in Pacific metatranscriptomes based on reads recruited to ChoanoV1 by BLASTx (e-value < 10⁻¹⁰); those that mapped at >95% nucleotide identity are indicated in Dataset S5. OPP, pyrophosphate group; FPP, farnesyl diphosphate; GGPP, geranylgeranyl diphosphate.

The evolutionary origins of the retinal biosynthesis proteins in the ChoanoViruses remain unclear. They seem to derive from archaea (phytoene synthase) or marine bacteria (phytoene desaturase) or are too divergent for robust phylogenetic conclusions (lycopene cyclase, Blh) (SI Appendix, Fig. S12 B–E). In each case, the respective ChoanoV1 and ChoanoV2 proteins clustered together, indicating their common origin. Rhodopsin-bearing bacterial or archaeal lineages with retinal biosynthesis-related genes are each thought to have acquired them together as a unit by HGT (77). However, despite the 4 ChoanoVirus retinal biosynthesis genes being colocated in the genome, long branch lengths and incomplete taxonomic sampling make it unclear whether these proteins were accumulated over time or acquired in a single HGT event, although the latter scenario seems most likely.