Giant DNA viruses are visible under a light microscope, and their genomes encode more proteins than some bacteria or intracellular parasitic eukaryotes. Two very distinct types are known, both of which infect unicellular protists such as Acanthamoeba. On one hand, Megaviridae possess large pseudoicosahedral capsids enclosing a megabase-sized adenine–thymine-rich genome, and on the other, the recently discovered Pandoraviruses exhibit micron-sized amphora-shaped particles and guanine–cytosine-rich genomes of up to 2.8 Mb. While initiating a survey of the Siberian permafrost, we isolated a third type of giant virus combining the Pandoravirus morphology with a gene content more similar to that of icosahedral DNA viruses. This suggests that pandoravirus-like particles may correspond to an unexplored diversity of unconventional DNA virus families.

The largest known DNA viruses infect Acanthamoeba and belong to two markedly different families. The Megaviridae exhibit pseudo-icosahedral virions up to 0.7 μm in diameter and adenine–thymine (AT)-rich genomes of up to 1.25 Mb encoding a thousand proteins. Like their Mimivirus prototype discovered 10 y ago, they entirely replicate within cytoplasmic virion factories. In contrast, the recently discovered Pandoraviruses exhibit larger amphora-shaped virions 1 μm in length and guanine–cytosine-rich genomes up to 2.8 Mb long encoding up to 2,500 proteins. Their replication involves the host nucleus. Whereas the Megaviridae share some general features with the previously described icosahedral large DNA viruses, the Pandoraviruses appear unrelated to them. Here we report the discovery of a third type of giant virus combining an even larger pandoravirus-like particle 1.5 μm in length with a surprisingly smaller 600 kb AT-rich genome, a gene content more similar to Iridoviruses and Marseillevirus, and a fully cytoplasmic replication reminiscent of the Megaviridae. This suggests that pandoravirus-like particles may be associated with a variety of virus families more diverse than previously envisioned. This giant virus, named Pithovirus sibericum, was isolated from a >30,000-y-old radiocarbon-dated sample when we initiated a survey of the virome of Siberian permafrost. The revival of such an ancestral amoeba-infecting virus, used as a safe indicator of the possible presence of pathogenic DNA viruses, suggests that the thawing of permafrost, either from global warming or industrial exploitation of circumpolar regions, might not be exempt from future threats to human or animal health.

Ten years ago, the discovery of Acanthamoeba polyphaga Mimivirus revealed the existence of giant DNA viruses with particles large enough to be visible under a light microscope (1, 2). Further sampling of various environments and geographical locations led to the isolation of Mimivirus variants (3, 4) and more distant relatives, two of which have been fully sequenced: Moumouvirus (5) and Megavirus chilensis (6). All of these viruses share the same distinctive particle structure: a unique external fiber layer enclosing a pseudoicosahedral protein capsid of about 0.5 µm in diameter, itself containing lipid membranes surrounding an electron-dense nucleocapsid. They share an adenine–thymine (AT)-rich (>70%) linear DNA genome with sizes up to 1.26 Mb encoding up to 1,120 proteins (6). They all encode a full transcription apparatus allowing them to replicate in the host’s cytoplasm. These common features suggested that all giant viruses belonged to a single family (Megaviridae) sharing the same particle morphology, genomic features, and replication strategy (7, 8).

This assumption was radically challenged by the discovery of Pandoraviruses (9), which show different and somewhat opposite characteristics. These Acanthamoeba-infecting viruses exhibit much larger amphora-shaped virions 1–1.2 μm in length. Their guanine–cytosine (GC)-rich (>61%) genomes are up to 2.8 Mb long and encode up to 2,500 proteins sharing no resemblance with those of Megaviridae (9). Finally, Pandoravirus particles do not incorporate the transcription machinery that would allow them to entirely replicate in the host’s cytoplasm. Known giant viruses infecting Acanthamoeba were thus thought to belong to two very dissimilar types in terms of particle structure, genome characteristics, and replication strategies. Here we describe a third type of giant virus named “Pithovirus” (from the Greek word pithos designating the kind of large amphora handed over by the gods to the legendary Pandora) propagating in an even larger pandoravirus-like particle, but exhibiting a replication cycle and genomic features reminiscent of those of large icosahedral nucleocytoplasmic DNA viruses. Giant viruses are thus much more diverse than initially assumed.

Results

Particle Morphology. Although the notion that DNA may persist over geological timespans (>1 My) is gradually gaining acceptance, whether cellular organisms might survive that long remains a contentious issue (10, 11). Because of its neutral pH and reducing and anaerobic properties, northeast Siberian permafrost is among the most suitable environments to look for long-term surviving microorganisms (12) or even plants (13). Except for a few studies targeting the Influenza virus and the Variola virus over historical timespans (14, 15), the possibility that DNA viruses might remain infectious over a much longer time scale has not yet been investigated. We recently initiated such a survey using Acanthamoeba as bait and its giant DNA viruses as safe surrogates for pathogenic viruses. In this context, Pithovirus was initially spotted using light microscopy as ovoid particles (Fig. S1) multiplying in a culture of Acanthamoeba castellanii inoculated with a sample of Siberian permafrost from the Kolyma lowland region (SI Materials and Methods). This sample was aseptically collected from a permafrost layer corresponding to late Pleistocene sediments older than 30,000 y (13). Similar paleosoils are known to contain cysts of Acanthamoeba cells (16). After amplification, the particles were analyzed by transmission electron microscopy, confirming that Pithovirus shared the overall morphology of the Pandoraviruses (9) with slightly larger dimensions (∼1.5 µm in length, 500 nm in diameter). The virions exhibit a 60 nm-thick structured envelope made of one layer of parallel stripes (Fig. 1). An internal membrane (Fig. 1A) encases a compartment without discernible substructures except for an electron-dense sphere (50 nm in diameter) seen episodically but in a reproducible fashion (Fig. S1) and a tubular structure parallel to the long axis of the particle (Fig. 1B). 
At variance with the Pandoraviruses, the apex aperture of the Pithovirus particle appears sealed by a protruding cork (80 nm thick and 160 nm wide) with a hexagonal grid structure (Fig. 1 A and C) reminiscent of the organization of capsomers in icosahedral virions. A coil of rolled-up membrane seems to be connected to this apex structure (Fig. 1A).

Fig. 1. Electron microscopy imaging of the Pithovirus replication cycle in A. castellanii. (A) Apex of the Pithovirus particle showing its unique cork made of 15 nm-spaced stripes, rolled membranes underneath, and the internal membrane. (B) Two perpendicular views of the Pithovirus particles (cross- and longitudinal sections). The particles are wrapped into a 60 nm-thick envelope made of 10 nm-spaced parallel stripes. A lipid membrane is enclosing a homogeneous interior where a tubular structure is seen episodically, but in a reproducible fashion (arrowhead). (C) Top view of the cork revealing a hexagonal honeycomb-like array. (D) Bottom view of the particle showing the striated organization of the envelope. (E) An opened Pithovirus particle in the host vacuole. Parts of the expelled cork are visible (black arrows) and the internal membrane of the particle (black arrowhead) appears ready to fuse with the vacuole membrane. (F) Maturing virions at a late stage of infection. Structures made of stripes, pieces of cork, and dense material accumulate (white arrowhead) in the periphery of the virion factory (VF). These structures may contain preassembled particle building blocks (Fig. S1). The cell nucleus (N) is visible. (G) Inset highlighting a late stage of virion maturation with globular striated structures accumulating at the virion periphery. (H) Various stages of particle assembly in the same cell. (I) Incompletely assembled rectangular particle lacking its thick envelope. The striated cork is already visible. (J) At a later stage, the particle adopts its final rounded shape while its envelope thickens. (K) Orthogonal view of an immature virion showing the envelope in the process of wrapping the particle.

Replication Cycle. The Pithovirus replication strategy was documented by following its propagation in axenic Acanthamoeba cultures over an entire multiplication cycle, starting from purified particles. A complete lysis of infected cultures occurred in 10–20 h depending on the initial number of virus particles. As for Pandoraviruses, the replication cycle begins with the phagocytosis of individual particles. First, the Pithovirus particles lose their apical cork allowing the underlying lipid membrane to fuse with the cellular vacuole membrane (Fig. 1E). This creates a channel between the most internal compartment of the virion and the cell cytoplasm (9). In contrast with the Pandoraviruses, the cell nucleus maintains its shape throughout the entire Pithovirus replication cycle (Movie S1). The first visible sign of infection is the formation of an area cleared of cytoplasm subcellular structures, 4–6 h postinfection (Fig. 1F). Numerous vesicles start accumulating in this presumable virion factory (Fig. S1). The process of virion formation is reminiscent of Pandoravirus, the envelope and the interior of the Pithovirus particles being assembled or “knitted” simultaneously (9) (Fig. 1H and Movie S1). First, rectangular-shaped closed particles with their characteristic cork appear at the periphery of the virion factory (Fig. 1I). Later on, their outer wall thickens and the particles take their final ovoid shape, still lacking their thick striated tegument (Fig. 1J). This layer is built subsequently and piecewise, as evidenced in cross section images of immature particles (Fig. 1K) as well as by the fuzzy appearance of the tegument-like envelope in the latest stages of particle maturation (Fig. 1G and Fig. S1). After 6–8 h, particles at various stages of maturation may coexist in the same virion factory. 
Besides dense vesicles of unknown composition accumulating inside the infected cells, shapeless blobs made of pieces of striated envelope, pieces of the corks, and diffuse material reminiscent of the mature particles' interior (Fig. 1F and Fig. S1) are seen at the periphery of the virion factories. They might be reservoirs of partially organized virion building blocks. Mature particles are found in equal amounts in the cytoplasm or in vacuoles, suggesting that they could exit the cell by exocytosis (Movie S1). The replicative cycle ends with the cells releasing hundreds of particles upon lysis. Despite their bacterial-like dimensions, no image of particles undergoing binary fission was encountered throughout our comprehensive electron microscopy study, hinting at the viral nature of Pithoviruses before the analysis of their gene content.

Genome Sequencing. DNA from purified particles was sequenced using the standard Illumina protocol (2× 100 nt paired end reads). The resulting 600 kb of unique sequence data readily allowed the identification of most genes, but could not be assembled in fewer than 80 nonoverlapping scaffolds. The finished genome was obtained by sequencing a Nextera mate pair library (5–8 kb inserts) using the Illumina MiSeq platform (1,066,320 reads, 2× 250 nt) combined with 77,241 PacBio RS long single-end reads. Unexpectedly, given the similar morphology of their particles, the Pithovirus genome was found to be completely different from that of the Pandoraviruses in terms of size, nucleotide composition, topology, and gene content. The Pithovirus genome consists of an AT-rich (64%) dsDNA molecule of a mere 610,033 bp whereas the Pandoraviruses exhibit a GC-rich genome (>61%) of up to 2.8 Mb. As for the Iridoviridae (17) and the Acanthamoeba-infecting Marseilleviridae (18), the Pithovirus genome sequence either corresponded to a terminally redundant circularly permutated linear DNA molecule or to a closed circle (Fig. S2), in contrast with the linear Pandoravirus genomes flanked by terminal repeats (9). Finally, the Pithovirus genome is predicted to encode a mere 467 proteins, far fewer than the 2,500 predicted proteins of Pandoravirus salinus. The Pithovirus particle appears to be out of proportion with its gene content compared with other DNA viruses such as the Phaeocystis globosa virus, which packs a similar number of genes into an icosahedral capsid 150 nm in diameter (150-fold less in volume) (19).
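The order of magnitude of the volume comparison above can be checked with a short calculation. The sketch below is only a rough illustration under stated assumptions: the icosahedral capsid is approximated as a sphere 150 nm in diameter, and the Pithovirus particle (about 1.5 µm long and 500 nm wide) is bounded by two idealized shapes, a prolate ellipsoid and a cylinder. The function names are hypothetical, and the true particle geometry lies somewhere between these simplifications.

```python
import math


def sphere_volume(diameter_nm: float) -> float:
    """Volume (nm^3) of a sphere, used here as a stand-in for an icosahedral capsid."""
    radius = diameter_nm / 2
    return 4 / 3 * math.pi * radius ** 3


def prolate_ellipsoid_volume(length_nm: float, width_nm: float) -> float:
    """Volume (nm^3) of an ellipsoid with one long axis and two equal short axes."""
    return 4 / 3 * math.pi * (length_nm / 2) * (width_nm / 2) ** 2


def cylinder_volume(length_nm: float, width_nm: float) -> float:
    """Volume (nm^3) of a cylinder of the given length and diameter."""
    return math.pi * (width_nm / 2) ** 2 * length_nm


# Dimensions quoted in the text; the shapes themselves are assumptions.
capsid = sphere_volume(150)
ratio_ellipsoid = prolate_ellipsoid_volume(1500, 500) / capsid
ratio_cylinder = cylinder_volume(1500, 500) / capsid

print(f"ellipsoid / capsid volume ratio: {ratio_ellipsoid:.0f}")
print(f"cylinder / capsid volume ratio:  {ratio_cylinder:.0f}")
```

Both idealized shapes give ratios on the order of one hundred to two hundred, which brackets the roughly 150-fold difference quoted in the text.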

Genome Annotation. The genome sequence was analyzed using BLAST (National Center for Biotechnology Information, NCBI) and a combination of motif search and protein-fold recognition methods (SI Materials and Methods). As is customary on the discovery of the first member of a previously unknown virus family, the proportion of Pithovirus-predicted proteins with recognizable homologs in the NCBI database was low (152/467 = 32.5%). For comparison, it was 60.6% for Mimivirus (2), 41% for Marseillevirus (18), and only 15.7% for Pandoravirus (9). The best matches are distributed almost equally among DNA viruses, bacteria, and eukaryotes, suggesting the absence of close relatives among previously sequenced organisms (Fig. 2A). A very similar distribution (χ2 = 0.683, P > 0.95) was computed for the 159 predicted proteins validated by their detection in the particle proteome (Table S1), confirming that the small fraction of database matches was not due to bioinformatic overpredictions (Fig. 2C).

Fig. 2. Distributions of the Pithovirus protein closest homologs. (A) All predicted protein sequences against the NCBI NR (non-redundant) database. (B) Distribution of the 51 best-matching viral proteins. (C) Subset of the 159 proteins detected in the particle proteome. (D) Distribution of the predicted protein functions.

Although only one-third of the best database matches corresponded to viral proteins (representing 11% of the total gene content), the absence of genes coding for translation components, ATP-generating enzymes, or related to cell-division confirmed the viral nature of Pithovirus (20). The low level of sequence similarity of these best matches (44% identical residues on average across the highest BLAST scoring segment pairs) (Tables S1 and S2) as well as their dispersion among different DNA virus families argues against Pithovirus being a member of any one of them. On the basis of this distribution (Fig. 
2B), the Pithovirus appears globally more similar to Marseilleviridae (19 best hits), then Megaviridae (15 best hits), and then Iridoviridae (10 best hits), all of which are well-established families of icosahedral large DNA viruses. Remarkably, there were only five Pithovirus proteins with their closest homologs in Pandoraviruses. None of these proteins are clearly associated with a functional attribute, except for a remote phosphoglycerate mutase homolog [Pithovirus protein #15 (pv15), 27% identity].

An unusually large fraction of the Pithovirus genome (129 kb, 21.2%) corresponds to multiple regularly interspersed copies of a noncoding repeat, the intricate structure of which produces a unique fractal-like dot-plot pattern (Fig. 3 A and B). These repeats exhibit a GC content (23%) much lower than that of the coding regions (41%) and similar to that of intergenic regions (24%). These repeats are composed of gene-free 2 kb-long tandem arrays of a well-conserved 150-bp palindromic motif (Fig. 3C). This motif is not similar to any previously described mobile element and is unrelated to the repeats found in some Iridovirus genomes (21) or Emiliania huxleyi virus (22). The high-repeat content of the Pithovirus genome decreases its coding density to an unusually low 68%. This value is restored to 85.7% (i.e., 1,048 bp per gene), typical of viruses and prokaryotes, when the repeat moiety is not taken into account. The unique Pithovirus genome structure may result from the multiplication of an invasive selfish DNA sequence or from physical constraints reflecting a specific genome organization, mode of replication, packaging, or transcription. The high copy number of these palindromic repeats contrasts with their known instability in cellular genomes where they tend to be rapidly eliminated after several rounds of replication (23, 24).

Fig. 3. Distribution, structure, and expression of Pithovirus genome repeats. (A) Alignment (dot-plot) of the Pithovirus genome nucleotide sequence against itself. Repeated sequences appear as a black patchwork. Both axes show genomic position. The upper part of the figure shows the distribution of genes on the forward strand (red) and reverse strand (blue). (B) Enlarged view on one of the repeat-containing regions. Each cross is characteristic of a palindromic sequence whereas parallel lines indicate tandem repeated sequences. Notice that each palindromic sequence is itself repeated multiple times. (C) Sequence logo showing the sequence conservation of the palindromic repeats. (D) The transcription level assigned to each genome position is defined as its coverage by RNA-seq reads. The repeat regions are the least expressed.

Among the 152 predicted proteins with a database match, only 125 (26.7% of the 467 predicted proteins) are associated with functional attributes. Most of them are poorly informative such as protein–protein interaction motifs (e.g., zinc-finger, ankyrin domain, leucine-rich, or collagen triple helix repeats), or motifs involved in various signaling/regulatory pathways (e.g., kinase, phosphatase, GTP-binding) (Table S1). Typical of large DNA viruses, the dominant functional categories were DNA transcription (17 genes), DNA repair (11 genes, including an ATP-dependent DNA ligase), nucleotide synthesis (7 enzymes, including a ThyX alternative thymidylate synthase), and DNA replication (5 genes). Other categories include carbohydrate processing, RNA processing, and various hydrolases and oxidoreductases (Fig. 2D). At variance with other giant DNA viruses, no component of the protein translation machinery, including tRNA, is encoded in the Pithovirus genome. The presence of a complete virus-encoded transcription machinery (most of which is found in the particle; Table S1) is consistent with the cytoplasmic location of the Pithovirus replication (Fig. 1). 
In contrast with the extremely high-repeat content of the genome, a single intervening sequence, a group I self-splicing intron, was detected and validated in the DNA-dependent RNA polymerase large subunit gene (RPB1, pv366–368). As borderline database matches were further scrutinized, we detected a remote similarity (E value = 0.002, 21% identical residues) between the pv460 gene product and the divergent major capsid protein (MCP) characteristic of Iridoviruses (e.g., in Megalocytivirus; GenBank accession no. AFE85881.1). The FUGUE server (25) predicted that the pv460 protein would adopt the structural “jelly-roll” fold common to large DNA virus MCPs. However, although the pv460 gene is transcribed, its product was not found in the mature virion. Like the Poxvirus D13 protein (26, 27), the pv460 protein might only play a transient role in the particle morphogenesis.
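To illustrate why tandem repeats and palindromic sequences leave different signatures in a self-alignment dot-plot such as the one described for Fig. 3, the following minimal sketch compares every k-letter word of a sequence with every other word, on both strands. It is not the method used to produce the figure: the function names, the word size, and the two toy sequences are hypothetical and were chosen only to keep the example small.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def self_dotplot(seq: str, k: int):
    """Return (direct, inverted) sets of (i, j) dots for a self dot-plot.

    direct:   the word at i equals the word at j (i != j), same strand.
    inverted: the word at i equals the reverse complement of the word at j.
    """
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)

    direct, inverted = set(), set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        for j in positions[word]:
            if i != j:  # skip the trivial main diagonal
                direct.add((i, j))
        for j in positions.get(reverse_complement(word), []):
            inverted.add((i, j))
    return direct, inverted


# Toy tandem repeat: three copies of a hypothetical 5-nt unit.
tandem_direct, tandem_inverted = self_dotplot("AACCG" * 3, k=4)
# Toy palindrome: a hypothetical 5-nt arm followed by its reverse complement.
pal_direct, pal_inverted = self_dotplot("ACTGA" + reverse_complement("ACTGA"), k=4)

# Tandem repeats: dots share a constant offset j - i (lines parallel to the diagonal).
print(sorted({j - i for i, j in tandem_direct}))
# Palindromes: dots share a constant sum i + j (a line perpendicular to the diagonal).
print(sorted({i + j for i, j in pal_inverted}))
```

In a plot, the constant-offset dots of the tandem repeat form lines parallel to the main diagonal, whereas the constant-sum dots of the palindrome form an anti-diagonal that crosses the main diagonal, which is the cross pattern mentioned in the figure legend.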

Phylogenetic Analysis. Despite its sizable Pandoravirus-like particle, Pithovirus exhibits a replication cycle and a gene content (Fig. 2 and Table S2) more similar to those of previously described large icosahedral eukaryotic DNA viruses, such as Marseilleviridae and Iridoviridae. This global picture is consistent with the neighbor-joining clustering pattern of the Pithovirus DNA polymerase (Fig. 4). It is further confirmed by the phylogenetic positioning of three other viral core proteins shared with Pandoraviruses using maximum likelihood-based analyses and a cladogram based on the presence/absence of 205 conserved viral genes (Fig. S3). In all cases, Pithovirus was positioned within well-supported clades including Marseilleviruses, Iridoviruses, or both. The tree topologies confirmed the unexpected absence of a close evolutionary relationship between Pithovirus and the look-alike Pandoraviruses.

Fig. 4. Clustering of viral and eukaryotic DNA polymerases. A multiple alignment of 57 eukaryotic and large virus DNA polymerase sequences (569 ungapped positions) was computed using the default options of the MAFFT server (40). The neighbor-joining tree was built using the JTT substitution model (estimated α = 1.05) and 100 bootstrap resamplings were performed. The tree was rooted at the basis of the eukaryotes and collapsed for bootstrap values <50 before drawing using MEGA5 (41). The Pithovirus DNA polymerase sequence (red) does not cluster with the Pandoraviruses (purple), but falls within a clade clustering the Iridoviruses and Marseilleviruses (orange). Other colors are used to distinguish eukaryotes (turquoise) and viruses from different families: Megaviridae (green), Phycodnaviridae (blue), Herpesviridae (dark gray), Baculoviridae (light gray), Asfar (black), and Poxviridae (gray).
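The bootstrap resampling mentioned in the Fig. 4 legend can be sketched as follows. This is only a minimal illustration of the resampling step, assuming the usual procedure of drawing alignment columns with replacement. It does not reproduce the MAFFT alignment, the JTT distance estimation, or the neighbor-joining tree building, which are left to dedicated software. The function name and the three toy sequences are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: columns drawn with replacement.

    alignment maps sequence names to aligned strings of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    # The same column indices are applied to every sequence so that
    # each sampled column stays intact across taxa.
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[c] for c in sampled) for name, seq in alignment.items()}


# Hypothetical toy alignment (three taxa, six ungapped positions).
toy_alignment = {
    "seqA": "MKVLAT",
    "seqB": "MKILST",
    "seqC": "MRVLAS",
}

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates), "replicates of", len(replicates[0]["seqA"]), "columns each")
```

Each replicate would then be used to build a tree, and the support of a clade is the percentage of replicate trees in which it appears; branches with support below the chosen threshold (50 in the legend) are collapsed.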

Particle Proteome. The proteome analysis of purified Pithovirus virions identified 159 different gene products, two-thirds of them corresponding to unknown functions (Fig. 2C and Table S1). The number of proteins making up the Pithovirus particle is thus similar to that of the Pandoravirus particle (i.e., 210), despite their fivefold difference in gene number (9). This finding refutes the simple idea that large and complex particles should always correlate with large genomes, as observed until now (2, 3, 5, 6, 9). However, despite their overall similar morphology and complexity, the Pithovirus and Pandoravirus particles are made of entirely different sets of proteins, only sharing one pair of homologous proteins (pv384 and ps500) with highly discrepant abundances in their respective proteomes (ranked 8th and 162nd, respectively) (Table S1). In further contrast with the Pandoraviruses, the Pithovirus particles incorporate the complete transcriptional machinery encoded by the viral genome (Table S1). The availability of a preloaded functional transcription apparatus in the particle is the sine qua non for a fully cytoplasmic replication cycle also characterizing the Megaviridae (8) and the Poxviridae (26, 28). The presence of glycosylated proteins in the Pithovirus particles [also found in Marseillevirus (18) and Mimivirus (29)] is another noticeable difference from Pandoravirus, in which none were found (Fig. S4). In the absence of detectable MCP or core protein homologs, the major structural components of the Pithovirus particle remain to be identified. Potential candidates include the ankyrin domain-containing proteins and collagen triple helix repeat-containing proteins encoded in multiple copies in the genome. None of the ankyrin domain-containing proteins are detected in the proteome whereas 7 of the 12 collagen repeat-containing proteins were readily detected and may participate in the structural scaffold, corresponding to the striated tegument patterns (Fig. 1). 
Upon semiquantitative analysis (30) (Table S1), the four most abundant proteins by far are encoded by genes pv449, pv461, pv93, and pv106. Unfortunately, neither their amino acid sequences nor their predicted folds provide clues about their functions or evolutionary origin. Besides most of the enzymes related to nucleotide synthesis or nucleic acid processing, a few others are associated with the particle, such as an adenylosuccinate synthetase (pv90), an ADP ribosyl glycohydrolase (pv118), GTP-binding proteins (pv116, pv127, and pv137) and a GTPase (pv213), a protease (pv133), protein kinases (pv156, pv405, and pv438), hydrolases (pv212 and pv289), oxidoreductases (pv342 and pv467), a dehydratase (pv385), and a glycosyltransferase (pv406). These enzymes, as well as 37 A. castellanii proteins, all detected in low abundance (Table S1), might be simple bystanders or play a role at the earliest stages of infection.