A novel temperate bacteriophage, named SpaA1, was isolated from Staphylococcus pasteuri recovered from soils of the Garwood Valley, Southern Victoria Land, Antarctica. Bacterial cultures were grown from single colonies in liquid nutrient medium in the presence of mitomycin C to induce prophages from lysogenic bacteria. SpaA1 was isolated from the growth medium and examined by transmission electron microscopy (TEM) (Figure 1A). The morphology of SpaA1 is typical of the Siphoviridae family of phages. SpaA1 virions have isometric heads (B1 morphotype) with a diameter of ∼63 nm. The virion tails are ∼210 nm long and appear to be flexible and non-contractile.

General Features of the SpaA1 Genome

The genome of phage SpaA1 consists of 42,784 bp flanked by complementary 9-bp single-stranded cohesive (cos) ends (5′-…TGGAGGAGG-3′ and 3′-CCTCCTCCA…-5′). Using GeneMark.hmm [32], 63 open reading frames (ORFs) were identified as probable protein-coding genes. The predicted proteins encoded by these 63 ORFs were compared to the non-redundant protein sequence database (National Center for Biotechnology Information, NIH, Bethesda) using PSI-BLAST [33] and to the Conserved Domain Database using RPS-BLAST [34]. Analysis of the most similar proteins (best hits) for all predicted gene products of SpaA1 reveals three major regions of apparently different origins, suggesting a modular architecture of the genome (Figure 2; Table 1).

Figure 2. Architectures of SpaA1, BceA1 and MZTP02 genomes: comparison with BLAST protein matches to phage proteins in four Bacillus genomes. The horizontal bars represent DNA sequences (all to scale) with annotated CDS on the forward (upper) or reverse (lower) strand shown as pointed boxes, generally in alternating blue and purple. The red, green and yellow shading indicates the three functional modules of phages SpaA1 and BceA1 (center), which are 100% identical except for the area around ORF47 (bright red), and the 99% nucleotide identical matching region in module I with phage MZTP02 (second row from top). Rather than the original annotation for MZTP02, annotation based on SpaA1/BceA1 genome analysis (Table 1) is shown, with grey colouring for partial sequences (1 and 19) and genes with frame shifts (12, 13, 17, 18). The bottom three bars represent complete contigs from three separate Bacillus genomes, with red/yellow highlighting of the top BLAST matches from SpaA1/BceA1 module I and III proteins, showing synteny visually. The top row of bars represents seven contigs from another Bacillus draft genome, with green highlighting for BLAST protein matches from SpaA1/BceA1 module II proteins. Three of these contigs have been truncated for display. For clarity, additional BLAST matches to other contigs from these bacterial genomes are not shown (e.g. SpaA1/BceA1 ORF37 matches another contig in B. thuringiensis var. monterrey BGSC 4AJ1). This figure was drawn using GenomeDiagram [61] and Biopython [62]. https://doi.org/10.1371/journal.pone.0040683.g002

The nucleotide sequence of the first module (left and coloured red in Figure 2) of the SpaA1 genome is almost identical to the sequence of the entire 15,717 bp genome of another bacteriophage, MZTP02 (apart from its 5′- and 3′-terminal regions of 41 bp and ∼370 bp, respectively), which was isolated from Bacillus thuringiensis strain MZ1 in China [31] (Figure 2). Unlike SpaA1 DNA, which contains terminal cos ends, MZTP02 DNA contains 40-bp terminal inverted repeats, and its 5′-terminus is covalently bound to a terminal protein presumably encoded by ORF9 (according to our annotation; [31]). Interestingly, an almost identical sequence is present as a prophage in the genomes of B. thuringiensis var. monterrey BGSC 4AJ1 (locus IDs: bthur0007_34460 to bthur0007_34660, accession no. NZ_CM000752.1) and B. cereus Rock4-2 (locus IDs: bcere0023_35280 to bcere0023_35430, accession no. NZ_ACMM01000283.1). The 19 potential ORFs located in this region encode predicted structural proteins and proteins involved in assembly of SpaA1 and thus form the “structural” module of the genome. The architecture of this module in SpaA1 shows features that are typical of other bacteriophages of the family Siphoviridae. In particular, there is clear synteny among genes encoding virion subunits and proteins involved in virion assembly [29]. The genes for head and tail assembly are encoded in the same transcriptional orientation, with the head genes located upstream of the tail genes (Figure 2 and Table 1). The predicted head genes include the small and large terminase subunits (ORF3 and ORF4, respectively), the portal protein (ORF5), the minor capsid subunit (ORF6), the scaffold protein (ORF8), gp-like tail connector (ORF1) and head-tail adapter (ORF11); the tail genes include the major tail subunit (ORF12) and the tape measure protein (ORF17), followed by the tail fiber protein (ORF18) and the minor tail protein (ORF19) (Table 1).
The length of the tape measure protein gene corresponds to the length of the phage tail and is thus commonly the largest gene in the genome [29]. In SpaA1, however, the tape measure protein (979 aa) is only the second largest protein, the largest being the minor tail structural protein (1569 aa). Bacillus phage TP21-L also has a minor structural protein that is larger than the tape measure protein [35]. For most of the known phages, the size of the tape measure protein corresponds to a fairly constant 0.15 nm of tail length per amino acid residue [36]. However, the tail length-to-amino acid ratio for SpaA1 is ∼0.20 nm per amino acid residue, suggesting that this protein might be somewhat more extended than those in other known phages.
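
The tail length-to-residue comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, using only the values quoted in the text (a tail of ∼210 nm, a tape measure protein of 979 aa, and the typical 0.15 nm per residue); the function and constant names are illustrative and are not taken from any published pipeline.

```python
# Values quoted in the text; names are illustrative assumptions.
TAIL_LENGTH_NM = 210.0        # approximate SpaA1 tail length from TEM
TAPE_MEASURE_AA = 979         # length of the predicted tape measure protein
TYPICAL_NM_PER_AA = 0.15      # ratio reported for most known phages [36]


def nm_per_residue(tail_length_nm: float, protein_length_aa: int) -> float:
    """Observed tail length per tape measure residue (nm/aa)."""
    return tail_length_nm / protein_length_aa


def expected_tail_length(protein_length_aa: int,
                         ratio: float = TYPICAL_NM_PER_AA) -> float:
    """Tail length (nm) predicted from protein length at a given ratio."""
    return protein_length_aa * ratio


observed_ratio = nm_per_residue(TAIL_LENGTH_NM, TAPE_MEASURE_AA)
predicted_tail = expected_tail_length(TAPE_MEASURE_AA)

print(f"observed ratio: {observed_ratio:.3f} nm/aa")
print(f"tail length expected at 0.15 nm/aa: {predicted_tail:.1f} nm")
```

With these inputs the observed ratio is about 0.21 nm per residue, in line with the approximate value of 0.20 given above, whereas a 979-residue protein at the typical ratio would correspond to a tail of roughly 147 nm, noticeably shorter than the ∼210 nm measured by TEM.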

The second SpaA1 genome module (coloured green in Figure 2) consists of genes with functions in DNA integration, replication, transcription, cell entry and exit (ORF20–ORF46) and may be denoted the ‘replication module’. Its gene arrangement is very similar to the organization of the corresponding regions in several prophages of B. thuringiensis Kurstaki strain (Figure 2, Table 1). The longest conserved gene array (locus_ID: bthur0006_5910 to bthur0006_6000; accession no. NZ_CM000751.1) contains the first 10 ORFs in this region. In particular, the replication module encompasses five predicted transcriptional regulators (ORFs 25, 33–35 and 45) and four putative DNA-binding proteins (ORFs 24, 28, 31, and 46). Other ORFs related to replication in this module include ones encoding an FtsK/SpoIIIE-like protein (ORF27), a protein containing HTH and DnaB domains (ORF29), a protein containing a DnaD domain (ORF41) and a predicted ATPase related to DnaC (ORF42). The module also encodes an antirepressor (ORF37), two proteins involved in cell lysis (ORFs 22 and 23) and two integrases: ORF20, which shows 95% amino acid sequence identity with the integrase of prophage lambdaBa02 (accession number EEM54966.1), and ORF30, which shows 80% amino acid sequence identity with an integrase from B. thuringiensis (accession number EAO53934.1).

The third genomic module (coloured yellow in Figure 2) of SpaA1 is similar to a portion of the B. cereus AH676 prophage and contains additional regulatory and recombination-related genes, including a potential recombination protein U (ORF53) and a potential DNA-binding protein (ORF54). ORFs 55 and 56 are similar to the N-terminal and C-terminal parts of an RNA polymerase sigma 70 factor, respectively. The last nucleotide of the TAA termination codon of ORF55 is also the first nucleotide of the ATG initiation codon of ORF56 within a TAATG sequence. However, the reading frame of ORF56 extends 5′ without an initiation codon to nucleotide 39374 in SpaA1, and a -1 frameshift in the region of nucleotides 39385–39390 during translation of ORF55 could result in a single protein of 206 amino acids, which is similar to an intact RNA polymerase sigma factor from B. cereus (accession number ACM16007.1). Interestingly, approximately 70% of dsDNA long-tailed phages, including siphoviruses, exploit the programmed frameshift mechanism for gene expression, and the majority of frameshift candidates appear to use a -1 frameshift [37]. However, no canonical -1 frameshift signal was detected by KnotInFrame, a tool for the prediction of ribosomal frameshift events [38]. Alternatively, ORF55 and ORF56 might encode two distinct proteins, possibly forming a two-subunit complex. ORF40 of SpaA1 encodes a second RNA polymerase sigma 70 factor that is not closely related to the ORF55/56 sigma factor and is most similar to a homolog from B. thuringiensis (accession number EEM99580.1). The longest region of synteny conservation between SpaA1 and AH676 contains 6 ORFs (locus_ID: bcere0027_53380 to bcere0027_53450; accession no. NZ_CM000738.1).
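
The frame relationship implied by the overlapping TAATG stop/start can be illustrated with a short script. The sketch below, in Python, scans a sequence for TAATG and reports the reading frame of the TAA stop and of the overlapping ATG start. The toy sequence and the function name are hypothetical and are not SpaA1 data; the point is only that the ATG begins two nucleotides after the T of TAA, which places the downstream ORF in the -1 frame relative to the upstream one.

```python
def overlapping_stop_start(seq: str, motif: str = "TAATG"):
    """Return (position, stop_frame, start_frame) for each motif occurrence.

    position    : 0-based index of the T that begins the TAA stop codon
    stop_frame  : frame (0, 1 or 2) of the TAA codon
    start_frame : frame (0, 1 or 2) of the ATG codon sharing its A with TAA
    """
    seq = seq.upper()
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        stop_frame = pos % 3
        start_frame = (pos + 2) % 3   # ATG starts at the last A of TAA
        hits.append((pos, stop_frame, start_frame))
        pos = seq.find(motif, pos + 1)
    return hits


# Hypothetical toy sequence: ATG AAA TAA in frame 0, then ATG CCC TAG in frame 2.
TOY_SEQ = "ATGAAATAATGCCCTAG"
hits = overlapping_stop_start(TOY_SEQ)

for pos, stop_frame, start_frame in hits:
    shift = (start_frame - stop_frame) % 3   # 2 is equivalent to -1 modulo 3
    print(f"TAATG at {pos}: stop frame {stop_frame}, "
          f"start frame {start_frame}, shift {shift} (mod 3)")
```

A shift of 2 modulo 3 is the same as a shift of -1, so a ribosome slipping back by one nucleotide within the upstream ORF would enter the frame of the downstream ORF. This is a frame-bookkeeping illustration only; it does not predict slippery sites or frameshift-stimulating structures.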

Phage terminase genes can be used to construct phylogenetic trees which correlate with the structure of the phage DNA termini [39]. However, we have detected evidence of recombination in the MZTP02 region that encompasses at least the gene for the large terminase subunit of SpaA1. The majority of the ORFs within the ORF1-ORF18 region (the MZTP02 sequence) show best hits to several Bacillus genomes (Figure 3A), and the tree for phage portal protein SPP1, taken as a typical example, clearly demonstrates clustering with sequences from these organisms (Figure 3B). In contrast, the tree for ORF4, the large subunit of phage terminase, shows a very different topology (Figure 3C), suggesting that, notwithstanding the synteny in this region (Figure 2), ORF4 was acquired from a different, unknown source. The topology of the tree for ORF3, the small subunit of phage terminase, was compatible with the typical, SPP1-like topology (Figure 3B and 3D). Thus, the large subunit gene apparently was displaced via ‘in situ’ recombination [40], an observation that further emphasizes the mosaicism in the phage genomes.

Figure 3. Phylogenetic analysis of selected SpaA1 genes. A. Bacterial and phage genomes sorted by the number of ORFs matching the SpaA1/MZTP02 region (based on up to 200 best hits in the NR database). On the left, the actual number of hits is indicated. Color code: three bacterial genomes with 17-15 ORFs matching the SpaA1/MZTP02 region: purple; three bacterial genomes with 13-12 matching ORFs: light blue; the phage with the largest number of hits matching the SpaA1/MZTP02 region: orange. B, C, D. Unrooted maximum likelihood trees for three ORFs of the SpaA1/MZTP02 region. Each terminal tree node is labelled with the GenBank Identifier (GI) number and full systematic name of an organism. The color code is the same as in Figure 3A. The SpaA1 phage sequences are shown in red. Bootstrap support values (percentages) are indicated for selected internal branches. https://doi.org/10.1371/journal.pone.0040683.g003

Neither the second nor the third genomic module of SpaA1 completely matches any known prophage or phage. Even with the most closely related phages, such as Cherry [41], EJ [42], phBC6A51 [43] and the deep-sea thermophilic phage D6E [44], there are only a few significantly similar predicted proteins (Figure 3A and Table 1), indicating that SpaA1 represents a novel group of tailed phages.

The overall G + C content of the phage is 35.63%, closely resembling that of its host S. pasteuri (35%, [45]) as well as that of the host for MZTP02 (B. thuringiensis, 35.3%, [46]). No significant differences in the GC content were detected among the three genomic modules of SpaA1.
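
A G + C percentage of this kind, overall or per module, can be computed directly from the nucleotide sequence. The following is a minimal sketch in plain Python. The function names are illustrative, and any module coordinates passed to `gc_by_region` would be assumptions supplied by the user, since the exact module boundaries are not stated in the text.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    total = sum(1 for base in seq if base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def gc_by_region(seq: str, regions: dict) -> dict:
    """GC percentage for named regions given as 1-based inclusive (start, end)."""
    return {name: gc_content(seq[start - 1:end])
            for name, (start, end) in regions.items()}


# Small demonstration on the 9-nt cos sequence quoted in the text.
cos_gc = gc_content("TGGAGGAGG")
print(f"GC content of TGGAGGAGG: {cos_gc:.2f}%")
```

For a full genome, `gc_by_region` could be called with a dictionary such as `{"module I": (start, end), ...}` once module coordinates are taken from the annotation (Table 1); comparing the returned values is one simple way to check whether modules differ in base composition.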