Open reading frames (ORFs) for the glycoprotein precursor GPC, the nucleoprotein NP, the matrix protein analog Z, and the polymerase L, and their orientation, are indicated (A); blue bars represent sequences obtained by pyrosequencing from clinical samples. Secondary structure predictions of intergenic regions (IR) for S (B, C) and L segment sequence (D, E) in genomic (B, D) and antigenomic orientation (C, E) were analyzed by mfold; shading indicates the termination codon (opal, position 1) and its reverse complement, respectively.

RNA extracts from two post-mortem liver biopsies (cases 2 and 3) and one serum sample (case 2) were independently submitted for unbiased high-throughput pyrosequencing. The libraries yielded between 87,500 and 106,500 sequence reads. Alignment of unique singleton and assembled contiguous sequences to the GenBank database ( http://www.ncbi.nlm.nih.gov/Genbank ) using the Basic Local Alignment Search Tool (blastn and blastx; [29] ) indicated coverage of approximately 5.6 kilobases (kb) of sequence distributed along arenavirus genome scaffolds: 2 kb of S segment sequence in two fragments, and 3.6 kb of L segment sequence in 7 fragments ( Figure 2 ). The majority of arenavirus sequences were obtained from serum rather than tissue, potentially reflecting lower levels of competing cellular RNA in random amplification reactions.

Our data represent genome sequences obtained directly from liver biopsy and serum (case 2), from cell culture isolates obtained from blood at CDC (cases 1 and 2), and from liver biopsies at NICD (cases 2 and 3). No sequence differences were uncovered between virus detected in primary clinical material and virus isolated in cell culture at the two facilities. In addition, no changes were detected among the viruses derived from these first three cases. This lack of sequence variation is consistent with the epidemiologic data, which indicate an initial natural exposure of the index case, followed by a chain of nosocomial transmission among subsequent cases.

Sequence gaps between the aligned fragments were rapidly filled by specific PCR amplification, at both CU and CDC, with primers designed on the pyrosequence data. Terminal sequences were added by PCR using a universal arenavirus primer targeting the conserved viral termini (5′-CGC ACM GDG GAT CCT AGG C, modified from [30] ) combined with 4 specific primers positioned near the ends of the 2 genome segments. Overlapping primer sets based on the draft genome were synthesized to facilitate sequence validation by conventional dideoxy sequencing. The accumulated data revealed a classical arenavirus genome structure: a bi-segmented genome in which each segment encodes, in an ambisense strategy, two open reading frames (ORFs) separated by an intergenic stem-loop region ( Figure 2 ) (GenBank Accession numbers FJ952384 and FJ952385).
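The universal terminal primer quoted above contains two degenerate positions in IUPAC notation (M = A or C; D = A, G, or T), so it stands for a small pool of distinct oligonucleotides. As a minimal sketch, assuming only the standard IUPAC ambiguity codes, the pool can be enumerated as follows; the function name `expand_primer` is introduced here for illustration and is not part of the original methods.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer.

    Illustrative helper (hypothetical name); spaces in the input are ignored.
    """
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in bases]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequence as quoted in the text (5' to 3').
variants = expand_primer("CGC ACM GDG GAT CCT AGG C")
print(f"degeneracy: {len(variants)}")
for variant in variants:
    print(variant)
```

The degeneracy is the product of the number of bases allowed at each ambiguous position, here 2 × 3 = 6.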

Phylogenetic relationships of LUJV were inferred based on full L (A) and S segment nucleotide sequence (B), as well as on deduced amino acid sequences of L (C), NP (D), Signal/G2 (E) and G1 (F) ORFs. Phylogenies were reconstructed by neighbor-joining analysis applying a Jukes-Cantor model; the scale bar indicates substitutions per site; robust bootstrap support for the positioning of LUJV was obtained in all cases (>98% of 1000 pseudoreplicates). GenBank Accession numbers for reference sequences are: ALLV CLHP2472 (AY216502, AY012687); AMAV BeAn70563 (AF512834); BCNV AVA0070039 (AY924390, AY922491), A0060209 (AY216503); CATV AVA0400135 (DQ865244), AVA0400212 (DQ865245); CHPV 810419 (EU260464, EU260463); CPXV BeAn119303 (AY216519, AF512832); DANV 0710-2678 (EU136039, EU136038); FLEV BeAn293022 (EU627611, AF512831); GTOV INH-95551 (AY358024, AF485258), CVH-960101 (AY497548); IPPYV DakAnB188d (DQ328878, DQ328877); JUNV MC2 (AY216507, D10072), XJ13 (AY358022, AY358023), CbalV4454 (DQ272266); LASV LP (AF181853), 803213 (AF181854), Weller (AY628206), AV (AY179171, AF246121), Z148 (AY628204, AY628205), Josiah (U73034, J043204), NL (AY179172, AY179173); LATV MARU10924 (EU627612, AF485259); LCMV Armstrong (AY847351), ARM53b (M20869), WE (AF004519, M22138), Marseille12 (DQ286932, DQ286931), M1 (AB261991); MACV Carvallo (AY619642, AY619643), Chicava (AY624354, AY624355), Mallele (AY619644, AY619645), MARU222688 (AY922407), 9530537 (AY571959); MOBV ACAR3080MRC5P2 (DQ328876, AY342390); MOPV AN20410 (AY772169, AY772170), Mozambique (DQ328875, DQ328874); NAAV AVD1240007 (EU123329); OLVV 3229-1 (AY216514, U34248); PARV 12056 (EU627613, AF485261); PICV (K02734), MunchiqueCoAn4763 (EF529745, EF529744), AN3739 (AF427517); PIRV VAV-488 (AY216505, AF277659); SABV SPH114202 (AY358026, U41071); SKTV AVD1000090 (EU123328); TAMV W10777 (EU627614, AF512828); TCRV (J04340, M20304); WWAV AV9310135 (AY924395, AF228063).

Phylogenetic trees constructed from full L or S segment nucleotide sequence show LUJV branching off the root of the OW arenaviruses, and suggest it represents a highly novel genetic lineage, very distinct from previously characterized virus species and clearly separate from the LCMV lineage ( Figure 3A and 3B ). No evidence of genome segment reassortment is found, given the identical placement of LUJV relative to the other OW arenaviruses based on S and L segment nucleotide sequences. In addition, phylogenetic analysis of each of the individual ORFs reveals similar phylogenetic tree topologies. A phylogenetic tree constructed from deduced L-polymerase amino acid (aa) sequence also shows LUJV near the root of the OW arenaviruses, distinct from characterized species, and separate from the LCMV branch ( Figure 3C ). A distant relationship to OW arenaviruses may also be inferred from the analysis of Z protein sequence ( Figure S1 ). The NP gene sequence of LUJV differs from those of other arenaviruses by 36% (IPPYV) to 43% (TAMV) at the nucleotide level, and by 41% (MOBV/LASV) to 55% (TAMV) at the aa level ( Table S1 ). This degree of divergence is considerably higher than both proposed cut-off values, within (<10–12%) or between (>21.5%) OW arenavirus species [31] , [32] , and indicates a unique phylogenetic position for LUJV ( Figure 3D ). Historically, phylogenetic assignments of arenaviruses have been based on portions of the NP gene [1] , [33] , because this is the region for which most sequences are known. However, as more genomic sequences have become available, analyses of full-length GPC sequence have revealed evidence of possible relationships between OW and NW arenaviruses not revealed by NP sequence alone [34] . Because G1 sequences are difficult to align, some have pursued phylogenetic analyses by combining the GPC signal peptide and the G2 sequence [16] . We included in our analysis the chimeric signal/G2 sequence ( Figure 3E ) as well as the receptor binding G1 portion ( Figure 3F ); both analyses highlighted the novelty of LUJV, showing a nearly equal distance from OW and from NW viruses.
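The trees in Figure 3 apply a Jukes-Cantor model, which converts the observed proportion of differing sites p between two aligned nucleotide sequences into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). The following is a minimal sketch of that correction for nucleotide data only; the function names are illustrative, and treating the NP nucleotide differences quoted above (36% and 43%) as uncorrected p-distances is an assumption made purely for the example.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for nucleotides: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Assumption: the quoted NP nucleotide differences are uncorrected p-distances.
for label, p in [("IPPYV", 0.36), ("TAMV", 0.43)]:
    print(f"{label}: p = {p:.2f}, JC distance = {jukes_cantor(p):.3f}")
```

The corrected distance always exceeds p, and increasingly so as p grows, because multiple substitutions at the same site become more likely at high divergence.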

Protein motifs potentially relevant to LUJV biology

Canonical polymerase domains pre-A, A, B, C, D, and E [35]–[37] are well conserved in the L ORF of LUJV (256 kDa, pI = 6.4; Figure 4). The Z ORF (10.5 kDa, pI = 9.3) contains two late domain motifs, like LASV; however, in place of the PTAP motif found in LASV, which mediates recognition of the tumor susceptibility gene 101 protein, Tsg101 [38], involved in vacuolar protein sorting [39],[40], LUJV has a unique Y 77 REL motif that matches the YXXL motif of the retrovirus equine infectious anemia virus [41], which interacts with the clathrin adaptor protein 2 (AP2) complex [42]. A Tsg101-interacting motif, P 90 SAP, is found in LUJV at the position of the second late domain of LASV, PPPY, which acts as a Nedd4-like ubiquitin ligase recognition motif [43]. The RING motif, containing conserved residue W 44 [44], and the conserved myristoylation site G 2 are present [45]–[47] (Figure 4). The NP of LUJV (63.1 kDa, pI = 9.0) contains described aa motifs that mostly resemble those of OW arenaviruses [48], including a cytotoxic T-lymphocyte (CTL) epitope reported in LCMV (GVYMGNL; [49]), corresponding to G 122 VYRGNL in LUJV, and a potential antigenic site reported in the N-terminal portion of LASV NP (RKSKRND; [50]), corresponding to R 55 KDKRND in LUJV (Figure 4).
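Late domain motifs of the kind discussed above (PTAP/PSAP, PPPY, YXXL) are short linear patterns, so candidate sites in a protein sequence can be located with simple pattern matching. The sketch below illustrates the idea; it is not the procedure used in this study, and the peptide is a made-up toy sequence, not the LUJV Z protein. Such matches are only candidates, since a short pattern can occur by chance.

```python
import re

# Motif patterns named in the text; "." stands for any residue (X).
LATE_DOMAIN_PATTERNS = {
    "P[TS]AP": re.compile(r"P[TS]AP"),  # Tsg101-interacting motif (PTAP or PSAP)
    "PPPY": re.compile(r"PPPY"),        # Nedd4-like ubiquitin ligase recognition motif
    "YxxL": re.compile(r"Y..L"),        # AP2-interacting motif
}


def find_late_domains(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start position, matched residues), sorted by position."""
    hits = []
    for name, pattern in LATE_DOMAIN_PATTERNS.items():
        for match in pattern.finditer(protein.upper()):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy peptide for illustration only.
toy_peptide = "MGAAYRELKKPSAPQ"
for name, position, residues in find_late_domains(toy_peptide):
    print(f"{name} at position {position}: {residues}")
```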

Figure 4. Schematic of conserved protein motifs. Conservation of LUJV amino acid motifs with respect to all other (green highlight), to OW (yellow highlight), or to NW (blue highlight) arenaviruses is indicated; grey highlight indicates features unique to LUJV. Polymerase motifs pre-A (L 1142 ), A (N 1209 ), B (M 1313 ), C (L 1345 ), D (Q 1386 ), and E (C 1398 ) are indicated for the L ORF; potential myristoylation site G 2 , the RING motif H 34 /C 76 , and potential late domains YXXL and PSAP are indicated for the Z ORF; and myristoylation site G 2 , posttranslational processing sites for signalase (S 59 /S 60 ) and S1P cleavage (RKLM 221 ), CTL epitope (I 32 ), zinc finger motif P 415 /G 440 , as well as conserved cysteine residues and glycosylation sites (Y) are indicated for GPC. * late domain absent in NW viruses and DANV; † PSAP or PTAP in NW viruses, except in PIRV and TCRV (OW viruses: PPPY); # G in all viruses except LCMV ( = A); ‡ D in NW clade A only; § conserved with respect to OW, and NW clade A and C; HD, hydrophobic domain; TM, transmembrane anchor. https://doi.org/10.1371/journal.ppat.1000455.g004

The GPC precursor (52.3 kDa, pI = 9.0) is cotranslationally cleaved into the long, stable signal peptide and the mature glycoproteins G1 and G2 [51]–[54]. Based on analogy to LASV [55] and LCMV [56], signalase would be predicted to cleave between D 58 and S 59 in LUJV. However, aspartate and arginine residues in the −1 and −3 positions, respectively, violate the (−3,−1)-rule [57]; thus, cleavage may occur between S 59 and S 60 as predicted by the SignalP algorithm. The putative 59 aa signal peptide of LUJV displays a conserved G 2 , implicated in myristoylation in JUNV [58]; however, it is followed in LUJV by a non-standard valine residue in position +4, resembling non-standard glycine residues found in Oliveros virus (OLVV [59]) and Latino virus (LATV; http://www2.ncid.cdc.gov/arbocat/catalog-listing.asp?VirusID=263&SI=1). Conservation is also observed for aa residues P 12 (except Amapari virus; AMAV [60]), E 17 [61] (except Pirital virus; PIRV [62]), and N 20 in hydrophobic domain 1, as well as I 32 KGVFNLYK 40 SG, identified as a CTL epitope in LCMV WE (I 32 KAVYNFATCG; [63]) (Figure 4).

Analogous to other arenaviruses, SKI-1/S1P cleavage C-terminal of RKLM 221 is predicted to separate mature G1 (162 aa, 18.9 kDa, pI = 6.4) from G2 (233 aa, 26.8 kDa, pI = 9.5) [52],[53],[64]. G2 appears overall well conserved, including the strictly conserved cysteine residues: 6 in the luminal domain, and 3 in the cytoplasmic tail that are included in a conserved zinc finger motif reported in JUNV [65] (Figure 4). G2 contains 6 potential glycosylation sites, including 2 strictly conserved sites, 2 semi-conserved sites N 335 (absent in LCMVs and Dandenong virus; DANV [19]) and N 352 (absent in LATV), and 2 unique sites in the predicted cytoplasmic tail (Figure 4). G1 is poorly conserved among arenaviruses [16], and G1 of LUJV is no exception, being highly divergent from, and shorter than, the G1 of other arenaviruses. LUJV G1 contains 6 potential glycosylation sites in positions comparable to other arenaviruses, including a conserved site N 93 HS (Figure 4), which is shifted by one aa in a motif that otherwise aligns well with OW arenaviruses and NW arenavirus clade A and C viruses. There is no discernible homology to other arenavirus G1 sequences that would point to usage of one of the two identified arenavirus receptors: alpha-dystroglycan (α-DG) [66], which binds OW arenaviruses LASV and LCMV and NW clade C viruses OLVV and LATV [67], or transferrin receptor 1 (TfR1), which binds pathogenic NW arenaviruses JUNV, MACV, GTOV, and SABV [68] (Figure S2).
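Potential N-linked glycosylation sites such as N 93 HS are conventionally identified by the sequon N-X-S/T, where X is any residue except proline. The text does not state how the sites above were counted, so the sketch below assumes this standard sequon definition; it also checks that the reported lengths are mutually consistent with a 59 aa signal peptide and S1P cleavage after residue 221. The peptide is a made-up toy sequence, not LUJV GPC, and the function name is illustrative.

```python
import re

# Zero-width lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of N, three-residue sequon) for each N-X-S/T site, X != P."""
    seq = protein.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


# Hypothetical toy peptide: NHS and NGT qualify, NPT does not (proline at X).
toy_peptide = "AANHSKNPTGNGT"
print(find_sequons(toy_peptide))

# Consistency check on lengths reported in the text (aa).
signal_peptide_len, g1_len, g2_len = 59, 162, 233
g1_last_residue = signal_peptide_len + g1_len       # last residue of G1
gpc_len = signal_peptide_len + g1_len + g2_len      # implied total precursor length
print(g1_last_residue, gpc_len)
```

With these figures G1 spans residues 60 to 221, which agrees with cleavage C-terminal of RKLM 221; the total of 454 aa is derived from the reported parts rather than stated in the text.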

In summary, our analysis of the LUJV genome shows a novel virus that is only distantly related to known arenaviruses. Sequence divergence is evident across the whole genome, but is most pronounced in the G1 protein encoded by the S segment, a region implicated in receptor interactions. Reassortment of S and L segments leading to changes in pathogenicity has been described in cultured cells infected with different LCMV strains [69], and between pathogenic LASV and nonpathogenic MOPV [70]. We find no evidence to support reassortment of the LUJV L or S genome segment (Figure 3A and 3B). Recombination of glycoprotein sequence has been recognized in NW arenaviruses [14], [16], [33], [34], [71]–[73], resulting in the division of the complex into four sublineages: lineages A, B, C, and an A/recombinant lineage that forms a branch of lineage A when NP and L sequence is considered (see Figure 3C and 3D), but forms an independent branch in between lineages B and C when glycoprotein sequence is considered (see Figure 3D). While recombination cannot be excluded in the case of LUJV, our review of existing databases reveals no candidate donor for the divergent GPC sequence. To our knowledge, LUJV is the first hemorrhagic fever-associated arenavirus from Africa identified in the past 3 decades. It is also the first such virus originating south of the equator (Figure 1). The International Committee on the Taxonomy of Viruses (ICTV) defines species within the Arenavirus genus based on association with a specific host, geographic distribution, potential to cause human disease, antigenic cross reactivity, and protein sequence similarity to other species. By these criteria, given the novelty of its presence in southern Africa, its capacity to cause hemorrhagic fever, and its genetic distinction, LUJV appears to be a new species.