The 2013–2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission and offered important information for outbreak response. Here, we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic. We confirm sustained human-to-human transmission within Sierra Leone and find no evidence for import or export of EBOV across national borders after its initial introduction. Using high-depth replicate sequencing, we observe both host-to-host transmission and recurrent emergence of intrahost genetic variants. We trace the increasing impact of purifying selection in suppressing the accumulation of nonsynonymous mutations over time. Finally, we note changes in the mucin-like domain of EBOV glycoprotein that merit further investigation. These findings clarify the movement of EBOV within the region and describe viral evolution during prolonged human-to-human transmission.

Here, we provide an analysis of 232 new, coding-complete EBOV Makona genomes from Sierra Leone. We compared these genomes to 86 previously available genomes: 78 unique genomes from Sierra Leone (), 3 genomes from Guinea (), and 5 from healthcare workers infected in Sierra Leone and treated in Europe. We use this combined data set obtained from 318 EVD patients during the height of the epidemic in Sierra Leone and Guinea to better understand EBOV transmission within Sierra Leone and between countries. In addition, we use it to understand viral population dynamics within individual hosts, the impact of natural selection, and the characteristics of the now hundreds of new mutations that have emerged over the longer course of the epidemic.

While the insights gleaned from sequencing early in the outbreak informed public health efforts (), the continued human-to-human spread of the virus raises questions about ongoing evolution and transmission of EBOV. Our laboratory teams in Sierra Leone, at Kenema (Kenema Government Hospital [KGH]) and at Bo (US Centers for Disease Control and Prevention [CDC]), continued to perform active diagnosis and surveillance in Sierra Leone following our initial study (). After a 6-month delay of sample shipment due to regulatory uncertainty about inactivation protocols, we again began to determine EBOV genome sequences. We have sequenced samples at high depth and with technical replicates to characterize genetic diversity of EBOV both within (intrahost) and between (interhost) individuals. To support global outbreak termination efforts, we publicly released these genomes prior to publication as they were generated, starting with a first set of 45 sequences in December 2014 and continuing with regular releases of hundreds of sequences through May 2015.

Published EBOV Makona genomes from clinical samples obtained early in the outbreak in Guinea (three patients) and Sierra Leone (78 patients) () demonstrated that near-real-time sequencing could provide valuable information to researchers involved in the global outbreak response. Analysis of these genomes revealed that the outbreak likely originated from a single introduction into the human population in Guinea at the end of 2013 and was then sustained exclusively by human-to-human transmissions. Genomic sequencing further allowed the identification of numerous mutations emerging in the EBOV Makona genome over time. As a consequence, the evolutionary rate of the Makona variant over the time span of the early phase of the outbreak could be estimated and predictions made about the potential of this new EBOV variant to escape current candidate vaccines, therapeutics, and diagnostics ().

The 2013–2015 Western African Ebola virus disease (EVD) epidemic, caused by the Ebola virus (EBOV) Makona variant (), is the largest EVD outbreak to date, with 26,648 cases and 11,017 deaths documented as of May 8, 2015 (). The outbreak, first declared in March 2014 in Guinea and traced back to the end of 2013 (), has also devastated the neighboring countries of Sierra Leone and Liberia, with additional cases scattered across the globe. Never before has an EBOV variant been transmitted among humans for such a sustained period of time.


Similar patterns of excess T-to-C mutations within short regions have also been observed by others. In our data set of 318 genomes, five possessed obvious stretches of T-to-C mutations within short regions. We also tested more broadly whether excessive T-to-C mutations occurred in all sequences and found a significant enrichment of T-to-C transitions relative to all other types of transitions (Figure 4D). To determine whether viral sequence divergence is related to T-to-C transition enrichment, we compared relative T-to-C transition rates in sequences with stretches of T-to-C mutations (n = 5) to the top 5% of remaining sequences by sequence divergence (n = 15) and to the bottom 95% of sequences (n = 298) (Figure 4E). While the sequences with T-to-C stretches showed the strongest T-to-C enrichment, we also found moderate enrichment of T-to-C transitions in the 5% most divergent of the remaining sequences.

Visual inspection identified a subset of sequences that are more likely to contain B cell escape variants ( Figure 4 C). In particular, three sequences (e.g., G4955.1) had a threonine-to-alanine mutation at GP amino acid position 485, a conserved threonine that is required for in vivo protection by the 14G7 antibody (). Additionally, two sequences had short stretches of T-to-C mutations in GP (four or more T-to-C mutations within a 200 nucleotide region; Figure 4 C), both of which occur within B cell epitopes.
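The stretch criterion quoted above (four or more T-to-C mutations within a 200 nucleotide region) can be expressed as a simple window scan. The following is a minimal Python sketch, not the procedure used in this study; the function name, its parameters, and the example coordinates are hypothetical.

```python
def has_t_to_c_stretch(positions, min_count=4, window=200):
    """Return True if at least `min_count` mutation positions fit inside
    some region of `window` nucleotides.

    `positions` are 1-based genome coordinates of T-to-C mutations in one
    sequence (hypothetical input format, assumed for illustration).
    """
    pos = sorted(positions)
    for i in range(len(pos) - min_count + 1):
        first = pos[i]
        last = pos[i + min_count - 1]
        # The two end points are both inside a `window`-nt region
        # exactly when last - first + 1 <= window.
        if last - first < window:
            return True
    return False


# Hypothetical coordinates, for illustration only.
clustered = [6300, 6342, 6410, 6499]   # four sites spanning exactly 200 nt
scattered = [6300, 6342, 6410, 6500]   # four sites spanning 201 nt
print(has_t_to_c_stretch(clustered), has_t_to_c_stretch(scattered))
```

Sorting first means only runs of `min_count` consecutive positions need to be checked, so the scan is linear after the sort.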

To test the hypothesis that antibodies drive diversifying selection of GP, we looked for enrichment of mutations within B cell epitopes within that protein. Effective humoral immunity depends on antibody binding to specific B cell epitopes (). Using experimentally determined B cell epitopes obtained from the Virus Pathogen Database and Analysis Resource (ViPR;), we found that nonsynonymous mutations in GP do indeed occur more frequently in epitopes than expected by chance ( Figure 4 B). This correlation supports the hypothesis that humoral immunity exerts selective pressure on the virus, driving immune evasion via accumulation of nonsynonymous mutations within GP B cell epitopes.
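The enrichment test described here compares the number of variants falling inside epitopes with the number expected from the fraction of GP residues covered by epitopes. As a hedged illustration, the sketch below implements one common definition of the two-sided exact binomial p value (summing the probabilities of all outcomes no more likely than the observed one). Whether the original analysis used exactly this definition is an assumption, and the counts in the usage example are hypothetical, not data from this study.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p):
    """Two-sided exact binomial p value.

    Sums the probability of every outcome that is no more likely than the
    observed count k under the null success probability p.
    """
    observed = binom_pmf(k, n, p)
    # A small relative tolerance guards against floating-point ties.
    threshold = observed * (1 + 1e-7)
    total = sum(
        prob
        for prob in (binom_pmf(i, n, p) for i in range(n + 1))
        if prob <= threshold
    )
    return min(1.0, total)


# Hypothetical example: 9 of 12 variants fall in epitopes that cover
# 30% of the protein.
print(binom_test_two_sided(9, 12, 0.30))
```

Here `p` plays the role of the dotted line in the corresponding figure panel, the fraction of GP amino acids located in epitopes.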

Although we observe less constraint on nonsynonymous changes during the 2013–2015 epidemic than between outbreaks, one anomaly is the genomic sequence encoding the mucin-like domain of the EBOV glycoprotein (GP), for which we observe more nonsynonymous substitutions than expected under neutrality, both within and between EVD outbreaks. Selective pressure acting on a region can be estimated with the standard statistic dN/dS, which has an expected value of 1.0 for neutral evolution and less than 1 for purifying selection; in the mucin-like domain, the mean posterior dN/dS within this outbreak is 4.74, and between outbreaks is 1.44 (Figure 4A). GP is the only surface-exposed viral protein on EBOV virions, and as such, it is the primary target of antibodies (). This finding therefore raises the possibility that antibodies might be driving diversifying selection and rapid evolution in this region. This observation is based on a very small number of substitutions (eight nonsynonymous and four synonymous within the outbreak), however, and is not statistically significant (posterior probability that dN/dS is elevated within-outbreak = 92.9%); the situation should be clarified as more sequencing becomes available. If diversifying selection is occurring here, then the observed changes are very unlikely to represent population-level selection for transmission among humans; this would only occur if previously infected individuals were frequently being exposed to new infections. Instead, we hypothesize that these changes represent within-host selection for EBOV to escape a developing humoral immune response.

The relationship between the effectiveness of purifying selection and its duration is also apparent in the overall pattern of nonsynonymous mutations in our data set. Selection filters the accumulation of coding variants in the EBOV genome ( Figures 3 C and 4 A ). Nonsynonymous mutations, which are more likely to be deleterious, make up a decreasing fraction of coding mutations as we analyze longer timescales: intrahost variants > individual patients (external branches) > multiple patients (internal branches) > between outbreaks. The fraction seen between outbreaks represents the effect of long periods of evolution in the unknown EBOV reservoir. As selection acts to remove deleterious alleles over time, fewer nonsynonymous mutations can be detected. This pattern holds true across the EBOV Makona genome ( Figure 4 A).

(E and F) Elevated T-to-C rates are genome wide but are limited to a subset of sequences. Accumulation of mutation increases linearly with time. However, some individual samples show more genetic distance than expected based on sample date. Samples with short stretches of T-to-C mutations (orange) show a significant enrichment of T-to-C mutations, as expected. Excluding these samples, the top 5% of samples by genetic distance (yellow) lack localized stretches but still show moderate enrichment of T-to-C mutations genome wide. The bottom 95% of samples (beige) show no enrichment of T-to-C mutations. Error bars represent binomial sampling intervals.

(D) Genome-wide increase in T-to-C mutations. We observe more T-to-C transitions within the 2013–2015 outbreak than any other transition, after correcting for nucleotide content. Error bars represent binomial sampling intervals.

(C) Local enrichment of T-to-C mutations within GP B cell epitopes. We observed five sequences with short stretches (<200 nucleotides) of concentrated T-to-C mutations. Of these five sequences, two (shown here, samples 20141582 and G5119.1) contain stretches of T-to-C SNPs (blue points) within GP epitopes (light blue bars). Additionally, we observe a T-to-C mutation at amino acid position 485 (blue diamond) in three samples (one shown here, G4955.1), which is otherwise completely conserved among members of all ebolavirus species ().

(B) Nonsynonymous variants are enriched in B cell epitopes of GP. We calculated the fractions of nonsynonymous (NS) and synonymous (S) consensus SNPs and intrahost variants (iSNVs) within experimentally determined B cell epitopes (data from ViPR;). Dotted line represents the fraction of GP amino acids in ViPR epitopes. Nonsynonymous SNPs (p = 0.004) and iSNVs (p = 0.037) in GP occur more frequently in epitopes than expected by chance (two-sided exact binomial test). Numbers indicate fraction of each variant type within GP epitope regions. Error bars represent binomial sampling intervals.

(A) Nonsynonymous variants are enriched in the mucin-like domain of GP. Estimates of log(ω) (i.e., log(dN/dS)) per coding sequence within the Western African EVD outbreak (left) and between EVD outbreaks (right) demonstrate gene-specific patterns of natural selection.

How purifying selection acts at different timescales can also be seen in the distribution of mutations in the EBOV Makona genealogy. Deleterious mutations are more likely to result in transmission-impaired viruses and dead-end infections and may therefore only be present in individual patients. Mutations unique to individual patients are those that occur on the external branches of the phylogenetic tree, whereas internal branch mutations are those present in multiple samples in our data set. Thus, in the model of incomplete purifying selection, we expect external branches to be characterized by a higher rate of nonsynonymous substitution than internal branches; in the latter, selection has had more opportunity to filter out deleterious mutants. Internal branches, by definition, have produced multiple descendant lineages and are thus less likely to include mutations with fitness costs. To test this hypothesis, we estimated the numbers of nonsynonymous and synonymous changes on the virus genealogy and recovered their accumulation rates (Figure 3B). Nonsynonymous mutations indeed occurred at lower frequency on internal than on external branches, suggesting that most are removed by purifying selection because of their fitness costs and hence represent evolutionary dead ends. Synonymous mutations, which likely have less impact on fitness, occurred at more comparable frequencies on internal and external branches.
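The contrast between external and internal branches can be illustrated with a deliberately simplified tally. The Python sketch below classifies each coding variant as "unique" (fixed in exactly one sample, a proxy for an external branch) or "shared" (fixed in two or more samples, a proxy for an internal branch) and reports the nonsynonymous fraction in each class. This swaps a sample-count proxy for the genealogy-based estimate described above, so it ignores tree structure and homoplasy; the function name, input format, and toy data are hypothetical.

```python
from collections import Counter


def nonsynonymous_fraction_by_class(variants):
    """Tally coding variants by sharing class.

    `variants` is an iterable of (effect, n_samples) tuples, where effect
    is "N" (nonsynonymous) or "S" (synonymous) and n_samples is the number
    of samples in which the variant is fixed (hypothetical input format).
    Returns {"unique": fraction, "shared": fraction}; a class with no
    variants maps to None.
    """
    counts = {"unique": Counter(), "shared": Counter()}
    for effect, n_samples in variants:
        cls = "unique" if n_samples == 1 else "shared"
        counts[cls][effect] += 1

    fractions = {}
    for cls, tally in counts.items():
        total = tally["N"] + tally["S"]
        fractions[cls] = tally["N"] / total if total else None
    return fractions


# Toy data, for illustration only.
toy = [("N", 1), ("N", 1), ("S", 1), ("N", 3), ("S", 2), ("S", 5)]
print(nonsynonymous_fraction_by_class(toy))
```

Under incomplete purifying selection, the "unique" fraction would be expected to exceed the "shared" fraction, as it does in this toy input.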

We previously reported that new mutations accumulated more rapidly in the viral population early in the outbreak than over the long term in the reservoir (). We hypothesized then that the higher rate early in the outbreak resulted from incomplete purifying selection; that is, we were detecting transient nonsynonymous variants that would later be removed by purifying selection (). The observed evolutionary rate is thus not an estimate of the underlying mutation rate since some deleterious mutations are purged by selection before they can be detected. But neither is it an estimate of the long-term substitution rate since other deleterious mutations have not been eliminated by selection at the time of analysis. We hypothesized that the EBOV Makona evolutionary rate would decline following the addition of genomes covering a longer evolutionary timescale. Such a decline is well characterized in members of other species (). With the present data set, we were able to examine the evolution of the virus over a longer time period. We found that the most probable estimated evolutionary rate of EBOV Makona is indeed markedly lower (mean posterior rate = 1.25 × 10⁻³ substitutions per site per year) and is closer to the long-term rate than to the rate estimated early in the outbreak (Figures 3A and S4).

(C) Enrichment for nonsynonymous mutations at shorter timescales. Intrahost (all variants that appear within a single host at less than 100% frequency); unique interhost (SNPs fixed in exactly one individual); shared interhost (SNPs fixed in two or more individuals); shared between EVD outbreaks (internal branch SNPs on a between-outbreak tree).

(B) Purifying selection. We estimated nonsynonymous (red) and synonymous (blue) substitution rates on external (unique to an isolate, potential dead end) and internal (shared by multiple isolates, evidence of human-to-human transmission) branches. Nonsynonymous mutations accumulate faster on external branches than on internal branches. For synonymous mutations, the difference between external and internal branches is less pronounced.

In summary, we conclude that a combination of human-to-human transmission and recurrent mutations is likely responsible for the iSNV pattern observed in Figure 2 A. This hypothesis is supported by the iSNV at position 18,911: samples containing this variant often cluster on the phylogenetic tree ( Figure 2 B), although more isolated samples may represent separate mutation events. More generally, pairs of samples that share an iSNV are typically located near one another phylogenetically; these pairs are separated by an average of 0.16 years of evolution, whereas random pairs are separated by an average of 0.30 years (p < 10, randomization test). These results suggest transmission of iSNVs in at least some cases and therefore suggest that the transmission bottleneck is wide enough to facilitate the transmission of low- or intermediate-frequency variants between hosts.
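The comparison of iSNV-sharing pairs with random pairs can be sketched as a one-sided randomization test. The Python example below is an illustration under stated assumptions rather than the exact procedure used here: it draws random sets of sample pairs of the same size as the observed set and asks how often their mean separation is at least as small as the observed mean. The function name, the add-one correction, and the toy distances are hypothetical choices.

```python
import random


def randomization_p_value(observed_distances, all_pair_distances,
                          n_iter=1000, seed=0):
    """One-sided randomization test on mean pairwise separation.

    observed_distances: separations (e.g., years of evolution) for pairs
        of samples that share an iSNV.
    all_pair_distances: separations for every pair of samples.
    Returns the fraction of random draws whose mean is <= the observed
    mean, with an add-one correction so the p value is never zero.
    """
    rng = random.Random(seed)
    k = len(observed_distances)
    observed_mean = sum(observed_distances) / k

    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(all_pair_distances, k)  # k pairs, no replacement
        if sum(draw) / k <= observed_mean:
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Toy data: three very close pairs in a pool dominated by distant pairs.
pool = [0.1] * 3 + [0.5] * 97
print(randomization_p_value([0.1, 0.1, 0.1], pool))
```

With the add-one correction, the smallest reportable p value is 1/(n_iter + 1), so the number of iterations bounds how small a p value can be claimed.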

The remaining possible sources for persistently shared iSNVs are co-transmission and recurrent mutation. In either case, the iSNV could be maintained by balancing selection or could be evolving neutrally. Figure 2A suggests that selection is not the primary cause of persistence, since synonymous and nonsynonymous variants are equally common among the shared iSNVs, and selective pressures are likely to be different for the two classes of variant. It is also unlikely that all shared iSNVs are simply the product of recurring mutation: if they were, they should have a frequency spectrum heavily weighted toward low frequency, characteristic of new mutations. However, that is not the case. For example, the variant at position 18,911 is found at >15% frequency in eight different samples (Figure S3C), a much higher frequency than expected if the change represented a de novo mutation in each sample.

We can rule out superinfection and contamination as primary explanations for the iSNVs in our data because none of the iSNVs are located at common SNP positions. For example, a SNP at position 14,019 is at intermediate frequency in the population (found in ∼40% of samples we sequenced) and defines the SL4 lineage ( Figure 1 A). If superinfection were common among EVD patients, we would expect to sometimes see both SL3 and SL4 viruses in the same patient, which would appear as an iSNV at that position. Contamination would result in a similar pattern, with intermediate-frequency SNPs appearing as iSNVs in contaminated samples. Additionally, contamination would be most visible in low-coverage, low-RNA-content samples because contaminants would make up more of the RNA available for sequencing, whereas samples with extremely high coverage would be the most visible contaminants ( Figure S3 B). The highest coverage sample (G4960.1) contains genomes belonging to lineage SL3 only and lacks the SL4 SNP, so if there were widespread contamination, we would see a low-frequency iSNV at position 14,019 in SL4 samples with iSNVs. Since SL3 and SL4 samples were processed together (eight of nine sequencing batches contained multiple samples from both lineages) and we saw no instances of an iSNV at that position, we conclude that superinfection and contamination are not important contributors to iSNVs.
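The logic of this check is that superinfection or contamination would make common consensus SNP positions show up as iSNVs. A minimal Python sketch of that comparison follows; the function name, the input formats, the 5% threshold for "common", and the sample labels are hypothetical, while positions 14,019, 10,218, and 18,911 are taken from the text.

```python
def isnvs_at_common_snps(isnv_calls, snp_sample_counts, n_samples,
                         min_fraction=0.05):
    """Return the (sample, position) iSNV calls located at common SNP sites.

    isnv_calls: iterable of (sample_id, genome_position) tuples.
    snp_sample_counts: dict mapping SNP position to the number of samples
        whose consensus genome carries the derived allele.
    n_samples: total number of samples sequenced.
    min_fraction: assumed cutoff for calling a SNP "common".
    """
    common = {
        pos
        for pos, count in snp_sample_counts.items()
        if count / n_samples >= min_fraction
    }
    return sorted({(s, pos) for s, pos in isnv_calls if pos in common})


# Toy counts per 100 samples; sample labels are made up.
snp_counts = {14019: 40, 10218: 97, 5000: 1}
calls = [("sample_A", 18911), ("sample_B", 14019)]
print(isnvs_at_common_snps(calls, snp_counts, n_samples=100))
```

An empty result, as reported for the data set described here, is what is expected when neither superinfection nor contamination contributes appreciably to the iSNV calls.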

Intrahost variants (iSNVs) that appear during the course of the epidemic may provide valuable information about human-to-human transmission. In particular, shared iSNVs have been used to estimate the relative size of the transmission bottleneck () and to identify human-to-human transmission chains (). In the current data set, which includes 85 samples with at least one iSNV ( Figure S3 A), several iSNVs are shared among two or more patients, often spanning several months of the EVD epidemic ( Figure 2 A). The existence of shared iSNVs could be explained by patient infection from multiple sources (superinfection), sample contamination, recurring mutations (with or without balancing selection to reinforce mutations), or co-transmission of slightly diverged viruses that arose by mutation earlier in the transmission chain.

(B) Phylogenetic placement of derived alleles at genomic position 18,911 implies both repeated transmission within clades as well as some amount of recurrent mutation. Colored tips are sized according to frequency of iSNV at position 18,911. Tips with small black points are those with iSNV calls at any position; other tips represent samples with no iSNV calls. This figure shows only the portion of the tree relevant for this analysis; large branches with no SNPs or iSNVs at position 18,911 are not shown.

(A) Certain intrahost variants (iSNVs) appear in samples throughout the 2013–2015 EVD epidemic, suggesting that iSNVs can be transmitted between patients. Variants shared between two or more samples are shown as rows of connected points; each row is a genomic position (ordered by position along the genome, top to bottom), and each point indicates the presence of the iSNV in a patient.

(A) Distribution of the number of iSNVs per sample. Replicate sequencing and iSNV calling were completed for 150 samples, of which 65 had no iSNV calls. Mean iSNVs per sample (including samples without iSNVs) = 2.04; mean iSNVs per sample (among samples with iSNVs) = 3.6. (B) Sample coverage by date shows the temporal distribution of samples containing Ebola virus (EBOV) genomes with and without iSNV calls. As expected, samples with iSNV calls generally have higher coverage. (C) Intermediate-frequency variants can persist over time with minimal genetic drift, as demonstrated by the iSNV at position 18,911. The existence of intermediate-frequency (10%–30%) iSNVs in many different samples over time provides an argument against recurring mutations and may suggest a relatively wide transmission bottleneck between patients.

Similarly, publicly available EBOV genomes from this outbreak can shed light on exportation of EBOV from Sierra Leone into other countries. All published genomes from elsewhere, including 26 from Liberia and 4 from Mali, lack the Sierra Leone-defining SL3 mutation ( Figure 1 B and Experimental Procedures ). Given that 97% of Sierra Leonean EBOV sequences have the SL3 variant, extensive exportation would result in the spread of SL3 EBOV genomes, a spread that is not seen in the limited samples available to date. At least in Sierra Leone, and with the exception of events at the onset of the epidemic, transmission has likely been primarily within national borders ( Figure S2 and Experimental Procedures ), rather than by free interchange with neighboring countries.

(A) Nine Ebola virus (EBOV) Makona genomes (rightmost circles) from the Freetown area with four groups of apparently ancestral EBOV genomes (middle circles). Groups of genetically identical genomes (circles) are related to each other by simple vertical relationships (arrows). Solid circles are shown on the date of the earliest sample in the group; the circle area is proportional to the number of samples containing viruses with that genome; arrows represent a set of non-homoplasic SNPs and point from ancestral to derived alleles. Here, “SL3” and “SL4” do not refer to entire clades, but to the viruses that exactly match the canonical SL3 and SL4 genomes with no further mutations. (B) Geographic mapping of one epidemiological route that may account for four of the nine Freetown viruses shown in (A). Groups of identical viruses are shown at their first observed location.

As the epidemic developed within Sierra Leone, the SL3 lineage continued to dominate the viral population within the country, with no evidence for additional imported EBOV lineages. In our data set, 97% of the genomes carry the SL3 mutation and the remainder belong to SL2 ( Figure 1 A). These results link all Sierra Leonean EVD cases to the initial introduction of EBOV into Sierra Leone, and they provide further evidence that all EVD cases during this outbreak arose from human-to-human transmission rather than from further zoonotic introductions from the unknown EBOV reservoir. This means that no newly imported viral diversity was detected after the initial introduction (); all newly sampled viruses likely descended from those sequenced in the initial weeks of the outbreak. The genetic similarity of these viruses suggests that importation from other countries was minimal, although we cannot definitively rule out a re-introduction from elsewhere for the SL2 viruses (3%) in our data set.

(B) Lack of EBOV Makona SL3 spread to Liberia or Mali. Shown is a median-joining haplotype network constructed from a coding-complete EBOV genome alignment including 340 EBOV Makona sequences. Each colored vertex represents a sampled viral haplotype, with colors indicating countries of origin. Colors are as in (A), with the exception that the distinction is no longer made between older (Gire) and newer (Park) Sierra Leonean data sets (both are now dark blue), and two additional countries are shown (Liberia in yellow, Mali in red). The size of each vertex is proportional to the number of sampled isolates. Hatch marks indicate the number of mutations along each edge.

(A) Phylogenetic and temporal placement of recently sequenced Ebola virus (EBOV) within Sierra Leone. New EBOV genomes (232 genomes, dark blue), sampled from June 16 through December 26, 2014, provide a high-resolution view of the accumulated genetic diversity and fill in the missing ancestry between EBOV Makona genome data sets. The maximum clade credibility (MCC) tree was inferred using Bayesian evolutionary analysis by sampling trees (BEAST), with tips anchored to sampling date. Tips are labeled for EBOV from five non-African health-care workers (HCWs) infected in Sierra Leone and treated in Europe (sequenced by other groups, light green). Previously described nested EBOV Makona lineages SL1, SL2, and SL3, as well as a new lineage SL4, are labeled at their most-recent common ancestor (MRCA) nodes.

A previous study of EBOV Makona sequences elucidated viral transmission and evolution during the early stages of the outbreak in Sierra Leone () from late May to early June, 2014. The first reported EVD cases in Sierra Leone stemmed from two genetically distinct EBOV Makona lineages, believed to have been introduced from Guinea. One of these lineages (SL1) was more closely related to the then-available three Guinean genomes (two to five mutations) than the second lineage (SL2), which was characterized by four additional mutations. This finding suggested that SL2 had evolved from SL1 some months before it was observed in Sierra Leone. A third lineage (SL3), derived from SL2, emerged in mid-June 2014. SL3 differs from SL2 by a single mutation at position 10,218, first found as an intrahost variant (polymorphism within one individual) at a low frequency. SL3 became the most prevalent lineage in Sierra Leone during the first 3 weeks of the outbreak there, with SL1 disappearing soon after the appearance of SL3. The SL3-defining mutation is epidemiologically important, as it is the first commonly circulating mutation observed to arise within Sierra Leone’s borders.

Very recently, another 175 EBOV Makona genomes were published based on a cohort from Sierra Leone, mostly sampled from the area of Freetown in the Fall of 2014 (). Although these data were not included in our analyses, they are unlikely to significantly alter our primary findings ( Figure S1 ).

(A) 175 recently published Ebola virus Makona samples from Sierra Leone () describe lineages that fall within the genetic diversity of our current dataset (MCC tree from BEAST, as in Figure 1 ). (B) They span a two month period (Sep 28 to Nov 11, 2014) that falls within the temporal sampling of our current data and shows a consistent evolutionary rate.

We constructed a second, independent genome library for each of 150 high-quality samples from the KGH cohort to reliably determine intrahost single-nucleotide variants (iSNVs) at low frequencies (). We identified 247 iSNVs (25 insertions/deletions that were excluded from all analyses, 73 nonsynonymous, 71 synonymous, and 78 noncoding), including 21 iSNVs shared by multiple patients.

In combination with the 86 previously published EBOV Makona genomes (), we analyzed a total of 318 genomes (see Experimental Procedures ), all aligned against the earliest sampled Guinean genome (GenBank: KJ660346.2 ). In this set, we observed 464 single-nucleotide polymorphisms (SNPs; 125 nonsynonymous, 176 synonymous, and 163 noncoding). We also observed five single-base insertions and two double-base insertions in noncoding regions. We mapped all of the variants to primer-binding sites for known sequence-based diagnostics () and found no mutations in these sites that were present in more than one Sierra Leonean sample ( Table S2 ).

While we are continuing attempts to glean genomic information from compromised samples of the recent KGH cohort, important information may have been lost. In particular, samples from many EBOV-infected health-care workers at KGH, which could provide important insights into hospital-based transmissions, were compromised.

Using this pipeline, we successfully assembled 232 EBOV Makona coding-complete genomes (150 from KGH and 82 from the CDC cohort, spanning June 16 to December 26, 2014). Each assembled sequence was at least 18.5 kb in length, with a maximum of 6% ambiguous base calls per genome. The median assembly had 374× coverage, was 18.9 kb long, and had no ambiguous bases. Despite extensive sequencing, successful full-genome assembly was difficult to obtain from the KGH cohort (73% failed genome assemblies; 374× mean coverage; Table S1), compared to a previously described cohort from the same laboratory (11% failed genome assemblies; 2,000× mean coverage). The high assembly failure rate of the more recent KGH cohort is likely due to the mandatory in-country implementation of a new EBOV sample deactivation protocol and to long delays for sample shipments amidst the outbreak response (see Experimental Procedures). In contrast, only 7% of samples from the CDC cohort failed to assemble. However, these samples had been pre-selected for sequencing based on high EBOV titers, as estimated by qPCR. In addition, the CDC cohort samples were collected more recently, did not remain in lysis buffer for an extended period, and were subjected to a different sample deactivation protocol than the KGH cohort samples.

We implemented a new computational pipeline, viral-ngs:v1.0.0, for viral genomic de novo assembly, intrahost variant calling, and genome analysis and annotation. This pipeline is available via open-source software () and utilizes a generalized workflow engine to run on a wide variety of computer hardware configurations (). Through a partnership with DNAnexus, this pipeline is also available in a secure cloud-compute environment to enable consistent analyses across laboratories with limited computational resources ( Experimental Procedures ).

We performed massively parallel genome sequencing on 673 samples from two EVD patient cohorts. The first cohort included 575 blood samples from 484 EVD patients confirmed by laboratory staff at KGH from June 16 through September 28, 2014. The second cohort included blood samples from 88 EVD patients from throughout Sierra Leone confirmed at Bo by CDC laboratory staff from August 20, 2014 through January 10, 2015. Samples from both EVD cohorts were sequenced using previously described methods ( Experimental Procedures ).

Discussion

Our findings from 232 EBOV Makona genomes sampled in Sierra Leone over 7 months during the 2013–2015 EVD outbreak in Western Africa demonstrate the value of continued sequencing throughout an epidemic. We tracked the movement of EBOV throughout Sierra Leone and assessed how frequently EBOV moved into and out of that country. Although the virus may well have continued to cross the national borders of Sierra Leone throughout the epidemic, these observations suggest that, at least in late 2014, cross-border introductions were not an important factor in the development of the epidemic. We were unable, however, to draw any conclusions about export to Guinea since few EBOV sequences from there are currently available.

The sequence data display EBOV Makona evolution in the context of prolonged human-to-human transmission and provide an updated view of genomic diversity. Based on the rates of nonsynonymous and synonymous changes that are shared or are unique to an individual host, we concluded that purifying selection becomes increasingly effective over time, as it has more opportunity to remove deleterious mutants.

While the effects of purifying selection in this extended EVD outbreak are clear, these evolutionary changes do not imply that positive selection or adaptation to humans is occurring. Rather, the data suggest that evolutionary changes over time through natural selection are sufficient to remove newly arisen alleles that are less fit in the human environment. To date, no published study has found experimental evidence of selection for alleles beneficial to the virus within the current outbreak.

It is important to recognize, however, that the long-term human-to-human transmission observed during the 2013–2015 EVD outbreak is historically unique for EBOV. At the beginning of each EVD outbreak, EBOV enters the human population with little or no genetic diversity. In the case of the current EVD outbreak, EBOV has now maintained fitness while expanding across a much larger space of genetic diversity than in previous EVD outbreaks, the largest of which comprised only 318 human infections. This degree of diversity will undoubtedly affect researchers’ ongoing efforts to develop or improve candidate diagnostics, vaccines, and therapeutics for EVD, many of which are targeting EBOV sequences directly (PCR, nucleic-acid based therapeutics) or indirectly (antibody cocktails).

The mucin-like domain of the EBOV glycoprotein (GP), in contrast to the rest of the EBOV genome, appeared to be under diversifying selection based on a high ratio of nonsynonymous-to-synonymous mutations. While not statistically significant because of the small number of SNPs in the region, our observation is in agreement with many previous studies (Sanchez et al., 1998; Wertheim and Worobey, 2009). As the EBOV GP, especially the mucin-like domain, is the target of many antibodies, a plausible hypothesis is that the humoral immune response exerts selective pressure on GP, resulting in an accumulation of nonsynonymous mutations. In support of this hypothesis, regions of GP corresponding to experimentally determined B cell epitopes are significantly enriched in nonsynonymous, but not in synonymous, variants. There are two important caveats to this analysis: (1) these epitopes are determined in vitro and therefore may not be epitopes in vivo if they are not immunodominant, and (2) there is no experimental evidence to suggest that the majority of observed variants disrupt antibody binding to these epitopes.

While further experimental testing is required to validate an immune evasion hypothesis, we have highlighted a few prime candidates to consider. Genomes from three samples share a threonine-to-alanine mutation at GP amino acid position 485, a position that is conserved among all members of the Ebolavirus genus. This position is indispensable for binding of the protective antibody 14G7 (Olal et al., 2012); the observed variant at this site may therefore be the result of escape from antibody-mediated selection. Additionally, two samples each possess multiple mutations within a single experimental B cell epitope in GP, which are likely to evade antibody recognition if those regions are relevant epitopes in vivo.

Intriguingly, the two samples with multiple mutations within a single B cell epitope each possess a distinct short stretch littered with T-to-C transitions, a phenomenon also observed by Tong et al. (2015). Excessive T-to-C and A-to-G mutation of virus genomes has been observed previously as a result of adenosine deaminases acting on RNA (ADARs; Gélinas et al., 2011; Zahn et al., 2007; Carpenter et al., 2009). When acting on viral genomic RNA, ADARs cause a pattern of excess A-to-G transitions that are represented by T-to-C transitions in our data set. These transitions are known to occur either promiscuously within 200 nucleotide stretches or in a sequence-specific manner; therefore, we investigated both possibilities. While only five of the 318 sequences in our data set contained obvious T-to-C stretches, we showed that the top 5% of sequences by sequence divergence, excluding the five sequences with T-to-C stretches, were also moderately enriched for T-to-C transitions across the genome. The remaining 95% of sequences appeared to show no enrichment. We do not know whether this phenomenon is caused by ADAR acting upon genomic RNA, as we cannot exclude the possibility of bias by the EBOV RNA polymerase or other effects. Additionally, it is as yet unclear whether these T-to-C mutations have an antiviral or other effect on viral fitness. These questions open avenues of research into molecular mechanisms shaping EBOV evolution.
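As a rough illustration of how T-to-C enrichment and clustered T-to-C stretches might be screened for, the sketch below compares a sample sequence with a reference of the same aligned length. It is not the analysis pipeline used in this study: the function names, the toy sequences, and the choice of a simple sliding window are assumptions made for illustration, with only the default 200-nucleotide window taken from the text.

```python
from collections import Counter


def substitution_counts(reference, sample):
    """Count (reference_base, sample_base) substitutions between two
    aligned sequences of equal length, ignoring gaps and ambiguous bases."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            counts[(ref_base, alt_base)] += 1
    return counts


def t_to_c_fraction(reference, sample):
    """Fraction of all substitutions that are T-to-C (0.0 if none exist)."""
    counts = substitution_counts(reference, sample)
    total = sum(counts.values())
    return counts[("T", "C")] / total if total else 0.0


def densest_t_to_c_window(reference, sample, window=200):
    """Return (start, count) for the first window holding the most T-to-C
    substitutions; `window` is the stretch length in nucleotides."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    hits = [
        1 if ref_base == "T" and alt_base == "C" else 0
        for ref_base, alt_base in zip(reference.upper(), sample.upper())
    ]
    if len(hits) <= window:
        return (0, sum(hits))
    current = sum(hits[:window])
    best = (0, current)
    for start in range(1, len(hits) - window + 1):
        # Slide the window by one: add the entering base, drop the leaving one.
        current += hits[start + window - 1] - hits[start - 1]
        if current > best[1]:
            best = (start, current)
    return best


# Invented toy sequences, far shorter than a real EBOV genome.
toy_reference = "ATTGTTCATT"
toy_sample = "ACCGTCCATA"
print(t_to_c_fraction(toy_reference, toy_sample))
print(densest_t_to_c_window(toy_reference, toy_sample, window=3))
```

A screen of this kind only flags candidates. Deciding whether a stretch reflects ADAR activity, polymerase bias, or chance would still require a background expectation for each substitution type.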

Some of the specific genome analysis methods that we introduced here, while promising, will require denser EBOV genome sampling to yield sufficient information to influence the EVD outbreak response. Among these methods is transmission analysis, which could prove valuable for improved understanding of hospital-based transmissions and therefore for improved infection control. Inference of the ancestral genetic state is often straightforward, with clear patterns of new variations layering on previously existing variations; viruses that appear to be descended from others in the same data set are separated only by new mutations that are seen nowhere else in the data set. This kind of genetic relationship does not guarantee a transmission relationship between two patients since many viruses can share identical genomes. However, since viruses with identical genomes are often epidemiologically related (Gire et al., 2014), we can infer that viruses that appear to descend from other viruses in our data set are either in or epidemiologically close to the same transmission chain.
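The descent criterion described above can be sketched in a few lines, read literally: one genome is a candidate ancestor of another if its variants form a proper subset of the other's, and the extra variants appear in no other genome in the data set. The function name, the variant labels, and the patient identifiers below are hypothetical, and the sketch should not be taken as the method used in this study.

```python
def candidate_descent_pairs(genomes):
    """Return sorted (ancestor, descendant) pairs of sample names.

    `genomes` maps a sample name to the set of variants it carries
    relative to a common reference (a hypothetical input layout).
    """
    variant_sets = {name: set(variants) for name, variants in genomes.items()}
    pairs = []
    for ancestor, ancestor_vars in variant_sets.items():
        for descendant, descendant_vars in variant_sets.items():
            # Require a proper subset: identical genomes give no direction.
            if ancestor == descendant or not ancestor_vars < descendant_vars:
                continue
            new_variants = descendant_vars - ancestor_vars
            seen_elsewhere = any(
                new_variants & other_vars
                for name, other_vars in variant_sets.items()
                if name != descendant
            )
            if not seen_elsewhere:
                pairs.append((ancestor, descendant))
    return sorted(pairs)


# Invented toy data, not taken from the study.
toy_genomes = {
    "P1": {"C100T"},
    "P2": {"C100T", "A200G"},
    "P3": {"C100T", "A200G", "G300A"},
    "P4": {"C100T", "T400C"},
}
print(candidate_descent_pairs(toy_genomes))
```

Under this literal reading, P1 and P2 are not paired because P2's extra variant is also carried by P3. A relaxed rule that tolerates variants passed on to further sampled descendants would be a different design choice. In either case, a reported pair indicates genetic compatibility with descent, not a confirmed transmission event.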

Unfortunately, long delays in shipping samples from the field and required changes to the EBOV inactivation protocol caused severe degradation of many samples, which prevented identification of variants and transmission analysis. This loss should serve as a reminder that standardized and optimized protocols for sample collection, virus inactivation, and shipment are crucial for a rapid worldwide response to any new infectious disease outbreak. An important future research effort will be aimed at understanding which certified EVD sample inactivation protocols are best suited for high-quality genomic sequencing. Complications with sample shipment also emphasize the need for establishing in-country sequencing capabilities either before or at the onset of future EVD outbreaks (Folarin et al., 2014).