Conclusions: A relatively simple network model provides insight on the role of local effects such as saturation that would be difficult to otherwise quantify. Our results predict that exponential growth of an epidemic is driven by the exposure of new communities, underscoring the importance of limiting this spread.

Results: The effects of community mixing, together with stochastic effects, can explain the qualitative difference in the growth of Ebola virus cases in each country, and why the probability of large outbreaks may have recently increased. An increase in the rate of Ebola cases in Guinea in late August, and a local fitting of the transient dynamics of the Ebola cases in Liberia, suggests that the epidemic in Liberia has been more severe, and the epidemic in Guinea is worsening, due to discrete seeding events as the epidemic spreads into new communities.

Methods: We combine a discrete, stochastic SEIR model with a three-scale community network model to demonstrate that the different regional trends may be explained by different community mixing rates. Heuristically, the effect of different community mixing rates may be understood as the observation that two individuals infected by the same chain of transmission are more likely to share the same contacts in a less-mixed community. Local saturation effects occur as the contacts of an infected individual are more likely to already be exposed by the same chain of transmission.

Background: In mid-October 2014, the number of cases of the West Africa Ebola virus epidemic in Guinea, Sierra Leone and Liberia exceeded 9,000 cases. The early growth dynamics of the epidemic has been qualitatively different for each of the three countries. However, it is important to understand these disparate dynamics as trends of a single epidemic spread over regions with similar geographic and cultural aspects, with likely common parameters for transmission rates and reproduction number R0.

Introduction As of mid-October 2014, the number of reported suspected cases of the Ebola epidemic in West Africa had exceeded 9,000 cases, which is likely a significant underestimate 1,2,3 . Markedly different dynamical behaviors can be observed for the growth curves of the epidemic in the countries of Guinea, Sierra Leone and Liberia (Fig. 1). Most immediately, the epidemic in Liberia has been growing at a much faster rate in Liberia than in Guinea. Although the epidemic likely began much earlier in Guinea 4, Liberia had approximately the same number of cases in early August, twice as many cases by the end of August and nearly three times as many cases by mid-September. Even more striking, the number of cases in Guinea appears to have been growing sub-exponentially until late-August (approximately linearly with a slope of about 3 cases per day) while the number of cases in Liberia has been growing exponentially (approximately 10 cases per day averaged for July, 40 cases per day averaged for August and 70 cases per day in September). The growth dynamics of the epidemic in Sierra Leone appears to be intermediate between these two. It would be helpful to understand these different growth patterns within the context of a single epidemic, since a better understanding of the source of these different patterns may yield productive ideas for curbing the exponential growth of the epidemic in Liberia. We describe a stochastic network model with three levels of community structure (households and communities of households within a country population) on which we model SEIR transmission dynamics for the spread of Ebola infection. We are able to fit the WHO Ebola case data of each country by varying only the community mixing parameter (connectivity) for each country (see Fig. 3). Observing that the long term dynamics of epidemic spread within communities are linear due to local saturation effects, we are able to demonstrate that linear growth phases followed by exponential growth phases are consistent with seeding of the epidemic to new communities. A variety of computational and statistical models have been used help to characterize and resolve the mechanisms underlying trends in the growth of this epidemic. The models of 5 and 6 include a parameter to estimate and predict the effect of control measures on the epidemic. SEIR models such as that of 5 and 7 are four-compartment models that resolve infectious dynamics between populations based on their susceptibility and infectiousness and account for the time scales of viral incubation and infectiousness. SEIR models with seven compartments 8,9,2 further resolve the effects of varying degrees of transmission among, for example, community, hospital, and funeral populations. These computational models focus on different aspects of the epidemic to explain or observe the marked differences of the growth curves for the epidemic in each country. The models 5 and 6, accounting for the effect of control measures, find that their models find that their models identify slowing of the growth of the epidemic only for Guinea and Sierra Leone. The model of 8 accounting for different community, hospital and funereal transmission rates, predicts that a higher number of transmissions from funerals in Liberia could account for the faster rate of growth of the epidemic in Liberia compared to Sierra Leone. Likewise, the model of 2 predicted a higher fraction of patients with no effective measures to limit transmission, including burial transmission, in Liberia. Especially interesting differences among the three countries were described by 7 in their methodology to observe changes in the effective reproductive number over time. In particular they found that the effective reproductive number rose for Liberia and Guinea. The authors observe that this increase occurred somewhat early on during the Liberia outbreak in mid-July, when the outbreak spread to densely populated regions in Monrovia, and during the Guinean outbreak in mid-August, around the time the outbreak spread to densely populated regions in Conakry. We would like to provide a proof of principle explanation for the differences in the dynamics of the 2014 Ebola epidemic based on differences in the community network properties of the affected regions, even as the number of daily interactions, transmission rates and in particular the average number of people infected by each person R 0 within a naïve population is the same for all three countries. A prediction of our model is that the effective reproductive number R e of the epidemic decreases quickly to values close to 1, indicating significant saturation effects for all three countries. Although we use a very simple homogeneous network model and our four model parameters are under-constrained, a particular choice of reasonable model parameters captures relevant average behaviors of the epidemic (for example, by fitting the growth curves of the epidemic) and we use these to interpret trends in the epidemic growth over time and between countries. A description of the roles of model parameters can be used to make predictions about the future growth of the epidemic in the context of potential epidemic controls that modify these parameters. Fig. 1: Cumulative Ebola Cases for Guinea, Sierra Leone and Liberia over Early 5 Months of the Epidemic (March 22nd − August 22nd) The Importance of Network Interactions The roles of incomplete mixing within communities, heterogeneity in contact transmission, local saturation and the co-incidence of multiple transmission chains can have significant effects on epidemic dynamics 13,14,15. The importance of network considerations, in particular the importance of reducing the contacts between exposed and unexposed groups, for controlling this epidemic has been described 16. One of the most important features to capture about the Ebola virus (in particular for a model resolved at an individual, mechanistic level) is that there is a high probability of transmission between close contacts, but a lower probability among casual contacts. For example, in a formal study of transmission chains in the 1979 Ebola virus outbreak in Sudan, it was found that care-takers of the sick had a 5.1 higher rate of transmission than other family members with more casual contact 17. Likewise, in the 1976 Ebola outbreak in Zaire, the probability of transmission was 27.3% among very close contacts (spouses, parents and children) but only 8% for other relatives 18. In our model we organize individuals within households (a broadly defined term meant to represent the set of potential “close” contacts) and households are organized within local communities (a larger, modularly structured network of the population of less likely and less infectious interactions). The term “household” has been used in the literature for networks of close interactions, so we use this term here for convenience, but in the case of Ebola virus transmission, a network of close contacts would include overlapping household, hospital and funereal networks, and the concept of ‘household’ should be expanded to include these. Two-scale community models with different transmission rates for close (or local) contacts and casual (or global) have been studied previously 19,20,21 and references therein) where the smaller scale compartment may be called households, clusters or sub-graphs. With this paradigm, there are two transmission rates, and thus two scale-dependent reproductive numbers that would sum to the global reproductive number R 0 of the epidemic. We define R 0H as the average number of infections that occur within the household from an infected individual, and R 0C as the average number of infections that occur within the community from a single infected individual. The reproductive number R 0 is the average number of infections expected in a completely naïve population (i.e., exactly one infected individual and no immunized individuals), so R 0 ≈R 0H +R 0C .However, in the context of local saturation effects, R 0 =1 is no longer a relevant parameter for defining a threshold for epidemic growth 22,23. We implement a discrete, probabilistic SEIR model on a social connectivity network that combines elements of a two-scale network model by 20 and the modular network of 24 to create what is technically a three-scale community network (individuals are organized within families, and families within modular local communities that are subsets of the entire population). This connectivity structure is simple by construction (more regular than small-world networks and lacking long-range connections of different lengths) yet nevertheless enables a parametrization of the level of mixing that exists between communities. Specifically, a population that is “well-mixed” has a larger number of families interacting within each local community. Details of the model are described in the Methods section. The effects of incomplete community mixing are significant even for moderate population sizes 19 and cannot be reproduced by unstructured mean-field models that assume complete mixing (e.g., see 25). This emphasizes the role of individual and network-based models to resolve these effects.

Methods A stochastic, individual-based SEIR model is implemented for a population with a network structure of two edge types: close contacts among members of a household and casual contacts among members of a local community. This component of the model is comparable to the two-scale SIR community model described by 20 (we use a discrete lattice-based simulation approach instead of a Markov approach and we add an exposed period for which an individual is exposed but not infectious). Local communities of each household are modeled as the set of r nearest households, for this we refer to the modular lattice approach of 24. See Fig. 2 for a schematic of our three-scale network. This last component enables us to systematically vary the extent of community mixing. Households of size H: Modeled on an L×H lattice A population of size P=L·H is modeled as an L×H lattice where H=|h i | corresponds to a fixed number of individuals in a household h i (i=1, … , L). Each i th row of the lattice represents a distinct household h i with members {n i,1 , n i,2 , … , n i,H }. Here the parameter L is arbitrarily large so that P represents an effectively infinite population (in simulations this only requires that L >> the number of families participating in the epidemic). Each node (i, j) on the lattice represents the j th family member of the i th household and has a state S ∈ {S, E, I, R} corresponding to whether the individual is susceptible, exposed, infected or refractory. Communities of size C=2r+1 households: Modeled on an L×H lattice with modular structure The population of households is further organized within communities c i (i=1,…, L). The i th community c i is the set of individuals within all households within distance r of the i th household (i.e., C i is the set of individuals within households {h i-r , … , h i-1 , h i , h i+1 , … , h i+r }). Communities are comprised of a fixed number of households (C=2r+1) and thus each community is a set of C·H individuals. With periodic boundary conditions, a population with L households would have L overlapping communities. We choose a number of communities much larger than the number participating in the epidemic, so that our results are not affected by the choice of boundary conditions. Network Structure of Close and Casual Contacts on the Lattice A network of two edge types representing close versus casual contacts is defined for the population (see Fig. 2). Individuals within a household are connected by edges that represent potential close contacts, and individuals within a community are connected by edges that represent potential casual contacts. Thusly, each household may be thought of as a well-mixed (completely connected) graph of H vertices and each community may be thought of as a well-mixed (completely connected) graph of C·H vertices. From an individual’s perspective, an individual located at the (i,j) th node is connected to each member of the i th household with potential close contacts and to each member of the i th community with potential casual contacts. The described network structure is completely homogeneous: every individual is centered within a network that is identical to that of every other individual. Fig. 2: Network Structure of Individuals, Households and Two Communities ci and ci+2 SEIR Dynamics Initially, all individuals (lattice nodes/network vertices) are susceptible (state S) except one individual that is exposed (state E) representing “patient 0”. In simulations, time steps are discrete and correspond to exactly one day. States are updated at each time step with the following transition probabilities: p(S→E)=probability that a susceptible individual becomes exposed =(1-probability of no exposures from any infected contacts) =(1-(1-t h )i h ·(1-t c )i c ), where t H is the transmission probability within a household (probability of exposure per day per infectious household contact), i H is the number of infectious household contacts, t C is the transmission probability within a community (probability of exposure per day per infectious community contact), and i C is the number of infectious community contacts. Transmission probabilities are the probability of transmission per day per infectious individual along a link between an infectious and susceptible individual, and are the products of (1) the probability of the two individuals interacting on a given day and (2) the probability that an interaction will result in exposure. p(E→I)=probability that an exposed individual becomes infectious =1/γ, where γ is the average incubation period. p(I→R)=probability that an infectious individual will become refractory =1/λ, where λ is the average infectious period. For the incubation and infectious periods, we follow recent modeling groups 5,2 based on data in 23,24,25,26 using γ=5.3 days and λ=5.61 days. We observe that these periods may be longer 4,26, as used in 9,6,8 and varied in 7. Reparametrization of transmission rates in terms of R 0 and expected number of contacts η The extent of mixing increases with the size of the community C. A community of size C means that each individual in that community has an equi-probable chance of interacting with each other member of the community. However, it becomes increasingly clear as the community size increases that members of the community will not interact with every member of the community every day. Rather, we assume that individuals interact with an average number of people η each day, where η is a fixed value independent of community size. We define the reproductive numbers R 0H and R 0C as the average number of secondary infections resulting from a single infectious individual in a completely naïve (all susceptible) population due to household and community interactions, respectively. Since each network (household and community networks) is homogeneous, the values of R 0H and R 0C do not depend on the location of an infectious individual in the network and is proportional to the number of susceptibles in the network (equal to (H-1) or (C-1) in a naïve population) times the probability (λ·t) that each susceptible individual will become exposed over the infectious period due to contact with the infectious individual: R 0H ≈ (H-1)·λ·t H , R 0C ≈ (C-1)·λ·t c ,[F1] Since R 0 is a key epidemiological parameter, it is helpful to make this a control parameter of the model rather than the transmission rate. Thus, we solve for the transmission rates in terms of R 0 H and H, or R 0 C and C: so that Four Free Model Parameters: R 0H , R 0C , H, C Since our incubation and infection periods are fixed, our model is completely prescribed by four intuitive free parameters: the household reproductive number R 0H , the community reproductive number R 0C , the household size H and the community size C. Note that since an infected individual becomes exposed from the community or from their household, we have R 0 ≈ R 0H +R 0C , where R 0 is the usual reproductive number [F1]. Model Response Parameters Our model tracks the states of individuals over time. In simulations, an individual is defined as an Ebola case when they become infectious. This assumes that an individual is not recognized as a case until they are infectious and that there is no delay in identifying infectious individuals. Many response variables can be calculated, included the fraction of infections not occurring due to a contact already being infected (saturation), the rate of spread of the infection through the population (as, for example, a function of the average distance from the initial infected individual), the structure of the chain of transmission from any single exposed individual, etc. In the results presented here, we focus on the cumulative number of infectious cases per day. We also calculate the effective reproductive number R e which is the average number of infections resulting from each infectious individual. At the conclusion of a simulation, R e is calculated as the total number of infections caused by any refractory individual divided by the total number of refractory individuals (i.e., care is taken to not include the data of infectious individuals that have not yet become refractory.) Case Data and Matching Epidemiological Curves for Ebola Case Number versus Time We compare simulated Ebola cases per simulation days with Ebola cases per day for Guinea, Sierra Leone and Liberia. Case data was found on Wikipedia 11 compiled from WHO case reports 12 and was retrieved on October 15th. Since the date of the first case is not given, and especially since the number of days between the first case and the n th case can be highly variable, we synchronize the simulation day and the calendar date by using the first day that there are 48, 95 or 51 cases in Guinea, Sierra Leone and Liberia, corresponding with March 22nd, June 15th, and June 22nd, respectively. In Figs. 3 and 6, simulation total case numbers are shown as the average results of n=100 simulations ± the standard error (the standard error is calculated as the standard deviation divided by the square root of n). A simulation is only counted and included in the 100 replicates if the outbreak does not die out before reaching the observed number of cases. A comparison of general R-square coefficients of determination was used to identify parameter values providing a good fit to the empirical data. In particular, a locally optimal parameter value for R 0C for given values of H and R H was verified, and a globally optimal value of C for given values of H, R H and R 0C (Fig. 3) was verified, by changing each parameter one-at-a-time and confirming that these values maximized R-square while all other parameters were held fixed. Simulations were completed using the software package Matlab. The script for calculating R-square was written by Jered Wells. Matlab scripts used to generate all figures can be found at: http://www.southalabama.edu/mathstat/personal_pages/byrne/PLoS_MATLABscripts.htm [F1] Accounting for saturation effects that occur even over the infectious period of the first infectious individual in a naïve infectious population, more accurate but less penetrable descriptions for the community network can be found in Appendix 1.

Results Fitting Country Case Data for the First Five Months of the Epidemic The number of cases over time for Guinea, Sierra Leona and Liberia over the first five months of the epidemic (March 22nd – August 22nd) is shown in Fig. 3. We sought to describe these dynamics in the context of the spread of a single epidemic within a contiguous region, so that key epidemiological parameters, in particular transmission rates and the basic reproductive number, would be the same in each country. We also assumed average household size would be the same for each country and required that the difference between countries be described solely by changing the community size, C, a measure of the size of the community within which infectious and susceptible individuals interact by casual contacts. It is thus natural to refer to C as the community mixing size. For the fixed set of parameters H=16, R 0H =1.8 and R 0C =0.55, average simulation results yielded good fits to the empirical increase in case number over time as the community mixing size C was increased from C=9 to C=33 to C=51 for Guinea, Sierra Leone and Liberia (Fig 3). This result provides a proof of principle that the differences in the growth dynamics of the epidemic can be explained by different levels of community mixing. The particular set of values in Fig. (3) were chosen to be within likely ranges of their empirical values (see Discussion). A local sensitivity analysis was done to establish that R 0C and C were locally optimized for the given choice of H=16, R 0H =1.8, but a systematic search of parameter space was not done to determine the shape of the set of all parameters, still within the likely range of their empirical values, that would provide equally good fits of the data. Since the epidemic growth rate generally increases with increases of any one parameter, different parameter values may yield similar results if one parameter is raised while decreasing another, though we note there are complex effects on the shape of the transient behavior in each case. In the Appendix, we include several examples of alternative fits to the data in which, for example, the household size H is decreased or increased, or the household reproductive number R 0H is increased or decreased. These changes to H and R 0H require commensurate changes in C and R 0C in order to achieve good fitting of the data. In these results, we focus on the parameter set {H, R 0H , R 0C }={16,1.8,0.55} since these are consistent with previously described epidemiological data for Ebola virus (see discussion for further details). The effective epidemic reproduction number R e is calculated from the model simulations. The value of R e is time-dependent (plots are provided in the Appendix) but decrease to values close to one for each country (R e =1.03±0.01, 1.10±0.02 and 1.17±0.02 for Guinea, Sierra Leone and Liberia, respectively). These values indicate very strong saturation effects, as many members of a household do not infect anyone after all members of their household have been exposed. Fig. 3: Country Specific Model Fits During 5 Months 3/22 – 8/22 of the 2014 Epidemic Stochastic Variability of Individual Simulations and Stochastic Spread of the Epidemic The simulation curves in Figs. 3 and Fig. 7 show the average number of Ebola virus cases versus time for simulations that did not die out before reaching a threshold number of cases. However our model is probabilistic and individual simulation curves are quite variable (Fig. 4a). A significant source of variability is whether the infection spreads beyond a small number of cases. Since stochastic effects are significant (Fig. 4a), and the epidemic fails to reach the threshold number of cases for a large fraction of the simulations, we describe the likely spread of an epidemic with a histogram of the distribution of outbreak sizes in Fig. 4bcd. The distribution of outbreak sizes for our model parameters describing the growth dynamics of Guinea and Liberia (Fig. 4cd) are bimodal: simulations frequently result in no spread (23±1% of all simulations result in no secondary infection) or epidemics are likely to become quite large (>950 infections) if there is spread. For Guinea, there is some non-negligible probability that an intermediately-sized outbreak does not become an epidemic (considering the low frequency of outbreaks of size 150—950 in Fig. 4c). For the parameters that describe Liberian growth dynamics (Fig. 4d), in contrast, the probability that an intermediately sized outbreak will spontaneously extinguish is negligible (not once occurring in 2000 simulations). To underscore the possibility that Guinea is a population that may be in transition from one likely to have only small outbreaks to one that would always have large epidemics like Liberia, we provide a histogram of the distribution of outbreak sizes for a population with even smaller community mixing size (C=5) (Fig. 4b). For this population, there is a sizable probability of small outbreaks, with a very small probability of a large one. Insets of each panel Fig. 4bcd show that the early dynamics is similar for all community mixing rates. Fig. 4: Probabilistic results of the model. Location of Country Parameters in Phase Space and Predictions for Curbing Epidemic Growth The locations of parameter values in phase space that fit the Guinea, Sierra Leone and Liberia data are shown in Fig 5. In Fig. 5a, the community reproductive number R 0C is varied along the x-axis from 0.05 to 0.85, and the community mixing size C is varied along the y-axis from 1 to 65. The three countries are located along the x-axis at R 0C =0.55, and along the y-axis at C=9 (Guinea), C=33 (Sierra Leone) or C=51 (Liberia). The shaded regions indicate increasing numbers of cases in the first 100 days after the outbreak has been established (defined by reaching 50 cases). This diagram shows the effect of decreases in C and R 0C on the growth of the epidemic during an early time period. For example, the phase diagram of Fig. 5a predicts that Liberia could have the slower Guinea-type growth dynamic if the reproductive number within communities was decreased from R 0C =0.55 to R 0C =0.25 (approximately a 50% reduction in community transmission). Fig. 5: Phase Diagrams of Parameter Space Extrapolating from the Five Months March 22nd – August 22nd to Interpret Epidemic Dynamics through October 15th During the writing of this manuscript, the growth dynamic of the epidemic in Guinea changed abruptly by an increase in the average number of Ebola cases per day. Our model parameters that provided a good fit to the Guinea data March 22nd – August 22nd (Fig 3a) predicted that the number of cases in Guinea would continue with its previous trend of 3.3±0.5 new cases per day (Fig 6a, the solid black line). However, the abrupt change in slope suggested a new source of cases. We calculated the difference between the empirical number of Ebola cases and the predicted number of Ebola cases over time to find the number of ‘new cases’ that would be supplied by a putative second outbreak. In particular, there were 823 cases reported for Guinea on September 3rd whereas our model predicted only 623. We thus used our model to fit a second Guinean outbreak with 200 cases on September 3rd, and interpolated that the difference between the empirical data on August 29th (703 cases ±14 cases) was approximately 100 cases larger than simulation predictions on that date (599 cases ±15 cases), which we used to initialize a new outbreak in simulations (dotted line). (The sum of the two outbreaks fit the data for Guinea very well when the community mixing parameter was C=51 for the second Guinean outbreak. Our model predicted that this second outbreak began with its first case in early July. In the absence of any further seeding events, the model prediction for the number of Ebola virus cases in Guinea is 2,083 (standard error 34 cases) on November 1st and 2990 (standard error 42 cases) on December 1st, with a predicted linear growth dynamic of 30±1 cases per day. Likewise, our model parameters that provided a reasonably good fit for the growth of the epidemic in Liberia over the earlier time period made a poor prediction for the growth over the next six weeks (Fig. 6b). Our model generally predicts a growth dynamic that is linear after a transient exponential period. We repeated the method described for Guinea for fitting the Ebola cases that were in excess of our modeling predictions and these results are summarized in Figs. 6b and 6c. In Fig 6b, a guiding exponential is drawn (dashed line) to show that the cumulative case data for Liberia is fit well by an exponential. Since early case data is quite noisy, we used the exponential fit to predict that the first 100 cases would have occurred on July 10th. The predictions of simulations, in which the day of the 100th case was defined to be July 10th, are shown in Fig. 6b (solid gray line). Simulation predictions are indistinguishable from the exponential through mid-August, confirming that simulations predict a growth dynamic that is transiently exponential. After this transient exponential period, the predicted growth dynamic was linear (with a slope of approximately 27±1 case per day). We repeated the procedure described for fitting the Guinean case data, in which a new outbreak with 100 cases is simulated each time the number of reported cases exceeds the number of predicted cases by 100 cases. This method predicted three independent outbreaks, all with community mixing size C=51, beginning in late June, mid-July and late July. While the superposition of these simulation outbreaks resulted in a very good fit of the data, especially the transient growth dynamics, we note that equally good fits could be certainly achieved by other linear combinations of increasingly smaller outbreaks. In the absence of any further seeding events, the model prediction for the number of Ebola virus cases in Liberia is 6,097 (standard error 55 cases) on November 1st and 8,529 (standard error 65 cases) on December 1st, with a predicted linear growth dynamic of 81±1 cases per day. Generally, our model predicts a growth dynamic that is linear after a transient exponential growth period. In the context of our modeling framework, without the simulation of secondary outbreaks, the linear growth can be understood as the depletion of susceptibles within participating communities, so that new infections occur only with the exposure of households with overlapping communities. The slope of this increase will depend in a complex way on the propagation of saturation effects from infected sources and on the connectivity of individuals between adjacent communities. Fig. 6: Extending Country Specific Model Fits for Guinea and Liberia