Analysis outline

Focusing on the Zaire Ebola virus (EBOV), we use a stochastic epidemiological compartmental model29, to simulate both pathogen spill-over and subsequent human-to-human disease transmission (Fig. 2). Within grid cells (0.0416°—5.6 km at equator) covering continental Africa, we used a Susceptible, Exposed, Infectious, Funeral and Removed (SEIFR) EVD-EBOV disease compartmental model following13,19,23 to estimate the number of individuals per compartment, in each time step t, for present day bioclimatic, land use and demographic conditions. Although some previous compartmental models for EBOV have included a Hospital compartment51, adding this complexity was not feasible over large and poorly known geographical areas. Without knowing more about the spatial variation in health seeking behaviour, exactly which grid cells contain clinics, and the variation of healthcare resources in these clinics, adding in this compartment would not likely significantly improve our model’s ability to predict the progression of outbreaks. Furthermore, hospital interventions had the least impact controlling EVD outbreaks in a recent meta-analysis (24). Grid cell size was chosen as the highest resolution at which computation was feasible while being able to use a non-stochastic human movement model to approximate contacts per cell (see details below). All mapping and analyses were carried out in R v.3.2.252. Each stage of the EMM simulation is discussed in more detail below:

Stage 1: SEIFR compartmental model within grid cells

We used starting EBOV transmission characteristics of incubation time = 7 days, onset of symptoms to resolution = 9.6 days, maximum case fatality rate for very low income countries (CFR) σ = 0.78, and burial time = 2 days23 to parameterize the SEIFR compartmental model to determine transition rates α (between Exposed to Infectious compartments), γ σ (Infectious to Funeral), γ 1–σ (Infectious to Removed), and γ F (Funeral to Removed) (Fig. 2). To incorporate sensitivity around these transmission parameters, we allowed values to vary for each simulation run by sampling from a Gaussian distribution where the mean was their initial value and the standard deviation was fifth of the mean, to give a reasonable spread of values. For each time step t, the number of individuals moving between all compartments was estimated by drawing randomly from a binomial distribution (Supplementary Equation 1), parameterized using the respective compartmental transition rates. Transition rates for compartments were assumed to be the same in all grid cells except for the transition between Susceptible to Exposed. The per grid cell Susceptible to Exposed transition rates were determined by the force of zoonotic infection λ z , and the force of infection λ (Fig. 2) and these were calculated as follows:

(a) Force of Zoonotic Infection, λ z . The force of infection for zoonotic transmission λ z , per time step t, was estimated as the product of the probability of host presence H, and spill-over rate κ (Supplementary Equation 2). Without any evidence to the contrary15,53, we parameterized H by calculating the spatial probability of the presence of the most likely EBOV reservoir host species based on available data (Old World fruit bat species Epomophorus gambianus gambianus, Epomops franqueti, Hypsignathus monstrosus, and Rousettus aegyptiacus see Supplementary Table 1) within each grid cell across the African continent using species distribution models (SDMs)54 and assuming constant pathogen prevalence. We also calculated the spatial probability of the presence of other species which are known to provide an alternative route of infection, but likely do not act as reservoirs (Gorilla spp., Pan spp., and Cephalophus spp.)12. SDMs for each species were inferred using boosted regression trees (BRT) using distribution data from the Global Biodiversity Information Facility (GBIF)55 and 11 present day bioclimatic and land use variables (Supplementary Table 2). Data with coarse scale GBIF spatial coordinates (decimal degree coordinates with less than four decimal places) were filtered out of the analysis. To reduce spatial autocorrelation and duplicate records, any records that co-occurred in the same grid cell were removed. Lastly, GBIF records older than 1990 were discarded to ensure samples more closely matched the current landscapes. BRT tree complexity was set at 5 reflecting the suggested value and the learning rate was adjusted until >1000 trees were selected56. A total of 25 models were estimated for each species using four-fifths of the distribution data as a training dataset and one fifth as a testing dataset, chosen randomly for each model. Those with the highest predictive ability (high area under operating curve, AUC—all models AUC > 0.9) were selected as the best model for each species (Supplementary Fig. 2). The most important spatial variables determining distributions across the different reservoir host species were BIO7 Temperature Annual Range, BIO13 Precipitation of Wettest Month, BIO2 Mean Diurnal Temperature Range and Land Use-Land Cover (Supplementary Fig. 3). The outputs from all putative reservoir (bat) species were combined into a single value representing the probability of any reservoir species being present and a similar approach was taken for the non-reservoir host species. First, we assume that complete mixing occurs within each grid cell and that if secondary host species are present in a cell they meet the presumed primary hosts. Then we calculate one minus the probability of each host presence, multiplied across all species, giving us the probability of there being no species present in any given cell. One minus this value gives the probability of at least one species being present. We assumed that bats were the default, fundamental component for transmission (i.e., acting as the ‘working’ reservoir) and, given complete mixing in each cell, apes and duikers were a secondary, less common route of human infection. Since roughly a third of index cases have been attributed to ape/duiker host spill-overs10, in cells with both groups we down-weighted the probability of the ape/duiker occurrence by two thirds and reservoir occurrence by one third when combining the two lots of layers together, though, we note, due to the similar habitat requirements a very high majority of the cells containing duikers and apes also had high probability for bats so the precise value here will have a limited impact. The final resulting probability was bounded by zero and one. Additionally, as EBOV presence in non-reservoir host species is impossible without the presence of reservoir hosts, cells with a reservoir host probability of zero were given a value of zero irrespective of the non-reservoir host score. For computational simplicity, we assume that all human individuals have equal chance of exposure to infected host species. The initial value used for spill-over rate κ, per time step t, was estimated from the number of historic outbreaks O (defined here as distinct clusters of cases) taken from empirical EBOV outbreak data12, and the number of historically susceptible individuals S h (inferred from human population estimates from 1976 to 2015 from36 (see Supplementary Equation 3). During each simulation run, κ was allowed to vary using the same method as for the compartmental transmission parameters above.

(b) Force of Infection, λ. The force of infection for human-to-human transmission λ per time step t, was estimated as the product of the effective contact rate β, and the number of individuals that can transmit the disease in each relevant compartment (‘Infectious’ and ‘Funeral’) per grid cell (Supplementary Equation 4). We assumed that β for the Infectious and Funeral compartments was equivalent, due to the simplicity of the movement model used as we would not be able to reasonably differentiate between the two compartments and maintain computational efficiency. Indeed, with the contact rates of moving individuals from the Infectious compartment being offset by aggregations of individuals at funerals it is not clear if there would be a large difference when approximated using a gas model. We estimated the effective contact rate β, as the basic reproduction number R 0 divided by the product of the total number of individuals N, and infectious duration D (the sum of Infectious and Funeral compartment time, 11 days taken from23). As a starting value for R 0 we used a value of 1.757 and this was allowed to vary per simulation run using the same method as for the compartmental transmission parameters above. As per previous research29, we incorporated spatial variance in contact rates among grid cells using a weighting factor m, whereby the effective contact rate in grid cells with greater than expected contact rates was increased and decreased where fewer contacts were predicted (Supplementary Equation 5). We estimated m by creating an ideal free gas model of human movement within each grid cell and approximated collision frequency per person per day, using the following: the total individuals in each grid cell (estimated from Gridded Population of the World v358, an individual interaction sphere of radius 0.5 m, and the per person, daily walking distances in metres vΔt, where v is walking velocity, and Δt equals time period (Supplementary Equation 6). To capture geographic variation in human movement patterns, each grid cell was assigned a value for per person daily walking distance, based on the empirical relationship between daily walking distances and per person per country Gross Domestic Product (measured as Purchasing Power Parity from36) (Supplementary Table 3). As the availability of mass transit as an alternative to walking tends to be centrally controlled, we assumed that grid cells in each country had the same value.

Under observed conditions, the effective reproduction number R e decays over time as both efforts are made to control disease spread and the pool of susceptible reduces, which results in R 0 being equal to R e only when time t is zero. Therefore, to calculate effective contact rate β, we allowed R e to decay per time step t (Supplementary Equations 7, 8 and 9). However, countries that can invest more in health infrastructure (e.g., barrier nursing, surveillance) should see a more rapid reduction in R e over time compared to countries that do not have such infrastructure and also a concomitantly, a decrease in CFR. Therefore, we derived an empirical estimate of the relationship between wealth (measured using GDP-PPP per capita) and both the relative rate of decay of R e over time (Supplementary Equation 10) and CFR (Supplementary Equation 11). Using a spatially disaggregated poverty data layer59 we weighted the per grid cell per time step R e reduction and CFR accordingly to the values in each grid cell. While we found the relationship between wealth and both R e and CFR reduction over time to be best described using curves with exponents of −0.08 and −0.02, respectively, this was inferred using relatively few data points (Supplementary Table 4). In our simulation runs, therefore, we allowed these exponents to vary similarly to the parameters above, to allow either more linear declines or deeper curves to best estimate the true impact of this relationship.

Stage 2: SEIFR compartmental model between grid cells

We allowed those individuals that had contracted EBOV to travel between grid cells (specifically individuals in Exposed and Infectious but not Funeral compartments) (Fig. 2), but assumed for simplicity that the overall net movement of susceptible individuals between cells was zero. As previously supported with empirical data, we employed a movement model that was weighted by both geographic distance and human density30,32, and was also geographically constrained to known transportation routes. The transmission rate ε, of individuals between target compartments of different grid cells was estimated by two different methods: between grid cells along road networks ε r , and along flight routes ε f . We sampled randomly, from a binomial distribution, the number of travellers per grid cell and time step t (Supplementary Equation 1) with the probability of travel by road per day ε r , being proportional to the distance to the nearest road using the Global Roads Open Access Data Set (Global Roads Open Access Data Set from60). Global roads dataset contains in total 585413 routes from tracks to multi-lane highways and has been extensively validated for Africa60. We allowed travellers to move freely (agnostic to any particular transportation method or country boundary) across the continent up to 10 road junctions in any directionalong the road network starting from the centroid of the target cell (Global Roads Open Access Data Set from60), giving a potential of up to 500 km of linear travel per time step. Each proposed travel end point was given an individual probability from the daily distance travelled probability curve from Fig. 2f of ref. 61, which is derived from transport data and validated against mobile phone data. For air travel, we set the potential pool of travellers as the individuals in grid cells containing airports across the world from Open Flights Airport Database62, plus all the Exposed individuals in the 8 grid cells surrounding each airport grid cell. We sampled randomly from a binomial distribution the number of travellers per grid cell and time step t (Supplementary Equation 1) with the probability of travel by air per day ε f , being proportional to the total number of flights per day divided by the population of that country36. We allowed travellers to move up to 2 edges on the current airline routes from airport origin using62. This approximates a traveller taking either a one or two-legged journey. Final destinations were sampled at random, based on all potential air routes having equal priority, but in most cases potential destinations were located nearby, which by default meant that more distance travel was less likely than travel to a nearby location. We decided to keep each edge on the network as equal likelihood due to a lack of comprehensive and detailed information that we could find on passenger numbers at the time of modelling. For both road and air travellers, individuals were then added to the correct compartment of their final destination in the new grid cell and removed from the same compartment of the original source grid cell.

Stage 3: Impact of future anthropogenic change

(a) Future force of zoonotic infection λ z . We recalculated values of the force of zoonotic infection λ z , by estimating the probability of EBOV host presence, H 2070 under several different future integrated scenarios that incorporate projections of bioclimatic and land use variables (Supplementary Table 2). Estimates of bioclimatic variables for 2070 were based on the HADGem2-AO climate model37 under three Representative Concentration Pathways: CAM-RCP4.5, AIM-RCP6.0, and MESSAGE-RCP8.539,63,64. These different options were, specifically: (i) GCAM-RCP 4.5 (High Climate Mitigation)—stabilization scenario in which total radiative forcing is stabilized shortly after 2100, (ii) AIM-RCP 6.0 (Low Climate Mitigation)—stabilization scenario in which total radiative forcing is stabilized shortly after 2100, without overshoot, by the application of a range of technologies and strategies for reducing greenhouse gas emissions (iii) MESSAGE-RCP8.5 (Business as Usual Emissions)—worsening scenarios with ongoing increasing, unchecked, greenhouse gas emissions over time, leading to high greenhouse gas concentration levels. Although we only used a single overall climate model (HADGem2-AO) due to computational constraints, this model offers good agreement with other alternative models65 and a ‘middle of the road’ option in terms of realised changes to future host distributions compared to 32 other models (Supplementary Fig. 7; Supplementary Table 5). To estimate host presence probability in the future we needed to predict fine-scale future habitat data under the RCP scenarios. As only coarse land-use categorisations are currently available66 with, for instance, a ‘primary’ land-use having a wide variety of possible natural habitats from arctic tundra to tropical rainforest, we therefore separately empirically estimated future land use-land cover (LULC) change using the spatiotemporal patterns contained within the MODIS land-cover time series35. For each grid cell, at the same grid cell resolution as set out above, we calculated the frequency of each LULC change seen in the 2001–2012 MODIS dataset for the surrounding square of 25 grid cells around each grid cell. We converted this frequency to a probability by dividing by the total possible number of changes, and based on these probabilities, we simulated yearly LULC change across the region of interest for each grid cell from 2012 until 2070, and ran this simulation 100 times to create a bank of future possible landscapes. These were then summarized into three consensus landscapes representing low (with anthropogenic changes rejected where possible), medium (by choosing the majority consensus across all 100 runs) and high anthropogenic change, (more anthropogenic changes (e.g. urban, cropland, mosaic cropland) were preferentially chosen across the landscape) and we aligned these three scenarios to SSP1, SSP2 and SPP3, respectively (for scenario details see below)38,39.

(b) Future force of infection λ. Using predicted human demographic variables and poverty levels for 2070, we recalculated values for the force of infection λ, by estimating the number of individuals per grid cell, n and effective reproduction number, R e . We inferred human population estimates per grid cell for 2070 by using the Gridded Population of the World v458 for present day and multiplying each cell by the expected future proportional change over that time period predicted by three Shared Socio-economic Pathways: SSP1, SSP2 and SSP3. Specifically, these pathways represent: (i) SSP1 (Sustainable Development)—high regional cooperation, low population growth due high education and high GDP growth, (ii) SSP2 (Middle of the Road Development)—a ‘processes as usual’ scenario with ongoing levels of population growth and wealth, with medium estimates for both these by 2070, and (iii) SSP3 (Regional Rivalry Development)—regional antagonism, high population growth, unsustainable resource extraction and low economic growth. Future poverty estimates per country were similarly inferred using a spatially-disaggregated GDP layer59 multiplied by the expected change in per country GDP over the time period as predicted by the SSP integrated scenario, with any future changes in wealth interacting with the R e parameter to affect disease epidemiology accordingly. We note that as our travel probability is defined per person, increasing future populations will see a proportional increase in the amount of both road and air travel, though with unknown patterns of future trade and travel we kept the travel network the same shape.

(c) Comparison of simulation runs. We reran the EMM simulations under 5 plausible combinations of 2070 future environmental and socio-economic scenarios of global change and greenhouse gas concentrations, matching greater greenhouse gas emissions to less progressive and less cooperative socio-economic scenarios as follows: GCAM-RCP4.5/SSP1 (High Climate Mitigation + Sustainable Development), GCAM-RCP4.5/SSP2 (High Climate Mitigation + Middle of the Road Development), AIM-RCP6.0/SSP2 (Low Climate Mitigation + Middle of the Road Development), AIM-RCP6.0/SSP3 (Low Climate Mitigation + Regional Rivalry Development), MESSAGE-RCP8.5/SSP3 (Business as Usual Emissions + Regional Rivalry Development)38,39,64,67. For each of the six scenarios we aimed for 2500 runs of 365 days, each day measuring the number of spill-overs, the number of secondary cases associated with each spill-over, and the geographical areas affected. This allowed us to measure likelihood of spill-overs leading to small, medium and very large outbreaks, and also to determine the geographical areas with the highest risk of experiencing cases. We also noted the destination of any flights out of Africa that contained infected people.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.