This study was carried out in three phases. The first step involved generating ENMs for each of the two or three inferred chimpanzee populations, which required the acquisition and preparation of chimpanzee presence data from across the study region and the processing of environmental data to define niche dimensions. The second step involved: (i) using quantitative methods to determine whether optimal habitats for the inferred chimpanzee populations differed significantly from each other, and (ii) examining which variables made the largest contributions to differences between niches occupied by each population. The final step involved examining how climate change might affect the optimal habitat of each population in the future.

Preparation of species occurrence data

Species occurrence data (Table 3 and Additional file 3) were obtained from www.ellioti.org [41] and from publications that involved sampling and/or observing wild chimpanzee populations across Cameroon and Nigeria from the late 1990s and early 2000s including both P. t. ellioti (N = 656) and P. t. troglodytes (N = 98) [48-51]. Occurrence data were compiled as geographic coordinates that indicated locations where chimpanzees were seen, heard, and/or indirect evidence of chimpanzee activity was found (nests, feeding sign, or tool use). Fecal and hair samples were shipped to the United States at ambient temperature, then stored at -20°C upon receipt. All samples were transported from Cameroon to the United States in full compliance with Convention of International Trade in Endangered Species of Wild Fauna and Flora (CITES) and Center for Disease Control (CDC) export and import regulations. Analysis of these samples was carried out with IACUC approval from the University at Albany – State University of New York.

Table 3 Species occurrence data Full size table

Duplicate occurrences with the same geographic coordinates were trimmed using ENMtools [29]. Second, an altitude map layer was created and used to trim duplicate occurrences that fell into the same grid cell of 1 km2. The remaining localities were projected in ArcMap 10 [52] for visual inspection to confirm that no more than one occurrence point fell into any one grid cell of the environmental data. Coordinates of occurrence data were then exported as a .csv formatted file for input into the Maxent software [53].

Preparation of present environmental data

Environmental data used for this study are listed in Additional file 4. These environmental predicting factors were selected to best describe the habitat exploited by chimpanzees in Cameroon and Nigeria and included: (i) climatic factors and measures of climate stress such as isothermality and temperature seasonality [54,55], (ii) topographic factors such as elevation, slope, and percent tree cover [56,57], and (iii) anthropogenic presence as measured by human population density across the study area [58]. All environmental predicting factors were based on data gathered from 1994 to 2010, which corresponds to the time range of when all occurrence data were collected. Maps of environmental variables were transformed into the WGS 1984 coordinate projection because it preserves curvilinear features of the data and keeps it from being warped since the study area is within 15 degrees latitude of the equator [59]. This coordinate system also assured that the data retain compatibility with most publically available shapefiles for future projects and applications. All environmental layers used have a resolution of 30-arcseconds (about 1 km2), which was the finest resolution available at the time of publication for these layers at this multi-country scale.

Maxent modeling under present conditions

ENMs were generated using a presence-only model implemented using the program Maxent [53]. This method was chosen for several reasons. Firstly, presence-only models, like Maxent, are useful because presence locality data are becoming more widely available for many taxa. Secondly, absence records are not widely available for chimpanzees and those that are available have often questionable accuracy due to the species’ large home ranges. Thirdly, a large comparative study has shown that the Maxent model outperforms other presence-only models such as GARP in many applications [60]. Finally, Maxent has also performed successfully in recent studies of other elusive and motile species [61-64].

The dataset of occurrence localities (described below) was divided into subsets for two- and three-populations from the inferred genetic structure shown by Mitchell et al. [27]. In the two-population model, occurrence data for P. t. troglodytes were separated from P. t. ellioti according to whether the point occurred north versus south of the Sanaga. The three-population model, included the group of presence points from P. t. troglodytes located south of the Sanaga, and the presence points from P. t. ellioti were subdivided into two groups. The first group was composed of presence points from P. t. ellioti west of the Mbam River, which is the main tributary of the Sanaga and demarcates the boundary of the ecotone. The second group of presence points was from P. t. ellioti located in the ecotone region found east of the Mbam River in central Cameroon.

Models were created using Maxent [53] with the default convergence threshold (10-5) and 100 cross-validated replicates. This cross-validation replicate process involved the random splitting of occurrence data into a number of equal-sized “folds” or groups where models were created leaving out one fold for each run. For each replicate, the excluded fold is used to evaluate the model [53].

Testing model performance

Final models were evaluated using the area under the curve (AUC), which is a value widely used to measure model performance [60,65,66]. In brief, AUC values were created by comparing model performance to a random model of associations between presence localities and environmental predicting factors [66]. AUC values range from 0.5 to 1.0; with values close to 0.5 corresponding to a model that is no better at predicting an ecological niche than a random model, and a value of 1.0 corresponds to a model with a perfect fit. Values greater than 0.9 are ”very good”, 0.7-0.9 are “good”, and less than 0.7 are “uninformative” [28].

A jackknife test was also performed using Maxent to evaluate the individual contribution of each environmental predicting factor to each model. In the jackknife test, the contribution of each factor is tracked while the model is being created. Maxent does this by creating models with one predicting factor removed at a time and compares the jackknifed model gain to the gain of the complete model with all environmental predictors included. The factors that reduce the overall gain of the model when excluded become the most important [53].

ENM comparison testing

Pairwise niche comparisons were carried out in ENMtools [29] to compare the degree of niche overlap between ENMs for both the two- and three-population models. For the three-population model, a round-robin comparison approach was implemented. For each comparison, two test statistics were calculated to estimate the degree of niche overlap: Schoener’s D [29] and the test statistic I, which was developed by Warren et al. [30]. Values of D and I are observed measurements of niche overlap that were used in the following analysis. In an ecological sense, Schoener’s D assumes that the suitability scores produced by Maxent are proportional to species abundance, whereas the test-statistic I, treats the two ENMs as probability distributions [29]. The significance of the observed D and I test statistics were evaluated in ENMtools by randomly partitioning a pooled set of occurrence data from two populations into two new datasets with the same number of occurrences as the original two populations. ENMtools then used these two new pseudo-populations to create ENMs using the Maxent algorithm. The D and I test statistics were then calculated to estimate the degree of overlap between the two new ENMs. A null distribution of values of D and I was created from 100 random pseudo-populations created using ENMtools. The observed values of D and I were then compared to the null distribution of D and I values generated by random permutation. Significant deviations of observed values from the null values indicate that the niches occupied by the two populations under consideration are divergent [29]. The observed overlap values were compared to their respective null distributions using a student t-test in R [67].

Climate change scenarios

The three different scenarios implemented in this study were A1B, A2A, and B2A (Additional file 5). The A1B scenario describes an integrated or homogenous world where economic growth is high and there is a balance between the use of fossil fuels and non-fossil fuels [32]. The A2A scenario describes a heterogeneous world with a steadily increasing human population throughout the century. The B2A scenario describes a divided world similar to the A2A scenario, but with each country or region working independently to reduce their emissions and the human population is steadily increasing throughout the century at a slower rate than the A2A scenario. These three scenarios describe a range of possible results of climate change over the next century that may play a role in the niche availability of chimpanzees in Cameroon and Nigeria.

Preparation of data for future climate modeling

In order to model the distribution of these chimpanzee populations in the future, the following are required: 1) presence localities of chimpanzees in the present time, 2) a set of environmental variables used to describe their habitat for the present time, and 3) a matching set of environmental variables for each year under each climate scenario being explored. Since some measures of the environment cannot be predicted well using climate scenarios, due to other factors such as human disturbance, the projected models of distribution for the chimpanzee populations were created using only the climatic and topographic factors summarized in Additional file 4. For each scenario, bioclimatic files were created for each year being tested. In order to obtain the best mean values for each scenario, bioclimatic files were created for a number of global climate models (GCMs) and averaged for each scenario/year combination. The GCMs used for each scenario were obtained from www.ccafs-climate.org [31]. For any given scenario created by a GCM, minimum temperature (tmin), maximum temperature (tmax), and precipitation (prec) layers were obtained. Next, these three files were used to create the set of 19 bioclimatic files following the methods of Ramirez-Villegas and Bueno-Cabrera [68]. This was performed for each GCM for each climate scenario/year combination. Finally, environmental factors from each set of GCMs for a given scenario/year combination were averaged using ArcMap 10 for use in Maxent.

Maxent modeling procedure under future climate scenarios

Modeling population distribution under climate change with Maxent is similar to modeling present distributions, and requires the same present occurrence coordinates and present environmental predictor variables [69-71]. However, modeling future climate scenarios additionally requires a matching sets of environmental variables for each time interval and climate scenario be specified for all populations under consideration. Maxent models the probability distribution for the present variables, as usual, to build a set of criteria that describes suitable habitat for the present time, and then examines future environmental variables for areas across the study area that best meet the species’ niche requirements. This analysis was completed by averaging 100 randomly-seeded replicates using the previously described cross-validation technique.