Significance Knowing where people are is critical for accurate impact assessments and intervention planning, particularly those focused on population health, food security, climate change, conflicts, and natural disasters. This study demonstrates how data collected by mobile phone network operators can cost-effectively provide accurate and detailed maps of population distribution over national scales and any time period while guaranteeing phone users’ privacy. The methods outlined may be applied to estimate human population densities in low-income countries where data on population distributions may be scarce, outdated, and unreliable, or to estimate temporal variations in population density. The work highlights how facilitating access to anonymized mobile phone data might enable fast and cheap production of population maps in emergency and data-scarce situations.

Abstract During the past few decades, technologies such as remote sensing, geographical information systems, and global positioning systems have transformed the way the distribution of human population is studied and modeled in space and time. However, the mapping of populations remains constrained by the logistics of censuses and surveys. Consequently, spatially detailed changes across scales of days, weeks, or months, or even year to year, are difficult to assess, which limits the application of human population maps in situations in which timely information is required, such as disasters, conflicts, or epidemics. Mobile phones (MPs) now have an extremely high penetration rate across the globe, and analyzing the spatiotemporal distribution of MP calls geolocated to the tower level may overcome many limitations of census-based approaches, provided that the use of MP data is properly assessed and calibrated. Using datasets of more than 1 billion MP call records from Portugal and France, we show how spatially and temporally explicit estimations of population densities can be produced at national scales, and how these estimates compare with outputs produced using alternative human population mapping methods. We also demonstrate how maps of human population changes can be produced over multiple timescales while preserving the anonymity of MP users. With similar data being collected every day by MP network providers across the world, the prospect of being able to map contemporary and changing human population distributions over relatively short intervals exists, paving the way for new applications and a near real-time understanding of patterns and processes in human geography.

Our knowledge of human population numbers and distribution for many areas of the world remains poor (1) despite their importance for policy (2, 3), operational decisions (4), and research (5–7) across many fields. In the 1990s, a growing interest in the global mapping of human populations emerged (8, 9), leading to the advanced development of methodologies that undertake the spatial downscaling of human population count data from censuses summarized over large and irregular administrative units to grid squares of 100 m to 5 km resolution (10–16). Initial efforts to downscale these data used simple areal weighting methods (10, 17) or dasymetric modeling approaches (13–15), which use ancillary layers to redistribute population counts within administrative units (18). Modeling techniques that spatially downscale population numbers into gridded datasets continue to be refined, with basic dasymetric models increasing in sophistication, incorporating multiscale remotely sensed and geospatial data and making improvements in the type of statistical algorithms used in the modeling process (19–21). These detailed population databases have proven crucial for studies reliant on information about human population distributions, typically for calculating populations at risk for human or natural disasters (22–24), to assess vulnerabilities (7, 25), or to derive health and development indicators (3, 5, 26, 27). However, despite improvements, these data still have many limitations.

Regardless of how sophisticated these methods are, they remain largely constrained by population count data from censuses that form the basis for the estimation of population distributions across large areas (10–17). Although the increasing use of global positioning and geographical information system technologies has supported the improved collection of census data and their processing, censuses remain an infrequent and expensive source of detailed population data. Moreover, for many low-income countries, the unreliability of estimates, low spatial resolution, and complete lack of contemporary data represent further limitations. These restrictions mean that the latest health indicators or estimates of populations at risk often may be based on outdated and coarse input population data (26, 28, 29), a particularly restrictive feature when accurate contemporary numbers may be required for disaster impact assessments, epidemic modeling, or conflict relief planning. Human populations are dynamic, moving daily, seasonally, and annually, resulting in rapidly changing densities. Attempts have been made to model and map these dynamics for high-income countries (20, 30), but the data streams upon which such models are based currently are unavailable to most of the world, particularly resource-poor regions.

The proliferation of mobile phones (MPs) offers an unprecedented solution to this data gap. The global MP penetration rate (i.e., the percentage of active MP subscriptions within the population) reached 96% in 2014 (31). In developed countries, the number of MP subscribers has surpassed the total population, with a penetration rate now reaching 121%, whereas in developing countries, it is as high as 90% and continuing to rise (31). MP networks, also called cellular networks, are composed of cells, i.e., geographic zones around a phone tower. Each MP communication can be located by identifying the geographic coordinates of its transmitting tower and the associated cell. This network-based positioning method is simple to implement, and its accuracy depends directly upon the network structure; the higher the density of towers, the higher the precision of the MP communication geolocalization (32). Records detailing the time and associated cell of calls and text messages from anonymous users therefore provide a valuable indicator of human presence, and coupled with the increasing use of MPs, offer a promising alternative data source for increasing the spatial and temporal detail of large-scale population datasets. Data provided by communication tools are opening up new opportunities for studying sociospatial behaviors (33–36). MP call detail records were used in the past for studying human mobility patterns at the individual level (37–39) or for mapping human movements and activities using aggregated data (40–44). Most of these studies focused on specific cities or city neighborhoods or groups, and were aimed at understanding traffic flows (40), mapping the intensity of human activities at different times (42–44), or exploring seasonality in foreign tourist numbers and destinations (45, 46).
Population movement analyses based on MP data are particularly promising for improving responses to disasters (47, 48) and for planning malaria elimination strategies (49–51). However, to date, these data have not been assessed in their capacity to map human population at fine spatial and temporal resolutions over large geographical extents.

Using Portugal and France as case studies, this study examines how aggregated MP data might be used efficiently to map population distributions at the country scale and reveal otherwise unmeasurable patterns in space and time. We also assess how such predictions compare with existing state-of-the-art downscaling methods. To facilitate widespread use, the methodologies were designed to be easy to implement while minimizing the impact of phone use and network coverage heterogeneities across social groups, regions, and network providers.

Discussion The increasing penetration of mobile phones and other information and communication tools used daily by a large proportion of the global population offers a wealth of new spatiotemporal data that are contributing to the “big data” revolution. These new data have the potential to profoundly transform the way we think about and conduct science, especially geographical analyses, as most of these data are implicitly or explicitly spatial (59, 60). In operational and governmental decisions, these data also may be valuable for supporting rapid responses to disruptive events or longer-term planning purposes. In the specific application presented here, spatially and temporally detailed population distribution datasets potentially may provide the essential denominator required in many fields, such as studying collective human responses to disease outbreaks (61, 62), emergencies (63, 64), or any application for which information on daily, seasonal, or annual changes in population distribution is useful. This study demonstrates how the analysis of MP data that are collected readily every day by phone network providers can complement traditional census outputs. Not only can population maps as accurate as census data and existing downscaling methods be constructed solely from MP data, but these data offer additional benefits in terms of measuring population dynamics. Further, as highlighted in SI Appendix, section A.3, a combination of both the MP and RS methods facilitates the improvement of both spatial and temporal resolutions and demonstrates how high-resolution population datasets can be produced for any time period. In countries where detailed human population census data are available at high resolution, the main value added is not so much in the gain in spatial resolution, but more in the ability to estimate population numbers and densities at high spatial resolution for any time period. 
This ability allows us to follow how population distribution changes through time in relation to the week, the season, or any particular event affecting populations over large spatial extents. The relevance of the MP approach is even greater in low-income countries where population distribution data may be scarce, outdated, and unreliable. In Africa, great variation exists in the quality of spatially referenced population data. In Malawi for example, censuses have been performed once per decade for the past three decades and data are readily available at the level of enumeration areas (i.e., administrative units of 9.38 km² on average). In contrast, in the Democratic Republic of the Congo (DRC), the most recent census was undertaken in 1984 and data are available only at the level of territories (i.e., administrative units of 12,466 km² on average). However, in the DRC, the MP penetration rate, although biased toward certain demographic groups, is relatively high [69% on average by the end of 2014 in Africa (31)], and the MP approach would produce considerable improvements in current knowledge of how population is distributed in the country. Even if at present the most remote and isolated populations may not have reception in some low-income countries, possibly affecting the ability to produce a comprehensive countrywide map, network coverage continues to grow at a rapid rate everywhere. Applying the approach to countries such as the DRC, where reliable training data may not be available, requires some adjustments and assumptions, particularly regarding the relation between the MP user density and the population density, through estimates for the parameters α and β. This relation indeed may vary among and within countries according to the penetration rate of the network operator and phone use behaviors.
Network access costs and cultural differences among countries may, for instance, result in communication via text messages being preferred over calls in some countries. Such differential phone use among countries might largely be accounted for by adjusting total populations by using national population counts. A further complication is that phone use and penetration rates rarely are uniform within countries. In France, the general penetration rate varies from 62.8% in the Franche-Comté region to 117.9% in Ile-de-France, according to the Autorité de Régulation des Communications Electroniques et des Postes (www.arcep.fr; accessed February 2, 2014). Such regional MP ownership information generally is available either from independent bodies such as regulators or phone operators themselves, or may be estimated through national household surveys, such as the Demographic and Health Surveys (dhsprogram.com; accessed April 1, 2014), and gives a first indication of potential phone use variations among regions. The spatially stratified cross-validation procedure used here enables assessment of the impact of regional variations on model parameters in Portugal (SI Appendix, section B) and France (SI Appendix, section C.4), as well as the impact of such variation on population mapping accuracies (SI Appendix, section B.3). Spatial variations in phone use behaviors also may be the result of economic, social, demographic, or cultural characteristics that may be spatially clustered, therefore biasing population density estimates. Although a complete analysis of such potential biases is beyond the scope of this study, here we showed that phone use behaviors were relatively stable across space and time in Portugal and that a large part of the variation is correlated with population density and therefore is captured by the coefficient β (SI Appendix, section C.3).
To be applied widely and to facilitate the acquisition of MP data, the method outlined here may be simplified by using the density of phone calls instead of the density of different users over a certain time window. Even if the resulting population density datasets are marginally less accurate, this approach allows the method to become independent from user identifier data and further reduces privacy concerns (SI Appendix, section C.2). Similarly, using daily-aggregated data instead of night data again reduces the accuracy of estimates marginally, although notably simplifying the acquisition and processing of MP data. The observed robustness of the MP method offers promise for extension of the mapping to other countries and network providers. However, applying the method to low-income countries where penetration rates are increasing rapidly but still exclude an important fraction of the population would require further sensitivity analyses of the impact of phone use inequalities, especially as marginalized populations also are the most vulnerable to disasters, outbreaks, and conflicts. Mobility estimates in Kenya were found to be surprisingly robust to the substantial biases in phone ownership across different geographical and socioeconomic groups (65), but these results would need to be confirmed for population density estimates. Mobile phone call data records are collected constantly by network providers, but the potential of such data is demonstrated only sporadically. A wider use of such data currently is impeded principally by privacy and data access concerns. The use of call data records does raise important privacy concerns linked to fundamental questions of personal freedom and ethics. Studies of individual mobility patterns provide little anonymity, as the movements of individuals can be reconstructed in time and space, even if spatially and temporally coarsened datasets are used (66). 
Here, by using only phone call activity aggregated by towers, neither individual data nor connections between towers are used, guaranteeing the privacy of MP users. Facilitated access to anonymized and aggregated forms of these data would greatly improve our knowledge of human population distributions and movements. Network providers sometimes are reluctant to share their data because of privacy and marketing concerns. However, this study has shown that aggregated and anonymized MP data might cost-effectively provide accurate maps of population distribution for every country in the world for every month. Partnerships between governments and phone companies supported by appropriate incentives might enable fast and cheap production of population maps in emergency contexts, enabling rapid assessments of populations at risk or those affected by disasters, disease outbreaks, or conflict.

Materials and Methods MP and Population Data. Two large datasets of MP calls obtained from major carriers in Portugal and France were used as proxies for population activity in the countries. Datasets cover the following periods: July to August 2007 and November 2007 to June 2008 (10 mo) for Portugal and May to October 2007 (5 mo) for France. Both datasets contain more than a billion calls from 2 million users in Portugal (∼20% of the total population) and 17 million users in France (∼30% of the total population). According to the operators, their penetration rates were uniform over the country at the time. Only calls were considered here; text messages were excluded. MP contracts from companies were removed from both datasets to include only MP contracts of individuals. For each call, the originating and receiving towers and the day the call was made were obtained. In addition, the time the call was made and a user identifier were available for Portugal only. All data used in this study can be obtained for the replication of results by contacting the corresponding author and are subject to the mobile phone carrier's nondisclosure agreement. Census population data were obtained from the National Institute of Statistics of Portugal for 2011 (www.ine.pt; accessed January 30, 2014) and from the National Institute of Statistics and Economic Studies of France for 2007 (www.insee.fr; accessed January 30, 2014). Census population data were matched to administrative units with identifier codes. For both countries, the finest administrative unit level available (ADM-5) was used, which corresponds to “Freguesias” in Portugal (n = 2,882) and “Communes” in France (n = 36,610). The spatial resolution of administrative units is similar in France and Portugal, with average spatial resolutions (i.e., square root of the land area divided by the number of administrative units) of 3.9 km and 5.6 km, respectively. Mapping People Based on MP Data. 
For each MP tower $j$ in Portugal, we know the total number of different users $T_j$ who made or received phone calls from/to that tower. When one makes a phone call, the network usually identifies nearby towers and connects to the closest one (67). The coverage area of a tower $j$ thus was approximated by using a Voronoi-like tessellation (68). The Voronoi polygon associated with tower $j$ is denoted $v_j$. The MP user density of the polygon $v_j$, denoted as $\sigma_{v_j}$, then is equal to $T_j / A_{v_j}$, where $A_{v_j}$ is the area of the Voronoi polygon corresponding to tower $j$. An illustration of these polygons derived from MP towers is given in SI Appendix, section A.2. The estimation of the population density for an administrative unit $c_i$ based on the MP user density $\sigma_{v_j}$ is a two-step method. First, the nighttime (i.e., from 8:00 PM to 7:00 AM) MP user density $\sigma_{c_i}$ for $c_i$ is computed with the following equation: $\sigma_{c_i} = \frac{1}{A_{c_i}} \sum_{v_j} \sigma_{v_j} A_{(c_i \cap v_j)}$, [1] where $A_{c_i}$ is the area of administrative unit $c_i$ and $A_{(c_i \cap v_j)}$ is the intersection area of $c_i$ and the Voronoi polygon $v_j$. Second, nighttime MP user density values $\sigma_{c_i}$ assigned to each administrative unit were compared with baseline census-derived population densities available in a training set, denoted as $\rho_{c_i}$. Our approach is modeled as follows: $\rho_c = \alpha \sigma_c^{\beta}$, [2] where $\rho_c = [\rho_{c_1}, \rho_{c_2}, \ldots, \rho_{c_n}]$ and $\sigma_c = [\sigma_{c_1}, \sigma_{c_2}, \ldots, \sigma_{c_m}]$. The parameter $\alpha$ represents the scale ratio and $\beta$ the superlinear effect of population density $\rho_c$ on the nighttime MP user density $\sigma_c$. This can be transformed to $\log(\rho_c) = \log(\alpha) + \beta \log(\sigma_c)$, where a standard linear regression model with population-weighted least squares was applied to estimate the two parameters $\alpha$ and $\beta$. The variability of $\alpha$ and $\beta$ was assessed using standard and spatially stratified cross-validation procedures (SI Appendix, section B.1).
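As an illustration only, the two steps above (Eq. 1 and the log-linear form of Eq. 2) can be sketched in plain Python. The function names, the toy numbers, and the choice of weights are assumptions made for this sketch and are not taken from the study; in particular, the weights are assumed to stand for each unit's census population, and the intersection areas are assumed to have been computed beforehand with a GIS tool.

```python
import math

def unit_user_density(unit_area, overlaps):
    """Eq. 1: area-weighted MP user density of one administrative unit.

    overlaps: list of (sigma_vj, intersection_area) pairs, one per Voronoi
    polygon intersecting the unit (areas assumed precomputed).
    """
    return sum(sigma * area for sigma, area in overlaps) / unit_area

def fit_power_law(sigma, rho, weights):
    """Weighted least squares of log(rho) on log(sigma), i.e. Eq. 2 in log form.

    Returns (alpha, beta). Units with zero density must be excluded
    beforehand, because log(0) is undefined.
    """
    x = [math.log(s) for s in sigma]
    y = [math.log(r) for r in rho]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    beta = s_xy / s_xx                      # slope of the log-log line
    alpha = math.exp(y_bar - beta * x_bar)  # intercept back-transformed
    return alpha, beta

# Hypothetical unit of 10 km^2 overlapped by two Voronoi polygons.
sigma_ci = unit_user_density(10.0, [(50.0, 4.0), (20.0, 6.0)])

# Synthetic training data generated from arbitrary values alpha=3, beta=0.8.
sigma = [1.0, 10.0, 100.0, 1000.0]
rho = [3.0 * s ** 0.8 for s in sigma]
alpha, beta = fit_power_law(sigma, rho, weights=[1.0, 2.0, 3.0, 4.0])
```

Because the synthetic data follow the power law exactly, the fit recovers the generating parameters whatever weights are used; with real data the weights would pull the fitted line toward the most populous units.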
Nighttime population densities $\tilde{\rho}_c$ of all administrative units were estimated using Eq. 2, and the total population approximation $\hat{P}$ was extracted. Nighttime population densities $\tilde{\rho}_c$ then were adjusted to make the total estimated population match the census-derived national population $P$: $\rho_c = \frac{P}{\hat{P}} \alpha \sigma_c^{\beta}$. [3] Comparison with the RS Method. To assess the accuracy and precision of the MP method described above, we produced a nighttime population map based on a recently developed dasymetric modeling approach that incorporates a wide range of remotely sensed and geospatial data (called the RS method in this paper; SI Appendix, section A.1). Ancillary data layers were used, including the Corine Land Cover 2006 dataset (69), OpenStreetMap-derived infrastructure (70), satellite nightlights (71), and slope (72), among others (19). The method combines data in a flexible “Random Forest” model to generate gridded predictions of population density at ∼100 m spatial resolution (SI Appendix, section A.1) (19). Analyses have shown that this algorithm produces improved mapping accuracies compared with previous approaches (19). The output prediction layer was used as the weighting surface to perform dasymetric redistribution of the census counts at a country level as follows (SI Appendix, section A.2): $\rho_i^{RS} = \frac{w_i}{\sum_j w_j} P$, [4] where $\rho_i^{RS}$ is the population density in pixel $i$ estimated by the RS method, $w_i$ is the weight assigned to pixel $i$, and $P$ is the total population. For comparative purposes, the same spatially stratified training dataset (“Norte” region) was used to estimate nighttime population densities in both the MP and RS methods.
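The national adjustment of Eq. 3 and the dasymetric redistribution of Eq. 4 can be sketched as follows. This is a minimal illustration with hypothetical function names and toy values; it assumes that the total population approximation is obtained by summing density times area over all administrative units, which the text does not state explicitly.

```python
def rescale_to_national_total(rho_tilde, areas, national_population):
    """Eq. 3: scale unit densities so the implied total matches P.

    Assumption of this sketch: P_hat = sum(density * area) over all units.
    """
    p_hat = sum(r * a for r, a in zip(rho_tilde, areas))
    factor = national_population / p_hat
    return [r * factor for r in rho_tilde]

def dasymetric_redistribute(weights, total_population):
    """Eq. 4: share the total among pixels in proportion to their weights.

    Returns one value per pixel; with equal-area pixels this is a density
    expressed per pixel.
    """
    w_sum = sum(weights)
    return [w / w_sum * total_population for w in weights]

# Toy example: three units, unadjusted densities in people per km^2.
rho_tilde = [100.0, 50.0, 10.0]
areas = [2.0, 4.0, 10.0]          # km^2, implied total = 500 people
rho_adj = rescale_to_national_total(rho_tilde, areas, 1000.0)
adjusted_total = sum(r * a for r, a in zip(rho_adj, areas))

# Toy example: three pixels with hypothetical model weights.
pixel_values = dasymetric_redistribute([1.0, 3.0, 6.0], 1000.0)
```

In both cases the national total is preserved by construction: the rescaling changes the level of the estimates but not their relative spatial pattern, and the redistribution changes only how a fixed total is shared among pixels.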
To assess the precision and the accuracy of the different population downscaling methods, we extracted the average nighttime population density within each of the finest level census units (ADM-5) as estimated by both methods and compared it with the baseline census-derived population densities ($\rho_c$) within each unit by using the Pearson product–moment correlation coefficient (r) and the root-mean-square error (RMSE). Extrapolation Capacity. To further explore the stability of population density estimates derived from MP data and the capacity of extrapolation to data-scarce countries, the method was applied to the France dataset. Here, only the daily aggregated phone call activity at each tower was used, without any individual information and without the time of phone calls. This approach had two benefits: (i) it ensured that our population density estimation method required only data that were collected readily and stored by network providers for billing purposes and (ii) the privacy of network customers was preserved further. Uncertainties associated with the use of phone call densities instead of user densities and daily-aggregated MP data instead of nighttime MP data are evaluated in SI Appendix, section C.2. Dynamic Mapping of Population Distributions. Temporal dynamics were derived from MP data by using the timestamp associated with each MP call. Daily dynamics were analyzed by dividing the MP data into calls performed during the day (7:00 AM to 8:00 PM) and the night (8:00 PM to 7:00 AM). Weekly dynamics were analyzed by dividing the MP data into calls performed during weekdays (Monday to Friday) and calls performed during weekends (Saturday and Sunday). Seasonal dynamics were analyzed by dividing MP data into calls performed during the holiday period (July and August) and calls performed during working periods (all other months).
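The three temporal splits described above can be expressed as simple classification rules on a call timestamp. The sketch below uses hypothetical function names, and the treatment of the exact boundary instants (7:00 AM counted as day, 8:00 PM counted as night) is an assumption, since the text does not specify it.

```python
from datetime import datetime

def day_or_night(ts):
    """Day = 7:00 AM to 8:00 PM; night = 8:00 PM to 7:00 AM.

    Assumption: 7:00:00 is counted as day and 20:00:00 as night.
    """
    return "day" if 7 <= ts.hour < 20 else "night"

def weekday_or_weekend(ts):
    """Monday to Friday = weekday; Saturday and Sunday = weekend."""
    return "weekend" if ts.weekday() >= 5 else "weekday"

def holiday_or_working(ts):
    """July and August = holiday period; all other months = working period."""
    return "holiday" if ts.month in (7, 8) else "working"

# Two hypothetical call timestamps.
call_a = datetime(2007, 7, 14, 21, 30)   # a Saturday evening in July
call_b = datetime(2008, 3, 12, 9, 0)     # a Wednesday morning in March

labels_a = (day_or_night(call_a), weekday_or_weekend(call_a),
            holiday_or_working(call_a))
labels_b = (day_or_night(call_b), weekday_or_weekend(call_b),
            holiday_or_working(call_b))
```

Grouping call records by these labels before aggregating them by tower would give one MP density surface per time period.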
Predicted population densities for each unit and for both time periods were computed using best-fit α and β estimates, and relative differences between the two time periods were extracted.
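A minimal sketch of this last step is given below. The function names and parameter values are hypothetical, and the definition of the relative difference as the change from the first period expressed as a fraction of the first period is an assumption of this sketch; the text does not give the exact formula.

```python
def predict_density(sigma, alpha, beta):
    """Population density predicted from MP density via Eq. 2."""
    return alpha * sigma ** beta

def relative_difference(rho_first, rho_second):
    """Change from the first to the second period, as a fraction of the first.

    Assumption: (second - first) / first; other normalizations are possible.
    """
    return (rho_second - rho_first) / rho_first

# Arbitrary illustrative parameters, not the fitted values from the study.
alpha, beta = 2.0, 0.5

# Hypothetical MP densities of one unit in two time periods.
rho_working = predict_density(25.0, alpha, beta)    # 2 * sqrt(25)
rho_holiday = predict_density(100.0, alpha, beta)   # 2 * sqrt(100)
change = relative_difference(rho_working, rho_holiday)
```

A positive value indicates that the unit gains population in the second period relative to the first, and a negative value indicates a loss.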

Acknowledgments We thank three anonymous referees for their useful comments on an earlier version of this paper. P.D., C.L. and M.G. are supported by the Fonds National de la Recherche Scientifique (FNRS); part of this work was supported by the FNRS (PDR T.0073.13). A.J.T. is supported by funding from the NIH/National Institute of Allergy and Infectious Diseases (U19AI089674), the Bill & Melinda Gates Foundation (OPP1106427,1032350), and the Research and Policy for Infectious Disease Dynamics program of the Science and Technology Directorate, Department of Homeland Security, and Fogarty International Center, NIH. This work forms part of the WorldPop Project (www.worldpop.org.uk) and Flowminder Foundation (www.flowminder.org).

Footnotes Author contributions: P.D., C.L., S.M., M.G., V.D.B., and A.J.T. designed research; P.D. and C.L. performed research; F.R.S. and A.E.G. contributed new reagents/analytic tools; P.D., C.L., and S.M. analyzed data; and P.D., C.L., M.G., and A.J.T. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1408439111/-/DCSupplemental.