Abstract OpenStreetMap, a crowdsourced geographic database, provides the only global-level, openly licensed source of geospatial road data, and the only national-level source in many countries. However, researchers, policy makers, and citizens who want to make use of OpenStreetMap (OSM) have little information about whether it can be relied upon in a particular geographic setting. In this paper, we use two complementary, independent methods to assess the completeness of OSM road data in each country in the world. First, we undertake a visual assessment of OSM data against satellite imagery, which provides the input for estimates based on a multilevel regression and poststratification model. Second, we fit sigmoid curves to the cumulative length of contributions, and use them to estimate the saturation level for each country. Both techniques may have more general use for assessing the development and saturation of crowd-sourced data. Our results show that in many places, researchers and policymakers can rely on the completeness of OSM, or will soon be able to do so. We find (i) that globally, OSM is ∼83% complete, and more than 40% of countries—including several in the developing world—have a fully mapped street network; (ii) that well-governed countries with good Internet access tend to be more complete, and that completeness has a U-shaped relationship with population density—both sparsely populated areas and dense cities are the best mapped; and (iii) that existing global datasets used by the World Bank undercount roads by more than 30%.

Citation: Barrington-Leigh C, Millard-Ball A (2017) The world’s user-generated road map is more than 80% complete. PLoS ONE 12(8): e0180698. https://doi.org/10.1371/journal.pone.0180698 Editor: Mohammad Ali, Johns Hopkins Bloomberg School of Public Health, UNITED STATES Received: August 18, 2016; Accepted: June 20, 2017; Published: August 10, 2017 Copyright: © 2017 Barrington-Leigh, Millard-Ball. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The primary data are publicly available without special privileges from the OpenStreetMap Foundation (https://planet.openstreetmap.org/planet/full-history). Other data sources are also all globally publicly available without special access, and are also included or linked in our Supporting Information files. Supporting information and data for this paper are available at http://sprawl.research.mcgill.ca/publications/PLOS2017roads and http://github.com/cpbl/osm-completeness. Anyone with trouble accessing data or software can contact either author directly. Funding: This work was funded by Social Sciences and Humanities Research Council of Canada Award Number: 435-2016-0531 | Recipient: Christopher Barrington-Leigh; Hellman Fellows Program Award Number: None | Recipient: Adam Millard-Ball; and UC Santa Cruz Committee on Research Award Number: None | Recipient: Adam Millard-Ball. None of these funders had any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction The world’s roads, and their extent and spatial distribution, have enormous implications for economic growth, urban development patterns, access to natural resources, and global climate change. Road transportation accounts for more than 80% of passenger travel [1] and nearly 20% of greenhouse gas emissions from fuel combustion [2]. Moreover, roads represent one of the most permanent commitments to how and where we will live in the future [3]. Accessible, complete, and accurate geospatial data on the world’s road network are therefore valuable not just for trip planning and navigation, but also for understanding questions as diverse as the drivers of deforestation [4] and urban poverty reduction [5]. Yet until recently, no global map nor global accounting of these roads existed. Google Maps and similar proprietary products do not permit geospatial analyses such as calculating road lengths. A new effort to map global roads—the Global Roads Open Access Data Set—meanwhile focuses only on inter-urban roads, and does not cover city streets [6]. Even basic cross-national data on the length of roads are lacking. One review from 1998 notes that the data derived from the International Road Federation (IRF) World Road Statistics [7] and UN statistical yearbooks “are patchy, with frequent gaps and many large changes that are often quickly reversed … it appears impossible to construct data that are consistent either across countries or over time” [8]. The extent to which the data have improved in recent years is unclear, and IRF’s sources for road network length for many countries are missing or incomplete. OpenStreetMap, an ambitious open-data initiative that has emerged and grown rapidly in recent years, promises to fill this gap. Just as Wikipedia provides a volunteer-written encyclopedia, OpenStreetMap (OSM) provides a free, openly licensed, volunteer-contributed repository of geographic information. 
OSM launched in 2004 with a focus on streets and roads, and has subsequently expanded to map buildings, land uses, points of interest and other geographic features [9]. As of May 2017, ∼3.8 million contributors had created a database with ∼411 million roads, coastlines, administrative boundaries and other linear features known as “ways” [10]. Applications of OSM to date include humanitarian mapping following earthquakes, epidemics and other disasters [11], hydrological modeling [12], downscaling of population estimates to small geographic areas [13], research on diverse subjects from urban morphology to urban farming [14, 15], and even adult coloring books [16]. The usefulness of OSM for these purposes, however, depends on the completeness of the data and other aspects of data quality. As discussed later in this section, research has found that most Western countries which have been assessed appear to have a relatively complete road network in OSM. The picture in low-income countries, however, is much more uncertain. Researchers, policy makers, or citizens who want to make use of OSM road data, therefore, have little information about the extent upon which OSM can be relied. The absence of a global completeness assessment, meanwhile, hampers the use of OSM for research in economics, urban planning, environmental studies and related fields, such as analyses of worldwide patterns of travel behavior or urban development. Moreover, the benefits of OSM may be greatest in low-income countries where completeness is most uncertain, given the relative lack of official or commercial alternative geographic data products. Most quality assessments of OSM and other Volunteered Geographic Information (VGI) datasets perform a comparison with an official government or proprietary reference dataset (e.g. [17, 18, 19, 20]). 
Normally, the length and position of the features in both datasets are compared, although there are other approaches such as comparing the output of routing algorithms (e.g. [21, 22]; for a more comprehensive review, see [23]). Initially, researchers asked about the completeness of the OSM road network, the positional accuracy of the data, and the accuracy of attributes that indicate the type of road, speed limits, turn restrictions, and other information. Some studies continue to focus on completeness, for example through improving computational techniques that can compare OSM to a reference dataset [20]. However, by 2011, others had already noted that OSM research was shifting away from completeness assessments and towards the accuracy of attribute information, such as the opening times of points of interest [24]. More recently, studies have examined the quality of OSM data on building footprints [25], bicycle or pedestrian infrastructure [21, 26], points of interest [27], place names [28], and the classification of areal features [29]. This shift, however, may be somewhat premature, given that research has focused on Europe and North America, and the completeness of the OSM road network in most of the world is unknown. While early assessments found significant gaps [18, 19, 30, 31], more recent studies of European countries have found that the network is virtually complete, and is comparable to or better than official or proprietary data sources [17, 22]. The same does not appear to be true, however, in other parts of the world, such as China, Tehran and Brazil [32, 33, 34]. The only global effort that sheds light on the completeness in the OSM road network quantifies the number of changes to roads in a geographic area, and identifies where saturation has been reached, as defined by a growth rate of ≤ 3% for three or more years [35]. 
However, by focusing on the number of changes (new additions or edits), this approach does not distinguish between the addition of new roads versus minor edits that update attributes or make small improvements to positional accuracy, or between major versus minor additions. Moreover, this approach can say little about the completeness of areas that have not reached saturation. The definition of saturation in [35] is also restrictive; the authors find that only 11% of Europe (by land area) has reached saturation, even though the country-level studies noted above imply that completeness is likely to be much greater. In addition to the lack of information on the level of completeness, there is little evidence that helps explain the considerable heterogeneity in completeness, and other aspects of data quality, between and within countries. Some countries and regions are better mapped than others, but the reasons are still unclear. In one U.S. study, there is no detectable relationship between OSM data quality and demographic variables, possibly because such a small percentage of the population contributes to OSM, and because many edits are done by users who do not live locally [36]. European-focused studies have noted that dense areas appear to be better mapped in OSM [17, 19], presumably because there are more potential contributors with local knowledge. However, local contributions are only one manner through which the OSM database expands. Imports from official or proprietary data sources, and responses to humanitarian crises, also help to promote completeness [35]. OSM contributors also gather at “mapping parties” and other social events to make focused updates [37]. The most frequent contributors to OSM have contributed edits in more than one country, perhaps through tracing aerial imagery, or as a result of a vacation or other trip abroad [38]. In this paper, our objective is first to assess the completeness of the OSM road network, worldwide. 
We provide country-level estimates of completeness that are derived from two independent data sources. Note that we restrict our attention to completeness, a fundamental measure of geographic data quality, and do not assess positional accuracy or other measures commonly employed in the literature. Second, we aim to shed light on the reasons for the global heterogeneity in completeness, and help explain why some geographic regions are more complete than others. Third, we provide new estimates of the total length of road for each country in the world, and offer a comparison between the OSM-derived roadway stock and official statistics and World Bank data.

Methods The simplest way to assess completeness, and the method used by most OSM completeness studies to date, is to compare the OSM database to a comparison dataset from an authoritative source. At the global scale of our analysis, however, no comparison dataset of real roads exists. Most lower-income countries have no readily available data from a national cartographic agency or similar organization. Commercial mapping products such as Google Maps have restrictive licenses, and may not be complete themselves in parts of the world. We therefore assess OSM completeness through two complementary approaches—(i) a visual comparison with aerial imagery, and (ii) fitting parametric models to the historical growth of the OSM street network. Armed with our estimates of completeness, we then estimate the length of road network in each country, through dividing the existing length of mapped roads in OSM by our estimated fraction complete. Visual assessment Sampling and assessment procedures. Our visual assessment is based on a stratified and probability-weighted sample of 45 points in each country. We implement our own sampling algorithm in the QGIS geographic analysis software to (i) select a random point and (ii) overlay streets in the OSM database against aerial or satellite imagery provided by Google through the OpenLayers plugin, at a scale of 1:5000. The number of missing street edges in the visible area (i.e., the screen view centered around the sampled point) is manually counted, and the script automatically counts the number of street edges already present in the OSM database. Here, we use the term “edge” in its graph-theoretic sense to denote the portion of a street between two nodes (intersections). The imagery includes streets from Google Maps, which aids in identifying roads where the image was low-resolution or obscured by trees. 
However, the main source is the actual aerial image, given that our observations indicate that the Google Maps data themselves are not complete in many parts of the world. An example is shown in Fig 1. In a small number of these cases we supplemented the Google imagery with the imagery from Bing, which is also available through the OpenLayers plugin. In order to focus our sampling efforts, we exclude 56 small dependencies, principalities and unrecognized countries, such as American Samoa, Greenland, Palestine and North Cyprus, which account for 0.2% of the global population.


Fig 1. Example visual assessment. Street data from OSM is overlaid on satellite imagery of Kuwait City, Kuwait. Here, the network is 99% complete, with 2 out of 300 edges missing. The red lines indicate street edges in the OSM database. The green lines (highlighted with a white oval) are missing edges. While Google imagery was used in our actual inspection procedure of streets, lower-resolution public-domain imagery is shown in the figure. A version of this figure with the original imagery is available from the authors. Satellite imagery source: Landsat. Road network data source: OpenStreetMap. https://doi.org/10.1371/journal.pone.0180698.g001

Due to the geographic projection used, the size of the area included in a single observation varies with distance to the equator, but is approximately 1.2 km² at 45 degrees latitude in France. In total, we assess 8370 observations, with a mean of ∼72 street edges in the OSM database and ∼10 missing edges per observation. The visual assessment was based on a February 2015 version of OSM. Therefore, we update the fraction complete to account for additions between February 2015 and January 2016. For example, if our February completeness estimate for a country was 60%, and road length grew by 10% between February and January, our updated estimate would be 66%. Most of the land area of most countries is sparsely populated, but most roads are in urban areas. A simple random sample would be likely to exclude urban areas, while a sample limited to urban areas would ignore the lower-density areas where OSM may be less complete. Therefore, we adopt a two-part sampling approach with the aim of reducing the variance in our estimates. The first sub-sample consists of a probability-weighted random sample of 25 points from each country, with selection probabilities proportional to the natural log of population density of each point. 
Population and density estimates are taken from the 2013 Landscan population distribution dataset, and population and density refer to those of the 30-arc second (∼1 km²) grid cell within which the sampled point lies [39]. Points with zero population are ignored. For the second sub-sample, we take a simple random sample of 20 points, restricted to densely populated areas where the point exceeds a country-specific density threshold. The population density of rural areas varies considerably throughout the world, as does the definition of “urban.” In Canada, the United States and India, for example, places defined as urban must have a density of at least 400 persons km⁻², while the density threshold is 150 persons km⁻² in Malta, 500 in the Philippines and 1500 in China [40]. Therefore, we approximate the urban density threshold d* as the density satisfying $\sum_i p_i \, I(d_i \geq d^*) = fP$, where P is the population of a country, f is the fraction of population that is urbanized in that country (using World Bank data), p_i and d_i are the population and density of each point i obtained from the 2013 Landscan population distribution dataset [39], and I(⋅) is an indicator function. In the United States, for example, d* = 1165 persons km⁻², while in India, d* = 11400 persons km⁻². Given our complex sampling design, we estimate the completeness of each country based on the inverse sampling probability-weighted totals of (i) OSM street edges (the numerator), and (ii) OSM plus missing street edges (the denominator). Confidence intervals are estimated via a nonparametric bootstrap procedure. We focus on the number of edges rather than road length, partly for feasibility of counting, and partly because edges are the natural units of additions to the OSM network. Since missing edges tend to be shorter than those already present in the OSM database (see Section 2 of the S1 Appendix), our results for “edge completeness” will underestimate the “length completeness” of OSM. 
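The threshold and the weighted estimator described above can be illustrated with a short Python sketch. It is not the authors' code: the function names, the tuple layouts, and the toy numbers are hypothetical, and the threshold rule assumes that d* is chosen so that the population living in grid cells at or above the threshold roughly matches the country's urban population.

```python
import random

def urban_density_threshold(cells, urban_fraction):
    """Density d* such that cells with density >= d* hold about
    urban_fraction of the total population (assumed definition).

    cells: list of (population, density) pairs, one per grid cell.
    """
    target = urban_fraction * sum(p for p, _ in cells)
    running = 0.0
    # Walk from the densest cell downwards, accumulating population.
    for p, d in sorted(cells, key=lambda c: c[1], reverse=True):
        running += p
        if running >= target:
            return d
    return min(d for _, d in cells)

def weighted_completeness(obs):
    """Ratio of inverse-probability-weighted edge totals.

    obs: list of (osm_edges, missing_edges, sampling_probability).
    """
    num = sum(s / pr for s, m, pr in obs)
    den = sum((s + m) / pr for s, m, pr in obs)
    return num / den

def bootstrap_ci(obs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from a nonparametric bootstrap."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resample = [rng.choice(obs) for _ in obs]
        # Skip degenerate resamples that contain no edges at all.
        if sum(s + m for s, m, _ in resample) == 0:
            continue
        stats.append(weighted_completeness(resample))
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Toy data: (OSM edges, missing edges, sampling probability).
obs = [(298, 2, 0.5), (100, 100, 0.5), (40, 10, 0.25), (75, 5, 0.25)]
estimate = weighted_completeness(obs)
ci_low, ci_high = bootstrap_ci(obs)
```

With only four toy observations the interval is wide, which mirrors the point made in the text that 45 points per country give imprecise country-level bounds on their own.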
The analysts worked with a set of guidelines to ensure consistency in the definition of a road, and thus which edges were counted as missing. For example, driveways were ignored, as were unpaved paths leading to fields, and roads that are platted but have not yet been constructed. However, some degree of judgment was inevitable in the visual assessment. Although an exact match is not possible, the aim was to be as consistent as possible with the set of roads considered in the parametric modeling discussed below. When counting which edges were already included in the OSM database, only those tagged with the following highway tags were considered: motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, residential, road, unclassified, or living_street. For example, driveways (excluded from the visual assessment) are generally tagged in OSM as service and would be excluded from the set of roads that we consider in the main analysis. Similarly, unpaved paths are generally tagged as track and would be similarly excluded. Multilevel estimates of visual assessments. The bootstrapping procedure gives wide confidence intervals, because of the limited sample size within each country, and the wide variation in the number of edges and completeness across a country. To improve precision, we use a multilevel regression and poststratification (MRP) model [41], which draws on information from similar countries to provide tighter and more accurate confidence bounds than is possible when considering a country-level sample in isolation. Data are partially pooled across countries based on country-level covariates such as GDP and Internet access. The MRP model has found particular relevance within political science and survey research, where its estimates are characterized by less error, higher correlations and lower variance [41, 42]. The MRP model has two further advantages beyond its statistical properties. 
It allows us to estimate the impacts of grid-cell density and the country-level covariates on the completeness of the OSM database. It also enables us to make out-of-sample estimates of completeness at the grid-cell level, not just at the country level, and to illustrate the intra-country heterogeneity. The first step of MRP is the multilevel regression, as in [43]. At the local (30-arc second grid cell) level, our predictor is population density. At the country level, our four predictor variables are GDP per capita (at purchasing power parity), Internet penetration (proportion of Internet users), population size, and the World Bank’s “voice and accountability” governance indicator, which “captures perceptions of the extent to which a country’s citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media” [44]. Population and GDP enter in log form. All country-level data are from the World Development Indicators and Worldwide Governance Indicators published by the World Bank [45], with imputation for countries with missing data. The full data set is provided as supplementary information. Formally, for each observation i in country j ∈ {1, …, m}, we observe the number S_ij of road edges in the OSM database, and the real number of edges T_ij, and estimate the following. At the first level: (1) (2) where d_ij is the local population density and f is the logit link function. At the second level, the coefficients are drawn from a distribution as in Eq 3. Importantly, the coefficients are not a deterministic function of the country-level covariates, but rather are drawn from a distribution that is centered on those covariates: (3) where β_j, α are vectors of length 6 (given that there are six grid-cell coefficients for each country j, β_j1 … β_j6); γ, Z_j are m×6 matrices of coefficients and country-level covariates; and Ω is the variance-covariance matrix. 
The model is estimated in a Bayesian framework using the open-source PyStan software [46]. We run the model for 10,000 iterations spread across ten independent chains. Half of the iterations are used for burn-in and the remainder are thinned to every fifth iteration, giving us a usable sample of 1,000 iterations. The Bayesian framework is primarily used for computational reasons, and our weak priors (Cauchy(0,2) based on standardized coefficients) are designed to help convergence rather than to incorporate prior information. Almost identical results are obtained from a weaker Cauchy(0,5) prior. The second step of the MRP process is to apply the estimates out-of-sample to the entire globe. Based on the grid-cell level Landscan densities and the country-level coefficients β_j1 … β_j6, we estimate the number of road edges and the fraction complete in each 30-arc second grid cell. The country-level completeness estimates are then calculated as the mean completeness of each grid cell within that country, weighted by the estimated number of edges. Saturation of contributions We employ a second, novel method of estimating completeness which relies only on details in the underlying OSM database itself. The total length of road mapped in a given region has a natural maximum. That is, the summed length of all roads in a region must converge to the actual extant length. Postulating that growth in road length in OSM is characterized in each country by growing interest at the beginning and saturation at the end, we approximate the time series of contributed length with a sigmoid shape. From the asymptote of the sigmoid, we infer the actual length of all roads. We are not the first to use a saturation criterion for the rate of changes; however, previously [35] an arbitrary threshold rate was used to indicate saturation (≤ 3% for each time interval over three or more years), while we allow for country-specific saturation levels to emerge from the model. 
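To make the saturation idea concrete, the following Python sketch fits a plain three-parameter logistic curve to a synthetic road-length series and reads completeness off the fitted asymptote. It is a simplified stand-in, not the authors' model: the functional form, the grid-search fitting routine, the function names, and the data are all assumptions made for illustration.

```python
import math

def logistic(t, k, t0):
    """Unit-height logistic curve with growth rate k and midpoint t0."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def fit_sigmoid(times, lengths, k_grid, t0_grid):
    """Least-squares fit of L * logistic(t; k, t0).

    For fixed (k, t0) the best asymptote L has a closed form, so only
    the two nonlinear parameters are searched over a grid.
    Returns (L, k, t0, sum_of_squared_errors).
    """
    best = None
    for k in k_grid:
        for t0 in t0_grid:
            g = [logistic(t, k, t0) for t in times]
            gg = sum(x * x for x in g)
            if gg == 0.0:
                continue
            L = sum(y * x for y, x in zip(lengths, g)) / gg
            sse = sum((y - L * x) ** 2 for y, x in zip(lengths, g))
            if best is None or sse < best[3]:
                best = (L, k, t0, sse)
    return best

# Synthetic quarterly series (km of road) with a true asymptote of 1000 km.
times = [2006 + 0.25 * i for i in range(40)]
lengths = [1000.0 * logistic(t, 1.2, 2011.0) for t in times]

k_grid = [0.2 * i for i in range(1, 16)]       # growth rates 0.2 .. 3.0
t0_grid = [2006 + 0.5 * i for i in range(21)]  # midpoints 2006 .. 2016

L_hat, k_hat, t0_hat, sse = fit_sigmoid(times, lengths, k_grid, t0_grid)
completeness = lengths[-1] / L_hat  # mapped length relative to the asymptote
```

A grid search keeps the sketch free of third-party dependencies; a production fit would more likely use a dedicated nonlinear least-squares routine and a richer functional form.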
The OSM history dataset [47] provides a record of each version of each object in the OSM database, including objects that were subsequently deleted. The exceptions are objects whose original contributor did not agree to a license change in 2012; about 1% of data was lost as a result [48]. We use a custom Python script to extract every version in the contribution history of every node (i.e., each geolocated point) and every way (a linear sequence of nodes) that is tagged “highway,” which is a generic attribute for a roadway, including pedestrian paths and trails. We obtain the time stamp of each roadway (including its deletion date, if applicable), calculate its length, and identify the country where it is located using a spatial query against boundary data [49]. In this way, we build up a time series of the total road length rendered in each region. For the length calculation and the country lookup, we use a PostgreSQL/PostGIS spatial database. We provide our Python code under an open-source license (see S1 Appendix), allowing interested readers to replicate and/or update our findings. In the main analysis in this paper, we restrict ourselves to roadways that are intended for vehicle circulation; these ways are further tagged motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, residential, road, unclassified, or living_street. However, we also show the growth in non-vehicle roadways, which largely consist of pedestrian paths. For clarity, we refer to “roads” and “other paths” in the remainder of this paper, where other paths are defined as roadways that do not have one of the above tags. In order to estimate the growth and saturation of street coverage, we fit parametric models to the road length time series. While mostly monotonic, additions to road length are occasionally sudden, as opposed to steady. This is likely due to various kinds of bulk data imports (e.g. 
US government TIGER road data), the release of new aerial imagery which OSM contributors can trace [50], and “mapping parties” targeting localized areas. In order to accommodate these jumps, we use nonlinear least squares optimization to fit flexible functional forms which include up to four jumps superposed on a smooth sigmoid shape. From several such shapes as well as a linear growth model, we choose the best fitting functional form for each country, as measured by a mean-squared error criterion. These models are specified in detail in the S1 Appendix. We follow the same process for two types of sub-national information. We fit parametric models to the road length time series at (i) the highest sub-national administrative level from GADM, such as U.S. states, German Länder and South African provinces; and (ii) each country-specific quintile of the distribution of grid-cell densities. We choose the best-fitting sigmoid functional form for each sub-national administrative unit and quintile. Incorporating subnational information in this way provides an independent check on the parametric fitting, in the sense that the sub-national asymptotes, as estimated from the fits, should add up to the country-level asymptote. Combining the estimates We have two estimates of completeness for each country. The visual assessment is likely to be accurate but imprecise, while the parametric fit is precise but may detect a false saturation level (for example, due to a temporary hiatus in additions to the OSM database). We therefore combine the estimates as follows. In the 61 countries where the estimates match (i.e., the parametric estimate lies within the 95% confidence interval of the multilevel estimate, or where the difference between the estimates is 0.05 or less), we use the parametric fit. In the other 124 countries for which both estimates exist, we use the multilevel estimate derived from the visual assessment. 
We also use the parametric fit in a further 68 countries, accounting for 0.3% of the global population, where no multilevel estimate is available. This is normally because we did not conduct the visual assessment for the reasons discussed in Visual assessment. Our combined completeness estimates, coupled with the existing length of roads in the OSM database, provide the opportunity to make new estimates of the total length of the road network in each country, as in Eq 4. We exclude countries (all 5 of which are small-island states) where completeness is estimated at < 0.05.

$$\text{estimated total road length} = \frac{\text{length of roads in OSM}}{\text{estimated fraction complete}} \tag{4}$$
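The combination rule and the road-length calculation can be written out as a small Python sketch. The function and parameter names are hypothetical, and the example values are invented; only the decision rule (use the parametric fit when it falls inside the multilevel 95% confidence interval or within 0.05 of the multilevel estimate, or when no multilevel estimate exists) and the 0.05 exclusion cutoff come from the text.

```python
def combine_estimates(parametric, multilevel=None, ci_low=None,
                      ci_high=None, tol=0.05):
    """Pick the completeness estimate for one country.

    parametric: estimate from the sigmoid fit.
    multilevel: estimate from the visual assessment (None if absent).
    ci_low, ci_high: 95% confidence bounds of the multilevel estimate.
    """
    if multilevel is None:
        # No visual assessment was done: fall back on the parametric fit.
        return parametric
    inside_ci = ci_low <= parametric <= ci_high
    close = abs(parametric - multilevel) <= tol
    return parametric if (inside_ci or close) else multilevel

def estimated_total_length(osm_length_km, completeness, min_completeness=0.05):
    """Total road length implied by the mapped length and the completeness.

    Returns None for countries below the exclusion cutoff.
    """
    if completeness < min_completeness:
        return None
    return osm_length_km / completeness

# Invented example: the two estimates agree, so the parametric fit is used.
c = combine_estimates(0.90, multilevel=0.80, ci_low=0.75, ci_high=0.92)
total_km = estimated_total_length(45000.0, c)
```

In this example the country has 45,000 km mapped at an estimated 90% completeness, implying roughly 50,000 km of road in total.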

Discussion and conclusions As the capabilities of Geographic Information Systems grow, and more spatial data becomes available through GPS receivers and official sources, there becomes an even greater need for publicly available sources of base data layers, particularly roads. While proprietary systems such as Google Maps may be suitable for trip planning and similar applications, they cannot be used for most research and analytic purposes. Our results show that in many parts of the world, OpenStreetMap (OSM) already fills that niche, and that about 42% of countries in the world are more than 95% complete. In other parts of the world, the OSM database is growing rapidly. At the global level, we find that the world’s road network is ∼83% complete (81%-84% with 95% confidence). Our results show that in many places, researchers and policymakers can rely on the completeness of OSM, or will soon be able to do so. In other regions, our results help to bracket the uncertainty. Our results can be used to assess the fitness for purpose of OSM in individual countries, a contribution that is especially important given that there is wide variation even within low-income countries. At one extreme, we estimate that less than one-third of the streets in China, Egypt and Pakistan are in the OSM database, compared to more than 95% in Cuba, Ecuador and Syria as well as most European and North American countries. Moreover, our methods can be used to track the country-by-country saturation of contributions, and identify the point at which more countries become complete. Because in many places OSM may now be the most authoritative data available even to local governments and agencies, better knowledge of its completeness is essential if it is to be relied on for planning and development purposes. 
In addition, knowledge of the completeness of the existing data can indicate where further mapping efforts should be directed, for example in emergency situations where humanitarian agencies already make significant use of OSM. For researchers, sufficient meta-knowledge including data completeness is necessary when using OSM road data for modeling of urban automobility and travel/transportation behavior, and local and climate-related emissions, among other outcomes in places where government or authoritative data are not readily available. More broadly, our findings demonstrate a technique which may be generally useful for assessing the development and saturation of Volunteered Geographic Information and crowd-sourced data [35, 36, 51, 52]. Flexibly modeling a modified sigmoid curve can capture a variety of processes typical of user contributions, such as business listings, genetic databases, or encyclopedia and dictionary entries. Equally importantly, we provide a new country-level dataset of road length that, unlike IRF’s World Road Statistics, is fully transparent and easy to update. Despite their advertised limitations, the IRF data are the basis for dozens of empirical papers in development economics [53], transportation [54, 55] and energy policy [56], and also appear to underlie other statistical compilations. In its World Development Indicators series, the World Bank sources the data on road length to IRF. The CIA World Factbook [57] does not cite individual sources, but the data are very similar to those published by the IRF. Thus, until now, there has been no obvious alternative to the IRF data. Particularly in the poorest countries, we find that road supply is nearly 40% larger than suggested by IRF. In the world as a whole, our findings indicate a total road length of 39.7×10⁶ km or nearly 6 m per capita. 
Road length and road length per capita have important applications in the global study of economic development, transportation patterns, and pollution. For instance, using values of 5–25 kg CO₂e year⁻¹ km⁻¹ [58] for life cycle emissions from petroleum-based and cementitious road surfaces, global annual emissions associated with the construction and maintenance of roads, amortized over the lifetime of the road, are on the order of 100–500 MT CO₂e year⁻¹. In places where completeness is already very high, changes in the OSM road database may even be used to indicate new road development.

Our findings also shed light on the factors that support the development of a crowd-sourced geographic database. Contributing to OSM requires access to the Internet; sufficient general resources, such as education, geospatial expertise and leisure time; and laws which permit the creation of non-government maps. In addition, the availability of open and accessible government information may facilitate importation of existing data to OSM. For example, most of the US road network was originally imported from the US Census Bureau TIGER files [59].

As expected, the densest parts of the world have a relatively complete OSM network, likely because the densest cities are home to many potential mappers. More surprisingly, we find a ∪-shaped relationship, with the best-mapped areas found at both ends of the density spectrum. In other words, not just the densest but also the least dense areas are well mapped, perhaps because interurban roads are easy to trace from satellite imagery, or are already available through other sources. Consistent with intuition, we also find that countries scoring high on governance indicators and those with good Internet access tend to be more complete, and that small countries tend to be more complete than large ones.
The open governance indicator may relate to the availability of geographic data, and even to the ability of private citizens to undertake mapping efforts. China, for example, restricts private surveying and the publication of geospatial information. Surprisingly, we find that income does not have a strong, independent effect on OSM completeness. There are also some notable outliers, such as Haiti and Nepal, where intense mapping efforts followed humanitarian disasters. Overall, however, the use of satellite imagery means that OSM contributors need not be located in the places they map, and many contributions, in particular to sites targeted by humanitarian aid, are made from remote locations [37]. Thus, country-level factors have only limited predictive power, and the wide confidence intervals as well as endogeneity concerns mean that our results here should not necessarily be given a causal interpretation.

A complete road network is only the first step in the development of an openly licensed geographic database. For some applications, the usability of OSM will depend on other aspects of data quality, such as positional accuracy, and the presence of tags that indicate road names, speed limits, and other attributes. In principle, our methods can be applied to these other metrics of data quality as well. For example, the percentage of streets that are named and have other attribute information should saturate over time. The same is true for OSM data on buildings, pedestrian paths and points of interest, and the reliability of the fitted curves can be complemented with a visual assessment. Quantifying the completeness of voluntary contributions to geographic information allows the effort of the nearly 4 million contributors to be harnessed for broader purposes.
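The saturation idea discussed above, whether applied to road length, named streets, or other attributes, can be illustrated with a minimal sketch. It fits a plain logistic curve to a cumulative series and reads off the estimated saturation level. This is a simplified stand-in and not the modified sigmoid specification used in our analysis: the function names `logistic` and `fit_saturation`, the grid ranges, and the synthetic series are all hypothetical choices made for illustration.

```python
import math

def logistic(t, k, t0):
    """Unit-height logistic curve: rises from 0 to 1 with midpoint at t0."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def fit_saturation(times, lengths):
    """Fit lengths ~ L * logistic(t, k, t0) by least squares.

    k and t0 are found by grid search (illustrative ranges). For each
    (k, t0) pair the model is linear in L, so the best L has the
    closed form sum(y*s) / sum(s*s). Returns (L, k, t0).
    """
    k_grid = [0.1 + 0.05 * i for i in range(39)]           # 0.10 .. 2.00
    t_lo, t_hi = min(times), max(times) + 5.0
    steps = int((t_hi - t_lo) / 0.25) + 1
    t0_grid = [t_lo + 0.25 * j for j in range(steps)]
    best = None
    for k in k_grid:
        for t0 in t0_grid:
            s = [logistic(t, k, t0) for t in times]
            L = sum(y * si for y, si in zip(lengths, s)) / sum(si * si for si in s)
            sse = sum((y - L * si) ** 2 for y, si in zip(lengths, s))
            if best is None or sse < best[0]:
                best = (sse, L, k, t0)
    return best[1], best[2], best[3]

# Synthetic cumulative road length (km) by year index; not real OSM data.
times = list(range(11))
lengths = [1000.0 / (1.0 + math.exp(-0.8 * (t - 5.0))) for t in times]

L_hat, k_hat, t0_hat = fit_saturation(times, lengths)
completeness = lengths[-1] / L_hat   # share of the estimated saturation level
print(f"saturation ~ {L_hat:.0f} km, completeness ~ {completeness:.2f}")
```

The ratio of the latest cumulative value to the fitted saturation level gives a completeness estimate for that series. With real contribution histories the fit is noisier, which is why a complementary visual assessment remains useful.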

Supporting information S1 Appendix. Further details on methods, results, and source data. A separate PDF outlines all associated resources. These resources are permanently available at https://alum.mit.edu/www/cpbl/publications/PLOS2017roads. https://doi.org/10.1371/journal.pone.0180698.s001 (PDF)

Acknowledgments We are grateful for excellent research assistance from Tabitha Fraser, Brandon Nyo, Matthew Tenney and Pam Rittelmeyer, and for helpful suggestions from Renee Sieber, Chris Warshaw, Mikel Maron, and the OSM community, as well as two anonymous reviewers. Most importantly, we thank the ∼3.8 million contributors to the OSM dataset. This article uses the LandScan 2012 global population data set from Oak Ridge National Laboratory. See http://web.ornl.gov/sci/landscan/datasets/LS2012.ris for restrictions and legal notifications regarding these data.