From rivers, roads and pipelines to electrical and telecommunication lines, many geographical systems are composed of collections of elements that are connected in space. Therefore, from understanding human activities1 to studying the relationships between different urban areas2, conducting these studies through the lens of graphs and networks can shed light on many complex characteristics of the elements that are built upon them3,4. This approach is rooted in the origins of Graph Theory, which was developed in the 18th century by Euler through his work on the Seven Bridges of Königsberg5, and it has been applied widely ever since6–13. The use of graphs is further reinforced in this era of ‘Big Data’, where countless sources of organizational or crowd-sourced data, such as locational social media data (Twitter geolocations, Facebook check-ins) and geographical data (road networks and public transportation systems through OpenStreetMap (OSM) (http://www.openstreetmap.org), and business location data through Yelp and Google), have become available14.

Nevertheless, when focusing on geographical networks, and specifically on road systems, raw datasets often contain issues that make them unsuitable for proper graph studies. Specifically, there are two main problems: first, achieving a topologically correct dataset that represents the actual state of the street network as accurately as possible (topology problem); and second, developing a graph file format that is ready to be analyzed with available software and libraries (file format problem). In direct response to these problems, the objective of this work is threefold: first, to offer a formal protocol to convert Geographic Information Systems (GIS) data into a workable network format; second, to develop a tool to apply this protocol; and third, to use the tool on a significant number of road systems and make the results available. For the purpose of this study, we applied the tool to the road systems of the 80 most populated urban areas in the world (using available OSM data), and we hope to supplement this database with the road networks of more cities in the future.

In the following section, we describe the existing tools and datasets that focus on network analysis of road networks, followed by our methodology, where we describe the protocol that addresses the two problems mentioned earlier (topology and file format). We then summarize the dataset that we have made publicly available. This is followed by a discussion of common issues that researchers might encounter when utilizing these datasets. Finally, we discuss how researchers can produce this data for any road network they might be interested in. The tool can be used with the commercial software package ArcGIS. The OSM tool takes into account roads that cross but do not intersect, such as bridges and ramps, when this information is already present in OSM. We have also made available a second version of the tool that transforms any line feature that does not include elevation and intersection information (e.g., pipes, rivers, rails) into a spatial planar graph (i.e., a graph with no intersecting edges15). The latest version of the tools can be found on the Data & Tools page of the Complex and Sustainable Urban Networks (CSUN) Laboratory at http://csun.uic.edu/data.html#GISF2E (Accessed Jan 19 2016). Moreover, permanent versions of the tools, along with the results for the 80 cities, are stored on Figshare (Data Citation 1, Data Citation 2).

There are several existing tools and software packages that enable researchers to conduct network and graph analyses. ArcGIS Network Analyst and the QGIS Network Analysis Library are two popular toolsets, both of which easily create network datasets from road network files. However, these tools only allow users to conduct certain studies, such as shortest path calculations from a series of points to any other points, similar to origin-destination matrices. They do not provide a method to measure the whole system through a graph analysis or to calculate various graph metrics such as betweenness and closeness centralities16. Although ArcGIS Network Analyst allows some degree of topology correction within the software’s ecosystem, there is no straightforward method to convert the network datasets to a workable graph format such as an edge list (i.e., a list of edges/links) or an adjacency matrix (i.e., a square matrix over all nodes whose entries are 1 when two nodes are connected and 0 otherwise).
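To make the two graph formats concrete, the following minimal Python sketch converts an edge list into an adjacency matrix. The four-node network and its node labels are invented for illustration and do not come from any of the datasets discussed here.

```python
# Hypothetical toy network: each tuple is one undirected edge (link).
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def adjacency_matrix(edges, n_nodes):
    """Build a square 0/1 matrix; entry [i][j] is 1 when i and j are connected."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: the matrix is symmetric
    return matrix


# Nodes are assumed to be labelled 0..n-1 in this sketch.
n_nodes = 1 + max(max(u, v) for u, v in edge_list)
matrix = adjacency_matrix(edge_list, n_nodes)
for row in matrix:
    print(row)
```

For sparse networks such as road systems, the edge list is usually the more compact of the two: it needs one row per link, whereas the matrix needs one entry for every pair of nodes.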

DepthMapX (https://varoudis.github.io/depthmapX/), which comes in the form of standalone software as well as a plugin for QGIS, allows the user to calculate various network metrics for road systems, but it only works for a certain type of graph analysis, Space Syntax, as developed by Bill Hillier17. DepthMapX works with axial maps, which are a specific type of spatial graph18, as opposed to regular road maps, and it accepts input data in many formats, including AutoCAD (DWG format).

In contrast to the lack of tools to convert a GIS line feature into a network, there is an abundance of libraries and software packages to conduct graph analyses, which vary in how much expertise is required to run them. Gephi19 is a graph analysis software package with a simple and intuitive graphical user interface. NodeXL20 enables users to conduct graph analysis from Microsoft Excel. NetworkX21 and igraph22 are libraries for Python that enable users to conduct graph analyses with minimal programming background. All of the mentioned libraries and software packages accept a series of standard graph file formats, such as the edge list and the adjacency matrix described above.
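As an illustration of how little code such a library requires, the sketch below parses a small edge list with NetworkX and computes betweenness and closeness centralities. The node identifiers, the lengths and the three-column layout are assumptions made for this example only; they are not the format of any dataset described in this paper.

```python
import networkx as nx

# Hypothetical edge list: "start_node end_node length_in_metres" per line.
edge_lines = [
    "0 1 120.0",
    "1 2 80.0",
    "2 3 200.0",
    "3 0 150.0",
    "0 2 90.0",
]

# Parse the lines into an undirected graph with a "length" edge attribute.
graph = nx.parse_edgelist(edge_lines, nodetype=int, data=(("length", float),))

# Whole-network metrics, using segment length as the travel cost.
betweenness = nx.betweenness_centrality(graph, weight="length")
closeness = nx.closeness_centrality(graph, distance="length")

for node in sorted(graph.nodes):
    print(node, round(betweenness[node], 3), round(closeness[node], 3))
```

For an edge list stored on disk, `nx.read_edgelist` accepts the same `nodetype` and `data` arguments together with a file path.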

With regard to road system graph data, some datasets are available. To name a few, the Stanford Large Network Dataset Collection (https://snap.stanford.edu/data/#road) consists of many ready-to-use graph datasets, which include road graph files for three American states. The School of Computing at the University of Utah also has a series of graph files for road networks available as edge lists23. However, none of these datasets can be imported back into a GIS environment, and no information could be found on how topologically correct they are.

Our toolset and dataset bridge the gap between semi-enclosed ecosystems such as ArcGIS and QGIS, and graph analysis tools such as Gephi and igraph. This is achieved by providing both shapefiles/feature classes and network edge lists that are connected to each other with unique identifiers. Our dataset enables GIS users to easily conduct graph analyses for road systems of the 80 most populated urban areas in the world, by providing accurate data that can be easily imported into the various graph analysis libraries listed above. The results can then be imported into a GIS environment to conduct geographic analyses and visualizations24. The provided toolset will enable users to create topologically correct graph edge lists from OpenStreetMap (OSM), and planar graph edge lists from any road network shapefile that lacks the required information. The toolset can in fact process any line features, from roads and rail systems to water conduits, electrical systems and even rivers.
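The role of the unique identifiers can be sketched as follows: a metric is computed from the edge list and written out keyed by the edge identifier, so that a standard GIS attribute join can attach it to the matching line feature. The column names (EDGE_ID, START_NODE, END_NODE) and the values are hypothetical and chosen for illustration; they are not necessarily the field names used in the published files.

```python
import csv
import io
from collections import Counter

# Hypothetical edge list export: one row per street segment, keyed by EDGE_ID.
edge_list_csv = """EDGE_ID,START_NODE,END_NODE
101,1,2
102,2,3
103,2,4
104,4,1
"""

edges = list(csv.DictReader(io.StringIO(edge_list_csv)))

# Node degree: the number of segments meeting at each node.
degree = Counter()
for row in edges:
    degree[row["START_NODE"]] += 1
    degree[row["END_NODE"]] += 1

# Write one result row per EDGE_ID so that a GIS attribute join can attach
# the values to the line feature carrying the same identifier.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["EDGE_ID", "START_DEGREE", "END_DEGREE"])
for row in edges:
    writer.writerow(
        [row["EDGE_ID"], degree[row["START_NODE"]], degree[row["END_NODE"]]]
    )

result_csv = output.getvalue()
print(result_csv)
```

Any per-edge or per-node metric produced by a graph library could be exported in the same way, as long as the identifier column is preserved unchanged.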