Walkable neighborhoods are great for health, happiness and economic growth. Cities around the world that want to draw a talented young workforce are increasingly focused on creating a good pedestrian experience. How can we measure and map walkability using data science tools?

This blog suggests an approach drawing on Pandana, an excellent Python library developed by Fletcher Foti.

What questions can this answer?

We will be dealing with proximity analysis along a road network. Measuring the density of amenities like shops, offices and bus stations ‘as the crow flies’ is trivial; a density map can be produced in any GIS software. But how about mapping the walk or drive time from each part of the city to its nearest amenity? Or better still, answering questions like “can most daily errands be completed on foot?” For this, we require network-constrained distances: the number of meters you travel along the city streets to reach your destination.
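To make the difference concrete, here is a minimal sketch on a made-up street layout (the node names and coordinates are hypothetical). It compares the straight-line distance between two points with the shortest path along the streets, using a plain Dijkstra search. Pandana uses much faster methods, so this only illustrates the idea.

```python
import heapq
import math

# Hypothetical intersections: node id -> (x, y) in meters.
nodes = {"A": (0, 0), "B": (400, 0), "C": (400, 300), "D": (0, 300)}

# Hypothetical streets. There is no direct street between A and D
# (imagine a railway line in the way).
edges = [("A", "B"), ("B", "C"), ("C", "D")]


def straight_line(u, v):
    """Distance 'as the crow flies' between two nodes, in meters."""
    (x1, y1), (x2, y2) = nodes[u], nodes[v]
    return math.hypot(x2 - x1, y2 - y1)


# Undirected adjacency list: node -> [(neighbor, street length)].
graph = {n: [] for n in nodes}
for u, v in edges:
    length = straight_line(u, v)
    graph[u].append((v, length))
    graph[v].append((u, length))


def network_distance(source, target):
    """Shortest distance along the streets (Dijkstra's algorithm)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target not reachable


crow_flies = straight_line("A", "D")        # 300.0 meters
along_streets = network_distance("A", "D")  # 1100.0 meters
```

In this toy layout the two points are 300 meters apart on the map, but the walk is 1,100 meters.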

First step: create a street network object

The branch of math dealing with networks, graph theory, was famously developed by Leonhard Euler when he attempted to model routes across the seven bridges of Königsberg. We can model road networks using the same terminology of nodes (in our case, street intersections) and edges (streets).

Road network analysis in Python has become much easier thanks to several researchers, notably Foti and Geoff Boeing, creating automated methods to transform OpenStreetMap (OSM) roads into valid graph objects. Their tools Pandana and OSMnx both download and clean up OSM road data through steps like removing points that don’t represent actual intersections (hence are not nodes in the graph theory sense).
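The clean-up step can be sketched on a toy example. This is not the code Pandana or OSMnx actually run. It is a simplified illustration with made-up node ids and lengths: any point with exactly two neighbors is treated as a bend in the road, and its two segments are merged into one edge.

```python
def key(u, v):
    """Order-independent key for an undirected edge."""
    return tuple(sorted((u, v)))


def drop_shape_points(edges):
    """Merge away points that have exactly two neighbors.

    edges: dict mapping (u, v) -> length in meters.
    Returns a new dict containing only edges between dead ends
    and true intersections.
    """
    edges = {key(u, v): length for (u, v), length in edges.items()}
    changed = True
    while changed:
        changed = False
        # Rebuild the neighbor sets from the current edges.
        neighbors = {}
        for u, v in edges:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
        for node, linked in neighbors.items():
            if len(linked) != 2:
                continue  # dead end or real intersection: keep it
            a, b = sorted(linked)
            if (a, b) in edges:
                continue  # merging would duplicate an existing edge
            merged = edges.pop(key(node, a)) + edges.pop(key(node, b))
            edges[(a, b)] = merged
            changed = True
            break  # neighbor sets are stale, rebuild them
    return edges


# Hypothetical raw OSM-style data: points 2 and 3 are only bends
# in the road between dead end 1 and intersection 4.
raw = {(1, 2): 50, (2, 3): 70, (3, 4): 30, (4, 5): 100, (4, 6): 80}
simplified = drop_shape_points(raw)
```

Here the three segments between points 1 and 4 collapse into a single 150 meter edge, while the intersection at point 4 is preserved.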

Let’s pull in the street network for Casablanca, Morocco:
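A minimal sketch of how this might look with Pandana's OSM loader. The center coordinates and the 5 km half-width are my own approximate, illustrative values, and the helper names are hypothetical. The download itself needs the pandana and osmnet packages plus an internet connection, so it is wrapped in a function rather than run on import.

```python
import math


def bbox_around(lat, lng, half_side_km):
    """Return (lat_min, lng_min, lat_max, lng_max) for a square around a point."""
    km_per_degree = 111.32  # approximate length of one degree of latitude
    dlat = half_side_km / km_per_degree
    # Degrees of longitude shrink with the cosine of the latitude.
    dlng = half_side_km / (km_per_degree * math.cos(math.radians(lat)))
    return (lat - dlat, lng - dlng, lat + dlat, lng + dlng)


def load_walk_network(bbox):
    """Download OSM streets inside bbox and return a pandana.Network."""
    # Imported lazily: requires pandana, osmnet and internet access.
    from pandana.loaders import osm

    lat_min, lng_min, lat_max, lng_max = bbox
    return osm.pdna_network_from_bbox(
        lat_min, lng_min, lat_max, lng_max, network_type="walk"
    )


# Approximate center of Casablanca (illustrative value, not from the post).
CASABLANCA_CENTER = (33.57, -7.59)
bbox = bbox_around(*CASABLANCA_CENTER, half_side_km=5)

# Uncomment to run the download:
# network = load_walk_network(bbox)
```

Passing `network_type="drive"` instead should return the drivable network, which is the starting point for drive-time questions.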

Network object for central Casablanca

Second step: locate objects of interest

Say we’re developing a healthcare project and want to see which neighborhoods lack access to a clinic with primary care services. We’d need a list of all those sites, with their locations. Let’s make a list of clinics together with some other objects of interest:
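One way to collect such a list, sketched under a few assumptions: the amenity names follow OSM's `amenity` tag values, the helper names are hypothetical, and only amenities mapped as OSM points (nodes) are returned, so clinics drawn as building outlines would be missed. As before, the download needs pandana, osmnet and an internet connection.

```python
def amenity_filter(amenity):
    """Build an Overpass-style tag filter such as '"amenity"="clinic"'."""
    return '"amenity"="{}"'.format(amenity)


def fetch_pois(bbox, amenities):
    """Return one table of POIs with lat, lon and amenity columns."""
    # Imported lazily: requires pandas, pandana, osmnet and internet access.
    import pandas as pd
    from pandana.loaders import osm

    lat_min, lng_min, lat_max, lng_max = bbox
    frames = []
    for amenity in amenities:
        found = osm.node_query(
            lat_min, lng_min, lat_max, lng_max, tags=amenity_filter(amenity)
        )
        # Keep the coordinates and record which category was queried.
        frames.append(found[["lat", "lon"]].assign(amenity=amenity))
    return pd.concat(frames)


# Hypothetical selection of amenities for a healthcare-focused project.
AMENITIES = ["clinic", "hospital", "pharmacy", "school", "bank"]

# Uncomment to run the download (bbox as a (lat_min, lng_min, lat_max, lng_max) tuple):
# pois = fetch_pois(bbox, AMENITIES)
# print(pois.head())
# print(pois["amenity"].value_counts())
```

As far as I know, the query raises an error when a category returns no results in the area, so a sparse category may need a try/except around the call.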

Points of interest downloaded from OpenStreetMap for Casablanca (first five records)

Number of POIs by category, downloaded from OSM

Third step: compute distances

This is where conducting geographical analysis in Python really shines as opposed to sitting in front of a GIS package. Pandana is built for speed. First, we’ll pass it a maximum search distance. This allows a key step that speeds up future queries: Pandana will build a condensed representation of the network (implemented in C++), allowing rapid calculations within a defined radius of each node. We’ll then build a table of distances to the nearest 5 points of interest from a couple of intersections.
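A minimal sketch of these calls, assuming Pandana 0.4 or later (older releases used a separate `init_pois` step) and a POI table with `lon` and `lat` columns. The function name, the 2,000 meter search radius and the column names are my own choices for illustration.

```python
def nearest_amenity_table(network, pois, category="all", max_dist=2000, k=5):
    """Return one row per network node with distances to its k nearest POIs.

    network: a pandana.Network whose edge weights are assumed to be meters
    pois: table with "lon" and "lat" columns (hypothetical column names)
    """
    # Build the condensed representation for searches up to max_dist.
    network.precompute(max_dist)

    # Attach the points of interest to their nearest network nodes.
    network.set_pois(
        category=category,
        maxdist=max_dist,
        maxitems=k,
        x_col=pois["lon"],
        y_col=pois["lat"],
    )

    # Columns 1..k hold the distance to the 1st..kth nearest POI.
    return network.nearest_pois(distance=max_dist, category=category, num_pois=k)
```

With a real network and POI table, `nearest_amenity_table(network, pois).head(2)` would show the distances for two intersections. Nodes with fewer than `k` amenities within the search radius are filled with the maximum distance.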

Distance in meters to five nearest amenities (two sample intersections).

This done, accessibility analyses for different selected amenities can be done in under a second. (Under the bonnet, two algorithms make this possible: contraction hierarchies and kd-trees.)

Fourth step: produce accessibility maps

Next, we can produce accessibility maps. Let’s make some!

As you can see, there are some zones where you have to walk more than 2 kilometers to reach the nearest school — whereas downtown Casablanca has walking distances of less than 400 meters on average.

That’s just for schools though. We could plot the same for clinics, banks or another amenity that’s tagged in OSM.

But what captures the overall concept of ‘walkability’? To me, this returns to the question of ‘can I accomplish most daily tasks on foot’. Unpacking what qualifies as most daily tasks is a complex question: we could build a weighted index of amenities depending on how crucial they are for daily life (e.g. access to healthcare is more important than access to nightlife); or we could use empirical data about which amenities people visit most often.

Either way, our workflow using Pandana accommodates this question. Simply list the amenities of interest, and build a weighted index of them. Indeed, the company WalkScore uses similar methods to provide a benchmark of neighborhood quality across the United States (you’ll see it when browsing Zillow).
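As a rough illustration of such an index (not WalkScore's actual formula), the sketch below turns each network distance into a score that halves every 400 meters, then takes a weighted average. The weights, the 400 meter half-distance and the function names are all assumptions chosen for the example.

```python
# Hypothetical importance weights per amenity category.
WEIGHTS = {"clinic": 3.0, "school": 2.0, "cafe": 1.0}


def decay(distance_m, half_distance_m=400.0):
    """Score 1.0 at the doorstep, 0.5 at half_distance_m, near 0 far away."""
    return 0.5 ** (distance_m / half_distance_m)


def walkability_score(distances_m, weights=WEIGHTS):
    """Weighted average of decayed distances, scaled to 0-100.

    distances_m: dict mapping category -> network distance in meters
    from one node to its nearest amenity of that category.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[c] * decay(distances_m[c]) for c in weights)
    return 100.0 * weighted / total_weight


# Two made-up intersections: one near a clinic, one near a cafe.
near_clinic = walkability_score({"clinic": 100, "school": 800, "cafe": 800})
near_cafe = walkability_score({"clinic": 800, "school": 800, "cafe": 100})
```

Because the clinic carries three times the weight of the cafe, the first intersection scores higher even though both have one amenity 100 meters away.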

Accessibility scores can quickly be constructed to answer a given question: whether it’s access to essential services, or walkable neighborhoods that appeal to young workers. For now, let’s weight all amenities equally, and visualize distance to the fifth nearest amenity.

plot_nearest_amenity('all',5)

From network distances to walk time

Let’s make the same outputs for a second city: Minneapolis-St Paul. (The people are nice there, and the gridded streets visualize well.)

Here, we see that plotting a compound measure of accessibility — distance to the fifth nearest amenity — gives a clearer picture of which neighborhoods are most walkable, compared with plotting just the distance to the single nearest cultural amenity (cafe, restaurant or school).

To make the results more interpretable, let’s build a grid of 250 square meter cells behind our road network, attach each cell to its closest node, and translate the distances into walk, drive or public transit time. Minneapolis is not a very walkable city, though some downtown neighborhoods fare well.
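The grid step can be sketched without any GIS library. The example below assumes cells 250 meters on a side, coordinates already projected to meters, and a walking speed of 80 meters per minute (about 4.8 km/h). The node positions and distances are made up, and the brute-force nearest-node search is only suitable for toy data.

```python
import math

WALK_SPEED_M_PER_MIN = 80.0  # assumed walking speed, about 4.8 km/h


def walk_minutes(distance_m, speed=WALK_SPEED_M_PER_MIN):
    """Convert a network distance in meters to minutes on foot."""
    return distance_m / speed


def grid_centers(x_min, y_min, x_max, y_max, cell):
    """Center points of square cells covering the given extent."""
    centers = []
    y = y_min + cell / 2
    while y < y_max:
        x = x_min + cell / 2
        while x < x_max:
            centers.append((x, y))
            x += cell
        y += cell
    return centers


def nearest_node(x, y, nodes):
    """Id of the network node closest to (x, y), by brute force."""
    return min(nodes, key=lambda n: math.hypot(nodes[n][0] - x, nodes[n][1] - y))


# Hypothetical nodes (projected meters) and their network distance
# to the fifth-nearest amenity, as produced by the earlier steps.
nodes = {"n1": (100, 100), "n2": (450, 400)}
fifth_nearest_m = {"n1": 400.0, "n2": 1200.0}

# Attach each 250 m cell to its closest node and convert to minutes.
cell_minutes = {
    center: walk_minutes(fifth_nearest_m[nearest_node(*center, nodes)])
    for center in grid_centers(0, 0, 500, 500, cell=250)
}
```

On a real network, Pandana's `get_node_ids` does the nearest-node lookup far more efficiently. Drive or transit times would need their own speeds or, better, a network weighted by travel time.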

Minneapolis: Minutes walk to fifth-nearest amenity

Extensions

The library used here was developed as part of the UrbanSim project at UC Berkeley, supporting a range of urban planning and real-estate use cases. Good quality, locally produced GIS data can be used instead of OSM (and indeed is preferable).

An important extension: this method is a great input for property price modelling or building-level predictive models. Hedonic price models assume that, for example, home buyers pay more for a home where they can easily walk to a Starbucks or drive to a hospital. Snapping tax lot boundaries (or OSM building footprints) to the network, the same way as we just attached grid squares to it, allows us to derive this data for any given property.

Likewise, machine learning models to predict crime or fire risk require as many attributes as possible to characterize each building’s place in the urban fabric.

Rapid network-constrained queries through this kind of framework can answer many questions — starting with the walkability metrics discussed above.