Creating a Choropleth Map of the World in Python using GeoPandas

There are different ways of creating choropleth maps in Python. In a previous notebook, I showed how you can use the Basemap library to accomplish this. More than 2 years have passed since publication and the available tools have evolved a lot. In this notebook I use the GeoPandas library to create a choropleth map. As you'll see the code is more concise and easier to follow along.

To allow for better comparison of the 2 approaches I'll again use a World Bank dataset, but to make it a bit more interesting the map will show the number of Individuals using the Internet (% of population) based on the International Telecommunication Union, World Telecommunication/ICT Development Report and database.

Setup

Load the necessary modules and specify the files for input and output, set the number of colors to use, the size of the figure in inches (width, height) and meta information about what is displayed.

%load_ext signature %matplotlib inline import os import matplotlib.pyplot as plt import pandas as pd import geopandas as gpd from helpers import slug datafile = os.path.expanduser('~/data/worldbank/API_IT.NET.USER.ZS_DS2_en_csv_v2.csv') shapefile = os.path.expanduser('~/data/geo/naturalearthdata.com/ne_10m_admin_0_countries_lakes/ne_10m_admin_0_countries_lakes.shp') colors = 9 cmap = 'Blues' figsize = (16, 10) year = '2016' cols = ['Country Name', 'Country Code', year] title = 'Individuals using the Internet (% of population) in {}'.format(year) imgfile = 'img/{}.png'.format(slug(title)) description = ''' Individuals who have used the Internet from any location in the last 3 months via any device based on the International Telecommunication Union, World Telecommunication/ICT Development Report and database. Data: World Bank - worldbank.org • Author: Ramiro Gómez - ramiro.org'''.strip()

Create a GeoDataFrame from the Admin 0 - Countries shapefile available from Natural Earth Data and show a sample of 5 records. We only read the ADM0_A3 and geometry columns, which contain the 3-letter country codes defined in ISO 3166-1 alpha-3 and the country shapes as polygons respectively.

gdf = gpd.read_file(shapefile)[['ADM0_A3', 'geometry']].to_crs('+proj=robin') gdf.sample(5)

ADM0_A3 geometry 52 CPV (POLYGON ((-2309170.347710123 1587430.58249633... 100 HTI (POLYGON ((-6860253.524149654 1936562.75358100... 253 ZMB POLYGON ((2934183.839812623 -918610.8758938238... 241 VAT POLYGON ((1073492.780621549 4478198.522616305,... 219 SYR POLYGON ((3738892.582840674 3968612.091458879,...

Next read the datafile downloaded from the World Bank Open Data site and create a pandas DataFrame that contains values for Country Code , Country Name and the percentages of Internet users in the year 2016 .

df = pd.read_csv(datafile, skiprows=4, usecols=cols) df.sample(5)

Country Name Country Code 2016 144 Macao SAR, China MAC 81.642985 157 Malta MLT 77.289395 204 Sudan SDN 28.000000 165 Mauritius MUS 53.226178 11 Australia AUS 88.238658

Next we merge the data frames on the columns containing the 3-letter country codes and show summary statistics as returned from the describe method.

merged = gdf.merge(df, left_on='ADM0_A3', right_on='Country Code') merged.describe()

2016 count 201.000000 mean 51.417892 std 28.569949 min 1.177119 25% 25.366301 50% 54.000000 75% 76.409085 max 98.240016

Mapping the data

The merge operation above returned a GeoDataFrame. From this data structure it is very easy to create a choropleth map by invoking the plot method. We need to specify the column to plot and since we don't want a continuous color scale we set scheme to equal_interval and the number of classes k to 9 . We also set the size of the figure and show a legend in the plot.

ax = merged.dropna().plot(column=year, cmap=cmap, figsize=figsize, scheme='equal_interval', k=colors, legend=True)

This is pretty nice already, but before publishing this map, there remains some work to be done. As is often the case, some data is missing. You may or may not have noticed it, but the corresponding countries are not shown at all, look for North Korea. The call to dropna() right before the plot() call removed these records from the plotted GeoDataFrame.

We could just leave it like that, because we simply don't know the values, but I'm sure that would put off some people. So let's draw these countries and fill them with a light gray and a striped pattern as in this D3.js based map.

Moreover, the image taken by itself provides no clue about what is shown, so we'll add a title and an annotation. Also we to turn off the axes, cut off some space in the far west and east, and move the legend to the lower left of the figure, because there is more empty space.

merged[merged.isna().any(axis=1)].plot(ax=ax, color='#fafafa', hatch='///') ax.set_title(title, fontdict={'fontsize': 20}, loc='left') ax.annotate(description, xy=(0.1, 0.1), size=12, xycoords='figure fraction') ax.set_axis_off() ax.set_xlim([-1.5e7, 1.7e7]) ax.get_legend().set_bbox_to_anchor((.12, .4)) ax.get_figure()

I think this map is fine for publication and the code is pretty easy to follow, but there is some room for improvement as far as I'm concerned. For example reducing the left and right margins, turning off axis display and specifying a color and/or pattern for missing data via parameters to the plot method would be nice to have. In any case, I think the GeoPandas project is headed in a good direction and hope it will continue to evolve as a library for analyzing and mapping geographic data in Python.

%signature

Author: Ramiro Gómez • Last edited: March 23, 2018

Linux 4.13.0-37-generic - CPython 3.6.4 - IPython 6.2.1 - matplotlib 2.2.2 - numpy 1.14.2 - pandas 0.22.0

Shirts for Python Programmers