Creating Map Visualizations in <10 lines of Python

13-10-07 Update: Please see the Vincent docs for updated map plotting syntax. A new post about maps (with improved examples!) can be found here.

import vincent
world_countries = r'world-countries.json'
world = vincent.Map(width=1200, height=1000)
world.geo_data(projection='winkel3', scale=200, world=world_countries)
world.to_json(path)

One of my goals when I started building Vincent was to streamline the creation of maps as much as possible. There are some excellent Python map libraries out there; see Basemap and Kartograph for more fun with maps. I highly recommend both of those tools, as they are quite good and very powerful. I wanted something a little simpler that relies on the power of Vega and allows for simple syntax: point to geoJSON files, specify a projection and scale/size, output the map.

For instance, you can layer sets of map data to create more complex maps:

vis = vincent.Map(width=1000, height=800)

#Add the US county data and a new line color
vis.geo_data(projection='albersUsa', scale=1000, counties=county_geo)
vis + ('2B4ECF', 'marks', 0, 'properties', 'enter', 'stroke', 'value')

#Add the state data, remove the fill, write Vega spec output to JSON
vis.geo_data(states=state_geo)
vis - ('fill', 'marks', 1, 'properties', 'enter')
vis.to_json(path)

Additionally, choropleth maps were begging for a binding to the Pandas DataFrame, with data columns mapping directly to map features. Assuming a 1:1 mapping from geoJSON features to column data, the syntax is very straightforward:

#'merged' is the Pandas DataFrame
vis = vincent.Map(width=1000, height=800)
vis.tabular_data(merged, columns=['FIPS_Code', 'Unemployment_rate_2011'])
vis.geo_data(projection='albersUsa', scale=1000, bind_data='data.id',
             counties=county_geo)
vis + (["#f5f5f5", "#000045"], 'scales', 0, 'range')
vis.to_json(path)

This isn't without a little data wrangling and transformation: the user needs to ensure that there is a 1:1 mapping of keys in the geoJSON to row keys in the Pandas DataFrame. Here is what was required to get a clean DataFrame for the previous example. Our county data is a CSV with FIPS code, county name, and our economic data (column names withheld):

00000,US,United States,154505871,140674478,13831393,9,50502,100
01000,AL,Alabama,2190519,1993977,196542,9,41427,100
01001,AL,Autauga County,25930,23854,2076,8,48863,117.9
01003,AL,Baldwin County,85407,78491,6916,8.1,50144,121
01005,AL,Barbour County,9761,8651,1110,11.4,30117,72.7

And our county polygons are in a geoJSON with FIPS codes as the ids (thanks to the folks at Trifacta for this data). The actual polygons have been truncated here for brevity; see the example data for the complete dataset:

{"type": "FeatureCollection", "features": [
{"type": "Feature", "id": "1001", "properties": {"name": "Autauga"}
{"type": "Feature", "id": "1003", "properties": {"name": "Baldwin"}
{"type": "Feature", "id": "1005", "properties": {"name": "Barbour"}
{"type": "Feature", "id": "1007", "properties": {"name": "Bibb"}
{"type": "Feature", "id": "1009", "properties": {"name": "Blount"}
{"type": "Feature", "id": "1011", "properties": {"name": "Bullock"}
{"type": "Feature", "id": "1013", "properties": {"name": "Butler"}
{"type": "Feature", "id": "1015", "properties": {"name": "Calhoun"}
{"type": "Feature", "id": "1017", "properties": {"name": "Chambers"}
{"type": "Feature", "id": "1019", "properties": {"name": "Cherokee"}
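Pulling the feature ids out of a FeatureCollection like this takes only a couple of lines with the standard library. A small sketch with a made-up two-feature document in the same shape as the file above (geometries omitted, as in the excerpt):

```python
import json

# Tiny FeatureCollection mirroring the structure of the county geoJSON
doc = json.loads('''{"type": "FeatureCollection", "features": [
  {"type": "Feature", "id": "1001", "properties": {"name": "Autauga"}},
  {"type": "Feature", "id": "1003", "properties": {"name": "Baldwin"}}]}''')

# Collect the id of every feature in the collection
ids = [feature['id'] for feature in doc['features']]
print(ids)  # ['1001', '1003']
```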

We need to match the FIPS codes and ensure that the matches are exact, or Vega won't zip the data properly:

import json
import pandas as pd

#Map the county codes we have in our geometry to those in the
#county_data file, which contains additional rows we don't need
with open(county_geo, 'r') as f:
    get_id = json.load(f)

#Grab the FIPS codes and load them into a dataframe
county_codes = [x['id'] for x in get_id['features']]
county_df = pd.DataFrame({'FIPS_Code': county_codes}, dtype=str)

#Read into Dataframe, cast to string for consistency
df = pd.read_csv(county_data, na_values=[' '])
df['FIPS_Code'] = df['FIPS_Code'].astype(str)

#Perform an inner join, pad NA's with data from nearest county
merged = pd.merge(df, county_df, on='FIPS_Code', how='inner')
merged = merged.fillna(method='pad')

>>> merged.head()
  FIPS_Code State       Area_name Civilian_labor_force_2011 Employed_2011 \
0      1001    AL  Autauga County                     25930         23854
1      1003    AL  Baldwin County                     85407         78491
2      1005    AL  Barbour County                      9761          8651
3      1007    AL     Bibb County                      9216          8303
4      1009    AL   Blount County                     26347         24156

   Unemployed_2011  Unemployment_rate_2011  Median_Household_Income_2011 \
0             2076                     8.0                         48863
1             6916                     8.1                         50144
2             1110                    11.4                         30117
3              913                     9.9                         37347
4             2191                     8.3                         41940

   Med_HH_Income_Percent_of_StateTotal_2011
0                                     117.9
1                                     121.0
2                                      72.7
3                                      90.2
4                                     101.2
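Before handing the merged frame to Vega, it can be worth checking that every feature id in the geoJSON actually has a counterpart among the data keys, since an inner join silently drops the misses. A minimal sketch using plain set arithmetic (the ids here are made up for illustration):

```python
# Hypothetical ids: one set from the geoJSON features, one from the data rows
geo_ids = {'1001', '1003', '1005'}
df_keys = {'1001', '1003'}

# Features with no matching data row, and data rows with no matching polygon
missing_from_data = geo_ids - df_keys
missing_from_geo = df_keys - geo_ids

print(sorted(missing_from_data))  # ['1005']
print(sorted(missing_from_geo))   # []
```

Any id that shows up in either difference will render with no fill (or not at all), so it pays to run this check before writing out the spec.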

And now we can quickly generate different choropleths:

vis.tabular_data(merged, columns=['FIPS_Code', 'Civilian_labor_force_2011'])
vis.to_json(path)

That's not telling us much other than that LA and King counties are both very large and very populous. Let's look at median household income:

vis.tabular_data(merged, columns=['FIPS_Code', 'Median_Household_Income_2011'])
vis.to_json(path)

There are certainly some high-income areas on the east coast and in other high-density areas. I bet this would be more interesting at the city level, but that will have to wait for a future post. Let's quickly reset the map and look at state unemployment:

#Swap county data for state data, reset map
state_data = pd.read_csv(state_unemployment)
vis.tabular_data(state_data, columns=['State', 'Unemployment'])
vis.geo_data(bind_data='data.id', reset=True, states=state_geo)
vis.update_map(scale=1000, projection='albersUsa')
vis + (['#c9cedb', '#0b0d11'], 'scales', 0, 'range')
vis.to_json(path)

Maps are a passion of mine, and this is one area where I really want to make Vincent very strong, including the ability to easily add points, markers, etc. If you have any features you would like to see with regard to mapping, please post an issue on Github.