Mapping “world cities” in R







Here at Sharp Sight, we make a lot of maps.

There are a few reasons for this.

First, good maps are typically ‘information dense.’ You can get a lot of information at a glance from a good map. They are good visualization tools for finding and communicating insights.

Second, it’s extremely easy to get data that you can use to make a map. A variety of sources publish data about cities, states, counties, and countries. If you know how to retrieve this data and wrangle it into shape, you will rarely be short of material for a map.

Finally, map making is just good practice. To create a map like the one we’re about to make, you’ll typically need to use a variety of data wrangling and data visualization tools. Maps make for excellent practice for intermediate data scientists who have already mastered some of the basics.

With that in mind, this week we’ll make a map of “world cities.” This set of cities has been identified by the Globalization and World Cities (GaWC) Research Network as being highly connected and influential in the world economy.

We’ll start with a very basic map, and then create a small-multiple version of it (broken out by GaWC rating).

Let’s get started.

First, we’ll load the packages that we’ll need.

#==============
# LOAD PACKAGES
#==============
library(tidyverse)
library(ggmap)
library(forcats)

Next, we’ll input the cities by hard coding them as data frames. To be clear, there is more than one way to do this (e.g., we could scrape the data), but there isn’t that much data here, so doing this manually is acceptable.
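As a rough illustration of what hard coding looks like, here is a minimal sketch using tibble(). The column name city is an assumption made for this example, and the cities shown are only a small illustrative subset, not the full lists used in this post.

```r
library(tibble)

# Hypothetical column name `city`; illustrative subset, not the full list.
df_alpha_plus_plus <- tibble(city = c("London", "New York"))
df_alpha_plus      <- tibble(city = c("Tokyo", "Paris"))

# Print one of the data frames to check it
df_alpha_plus_plus
```

Each rating tier gets its own small data frame, which makes it easy to tag every row with its rating in the next step.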

#===================
# INPUT ALPHA CITIES
#===================
df_alpha_plus_plus

Now, we'll create a new variable called rating. This will contain the global city rating. Notice that this is a very straightforward use of dplyr::mutate(), one of the tidyverse functions you should definitely master.

#=======================
# ADD GLOBAL CITY RATING
#=======================
df_alpha_plus_plus <- df_alpha_plus_plus %>% mutate(rating = 'alpha++')
df_alpha_plus <- df_alpha_plus %>% mutate(rating = 'alpha+')
df_alpha <- df_alpha %>% mutate(rating = 'alpha')
df_alpha_minus <- df_alpha_minus %>% mutate(rating = 'alpha-')

Next, we'll combine the different data frames into one using rbind().

#======================================
# COMBINE DATAFRAMES INTO ONE DATAFRAME
#======================================
alpha_cities <- rbind(df_alpha_plus_plus, df_alpha_plus, df_alpha, df_alpha_minus)

Now that the data are combined into a single data frame, we'll get the longitude and latitude using geocode().

#========
# GEOCODE
#========
latlong

Once we have the longitude and latitude data, we need to combine it with the original data in the alpha_cities data frame. To do this, we will use cbind().

#============================
# BIND LAT/LONG TO CITY NAMES
#============================
alpha_cities <- cbind(alpha_cities, latlong) %>% rename(long = lon)

alpha_cities
#names(alpha_cities)

Now we have the data that we need, but we'll need to clean things up a little.

In the visualization we'll make, we will need to use the faceting technique from ggplot2. When we do this, we'll facet on the rating variable, but we will need the levels of that variable to be ordered properly (otherwise the facets will be out of order). To reorder the factor levels of rating, we will use fct_relevel().

#================================================
# REORDER LEVELS OF GLOBAL CITY RATING
# - the global city ratings should be ordered
#   i.e., alpha++, then alpha+ ....
# - to do this, we'll use forcats::fct_relevel()
#================================================
alpha_cities <- alpha_cities %>%
  mutate(rating = fct_relevel(rating, 'alpha++', 'alpha+', 'alpha', 'alpha-'))

Because we will be building a map, we'll need to retrieve a map of the world. We can get a world map by using map_data("world").

#==============
# GET WORLD MAP
#==============
map_world <- map_data("world")

Ok. We basically have everything we need. Now we will make a simple first draft.

#================
# FIRST DRAFT MAP
#================
ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  geom_point(data = alpha_cities, aes(x = long, y = lat), color = 'red')
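One practical note on the coordinates used in this map: geocode() calls an external service and, in recent versions of ggmap, needs a registered API key. If you just want to practice the binding step, a sketch like the one below substitutes hand-typed, approximate coordinates for the geocode() output. The column name city, the choice of cities, and the coordinates themselves are assumptions made for illustration only.

```r
library(dplyr)
library(tibble)

# Illustrative subset; `city` is a hypothetical column name.
alpha_cities <- tibble(
  city   = c("London", "New York", "Tokyo", "Paris"),
  rating = c("alpha++", "alpha++", "alpha+", "alpha+")
)

# Stand-in for geocode(): approximate coordinates typed by hand, in the
# same row order as alpha_cities. geocode() returns columns lon and lat.
latlong <- tibble(
  lon = c(-0.13, -74.01, 139.69, 2.35),
  lat = c(51.51, 40.71, 35.69, 48.86)
)

# cbind() matches rows by position, so the two data frames must line up.
alpha_cities <- cbind(alpha_cities, latlong) %>%
  rename(long = lon)

# Check that the columns the plotting code expects (long, lat) are present
names(alpha_cities)
```

Because cbind() pairs rows purely by position, it is worth confirming that the coordinates are in the same order as the city names before plotting.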



... and now we'll use the faceting technique to break out our plot using the rating variable.

#==========================
# CREATE SMALL MULTIPLE MAP
#==========================
ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
  geom_point(data = alpha_cities, aes(x = long, y = lat), color = 'red') +
  #facet_grid(. ~ rating)
  #facet_grid(rating ~ .)
  facet_wrap(~ rating)
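The panels in a faceted plot follow the level order of the faceting variable, which is why the order of the rating levels matters. The small sketch below, which uses a standalone character vector rather than the alpha_cities data frame, shows how fct_relevel() changes that order.

```r
library(forcats)

rating <- c("alpha", "alpha-", "alpha++", "alpha+")

# Without releveling, the level order is whatever sort() gives in the
# current locale, which is unlikely to match the ranking order.
levels(factor(rating))

# fct_relevel() moves the named levels to the front, in the order given.
rating_ordered <- fct_relevel(rating, "alpha++", "alpha+", "alpha", "alpha-")
levels(rating_ordered)
```

Naming all four levels explicitly keeps the result independent of the locale's sort order.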



Once again, this is a good example of an intermediate-level project that you could do to practice your data wrangling and data visualization skills. Having said that, before you attempt to do something like this yourself, I highly recommend that you first master the individual tools that we used here (i.e., the tools from ggplot2, dplyr, and the tidyverse).