[This article was first published on Civil Statistician, and kindly contributed to R-bloggers.]

(Video link here, in case the embedded player doesn’t work for you.)

Idea: see if I can mimic the idea behind Ben Schmidt’s lovely video of ocean shipping routes, and apply it to another dataset. But which?

“Hmm… what’s another interesting dataset about some competitors traveling around a mostly-fixed area at the same time?… Hey friends, stop giving me election news, I need to think of an idea… Oh.”

Get data: scraped from the Washington Post’s interactive graphic of 2012 Presidential Campaign Stops. (Initially I asked the Post for the data, but they were unable to share it under their agreement with the data provider. However, it turns out you can find it by poking around in the page’s elements with your browser’s inspector…)

However, this data doesn’t tell me whether the candidates went home after all the day’s events or kept traveling. To keep it simple, I’m only plotting the travel between event locations — but in reality, Obama should have many more zips to and from DC, and likewise for Romney and Boston.

Clean and prepare the data: urgh.

Use RJSONIO to read the JSON-format data into R and parse it. Subset down just to Obama and Romney, remove tele-events, etc.

Create an initial dataframe with a row for each event, and columns for: candidate, date, place, eventofday (whether it was the day’s 1st, 2nd, etc. event), and nrevents (total number of events that day).
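The scraped schema isn’t shown here, so as a rough sketch (with made-up field names like "candidate", "date", and "city" standing in for whatever the Post’s JSON actually uses), the parse-and-number step might look like:

```r
library(RJSONIO)

# Placeholder JSON -- the real scraped data has its own field names
json <- '[
  {"candidate": "Obama",  "date": "2012-09-01", "city": "Des Moines, IA"},
  {"candidate": "Obama",  "date": "2012-09-01", "city": "Urbandale, IA"},
  {"candidate": "Romney", "date": "2012-09-01", "city": "Cincinnati, OH"}
]'
stops <- fromJSON(json)

# Flatten the list of stops into one dataframe, one row per event
events <- do.call(rbind, lapply(stops, function(x)
  as.data.frame(x, stringsAsFactors = FALSE)))

# Keep just the two main candidates (the real data also had tele-events etc.)
events <- subset(events, candidate %in% c("Obama", "Romney"))

# Number each event within its candidate-day, and count events per day
events$eventofday <- ave(seq_len(nrow(events)),
                         events$candidate, events$date, FUN = seq_along)
events$nrevents   <- ave(seq_len(nrow(events)),
                         events$candidate, events$date, FUN = length)
```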

There were about 450 total events at time of data-scraping, so this dataframe has 450 rows.

Geocode the locations of each event (see below), storing the latitude and longitude.

Create a new dataframe for each candidate, with a row for each timestamp that we’ll use in our images. I decided to use 24 frames per day, and to represent each trip in 6 frames. (I admit I deleted one event: there was exactly one day with 5 events, and my code is much simpler if every day has no more than 4 events, so that each trip can take 6 frames.) So, for each candidate and each day: if the candidate doesn’t travel that day, add 24 rows that leave the dot where it is. On days when the candidate does travel, create 6 new rows for each trip, using gcIntermediate() from the geosphere package to find the great-circle route (hat tip: Nathan Yau’s great-circles tutorial). If they take fewer than 4 trips, add extra rows at the last location so there are 24 total rows for that day.
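As a small sketch of that interpolation step (the coordinates are illustrative approximations, not values from the scraped data):

```r
library(geosphere)

# One trip, interpolated along the great-circle route into 6 frames:
# 4 intermediate points plus the two endpoints
from <- c(-77.04, 38.91)   # lon, lat -- roughly Washington, DC
to   <- c(-93.62, 41.59)   # roughly Des Moines, IA
path <- gcIntermediate(from, to, n = 4, addStartEnd = TRUE)

# If this were the day's only trip, pad with copies of the final
# location until the day has its full 24 rows
pad <- matrix(rep(path[nrow(path), ], 24 - nrow(path)),
              ncol = 2, byrow = TRUE)
day_frames <- rbind(path, pad)
```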

The lubridate package is very helpful for parsing dates, incrementing the date by one day, etc.
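For instance, the two lubridate calls used most here:

```r
library(lubridate)

# Parse a date string, then step forward one day at a time
d <- ymd("2012-09-01")
d + days(1)   # a Date object for 2012-09-02
```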

Finally, add a column for candidate, and another for faraway so that we can flag Romney’s trip overseas — we’ll show the dot moving, but we won’t lay down a path on that trip, since it would shrink the rest of the map. See around 0:30 to 0:40 in the video.

rbind() the two candidates’ timestamped dataframes into a combined one that we’ll pass into ggplot(). Each of the timestamped dataframes has about 3600 rows (24 frames per day × a roughly 150-day period).

Geocoding the data: I basically used the geocoding example code from r-chart.com, and looped through all the event locations, except that

I added Sys.sleep() to pause every few requests, and

for some locations, Google returns multiple approximate lat-long pairs, so I made sure to take just the first instead of keeping all of them.
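Putting those two tweaks together, the loop’s shape was roughly as follows. Here geocode_place() is a stub standing in for the geocoder adapted from the r-chart.com example (which queried Google’s API); the real version makes a web request per place.

```r
# Stub geocoder: returns two approximate matches, as Google sometimes does.
# Swap in the real r-chart.com-style function for actual use.
geocode_place <- function(place) {
  matrix(c(41.59, -93.62,
           41.60, -93.60), ncol = 2, byrow = TRUE)   # lat, lon candidates
}

places <- c("Des Moines, IA", "Urbandale, IA")
latlon <- matrix(NA_real_, nrow = length(places), ncol = 2)
for (i in seq_along(places)) {
  hits <- geocode_place(places[i])
  latlon[i, ] <- hits[1, ]            # keep just the first lat-long pair
  if (i %% 5 == 0) Sys.sleep(2)       # pause every few requests to be polite
}
```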

Plot the data: based on the code from Ben Schmidt’s comment here. Instead of using a for-loop to generate all the images, I’m using saveHTML() in the animation package. This lets me see what the animation will look like without having to go through ffmpeg (see below), so I can test out changes on just the first 100 frames and preview the revised animation.
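A sketch of that preview step, with a tiny made-up frames dataframe standing in for the real timestamped data (column names here are placeholders):

```r
library(animation)
library(ggplot2)

# Tiny stand-in for the combined timestamped dataframe
frames <- data.frame(frame = rep(1:10, each = 2),
                     lon = runif(20, -120, -70),
                     lat = runif(20, 30, 48),
                     candidate = rep(c("Obama", "Romney"), 10))

ani.options(autobrowse = FALSE)   # don't pop open a browser automatically
saveHTML({
  # One plot per frame; saveHTML captures each printed plot as an image
  for (t in unique(frames$frame)) {
    print(ggplot(subset(frames, frame == t),
                 aes(lon, lat, colour = candidate)) +
            geom_point(size = 4) +
            scale_colour_manual(values = c(Obama = "blue", Romney = "red")))
  }
}, img.name = "travelplot", htmlfile = "preview.html")
```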

Animate the plots: using ffmpeg, the same free tool Ben Schmidt used. (Here’s the Windows download in case you have trouble finding it. It’s a great tool, but I wish the site layout and documentation were more newbie-friendly.)

I put a copy of ffmpeg.exe in the directory where saveHTML() put all the images. And I also put a copy of the mp3 there. Then at the command line, I run (all on one line):

ffmpeg -y -f image2 -r 40500/1001 -i "travelplot%d.png" -i BumbleBee.mp3 -sameq TravelMaps.mpg

Tweak the numerator in 40500/1001 to change the frame rate (hence speed and length) of the video. Remove “ -i BumbleBee.mp3 ” if you have no music.

Upload to Vimeo: Vimeo.com has nice video upload and editing features, although be prepared to wait a while — after the upload is complete, you have to wait for them to convert it to their own format before it’s actually posted & viewable.

Ideas for improvement:

I’m not sure why the last few video frames are missing after the Vimeo upload. Perhaps I need to add several extra copies of the final frame, i.e. make a few extra copies of the last image before going through ffmpeg. I’ll try this when I update it with the final week’s data.

Put both candidates’ info in the title at the top, but keep color-coding. However, as far as I can tell, the title has to be all one color in ggplot2. There may be some complicated way to do this with grid? Let me know if you have suggestions.
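For what it’s worth, one possible grid-based workaround (a sketch only, not wired into the real plot): leave extra room in the plot’s top margin, then draw separately coloured text grobs there after printing.

```r
library(ggplot2)
library(grid)

# Any plot will do for the sketch; reserve space in the top margin
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  theme(plot.margin = unit(c(3, 1, 1, 1), "lines"))
print(p)

# Draw each candidate's name in his own colour in the reserved margin
grid.text("Obama",  x = 0.42, y = 0.96, just = "right",
          gp = gpar(col = "blue",  fontsize = 14))
grid.text(" vs. ",  x = 0.50, y = 0.96,
          gp = gpar(col = "black", fontsize = 14))
grid.text("Romney", x = 0.58, y = 0.96, just = "left",
          gp = gpar(col = "red",   fontsize = 14))
```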

Color-code the states over time too, as they are visited by each candidate: blue after Obama has campaigned there, red if Romney has, purple if both have, yellow if neither? I suspect this would call for another big data-munging effort.