Empowering Apple Mobility Trends Reports with BigQuery and Data Studio

Let’s super-power Apple Mobility reports’ data: Decision makers around the world are already using BigQuery and Data Studio to fight the current crisis. With the tools in this post we are going to help them get the best out of this dataset and our tools.

Update 2020–04–24: Step-by-step video

Check out my live chat with Yufeng G for the best practices for BigQuery with Data Studio:

Sample results

The official Apple dashboard for this dataset shows results like this:

Here we are going to build a Data Studio dashboard, where we can see the same data like this:

You can see:

Less noise in the trends (with a 7-day avg).

A map highlighting the shown countries and cities.

A table with a summary of the data shown, including the first day when each region had a reduction of >25% of traffic.

Interactive controls that allow you to visualize other transportation types (driving, transit, walking).

Please note that all my analysis about the curves in these charts is for our amusement only. For real advice and analysis of what’s happening around the world, go find a real public health expert. The goal of this post is how we get the best out of the tools available to us, so we can then help domain experts do their work.#vizresponsibly

Comparing Singapore, Tokyo, Milan, SF, and NY

dashboard

Of these 5 cities, Milan was the first to have a sharp drop off on the Apple Mobility reports: Feb 28.

Singapore had an earlier drop off (Feb 9), but way less drastic than Milan’s. This wasn’t until late in March and April — when the government had to apply more drastic measures.

Tokyo didn’t have a noticeable drop off in traffic — the curve goes down only after the situation worsened in early April.

San Francisco and New York exhibit similar curves, with SF leading NY in noticeable changes by 5 days for walking, and by 2 days for driving.

Comparing US cities

SF and NY have similar curves (with SF dropping it some days earlier).

Seattle, Chicago, and Miami shared similar curves.

Traffic in Atlanta has dropped less than these other cities.

Cities that haven’t seen a sharp drop in traffic

Some cities have not seen any meaningful (7-day avg <75%) drop in traffic.

We can see some of these cities here (with San Francisco for comparison):

Latin America vs Barcelona

dashboard

Barcelona’s drop was earlier (March 15) and deeper than any of these cities in LatAm. Still ongoing.

You can see Buenos Aires having the deepest reaction in LatAm.

Santiago, Buenos Aires, and Sao Paulo saw the >25% drop on the same day: March 19.

Colombia shows the least deep reaction (but you should consider we are comparing cities to a country here).

Mexico City was the last one to react: March 20.

Santiago and Sao Paulo are leading the “recovery to normal”. How will this pay off?

—

Let’s review how this was built.

Background

In sharing this dataset, Apple has done some great things:

You can see how cities and countries around the world have reacted to the crisis. You can compare these numbers with the actual measures and cases to see how effective the reactions have been.

They release a daily CSV, while also providing an interactive dashboard for anyone looking for quick results.

Opportunities for improvement:

We would like to JOIN this dataset with other tables . Google’s public dataset program already features an increasing collection of COVID-19 related datasets in BigQuery.

. Google’s public dataset program already features an increasing collection of COVID-19 related datasets in BigQuery. Their official dashboard shows a lot of daily noise . Can we work with 7-day averages instead?

. Can we work with 7-day averages instead? Can we use SQL to rank cities? For example, which were the first cities to enact a stay-at-home policy? Which cities have not done that yet? Which cities have experienced a noticeable rebound in policies?

For example, which were the first cities to enact a stay-at-home policy? Which cities have not done that yet? Which cities have experienced a noticeable rebound in policies? Can we make the geo information clearer? Each row identifies a geographic place by the name of the city only — that doesn’t help when we want to geo-locate the data. We can improve this with some SQL.

Each row identifies a geographic place by the name of the city only — that doesn’t help when we want to geo-locate the data. We can improve this with some SQL. It’s not easy to find the URL to download the daily CSV.

Their data is not tidy. Having one column per day is not the best way to work with datasets, so that will need some cleaning. My previous post shows how to improve these tables with an UNPIVOT UDF in SQL.

How-to

(github repo with the latest)

Loading and updating the daily files in BigQuery

With some curl and jq magic — since there’s no straightforward way to get the download url:



url=$( echo

`)

curl -s $url -o /tmp/latestAppleCovidData.csv

bq load --autodetect --replace fh-bigquery:temp.latestAppleCovidData /tmp/latestAppleCovidData.csv x=$(curl -s https://covid19-static.cdn-apple.com/covid19-mobility-data/current/v1/index.json url=$( echo https://covid19-static.cdn-apple.com`jq -n "$x" |jq -r '.basePath'``jq -n "$x"|jq -r '.regions."en-us".csvPath'`)curl -s $url -o /tmp/latestAppleCovidData.csvbq load --autodetect --replace fh-bigquery:temp.latestAppleCovidData /tmp/latestAppleCovidData.csv

Transform into tidy data



`fh-bigquery.public_dump.applemobilitytrends_initial`

OPTIONS (

description="

) AS CREATE OR REPLACE TABLE`fh-bigquery.public_dump.applemobilitytrends_initial`OPTIONS (description=" https://www.apple.com/covid19/mobility ) AS SELECT geo_type, region, transportation_type, unpivotted.*

FROM `fh-bigquery.temp.latestAppleCovidData` a

, UNNEST(fhoffa.x.cast_kv_array_to_date_float(

fhoffa.x.unpivot(a, '_2020'), '_%Y_%m_%d')) unpivotted

Transform into tidy data, and add useful columns



`fh-bigquery.public_dump.applemobilitytrends`

OPTIONS (

description="

) AS CREATE OR REPLACE TABLE`fh-bigquery.public_dump.applemobilitytrends`OPTIONS (description=" https://www.apple.com/covid19/mobility ) AS WITH data AS (

SELECT geo_type, region, transportation_type, unpivotted.*

, LEAST(-1 + value/100, 0.2) percent

FROM `fh-bigquery.temp.latestAppleCovidData` a

, UNNEST(fhoffa.x.cast_kv_array_to_date_float(

fhoffa.x.unpivot(a, '_2020'), '_%Y_%m_%d')) unpivotted

), annotated_data AS (

SELECT *

, -1+EXP(AVG(LOG(1+percent)) OVER(PARTITION BY geo_type, region, transportation_type ORDER BY date

rows between 6 preceding and current row)) avg7day

, geo_type||transportation_type||region series_id

FROM data

) SELECT *, ROW_NUMBER() OVER(ORDER BY current_percent) rank

FROM (

SELECT *

, (SELECT percent

FROM annotated_data

WHERE a.series_id=series_id

AND date=(SELECT MAX(date) FROM annotated_data)

) current_percent

, (SELECT MIN(date) FROM annotated_data WHERE a.series_id=series_id AND avg7day<-.25) first_drop_date

FROM annotated_data a

)

Some interesting tricks in this query:

7-day rolling average with OVER(... rows between 6 preceding and current row) .

. Squared mean instead of a traditional average to round outliers with -1+EXP(AVG(LOG(1+percent)) .

. Some values have significant jumps above 100% on the earlier day. We are reigning them in with LEAST(-1 + value/100, 0.2) .

. A sub-query with a self-join gets us the first day when traffic was less than 75% of the base traffic.

We create a series id for easier self joins with geo_type||transportation_type||region .

Important considerations

Why is Apple giving access to this data?

From their Terms of Use:

You may use Mobility Trends Reports provided on the Site, including any updates thereto (collectively, the “Apple Data”), only for so long as reasonably necessary to coordinate a response to COVID-19 public health concerns (including the creation of public policy) while COVID-19 is defined as a pandemic by the World Health Organization.

Make sure to read the whole terms on their page. My goal in writing this post is to help anyone using the data released by Apple to coordinate and improve their communities’ response to this crisis. Many groups around the world are already using BigQuery and Data Studio for this purpose, and here I’m doing my best to save them time and get better results.

#vizresponsibly

Before creating any new COVID-19 related dashboard, make sure to read Amanda Makulec “Ten Considerations Before You Create Another Chart About COVID-19":

Interactive dashboard

Check out the interactive Data Studio dashboard.

Next steps

Pending:

A good join to disambiguate cities and countries.

Interesting joins with other datasets.

Learn my best practices for Data Studio and BigQuery

This Friday at noon PST, live on YouTube. Learn some best practices for working with Data Studio and BigQuery. With me as the guest on the Yufeng G show:

(12PM PST, 3PM EST, 8PM London)

Want more?

Check Google’s public dataset program, featuring an increasing collection of COVID-19 related datasets in BigQuery:

I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.