We don’t want multiple columns, each for one date. We want to have (date, value) pairs. Since this problem seems so common, I wrote two BigQuery persistent UDFs to solve this:

fhoffa.x.unpivot()

fhoffa.x.cast_kv_array_to_date_float()

Let’s review how they work.

Unpivot with fhoffa.x.unpivot()

Just give unpivot() a full row, and the regex of how the name of each of the columns to unpivot look.

With the Apple tables:

SELECT a.geo_type, region, transportation_type, unpivotted

FROM `fh-bigquery.public_dump.applemobilitytrends_20200414` a

, UNNEST(fhoffa.x.unpivot(a, '_2020')) unpivotted

Unpivotting Apple Mobility Trends

With the JHU tables:

SELECT province_state, country_region, unpivotted

FROM `bigquery-public-data.covid19_jhu_csse.confirmed_cases` a

, UNNEST(fhoffa.x.unpivot(a, '_[0-9]')) unpivotted

Unpivotting JHU tables

That’s so much better, but we are not done. How do we transform these values into dates and numbers?

Cast arrays with cast_kv_array_to_date_float()

We can re-cast our unpivotted columns with cast_kv_array_to_date_float() .

What’s even more “infuriating” when casting these columns to dates is that they use different formats for encoding dates. You don’t have to worry as the UDF can take the date format as an input too.

For example, with the Apple tables:

SELECT a.geo_type, region, transportation_type, unpivotted.*

FROM `fh-bigquery.public_dump.applemobilitytrends_20200414` a

, UNNEST(fhoffa.x.cast_kv_array_to_date_float(fhoffa.x.unpivot(a, '_2020'), '_%Y_%m_%d')) unpivotted

Unpivotting and casting Apple Mobility Trends

And with the JHU tables:

SELECT province_state, country_region, unpivotted.*

FROM `bigquery-public-data.covid19_jhu_csse.confirmed_cases` a

, UNNEST(fhoffa.x.cast_kv_array_to_date_float(fhoffa.x.unpivot(a, '_[0-9]'), '_%m_%d_%y')) unpivotted

Unpivotting and casting the JHU tables

See? These results look way tidier than the starting tables.

Bonus: The covid19_usafacts.confirmed_cases table

Once we have these 2 UDFs, applying them to other tables becomes really easy:

SELECT county_fips_code, county_name, state, state_fips_code, unpivotted.*

FROM `bigquery-public-data.covid19_usafacts.confirmed_cases` a

, UNNEST(fhoffa.x.cast_kv_array_to_date_float(fhoffa.x.unpivot(a, '_[0-9]'), '_%m_%d_%y')) unpivotted

Unpivotting the`covid19_usafacts.confirmed_cases` table

How-to

Check my previous post about Persistent UDFs in BigQuery:

The source code for these 2 UDFs is:



AS ((

# https://medium.com/@hoffa/how-to-unpivot-multiple-columns-into-tidy-pairs-with-sql-and-bigquery-d9d0e74ce675

SELECT

ARRAY_AGG(STRUCT(

REGEXP_EXTRACT(y, '[^"]*') AS key

, REGEXP_EXTRACT(y, r':([^"]*)\"?[,}\]]') AS value

))

FROM UNNEST((

SELECT REGEXP_EXTRACT_ALL(json,col_regex||r'[^:]+:\"?[^"]+\"?') arr

FROM (SELECT TO_JSON_STRING(x) json))) y

)); CREATE OR REPLACE FUNCTION fhoffa.x.unpivot(x ANY TYPE, col_regex STRING)AS ((SELECTARRAY_AGG(STRUCT(REGEXP_EXTRACT(y, '[^"]*') AS key, REGEXP_EXTRACT(y, r':([^"]*)\"?[,}\]]') AS value))FROM UNNEST((SELECT REGEXP_EXTRACT_ALL(json,col_regex||r'[^:]+:\"?[^"]+\"?') arrFROM (SELECT TO_JSON_STRING(x) json))) y));

AS ((

# https://medium.com/@hoffa/how-to-unpivot-multiple-columns-into-tidy-pairs-with-sql-and-bigquery-d9d0e74ce675

SELECT ARRAY_AGG(STRUCT(SAFE.PARSE_DATE(date_format, key) AS date, SAFE_CAST(value AS FLOAT64) AS value))

FROM UNNEST(arr)

)); CREATE OR REPLACE FUNCTION fhoffa.x.cast_kv_array_to_date_float(arr ANY TYPE, date_format STRING)AS ((SELECT ARRAY_AGG(STRUCT(SAFE.PARSE_DATE(date_format, key) AS date, SAFE_CAST(value AS FLOAT64) AS value))FROM UNNEST(arr)));

The secret motor behind this function: Transforming a whole row into JSON with TO_JSON_STRING() and then doing a REGEXP_EXTRACT_ALL over it.

Historical note

My previous solution to UNPIVOT in BigQuery has received than 5k views on Stack Overflow:

Next steps

Once I write documentation for these functions, and we settle on their definitive name — I’ll submit them to our shared repository with community UDFs ( bqutil ).

Want more?

Check Google’s public dataset program, featuring an increasing collection of COVID-19 related datasets in BigQuery:

I’m Felipe Hoffa, a Developer Advocate for Google Cloud. Follow me on @felipehoffa, find my previous posts on medium.com/@hoffa, and all about BigQuery on reddit.com/r/bigquery.