Investigating Global Temperature Trends with BigQuery and Tableau

Using the NOAA GHCN and GSOD datasets

The analysis in today’s Today I Learned with BigQuery was performed by @savio_lawrence using BigQuery, Tableau, and the NOAA GHCN and GSOD datasets to see what observations we can make about changes in the average temperatures recorded at weather observation stations around the world.

Before we dive-in, note that while interesting, this analysis by itself is insufficient to draw conclusions for something as complex as climate change or global warming.

Scientific organizations and researchers work hard to account for the challenges we’ll highlight throughout the analysis, as well as combining datasets like these with remote sensing observations to obtain global and consistent coverage of the Earth.

One of the beauties of having access to such vast datasets is opportunity to explore some of the data that underpins scientific research to find areas for potential future exploration and analysis.

Using BigQuery and Tableau to Analyze Climate Data

Using BigQuery we processed and summarized 196 tables (74GB+ with over 2.5 billion records) from the GHCN and GSOD public datasets.

NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate observations subjected to a common suite of quality assurance reviews.

The NOAA Global Surface Summary of the Day (GSOD) dataset includes weather observations from a much larger set of stations (over 9,000), but these haven’t undergone any review. As a result they tend to be less reliable than the GHCN observations.

We used the SQL statement below to perform the following functions, and applied it to both datasets.

Find the yearly average temperature per station using the aggregate function.

function. Group stations into latitude bands.

Group years into decades using conditional statements and the like operator.

operator. Finding the number of readings per year for each station, using the count function.

SELECT

ID AS STN,

PERIOD_TS,

NUMOFDAYS,

AVERAGE_TMP_C,

LATITUDE,

LONGITUDE,

NAME,

STATE,

CASE

WHEN LATITUDE >= 66 AND LATITUDE <=90 THEN

"ARCTIC CIRCLE"

WHEN LATITUDE >= 23 AND LATITUDE < 66 THEN

"NORTHERN TEMPERATE ZONE - BETWEEN TROPIC OF CANCER & ARCTIC CIRCLE"

WHEN LATITUDE >= 0 AND LATITUDE < 23 THEN

"NORTHERN TROPICS - BETWEEN TROPIC OF CANCER & EQUATOR"

WHEN LATITUDE >= -23 AND LATITUDE < 0 THEN

"SOUTHERN TROPICS - BETWEEN TROPIC OF CAPRICORN & EQUATOR"

WHEN LATITUDE >= -66 AND LATITUDE < -23 THEN

"SOUTHERN TEMPERATE ZONE - BETWEEN TROPIC OF CAPRICORN & ANTARCTIC CIRCLE"

WHEN LATITUDE >= -90 AND LATITUDE < -66 THEN

"ANTARCTIC CIRCLE"

END AS STN_ZN,

IF (NUMOFDAYS >= 300, "GREATER THAN 300", "LESS THAN 300") AS NUMOFDAYS_IND,

CASE

WHEN YEAR LIKE "192%" THEN "20S"

WHEN YEAR LIKE "193%" THEN "30S"

WHEN YEAR LIKE "194%" THEN "40S"

WHEN YEAR LIKE "195%" THEN "50S"

WHEN YEAR LIKE "196%" THEN "60S"

WHEN YEAR LIKE "197%" THEN "70S"

WHEN YEAR LIKE "198%" THEN "80S"

WHEN YEAR LIKE "199%" THEN "90S"

WHEN YEAR LIKE "200%" THEN "2000S"

WHEN YEAR LIKE "201%" THEN "2010S"

END AS DECADES

FROM (

SELECT

A.ID,

A.PERIOD_TS,

COUNT(*) AS NUMOFDAYS,

ROUND(AVG(A.VALUE/10),1) AS AVERAGE_TMP_C,

B.LATITUDE,

B.LONGITUDE,

B.NAME,

B.STATE,

A.YEAR

FROM (

SELECT

*,

TIMESTAMP_TRUNC(TIMESTAMP(date), YEAR) AS PERIOD_TS,

CAST((EXTRACT(YEAR from DATE)) as String) AS YEAR

FROM

`bigquery-public-data.ghcn_d.ghcnd_*` ) A

INNER JOIN

`bigquery-public-data.ghcn_d.ghcnd_stations` B

ON

A.ID = B.ID

WHERE

A.ELEMENT = 'TAVG'

GROUP BY

B.LATITUDE, B.LONGITUDE,

B.NAME, B.STATE, A.ID, A.PERIOD_TS, A.YEAR)

Once we summarized the data — along with the derived attributes including latitude zone, decade, and station observation count — we used Tableau to visualize the results.

Tableau allows us to provide a SQL statement to use as an input into a dashboard, and provides multiple visualization techniques including:

Map view with colored labeling to help visualize temperature trends by station location.

A bar chart with reference lines to plot average temperatures by decades.

Line charts to plot temperature averages by year.

Using this approach, we created simple dashboards that made it easy to understand our results, and perform more detailed analysis.

Both sets of data indicate a trend of global temperature increase since the 1970s

The graphs below show a gradual increase in net average global temperature each decade using both the GSOD and GHCN station data.