Building the data pipeline

BigQuery hosts a variety of publicly accessible datasets, and it’s a wonderful resource for data exploration. Browsing this public collection, I found a dataset from NASA that provides the locations of global wildfires.

The dataset covers the location (latitude and longitude) of each fire detected over the past week, along with various attributes of the detection. For those wishing to dive a bit deeper, I’ve plagiarised the blurb from the NASA website below; I could not have said it any better.

The Visible Infrared Imaging Radiometer Suite (VIIRS) 375 m (VNP14IMGTDL_NRT) active fire product is the latest product to be added to FIRMS. It provides data from the VIIRS sensor aboard the joint NASA/NOAA Suomi National Polar-orbiting Partnership (Suomi NPP) and NOAA-20 satellites. The 375 m data complements Moderate Resolution Imaging Spectroradiometer (MODIS) fire detection; they both show good agreement in hotspot detection but the improved spatial resolution of the 375 m data provides a greater response over fires of relatively small areas and provides improved mapping of large fire perimeters. The 375 m data also has improved nighttime performance. Consequently, these data are well suited for use in support of fire management (e.g., near real-time alert systems), as well as other science applications requiring improved fire mapping fidelity. Recommended reading: VIIRS 375 m Active Fire Algorithm User Guide

The size of the dataset was approximately 40 MB. This is laughably small compared to the terabyte- and petabyte-scale datasets BigQuery is capable of handling.

I decided to copy all of the data into my own project with a simple SQL statement.

CREATE OR REPLACE TABLE
  `as-ghcn.nasa_wildfire.past_week` AS
SELECT
  -- Note: ST_GEOGPOINT takes longitude first, then latitude
  ST_GEOGPOINT(longitude, latitude) AS longlat,
  *
FROM
  `bigquery-public-data.nasa_wildfire.past_week`
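If you want to reuse the same copy-and-enrich pattern for a different destination table, the statement can be assembled programmatically. Below is a minimal Python sketch; the destination table name and the build_copy_query helper are my own illustrative inventions, and actually executing the query would additionally require the google-cloud-bigquery client library and valid credentials:

```python
def build_copy_query(dest_table: str) -> str:
    """Build a CREATE OR REPLACE TABLE ... AS SELECT statement that copies
    the NASA wildfire public dataset and adds a GEOGRAPHY point column.

    dest_table is a fully qualified `project.dataset.table` identifier
    (hypothetical example: "my-project.nasa_wildfire.past_week").
    """
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        "SELECT\n"
        "  -- ST_GEOGPOINT takes longitude first, then latitude\n"
        "  ST_GEOGPOINT(longitude, latitude) AS longlat,\n"
        "  *\n"
        "FROM\n"
        "  `bigquery-public-data.nasa_wildfire.past_week`"
    )

# Inspect the generated SQL for a hypothetical destination table.
query = build_copy_query("my-project.nasa_wildfire.past_week")
print(query)
```

With a configured client, you would then pass the string to something like `client.query(query).result()` to run the copy job.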