Let’s try to repeat the following analysis in BigQuery and see how easy it is to do geospatial analysis in BigQuery GIS and BigQuery public datasets.

We’ll follow the general path of this article, with modifications as needed.

First, we’ll skip all the installation steps, nothing needs to be installed! OK, I’m cheating a bit here — if you want to use bq command line to upload data, you need to install Google Cloud SDK. If you use UI — you can skip it too.

Second, the article describes loading shape file. We can do this — see https://medium.com/@mentin/loading-large-spatial-features-to-bigquery-geography-2f6ceb6796df but we’ll skip it for this article, as BigQuery public dataset already has the tables we need, bigquery-public-data.geo_us_boundaries.zip_codes for zip codes, and bigquery-public-data.geo_us_census_places.places_washington for city boundaries:

SELECT place_geom

FROM `bigquery-public-data.geo_us_census_places.places_washington`

WHERE place_name = 'Seattle'

To find out zip codes within Seattle area, we’ll join and compute area of intersection, just like the original article:

CREATE TABLE demo.seattle_zip_codes AS

SELECT zip_code, zip_geom_within_seattle,

ST_Area(zip_geom_within_seattle)/1e6 AS area_km2

FROM (

SELECT

zip_code,

ST_Intersection(place_geom, zip_code_geom)

AS zip_geom_within_seattle

FROM `bigquery-public-data.geo_us_census_places.places_washington` p

JOIN `bigquery-public-data.geo_us_boundaries.zip_codes` z

ON ST_Intersects(place_geom, zip_code_geom)

WHERE place_name = 'Seattle')

Third, we’ll need pet licenses data — download from https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb as CSV, and upload to BigQuery with UI or with the following command:

bq load --autodetect --skip_leading_rows=1 --schema="Issue_Date, License_Number, Animal_Name, Species, Primary_Breed, Secondary_Breed, ZIP" demo.pets Seattle_Pet_Licenses.csv

Let’s examine our uploaded data, the particular column of interest is ZIP — interestingly it was uploaded as STRING! Let’s figure out why --autodetect did not load it as INT64:

SELECT CAST(ZIP AS INT64) FROM demo.pets;

We’ll get an error (your specific error may vary):

Bad int64 value: 98105-3010

Aha, some zip codes in the pets database were entered in ZIP+4 form. We’ll only need first part, so let’s do a simple ETL with pets data:

CREATE TABLE demo.pets2 AS

SELECT *, split(ZIP, '-')[ordinal(1)] as zip5

FROM demo.pets;

Finally, we are ready to join this data and visualize it in BQ GeoViz:

select

zip5,

cnt,

SAFE_DIVIDE(cnt, area_km2) as cnt_per_km2,

zip_geom_within_seattle geom

FROM

(SELECT zip5, count(*) AS cnt

FROM demo.pets2 GROUP BY zip5) p

JOIN

demo.seattle_zip_codes z

ON p.zip5 = z.zip_code

We now get a picture similar to the one obtained in the article: