Last time we observed how planar-to-spherical conversion can lead to a slightly inconsistent conversion of related shapes, and how this can lead to unexpected result of spatial predicates. E.g. starting with flat map geometries a and b satisfying ST_Covers(a, b) , when we translate them to spherical geographies a' and b' , the expression ST_Covers(a', b') might be false.

Another common case when ST_Covers or ST_Contains gives unexpected result is inconsistent data preparation. E.g. the datasets for states and counties could have been independently prepared by different organizations, or with different precision, and then ST_Covers(state, county) might be false when we expected true, regardless of planar or spherical version.

Let’s discuss some ways to restore spacial hierarchy from fuzzy data.

First, let’s prepare demo data: like in the previous post we’ll use perfect squares (on flat map), but they are distorted when translated to the globe.

We cannot use ST_MakePolygon, since it operates on sphere, let’s create GeoJson version using BigQuery GeoJson translator:



points ARRAY<STRUCT<x int64, y int64>>) AS (



'{"type": "Polygon", "coordinates": [[',

ARRAY_TO_STRING(

ARRAY(SELECT FORMAT('[%d, %d]', x, y) FROM UNNEST(points)),

','),

']]}'))); create function demo.geojson_polygon(points ARRAY >) AS ( ST_GeogFromGeoJson (CONCAT('{"type": "Polygon", "coordinates": [[',ARRAY_TO_STRING(ARRAY(SELECT FORMAT('[%d, %d]', x, y) FROM UNNEST(points)),','),']]}')));

We can now create outer shapes:

create or replace table demo.a AS

SELECT

x + 100 * y AS id,

demo.geojson_polygon([STRUCT(x,y), STRUCT(x+30,y),

STRUCT(x+30,y+30), STRUCT(x,y+30), STRUCT(x,y)]) AS geog

FROM unnest(generate_array(0, 50, 30)) x,

unnest(generate_array(0, 50, 30)) y

This creates four polygons, with an approximate width of 30 degrees in both lon and lat, sitting between 0 and 60 degrees.

demo.b was created by replacing 30 with 10 in the script above to get 36 polygons of width ~10 degrees in the same space.

Let’s check if determining the hierarchy using ST_Covers works:

select count(*) from demo.a a, demo.b b

where st_covers(a.geog, b.geog) -- 18

Ouch, this missed half of our rows in table b . Bonus points if you can guess which 18 we missed :).

Can we use ST_Intersects instead?

select count(*) from demo.a a, demo.b b

where st_intersects(a.geog, b.geog) -- 64

This returns too much. But some of this is due to nested polygon just touching outer polygon. Let’s exclude these:



where st_intersects(a.geog, b.geog) and

not st_touches(a.geog, b.geog) -- 48 select count(*) from demo.a a, demo.b bwhere st_intersects(a.geog, b.geog) and-- 48

Closer, but we want 36 rows, when each shape in b joins only once.

One idea is to sort potential parents by intersection area and take the largest. ARRAY_AGG with LIMIT clause does the trick:

select

b.id,

array_agg(a.id

order by st_area(st_intersection(a.geog, b.geog)) desc

limit 1)[offset(0)] as parent_id,

any_value(b.geog) geog

from demo.a a, demo.b b

where st_intersects(a.geog, b.geog)

group by b.id

After the JOIN, we aggregate by id of b shape, take all the shapes in a intersecting our b , ordering them by intersection area, and take largest one.

If the query above looks too complex, I agree. For simple cases like this, we can take the potential parent if the intersection area is more than half of the child:

select a.id as parent_id, b.id, b.geog from demo.a a, demo.b b

where st_intersects(a.geog, b.geog) and

st_area(st_intersection(a.geog, b.geog)) > 0.5 * st_area(b.geog)

This returns 36 rows. Let’s check them in BQ GeoViz. I’ll draw all b.geog and use parent_id to color them. Perfect now: