Let’s discuss how to find an item from some dataset nearest to each item in another dataset. Say we have set of people locations and set of restaurants, and we want to find nearest restaurant for each person.

BigQuery does not have a dedicated Nearest Neighbor TVF, so we’ll use regular spatial join. We cannot use intersection for this problem, so the only option is to use ST_DWithin condition. This means we’ll have to define some bounding search radius, in the example above this might mean we only search for restaurants closer than 10 km. The smaller the limit, the faster the algorithm runs, but returns more points with no neighbor. Once we found all neighbors within search radius, we need to sort them by distance and take the nearest one.

SELECT

a.id,

ARRAY_AGG(b.id ORDER BY ST_Distance(a.geog, b.geog) LIMIT 1)

[ORDINAL(1)] as neighbor_id

FROM people_table a JOIN restaurant_table b

WHERE ST_DWithin(a.geog, b.geom, 100) -- 100 is search radius

GROUP BY a.id

Note this is regular JOIN, so it will drop all points in a for which it does not find any close-enough neighbor. Don’t try to make it LEFT JOIN, BQ GIS doesn’t have spatial optimizations for outer join yet.

Some more hints:

If you need more fields from table b , replace expression b.id inside ARRAY_AGG with STRUCT(b.id, b.extra_field, b.one_more_field)

, replace expression inside with If you need more fields from table a , especially in case of GEOGRAPHY that you cannot use as aggregation key, use ANY_VALUE aggregation function:

SELECT ... , ANY_VALUE(a.geog) ... GROUP BY a.id

If needed, we can then repeat this query with points missed by original search, now using larger search radius, and append it to original result.