Advent of code: SQL + BigQuery

Tens of thousands of coders are working out the Advent of Code challenges in their favorite programming languages. What if I could use this opportunity to highlight the power of SQL and the latest BigQuery features?

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels.

What if you could use SQL and BigQuery to write a compiler, intersect paths, recursively run calculations, and crack passwords? #adventofcode2019, here we go!

Problem 1: Recursive iterations

Problem, BigQuery solution

Running a recursive function in BigQuery hasn’t been easy — but now we can do loops and general scripting. This is the main cycle I used to solve the second part of the problem :

LOOP

IF (SELECT SUM(x) FROM UNNEST(extra) x)<=0 THEN

LEAVE;

END IF;

SET weight= weight+(SELECT SUM(x) FROM UNNEST(extra) x);

SET extra = (SELECT ARRAY_AGG(fuel(x)) x FROM UNNEST(extra) x);

END LOOP;

And a handy SQL UDF:

CREATE TEMP FUNCTION fuel(x ANY TYPE) AS (

GREATEST(0,FLOOR(x/3)-2)

);

The most interesting problem I had to solve here: I wasn’t able to have a cursor to go row by row, calculating the fuel for each component. So I had to figure out a way of working with every rocket component (input line) in parallel — which made the query way more efficient.

Problem 2: Data as code

Problem, BigQuery solution

Can I write a compiler using SQL and BigQuery? Are we Turing complete yet?

Again, using a scripting LOOP allows me to iterate over an input array — which defines code to be executed as read from the loop:

LOOP

IF(input[OFFSET(code_pos)]=99)

THEN LEAVE;

END IF;

SET input = (SELECT parse(input, code_pos));

SET code_pos = code_pos+ 4;

END LOOP;

The most interesting part is how I had to use ARRAY_AGG() and CASE to rewrite the contents of each array as instructed:

SELECT ARRAY_AGG(k) FROM (

SELECT

CASE n

WHEN x[OFFSET(off+3)] THEN

CASE x[OFFSET(off+0)]

WHEN 1 THEN x[OFFSET(x[OFFSET(off+1)])]+x[OFFSET(x[OFFSET(off+2)])]

WHEN 2 THEN x[OFFSET(x[OFFSET(off+1)])]*x[OFFSET(x[OFFSET(off+2)])]

END

ELSE xx

END k

FROM UNNEST(x) xx WITH OFFSET n

ORDER BY n

And then the most difficult problem I had to face — how to do this 99*99 times to find the solution to part 2? It’s not pretty, but it involves arrays of arrays and doing a slow search through the (luckily) linear space.

Warning: It takes more than 2 minutes to search through 200 possible solutions. But since the particular statement is linear, you can jump to the right result.

Problem 3: Where will our paths meet

Problem, BigQuery solution

This is a fun map traversal problem — for the solution I used the ability to generate SQL arrays detailing each step:

FROM [...], UNNEST(GENERATE_ARRAY(x-move.x, x, IF(move.x>0,1,-1))) arr_x WITH offset n1, UNNEST(GENERATE_ARRAY(y-move.y, y, IF(move.y>0,1,-1))) arr_y WITH OFFSET n2

The join between paths to find intersections is pretty elegant:

USING(arr_x, arr_y)

A handy function parses the input and generates the 2-d movement instructions:

CREATE TEMP FUNCTION parse(x ANY TYPE) AS (

CASE REGEXP_EXTRACT(x, '^(.)')

WHEN 'D' THEN STRUCT(0 AS x,-parse_n(x) AS y)

WHEN 'R' THEN (parse_n(x),0)

WHEN 'U' THEN (0,parse_n(x))

WHEN 'L' THEN (-parse_n(x),0)

END

);

I’m proud of the ability to re-factor the path generating code, to make the solution easy to look at:

SELECT ABS(arr_x) + ABS(arr_y) distance, (a.rn + b.rn) wire

FROM (

SELECT * FROM UNNEST(route('L996,D167,R633,...'))

) a

JOIN (

SELECT * FROM UNNEST(route('R995,U982,R941,...'))

) b

USING(arr_x, arr_y)

WHERE a.rn>0 AND b.rn>0

ORDER BY wire

LIMIT 1

Problem 4: The password is…

Problem, BigQuery solution

The code here ended up being pretty short! I wish I had written less manual array positions, but doing so took me less time that figuring the general case:

SELECT *

FROM (

SELECT CAST(FORMAT('%i%i%i%i%i%i',n0, n1, n2, n3, n4, n5) AS INT64) number

FROM (

SELECT n0, n1, n2, n3, n4, n5

FROM UNNEST(GENERATE_ARRAY(1, 5)) n0

, UNNEST(GENERATE_ARRAY(1, 9)) n1

, UNNEST(GENERATE_ARRAY(1, 9)) n2

, UNNEST(GENERATE_ARRAY(1, 9)) n3

, UNNEST(GENERATE_ARRAY(1, 9)) n4

, UNNEST(GENERATE_ARRAY(1, 9)) n5

WHERE n1>=n0

AND n2>=n1

AND n3>=n2

AND n4>=n3

AND n5>=n4

AND (

(n1=n0 AND n1!=n2)

OR (n2=n1 AND n1!=n0 AND n2!=n3)

OR (n3=n2 AND n2!=n1 AND n3!=n4)

OR (n4=n3 AND n3!=n2 AND n4!=n5)

OR (n5=n4 AND n4!=n3)

)

)

)

WHERE number BETWEEN 109165 AND 576723

As you can see GENERATE_ARRAY() helped create the space to look for a solution.

Day 5

As I write these thoughts, the problem still hasn’t been posted…

Well… now it has, but I’m going home. Can you solve it in the meantime?

Update: The problem is a continuation of day 2. Building this “Turing machine” was fun on day 2, but I don’t feel like doubling down on it.

Day 6: Graph traversal

Problem, BigQuery solution

Going through the graph didn’t take much effort with the ability to script — but it highlights that BigQuery can be slow when a function has to go through many iterations. In this case it took ~20 minutes to process 547 statements. As a reminder: BigQuery is incredible at processing Terabytes of data, but not necessarily at solving general computing problems.

LOOP

SET steps = steps+1

;

CREATE OR REPLACE TEMP TABLE planets AS

SELECT DISTINCT planet

FROM (

SELECT origin planet FROM t1 WHERE dest IN (SELECT planet FROM planets)

UNION ALL

SELECT dest planet FROM t1 WHERE origin IN (SELECT planet FROM planets)

)

;

IF 'SAN' IN (SELECT * FROM planets )

THEN LEAVE;

END IF;

END LOOP

Now, if this problem wasn’t a one-off, I wouldn’t traverse the graph each time, but pre-compute a better structure with all paths to instantly find the intersections (since this is a tree, not a graph).

Final thoughts

Would I recommend SQL as a general programming problem tool?

No!

… but this has been a fun way for me to try the limits of the SQL and BigQuery. Scripting is a new feature, and I’m still getting used to its syntax and limitations. So far, so good.

Take the Advent of Code challenge!

As a bonus: Use these tricks to impress your friends!

Thanks Aja Hammerly, Mano Marks, Ben Bleything, … for the early encouragement.

(You can join the BigQuery SQL leaderboard with code 763369-9f36786d)