What Means “Semi-structured”, Anyway?

It means we can define a complex nested structure, but it has to be consistent throughout the whole table. To create it we need two things: structs and arrays. One way to define a struct is using the struct() function:

The JSON representation of the result looks as follows:

Defining arrays is fairly easy as well. Note, when defining structs you only need to define field names in the first instance. All other structs inherit the names.

A couple of arrays … and arrays in an array

As you can see in structs2 , in order to list arrays in an array you need to put them into a struct. That way they can have a name and also can get addressed later.

But How are These Arrays Pre-joined Tables?

Think of it this way: Usually information is distributed over multiple tables to save storage and only combined when needed like in this example:

Now, in nested data the same scenario would look more like this:

Nested result with two rows

We combined two tables in one — the data sits together, we save storage and no id’s are required.

Ok, But How do I Query Arrays?

Fair question — let’s look at the options …

Unnest Arrays

The magical function that allows us to query arrays is called UNNEST() . It takes an array as input and provides its contents as table rows as output. If the array consists of structs it conveniently turns them into columns.

There are two ways to make use of UNNEST() : flatten the table or use sub-queries. This very much depends on your use case.

flattening increases the size of the table substantially

sub-queries aggregate arrays

This means, if you can avoid flattening, then by all means avoid it!

Sub-query Unnested Arrays

If you want to aggregate the array, you should use sub-queries. Simply wrap your query in brackets and select FROM UNNEST() :

Go ahead and add your own sub-query!

Like a for-loop nested in a for-loop the sub-query gets executed per row. You can use most things SQL offers. Be it ordering, grouping, windowing or even joining with other arrays in the current row.

You can even prepare your data and create your own arrays by feeding the sub-query output back into the function ARRAY() . If you want to fill it with structs for multiple columns simply SELECT AS STRUCT .

Try it in BigQuery! Add your own sub-field!

Write your own sub-tables!

Sub-queries are also fantastic to have a quick look into very big tables. Try this query on Google Analytics data — it shows only selected fields from specific sessions.

Did I mention you can run sub-queries in the WHERE -clause? Don’t get confused by which WHERE applies to the whole table and which to UNNEST(hits) . Every WHERE has its FROM ! Mind the scope!

Flatten Tables by Using Joins on Unnested Arrays

Since UNNEST() provides us with table rows, we can not only query them but also use them in JOINs !

This is what happens:

Take the array and turn it into table rows cross join them with the arrays parent row

We effectively repeat the parent row for every row coming from the unnested array.

But this also means that, technically, we’re repeating the array as well. So, if we don’t select the right columns the whole thing starts to look a bit messy. Just try and replace meal, i with * :

Not so flat anymore …

If you’re more advanced with “flattening” tables you can make great use of this fact, though! For example in this query where we use the information of a flattened sub-row to look something up in the array this row came from!

In relational tables this would be the equivalent of joining two tables t1 and t2 but then still write a sub-query to check something in t2.

Efficient Queries with Unnesting Arrays

Let’s review:

Sub-queries are good to aggregate arrays

Cross joins are good to lengthen tables by the size of the arrays they contain

But cross joins also mean a lot of computation — so if you can avoid them: avoid them.

As a general guideline:

If you need to extend your table by values from an array, use cross join . E.g. you want to group by the ingredients in our example table.

If you only need aggregated values from the array, use sub-queries. Of course, you can use them in aggregations as well:

Aggregate the aggregate

That means you should avoid this version of the query to save money, resources and time:

SELECT

SUM(length(i)) AS sumLenIngred

FROM t1 cross join unnest(ingredient) i

I know it is shorter — but only for you to read. Especially for BigQuery this version is the longer way to the result, because it has to extend the table first and only then aggregates it. While in the sub-query version it simply aggregates without the extra cross join step.

I hope this is helpful and gets you started with SQL in nested data! Let me know in case you have questions or something needs more elaboration!