Oh, dear. That’s… a really weird error message.

The problem here is that event_params is essentially an array (actually in BigQuery parlance it’s a “repeated record”, but you can think of it as an array). So while it might contain several rows that themselves have a key field, it doesn’t have one itself.

This is where the UNNEST function comes in. It basically lets you take elements in an array and expand each one of these individual elements. You can then join your original row against each unnested element to add them to your table.

To use a simpler example, let’s imagine we have a table full of spaceships. And crew is an array, much like event_params was in our analytics tables.

Right now, if I wanted to find spaceships that contain a crew member named Zoe, I couldn’t do that with a line like this.

SELECT * FROM `spaceships` WHERE crew = "Zoe"

But if I were to write something like this…

SELECT * FROM `spaceships` CROSS JOIN UNNEST(crew) as crew_member

What BigQuery will do is take every individual member in my crew array, and add it on as a new column in my row called crew_member It will repeat my original row as needed to accompany each new value. Kinda like in this nifty animation:

Once I’ve done that, it’s easy to select spaceships with crew members named Zoe.

SELECT * FROM `spaceships`

CROSS JOIN UNNEST(crew) as crew_member

WHERE crew_member = "Zoe"

You’ll find that in practice, though, most BigQuery developers will replace the CROSS JOIN with a comma, like so:

SELECT * FROM `spaceships`,

UNNEST(crew) as crew_member

WHERE crew_member = "Zoe"

It still does the same thing, it just looks cooler. (BigQuery developers are all about looking cool.)

So going back to our example, we essentially have an array (or repeated record) of event_params . So if I were to say:

SELECT * FROM `my_analtyics_table`,

UNNEST(event_params) as param

…what BigQuery will do is take each individual event parameter and expand them out into a new column called param , and repeat the original row as needed. Like this:

You’ll notice that param itself is still a nested object. But it’s a single record, not a repeating one. (Again, kinda like a struct.) So we can query for things like param.key or param.value.int_value . Specifically, we can now look just for rows where param.key is equal to “value” .

Since, in practice, it’s a little weird to have the full repeated event_params object alongside each unnested parameter, I usually drop it and just look at the param object instead.

So with all that in mind, let’s go back to the task of analyzing the average score for users who complete a quickplay level. I can now grab just those events along with the unnested value parameter.

SELECT event_name, param

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND param.key = "value"

And then, it’s really easy to grab the actual value of the value parameter by looking at param.value.int_value .

SELECT event_name, param.value.int_value AS score

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND param.key = "value"

From there, I can perform interesting tasks like calculating the mean, getting some quantiles, and figuring out the standard deviation…

SELECT AVG(param.value.int_value) AS average,

APPROX_QUANTILES(param.value.int_value, 2) AS quantiles,

STDDEV(param.value.int_value) AS stddev

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND param.key = "value"

Hey, maybe while I’m at it, let’s make a quick histogram of all the values!

SELECT param.value.int_value AS score, COUNT(1) AS count

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND param.key = "value"

GROUP BY 1

ORDER BY 1

So… this is interesting. There’s a pretty big peak around 21 or 22 moves… but if you keep looking, it seems like there’s also a couple of smaller peaks around 29 moves and in the 34–36 range.

The reason for that is the other parameter I was talking about — the board parameter. While the vast majority of games are being played on a small board, there’s a number of games being played on medium and large boards too, and those probably account for those smaller peaks.

In fact, we can kinda see that by looking at both the board and the value parameters for our event.

SELECT param

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND (param.key = "value" OR param.key = "board")

Sure enough, it seems like the “value” parameter is higher when it follows a board of M or L .

But how can we actually prove this out in BigQuery? Right now, making this analysis is kinda difficult because we have different values in different rows.

Now, there is a way we can bunch these rows together. If we add back the pseudo_user_id (essentially, a unique ID assigned to each app instance) and event_timestamp to each of our columns, we can then group together parameters for the same event based on those two values.

SELECT MAX(if(param.key = "value", param.value.int_value, NULL)) AS score,

MAX(if(param.key = "board", param.value.string_value, NULL)) AS board_type

FROM (

SELECT user_pseudo_id, event_timestamp, param

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND (param.key = "value" OR param.key = "board")

)

GROUP BY user_pseudo_id, event_timestamp

And then we can analyze score by board type…

SELECT ANY_VALUE(board_type) as board, AVG(score) as average_score

FROM (

SELECT MAX(if(param.key = "value", param.value.int_value, NULL)) AS score,

MAX(if(param.key = "board", param.value.string_value, NULL)) AS board_type

FROM (

SELECT user_pseudo_id, event_timestamp, param

FROM `firebase-public-project.analytics_153293282.events_20180915`,

UNNEST(event_params) AS param

WHERE event_name = "level_complete_quickplay"

AND (param.key = "value" OR param.key = "board")

)

GROUP BY user_pseudo_id, event_timestamp

)

GROUP BY board_type

But this seems like a pretty awkward way of doing things. Wouldn’t it be better if we could find a way of more easily getting multiple values in the same row? Well it turns out you can, and we’ll find out how… in my next post! (Woo! Cliffhanger ending!!)