Big Important Update: As of August 1st, 2019, you can now upgrade your Firebase project to the full Google Analytics experience which means that you can create closed funnels for your Firebase project in the GA console without ever having to do anything in BigQuery. Still, this post might come in handy if you want to do more advanced analysis, or are just trying to figure out to use BigQuery.

Lesson 1: Let’s build a two-step funnel!

So there are a lot of nice reports that you get out of the box using Google Analytics for Firebase. Everything from the Dashboard to the real-timey StreamView to their latest release, the… uh… Latest Release report.

But sometimes you need to dig deeper into the data than what you get from the Firebase Console. And for that, the folks at Firebase have always recommended linking your Analytics data to BigQuery, so you can slice-and-dice all the raw event data to your heart’s content. Of course, that’s all well and good, but going from “Clicking on a link to see pretty graphs” to “Writing a SQL query that will parse gigs of data” is a bit of a step up in difficulty.

So let’s see if we can learn a little bit of BigQuery querying by covering some common tasks you might want to perform, and we’ll start with one of the most common: Creating a Closed Funnel.

Open vs Closed Funnels

If you look at the funnel reports in that you get today from the Firebase console, they’re all open funnels, meaning that a user who triggers event B in the funnel will still be counted, even if they never triggered event A. So I tend to think of them less as funnels and more like a bunch of event counts that are conveniently displayed next to each other.

Now, in many cases, these are still quite useful. If you’ve got a “Tutorial started” and “Tutorial complete” event, it’s probably going to be impossible for a user to trigger tutorial_complete without triggering the tutorial_started event first. So your results will be the same whether or not you have an open or closed funnel, and creating an open funnel from the Firebase console is a great solution.

But in some cases, they can be a little less intuitive. Let’s take a look at one example from our public demo project, a game called Flood-it.

She might not look like much, but she’s got it where it counts, kid.

One day, our producer comes in and wants to know how many times users encounter a “no more extra steps” dialog (marked by the no_more_extra_steps event), and then decide to spend virtual currency ( spend_virtual_currency ).

Now if you went ahead and tried building this funnel in the Firebase console, you’d get results that look a little like this:

Clearly, this report is misleading. The issue is that we’re counting every time a user spent virtual currency, whether or not they did it after seeing the “No more extra steps” dialog. We need to only count events where users first encountered the dialog, and then spent money.

So let’s see if we can delve into BigQuery and figure this out.

For this example, I’m going to be running all of our queries against our public Analytics data set. You can go ahead and make these same queries yourself against this same data set, or run a modified version against your own app’s data if you want something more relevant to you.

Note: All of the queries in this blog post are run using Standard SQL, which is the default in the fancy new UI. If you’re still using the old UI, make sure you uncheck “Use legacy SQL” from the options.

Let’s start by taking a look at our user’s journey as they use this app. Go ahead and run this query, and browse the results.

SELECT event_name, user_pseudo_id, event_timestamp

FROM `firebase-public-project.analytics_153293282.events_20180720`

ORDER BY 2,3

LIMIT 1000

What you get is something like this:

user_pseudo_id is a unique ID created for any instance of an app running on a particular device. It remains the same for all gameplay sessions, but changes from app to app and from device to device. You can think of it like a user ID that’s been built in a privacy-respecting way that still lets you group events together that belong to the same user.

By browsing this data, ordered first by this user ID, then the time the event occurred, you can kind of get a sense of what our users are up to, and the events that they trigger as they play our game.

Note: As you’re browsing our sample data, you might notice that the order of events don’t always seem that logical. This is because the sample data is the “Sanitized for public consumption” version of the data. And that means we’ve deleted some values, scrambled others, and, in the case of our timestamps, added a few thousand milliseconds of “fuzz” to make our lawyers happy. So while I promise these code samples are correct, the results in the public sample data can sometimes be a little wonky.

This is the start of something helpful, but let’s focus a bit more on building our funnel. Run this query to look at just the two events we’re interested in.

SELECT event_name, user_pseudo_id , event_timestamp

FROM `firebase-public-project.analytics_153293282.events_20180720`

WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")

ORDER BY 2,3

So this is a good start — we can still see the individual user journeys through this table, but now we’re focused on just these two events. In fact, it looks like our good friend user 2A50F2A460... hit a “no more extra steps” event, then spent some virtual currency shortly afterwards in rows 4 and 5.

But browsing through our table manually like this isn’t going to scale. How can we get BigQuery to do this for us?

Well, let’s take a look at my favorite function for building funnels, LEAD OVER . Give this query a try:

SELECT event_name, user_pseudo_id , event_timestamp,

LEAD(event_name, 1) OVER (ORDER BY user_pseudo_id, event_timestamp) AS next_event

FROM `firebase-public-project.analytics_153293282.events_20180720`

WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")

ORDER BY 2,3

What LEAD OVER is doing is taking your current results and saying, “Okay, if I were to order these results by user_pseudo_id and event_timestamp , what would the value of event_name be if I looked at the row one space ahead of the current one?” Well, it turns out that order is the same order that I’m using in my main results, so we basically get to see the name of the event that occurs one row down.

As you can see here in the output, it’s basically telling you what event the user triggers next as they’re using your app.

This works nicely, except that it kinda fails when we switch users. User 15955C277.. in row 1, for instance, doesn’t trigger spend_virtual_currency followed by spend_virtual_currency . That second event belongs to a completely different user.

Luckily, LEAD OVER has a way to fix this with the PARTITION BY argument. Try out this query instead:

SELECT event_name, user_pseudo_id , event_timestamp,

LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event

FROM `firebase-public-project.analytics_153293282.events_20180720`

WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")

ORDER BY 2,3

This says, “Okay, if I were to take these results and split them up into lots of tiny little tables, broken up by user_pseudo_id , then order those tables by their timestamp, what’s the value of event_name one row ahead?”

So you can see in the results that we now have a table that lists every no_more_extra_steps or spend_virtual_currency event the user encountered, along with the one immediately after that, as long as it belongs to the same user.

Well, if you think about it, this is basically all the information we need to build our funnel. If we consider step 1 of our funnel “Every no_more_extra_steps event”, and step 2 of our funnel “Every spend_virtual_currency event that was preceded by a no_more_extra_steps event”, we can essentially get that info in our table with a simple if clause. Give this query a try: