In a nutshell, we built a software tool and we’d like to share its story with the larger community. But given that our audience might be of varied backgrounds, we’d like to start you off with a flowchart, because flowcharts are fun!

The view from The Cloud™

Here at Thinking Machines, we do a lot of work with huge amounts of data. So huge, in fact, that it won’t fit on our local machines. How then do we get any analytics work done? Easy! Enter: Google Cloud Platform (GCP).

With GCP, a lot of our big data problems are easily solved, especially because it provides a managed data analytics warehousing solution known as BigQuery. It’s just like your traditional relational database that runs on SQL except it has been optimized to hell and back for complex queries over very large datasets. (It is not recommended for use as a transactional database, however. If you are expecting frequent modifications to existing data, please consider CloudSQL instead.)

We use BigQuery as both a data warehouse and an analytics platform. Once all the data has been loaded, we can swim through it using SQL queries that can be saved as views. These views can reference other views as well, allowing us to write a lot of reusable logic in SQL. Lastly, these views can be consumed by external visualization tools such as Google Data Studio, Tableau, and PowerBI.

With BigQuery comes big responsibility

Views can grow to be the accumulation of logic that was built over a long period of time by multiple people. However, views are saved in BigQuery alone unless we explicitly save the SQL source code elsewhere ourselves. This can be dangerous in case someone accidentally deletes or rewrites one or more views, especially if other views depend on them. With traditional code development workflows, this wouldn’t be a problem since we can expect code to first get committed to a version control system before going into production. This isn’t the case, however, when your team has a cloud-first workflow that involves a lot of experimentation for analyzing and exploring the shape and scale of the data you’re working with. On the other hand, asking everyone who writes views in BigQuery to save their own backups adds a significant amount of work and eats up a lot of valuable time, not to mention cognitive overhead when your workflow turns from the one on the left to the one on the right.