As we prepare for our workshop “A Big Data Adventure on the Google Cloud Platform” at Crunch Data Engineering and Analytics Conference, we have started to collect some tips and tricks when it comes to using Google’s Big Data platform. This first collection of advice shows how to control your costs when using BigQuery (BQ), discusses the kinds of built-in tools you have access to, and explains the related best practices. In the second blog post, we show how you can boost your BigQuery performance. Read this blog post to find out how to take control of your Bigquery costs.

What is the difficulty?

The pricing of BQ is usage-based, you pay for the amount of data you store, query, and insert by streaming. This seems fair, although it has some consequences. When you have practically unlimited resources — as in the case of BQ — this pricing policy becomes a double-edged sword: it is easy to spend a lot if you are not cautious enough. When it comes to budget planning, you should know how much data you will store and process in the future and this can make it challenging to estimate your costs. Fortunately, there are some tools and best practices that can help you to control your costs.

Getting started

Probably the most difficult is to plan the costs if you’ve never actually used BQ. Google provides a pricing calculator tool that helps in these situations. It estimates your future costs based on the parameters we’ve already mentioned: storage, streaming inserts, queries. Of course, to get a result, you will still have to estimate the size of the datasets involved.

How to prevent overspending

BQ provides a couple of built-in options to continuously control your spending and stop you going over budget.

1. Query-level cost control

Check the amount of data BQ will process during the query before actually running that query. Simply click on the green exclamation mark (Validator) in the right bottom of the window.

If you also add the BQ Mate extension to your Chrome browser, you will see how much the query will cost, in USD.

2. Project-level cost control

Set the project-level soft limit by requesting a billing alert. When you exceed the threshold you have set for the current month, the billing administrators will receive an email notification.

Set the project-level hard limit by maximizing the number of bytes processed per day within the project. If you enable the BQ custom quotas feature, you can specify this limit in 10 TB increments. To activate this feature or to modify the value, submit the BQ Custom Quota Request form. When you exceed the quota, queries will return errors.

3. User-level cost control

Set the user-level hard limit within a project by maximizing the number of bytes processed per day by a given user. The method is the same as in case of the project-level hard limit, except that the error message is different when you exceed the quota.

4. Billing-account level cost control

If you have multiple projects, and you want to monitor the overall budget usage, set billing alerts for your entire billing account. The process is the same as point 2a.

How to monitor the costs

There are also options to monitor your spending:

Follow your monthly data usage on your project’s Cloud Console dashboard on the Billing panel. Be aware though, that the costs will appear only within a couple of days of usage, not in real time. Under the Billing/History menu on the Cloud Console, you can get a more detailed view of your spending. Export your detailed BigQuery audit logs, and visualize your spendings in Data Studio as Mike and Ryan suggest in this great post.

Is that all?

Not really. Beyond these built-in features, there are several best practices that can help to reduce your costs.