Data is critical for companies. Many of them have more data than they are able to process and, sadly, a lot of value lies dormant in their databases.

BigQuery is a Google Cloud Platform product that addresses this problem. It is a petabyte-scale data warehouse that lets you query terabytes of data in seconds and unlock the power of your sleeping data.

Pricing model and partitioning concern

BigQuery comes with two pricing models: on-demand and flat-rate.

With the flat-rate model, you commit to a number of “slots” (units of computational capacity) and you know in advance the bill that you will pay. The entry-level flat-rate commitment is quite high ($10k per month), so this model is recommended for companies with big data workloads.

With the on-demand model, you are charged for the volume of data that you scan. This model is well suited to startups and to small and medium companies. Because the more data you scan, the more you pay, you limit costs by organizing the data so that each query reads as little as possible. For this, BigQuery lets you partition the data to narrow the volume of data scanned.
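To make the “more you scan, more you pay” rule concrete, here is a minimal sketch of the arithmetic. The price per terabyte, the table size and the even split across daily partitions are all assumptions for illustration; check the current BigQuery pricing page for real figures.

```python
# Assumed on-demand price in USD per terabyte scanned (illustrative, not official).
ASSUMED_PRICE_PER_TB_USD = 5.0
# Assumption: one terabyte is counted as 2**40 bytes.
BYTES_PER_TB = 2 ** 40


def on_demand_cost(bytes_scanned, price_per_tb=ASSUMED_PRICE_PER_TB_USD):
    """Estimated cost of one query billed on the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans a whole hypothetical 10 TB table...
full_table = 10 * BYTES_PER_TB
# ...versus the same query restricted to one of 365 daily partitions
# (assuming the data is spread evenly across the days).
one_partition = full_table // 365

print(f"full scan:     ${on_demand_cost(full_table):.2f}")
print(f"one partition: ${on_demand_cost(one_partition):.2f}")
```

The query is the same in both cases; only the volume of data read changes, and the bill follows it.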

Note: it’s important to differentiate the volume of data scanned from the volume of data returned. A LIMIT clause at the end of a query limits the result, not the volume scanned.

Partitioning and clustering

Partitioning and clustering are two features that allow you to narrow the volume of data that you scan in your database.

Until now, partitioning was only possible by date: either on a timestamp field or by ingestion time, both with a day granularity. This way, a partition, which acts like a “sub-table”, is created for each day. When you look for data, you simply specify the date (or the date range) to query only the partitions of interest and scan only the relevant data.

Clustering is a finer-grained optimization inside the partition, like composite indexes in a relational database. Felipe Hoffa (Google Cloud Developer Advocate) has published a great article on this.
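The idea of a “sub-table” per day can be sketched with a toy in-memory model. This is not how BigQuery is implemented; the table layout, the row contents and the `query` helper are made up to illustrate why a date filter reduces the volume scanned.

```python
from datetime import date, timedelta

# Toy day-partitioned table: one "sub-table" (a list of rows) per day.
start = date(2020, 1, 1)
partitions = {
    start + timedelta(days=d): [
        {"day": start + timedelta(days=d), "value": i} for i in range(1000)
    ]
    for d in range(30)
}


def query(partitions, first_day, last_day):
    """Return the matching rows and the number of rows scanned."""
    rows, scanned = [], 0
    for day, part in partitions.items():
        # Partition pruning: days outside the range are never read.
        if first_day <= day <= last_day:
            scanned += len(part)
            rows.extend(part)
    return rows, scanned


_, scanned_range = query(partitions, date(2020, 1, 10), date(2020, 1, 12))
_, scanned_all = query(partitions, date.min, date.max)
print(scanned_range, "rows scanned with a date filter,", scanned_all, "without")
```

With a three-day filter, only three of the thirty partitions are read; without it, every partition is scanned.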

Integer Range Partitioning

In December 2019, Google released a new partitioning capability: integer range partitioning. This feature allows you to store all the values of the same range in the same partition.

You have to define the min and max values, and the range size. That’s all, the sharding is done for you! Whether you have user IDs, zip codes, geo coordinates, (…) the partitioning works for you!
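As a rough illustration of how a value maps to a partition from those three settings, here is a minimal sketch. It assumes the min is inclusive and the max exclusive, and it borrows the `__NULL__` and `__UNPARTITIONED__` names that BigQuery uses for missing and out-of-range values. The `partition_for` helper and the user ID range are hypothetical, not part of any BigQuery API.

```python
def partition_for(value, start, end, interval):
    """Name the partition a value lands in, for ranges of width `interval` over [start, end)."""
    if value is None:
        return "__NULL__"  # rows with no value for the partitioning column
    if value < start or value >= end:
        return "__UNPARTITIONED__"  # values outside the declared range
    # Lower bound of the bucket that contains the value.
    lower = start + ((value - start) // interval) * interval
    # The last bucket may be narrower if the range is not a multiple of the interval.
    upper = min(lower + interval, end)
    return f"[{lower}, {upper})"


# Hypothetical setup: user IDs from 0 to 100000, in ranges of 1000.
for user_id in (0, 1234, 99999, 100000, None):
    print(user_id, "->", partition_for(user_id, 0, 100000, 1000))
```

A query filtering on a user ID then only needs to read the single partition whose range contains that ID.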