2018-10: BigQuery doesn't support materialized views natively, but you can approximate one with this approach:

Let's say you want a table that holds only the latest row for each key, kept up to date, so anyone querying it can read the latest values without scanning the whole append-only table.

For this example I'll use my Wikipedia clustered logs, and I'll create a table with the latest rows for all English pages whose titles start with 'A'. These restrictions keep the queries faster and cheaper for the purposes of this demo.

Let's create the table first:

CREATE TABLE `wikipedia_vt.just_latest_rows` AS
SELECT latest_row.*
FROM (
  SELECT ARRAY_AGG(a ORDER BY datehour DESC LIMIT 1)[OFFSET(0)] latest_row
  FROM `fh-bigquery.wikipedia_v3.pageviews_2018` a
  WHERE datehour BETWEEN "2018-10-18" AND "2018-10-21"
  AND wiki='en'
  AND title LIKE 'A%'
  GROUP BY title
)
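Once this table exists, readers can query it directly instead of the raw logs. A quick sanity-check sketch against the same table (the specific title here is just an illustrative guess):

```sql
-- Cheap lookup: scans only the small derived table, not the full append-only logs.
SELECT title, views, datehour
FROM `wikipedia_vt.just_latest_rows`
WHERE title = 'Astronomy'  -- hypothetical example title
```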

And now I want to update it with all the new rows received since that date:

MERGE `wikipedia_vt.just_latest_rows` T  # our "materialized view"
USING (
  SELECT latest_row.*
  FROM (
    SELECT ARRAY_AGG(a ORDER BY datehour DESC LIMIT 1)[OFFSET(0)] latest_row
    FROM `fh-bigquery.wikipedia_v3.pageviews_2018` a
    WHERE datehour > TIMESTAMP_SUB(@run_time, INTERVAL 1 DAY)  # change to CURRENT_TIMESTAMP() or let scheduled queries do it
    AND datehour > '2000-01-01'  # nag
    AND wiki='en'
    AND title LIKE 'A%'
    GROUP BY title
  )
) S
ON T.title = S.title
WHEN MATCHED THEN
  # if the row is there, we update the views and time
  UPDATE SET views = S.views, datehour = S.datehour
WHEN NOT MATCHED BY TARGET THEN
  # if the row is not there, we insert it
  INSERT (datehour, wiki, title, views) VALUES (datehour, wiki, title, views)
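Since the MERGE is keyed on title, the table should never accumulate duplicates. A hedged sanity-check sketch you can run after each merge (same table name as above):

```sql
-- Should return zero rows: the MERGE keyed on title keeps one row per title.
SELECT title, COUNT(*) c
FROM `wikipedia_vt.just_latest_rows`
GROUP BY title
HAVING c > 1
```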

Now you should set up a process to run this query periodically. To keep querying costs down, make sure the process advances the starting date on each incremental run, so only new rows are scanned.

A simple way to set up this process is with the new BigQuery scheduled queries, which will replace the @run_time parameter with the timestamp of each scheduled run.

To create a view that combines this approach with a real-time view of the latest records, see: