You will usually want to analyze the daily storage size of your buckets. Running gsutil du over large buckets is out of the question, and Stackdriver Monitoring can currently be somewhat inaccurate for this task.

We will use the Storage Logs and a Cloud Function (CF) triggered by a Pub/Sub topic to automatically load the storage consumption log into BigQuery, where you can query the bucket size and visualize it. Storage logs that were loaded will then be moved to another bucket. Logs that were not loaded successfully or other objects will be moved to an “errors” bucket.

1. Create a bucket in which you're going to store the Storage & Access logs for other buckets. Switch to your designated project and create the new bucket:

gcloud config set project PROJECT-ID

gsutil mb gs://LOGS-BUCKET

2. We would also like to separate storage logs from usage logs, and to have a bucket for logs that failed to load into BigQuery. After the CF handles each log, it will be moved from gs://LOGS-BUCKET to the appropriate bucket.

gsutil mb gs://PROCESSED-LOGS-BUCKET

gsutil mb gs://ERRORS-LOGS-BUCKET

gsutil mb gs://USAGE-LOGS-BUCKET

Note: You can also use different folders and use only one bucket, but this will require you to modify the CF code a bit.

3. Allow Google’s Cloud Storage Analytics service account to write to our new bucket:

gsutil acl ch -g cloud-storage-analytics@google.com:W gs://LOGS-BUCKET

4. Enable bucket notifications to Pub/Sub. The following command creates a notification configuration for gs://LOGS-BUCKET, meaning that every time an object is created, changed, deleted or archived, a message with the relevant information will be published to a Pub/Sub topic. Since we only care about object creation (i.e. when the log objects are written), we will watch only the OBJECT_FINALIZE event.

gsutil notification create -e OBJECT_FINALIZE -f none gs://LOGS-BUCKET

The -f flag specifies the payload format for the message. The available options are 'json' or 'none'. Since our CF does not use any object metadata, we chose 'none'.

If you check the Pub/Sub page in the developer console, you will see that the command above created a topic named projects/PROJECT-ID/topics/LOGS-BUCKET. Messages for new objects will be published to this topic.
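To confirm the notification configuration was created, you can list the configurations on the bucket (this is a standard gsutil subcommand; it requires an authenticated gcloud environment, so run it against your own project):

```shell
# List notification configurations on the logs bucket; you should see one
# entry pointing at the Pub/Sub topic created by the command above.
gsutil notification list gs://LOGS-BUCKET
```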

5. Create a new BigQuery dataset and table

bq mk MY_DATASET

bq mk --schema project_id:string,bucket:string,storage_byte_hours:integer,bytes:integer,date:date,update_time:timestamp,filename:string -t MY_DATASET.MY_TABLE
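The storage_byte_hours field in Cloud Storage's storage logs reports byte-hours accumulated over a 24-hour window, so dividing it by 24 gives the bucket's average size in bytes for that day, which is presumably what the bytes column above holds. A minimal shell sketch of that conversion, using a hypothetical sample log line in the storage-log CSV format:

```shell
# Hypothetical storage log content (format: "bucket","storage_byte_hours").
# Dividing byte-hours by 24 yields the average size in bytes for the day.
printf '"bucket","storage_byte_hours"\n"my-enormous-bucket","240"\n' \
  | tail -n +2 \
  | awk -F'","' '{ gsub(/"/, "", $1); gsub(/"/, "", $2); printf "%s %d\n", $1, $2/24 }'
# → my-enormous-bucket 10
```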

6. Create a staging bucket for the function we are about to deploy

gsutil mb gs://CF-STAGING-BUCKET

7. Clone the Cloud Function code



git clone https://github.com/doitintl/gcs-stats.git

cd gcs-stats

Important: Open config.js with any text editor and edit the values according to the previous steps.

8. After editing the config file, deploy the function

gcloud beta functions deploy gcs-stats \
  --project PROJECT-ID \
  --entry-point gcsStatsHandler \
  --stage-bucket gs://CF-STAGING-BUCKET \
  --source . \
  --trigger-topic LOGS-BUCKET \
  --memory 128MB \
  --timeout 10s

9. Finally, we can enable Access & Storage logs on any bucket we want to analyze. The target bucket for the logs is, of course, the bucket we created in the first step: gs://LOGS-BUCKET.

We would like to have a column for the project ID of each bucket. Since the logs don’t include this information, we will use a custom prefix for the logs file name. The prefix we chose is PROJECT_[ANALYZED-PROJECT-ID]_BUCKET_[ANALYZED-BUCKET]

Important: if you already have logging enabled on the bucket, executing the following command will replace the existing logging configuration.

gsutil logging set on -b gs://LOGS-BUCKET -o PROJECT_[ANALYZED-PROJECT-ID]_BUCKET_[ANALYZED-BUCKET] gs://ANALYZED-BUCKET

For example, if you have a bucket called gs://my-enormous-bucket and it resides in the project my-storage-project, the prefix should be PROJECT_my-storage-project_BUCKET_my-enormous-bucket. If you decide not to use the custom prefix, the log objects will have the default name prefix and the project_id column in the BigQuery table will contain null values.
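You can verify which logging configuration is active on an analyzed bucket with the standard gsutil subcommand (run against your own bucket):

```shell
# Show the current logging configuration (target log bucket and object
# prefix) for the analyzed bucket.
gsutil logging get gs://ANALYZED-BUCKET
```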

10. Enable logs for any other bucket you want to monitor as explained in the previous step.
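Once a few days of logs have been loaded, you can query daily bucket sizes directly from BigQuery. A sketch of such a query, assuming the dataset, table, and column names defined in step 5:

```shell
# Daily size per bucket in GiB, most recent days first.
# MY_DATASET.MY_TABLE and the column names follow the schema from step 5.
bq query --use_legacy_sql=false '
SELECT
  project_id,
  bucket,
  date,
  ROUND(bytes / POW(2, 30), 2) AS size_gib
FROM MY_DATASET.MY_TABLE
ORDER BY date DESC, bucket'
```

From here, the results can be connected to a visualization tool such as Data Studio to chart growth over time.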