Google Data Studio is an amazing free tool that lets anyone quickly build nice, easy-to-share dashboards out of pretty much any kind of data source. It has one frustrating limitation though: reports are not real time.

The first reader loading your report gets the latest data, but the result is then cached for up to 12 hours, even if newer data is available in the data source, so subsequent readers will not see the updated data. Only report editors can refresh the cache, by manually clicking the refresh button in the toolbar:

I don’t like you, little manual refresh button

But have you met Puppeteer? From the Chrome DevTools team, it is “a Node library which provides a high-level API to control headless Chrome”. Basically, it lets you do most things you can do manually in the browser from a Node.js app running on a headless server. What if we could just push this picky little manual refresh button from some task scheduler, and get rid of the 12-hour cache limitation? Let’s do just that!

HCaS (Headless Chrome as a Service)

Don’t google it, I just made that up 😆. In order to host our little Data Studio monkey-refresher application, we need a server environment that can run headless Chromium, which is not that trivial.

We could set up our own VM and install everything by hand (good luck with Chromium dependencies in a headless environment), OR we can leverage a PaaS provider with headless Chromium support, so everything is already set up and we don’t have to manage the server ourselves. Luckily, GCP has offered official support for headless Chrome since August 2018.

Developers can now very easily:

take screenshots of web pages

do server-side rendering

generate PDFs from web pages or JS

do end-to-end performance and UI testing

develop 🐒 scripts to click on picky manual refresh buttons :)

Node.js application

We start by creating a Node.js application, using express to create an HTTP server listening for job requests, winston for logging and puppeteer for Chrome automation. As we will use GCP PaaS services for hosting, I also throw in @google-cloud/logging-winston to send logs to GCP’s Stackdriver Logging service. Just create a package.json file like the one below and run npm install.
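A package.json along these lines should do; the package name and version ranges below are illustrative, not necessarily the exact ones from the original gist:

```json
{
  "name": "data-studio-refresher",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "@google-cloud/logging-winston": "^0.10.0",
    "express": "^4.16.0",
    "puppeteer": "^1.9.0",
    "winston": "^3.0.0"
  }
}
```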

Then we use Puppeteer to automate opening the Data Studio report pages and clicking the refresh button. One problem at this stage is getting headless Chromium to log in to Data Studio as a user through Google SSO.

The solution is to log in manually on the development machine using the launch flag headless: false, or to copy the userDataDir from a local Chrome instance into the project’s userDataDir. The login cookie will be reused on subsequent launches, so the headless Chrome will not have to go through the login page. You could also automate the login on the login page itself, but that would involve more work. Anyway, the script then looks like this:
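As a rough sketch of that script — the report IDs, the userDataDir path and the .refreshButton selector below are placeholders, since Data Studio’s DOM changes over time and you will need to inspect the page for the real selector:

```javascript
// refresh.js — sketch; REPORT_IDS and the button selector are placeholders.
const REPORT_IDS = ['my_report_id']; // hypothetical Data Studio report IDs

// Build the Data Studio URL for a report ID.
function reportUrl(id) {
  return `https://datastudio.google.com/reporting/${id}`;
}

async function refreshReports() {
  // Lazy require so this module can be loaded without puppeteer installed.
  const puppeteer = require('puppeteer');

  // userDataDir holds the Google login cookie copied from a local Chrome.
  const browser = await puppeteer.launch({
    headless: true,
    userDataDir: './userDataDir',
  });

  try {
    for (const id of REPORT_IDS) {
      const page = await browser.newPage();
      await page.goto(reportUrl(id), { waitUntil: 'networkidle0' });

      // Placeholder selector for the toolbar refresh button.
      const refreshSelector = '.refreshButton';
      await page.waitForSelector(refreshSelector);
      await page.click(refreshSelector);

      // Give the report some time to reload its data before moving on.
      await page.waitFor(5000);
      await page.close();
    }
  } finally {
    await browser.close();
  }
}

module.exports = { reportUrl, refreshReports };
```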

Finally, wrap everything in an HTTP server and implement proper error handling in app.js (click to see the gist).

Deploy on GCP App Engine

To host our application on GCP, we define a YAML file for our app:
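Something along these lines, with an illustrative service name and instance class:

```yaml
# data-studio-refresher.yaml — illustrative App Engine config
runtime: nodejs8              # the Node.js 8 runtime ships with headless Chrome support
service: data-studio-refresher
instance_class: F4_1G         # puppeteer + Chromium need a bit of memory
```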

and to schedule the execution at a regular interval, a cron.yaml:
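For example, an hourly schedule hitting a hypothetical /refresh endpoint on that service:

```yaml
# cron.yaml — illustrative: hit the refresh endpoint every hour
cron:
  - description: refresh Data Studio reports
    url: /refresh
    schedule: every 1 hours
    target: data-studio-refresher
```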

Then run gcloud app deploy data-studio-refresher.yaml --version dev && gcloud app deploy cron.yaml and voilà! We have our headless Chrome automation application running serverless, with nice application lifecycle management, task scheduling and logging services available in the GCP console.

UPDATE: As pointed out by Jonathan Lin in the comments, you can also deploy to Google Cloud Functions and save even more money!

Thanks for reading! If you enjoyed it, don’t hesitate to share.

I am learning every day and don’t consider myself an expert GCP or Node.js developer, so if I made any mistakes, please feel free to correct me and leave your suggestions in the comment section.