After some quick research, I came up with the following list of platforms to monitor:

eBay Kleinanzeigen

ImmobilienScout24

Immowelt

Nestpick

A few hours later, I had a Go binary that did everything I needed to run the application locally. It uses a web scraping framework called Colly to browse all the platforms' listings, extract basic attributes, and export them to CSV files in the local filesystem.
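
For illustration, here is a minimal sketch of that kind of setup with Colly and the standard encoding/csv package. The CSS selectors, field names, and search URL are placeholders I made up for the example; each platform needs its own extraction rules matched to its actual markup.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"

	"github.com/gocolly/colly"
)

func main() {
	// Create the output CSV file and writer.
	f, err := os.Create("listings.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	// Header row for the basic attributes collected per listing.
	w.Write([]string{"title", "price", "url"})

	c := colly.NewCollector()

	// Placeholder selectors: extract a few attributes from each listing card.
	c.OnHTML("article.listing", func(e *colly.HTMLElement) {
		w.Write([]string{
			e.ChildText("h2.title"),
			e.ChildText("span.price"),
			e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
		})
	})

	// Hypothetical search URL; the real scraper visits each platform's
	// listing pages for the configured search filters.
	if err := c.Visit("https://www.example.com/search?city=berlin"); err != nil {
		log.Fatal(err)
	}
}
```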

Since I didn’t want to keep the application running locally, my first choice would be to get a cheap instance on Google Cloud. Once I had this rented virtual machine, I could write a startup script to build the app from GitHub and set up a crontab to scrape the platforms daily.

That was probably the best decision for this specific project, but could I use this personal problem as an opportunity to explore the integration of Google Cloud services?

Since I had been involved in multiple projects with some sort of scraping application in the past, I believed it was worth the effort: I could easily reuse this setup in the future.

My architecture started with a few premises:

It should use Google Cloud services.

It should support data collection every few minutes, even though I would start collecting only once a day.

It should be as cost-effective as a cheap droplet at DigitalOcean (US$ 5/month).

It should be easy to deploy. Ideally, it should implement Continuous Deployment.

It should support triggering a data collection process on demand, e.g., after an HTTP POST request (see the sketch after this list).
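
As a rough illustration of that last premise, the sketch below uses Go's net/http package to expose an endpoint that kicks off a collection when it receives a POST. The route, port, and runCollection helper are assumptions for this example only; the actual trigger mechanism is part of the architecture described below.

```go
package main

import (
	"log"
	"net/http"
)

// runCollection stands in for the scraping routine sketched earlier.
func runCollection() {
	log.Println("collecting listings from all platforms...")
}

func main() {
	http.HandleFunc("/collect", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Run the scraper in the background so the request returns quickly.
		go runCollection()
		w.WriteHeader(http.StatusAccepted)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```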

My hypothesis was that I didn't need a virtual machine running 24/7; thus, it should not cost as much as a full month of an instance. In fact, my application was able to download all the properties I was interested in in under 3 minutes, so I expected to pay significantly less.

The architecture