Client: The system we are building collects data from all users of sites that use Greta, resulting in a large volume of data flowing into our system.

HTTP(S) Load Balancing: Greta's users are distributed all around the world, which requires a global presence. We achieve this by using Google's globally distributed HTTP(S) load balancer, which gives us very fast and reliable data ingestion from clients all over the world.

Container Engine: We run all our services in Kubernetes with the help of Container Engine.

Ingestion: We currently ingest data in four clusters (EU west, US west, US east and Asia east), and we constantly evaluate where to add new presence. This is where we consume HTTP requests and apply some data enhancements before publishing to the queue.
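A minimal sketch of what such an enhancement step could look like, assuming hypothetical field names (Greta's actual schema is not public): each incoming payload gets server-side metadata attached before it is serialized for the queue.

```python
import json
import time

def enhance(payload: dict, region: str) -> dict:
    """Add server-side metadata to a raw client payload (hypothetical fields)."""
    enhanced = dict(payload)
    enhanced["ingest_region"] = region        # which regional cluster saw the request
    enhanced["ingest_ts"] = int(time.time())  # server-side ingestion timestamp
    return enhanced

def to_message(payload: dict) -> bytes:
    """Serialize an enhanced payload into bytes, ready for publishing."""
    return json.dumps(payload).encode("utf-8")
```

The point of doing the enhancement at the edge is that metadata like region and server time is only reliably known at the ingestion point, not downstream.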

Cloud Pub/Sub: Used to move and distribute data. This service is hosted and run by Google. To us, the most important part is that it is globally deployed, so we can use it to move data from all the ingestion points around the world back to the EU for streaming data processing.
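To illustrate the pattern (not the real service), here is a toy in-memory stand-in for a topic: regional ingestion points publish, and a single processing subscription in the EU receives everything. All names are hypothetical, not Greta's actual topology.

```python
from collections import defaultdict

class Topic:
    """A toy fan-out topic: every subscription receives every message."""
    def __init__(self):
        self.subscriptions = defaultdict(list)

    def subscribe(self, name: str) -> list:
        return self.subscriptions[name]  # a subscription is just a buffer here

    def publish(self, data: bytes, **attributes):
        for queue in self.subscriptions.values():
            queue.append((data, attributes))

topic = Topic()
eu_processing = topic.subscribe("eu-streaming")  # one subscription, in the EU
for region in ("us-west", "us-east", "asia-east", "eu-west"):
    topic.publish(b'{"event": "load"}', origin=region)  # four regional publishers
```

With the real Cloud Pub/Sub, the value is exactly that this fan-in works across regions without us operating any queueing infrastructure ourselves.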

Apache Beam/Cloud Dataflow: Allows for real-time and scalable data processing. We use Beam's windowing, grouping and a number of other features, which gives us the possibility of collecting a number of different output values, and of quickly and easily adding new values and new computations on the stream of data.
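The core idea behind the windowing and grouping Beam gives us can be sketched in plain Python; the metric names and the 60-second window size here are illustrative assumptions, not the actual pipeline.

```python
from collections import defaultdict

def windowed_counts(events, window_secs=60):
    """Group (timestamp, key) events into fixed windows and count per key.

    This mimics what fixed (tumbling) windows plus a group-by-key would
    do in a Beam pipeline, for bounded input.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_secs)  # align timestamp to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(5, "cache_hit"), (30, "cache_miss"), (65, "cache_hit"), (70, "cache_hit")]
# windowed_counts(events) -> {(0, 'cache_hit'): 1, (0, 'cache_miss'): 1, (60, 'cache_hit'): 2}
```

Beam adds what this sketch lacks for unbounded streams: watermarks, triggers and late-data handling, which is why we use it rather than rolling our own.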

BigQuery: We write all the data that flows through Dataflow into BigQuery, which allows us to perform analytics and query the data at scale. We use BigQuery extensively to extract data and optimize data delivery for our customers, and to give them insights into very specific parts of their data delivery network.
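As a hedged example of the kind of aggregation this enables, here is a standard-SQL query of the sort one might run; the dataset, table and column names are hypothetical, not Greta's actual schema, and `@customer_id` would be bound as a query parameter by the BigQuery client.

```python
def delivery_stats_query() -> str:
    """Build a hypothetical per-region latency query (BigQuery standard SQL)."""
    return """
    SELECT
      region,
      APPROX_QUANTILES(rtt_ms, 100)[OFFSET(50)] AS median_rtt_ms
    FROM `analytics.delivery_events`
    WHERE customer_id = @customer_id
    GROUP BY region
    ORDER BY median_rtt_ms
    """
```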

OpenTSDB: A time series database with a number of functions built in, allowing us to query data very effectively. We can iterate on queries very rapidly and use the multiple aggregations, downsampling, expressions, tags and more to visualize the data, giving insight into how the data distribution is performing.
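A quick sketch of how such a query is expressed against OpenTSDB's HTTP API: the request body below uses the documented `/api/query` fields (start, aggregator, downsample, tags), but the metric and tag names are hypothetical.

```python
import json

def tsdb_query(metric: str, tags: dict, start: str = "1h-ago",
               downsample: str = "5m-avg") -> dict:
    """Build an OpenTSDB 2.x /api/query request body."""
    return {
        "start": start,
        "queries": [{
            "aggregator": "sum",        # combine matching series
            "metric": metric,
            "downsample": downsample,   # e.g. 5-minute averages
            "tags": tags,               # filter/group by tag values
        }],
    }

body = json.dumps(tsdb_query("delivery.throughput", {"region": "eu-west"}))
```

This dict would be POSTed as JSON to the TSD's `/api/query` endpoint; changing the aggregator, downsample interval or tags is what makes iterating on queries so fast.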

Cloud Pub/Sub to OpenTSDB writer: This is a small static writer that subscribes to Cloud Pub/Sub and sends a write operation to OpenTSDB. The biggest benefit of doing it this way is that it enables us to keep writing to the different data sources and to recover the data as long as it remains in Cloud Pub/Sub.
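The translation step in such a writer can be sketched as follows, assuming a hypothetical message payload (`metric`, `ts`, `value`, `region`); the output shape matches OpenTSDB's documented `/api/put` datapoint format.

```python
import json

def to_datapoint(message_data: bytes) -> dict:
    """Decode a Pub/Sub message body into an OpenTSDB /api/put datapoint.

    The input field names are an assumption for illustration.
    """
    event = json.loads(message_data)
    return {
        "metric": event["metric"],
        "timestamp": event["ts"],
        "value": event["value"],
        "tags": {"region": event["region"]},
    }
```

In a real writer, this datapoint is POSTed to the TSD's `/api/put` endpoint and the Pub/Sub message is acknowledged only after the write succeeds; unacknowledged messages are redelivered, which is exactly what makes the recovery described above possible.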

BigTable: We run OpenTSDB with Bigtable as the storage backend, which allows for near-linear scaling of writes and reads. It is also completely managed and performs operations really quickly.

Read API: We run an API that authenticates users and allows our front end to query OpenTSDB for data, which is then displayed in our dashboard.

All in all, it’s a lot of parts, but they make the data processing very scalable and resilient. We will go into the different parts of the system in future blog posts, to eventually provide a good in-depth overview of the system.