

At the inaugural Prometheus London meetup, I gave a talk about how Weaveworks uses Prometheus to monitor Weave Cloud, which runs on a Kubernetes cluster in AWS. In this series of blog posts I’ll expand on some of the practices we’ve developed over the last 6 months, and hopefully show a few cool hacks that have allowed us to scale Weave Cloud.

This is not a new topic, and members of the community have recently written several excellent blog posts about it:

A Little Background

In May, the Cloud Native Computing Foundation (CNCF) accepted Prometheus as its second hosted project, after Kubernetes. Shortly after the release of Prometheus version 1.0, Björn Rabenstein (a core developer and an engineer at SoundCloud) said in an interview with JaxEnter that Prometheus and Kubernetes share a “spiritual ancestry”.

Both Prometheus and Kubernetes are inspired by internal Google technologies (Borgmon and Borg, respectively). But the match goes deeper: several shared design principles make Prometheus the best monitoring system for both your infrastructure and the applications you deploy onto Kubernetes.

Pulling and Discovery

Prometheus is a pull-based monitoring system. Central Prometheus servers discover your services and pull (scrape) metrics from them. This model suits a dynamic, cloud native environment such as Kubernetes, because Prometheus can ask Kubernetes to discover and enumerate the services you have running. As you scale up a service, Prometheus automatically starts pulling metrics from the extra replicas. Similarly, when nodes fail and pods are rescheduled onto other nodes, Prometheus automatically notices and scrapes the pods in their new locations. In our setup, we use the same Prometheus configuration for both our development and production environments, which greatly simplifies testing.
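To make the pull model concrete, here is a toy Python sketch of the discovery half of the loop. In a real deployment Prometheus does this itself, configured with `kubernetes_sd_configs` and relabelling rules. The `Pod` record, the `app` label, port 8080 and the `discover_targets` function below are hypothetical stand-ins and are not Prometheus or Kubernetes APIs. Each discovery pass recomputes the target list from whichever pods currently exist. That is why new replicas and rescheduled pods are picked up without any configuration change.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Kubernetes Pod object."""
    name: str
    ip: str
    labels: dict = field(default_factory=dict)
    running: bool = True


def discover_targets(pods, selector, port=8080):
    """Return scrape URLs for running pods whose labels match every selector pair.

    Mimics an equality-based label selector; the metrics port is an assumption.
    """
    return sorted(
        f"http://{pod.ip}:{port}/metrics"
        for pod in pods
        if pod.running and all(pod.labels.get(k) == v for k, v in selector.items())
    )


if __name__ == "__main__":
    pods = [
        Pod("users-1", "10.0.1.4", {"app": "users"}),
        Pod("users-2", "10.0.2.7", {"app": "users"}),
        Pod("billing-1", "10.0.1.9", {"app": "billing"}),
    ]
    print(discover_targets(pods, {"app": "users"}))

    # Scale up: the new replica appears on the next discovery pass.
    pods.append(Pod("users-3", "10.0.3.2", {"app": "users"}))
    print(discover_targets(pods, {"app": "users"}))

    # A node fails: its pod stops and a replacement starts elsewhere with a new IP.
    pods[0].running = False
    pods.append(Pod("users-4", "10.0.2.11", {"app": "users"}))
    print(discover_targets(pods, {"app": "users"}))
```

Nothing in the scraper hard-codes an address. The set of targets is always derived from the cluster's current state, which is the property the paragraph above describes.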

Labels

Prometheus and Kubernetes share the concept of ‘labels’ (key-value pairs) that can be used to select objects in the system. Prometheus uses labels to identify time series, and its query language (PromQL) uses sets of label matchers to select the time series to aggregate over.
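The Python sketch below roughly shows how label matchers select series and how an aggregation such as `sum by (job) (http_requests_total{status="500"})` groups them. It is an in-memory toy and does not reflect how Prometheus stores or evaluates data. The series and values are invented, `select` supports only equality matchers, and the sum works on raw sample values with no notion of time or `rate()`. One detail does mirror Prometheus: the metric name is stored as the `__name__` label.

```python
from collections import defaultdict

# Invented sample data: each series is identified by its full label set.
# As in Prometheus, the metric name is kept in the "__name__" label.
SERIES = [
    ({"__name__": "http_requests_total", "job": "users", "instance": "10.0.1.4:8080", "status": "200"}, 120.0),
    ({"__name__": "http_requests_total", "job": "users", "instance": "10.0.1.4:8080", "status": "500"}, 3.0),
    ({"__name__": "http_requests_total", "job": "users", "instance": "10.0.2.7:8080", "status": "500"}, 2.0),
    ({"__name__": "http_requests_total", "job": "billing", "instance": "10.0.1.9:8080", "status": "500"}, 1.0),
]


def select(series, matchers):
    """Keep series whose labels equal every matcher, like {status="500"} in PromQL."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(selected, *keep):
    """Sum values grouped by the given label names, like `sum by (...)` in PromQL.

    A missing label is treated as the empty string.
    """
    totals = defaultdict(float)
    for labels, value in selected:
        totals[tuple(labels.get(k, "") for k in keep)] += value
    return dict(totals)


if __name__ == "__main__":
    # Roughly sum by (job) (http_requests_total{status="500"}), ignoring time and rate().
    errors = select(SERIES, {"__name__": "http_requests_total", "status": "500"})
    print(sum_by(errors, "job"))  # {('users',): 5.0, ('billing',): 1.0}
```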

Kubernetes uses labels in many places: for example, to select the Pods that make up a Service, or to enable more advanced workflows such as canary deployments. Because both systems share this concept, using Prometheus and Kubernetes together lowers the cognitive load on your developers.

Exporters and Pods

Prometheus best practice is to instrument your services natively (as the Kubernetes components are). For services that are not natively instrumented (such as Memcached or Postgres), you can use an exporter instead. An exporter is a process that runs alongside your service and translates the service's metrics into the format Prometheus understands.
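The core of an exporter's translation step can be sketched as a function that takes a service's raw statistics and returns them in the Prometheus text exposition format. The stat keys are loosely modelled on Memcached's `stats` output (`curr_connections`, `get_hits`). The mapping table, metric names and help strings, however, are hypothetical examples and are not taken from any published exporter.

```python
# Hypothetical mapping from raw stat keys to (metric name, type, help text).
METRICS = {
    "curr_connections": ("memcached_current_connections", "gauge", "Open client connections."),
    "get_hits": ("memcached_get_hits_total", "counter", "Get commands that found an item."),
}


def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_labels(labels):
    """Render a label dict as {k="v",...}, or an empty string if there are none."""
    if not labels:
        return ""
    pairs = ",".join(f'{k}="{escape_label_value(str(v))}"' for k, v in sorted(labels.items()))
    return "{" + pairs + "}"


def render(stats, labels=None):
    """Translate raw service stats into Prometheus text exposition format."""
    label_str = format_labels(labels)
    lines = []
    for raw_key, (name, metric_type, help_text) in METRICS.items():
        if raw_key not in stats:
            continue  # skip stats this service did not report
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name}{label_str} {float(stats[raw_key])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented values; "uptime" has no mapping, so it is dropped.
    raw = {"curr_connections": 10, "get_hits": 42, "uptime": 99}
    print(render(raw), end="")
```

An explicit mapping table keeps the exporter's output stable and documented even when the service reports extra fields. The cost is that new stats must be added to the table by hand.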

Kubernetes has the concept of Pods: groups of one or more containers that form an atomic unit of management and scheduling. The containers in a Pod share a network namespace, so they can reach each other on the loopback address (localhost/127.0.0.1). This makes Pods the perfect abstraction for co-deploying an exporter with the service it monitors.
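Here is a sketch of this sidecar pattern using only the Python standard library. Two HTTP servers run in one process. The first is a stand-in for a non-instrumented service that reports JSON stats. The second is an exporter that fetches those stats over 127.0.0.1 and serves them on `/metrics`. In a Pod these would be two separate containers, and they could use the loopback address for the same reason these two threads can: they share one network namespace. The `/stats` endpoint, its JSON shape and the `service_current_connections` metric name are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def serve(handler_cls, port=0):
    """Start an HTTP server on the loopback address in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), handler_cls)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class FakeServiceHandler(BaseHTTPRequestHandler):
    """Stand-in for a non-instrumented service exposing JSON stats (hypothetical)."""

    def do_GET(self):
        if self.path != "/stats":
            self.send_error(404)
            return
        body = json.dumps({"curr_connections": 7}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def make_exporter_handler(service_port):
    """Build an exporter handler that reads the service's stats via localhost."""

    class ExporterHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Loopback works because the "containers" share a network namespace.
            url = f"http://127.0.0.1:{service_port}/stats"
            with urllib.request.urlopen(url, timeout=2) as resp:
                stats = json.load(resp)
            body = (
                "# TYPE service_current_connections gauge\n"
                f"service_current_connections {float(stats['curr_connections'])}\n"
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return ExporterHandler


if __name__ == "__main__":
    service = serve(FakeServiceHandler)
    exporter = serve(make_exporter_handler(service.server_address[1]))
    metrics_url = f"http://127.0.0.1:{exporter.server_address[1]}/metrics"
    with urllib.request.urlopen(metrics_url, timeout=2) as resp:
        print(resp.read().decode())
```

Running the script prints the exposition text, which includes the line `service_current_connections 7.0`. The service itself is never modified. All of the Prometheus-specific work lives in the co-deployed exporter.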

Prometheus and Kubernetes: A Perfect Match

These three characteristics of Prometheus and Kubernetes make the monitoring of services deployed on Kubernetes with Prometheus not only incredibly easy, but also pleasingly coherent.

In future blog posts, we will cover the tradeoffs and lessons learned from deploying Prometheus on Kubernetes, describe how we monitor both our services and our infrastructure with Prometheus, and explain how we use Prometheus to accelerate the developer journey.

For additional reading on Prometheus, check out my other blog posts in this series:



