The Prometheus Operator: Managed Prometheus setups for Kubernetes

By Fabian Reinartz

Note: The instructions in this post are out of date. To try out the Prometheus Operator, view the latest Prometheus docs for an up-to-date guide to get started.

Today, CoreOS introduced a new class of software called Operators and is also introducing two Operators as open source projects, one for etcd and another for Prometheus. In this post, we'll outline the importance of an Operator for Prometheus, the monitoring system for Kubernetes.

An Operator builds upon the basic Kubernetes resource and controller concepts but includes application domain knowledge to take care of common tasks. Operators ultimately help you focus on a desired configuration, not the details of manual deployment and lifecycle management.

Prometheus is a close cousin of Kubernetes: Google introduced Kubernetes as an open source descendant of their Borg cluster system and Prometheus shares fundamental design concepts with Borgmon, the monitoring system paired with Borg. Today, both Prometheus and Kubernetes are governed by the Cloud Native Computing Foundation (CNCF). And at a technical level, Kubernetes exports all of its internal metrics in the native Prometheus format.

The Prometheus Operator: The best way to integrate Kubernetes and Prometheus

The Prometheus Operator is simple to install with a single command and lets users configure and manage Prometheus through declarative configuration; in response, the Operator creates, configures, and manages the corresponding Prometheus monitoring instances.

Once installed, the Prometheus Operator provides the following features:

Create/Destroy: Easily launch a Prometheus instance for your Kubernetes namespace, a specific application, or a team using the Operator.

Simple Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.

Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.

Note that the Prometheus Operator is under heavy development; follow the project on GitHub for the latest information.

How it Works

The core idea of the Operator is to decouple deployment of Prometheus instances from the configuration of which entities they are monitoring. For that purpose, two third-party resources (TPRs) are defined: Prometheus and ServiceMonitor.

The Operator ensures at all times that, for each Prometheus resource in the cluster, a set of Prometheus servers with the desired configuration is running. This entails aspects like the data retention time, persistent volume claims, number of replicas, the Prometheus version, and Alertmanager instances to send alerts to. Each Prometheus instance is paired with a respective configuration that specifies which monitoring targets to scrape for metrics and with which parameters.

The user can either manually specify this configuration or let the Operator generate it based on the second TPR, the ServiceMonitor. The ServiceMonitor resource specifies how metrics can be retrieved from a set of services exposing them in a common way. A Prometheus resource object can dynamically include ServiceMonitor objects by their labels. The Operator configures the Prometheus instance to monitor all services covered by included ServiceMonitors and keeps this configuration synchronized with any changes happening in the cluster.

The Operator encapsulates a large part of the Prometheus domain knowledge and only surfaces aspects meaningful to the monitoring system's end user. It's a powerful approach that enables engineers across all teams of an organization to be autonomous and flexible in the way they run their monitoring.
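To make the reconciliation idea concrete, here is a minimal in-memory sketch of a reconcile step in Python. It is not the Operator's code: the `PrometheusSpec` type, the `<name>-<index>` pod naming, and the plain dict standing in for the Kubernetes API are assumptions for illustration. Only replicas and version are modeled; retention, volume claims, and Alertmanager settings are left out.

```python
from dataclasses import dataclass


# Hypothetical, heavily simplified stand-in for a Prometheus resource's spec.
@dataclass(frozen=True)
class PrometheusSpec:
    name: str
    version: str
    replicas: int


def reconcile(spec: PrometheusSpec, running: dict) -> list:
    """Move the observed pods toward the desired spec.

    `running` maps pod name -> Prometheus version for the pods owned by this
    resource and is mutated in place. Returns the actions taken, so calling
    reconcile again on an already converged state returns [].
    """
    actions = []
    desired = [f"{spec.name}-{i}" for i in range(spec.replicas)]

    # Scale down: remove pods beyond the desired replica count.
    for pod in sorted(set(running) - set(desired)):
        del running[pod]
        actions.append(("delete", pod))

    # Scale up or upgrade: every desired pod must exist at the desired version.
    for pod in desired:
        if pod not in running:
            running[pod] = spec.version
            actions.append(("create", pod))
        elif running[pod] != spec.version:
            running[pod] = spec.version
            actions.append(("upgrade", pod))
    return actions


running = {}
print(reconcile(PrometheusSpec("prometheus-k8s", "v1.3.0", 2), running))
# → [('create', 'prometheus-k8s-0'), ('create', 'prometheus-k8s-1')]
print(reconcile(PrometheusSpec("prometheus-k8s", "v1.3.0", 2), running))
# → []
```

Because the function compares desired and observed state rather than reacting to individual events, running it repeatedly is harmless, which is the property a controller loop relies on.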

Operator workflow and relationships

Prometheus Operator in Action

We are going to walk through a full demonstration of the Prometheus Operator by creating a Prometheus instance and some services to monitor. Let's start by deploying our first Prometheus instance.

First, you need a running Kubernetes cluster v1.3.x or v1.4.x with alpha APIs enabled (note that v1.5.0+ clusters will not work with the version of the Prometheus Operator used in this blog post; see the Prometheus Operator documentation and kube-prometheus for the latest information on how to run on newer Kubernetes releases). If you don't already have a cluster, follow the minikube instructions to quickly get a local cluster up and running.

Note: minikube hides some components of Kubernetes, but it is the fastest way to set up a cluster to work with. For a more extensive and production-like environment, have a look at setting up a cluster using bootkube.

Managed Deployments

Let's start by deploying the Prometheus Operator in our cluster:

```
$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-operator.yaml
deployment "prometheus-operator" created
```

Verify that it is up and running and has registered the TPR types with the Kubernetes API server.

```
$ kubectl get pod
NAME                                   READY     STATUS    RESTARTS   AGE
prometheus-operator-1078305193-ca4vs   1/1       Running   0          5m

$ until kubectl get prometheus; do sleep 1; done
# … wait ...
# If no more errors are printed, the TPR types were registered successfully.
```

A simple definition of a Prometheus TPR that deploys a single Prometheus instance looks like this:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
  labels:
    prometheus: k8s
spec:
  version: v1.3.0
```

To create it in the cluster, run:

```
$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-k8s.yaml
prometheus "prometheus-k8s" created
service "prometheus-k8s" created
```

This also creates a service to make the Prometheus UI accessible to the user. For the purpose of this demo, the service exposes the UI on NodePort 30900.

Immediately afterwards, observe the Operator deploying a Prometheus pod:

```
$ kubectl get pod -w
NAME               READY     STATUS    RESTARTS   AGE
prometheus-k8s-0   3/3       Running   0          2m
```

We can now reach the Prometheus UI on NodePort 30900 of a cluster node (run $ minikube service prometheus-k8s when using minikube).

In the same manner we can easily deploy further Prometheus servers and use advanced options in our Prometheus TPR to let the Operator handle version upgrades, persistent volume claims, and connecting Prometheus to Alertmanager instances.

You can read more on the full capabilities of the managed Prometheus deployments in the repository's documentation.

Cluster Monitoring

We successfully created a managed Prometheus server. However, it is not monitoring anything yet, as we did not provide any configuration. Each Prometheus deployment mounts a Kubernetes ConfigMap named after itself, i.e., our Prometheus server mounts the configuration provided in the "prometheus-k8s" ConfigMap in its namespace.

We want our Prometheus server to monitor all aspects of our cluster itself like container resource usage, cluster nodes, and kubelets. Kubernetes chose the Prometheus metric format as the canonical way to expose metrics for all its components. So, we only need to point Prometheus to the right endpoints to retrieve those metrics. This works the same across virtually any cluster, so we can use the predefined manifests in our kube-prometheus repository.

```
# Deploy exporters providing metrics on cluster nodes and Kubernetes business logic
$ kubectl create -f https://coreos.com/operators/prometheus/latest/exporters.yaml
deployment "kube-state-metrics" created
service "kube-state-metrics" created
daemonset "node-exporter" created
service "node-exporter" created

# Create the ConfigMap containing the Prometheus configuration
$ kubectl apply -f https://coreos.com/operators/prometheus/latest/prometheus-k8s-cm.yaml
configmap "prometheus-k8s" configured
```

Shortly afterwards, Kubernetes will update the configuration in the Prometheus pod, and we can see targets showing up on the "Targets" page. The Prometheus instance is now ingesting metrics and is ready to be queried in the UI or by dashboards, and to evaluate alerts.

"Targets" page of prometheus-k8s

Service Monitoring

On top of monitoring our cluster components, we also want to monitor our own services. Using the regular Prometheus configuration, we have to deal with the concept of relabeling to discover and configure monitoring targets properly. It is a powerful approach allowing Prometheus to integrate with a variety of service discovery mechanisms and arbitrary operational models. However, it is very verbose and repetitive and thus not generally suitable to be written manually.
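For a sense of what relabeling involves, the sketch below models a single Prometheus relabel rule with `action: keep` in Python, filtering discovered Kubernetes endpoints down to frontend services on the port named `web`. The meta label names follow Prometheus's Kubernetes service discovery naming convention, but treat them as an assumption for whichever Prometheus version you run; the addresses are made up, and all other relabel actions (replace, drop, labelmap, ...) are omitted.

```python
import re


def relabel_keep(targets, source_labels, regex, separator=";"):
    """Simplified model of a Prometheus relabel rule with `action: keep`.

    The values of `source_labels` are joined with `separator`; a target is
    kept only if `regex` matches the joined string completely (Prometheus
    anchors relabeling regexes at both ends). Missing labels count as "".
    """
    pattern = re.compile(regex)
    kept = []
    for target in targets:
        value = separator.join(target.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):
            kept.append(target)
    return kept


# Discovered endpoints, labeled with (assumed) Kubernetes SD meta labels.
discovered = [
    {"__address__": "10.2.0.5:8080",
     "__meta_kubernetes_service_label_tier": "frontend",
     "__meta_kubernetes_endpoint_port_name": "web"},
    {"__address__": "10.2.0.5:9000",
     "__meta_kubernetes_service_label_tier": "frontend",
     "__meta_kubernetes_endpoint_port_name": "admin"},
    {"__address__": "10.2.0.7:8080",
     "__meta_kubernetes_service_label_tier": "backend",
     "__meta_kubernetes_endpoint_port_name": "web"},
]

# One rule: keep only frontend services on the port named "web".
targets = relabel_keep(
    discovered,
    ["__meta_kubernetes_service_label_tier", "__meta_kubernetes_endpoint_port_name"],
    "frontend;web",
)
print([t["__address__"] for t in targets])  # → ['10.2.0.5:8080']
```

A real scrape configuration typically chains several such rules per job, plus rules that rewrite target labels, and repeats them for every group of services.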

The Prometheus Operator solves this problem by defining a second TPR to express how to monitor our custom services in a way that is fully idiomatic to Kubernetes.

Suppose all our services with the label tier = frontend serve metrics on the named port web under the standard /metrics path. The ServiceMonitor TPR allows us to declaratively express a monitoring configuration that applies to all those services, selecting them by the tier label.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: frontend
  labels:
    tier: frontend
spec:
  selector:
    matchLabels:
      tier: frontend
  endpoints:
  - port: web          # works for different port numbers as long as the name matches
    interval: 10s      # scrape the endpoint every 10 seconds
```

This merely defines how a set of services should be monitored. We now need to define a Prometheus instance that includes this ServiceMonitor in its configuration. ServiceMonitors belonging to a Prometheus setup are selected, once again, based on labels. When deploying said Prometheus instance, the Operator configures it according to the matching ServiceMonitors.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-frontend
  labels:
    prometheus: frontend
spec:
  version: v1.3.0
  # Define that all ServiceMonitor TPRs with the label `tier = frontend` should be included
  # into the server's configuration.
  serviceMonitors:
  - selector:
      matchLabels:
        tier: frontend
```
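The label-based inclusion of ServiceMonitors can be sketched in a few lines of Python. The sketch implements Kubernetes `matchLabels` semantics (every listed key/value pair must be present; an empty selector matches everything) over plain dicts. It models only a single selector from the `serviceMonitors` list; how the Operator combines several entries is not shown, and the ServiceMonitor names other than `frontend` are hypothetical.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """Kubernetes `matchLabels` semantics: every listed key must be present
    with exactly the given value. An empty selector matches everything."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def selected_service_monitors(selector: dict, service_monitors: list) -> list:
    """Names of the ServiceMonitors a Prometheus resource would include."""
    return sorted(sm["name"] for sm in service_monitors
                  if matches(selector, sm["labels"]))


service_monitors = [
    {"name": "frontend", "labels": {"tier": "frontend"}},
    {"name": "frontend-canary", "labels": {"tier": "frontend", "track": "canary"}},
    {"name": "backend", "labels": {"tier": "backend"}},
]

print(selected_service_monitors({"tier": "frontend"}, service_monitors))
# → ['frontend', 'frontend-canary']
```

Extra labels on a ServiceMonitor (like `track` here) do not prevent a match; only the labels named in the selector are checked.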

We create the ServiceMonitor and the Prometheus object by running:

```
$ kubectl create -f https://coreos.com/operators/prometheus/latest/servicemonitor-frontend.yaml
servicemonitor "frontend" created
$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-frontend.yaml
prometheus "prometheus-frontend" created
service "prometheus-frontend" created
```

Visiting NodePort 30100 on a cluster node (run $ minikube service prometheus-frontend when using minikube), we can see the UI of our new Prometheus server. As there's no service the ServiceMonitor applies to yet, the "Targets" page is still empty.

The following command deploys four instances of an example application that expose metrics as defined by our ServiceMonitor and match its tier = frontend label selector.

```
$ kubectl create -f https://coreos.com/operators/prometheus/latest/example-app.yaml
```
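How a ServiceMonitor like `frontend` turns into concrete scrape targets for such instances can be sketched as: select services by label, then resolve the named port `web` on each endpoint to whatever number it has. The `Service` and `Endpoint` classes are hypothetical simplifications of the Kubernetes objects, the service names and IPs are made up, and the `http` scheme with the `/metrics` path follows the example's assumption of the standard path. The `interval` setting is not modeled.

```python
from dataclasses import dataclass


# Hypothetical, simplified views of a Kubernetes Service and its endpoints.
@dataclass
class Endpoint:
    ip: str
    ports: dict  # port name -> port number


@dataclass
class Service:
    name: str
    labels: dict
    endpoints: list


def scrape_urls(match_labels, port_name, services, path="/metrics"):
    """Resolve a ServiceMonitor-style label selector and named port to URLs."""
    urls = []
    for svc in services:
        if not all(svc.labels.get(k) == v for k, v in match_labels.items()):
            continue  # service not selected by the label query
        for ep in svc.endpoints:
            port = ep.ports.get(port_name)
            if port is not None:  # only endpoints exposing the named port
                urls.append(f"http://{ep.ip}:{port}{path}")
    return urls


services = [
    Service("example-app", {"tier": "frontend"},
            [Endpoint("10.2.0.5", {"web": 8080}), Endpoint("10.2.0.6", {"web": 8080})]),
    Service("other-frontend", {"tier": "frontend"},
            [Endpoint("10.2.0.9", {"web": 3000})]),
    Service("db", {"tier": "backend"}, [Endpoint("10.2.0.7", {"web": 8080})]),
]

for url in scrape_urls({"tier": "frontend"}, "web", services):
    print(url)
```

Because the lookup goes by port name, `other-frontend` is scraped on 3000 while `example-app` is scraped on 8080, and the `backend` service is ignored even though it also has a port named `web`.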

Going back to the web UI, we can see the new pods immediately appearing on the "Targets" page, and we can query the metrics they expose. Service and pod labels of our example application, as well as the Kubernetes namespace, are automatically attached as labels to the scraped metrics. This allows us to aggregate and filter along them in our Prometheus queries and alerts.

"Targets" page of prometheus-frontend
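Attaching target labels and then aggregating along them can be modeled in Python: each sample scraped from a pod gets that target's labels merged in, and `sum_by` mimics PromQL's `sum by (...)`. The label names (`namespace`, `pod`, `tier`) and values are illustrative; the exact labels the Operator attaches may differ.

```python
from collections import defaultdict


def attach_target_labels(scraped, target_labels):
    """Add the target's labels to every sample scraped from it.

    Simplification: on a name clash the target label simply wins; Prometheus
    itself handles clashes via its `honor_labels` setting.
    """
    return [({**labels, **target_labels}, value) for labels, value in scraped]


def sum_by(samples, by):
    """Model of PromQL `sum by (...)`: group on the listed labels, add values."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)


# The application itself only exposes `code`; target labels are added at scrape time.
pod_a = attach_target_labels(
    [({"code": "200"}, 10.0), ({"code": "500"}, 2.0)],
    {"namespace": "default", "pod": "example-app-a", "tier": "frontend"},
)
pod_b = attach_target_labels(
    [({"code": "200"}, 5.0)],
    {"namespace": "staging", "pod": "example-app-b", "tier": "frontend"},
)

print(sum_by(pod_a + pod_b, ["namespace"]))
# → {(('namespace', 'default'),): 12.0, (('namespace', 'staging'),): 5.0}
```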

Prometheus will automatically pick up new services having the tier = frontend label and adapt to their deployments scaling up and down. Additionally, the Operator will immediately reconfigure Prometheus appropriately if ServiceMonitors are added, removed, or modified.

The image below visualizes how the controller manages Prometheus deployment by watching the state of our Prometheus and ServiceMonitor resources. The relationships between the resources are expressed through labels and any changes take immediate effect at runtime.

Future Directions

With the Operators introduced today, we showcase the power of the Kubernetes platform. The Prometheus Operator extends the Kubernetes API with new monitoring capabilities. We have seen how the Prometheus Operator helps us with dynamically deploying Prometheus instances and managing their life cycle. Additionally, it provides a way to define custom service monitoring purely expressed in Kubernetes idioms. Monitoring truly becomes part of the cluster itself, and the implementation details of the underlying monitoring system are abstracted away.

While it's still in an early stage of development, the Operator already handles several aspects of a Prometheus setup that are beyond the scope of this blog post, such as persistent storage, replication, alerting, and version updates. Check out the Operator's documentation to find out more. The kube-prometheus repository contains a variety of essentials to get your cluster monitoring up and running in no time. It also provides out-of-the-box dashboarding and alerting for cluster components.

Stay tuned for more features of the Prometheus Operator and for additional operators that make it just as easy to run the Prometheus Alertmanager and Grafana inside your cluster.

Join CoreOS at KubeCon

We're hosting a number of events at the Kubernetes conference, KubeCon in Seattle, November 8 and 9, 2016. Join us, especially at the Prometheus keynote on Wednesday, November 9 at 3:30 p.m. PT, which will dive in deeper on the Prometheus Operator.

Be sure to check out the full schedule of CoreOS KubeCon events, then stop by and visit our engineers at the CoreOS booth with your Kubernetes and container questions, or request an on-site sales meeting with a specialist.