Trying Prometheus Operator with Helm + Minikube

An introduction to the Prometheus Operator: how to deploy it on Minikube with Helm, and how to configure alert notifications for Slack.

TL;DR

If you are impatient and want to skip the better part of the learning, here are the commands/files:

# Minikube setup
$ minikube start --kubernetes-version=v1.13.4 \
    --memory=4096 \
    --bootstrapper=kubeadm \
    --extra-config=scheduler.address=0.0.0.0 \
    --extra-config=controller-manager.address=0.0.0.0

# Helm initialization
$ kubectl create serviceaccount tiller --namespace kube-system
$ kubectl create clusterrolebinding tiller-role-binding --clusterrole cluster-admin --serviceaccount=kube-system:tiller
$ helm init --service-account tiller

# Installing Prometheus Operator
$ helm install stable/prometheus-operator --version=4.3.6 --name=monitoring --namespace=monitoring --values=values_minikube.yaml

Introduction

An introduction about terms, tools, Prometheus components and the architecture of the monitoring stack.

Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system.

It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Prometheus supports two types of rules which may be configured and then evaluated at regular intervals:

Recording rules: Allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series.

Alerting rules: Allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service.
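As a sketch of how the two kinds of rules live side by side (the metric names and thresholds here are made up for illustration), a Prometheus rule file could look like:

```yaml
groups:
  - name: example.rules
    rules:
      # Recording rule: precompute the per-job HTTP request rate
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      # Alerting rule: fire once an instance has been down for 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```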

Prometheus includes a local on-disk time series database to store collected metrics, but also optionally integrates with remote storage systems.

Prometheus official Architecture Overview. Source: Prometheus Project

Alertmanager

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or Slack.

The following describes the core concepts the Alertmanager implements:

Grouping: Categorizes alerts of similar nature into a single notification. Grouping of alerts, timing for the grouped notifications, and the receivers of those notifications are configured by a routing tree in the configuration file.

Inhibition: Suppresses notifications for certain alerts if certain other alerts are already firing. Inhibitions are configured through the Alertmanager's configuration file.

Silences: A straightforward way to simply mute alerts for a given time. Silences are configured in the web interface of the Alertmanager.

Alert Manager Overview. Source: en.fabernovel.com

An alert can have the following states:

Inactive: The state of an alert that is neither firing nor pending.

Pending: The state of an alert that has been active for less than the configured threshold duration.

Firing: The state of an alert that has been active for longer than the configured threshold duration.

Prometheus Operator

The Prometheus Operator makes the Prometheus configuration Kubernetes native and manages and operates Prometheus and Alertmanager clusters.

Once installed, the Prometheus Operator provides the following features:

Create/Destroy: Easily launch a Prometheus instance for your Kubernetes namespace, a specific application, or a team using the Operator.

Simple Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.

Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.

The Operator creates and acts on the following Kubernetes custom resource definitions (CRDs):

A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It represents a customization of a particular Kubernetes installation.

Prometheus: Defines a desired Prometheus deployment. The Operator ensures at all times that a deployment matching the resource definition is running.

ServiceMonitor: Declaratively specifies how groups of services should be monitored. The Operator automatically generates Prometheus scrape configuration based on the definition.

PrometheusRule: Defines a desired Prometheus rule file, which can be loaded by a Prometheus instance and contains Prometheus alerting and recording rules.

Alertmanager: Defines a desired Alertmanager deployment. The Operator ensures at all times that a deployment matching the resource definition is running.

To learn more about these CRDs have a look at the design doc.
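To give an idea of the label-based targeting, a ServiceMonitor could be sketched like this (the application name and port are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app         # scrape every Service carrying this label
  endpoints:
    - port: http-metrics  # named port on the target Service
      interval: 30s
```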

The diagram below illustrates the interactions between all components:

Prometheus Operator Architecture. Source: www.nicktriller.com

Helm

Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources. Helm has two parts:

Client (helm): The helm binary itself.

Server (tiller): Tiller runs inside of your Kubernetes cluster, and manages releases (installations) of your charts.

The Helm project has a repository for official charts, which includes a stable chart for Prometheus Operator.

Prometheus Operator Helm Chart

The stable/prometheus-operator helm chart includes multiple components and is suitable for a variety of use-cases. The default installation deploys the following components:

Service monitors to scrape internal Kubernetes components:

kube-apiserver

kube-scheduler

kube-controller-manager

etcd

kube-dns/coredns

With the default installation, the chart also includes dashboards and alerts (the next sections will cover where the defaults come from).

The Prometheus Operator is the project from CoreOS.

The stable/prometheus-operator is the chart managed by the Helm community, which installs the Prometheus Operator along with other components.

From now on, we’ll use this naming convention to avoid confusion.

Installing the k8s cluster (Minikube + Helm)

Minikube is a tool that makes it easy to run a single-node Kubernetes locally.

Before proceeding, make sure to have the following binaries installed (follow the links for the official instructions):

kubectl (although I recommend this read)

minikube

helm

Once you have the binaries installed, start the minikube cluster:

$ minikube start --kubernetes-version=v1.13.4 \

--memory=4096 \

--bootstrapper=kubeadm \

--extra-config=scheduler.address=0.0.0.0 \

--extra-config=controller-manager.address=0.0.0.0

Then, initialize Helm to install tiller (the server-side component):

$ kubectl create serviceaccount tiller --namespace kube-system
$ kubectl create clusterrolebinding tiller-role-binding --clusterrole cluster-admin --serviceaccount=kube-system:tiller
$ helm init --service-account tiller

Your kube-system namespace should look like this:

$ kubectl get po --namespace=kube-system

NAME READY STATUS RESTARTS AGE

coredns-86c58d9df4-2lk4w 1/1 Running 0 23m

coredns-86c58d9df4-ss7bl 1/1 Running 0 22m

etcd-minikube 1/1 Running 0 22m

kube-addon-manager-minikube 1/1 Running 0 22m

kube-apiserver-minikube 1/1 Running 0 22m

kube-controller-manager-minikube 1/1 Running 0 22m

kube-proxy-fcb27 1/1 Running 0 22m

kube-scheduler-minikube 1/1 Running 0 22m

storage-provisioner 1/1 Running 0 23m

tiller-deploy-6cf89f5895-vpst6 1/1 Running 0 30m

Installing the stable/prometheus-operator

Use the helm repo up command to get the latest information about charts from the official chart repositories.

$ helm repo up
Hang tight while we grab the latest from your chart repositories...

...Skip local chart repository

...Successfully got an update from the "stable" chart repository

Update Complete. ⎈ Happy Helming!⎈

Information is cached locally, where it can be used by commands like helm search. This is useful for finding the available versions of a given stable chart:

# Search for stable/prometheus-operator higher/equal than 4.3

$ helm search stable/prometheus-operator --versions --version=">=4.3" --col-width=20

NAME CHART VERSION APP VERSION DESCRIPTION

stable/prometheus... 4.3.6 0.29.0 Provides easy mon...

stable/prometheus... 4.3.5 0.29.0 Provides easy mon...

stable/prometheus... 4.3.4 0.29.0 Provides easy mon...

stable/prometheus... 4.3.3 0.29.0 Provides easy mon...

stable/prometheus... 4.3.2 0.29.0 Provides easy mon...

stable/prometheus... 4.3.1 0.29.0 Provides easy mon...

stable/prometheus... 4.3.0 0.29.0 Provides easy mon...

To install the stable/prometheus-operator:

$ helm install stable/prometheus-operator --version=4.3.6 --name=monitoring --namespace=monitoring

If all goes as expected, you should have the following pods running/ready in your monitoring namespace:

$ kubectl get po --namespace=monitoring

NAME READY STATUS RESTARTS AGE

alertmanager-monitoring-prometheus-oper-alertmanager-0 2/2 Running 0 53m

monitoring-grafana-698c577785-x486r 2/2 Running 0 53m

monitoring-kube-state-metrics-55978bb47f-2ggbv 1/1 Running 0 53m

monitoring-prometheus-node-exporter-mg4fj 1/1 Running 0 53m

monitoring-prometheus-oper-operator-69fbbb6bd5-gccqg 1/1 Running 0 53m

prometheus-monitoring-prometheus-oper-prometheus-0 3/3 Running 1 53m

Trivia: If you need to inspect the charts for a given version, use the helm fetch command to download the chart. This is useful for fetching charts to inspect, compare versions, modify, or repackage before installing:

# Download the chart at a specific version, then unpack it

$ helm fetch stable/prometheus-operator --untar --version=4.3.6

Accessing Prometheus Services

Use kubectl port-forward to access the following relevant services:

Prometheus

Here you can query on the metrics, see all the predefined alerts and Prometheus status and targets.



$ kubectl port-forward -n monitoring prometheus-monitoring-prometheus-oper-prometheus-0 9090
# URL: http://localhost:9090

You may notice some firing alerts and unreachable targets (including the kube-controller-manager and the kube-scheduler). We will fix this in the next section.

Alertmanager

In the Alertmanager UI you can view alerts received from Prometheus, sort alerts by labels and silence alerts for a given time.



$ kubectl port-forward -n monitoring alertmanager-monitoring-prometheus-oper-alertmanager-0 9093
# URL: http://localhost:9093/#/alerts

Grafana

Here you can look at the dashboards. Grafana has a datasource ready to query Prometheus.



# Default credentials

# User: admin

# Pass: prom-operator

$ kubectl port-forward $(kubectl get pods --selector=app=grafana -n monitoring --output=jsonpath="{.items..metadata.name}") -n monitoring 3000
# URL: http://localhost:3000

Grafana Dashboard

Fixing the Prometheus unreachable targets

You may notice that some alerts are firing due to unreachable targets errors.

The stable/prometheus-operator deploys some services that are used by the ServiceMonitors to scrape the metrics. The default selectors configured in these services may not match the labels of your cluster (which is the minikube case):

Note the selector/label mismatch for core-dns, kube-controller-manager, etcd-server and kube-scheduler

To fix this we will upgrade our release, specifying this time a values_minikube.yaml file to override the default helm values:

# values_minikube.yaml
coreDns:
  service:
    selector:
      k8s-app: kube-dns
kubeControllerManager:
  service:
    selector:
      k8s-app: null
      component: kube-controller-manager
kubeEtcd:
  service:
    selector:
      k8s-app: null
      component: etcd
kubeScheduler:
  service:
    selector:
      k8s-app: null
      component: kube-scheduler

Upgrading:

$ helm upgrade monitoring stable/prometheus-operator --version=4.3.6 --values=values_minikube.yaml

Check the selectors again and they should match the minikube labels. Now all targets should be reachable:

Oops, etcd is still unreachable.

I lied… etcd will still be unreachable. By default, it only listens on 127.0.0.1, so Prometheus cannot scrape metrics from it.

This is why you started minikube with extra-config to start controller-manager and the scheduler listening on 0.0.0.0 (all interfaces). Etcd is more complicated, feel free to dig into it 😊.

Where do the default rules and dashboards come from?

The default Grafana dashboards and Prometheus rules are just a copy from the Prometheus Operator and other sources, synced (with alterations) by scripts in the hack folder.

The Prometheus Operator, on the other hand, imports the k8s rules/dashboards from the kubernetes-mixin project. To propose any changes or issues, use the kubernetes-mixin project.

Remember that rules are managed by a CRD, so to list them:

$ kubectl get prometheusrules --namespace=monitoring

Dashboards are stored as config maps:

$ kubectl get configmap --selector grafana_dashboard=1 --namespace=monitoring

Adding custom dashboards and rules

Custom dashboards can be imported by using the Grafana Provisioning with sidecar for dashboards.

In short, a sidecar container watches all config maps in the monitoring namespace and provisions the ones carrying the grafana_dashboard label as dashboards. Just add a config map containing your Grafana dashboard:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard
  labels:
    grafana_dashboard: "1"
data:
  my-custom-dashboard.json: |-
    [...]

To add custom Prometheus rules, use the additionalPrometheusRules helm parameter from stable/prometheus-operator.
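As a sketch of that parameter (the rule name and expression are illustrative, not part of the chart defaults):

```yaml
# values_minikube.yaml fragment
additionalPrometheusRules:
  - name: custom-rules
    groups:
      - name: custom.rules
        rules:
          - alert: TooManyPodRestarts
            expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
            for: 10m
            labels:
              severity: warning
```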

Configuring Alertmanager notifications

Here we will accomplish three things at once:

Set a Slack Webhook as a default receiver for Alertmanager

Improve the notification templates

Inhibit some undesirable alerts

The next sub-sections will explain each of them in detail.

The alertmanager.config helm parameter can be used to declare custom configs for Alertmanager. This config is deployed in a secret, and can be checked with:

# Default config for Alertmanager
$ kubectl get secret --namespace=monitoring alertmanager-monitoring-prometheus-oper-alertmanager -o go-template='{{ index .data "alertmanager.yaml" }}' | base64 --decode

global:
  resolve_timeout: 5m
receivers:
- name: "null"
route:
  group_by:
  - job
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 12h
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"

Use the following values_minikube.yaml file (replace slack_configs.api_url and slack_configs.channel), then upgrade the monitoring release:
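A minimal sketch of the relevant alertmanager.config section (the webhook URL and channel are placeholders, and the notification template is a simplified illustration):

```yaml
# values_minikube.yaml (Alertmanager section only; webhook URL and channel are placeholders)
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: [alertname]
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
            channel: '#my-channel'
            send_resolved: true
            title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
            text: >-
              {{ range .Alerts }}
              *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
              {{ end }}
    inhibit_rules:
      # The always-firing Watchdog alert permanently inhibits the Overcommit alerts
      - source_match:
          alertname: Watchdog
        target_match_re:
          alertname: .+Overcommit
```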

$ helm upgrade monitoring stable/prometheus-operator --version=4.3.6 --values=values_minikube.yaml

Once the release is upgraded, you can check the Alertmanager secret again to make sure it was updated:

$ kubectl get secret --namespace=monitoring alertmanager-monitoring-prometheus-oper-alertmanager -o go-template='{{ index .data "alertmanager.yaml" }}' | base64 --decode

The Alertmanager will automatically reload after a short while. You can check whether the configuration was updated in the "Status" tab of the Alertmanager UI.

The Alertmanager pod has a config-reloader sidecar container. If the config was not updated automatically, check its logs for errors:

$ kubectl logs --namespace=monitoring alertmanager-monitoring-prometheus-oper-alertmanager-0 config-reloader

If all goes as expected, you should receive alert notifications in your configured Slack channel:

Alert notification for a pod that is in a crash loop

The Slack receiver

No tricks here: we just used the Alertmanager slack_configs to create a receiver named slack. Setting it as route.receiver makes it the default receiver.

The notification template

The default Alertmanager template for Slack is too simple: it does not provide any context about the triggered alert (e.g., the instance name, namespace, IP address, etc.).

Default Alertmanager template for Slack. Not much info here…

In our custom template, we used title to create a generic message and text to range through the alerts, so we can receive multiple alerts of the same type in a single message. Beyond that, we range through .Labels.SortedPairs, creating a complete Details section with everything involving the firing alert.

Oh, and you can also click on the Graph icon to replicate the query used by Prometheus in the graph interface.

The alerts inhibition

Sometimes you have "known alerts" that you want to silence. This is the case with minikube and the Overcommit alerts:

“Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.”

Minikube is a single-node Kubernetes cluster for testing purposes, so it is obviously not designed to tolerate node failures. Overcommit will always be firing.

The first option is to silence the alert in the Alertmanager UI. But that is not practical: it cannot be replicated across environments and, even worse, can't be done via config files.

The second (and tricky) option is to use an inhibition rule. Inhibition is a concept of suppressing notifications for certain alerts if certain other alerts are already firing.

The Watchdog is an alert meant to be always firing. So we created an inhibition rule matching all .+Overcommit (regex) alerts against the Watchdog, inhibiting them forever.

Ok, it stinks, but it works. Read these StackOverflow answers for more suitable options (or not).