Last update: February 10, 2019

To have a Kubernetes cluster up and running is pretty easy these days. However, when you start to use it and deploy some applications, you might expect some issues over time. Kubernetes being a distributed system is not easy to troubleshoot. You need a proper monitoring solution, and because the Prometheus is CNCF project as Kubernetes, it is probably the best fit. In this post, I will show you how to get the Prometheus running and start monitoring your Kubernetes cluster in 5 minutes.

Prometheus Operator

CoreOS introduced operators as business logic in the first place. I wrote about Elasticsearch operator and how operator works a few months ago so you might check it out. In my opinion, operators are the best way to deploy stateful applications on Kubernetes.

CoreOS team also created Prometheus operator for deploying Prometheus on top of Kubernetes. Here is the official operator workflow and relationships view:

From the picture above you can see that you can create a ServiceMonitor resource which will scrape the Prometheus metrics from the defined set of pods. It instructs Prometheus to watch on a new target. For example, if you have a frontend app which exposes Prometheus metrics on port web , you can create a service monitor which will configure the Prometheus server automatically:

apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: name: frontend-app labels: app: frontend-app release: prom spec: namespaceSelector: any: true selector: matchLabels: app: frontend-app endpoints: - port: web interval: 10s

NOTE: release: prom is the Helm release name that I used to install the Prometheus operator below!

Installation

For Prometheus installation use the official prometheus-operator Helm chart. This chart has many options, so I encourage you to take a look at default values file and override some values if needed. Among other services, this chart installs Grafana and exporters ready to monitor your cluster.

Let's enable persistent storage for all Prometheus components and also expose Grafana with ingress. I prepared a custom values file to use for installation with Helm:

⚡ cat > custom-values.yaml <<EOF # Depending on which DNS solution you have installed in your cluster enable the right exporter coreDns: enabled: false kubeDns: enabled: true alertmanager: alertmanagerSpec: storage: volumeClaimTemplate: spec: accessModes: ["ReadWriteOnce"] resources: requests: storage: 10Gi prometheus: prometheusSpec: storageSpec: volumeClaimTemplate: spec: accessModes: ["ReadWriteOnce"] resources: requests: storage: 10Gi grafana: adminPassword: "YourPass123#" ingress: enabled: true annotations: kubernetes.io/ingress.class: nginx kubernetes.io/tls-acme: "true" hosts: - grafana.test.akomljen.com tls: - secretName: grafana-tls hosts: - grafana.test.akomljen.com persistence: enabled: true accessModes: ["ReadWriteOnce"] size: 10Gi EOF

Installing Prometheus operator and Prometheus with all dependencies is just one command now:

⚡ helm install \ --name prom \ --namespace monitoring \ -f custom-values.yaml \ stable/prometheus-operator

NOTE: Kubernetes 1.10+ with Beta APIs and Helm 2.10+ are required!

When you install the Prometheus operator, the new Custom Resource Definition or CRD gets created. You can check for existing CRDs with this command:

⚡ kubectl get crd NAME AGE alertmanagers.monitoring.coreos.com 10m prometheuses.monitoring.coreos.com 10m prometheusrules.monitoring.coreos.com 10m servicemonitors.monitoring.coreos.com 10m

Also, to see what each of those does check official design doc. The Prometheus operator manages all of them.

Wait a few minutes and the whole stack should be up and running. Check for all pods in monitoring namespace:

⚡ kubectl get pods -n monitoring NAME READY STATUS RESTARTS AGE alertmanager-prom-prometheus-operator-alertmanager-0 2/2 Running 0 10m prom-grafana-6c7c9cf8fc-szkpv 3/3 Running 0 10m prom-kube-state-metrics-5648859669-lwxmr 1/1 Running 0 10m prom-prometheus-node-exporter-2896z 1/1 Running 0 10m prom-prometheus-node-exporter-2dwcv 1/1 Running 0 10m prom-prometheus-node-exporter-6fk24 1/1 Running 0 10m prom-prometheus-node-exporter-9kwvh 1/1 Running 0 10m prom-prometheus-node-exporter-9w6bc 1/1 Running 0 10m prom-prometheus-node-exporter-g9v4v 1/1 Running 0 10m prom-prometheus-node-exporter-jqpjk 1/1 Running 0 10m prom-prometheus-node-exporter-ph2xz 1/1 Running 0 10m prom-prometheus-node-exporter-q46bn 1/1 Running 0 10m prom-prometheus-node-exporter-tfdzs 1/1 Running 0 10m prom-prometheus-operator-operator-6d645ff4d6-lkc98 1/1 Running 0 10m prometheus-prom-prometheus-operator-prometheus-0 3/3 Running 1 10m ⚡ kubectl get ingress -n monitoring NAME HOSTS ADDRESS PORTS AGE prom-grafana grafana.test.akomljen.com 80, 443 10m

Grafana default dashboards are present when you log in. Of course, you can always update them, or create a completely new dashboard if you need to. In the example below you can see how the node view looks like:

If you want to access other services you can forward the port to localhost, for example:

# Alert manager ⚡ kubectl port-forward -n monitoring alertmanager-prom-prometheus-operator-alertmanager-0 9093 # Prometheus server ⚡ kubectl port-forward -n monitoring prometheus-prom-prometheus-operator-prometheus-0 9090

When you expose Prometheus server to your localhost, you can also check for alerts at http://localhost:9090/alerts . You could also use ingress to expose those services, but, they don't have authentication so you would need something like OAuth Proxy in front.

Summary

It is almost impossible not to experience any issues with Kubernetes cluster once you start to use it. This monitoring setup will help you along the way. Of course, this is only one part of monitoring, and it's mostly cluster related. Many cloud-native applications have Prometheus support out of the box, so getting application metrics should be the next step. I will cover this in some future blog post. Stay tuned for the next one.