These dashboards are very complete, but since I started playing with Prometheus and Grafana back then, I have adjusted some dashboards I already used and created some customized ones with a broader view of the nodes.

These additional dashboards are provided in the grafana-dashboards directory and can be imported after the deployment using the JSON files (Plus sign on the left -> Import -> Upload .json file).

One of the customized dashboards

Custom dashboard, pt2

Custom dashboard, pt3

I also added a dashboard to monitor Traefik stats:

And a Prometheus stats dashboard:

Also, I'm using Grafana to fire notifications based on alerts created on the dashboards. These alerts, along with their thresholds, are defined on each Grafana panel you want to alert on. Grafana then sends a notification via email (in this case) using an email relay deployment (more on this later).

All alerts defined

Defining an alert threshold

To provide all this, the prometheus-operator stack, maintained by CoreOS, is composed of:

Prometheus-operator — The element that manages all components

Prometheus — The metrics collector and time-series database that stores the data

node-exporter — Collector to fetch node data

arm-exporter — Collector to fetch board temperature

alertmanager — The component that handles alert notifications sent by Prometheus

Grafana — The dashboard GUI

Kube-state-metrics — Collector for Kubernetes cluster stats

To deploy this stack, there is an automated script at the root of the project that should take care of everything. Before running it, adjust the parameters listed in the sections below.

./deploy

This should create a monitoring namespace and spawn all deployments and pods in your Kubernetes cluster.
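Once the script finishes, you can check that everything came up with kubectl (pod names and counts will vary with your versions):

# All monitoring components run in the "monitoring" namespace
kubectl get pods -n monitoring
kubectl get svc -n monitoring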

Now a brief overview of some elements.

Prometheus-operator

Prometheus-operator for Kubernetes provides easy monitoring definitions for Kubernetes services, plus deployment and management of Prometheus instances. It is responsible for spawning the deployments and for automatically configuring Prometheus to fetch new targets based on service-monitor configurations.

Here is the service-monitor for Traefik that I created:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik-ingress-lb
  labels:
    k8s-app: traefik-ingress-lb
spec:
  jobLabel: k8s-app
  endpoints:
  - port: admin
    interval: 30s
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  namespaceSelector:
    matchNames:
    - kube-system

With this definition, prometheus-operator configures Prometheus to look for the Traefik service by its labels and to access its metrics through the configured port, all dynamically and without the need for restarts. Look for all the service-monitors in the repository.
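For reference, here is a minimal sketch of the kind of Service this ServiceMonitor matches. The port number is an assumption (8080 is Traefik's usual admin/metrics port); what matters is that the label and the port name line up with the ServiceMonitor above:

apiVersion: v1
kind: Service
metadata:
  name: traefik-ingress-lb
  namespace: kube-system
  labels:
    k8s-app: traefik-ingress-lb  # matched by spec.selector.matchLabels above
spec:
  selector:
    k8s-app: traefik-ingress-lb
  ports:
  - name: admin                  # matched by the endpoint port name above
    port: 8080                   # assumption: Traefik's admin port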

Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Prometheus has its targets dynamically configured by the operator:

The Prometheus GUI is exposed through an ingress on the internal network and can be used to query its metrics. Adjust the parameters for the ingress URL in ./manifests/prometheus/prometheus-k8s-ingress.yaml.
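For example, this query shows CPU usage percentage per node (assuming node-exporter 0.16+ metric names; older versions expose node_cpu instead of node_cpu_seconds_total):

# Percentage of CPU busy time, averaged per node
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)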

Prometheus also collects metrics from multiple elements in the Kubernetes cluster. For this, it needs access to the Kubelet API and other Kubernetes elements.

If your deployment was made with kubeadm (like in my other article), be sure to make the following changes:

According to the official deployment documentation here, a couple of changes to the cluster are required: we need to expose cAdvisor, which is installed and managed by the kubelet daemon, and allow webhook token authentication. To do so, run the following on all masters and nodes:

sed -e "/cadvisor-port=0/d" -i /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/"

systemctl daemon-reload

systemctl restart kubelet

In case you already have Kubernetes deployed with kubeadm, change the addresses kube-controller-manager and kube-scheduler listen on (on the master node), in addition to the previous kubelet change:

sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-controller-manager.yaml

sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-scheduler.yaml

More information is available on the kube-prometheus site.

Alertmanager

The Alertmanager handles alerts sent by client applications such as the Prometheus server.

Adjust the parameters for the ingress URL of the UI in ./manifests/alertmanager/alertmanager-ingress.yaml.
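As a minimal sketch, an Alertmanager email receiver pointing at the SMTP relay described later would look roughly like this; the service address and email addresses are placeholders, not the values from this repository:

global:
  smtp_smarthost: 'smtp-server.monitoring.svc:25'  # placeholder: in-cluster SMTP relay
  smtp_from: 'alerts@mydomain.com'                 # placeholder sender
  smtp_require_tls: false                          # a plain internal relay usually has no TLS
route:
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'me@mydomain.com'                          # placeholder recipient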

Node-Exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels. This component provides all metrics regarding the hardware and the underlying operating system.
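For example (again assuming 0.16+ metric names), the fraction of memory still available on each node:

# Available memory as a fraction of total, per node
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes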

Arm-exporter

This component exports only one metric, the temperature of the processor die. This metric can be seen in the additional Grafana dashboard “Kubernetes cluster monitoring (via Prometheus)”.

Kube-state-metrics

Kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of the objects.
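For example, kube-state-metrics exposes kube_pod_status_phase, which can be used to spot pods stuck outside the Running phase:

# Number of pods per phase, excluding Running
sum by (phase) (kube_pod_status_phase{phase!="Running"})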

SMTP Relay

To provide email notifications, I added an email relay service that integrates with a Gmail account. To deploy it, adjust the environment variables in ./manifests/smtp-server/smtp.yaml to add your Gmail credentials.
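The variable names below are illustrative, not necessarily the exact keys in smtp.yaml; the point is the shape of a standard Kubernetes env block:

# Illustrative only: check ./manifests/smtp-server/smtp.yaml for the real variable names
env:
- name: GMAIL_USER        # hypothetical key for the Gmail account
  value: "myaccount@gmail.com"
- name: GMAIL_PASSWORD    # use a Gmail app password, not your main password
  value: "my-app-password"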

Grafana

Grafana is an open source, feature-rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB.

Prometheus-operator deploys Grafana and its dashboards. To have the additional dashboards, load them from the grafana-dashboards directory.

For email notifications, adjust your source email in ./manifests/grafana/grafana-configmap.yaml and the ingress URL in ./manifests/grafana/grafana-external-ingress.yaml.
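As a sketch, the ingress rule boils down to a host entry like the following (the apiVersion and backend fields reflect the Kubernetes versions of the time and may differ from the actual manifest):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.internal.mydomain.com  # adjust to your domain
    http:
      paths:
      - backend:
          serviceName: grafana           # Grafana's service in the monitoring stack
          servicePort: 3000              # Grafana's default port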

To access it, go to http://grafana.internal.mydomain.com.

Conclusion

This stack makes the deployment and management of cluster monitoring much more complete and dynamic. For more information, look into the prometheus-operator documentation and, as usual, read the sources for the manifests.