Monitoring Kafka in Kubernetes

Monitoring Kafka in Kubernetes without Prometheus

TL;DR

This post focuses on monitoring your Kafka deployment in Kubernetes if you can’t or won’t use Prometheus. Kafka exposes its metrics through JMX. To collect those metrics in your favourite reporting backend (e.g. InfluxDB or Graphite) you need a way to query them over the JMX protocol and ship them onwards. This is where jmxtrans comes in handy. With a few small tweaks it turns out to be pretty effective to run jmxtrans as a sidecar in your Kafka pods, have it query for metrics and transport them into your reporting backend. For the impatient: all sample code is available here.

Why Monitor Kafka

Message passing is becoming an increasingly popular choice for sharing data between applications, which makes tools like Kafka the backbone of your architecture. A well-functioning Kafka cluster can handle a lot of data, but poor performance or degraded cluster health will likely cause issues across your entire stack. It’s therefore crucial to stay on top of this and have dashboards available that provide the necessary insights.

Metrics, metrics, metrics

Kafka provides a vast array of metrics on performance and resource utilisation, which are (by default) available through a JMX reporter. It took me a while to figure out which metrics are available and how to access them; it didn’t help that the set of metrics has changed a few times across Kafka releases. Confluent provides a nice (and mostly correct) overview of the metrics available in the more recent Kafka versions. Kafka metrics can be broken down into three categories:

Broker metrics

Producer metrics

Consumer metrics

There’s a nice write-up on which metrics are important to track per category. For us, Under Replicated Partitions and Consumer Lag are key metrics, as well as several throughput-related metrics.

Configuring your Kafka deployment to expose metrics

So let’s assume the following Kafka setup on Kubernetes: the Kafka pods run as part of a StatefulSet and a headless service creates DNS records for our brokers. Next, we have to configure Kafka to report metrics through JMX, which is done by setting the JMX_PORT environment variable. We end up with a YAML file similar to the one below.
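A sketch of the relevant part of such a StatefulSet is shown below. The image name and the non-JMX port are illustrative; the essential bit is the JMX_PORT environment variable.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka                 # headless service providing per-broker DNS records
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        # Illustrative image; any Kafka image whose start scripts honour JMX_PORT works.
        image: my-registry/kafka:2.3.0
        ports:
        - containerPort: 9092        # client traffic
        - containerPort: 9010        # JMX
        env:
        - name: JMX_PORT             # makes the Kafka launch scripts enable the JMX reporter on this port
          value: "9010"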

Again, the important part here is that we set the JMX_PORT environment variable to 9010, meaning Kafka will expose its metrics on that port. You can verify that you can connect to this port using a tool like JConsole.

Exporting Kafka metrics to your reporting backend

Great, so we’ve confirmed that Kafka’s metrics are exposed and ready to be exported to your reporting backend. If you happen to use Prometheus you should probably set up Kafka Exporter or JMX Exporter and be done with it. You can skip the rest of this post, because Prometheus will do the hard work of pulling the metrics in. However, most other reporting backends (e.g. InfluxDB, Graphite) are push-based, so you need to extract and ship the metrics yourself.

If you don’t want to mess around with (custom) Kafka Metrics Reporters, jmxtrans might be interesting for you. Jmxtrans is a tool that can query multiple JVMs for attributes exposed through JMX and output the results using a configurable output writer. It has output writers for many popular reporting backends, such as Amazon CloudWatch, InfluxDB, Graphite, Ganglia and StatsD.

I’ll now show our setup for use with InfluxDB. Here’s a sample jmxtrans configuration for InfluxDB:

{
  "servers": [ {
    "port": "8081",
    "host": "kafka.my-namespace.svc.cluster.local",
    "queries": [ {
      "obj": "java.lang:type=Memory",
      "attr": [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
      "resultAlias": "jvmMemory",
      "outputWriters": [ {
        "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url": "http://127.0.0.1:8086/",
        "username": "admin",
        "password": "admin",
        "database": "jmxDB",
        "tags": { "application": "kafka" }
      } ]
    } ]
  } ]
}

As you can see, you specify a list of queries per server, and each query asks for a list of attributes. Per query you also specify a list of output writers; I’m not sure why it’s useful to redefine the output writers for every query. For the Kafka use case you end up with a large config file that contains a lot of repetition.

It would be great if we could use some kind of templating here. My template essentially looks like the configuration above, but with the credentials replaced by placeholders and the long list of per-metric queries generated from a simple list.

The jmxtrans Docker image supports feeding in JSON config files and supports variable substitution using JVM parameters, so I can use that to inject secrets like ${influxPass}. However, I still need a solution that avoids repeating the output writer for each metric. To keep things pragmatic I’m using jq to render a jmxtrans config file from a template and a list of metrics. The template needs to be rendered before the actual jmxtrans container starts, so I’m using an Init Container to do this. Init Containers are like regular containers, but they run before the other containers are started, which makes them perfect for generating config files. Let’s create an Init Container to generate our jmxtrans config, sketched below.
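Here is a minimal sketch of such an Init Container. It assumes the metrics list (kafka-metrics.json) and a jq filter (kafka-template.jq) come from a ConfigMap and that the rendered config is written to a shared emptyDir volume; the image, file names and paths are illustrative and the volume definitions are omitted for brevity.

initContainers:
- name: generate-jmxtrans-config
  # Placeholder image; anything that ships jq will do.
  image: my-registry/jq:latest
  command: ["sh", "-c"]
  args:
  # Render the plain list of metrics into a complete jmxtrans config file.
  - jq -f /metrics/kafka-template.jq /metrics/kafka-metrics.json > /jmxtrans-input/kafka.json
  volumeMounts:
  - name: jmxtrans-config            # ConfigMap with kafka-metrics.json and kafka-template.jq
    mountPath: /metrics
  - name: jmxtrans-input             # emptyDir shared with the jmxtrans sidecar
    mountPath: /jmxtrans-input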

As you can see, the list of metrics is mounted from a ConfigMap and the resulting kafka.json file is written to another volume mount. See the ConfigMap below.
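The following is a sketch of what such a ConfigMap could contain. The metric list is heavily abbreviated, the jq filter is just one way to wrap every metric in a query with a single shared output writer, and the bootstrap script assumes the jmxtrans image honours a JMXTRANS_OPTS environment variable and a docker-entrypoint.sh start command, so treat the details as illustrative rather than exact.

apiVersion: v1
kind: ConfigMap
metadata:
  name: jmxtrans-config
data:
  # Abbreviated list of JMX objects to query; the real list is much longer.
  kafka-metrics.json: |
    [
      { "obj": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
        "attr": [ "Value" ], "resultAlias": "underReplicatedPartitions" },
      { "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
        "attr": [ "Count" ], "resultAlias": "messagesInPerSec" }
    ]
  # jq filter that turns the list above into a full jmxtrans config with one shared output writer.
  kafka-template.jq: |
    { servers: [ {
        port: "9010",
        host: "127.0.0.1",
        queries: [ .[] + { outputWriters: [ {
          "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
          url: "http://influxdb.monitoring.svc.cluster.local:8086/",
          username: "${influxUser}",
          password: "${influxPass}",
          database: "jmxDB",
          tags: { application: "kafka" }
        } ] } ]
    } ] }
  # Bootstrap script used as the container entrypoint: it passes the credentials to
  # jmxtrans as JVM system properties so the ${...} placeholders get substituted.
  # The exact exec line depends on the jmxtrans image/version you run.
  boot.sh: |
    #!/bin/sh
    export JMXTRANS_OPTS="$JMXTRANS_OPTS -DinfluxUser=$INFLUX_USER -DinfluxPass=$INFLUX_PASS"
    exec /docker-entrypoint.sh start-without-jmx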

Notice that in this ConfigMap we also include a simple bootstrap script that injects the JVM parameters used for substitution by jmxtrans itself. This script will act as the entrypoint for the Docker container.

Now we only need to add the jmxtrans container descriptor to our existing Kafka pod template. I’ll add it as a sidecar so that querying JMX happens inside the pod only.
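A sketch of the sidecar is shown below. The image tag, secret name and mount paths are illustrative, and the config directory depends on where your jmxtrans image expects its JSON files.

containers:
- name: kafka
  # ... existing Kafka container from the StatefulSet above ...
- name: jmxtrans
  image: jmxtrans/jmxtrans:latest      # pin a specific version in practice
  # Use the bootstrap script from the ConfigMap as the entrypoint.
  command: ["sh", "/boot/boot.sh"]
  env:
  - name: INFLUX_USER
    valueFrom:
      secretKeyRef:
        name: influxdb-credentials     # hypothetical Secret holding the InfluxDB login
        key: username
  - name: INFLUX_PASS
    valueFrom:
      secretKeyRef:
        name: influxdb-credentials
        key: password
  volumeMounts:
  - name: jmxtrans-input               # rendered kafka.json produced by the init container
    mountPath: /var/lib/jmxtrans       # directory scanned for config files by the jmxtrans image
  - name: jmxtrans-config              # ConfigMap providing boot.sh
    mountPath: /boot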

The important bits are that we mount the folder (jmxtrans-input) containing our generated config file, mount the boot.sh script, and use that script as the Docker entrypoint for the container.