At Axual, we pride ourselves on running high-volume, mission-critical Apache Kafka clusters for businesses in domains such as banking and energy, offered as a SaaS service (Axual Cloud). To spend less time managing infrastructure and more time building cool new features for our customers, we decided to move our stack to Kubernetes. This article explains how running Apache Kafka in Kubernetes is not difficult if you have the right tools and information.

Over the last few years, Kubernetes has enjoyed a tremendous rise in popularity, as the Google Trends graph shows. Not just startups and small companies but also enterprises are now slowly moving towards Kubernetes or an enterprise version of it. Much of this growth is attributable to containerization with Docker and the buzz around microservices.

At its most basic, Kubernetes is a scheduler. It is extremely good at placing containers so that VM resources are used efficiently. Kubernetes has without doubt proven itself as the orchestrator of choice for stateless applications. The adoption path is also well established:

Compile and build the artifacts.

Adapt the application to the 12-factor recommendations.

Wrap it inside a container with Docker.

Let Kubernetes orchestrate and schedule your containers.

But the story is very different when it comes to Stateful applications.

Stateful Applications

If your application is anything more than a web application, you probably have to deal with run-time state. However, when does an application become stateful? Ask yourself: Can I replace an instance of my application with another? Can I load-balance requests across multiple copies of the application without errors? Before you answer yes, think carefully about what your application stores in memory or on disk. Does that data change at run-time? Would your application break if this data were lost when an instance is replaced by another? If so, you have a stateful application.

To properly orchestrate stateful apps, Kubernetes introduced a new resource definition, StatefulSet (earlier known as PetSet), in version 1.5 back in late 2016. When used correctly, a StatefulSet makes it possible to run stateful applications in Kubernetes. In this post, we will try to understand what it takes to run Apache Kafka (a stateful application) in Kubernetes.

Apache Kafka as Stateful Application

Apache Kafka, a real-time streaming solution, is a complex stateful application. It has:

Brokers with identity. Replacing a broker is not as simple as replacing a pod.

Dependency on ZooKeeper — another stateful application that powers Kafka’s distributed behaviour.

Persistent storage. Thanks to replication, a broker can lose this data and recover it from the other replicas, but that should not be treated as a routine operation, especially when large volumes are involved.

If you are considering running Apache Kafka in Kubernetes in 2020, you don’t need to set it up yourself! There are options available like Strimzi. Let’s take a look at how Strimzi Kafka Operator makes running Kafka in Kubernetes easy.

Strimzi Operator

Strimzi is a Kubernetes Operator for deploying Kafka clusters. It defines a Custom Resource Definition (CRD) called “Kafka” which represents a combination of a ZooKeeper cluster and a Kafka cluster. A very simple example of a Kafka deployment in Kubernetes would be as below:

The above configuration will set up a 3-node ZooKeeper cluster and a 3-broker Kafka cluster, each node with a 100Gi persistent disk attached. This effectively translates to StatefulSets for Kafka and ZooKeeper, plus a headless service that makes it possible to reach individual brokers directly (instead of going through a load balancer).

Effective Day 2 Operations

As you can see, it is very easy to get started with deploying a Kafka cluster in Kubernetes. But what guarantees stability and resilience when upgrades and maintenance need to be done on the Kafka cluster?

Kubernetes provides many resources that can be used to deploy any application in a more resilient way. Let’s see how Strimzi makes use of them.

Pod Disruption Budgets

Any application that is considered critical and cannot accept downtime (planned or otherwise) must define a Pod Disruption Budget (PDB). A PDB tells Kubernetes how many pods of this application are allowed to be disrupted at the same time. When a Kubernetes cluster administrator performs a maintenance action like node draining (kubectl drain), the action will respect all configured PDBs and only remove pods if the disruption budget allows it.

Strimzi adds two PDBs, one each for ZooKeeper and Kafka. Let’s take a look at the Kafka PDB:

The PDB above states that amongst all broker pods of Kafka, only 1 can be unavailable at any given time. So if you have 3 Kafka brokers on 3 distinct nodes, any attempt to drain more than 1 node will be blocked as it goes against the defined PDB. This is customizable to a higher value for large Kafka clusters.
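To make the arithmetic behind this budget concrete, here is a minimal Python sketch of the check a PDB implies. It is a simplified model, not the Kubernetes implementation, and the class and field names are our own and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class DisruptionBudget:
    """Simplified model of a PDB that uses maxUnavailable (illustrative only)."""

    expected_pods: int    # pods the controller wants to run, e.g. 3 brokers
    max_unavailable: int  # budget from the PDB, 1 in the example above

    def disruptions_allowed(self, healthy_pods: int) -> int:
        # The budget requires at least this many pods to stay healthy.
        desired_healthy = self.expected_pods - self.max_unavailable
        return max(0, healthy_pods - desired_healthy)

    def can_evict(self, healthy_pods: int) -> bool:
        # An eviction is only permitted while there is budget left.
        return self.disruptions_allowed(healthy_pods) > 0


budget = DisruptionBudget(expected_pods=3, max_unavailable=1)
first_drain_allowed = budget.can_evict(healthy_pods=3)   # all brokers healthy
second_drain_allowed = budget.can_evict(healthy_pods=2)  # one broker already evicted
```

With 3 healthy brokers the first eviction fits the budget. Once only 2 brokers are healthy, a further eviction is refused until the evicted broker is back and healthy again.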

Network Policies

Network Policies describe, at a higher level of abstraction than individual firewall rules, which applications are allowed to connect to which other applications. Strimzi defines network policies for ZooKeeper (to be accessed only by Kafka) and for Kafka (different listener ports accessible to different entities).

Below is a default network policy defined by Strimzi for Kafka:

The replication port 9091 is only accessible to other Kafka brokers and to the operators that Strimzi deploys to perform administration tasks. The plain-text port 9092 and the TLS port 9093 are not restricted to any application.

Note that Kubernetes only provides a Network Policy interface. The implementation is left to the administrator of the cluster. For instance, you can install a CNI plugin like Calico to implement the network policies in a cluster. Without such a plugin these network policies have no effect!
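The effect of such a policy, once a plugin enforces it, can be modelled in a few lines of Python. This is only a sketch of the matching logic: the rule structure, the "app" labels and the client name are hypothetical, and port 9093 simply stands in for a client-facing listener.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IngressRule:
    port: int
    # Label selectors of pods allowed to connect; an empty list means "any source".
    allowed_peers: List[Dict[str, str]] = field(default_factory=list)


def is_allowed(rules: List[IngressRule], port: int, source_labels: Dict[str, str]) -> bool:
    """Return True if at least one rule admits the connection."""
    for rule in rules:
        if rule.port != port:
            continue
        if not rule.allowed_peers:
            return True  # port is open to everyone
        for selector in rule.allowed_peers:
            if all(source_labels.get(key) == value for key, value in selector.items()):
                return True
    # Traffic that matches no rule is dropped once a policy selects the pod.
    return False


# Hypothetical labels; real Strimzi labels differ.
KAFKA_POLICY = [
    IngressRule(port=9091, allowed_peers=[{"app": "kafka-broker"}, {"app": "strimzi-operator"}]),
    IngressRule(port=9093),  # client listener, open to any application
]

broker_on_replication = is_allowed(KAFKA_POLICY, 9091, {"app": "kafka-broker"})
client_on_replication = is_allowed(KAFKA_POLICY, 9091, {"app": "billing-service"})
client_on_listener = is_allowed(KAFKA_POLICY, 9093, {"app": "billing-service"})
```

In this model a broker can reach the replication port, an ordinary client cannot, and the same client is free to use the client listener.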

Rack Awareness

Configuring the rack correctly in Kafka is very important. Kafka uses the rack configuration to determine where partition replicas will end up. For instance, suppose you have 6 Kafka brokers spread evenly across 3 different availability zones (or racks), 2 in each zone. When a topic with a replication factor of 3 is created, Kafka will choose brokers that are in different zones. In this case, it will ensure that no zone has more than 1 replica of a partition, which maximizes availability if a zone fails.
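The 6-broker example can be illustrated with a small Python sketch. Note that this is not Kafka's actual assignment algorithm, only a simplified stand-in that alternates over racks so that replicas land in distinct racks whenever enough racks exist. The function names, broker ids and zone names are our own.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List


def rack_alternated(broker_racks: Dict[int, str]) -> List[int]:
    """Order brokers so that consecutive brokers sit in different racks."""
    by_rack = defaultdict(list)
    for broker_id in sorted(broker_racks):
        by_rack[broker_racks[broker_id]].append(broker_id)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    ordered = []
    # Take the first broker of every rack, then the second of every rack, and so on.
    for row in zip_longest(*columns):
        ordered.extend(broker for broker in row if broker is not None)
    return ordered


def assign_replicas(broker_racks: Dict[int, str], replication_factor: int) -> List[int]:
    """Pick brokers for one partition, spreading replicas over racks."""
    if replication_factor > len(broker_racks):
        raise ValueError("replication factor exceeds the number of brokers")
    return rack_alternated(broker_racks)[:replication_factor]


# 6 brokers, 2 per availability zone (zone names are illustrative).
brokers = {0: "az-1", 1: "az-1", 2: "az-2", 3: "az-2", 4: "az-3", 5: "az-3"}
replicas = assign_replicas(brokers, 3)
replica_racks = [brokers[broker] for broker in replicas]
```

For a replication factor of 3, the sketch picks one broker in each of the three zones, so losing a whole zone still leaves two replicas of the partition.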

Strimzi allows configuring rack in Kafka brokers as below:

The topologyKey is the key of a label that must exist on all nodes of the Kubernetes cluster. Most cloud-managed clusters like EKS, AKS and GKE provide this by default. In the case of EKS, the rack value passed to the Kafka broker would be something like eu-central-1a.

Strimzi uses the above rack configuration to ensure that the Kafka broker pods are also spread across the availability zones. It does this using a Kubernetes concept called Affinity. Let’s take a look at that.

Affinity

As an orchestrator, Kubernetes does scheduling. Schedulers deal with the basic question: “Where to run the pod?”. Sometimes you want certain pods to be on the same machine as another application pod, or on a different one. For instance, two applications that exchange a lot of data can be co-located on the same node to keep the traffic local. Or you want an application pod to be on a specific node which has more CPU and memory. Such scenarios can be handled in Kubernetes using Affinity.

There are two types of Affinity: NodeAffinity, used to answer the question “Which node to run this pod on?”, and PodAffinity (or PodAntiAffinity), used to answer the question “Should this pod run on the same node as this other pod, or on a different one?” Both determine the node where the pod will run, but the rules are based on nodes in the former and on pods in the latter.

Strimzi defines both Node and Pod Affinity if rack configuration is enabled. Below is an example of NodeAffinity:

The above configuration might look complex, but it is simple to follow. It tells the Kubernetes scheduler that the Kafka broker pods must be scheduled on nodes that have a label with the key failure-domain.beta.kubernetes.io/zone. So your worker nodes must have this label present. For managed clusters like EKS, AKS and GKE, this is already available.

The unusually long configuration item requiredDuringSchedulingIgnoredDuringExecution implies that the rule should be strictly enforced when scheduling new Kafka broker pods (requiredDuringScheduling) but that Kubernetes should go easy on any already running Kafka broker pods found on a node that does not meet this criterion (ignoredDuringExecution).
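The node filtering described above can be sketched in Python as follows. This models only the “label key must exist” part of the rule; the node names are hypothetical and the function is our own illustration, not scheduler code.

```python
from typing import Dict, List

ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"


def eligible_nodes(nodes: Dict[str, Dict[str, str]], required_label_key: str) -> List[str]:
    """Keep only the nodes that carry the required label key (any value)."""
    return [name for name, labels in nodes.items() if required_label_key in labels]


# Hypothetical cluster: node-c was added by hand and lacks the zone label.
nodes = {
    "node-a": {ZONE_LABEL: "eu-central-1a"},
    "node-b": {ZONE_LABEL: "eu-central-1b"},
    "node-c": {},
}
schedulable = eligible_nodes(nodes, ZONE_LABEL)
```

New broker pods would only be considered for node-a and node-b. A broker that was already running on node-c before the rule existed would be left alone, which is the “ignored during execution” half of the name.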

Below is an example of PodAntiAffinity defined by Strimzi when rack is enabled:

To explain this configuration, let’s walk through scheduling 3 Kafka broker pods:

1. When scheduling broker1, the scheduler attempts to run the pod on a node in, say, availability zone AZ1. The affinity configuration above says: check whether a Kafka broker pod (identified by its strimzi.io labels) is already running there, and schedule the pod only if no such pod is found (AntiAffinity). No broker is running in AZ1 yet, so broker1 is scheduled there.

2. Next, the scheduler attempts to schedule broker2 on a node in AZ1, but this time it finds the broker1 pod already running. So AZ1 is skipped, a node in a different zone, AZ2, is found, and broker2 is scheduled there.

3. Step 2 is repeated for broker3, which is scheduled in AZ3.

What if you had a 4th broker to schedule as well? Which node would that end up on? Here the configuration item preferredDuringSchedulingIgnoredDuringExecution becomes important. It implies that the affinity rule is preferred but not mandatory. So when scheduling the 4th broker, the rule cannot be satisfied (assuming there are only 3 unique AZs), but the pod will be scheduled on some node anyway.
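The walk-through above, including the 4th broker, can be reproduced with a short Python simulation. It is a deliberately simplified model of anti-affinity across zones and ignores everything else the real scheduler scores. The function and its required flag are hypothetical and exist only to contrast a preferred rule with a required one.

```python
from typing import Dict, List, Optional


def schedule_brokers(
    zones: List[str], broker_count: int, required: bool = False
) -> Dict[str, Optional[str]]:
    """Place brokers one by one, avoiding zones that already host a broker."""
    brokers_per_zone = {zone: 0 for zone in zones}
    placement: Dict[str, Optional[str]] = {}
    for index in range(broker_count):
        name = f"broker{index + 1}"
        # Least-loaded zone first; ties go to the zone listed first.
        zone = min(zones, key=lambda candidate: brokers_per_zone[candidate])
        if required and brokers_per_zone[zone] > 0:
            placement[name] = None  # a hard rule would leave the pod unscheduled
            continue
        brokers_per_zone[zone] += 1
        placement[name] = zone
    return placement


zones = ["AZ1", "AZ2", "AZ3"]
preferred = schedule_brokers(zones, 4)               # soft rule, as described above
strict = schedule_brokers(zones, 4, required=True)   # hard rule, for comparison
```

With the preferred rule, the first three brokers land in three different zones and the 4th shares a zone with an existing broker. With a required rule, the 4th broker would stay unscheduled in this model.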

Health Checks

No application deployment in Kubernetes can be considered production ready without proper health checks configured. Kubernetes allows you to configure two checks — liveness probe and readiness probe.

The liveness probe is used to determine when to restart the pod. Common probe mechanisms are a TCP port check or an HTTP GET call. If the pod does not respond for a certain interval, it is restarted.

The readiness probe is used to determine when a pod should start receiving traffic. Once the probe succeeds, the pod is added to the backends of the Service responsible for routing traffic to these pods.

In the case of Kafka, the distinction between liveness and readiness probes is important. Kafka starts its listeners, but that does not confirm that it is ready to serve clients. This may be due to out-of-sync replicas that need to catch up with other brokers. If the broker is managing thousands of partitions, this syncing process could take minutes. Hence it is important to set the readiness probe for Kafka correctly. Remember, both liveness and readiness probes must be set.

Liveness Probe

Strimzi uses a custom bash script to test the listener on replication port (9091) using netstat. When this liveness probe is successful, it indicates that Kafka has started correctly and the replication listener is activated.
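For illustration, a comparable check can be written in Python. Note the swap in technique: instead of inspecting netstat output as the Strimzi script does, this sketch simply tries to open a TCP connection to the port. The function name and timeout are our own choices.

```python
import socket


def port_is_listening(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError when the connection is refused or times out.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

A liveness check for the replication listener would then be port_is_listening("127.0.0.1", 9091), with a non-zero exit code returned to Kubernetes when the result is False.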

Readiness Probe

Setting up a readiness probe for Kafka is tricky. When is a Kafka broker ready to serve clients? When the broker starts, it performs many operations, such as obtaining cluster metadata from ZooKeeper, synchronizing with the controller, running log sanity checks, gaining leadership of partitions, synchronizing replicas to join the ISR list of partitions, and some more. Most of these activities happen after the listener has started, so while the broker is “live”, it is not yet “ready”.

When the broker is finally ready to serve requests from clients, it reports this via the JMX metric kafka.server:type=KafkaServer,name=BrokerState. Strimzi checks this metric in the readiness probe to determine when the Kafka broker is ready to receive traffic. This is done in a Java agent which polls the metric and, once the desired state is reached, writes a file to disk. The readiness probe then checks whether that file exists.
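The file-based handshake between the agent and the probe can be sketched in Python (the real agent is written in Java). The state value 3, which we take to be Kafka’s “running as broker” state, and the marker file location are assumptions made for this illustration. Reading the JMX metric itself is left out.

```python
from pathlib import Path

# Assumption: BrokerState value reported once the broker is running and serving.
RUNNING_AS_BROKER = 3


def record_broker_state(state: int, ready_file: Path) -> None:
    """Agent side: create the marker file once the broker reports it is running."""
    if state == RUNNING_AS_BROKER and not ready_file.exists():
        ready_file.touch()


def readiness_probe(ready_file: Path) -> bool:
    """Probe side: the pod is ready only if the marker file exists."""
    return ready_file.exists()
```

The agent would call record_broker_state each time it polls the metric, while Kubernetes periodically runs the probe. Until the marker file appears, the pod receives no client traffic.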

Having the correct probes set up helps ensure that rolling upgrades of Kafka brokers happen with zero downtime.

Conclusion

In this post we saw how running a stateful application like Kafka in Kubernetes is a challenge. Let’s recap what we learnt.

Use operators like Strimzi when running Kafka in Kubernetes.

Don’t underestimate the importance of maintaining and running a stateful application in Kubernetes.

Make effective use of Kubernetes resources like Pod Disruption Budgets, Network Policies, Affinities and Health Checks.

With proper understanding of the various Kubernetes constructs, it is possible to run Kafka safely and reliably. Good luck!

About

Abhinav Sonar, Software Engineer @Axual

This article is written by Abhinav Sonkar, team member of Axual.

We are a club of enthusiasts with a passion for technology. Every day we challenge ourselves to push the boundaries in our profession and our quality.

That is how Axual was created: from the aim to make streaming simple. With Axual we put every company in the data driver. Our software enables situational awareness across the organization.

We are happy to take you through our process of learning, innovating, failing and improving through these blogs.

Want to know more? Read our blogs or visit www.axual.com