It’s no news that for quite a while Supertubes, our take on Kafka on Kubernetes, has been happily running inside an Istio-based service mesh, in both single- and multi-cluster setups across hybrid clouds. While we have touched on several of the advantages Istio gives us in earlier posts, this post aims to collect the main issues, cornerstones and benefits in one place.

We see the service mesh as a key component of every modern Cloud Native stack. To make this a reality, we are on a mission to make Istio simple to use and manage for everyone. We have built a product called Backyards, the Banzai Cloud operationalized and automated service mesh, which makes setting up and operating an Istio-based mesh a cinch.

Kafka on Istio, the usual suspects (problems)

The internet is full of questions and problems reported by people struggling with running Istio and Kafka alongside each other. Most problems are related to communication or bootstrap, and they all come from one single source: the sidecar.

One of the major problems is that a sidecar is not yet a first-class citizen in Kubernetes. Native sidecar support has been coming for a while and was announced for the 1.18 release, but it has been pushed back to 1.19. To understand the problem and the solution in more detail, please check out our post about sidecars: Sidecar container lifecycle changes in Kubernetes. Why does this cause any issues, you wonder? The main reason is that Kafka and its metadata store, Zookeeper, are designed to have all the required resources available at startup time:

Note that Kafka and Zookeeper were designed for physical on-prem datacenters. While they work fairly well in the cloud, out of the box they are not ready to run in a dynamic environment such as Kubernetes.

Zookeeper tries to speak with its quorum members. If the Envoy proxy is not ready yet, the ZK members may be unable to form a quorum.

Kafka tries to connect to Zookeeper. If the Envoy proxy is not ready, brokers will crash.

A default Zookeeper installation binds only to the pod IP. This causes problems when using Istio, because the proxy sidecar forwards packets to the localhost address, where nothing is listening on port 3888, resulting in “connection refused” errors. The end result is that the Zookeeper nodes are unable to elect a leader and the ensemble never starts.

In older (<1.4.3) Istio versions, Pilot sends the whole configuration to the proxies, which causes the reloading of the entire configuration. During these reloads Envoy terminates all existing connections.
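The first three startup problems above can often be worked around in the container entrypoint. What follows is a minimal sketch under stated assumptions, not how Supertubes does it: the helper names, the config path and the readiness URL are all hypothetical. It relies on Zookeeper’s quorumListenOnAllIPs setting, which makes the quorum and election ports listen on all addresses rather than only the pod IP, and on polling the sidecar until it reports ready before the main process starts.

```shell
#!/bin/sh
# Sketch only: function names, paths and URLs are illustrative assumptions.

# Make sure quorumListenOnAllIPs=true is set exactly once in a Zookeeper
# config file, so the quorum/election ports also accept traffic on localhost.
enable_quorum_listen_on_all_ips() {
  cfg="$1"
  tmp="${cfg}.tmp"
  # Drop any existing setting, then append the desired value.
  grep -v '^quorumListenOnAllIPs=' "$cfg" > "$tmp" || true
  echo 'quorumListenOnAllIPs=true' >> "$tmp"
  mv "$tmp" "$cfg"
}

# Retry a check command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example wiring in an entrypoint (left commented out; needs a real pod):
#   enable_quorum_listen_on_all_ips /conf/zoo.cfg
#   wait_until 60 1 curl -fsS http://localhost:15020/healthz/ready \
#     && exec /opt/start-server.sh
```

The readiness URL in the commented example is an assumption about the sidecar’s health endpoint; the port differs between Istio releases, so check the version you run. The retry helper takes the check as an arbitrary command, so the same function can also wait for Zookeeper before a broker starts.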

While the above are some of the most common runtime problems you might face, there are other, newer problems as well. Let’s assume that Kafka on Istio is already working and all of the typical Kafka and Zookeeper communication failures are fixed. What is one of the first areas you would focus on? Yes, security.

Kafka on Istio, security a different way

There are well-defined ways of handling security on Kafka (proprietary) and on Kubernetes, and these don’t match. Getting them to work together without rewriting Kafka broker clients, while preserving the existing ACLs and translating and enforcing them as K8s RBAC, is an extremely hard challenge. Using Istio’s built-in security mechanism (more details in the next paragraph) has several benefits:

It provides full mTLS inside and outside the cluster.

mTLS can be used for all the components: Kafka, Zookeeper, Cruise Control, Mirror Maker - you don’t need to set up JKS truststores and keystores for each.

If you have a client application accessing Kafka, you only have to drop it into the mesh and you get instant mTLS.

But coming back to the original question: how should you handle the fine-grained Kafka ACLs while clients access brokers using client certificates, the whole mesh is secured with mTLS, and Envoy does the SSL termination? The new (Istio 1.5) Envoy Kafka protocol filter comes to our rescue. That, together with a KafkaPrincipalBuilder provided by Supertubes, makes the whole process transparent to broker clients, and users are bypassing Envoy (instead of Envoy sending back a PLAINTEXT anonymous principal).

Kafka on Istio, the benefits

Now let’s go through the benefits. I am not going to list all of them as we’ve blogged about several benefits (check out the Supertubes posts).

Security benefits

Developers and operators do not have to worry about implementing security features; they can rely on the transparent security features brought in by the service mesh:

Accessing brokers outside or inside the mesh happens through mTLS and is provided by Istio.

Certificate issuance and renewal are fully managed by Istio.

There is up to a 20% performance improvement just by relying on Istio’s mTLS.

No modifications or reconfigurations are needed on the client side to make mTLS work.

Secure Zookeeper quorum communication and access is provided for Kafka clusters installed with Supertubes on Kubernetes.

Fine-grained access control, built on Kubernetes-native building blocks.
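Supertubes sets all of this up for you, so none of the following is required. For readers curious what mesh-provided mTLS looks like in plain Istio (1.5 or later), here is a small sketch that prints a PeerAuthentication manifest enforcing strict mTLS for one namespace. The function name and the namespace are hypothetical placeholders, and this is not necessarily the configuration Supertubes generates.

```shell
#!/bin/sh
# Sketch only: the function name and namespace argument are illustrative.

# Print an Istio PeerAuthentication manifest that requires mTLS for all
# workloads in the given namespace.
strict_mtls_manifest() {
  ns="$1"
  cat <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: ${ns}
spec:
  mtls:
    mode: STRICT
EOF
}

# Example (needs cluster access, so it is left commented out):
#   strict_mtls_manifest kafka | kubectl apply -f -
```

Because the policy lives in the mesh rather than in the brokers, nothing in the Kafka or client configuration has to change when it is applied, which is the point the list above makes.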

Operational benefits

Additional Supertubes benefits

The features above are already built and provided by Supertubes, but this is not all. While setting up a production-ready Kafka cluster on Kubernetes becomes as simple as registering for an evaluation version and running the following command to install the CLI tool:

curl https://getsupertubes.sh | sh && supertubes install -a

there is way more Istio can provide.

With the new Envoy protocol filter, RBAC integration can be entirely handled by the filter. There is now a way to define ACLs at a finer granularity than topics: we can push ACLs down to the Kafka partition level.

The filter plugin can do major version protocol-level transformations. Even though clients might stay on older Kafka versions, the cluster itself can be upgraded, and version incompatibilities can be handled at the Kafka protocol filter level.

Observability and management UI highlight the complete flow.

Extended client throttling based not just on throughput (provided by Kafka already) but on other metrics as well.

And finally, Istio has introduced WebAssembly extensibility support and this brings a totally new option (and additional languages, other than C++) to write different filters for the chain.

Supertubes was designed to be a best-of-class implementation of Kafka on Kubernetes, leveraging Cloud Native technologies. As such, we opted to integrate tightly with the Istio service mesh, which, among other things, brings security, manageability and performance benefits to Kafka. This is a particularly compelling package if you are a SaaS provider who wants to run Kafka “as a service” on your own Kubernetes infrastructure, on your own terms. Supertubes installs, configures, and manages all the components that are required for Kafka success on Kubernetes.

If you plan to run Kafka on Kubernetes and are interested in learning more about Supertubes, check out the product page or read the documentation.

Banzai Cloud Supertubes (Supertubes) is the automation tool for setting up and operating production-ready Kafka clusters on Kubernetes, leveraging a Cloud-Native technology stack. Supertubes includes Zookeeper, the Banzai Cloud Kafka operator, Envoy, Istio and many other components that are installed, configured, and managed to operate a production-ready Kafka cluster on Kubernetes. Some of the key features are fine-grained broker configuration, scaling with rebalancing, graceful rolling upgrades, alert-based graceful scaling, monitoring, out-of-the-box mTLS with automatic certificate renewal, Kubernetes RBAC integration with Kafka ACLs, and multiple options for disaster recovery.