Executive Summary

Kafka is emerging as a dominant messaging platform in microservice environments. Whereas one-to-one, request/response-style communications are usually handled using direct, synchronous, service-to-service communication (e.g., REST, gRPC), one-to-many/many-to-one asynchronous communications are better handled using a Pub/Sub messaging platform like Kafka. Today’s typical microservice deployments involve ephemeral Kafka clients (producers and consumers) running in dynamic container orchestration environments such as Kubernetes, Marathon, or Docker Swarm. Kafka servers usually serve as an external data service that clients use to communicate data asynchronously (e.g., the SMACK stack).

Securing Kafka in such microservice environments is very challenging: it involves not only controlling access to sensitive data-at-rest in Kafka, but also securing and auditing data flowing between short-lived producers/consumers through Kafka servers. Kafka natively provides basic security features such as Java SSL, Kerberos/SASL, and simple ACLs. However, as we move into dynamic microservice environments with multiple tenants and clusters, native mechanisms can be challenging to operationalize and inadequate for compliance in regulated environments. Organizations are on their own to ensure end-to-end encryption between producers and consumers, enable multi-factor authentication, segment access to Kafka, set up secure key management (e.g., PKI), handle weak/leaked credentials (including revoking access to compromised entities), set up fine-grained role- and time-based access controls, meet ever-changing audit and compliance requirements, and ensure that developers spanning multiple teams access Kafka following security best practices.
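For context, wiring up the native mechanisms typically looks like the following on each broker (keystore paths, principals, and topic names are placeholders). Keeping this per-broker configuration and the per-principal ACLs correct across a dynamic fleet is exactly where the operational burden comes from:

```shell
# Native broker-side TLS is enabled via server.properties, e.g.:
#   listeners=SSL://broker1:9093
#   ssl.keystore.location=/var/private/ssl/broker.keystore.jks
#   ssl.client.auth=required
#   authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer

# ACLs are then managed per-principal with the bundled CLI, e.g. granting a
# producer write access to a single topic (names are illustrative):
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --add --allow-principal "User:CN=payments-producer" \
  --producer --topic payments.events
```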

To address these challenges, we introduce a new approach using sidecars (aka micro-engines) that secures Kafka transparently, without needing any changes to Kafka clients, servers, or the underlying platform. Our approach has also been advocated by the recently announced open service mesh platform, Istio, from Google/IBM/Lyft, albeit in the context of Kubernetes services. Whereas sidecars are primarily deployed in container-based green-field environments from the ground up for operational needs like load balancing, this article focuses on deploying them for security, in both container-based and non-container (process-based) brown-field environments, tailored to Kafka setups. Sidecars (which can be processes or containers) are deployed alongside Kafka components including producers, consumers, brokers, and zookeeper servers. The sidecars authenticate each component using multiple factors (e.g., service account, metadata, etc.) and assign them cryptographic identities, intercept all connections and transparently upgrade them to mTLS, and exchange identities to enforce fine-grained, topic-level, time-based access controls. In addition, to guarantee end-to-end security between producers and consumers, data-at-rest is also encrypted and can only be accessed by authorized consumers. All of this is accomplished without changing a single line of code or config in any Kafka component.
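The core of the transparent mTLS upgrade can be sketched in Go. The sketch below is illustrative only (the component names are invented, keys are minted in memory to mimic short-lived, never-on-disk credentials, and the broker-side sidecar simply echoes instead of proxying to the local Kafka port):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// shortLivedIdentity mints a self-signed, short-lived cert entirely in
// memory, mimicking sidecar key management (keys never touch disk).
func shortLivedIdentity(cn string) (tls.Certificate, *x509.CertPool) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(5 * time.Minute), // rotated frequently
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
		IPAddresses:           []net.IP{net.ParseIP("127.0.0.1")},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	leaf, _ := x509.ParseCertificate(der)
	pool := x509.NewCertPool()
	pool.AddCert(leaf)
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, pool
}

// mTLSRoundTrip stands up a broker-side sidecar that requires client certs,
// then connects as a producer-side sidecar and echoes one payload through.
func mTLSRoundTrip() string {
	brokerCert, brokerPool := shortLivedIdentity("broker-sidecar")
	producerCert, producerPool := shortLivedIdentity("producer-sidecar")

	// Broker-side sidecar: accepts only mutually-authenticated TLS.
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{
		Certificates: []tls.Certificate{brokerCert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    producerPool, // trust the producer sidecar's identity
	})
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		buf := make([]byte, 64)
		n, _ := conn.Read(buf)
		conn.Write(buf[:n]) // a real sidecar forwards to the local broker here
	}()

	// Producer-side sidecar: transparently upgrades this hop to mTLS.
	conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{
		Certificates: []tls.Certificate{producerCert},
		RootCAs:      brokerPool,
	})
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	conn.Write([]byte("metadata-request"))
	buf := make([]byte, 64)
	n, _ := conn.Read(buf)
	return string(buf[:n])
}

func main() {
	fmt.Println(mTLSRoundTrip())
}
```

Because both ends authenticate with certificates scoped to a component identity, the broker-side sidecar can reject any peer whose identity (or lease) is no longer valid without touching Kafka itself.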

In this article, we demonstrate how such an approach can be leveraged to protect Kafka, providing superior security, strong authentication and authorization, higher TLS performance, and several operational benefits compared to native Kafka security alone. In particular, this approach provides:

- State-of-the-art secure transparent encryption between all Kafka components, independent of Java versions on the servers and language limitations on the clients
- End-to-end producer-to-consumer encryption and access control, including when data is at rest
- Segmentation between Kafka components and/or other applications without punching firewall holes
- Multi-factor identity and strong authentication using decorated X.509 certs for clients, brokers, and zookeeper
- Higher than native mTLS performance, made possible by using high-speed TLS libraries, ciphersuites, and other optimizations
- Secure, simplified PKI infrastructure using short-lived certs (rotated every few minutes), secure key management (e.g., keys not exposed on disk), and secure bootstrapping
- Transparent encryption between zookeeper servers (zookeeper has no native support yet)
- Independent audit of accesses and policy revisions for fast-changing producers/consumers that are hard to track
- Fine-grained access controls, such as leased access to clients for only approved topics, during specific times, etc., using role/attribute-based access controls (RBAC/ABAC)
- Deep visibility (e.g., topic, consumer group, etc.) into network traffic while preserving end-to-end encryption between client and server
- Separation of application development from security considerations, allowing developers to just focus on application logic and velocity
- Homogeneous controls across multiple services, not limited to just Kafka
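To make the leased, topic- and time-scoped grants concrete, the following is a purely hypothetical policy sketch. It is not a real policy schema (the field names, role, and topic are invented), but it shows the shape such RBAC/ABAC rules take when expressed declaratively at the sidecar rather than as broker-side ACLs:

```yaml
# Hypothetical sidecar policy sketch (illustrative schema, not a real format)
role: payments-producer            # RBAC role bound to a producer workload
match:                             # ABAC attributes used for multi-factor identity
  serviceAccount: payments-svc
  environment: prod
allow:
  - operation: Produce
    topics: [payments.events]      # only approved topics
    lease: 15m                     # access is leased and must be renewed
    schedule: "Mon-Fri 08:00-20:00"  # time-based restriction
```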

Our evaluation results show that for typical microservice deployments, where the number of concurrent connections is high (>= 64) and the record sizes are small (<= 1KB), our system provides a huge (~200–300%) performance improvement over the native TLS implementation, in terms of both throughput and response times, using the latest Kafka-supported Java version (1.8.x).

Surprisingly, even with just one connection, we saw up to ~300% throughput improvement, making this a broader problem than just in microservice environments.

The performance benefits are primarily due to the high-performance Banyan sidecars being written in Go, and to limitations in the native Java SSLEngine compared to Go's crypto/tls (details in the Performance section). We expect similar (perhaps better?) performance results from extending other high-performance sidecars like Envoy, which is written in C++ and uses the OpenSSL/BoringSSL TLS library. Looking ahead, on a preliminary port of Kafka to the pre-release Java 1.9 (1.9+181), the performance gap narrows but remains substantial (up to 36%). Although there is a CPU cost to using sidecars (~15% for 32MB/s and 64 concurrent connections), the security, performance, and operational benefits of this approach easily outweigh the CPU overhead for most deployments.