Sizing for Throughput

Producer/Consumer throughput

Before trying to size your Event Hub Cluster, the first question to ask is: what is the expected throughput of your Producer(s) and Consumer(s)? Your system throughput is only as fast as its weakest link.

What is the rate at which the producers will be producing? What is the size of the produced messages? How many consumers will be consuming from the Topics/Partitions, and at what rate will they be able to process each message, given the message size?
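The "weakest link" idea above can be sketched as a trivial sizing helper. The function name and numbers below are illustrative assumptions, not measured values:

```python
# End-to-end throughput is bounded by the slowest stage of the pipeline.
# Illustrative helper; rates are aggregate MB/s across all producers or
# all consumers.

def effective_throughput_mb_s(producer_rate_mb_s, consumer_rate_mb_s):
    """System throughput is capped by the slower of the two sides."""
    return min(producer_rate_mb_s, consumer_rate_mb_s)

# Example: producers can emit 80 MB/s in aggregate, but consumers can
# only process 50 MB/s in aggregate, so the pipeline runs at 50 MB/s.
print(effective_throughput_mb_s(80, 50))  # -> 50
```

In practice this means there is little point scaling brokers or partitions before confirming which side of the pipeline is the bottleneck.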

Number of Brokers (and Zookeepers)

Increasing the number of brokers and configuring replication across them is a mechanism to achieve not just parallelism and higher throughput, but also high availability. HA may not be a factor when running Dev and Test on Event Hub, in which case there may be no need to deploy multiple brokers. However, for production it is strongly recommended to deploy multiple brokers (3+).

ZooKeeper plays a critical role in broker cluster management: keeping track of which brokers are leaving the cluster and which new ones are joining, leader election, and configuration management. This makes it necessary to ensure that ZooKeeper is also deployed as an HA cluster.

The recommendation for sizing your ZooKeeper cluster is to use 1 instance for Dev/Test environments, 3 instances to tolerate 1 node failure, and 5 instances to tolerate 2 node failures. The ensemble stays available as long as a majority of its instances survives.
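The 1/3/5 recommendation follows directly from ZooKeeper's majority-quorum rule. A small sketch (helper names are mine):

```python
# A ZooKeeper ensemble of n nodes stays available while more than n/2
# nodes are up, which yields the 3-for-1-failure, 5-for-2-failures rule.

def tolerable_failures(ensemble_size):
    """Failures an ensemble can survive while keeping a majority."""
    return (ensemble_size - 1) // 2

def ensemble_size_for(failures):
    """Smallest ensemble that survives the given number of failures."""
    return 2 * failures + 1

print(tolerable_failures(3))   # -> 1
print(ensemble_size_for(2))    # -> 5
```

Note that even-sized ensembles add cost without adding fault tolerance: 4 nodes tolerate the same single failure as 3.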

CPU Shape of your Broker(s)

As of this writing, the available CPU shape options on Oracle Cloud for Event Hub Cloud Service are:

OC1m — 1 OCPU, 15 GB RAM

OC2m — 2 OCPU, 30 GB RAM

OC3m — 4 OCPU, 60 GB RAM

OC4m — 8 OCPU, 120 GB RAM

An OCPU provides CPU capacity equivalent to one physical core of an Intel Xeon processor with hyper-threading enabled. Each OCPU corresponds to two hardware execution threads, known as vCPUs.
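Since each OCPU exposes two hardware threads, the shapes above translate to vCPU counts as follows. The dict simply restates the figures listed above:

```python
# CPU shapes listed above, expressed as (OCPUs, RAM in GB).
# Each OCPU corresponds to two hardware execution threads (vCPUs).
SHAPES = {
    "OC1m": (1, 15),
    "OC2m": (2, 30),
    "OC3m": (4, 60),
    "OC4m": (8, 120),
}

def vcpus(shape):
    """Hardware threads available on a given shape."""
    ocpus, _ram_gb = SHAPES[shape]
    return ocpus * 2

print(vcpus("OC3m"))  # -> 8
```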

The following configuration was used for a sample throughput benchmark:

Configuration:

Producers/Consumers: 10/10

Topics: 10

Partitions: 10 per Topic

Replication factor: 3

Broker Nodes: 5 (OC3m)

Zookeepers: 3 (OC1m)

No compression

acks=1

Note: This throughput is achieved via the native Kafka APIs. REST API throughput is known to be much lower, at roughly 1/4 of the native Kafka API throughput. The test was run from Producers and Consumers in VMs in Oracle Cloud, with minimal latency between VMs.
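The producer-side settings from the benchmark above (acks=1, no compression) would look roughly like this. The dict keys are standard Kafka producer property names; the broker addresses are placeholders:

```python
# Producer settings mirroring the benchmark configuration above.
# Keys follow Kafka producer property names; hosts are placeholders.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
    "acks": "1",                 # leader-only acknowledgement, as in the benchmark
    "compression.type": "none",  # the benchmark ran with no compression
}

print(producer_config["acks"])  # -> 1
```

With acks=1, the producer waits only for the partition leader to acknowledge a write, which trades some durability for throughput compared with acks=all.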

Sizing for Topics and Partitions

The decision on the number of partitions depends on the desired throughput and the degree of parallelism that your Producer/Consumer ecosystem can support. Generally speaking, increasing the number of partitions on a given topic increases your throughput linearly; however, the bottleneck could end up being the rate at which your Producers can produce or the rate at which your Consumers can consume.

The simple math used by Kafka users and published in a few Kafka blogs is as follows:

Let's say the desired throughput is "t", the max producer throughput is "p", and the max consumer throughput is "c".

# of partitions = max(t/p, t/c)

A rule of thumb often used is to have at least as many partitions as there are consumers in the largest consumer group.
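The formula and the rule of thumb above combine into one small helper. Partition counts are whole numbers, so the ratios are rounded up; the function and parameter names are mine:

```python
import math

def partitions_needed(t, p, c, largest_group=1):
    """max(t/p, t/c) rounded up, and at least as many partitions as
    there are consumers in the largest consumer group."""
    return max(math.ceil(t / p), math.ceil(t / c), largest_group)

# Example: target 100 MB/s, each producer sustains 10 MB/s, each
# consumer 5 MB/s, and the largest consumer group has 12 members.
print(partitions_needed(100, 10, 5, largest_group=12))  # -> 20
```

Here the consumer side (t/c = 20) dominates, and 20 also exceeds the 12-member group, so 20 partitions satisfy both constraints.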