Kafka vs RabbitMQ

What Kafka is, what RabbitMQ is, and the strengths and weaknesses of each framework

Countless articles on the internet compare these two leading frameworks. Most of them only tell you the strengths of each, without providing a full, wide comparison of supported features and specialties.

When I have to decide which technology to choose, I always wish for a comparison table, so I can quickly check the key features for my specific scenario and reach a decision.

RabbitMQ in a nutshell

Who are the players:

1. Consumer

2. Publisher

3. Exchange

4. Route

The flow starts with the publisher, which sends a message to an exchange. The exchange is a middleware layer that knows how to route the message to a queue. Bindings define which queues receive messages from the exchange, and consumers choose which queue they consume from. RabbitMQ pushes the message to the consumer, and once the message has been consumed and the acknowledgment has arrived, it is removed from the queue.

Any piece of this system can be scaled out: publishers, consumers, and RabbitMQ itself, which can be clustered and made highly available.
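The flow above can be sketched as a small in-memory model. This is only an illustration of the idea, not the RabbitMQ client API: the class `DirectExchange`, its methods, and the queue and routing key names are all hypothetical.

```python
from collections import deque


class DirectExchange:
    """Toy in-memory model of an exchange that routes to bound queues."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of queue names
        self.queues = {}    # queue name -> deque of messages

    def bind(self, queue_name, routing_key):
        self.queues.setdefault(queue_name, deque())
        self.bindings.setdefault(routing_key, []).append(queue_name)

    def publish(self, routing_key, body):
        # Copy the message into every queue bound to this routing key.
        for queue_name in self.bindings.get(routing_key, []):
            self.queues[queue_name].append(body)

    def deliver(self, queue_name, handler):
        """Push the head message to handler; remove it only if it acks (True)."""
        queue = self.queues[queue_name]
        if not queue:
            return False
        acked = handler(queue[0])
        if acked:
            queue.popleft()
        return acked


exchange = DirectExchange()
exchange.bind("orders", routing_key="order.created")
exchange.publish("order.created", "order-1")

failed = exchange.deliver("orders", lambda msg: False)    # no ack: message stays
remaining_after_nack = len(exchange.queues["orders"])     # 1
succeeded = exchange.deliver("orders", lambda msg: True)  # ack: message removed
remaining_after_ack = len(exchange.queues["orders"])      # 0
```

The point of the sketch is the last four lines: a message leaves the queue only after a consumer acknowledges it.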

Kafka

Who are the players:

1. Consumer / Consumer groups

2. Producer

3. Kafka Connect source connector

4. Kafka Connect sink connector

5. Topic and topic partition

6. Kafka Streams

7. Broker

8. ZooKeeper

Kafka is a robust system with several members in the game, but once you understand the flow well, it becomes easy to manage and work with.

A producer sends a message record to a topic. A topic is a category or feed name to which records are published, and it can be partitioned for better performance. Consumers subscribe to a topic and pull messages from it. When a topic is partitioned, each partition is read by its own consumer instance; together, all instances of the same consumer are called a consumer group.

In Kafka, messages remain in the topic even after they have been consumed (how long is defined by the retention policy).

Kafka also uses sequential disk I/O. This approach boosts Kafka’s performance and makes it a leading option among queue implementations and a safe choice for big data use cases.
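To make the pull model concrete, here is a minimal sketch of a single-partition log in which reading does not remove anything and every consumer group tracks its own position. It is a toy model, not the Kafka client API; `TopicLog`, `poll`, and the group names are assumptions made for illustration.

```python
class TopicLog:
    """Toy single-partition log: records are appended and never removed on read."""

    def __init__(self):
        self.records = []
        self.committed = {}  # consumer group -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        # Each group pulls from its own offset; the log itself is untouched.
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ["a", "b", "c"]:
    log.append(event)

billing_batch = log.poll("billing")      # ['a', 'b', 'c']
analytics_batch = log.poll("analytics")  # ['a', 'b', 'c'], independent offset
billing_again = log.poll("billing")      # [], nothing new for this group
```

Both groups see every record, and the records are still in the log afterwards. That is the difference from a queue, where consumption removes the message.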

Let’s compare:

1. Distribution and parallelism

Both offer a good distribution solution, with some differences.

Let’s talk about consumers. In RabbitMQ you can scale out the number of consumers, so each queue can have many consumers. These are called competing consumers because they compete to consume messages. The processing work is spread across all active consumers, but each message is still processed only once.

In Kafka, consumers are distributed by topic partitions: each partition is consumed by exactly one consumer in the group.

You can use the partitioning mechanism to send each partition a different set of messages based on a business key, for example user ID or location.
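A rough sketch of both ideas, key-based partitioning and spreading partitions over a consumer group, might look like this. The function names are hypothetical, `zlib.crc32` is only a stand-in for whatever hash the real client uses, and the round-robin assignment is a simplification of Kafka’s actual assignment strategies.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash of the business key, mapped onto a partition index.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    # Round-robin: every partition gets exactly one owner within the group.
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions(4, ["consumer-a", "consumer-b"])
# {'consumer-a': [0, 2], 'consumer-b': [1, 3]}

# The same key always maps to the same partition.
same_partition = partition_for("user-42", 4) == partition_for("user-42", 4)
```

With more partitions than consumers, one consumer owns several partitions; a partition is never shared by two consumers of the same group.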

2. High Availability

Both solutions are highly available.

Kafka does it by using ZooKeeper to manage the state of the cluster (newer Kafka releases can run without ZooKeeper, using the built-in KRaft mode).

RabbitMQ has provided clustering and highly available queues for several major versions. Version 3.8.0 shipped with “Quorum Queues” which use the Raft consensus algorithm to provide data replication with higher performance than “classic” HA queues.

3. Performance

Kafka leverages the strength of sequential disk I/O and requires less hardware. This can lead to high throughput: several million messages per second with just a handful of nodes.

RabbitMQ can also process a million messages per second, but requires 30+ nodes to do so.

4. Replication

Kafka is replicated by design: if the leader broker goes down, all of its work automatically passes to another broker that holds a full replica, and no messages are lost.

In RabbitMQ, queues are not replicated automatically; this needs to be configured. (Version 3.8.0 simplifies this: if your cluster has at least three nodes, all you have to do is declare your queue as a Quorum Queue, and replication is taken care of for you.)

5. Multi subscriber

In Kafka, a message can be read by multiple subscribers, meaning many different consumer types (consumer groups), not many instances of the same one.

In RabbitMQ messages can be routed to numerous queues depending on the exchange type (such as fanout or topic) and the queue bindings. In each queue, only one consumer of that queue can process the message, but if the message goes to multiple queues it can be processed by multiple consumers.

6. Message ordering

Because Kafka has partitions, you get message ordering within each partition.

Messages are routed to partitions by message key, so with a well-chosen key, all messages for a given key land in the same partition, in order.

This can’t be achieved directly in RabbitMQ. You can only try to mimic the behavior by defining many queues and sending each message to a different queue, which can be hard to achieve at scale.

Log compaction: if the same message key arrives multiple times, Kafka keeps only the latest value for that key in the log and deletes the older messages.
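The effect of log compaction can be illustrated with a few lines of plain Python. This is only a sketch of the outcome (latest value per key survives), not of how Kafka implements it; the function name `compact` and the sample records are made up.

```python
def compact(log):
    """Keep only the latest record per key, preserving the order of the survivors."""
    # For each key, remember the offset of its last occurrence.
    last_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [record for offset, record in enumerate(log)
            if last_offset[record[0]] == offset]


log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
compacted = compact(log)
# [('user-2', 'v1'), ('user-1', 'v2')]
```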

7. Message protocols

RabbitMQ supports standard messaging protocols such as AMQP, STOMP (text-based), MQTT (lightweight publish/subscribe messaging), and HTTP, while Kafka uses its own binary protocol built on primitive types (int8, int16, int32, int64, string, arrays) and binary messages.

8. Message lifetime

Because Kafka is a log, messages stay there after they are consumed; you control how long by defining a message retention policy.

RabbitMQ is a queue: messages are removed once they have been consumed and the acknowledgment has arrived.

In RabbitMQ you can make messages survive a broker restart by marking the queue as durable and the messages as persistent.
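As a rough illustration of a time-based retention policy, the sketch below drops records by age only, regardless of whether anyone has read them. The function name, the `(timestamp, value)` record shape, and the numbers are assumptions for the example, not Kafka configuration.

```python
def apply_retention(records, now, retention_seconds):
    """Drop records older than the retention window, consumed or not."""
    return [(ts, value) for ts, value in records if now - ts <= retention_seconds]


records = [(100, "old"), (160, "recent"), (190, "new")]
kept = apply_retention(records, now=200, retention_seconds=60)
# [(160, 'recent'), (190, 'new')]
```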

9. Message acknowledgment

In both frameworks, producers get confirmation that the message arrived in the queue/topic, and the consumer sends an acknowledgment when the message has been consumed successfully, so you can be sure that messages didn’t get lost along the way.

10. Flexible routing to a topic/queue

In Kafka a message is sent to a topic and placed in a partition by its key. RabbitMQ offers more options, for example wildcard pattern matching on the routing key with topic exchanges; check the docs for more information.
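To give a feel for wildcard routing, here is a small sketch of the matching rule used by RabbitMQ topic exchanges, where routing keys are dot-separated words, `*` stands for exactly one word and `#` for zero or more words. The function `topic_matches` is a hypothetical helper written for this article, not part of any RabbitMQ library.

```python
def topic_matches(pattern, routing_key):
    """Match dot-separated words: '*' is exactly one word, '#' is zero or more."""

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb any number of leading words, including none.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


one_word = topic_matches("orders.*.created", "orders.eu.created")  # True
too_many = topic_matches("orders.*", "orders.eu.created")          # False
any_tail = topic_matches("orders.#", "orders.eu.created")          # True
```

A queue bound with `orders.#` would receive every order event, while `orders.*.created` would receive only creation events with a single word in the middle.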

11. Message priority

In RabbitMQ you can define message priorities so that messages with a higher priority are consumed first. For more information, see https://www.rabbitmq.com/priority.html

This is hard to achieve in Kafka (it can be done with message keys, but at large scale this gets difficult).
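The priority behavior described for RabbitMQ can be modeled with a heap. This is a conceptual sketch only: `PriorityMessageQueue` and its methods are invented for the example and say nothing about how RabbitMQ implements priority queues internally.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """Toy queue: higher priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()

    def publish(self, body, priority=0):
        # heapq is a min-heap, so negate the priority; the sequence breaks ties.
        heapq.heappush(self._heap, (-priority, next(self._sequence), body))

    def consume(self):
        return heapq.heappop(self._heap)[2]


queue = PriorityMessageQueue()
queue.publish("low-1", priority=1)
queue.publish("urgent", priority=9)
queue.publish("low-2", priority=1)

order = [queue.consume() for _ in range(3)]
# ['urgent', 'low-1', 'low-2']
```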

12. Monitoring

In Kafka you have 3rd party tools:

Licensed for production environments:

Confluent https://www.confluent.io/product/control-center/

Landoop http://www.landoop.com/

Free:

Burrow https://github.com/linkedin/Burrow

Kafka Tool http://www.kafkatool.com/

In RabbitMQ you have a built-in management UI (default <host_name>:15672).

Also note that version 3.8.0 supports external monitoring via Prometheus and Grafana.

13. Transaction support

Both support atomic writes, meaning that if you write a batch of messages and one fails, the whole transaction is rolled back. This is used extensively in Kafka stream processing.
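The all-or-nothing idea can be shown with a toy model in which a batch is staged first and becomes visible only if every record succeeds. The class `TransactionalTopic`, the `is_valid` callback, and the sample data are assumptions for illustration; neither broker exposes this exact interface.

```python
class TransactionalTopic:
    """Toy topic where a batch becomes visible only if every record succeeds."""

    def __init__(self):
        self.records = []

    def write_batch(self, batch, is_valid):
        staged = []
        for record in batch:
            if not is_valid(record):
                return False  # abort: nothing staged reaches the topic
            staged.append(record)
        self.records.extend(staged)  # commit: all records appear together
        return True


topic = TransactionalTopic()
committed = topic.write_batch(["a", "b"], is_valid=lambda r: r is not None)
aborted = topic.write_batch(["c", None, "d"], is_valid=lambda r: r is not None)
# topic.records == ['a', 'b']: the second batch left no trace
```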

Let’s recap

Use Kafka if you need:

Time travel/durable/commit log

Many consumers for the same message

High throughput (millions of messages per second)

Stream processing

Replicability

High availability

Message order

Use RabbitMQ if you need:

Flexible routing

Priority queue

A standard protocol message queue

Conclusion:

RabbitMQ is enough for simple use cases with low data traffic, and it gives you certain benefits such as priority queues and flexible routing options. But for massive data and high throughput, use Kafka without debate.

If you need a commit log or multiple consumers for the same messages, go with Kafka, because RabbitMQ can’t help you there.