In your journey to get away from monolithic applications and start streaming data processing, you’ll undoubtedly have to compare three solutions that have each tackled the distributed messaging problem in different ways. Kinesis, Kafka, and RabbitMQ all allow you to build your microservices applications. But which should you choose?

Kinesis, created by Amazon and hosted on Amazon Web Services (AWS), prides itself on real-time message processing for hundreds of gigabytes of data from thousands of data sources. Apache Kafka was developed by the fine folks over at LinkedIn and works like a distributed tracing service despite being designed for logging. Lastly, RabbitMQ is a general-purpose message broker that can be designed to fit any distributed tracing need but comes with a slightly steeper learning curve.

If you need a refresher on what distributed systems are, or if you aren’t quite sure how to distribute messages effectively in your serverless application, you may want to go back and read a few of the articles that we’ve published on the topic so far.

We will discuss the features, pros, and cons of the three services in question and also present use cases to help determine if the technology in question lines up with your business goals. We will rate each service per four different topics:

Ease of Getting Started

Scalability

Managed vs. Unmanaged

Maintenance Complexity

Ratings will be from 0 to 2, with the winner being the one with the fewest points in the end.

Ease of Getting Started

Here, testing involves measuring the time required to go from the home page of each service to installing it, configuring it, and getting a “Hello, World” message up and working. We should also be able to tell how much more effort would be required to get beyond a “Hello, World” example.

Kafka

With Kafka, an example was up and running in under 15 minutes. This is the total time it took to download, install, start the “zookeeper,” and send and receive a message. Overall, Kafka was impressively simple and easy to use.

The downside, however, is that in order to create a production-ready environment, you’ll need to consult additional documentation or guides in order to fully understand how Kafka works. This vastly increases the time from “Hello, World” to test-environment standup.

RabbitMQ

RabbitMQ (using the Pika Python library) was more in line with initial expectations for getting started. It required a bit more scripting, as sending messages via the Python terminal did not work out. Instead, two Python scripts (“send.py” and “receive.py”) had to be created to handle the messages.

However, these scripts can also be easily modified to handle a whole host of different payloads, making RabbitMQ the easiest to get beyond a simple “Hello, World” message to something usable in a real-world scenario. Total time spent on RabbitMQ? 45 minutes.

Kinesis

Amazon Kinesis is cloud-hosted, so you may think that it would be much easier to get up and running. No downloading or installing required–just point your script at the appropriate Amazon Resource Number (ARN) and be on your way. But in the end, Kinesis seemed to have more steps than the other services.

To make testing as true to a production scenario as possible, you need to make the appropriate AWS Identity and Access Management (IAM) roles to handle the messaging (not use a root AWS account). So you don’t just have to create the scripts and store them somewhere that the public cloud can access.

You also need to create the IAM roles and give those roles the appropriate permissions to see, touch, and execute each piece of the cloud infrastructure. After creating an IAM role, a Lambda function, a Kinesis stream, and a Lambda Event Source, you should be able to get the “Hello, World” text to come through.

Much like with RabbitMQ, some simple modifications could allow this to become a much more robust example. But for most, the initial time spent setting up Kinesis is a drawback that will probably outweigh any perceived benefit gained. All in all, it took 1.5 hours to get Kinesis going.

Ease of Getting Started Score:

Kafka – 0

RabbitMQ – 1

Kinesis – 2

Scalability

You may have a limited testing environment and cannot scale this out to what it would look like in a real-world scenario. But you can get an idea of the scale achieved by others by looking at published white papers and the kinds of software that are built on each of these services.

Kafka

For Kafka, Apache boasts 100,000+ messages processed per second. Combine this with the fact that it combines the Publisher/Subscriber and Shared Message queue strategies (utilizing “consumer groups”), and you have a robust system that can send and receive messages across numerous domains and services. Also, consumer groups and the Kafka architecture can be modified to achieve better performance based on the number of servers in your cluster as well as the number of consumer groups you’re attempting to provide messages to.

RabbitMQ

RabbitMQ is even more impressive, boasting one million messages per second when deployed as a cluster. One concern, however, is that RabbitMQ is bound by a single Erlang process, meaning that it can scale only as far as you have CPUs to process the messages. With cloud platforms allowing you to scale ad infinitum, you can technically solve the RabbitMQ scalability “issue” by throwing more money at it and adding more servers to the RabbitMQ cluster–perhaps not the best answer, but a simple one.

Kinesis

Finally, Kinesis is built into a cloud platform-as-a-service solution, so you have limited visibility into how it all works. You have to trust that AWS is doing its best to provide the most optimized service while also being beholden to an auction system for CPU time. Simply due to this lack of visibility and the fact that you can’t tweak its performance, Kinesis gets the lowest mark for this topic.

Scalability Score:

Kafka – 1

RabbitMQ – 0

Kinesis – 2

Ease of Maintenance

Maintenance complexity is tricky. Each service has slightly different requirements, and complexity really depends on the team in question and its particular needs and skills. To keep this category as unbiased as possible, we’ll look at routine tasks that need to be performed, non-routine or “break/fix” tasks that may come up, and the requisite skills for being able to maintain each service in-house.

We will then also briefly discuss managed vs. unmanaged options for each.

Maintenance Complexity

Kafka

For Kafka, familiarity with Apache’s Java Virtual Machines (JVMs) is a plus but not a requirement. JVMs can be tricky in their own right, and each requires a little bit of tweaking in order to smoothly get it up and running.

Cluster management becomes the most important aspect of Kafka since clustering medium-sized machines is necessary to reach scale. As a JVM, Kafka has a large overhead just to run the program, so smaller servers have diminishing returns. Resource mismatch is also a problem for larger boxes, since Kafka fluctuates between needing large amounts of processing power and large amounts of RAM. Several medium-sized servers in a cluster can handle the load more easily than a smaller number of large servers.

RabbitMQ

Cluster management is a crucial aspect for RabbitMQ. But the clustering process is much easier, the configuration is much more manageable, and the status messages for administrators are much more clear and concise. RabbitMQ simply lends itself to orchestration more easily due to its simplicity.

Kinesis

Finally, Kinesis management is really more about AWS management since you’ll be utilizing various AWS services to provide the inputs and outputs from Kinesis. Assuming you have a few AWS engineers on your team, this will be an easy feat. But without them, it will be a tough slog.

Maintenance Complexity Scores:

Kafka – 2

RabbitMQ – 0

Kinesis – 1

Managed vs. Unmanaged

How “hands-off” can you be with each of the above products? All three come with an option to have a company manage the service for you. If you do decide to take on infrastructure management yourself, each service behaves slightly differently.

Managing Kafka and RabbitMQ yourself means you’ll need to provision servers, configure the service, maintain hardware, architect high availability, manage storage and backups, set up alarms and monitoring, and plan for load changes. A managed service provider can relieve you of any or all of the above duties. Since Kinesis was designed and maintained by AWS, it is technically a managed service in and of itself–all you have to do is make sure your inputs can be processed by its queue.

The scores for manageability are as follows:

Kafka – 1.5

RabbitMQ – 1.5

Kinesis – 0

Decision Time

We’ve made it to decision time! You can review all scores in the table below. Remember, we’re going for “lowest score wins” in this scenario:



The choice of a distributed messaging service will depend on each organization’s unique needs. These rankings may not have factored in the appropriate tests or weights that would be necessary for deployment in your given environment, as these observations may differ per the possibilities within your company’s specific use case.

Conclusions reached here could also change with any future developments by each service discussed. RabbitMQ is the clear winner here. RabbitMQ is indeed very flexible but also limited by its single-process architecture. Kafka is by far the easiest to set up and get started with, but fleshing out a robust solution may take a bit more work than the “Hello, World” example lets on.

Kinesis is great for the programmer who wants to develop their software without having to mess with any troublesome hardware or hosting platforms. But locking into AWS with Kinesis may create more hoops to jump through than it’s worth.

If you’re building a project and team from the ground up and are looking to utilize a distributed tracing service, then these scores can serve as a good measure to better understand distributed tracing in terms of the services available to you, what they have to offer, their limitations, and their challenges.

To start a free trial with Epsagon, click here.

Read more on distributed apps:

Debugging Distributed Tracing Using logs

Common Design Patterns in Distributed Architectures