The Problems

The mechanism worked pretty well for some time. But as the systems kept growing, HTTP calls and the cronjobs started to show age. Most of the common issues encountered were:

Higher resolution times i.e. the tasks were not picked up immediately and the crons could have been running dry when there is no data to process

i.e. the tasks were not picked up immediately and the crons could have been running dry when there is no data to process Crons are per-machine tasks i.e. if you are going to run them on multiple servers, you are going to need some sort of locking mechanism which in return is going to need maintenance, for example cleaning up the stale locks from the abnormally failed crons etc. So we ended up having a lot of boilerplate code to manage even small tasks.

i.e. if you are going to run them on multiple servers, you are going to need some sort of locking mechanism which in return is going to need maintenance, for example cleaning up the stale locks from the abnormally failed crons etc. So we ended up having a lot of boilerplate code to manage even small tasks. Error Handling i.e. in the cases when the command fails during the process or there are some items that are never picked up

i.e. in the cases when the command fails during the process or there are some items that are never picked up No good way to monitor i.e. where and how the things are going or to track the load at the system at any point of time

i.e. where and how the things are going or to track the load at the system at any point of time System level task rather than an application level task. Developers should not care where the application is run. With crons being a system level task, it was difficult for developers to debug or to test

Developers should not care where the application is run. With crons being a system level task, it was difficult for developers to debug or to test Difficult to debug. Have a look at this stackexchange thread.

Choosing RabbitMQ

Keeping in view these issues, we started looking for the alternatives with at least the below listed items being in our wishlist

Lower the resolution time

Developer friendly and powerful

Easy to test and monitor

Easy to scale without any boilerplate code

Easy to handle errors

The idea was to make our systems easier to scale by converting mainly the state-flow and some other key functionalities to be handled through some queuing mechanism instead of the cronjobs.

We did some research and looked at the available alternatives. After comparisons and some thorough research, it came down to either using Apache Kafka or to go with RabbitMQ and we finally decided to go with RabbitMQ; mainly because of the reasons listed below –

Easier management and monitoring because of the available HTTP API, command line tool, and the admin panel

because of the available HTTP API, command line tool, and the admin panel Wide variety of available tools and plugins. There is a large number of tools and plugins available to extend what RabbitMQ does

There is a large number of tools and plugins available to extend what RabbitMQ does Ability to write and extend the functionality using custom plugins

using custom plugins Better Developer experience

Flexible routing and multiple messaging protocols etc

Using RabbitMQ

Before I go and explain how RabbitMQ fits into our architecture, let me put below a rough diagram of how RabbitMQ works

Have a look at this article if you would like to understand the different parts shown in the diagram above.

In the next sections, I will discuss how we have put it all together to make it work for us. First of all, I will be discussing how we are defining the required structures, then we discuss how the publishers are set to dispatch the messages, after that we will be discussing how the consumers consume these messages and then I will end the article with the bigger picture of how all of it is put together.

Defining Virtual Hosts, Exchanges, Queues, Policies and Users

Before you start using RabbitMQ, you need to define all the key pieces i.e. vhosts, exchanges, queues, policies and users. There are multiple ways to define them

Do it from the admin panel of RabbitMQ

Do it from the CLI

Use the HTTP API provided by RabbitMQ

We wanted something flexible so we ended up using the API. We have an internal tool called ops-tool where we have defined all the vhosts, exchanges and queues in the form of a folder structure. Whenever the developers have to update the virtual hosts, users, exchanges, queues or policies, all they have to do is open that tool, modify or add to the folder structure and run the command and it will automatically do the necessary modifications using the API. Diagram below shows part of the directory structure that is turned into the vhosts, exchanges and queues.

The command is idempotent; evaluates the difference and only applies the provided changes without affecting any existing structure. Plus, it is registered in the docker machines used by developers so that the changes are replicated on the developer machines as soon as they are made.

Standardizing the Message Formats

With the lots of applications and the messages coming from many different publishers, it was of the paramount importance that we standardize the message formats. In order to do that we created a standard package called Rabbit which helps the publishers connect to the broker and generate the standard messages to be sent. All the messages have to be defined in the form of classes in this package and messages must be dispatched by creating instances of these classes. Any application that wants to connect to RabbitMQ, it must have this package installed and the message formats defined.

Consuming the Messages

On the consumers side, we decided to split the commands from the actual projects and thus created associated cli applications for each of the products. For example, flight-api was divided into two applications i.e. flight-api and flight-api-cli, hotel-api divided into hotel-api and hotel-api-cli and so on.

CLI applications or consumers are just the simple commands that receive the messages and dispatch async requests using FPM to the relevant handler for that message.

All the Pieces Combined

The image below shows how all the pieces are put together.