Multi-tenancy and fairness in the context of microservices. An introduction to the concept of a sharded queue.

Managing a job queue in a multi-tenant environment imposes some uncommon requirements. This is how we handled them.

THRON is a multi-tenant SaaS, and one of the challenges in this kind of system is being fair to all tenants.

It’s based on a microservices architecture where most of the components do their job asynchronously. In this article we’ll talk about the concept of a sharded queue, how we use it to manage prioritization among different tenants on shared resources, and more generally how and why we built our internal queuing library.

The problem to solve

We introduced the sharded queue in our architecture in order to give an answer to the following question: which job-to-be-done should a consumer pick up, in order to guarantee fairness?

To understand how this works, let’s use an example: the queue for the content conversion service. Every time a file is uploaded, it has to be processed to create multiple formats, extract semantic information and perform other pre-processing tasks. Let’s just focus on the content conversion part.

Simplifying, there is a conversion-service that continuously picks up and processes a conversion-to-be-done from a conversions-to-be-done-queue.

When a user uploads an image, one or more conversions-to-be-done are put in the queue. Sooner or later the conversion-service will pick them up and process them.

From a cost management perspective it’s important to consider that the conversion-service pool is usually limited, so there may be more tasks to perform than instances available to perform them.

What happens when a single tenant A is uploading a great number of images while the other tenants B and C are just uploading a few images?

If you use a simple queue (FIFO), B and C might be unlucky and have their jobs queued behind the huge number of jobs from A.

In this scenario we are breaking the fairness requirement, and this is exactly what we want to avoid.

The single task of tenant A and the 2 tasks of tenant B are queued behind the 4 tasks of tenant C
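To make the problem concrete, here is a minimal sketch of the situation in the figure, using a plain in-memory FIFO. The arrival order and the helper name `first_service_position` are assumptions made for illustration; they are not part of our library.

```python
from collections import deque

# Assumed arrival order matching the figure: tenant C enqueues 4 tasks,
# then tenant B enqueues 2, then tenant A enqueues 1.
arrivals = [("C", 1), ("C", 2), ("C", 3), ("C", 4), ("B", 1), ("B", 2), ("A", 1)]

fifo = deque(arrivals)


def first_service_position(queue):
    """Return, per tenant, how many jobs are dequeued before its first job."""
    waits = {}
    for position, (tenant, _job) in enumerate(queue):
        waits.setdefault(tenant, position)  # keep only the first occurrence
    return waits


waits = first_service_position(fifo)
print(waits)  # {'C': 0, 'B': 4, 'A': 6}
```

With a single consumer, tenant A has to wait for six jobs belonging to other tenants before its only job is served.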

If we could scale the conversion-service infinitely, we could just process everything at once, but unfortunately this is impractical in the real world. It makes the upper bound on costs unpredictable, and sooner or later we would run into some scalability limit anyway, such as bandwidth or write throughput.

We might also want to be able, in the future, to throttle conversion-service throughput per tenant depending on business needs (business model, different type of subscriptions, etc).

This also applies to those cases where content import is not time-critical. Think about migrating existing multi-terabyte archives into our product: sometimes it’s reasonable to complete this task in a few weeks, so why stress the infrastructure and raise costs by going “full throttle”?

Our goal

We aim to achieve fairness without relying on ad-hoc logic in the conversion-service (so that we can apply this solution to any producer/consumer that requires tenant-fairness), and this can be done with the concept of sharded queue, depicted in the figure below.

From the conversion-service perspective, there is just one queue: the conversions-to-be-done-queue. Behind the scenes, this queue corresponds to multiple shards that can be seen as different queues, one per tenant. When the conversion-service dequeues a job from the conversions-to-be-done-queue, it is actually dequeuing a job from one of the shards in a round-robin fashion.

Sharded queue composed of three shards: one for tenant A, one for tenant B and one for tenant C

In other words, if the queue contains messages for n shards and we dequeue n jobs, we get jobs from n different tenants, regardless of the order in which the tenants enqueued their jobs and of the number of jobs per tenant.

Logical view of the sharded queue as a single queue
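The property described above can be checked with a small in-memory sketch. The function name `round_robin_order` and the sample arrivals are hypothetical, and the sketch assumes that no new jobs arrive while the queue is being drained.

```python
from collections import deque


def round_robin_order(arrivals):
    """Return the dequeue order when every tenant has its own shard."""
    shards = {}  # tenant -> deque of jobs, kept in order of first arrival
    for tenant, job in arrivals:
        shards.setdefault(tenant, deque()).append((tenant, job))

    order = []
    while shards:
        # One pass over the shards takes at most one job per tenant.
        for tenant in list(shards):
            order.append(shards[tenant].popleft())
            if not shards[tenant]:
                del shards[tenant]  # drained shards leave the rotation
    return order


# Assumed arrivals: C enqueues 4 jobs, then B enqueues 2, then A enqueues 1.
arrivals = [("C", 1), ("C", 2), ("C", 3), ("C", 4), ("B", 1), ("B", 2), ("A", 1)]
order = round_robin_order(arrivals)
print(order[:3])  # [('C', 1), ('B', 1), ('A', 1)]
```

Three shards are present, so the first three dequeued jobs belong to three different tenants, even though tenant C enqueued all of its jobs first.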

Main requirements summary:

fairness: we want to prevent one tenant’s behavior from affecting other tenants in any way;

scalability: it should be possible to handle high load and scale according to the number of messages;

delivery guarantees: we want at-least-once delivery semantics: we can tolerate duplicated deliveries, but we can’t afford to lose any message.

Buy or make?

We tried to look for something ready-to-use, but we didn’t find anything fulfilling our requirements.

It’s important to say that we are not trying to reinvent the wheel: we are just adding a layer of abstraction on top of existing and well-tested building blocks, and making it available as a library to all of our internal engineering teams.

Let’s see how it’s implemented and which building blocks we decided to use.

How we implemented it

At a high level the idea is very simple: there is a circular queue containing the shard ids, and one queue per shard. When you want to dequeue a message, you just pick the next shard id from the circular queue and dequeue a message from the corresponding queue. This is what I meant when I said we are not reinventing the wheel, but adding a layer of abstraction on top of existing building blocks: we are implementing a round-robin mechanism on top of existing queues.
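The idea can be sketched in a few lines with in-memory data structures. This is only an illustration under stated assumptions: the class name `ShardedQueue` and its methods are hypothetical, the real library is built on external queues rather than Python deques, and removing a drained shard from the ring is a choice made here for simplicity.

```python
from collections import deque


class ShardedQueue:
    """In-memory model: a ring of shard ids plus one FIFO queue per shard."""

    def __init__(self):
        self._ring = deque()  # circular queue of shard ids with pending messages
        self._shards = {}     # shard id -> deque of messages

    def enqueue(self, shard_id, message):
        if shard_id not in self._shards:
            self._shards[shard_id] = deque()
            self._ring.append(shard_id)  # a new shard joins the end of the ring
        self._shards[shard_id].append(message)

    def dequeue(self):
        """Return (shard_id, message) from the next shard, or None if empty."""
        if not self._ring:
            return None
        shard_id = self._ring.popleft()
        queue = self._shards[shard_id]
        message = queue.popleft()
        if queue:
            self._ring.append(shard_id)  # still has work: back of the ring
        else:
            del self._shards[shard_id]   # drained: leaves the ring (assumption)
        return shard_id, message


q = ShardedQueue()
for i in range(3):
    q.enqueue("A", f"a{i}")
q.enqueue("B", "b0")

drained = [q.dequeue() for _ in range(4)]
print(drained)  # [('A', 'a0'), ('B', 'b0'), ('A', 'a1'), ('A', 'a2')]
```

Tenant B’s single message is served second, right after the first message of tenant A, instead of waiting behind all three of them.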

We planned to start with a working “proof of concept”, so we built the first implementation on MongoDB: it’s a tool we knew well, and it let us reach the goal with very little effort.

But the MongoDB-based solution was not suited for all our cases, mainly because we didn’t want to add additional load to existing clusters or add new clusters and affect our current license.

So we started exploring alternatives: the main ones we scouted were SQS, DynamoDB and Redis.

DynamoDB and SQS were attractive because of their as-a-service and scalable nature, but their limits were too strict, for example:

SQS ListQueues returns max 1000 queues, but we need to manage more than 1000 tenants;

SQS messages can remain in the queue for 14 days max, but in our experience tasks have been stuck in the queue for longer, especially when investigating an issue;

it is not possible to get an SQS message by id, it’s a queue and you can only enqueue/dequeue. That’s fine but we also wanted an “administrator console”, for example to check whether a message is in the queue, its status, etc;

at the time of the scouting DynamoDB didn’t have transactions so, for example, it wasn’t possible to atomically update the “message in the queue” and “the counter of messages in the queue”.

It might have been possible to implement our sharded queue on top of some AWS service, but we estimated that it would be more limited and more time-consuming than a Redis implementation, so in the end we chose to implement it on top of Redis.

Queues are one of Redis’s main use cases, and they are very easy to implement with the data structures it provides. Moreover, in addition to the queues, it can store the support data structures that we need, such as counters.

On the downside, it does not scale “automagically”, so we have to manage the scaling needs manually (adding/removing nodes to the cluster). For our workload, given the performance Redis has, we realized we could easily size a cluster that would sustain our growing needs for several months, so we accepted this compromise.

Regarding the at-least-once delivery semantics, we achieve them using the same approach as Amazon SQS. When a consumer dequeues a message, the message is not removed from the queue, because we can’t be sure that the consumer is actually able to receive and correctly process it. It’s up to the consumer to explicitly delete the message once it considers it processed. To avoid handing the same message to other consumers while it has been dequeued but is still in the queue, the message is marked as invisible for a limited period of time (the visibility timeout concept).

With this approach we ensure that we don’t lose messages, because they are removed from the queue only when a consumer explicitly declares that it has received and processed them. On the other hand, a consumer could process a message but fail before being able to remove it from the queue. In this case the message becomes visible again and is processed a second time. Hence, we have at-least-once delivery semantics.
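The visibility timeout mechanism can be modelled with a small in-memory sketch. The class `VisibilityQueue`, its method names and the 30-second timeout are assumptions made for illustration; the current time is passed in explicitly so that the example is deterministic.

```python
import itertools


class VisibilityQueue:
    """In-memory model of at-least-once delivery with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self._timeout = visibility_timeout
        self._messages = {}  # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def enqueue(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]  # visible immediately
        return message_id

    def receive(self, now):
        """Return (id, body) of the oldest visible message and hide it, or None."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self._timeout  # hidden, but still in the queue
                return message_id, body
        return None

    def delete(self, message_id):
        """Called by the consumer once the message has been processed."""
        self._messages.pop(message_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.enqueue("convert image-1")

first = queue.receive(now=0)         # a consumer receives the message
hidden = queue.receive(now=10)       # other consumers cannot see it meanwhile
redelivered = queue.receive(now=31)  # the first consumer never deleted it
queue.delete(redelivered[0])         # this time processing succeeded
after_delete = queue.receive(now=100)
```

Here the message is delivered twice because the first consumer never deleted it, and it disappears only after the explicit delete: no message is lost, but duplicates are possible.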

What we are happy about

the concept of sharded queue allows us to decouple business logic from the non-functional requirement of fairness;

by making it available as a library to any internal team we avoid having different teams providing different solutions to the same problem;

we gained some experience with Redis, which we have already reused in other contexts, given it’s a sort of “Swiss Army knife”.

Open points

at the moment it’s a Java/Scala library, so it can be used only from languages that run on the JVM;

it cannot be applied, in its current state, to long-running tasks. For example, a single video conversion task can take up to a few hours, so if all the underlying resources are allocated to long-running conversions of a single tenant and another tenant needs a conversion, that tenant may have to wait for hours.

Future evolution

understand if it would be useful to release it as an open source project;

we are thinking about exposing the API as a RESTful service so that it can be used independently of the language/platform of choice;

understand if it’s possible to add an implementation on top of DynamoDB: it recently added support for transactions, so we have to review our technical analysis. The interesting thing is that it has a different cost model compared with our current implementation: per-message cost vs fixed cost (the cost of the cluster).

How are you tackling this issue? Any suggestions about tools or approaches we might have overlooked?