It’s a mouthful, I know!

I am building an asynchronous distributed system that serves as a backend for a mobile app. The system does a lot of processing and communicates with a couple of external components. From the start, I decided to design the system as a choreography of micro-services. But other than this basic constraint, I decided on an iterative approach to the design: starting with something simple and evolving it over time…

In this article, I will walk you through a couple of challenges I had designing the system and how I chose to solve them. To me, it was interesting to see how the system started coming together and how patterns started to emerge. Because I designed the system from scratch, adding layers of complexity over time, I feel I now have a better grasp of some of the concepts and patterns in distributed computing, and I would like to share that experience with you. This is the first article, focusing on system resiliency, but I plan on writing several more, covering various aspects of the whole system.

Before we get started, let me briefly cover the difference between choreography and orchestration:

Choreography vs Orchestration

Service orchestration represents a single centralized executable business process (the orchestrator) that coordinates the interaction among different services. The orchestrator is responsible for invoking and combining the services.

Service Orchestration — courtesy of user Andrei https://stackoverflow.com/a/29808740

Service choreography is a global description of the participating services, which is defined by exchange of messages, rules of interaction and agreements between two or more endpoints. Choreography employs a decentralized approach for service composition.

Service Choreography — courtesy of user Andrei https://stackoverflow.com/a/29808740

The Problem

The problem arose when I decided to swap out an external API for my own, as a cost-saving measure and to better cater to my use case. Creating the new API was a challenge of its own (and well worth a separate article — if you ever wonder when you are going to use all those fancy algorithms they taught you in your Computer Science class, you will want to read it!), but in this article I want to focus on a different, but nevertheless equally interesting, problem.

This new API is a self-contained system (a set of two micro-services) that provides an interface for the main distributed pipeline to request processing and receive results. The processing itself is a very resource-intensive task — a single request requires several GBs of RAM and a hefty amount of CPU time.

In the first “MVP” version of the system, I did not care much about resource consumption, which quickly led to the service crashing when several requests were being processed at the same time.

The most obvious measure — which I chose not to implement at this time — is to scale out. You know: stick a load balancer in front, and scale (automatically or manually) the number of machines that run the service. My service is stateless, so there would be no need to implement session affinity.

But I chose not to implement (auto-)scaling primarily to save cost. This subsystem is expensive as it is with a single node.

There were additional constraints on what the system needed to support, so it ended up requiring a custom Docker image running on an Ubuntu distro — and Infrastructure-as-a-Service is never a cheap option. At this phase in the project, we don’t need to scale.

The first measure I took to remedy the problem was to introduce an internal request queue and throttling mechanism. It uses a notion of “processing tokens” — essentially a configurable number that represents how many requests can be processed in parallel on the system. If I decide to scale up my machine, I can grant more tokens; if I scale down, I can decrease the number of tokens.
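A minimal sketch of such a token-based throttle, assuming an asyncio-style service (the names ProcessingThrottle and heavy_task are illustrative, not from the actual system):

```python
import asyncio


class ProcessingThrottle:
    """Caps the number of requests processed in parallel.

    The token count is configurable: grant more tokens after
    scaling the machine up, fewer after scaling it down.
    """

    def __init__(self, tokens: int):
        # Each token admits one request into the processing stage.
        self._tokens = asyncio.Semaphore(tokens)

    async def process(self, handler, request):
        # Requests queue here until a token becomes free.
        async with self._tokens:
            return await handler(request)


async def demo():
    throttle = ProcessingThrottle(tokens=2)

    async def heavy_task(n):
        await asyncio.sleep(0.01)  # stand-in for the GBs-of-RAM processing
        return n * 2

    # Five requests arrive at once, but at most two run in parallel;
    # the rest wait in the internal queue.
    return await asyncio.gather(
        *(throttle.process(heavy_task, n) for n in range(5))
    )


print(asyncio.run(demo()))  # [0, 2, 4, 6, 8]
```

The semaphore plays the role of the token pool: acquiring a token admits a request, releasing it lets the next queued request proceed.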

Ok, so now that we have introduced internal throttling that self-regulates the system and prevents it from running out of memory, we have introduced, or rather accentuated, another problem. Our processing pipeline makes synchronous requests to this API and waits for a response. With the queue now in the picture, requests might queue up until they are processed, and it becomes pretty typical for a request to be completed past the original request timeout period, making it seemingly fail from the perspective of the calling system. And even though we eventually provide a successful response, the calling system has already moved on, handling the response as failed (most likely retrying, further increasing the load on the API).

Asynchronous Processing

The logical next step was to switch to asynchronous invocation of this API.

Asynchronicity is often advocated as the best default in distributed systems, as it provides de-coupling: any message can be sent independently of the availability of the receiver, and will be delivered as soon as the service provider becomes available.

In my case, I don’t have to worry about a time limit for the overall transaction. Of course, I still care deeply about the transaction being processed as soon as possible, but there is no inherent problem with a delay caused by a timeout and retry of an operation within the system.

To implement asynchronous processing, I introduced a set of parameters that allow the calling system to provide a message queue URI (with a short-lived access token for increased security) — queue — and a set of arbitrary parameters that the API will simply echo back to the queue, along with the result of its operation when it is ready — queueMetadata.
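A sketch of what such a request and its completion message might look like. The parameter names queue and queueMetadata come from the description above; the exact payload shape, the helper names, and carrying the access token as a separate field are my assumptions:

```python
import json


def build_request(input_data, reply_queue_uri, access_token, correlation):
    """Request the calling pipeline sends to the processing API."""
    return {
        "input": input_data,
        # Where the API should post the result when it is ready.
        "queue": reply_queue_uri,
        # Short-lived credential so the API can write to that queue.
        "queueAccessToken": access_token,
        # Opaque to the API: echoed back verbatim with the result so the
        # caller can correlate the response with the original request.
        "queueMetadata": correlation,
    }


def build_result_message(request, result):
    """Message the API posts to the caller's queue after processing."""
    return {
        "result": result,
        "queueMetadata": request["queueMetadata"],
    }


req = build_request(
    input_data={"job": "process"},
    reply_queue_uri="https://queues.example.com/pipeline-results",
    access_token="<short-lived token>",
    correlation={"requestId": "42", "step": "processing"},
)
msg = build_result_message(req, result={"status": "done"})
print(json.dumps(msg, indent=2))
```

Because queueMetadata is opaque and echoed back verbatim, the caller stays free to evolve its correlation scheme without any change to the API.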