In my previous post, I described strategies for improving thread utilization in an IO-heavy environment. In this post, I will take a closer look at the thread-based asynchronous programming approach. Whenever I say "blocked thread" in this discussion, I mean a thread blocked on, or waiting for, IO. This is the waste we are trying to eliminate; threads busy on the CPU can only be relieved by adding more hardware. This strategy allows us to achieve massive system scale even when working with blocking code.

What is thread-based asynchronous programming

The thread-based asynchronous programming approach, sometimes called "bulkheading", allows one thread pool to hand over a task to another thread pool (let's call it the worker thread pool) and be notified with the result when the worker thread pool is done with the task.

From the perspective of the calling thread, the system now becomes asynchronous: the work on a single call path is no longer done sequentially. The thread does some work, hands over IO-related tasks to one or more worker pools, and later comes back to resume execution from that point onwards (having done some completely independent work in between).

Threads in the worker pool still get blocked on IO, but now only the threads of this pool are blocked, which limits the cost to the system. The freed calling threads can scale up other code paths of the system that do not involve IO activity. System throughput increases considerably because a calling thread doesn't sit around waiting for IO to complete; it can perform other computations instead.
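The offloading pattern described above can be sketched in Java with CompletableFuture. This is a minimal illustration, not a production implementation; the names (fetchAsync, ioPool) and the simulated IO delay are my own assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadDemo {
    // Hand a blocking task to a dedicated worker pool; the caller is not blocked.
    static CompletableFuture<String> fetchAsync(ExecutorService ioPool) {
        return CompletableFuture.supplyAsync(() -> {
            // Simulate a blocking IO call (e.g. a remote API request)
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "response";
        }, ioPool);
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        CompletableFuture<String> result = fetchAsync(ioPool);

        // The calling thread is free to do unrelated work here...
        System.out.println("caller keeps working");

        // ...and only blocks (or registers a callback) when it needs the result.
        System.out.println(result.join());
        ioPool.shutdown();
    }
}
```

Note that only the two threads of ioPool ever block on the simulated IO; the calling thread continues doing other work in the meantime.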

Calling thread gives its task to worker thread and is notified with the result on completion

A good analogy for understanding this behaviour is that of a checkout counter in a shopping mall. A small number of checkout counters can handle a large number of visitors, so long as not everyone comes to check out at the same time. Only a few workers (those at the checkout counters) are tied up with the checkout function; the other workers are free to assist shoppers. How many shoppers could the mall accommodate if a worker had to be attached to each shopper from the moment they entered the mall till checkout?

A more technical analogy is that of a connection pool, e.g. a database connection pool or TCP connection pool. In a service, we could have every thread that wants to call another service create its own RPC connection (let's ignore the connection creation cost) and fire its own API calls. However, so long as not all threads need to access the other service at the same time (i.e. there are other things the system has to do), we can create a small worker pool of RPC connections, funnel all API calls through it, and free the calling threads from this blockage. By multiplexing the calls over this small thread pool, we free up many more threads to do non-IO related work. This is exactly what happens when we use the Apache Async HTTP client or others of its ilk.
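The multiplexing claim can be demonstrated concretely: however many callers submit work, the number of threads blocked on IO at any instant is bounded by the pool size. A rough sketch, with the pool size and caller count as illustrative parameters:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiplexDemo {
    // Funnel many "RPC calls" through a small worker pool. The pool size,
    // not the number of callers, bounds how many threads block on IO at once.
    static int maxConcurrentCalls(int callers, int poolSize) {
        ExecutorService rpcPool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        CompletableFuture<?>[] calls = new CompletableFuture<?>[callers];
        for (int i = 0; i < callers; i++) {
            calls[i] = CompletableFuture.runAsync(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record peak concurrency
                try {
                    Thread.sleep(20); // simulated blocking RPC
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            }, rpcPool);
        }
        CompletableFuture.allOf(calls).join();
        rpcPool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        // 50 callers, but at most 4 threads are ever blocked on the simulated IO.
        System.out.println("peak concurrent IO calls: " + maxConcurrentCalls(50, 4));
    }
}
```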

Handling task completion

We have so far spoken about offloading work to worker threads. The other, equally important aspect of the asynchronous model is the interrupt-style program execution pattern. Having offloaded its task to a worker thread, the calling thread needs to know where it was in its call path when it receives the result of the task from the worker. But tracking this runtime state is a problem. Where is it to be kept, and how?

This problem is typically solved by introducing callbacks (or callback handlers), which are methods to be invoked on completion of the task given to the worker pool. The calling code registers these callbacks on the Future returned by the worker pool, and the language/framework tracks them and invokes them once the result is available. Conceptually, the calling thread is "interrupted" (told to stop whatever it was doing) and instructed to execute the relevant callback; in practice, most frameworks run the callback on a worker thread or on an executor supplied by the caller rather than via a literal thread interrupt. The handoff looks something like this:

Thread 1 gives a task to the worker pool. Thread 2 in the worker pool executes the task and invokes the callback. Thread 1 is notified and switches to executing the callback
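The callback registration described above looks roughly like this in Java's CompletableFuture API. The method names fetchAndHandle and the string values are illustrative; the point is that the caller attaches a continuation to the Future instead of blocking on it:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CallbackDemo {
    // Register a callback on the Future instead of blocking for the result.
    // The continuation runs when the worker pool completes the task.
    static CompletableFuture<String> fetchAndHandle(ExecutorService ioPool) {
        return CompletableFuture
                .supplyAsync(() -> "raw-result", ioPool) // runs on a worker thread
                .thenApply(r -> "handled:" + r);         // callback: resume the call path
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(1);
        // The caller registers the callback and moves on; no thread sits waiting.
        fetchAndHandle(ioPool).thenAccept(System.out::println);
        ioPool.shutdown();
    }
}
```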

Different languages provide different callback mechanisms, but variants of onComplete and onFailure are the most common. As the names suggest, these are invoked on success and failure of the task given to the worker pool.
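In Java, the onComplete/onFailure pair is typically folded into a single handle() callback that receives either a result or the failure cause. A small sketch (the describe helper and its message format are my own, for illustration):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class OutcomeDemo {
    // One callback that covers both outcomes: value is non-null on success,
    // error is non-null on failure.
    static String describe(CompletableFuture<Integer> task) {
        return task.handle((value, error) -> {
            if (error == null) {
                return "success:" + value;          // the onComplete path
            }
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause()              // unwrap the async wrapper
                    : error;
            return "failure:" + cause.getMessage(); // the onFailure path
        }).join();
    }

    public static void main(String[] args) {
        System.out.println(describe(CompletableFuture.completedFuture(42)));
        System.out.println(describe(
                CompletableFuture.failedFuture(new IllegalStateException("remote call failed"))));
    }
}
```

Scala's Future offers the same pair more directly via onComplete with a Success/Failure match, but the structure is identical: the outcome handler is registered up front and invoked whenever the worker finishes.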