On Kotlin Coroutines and how concurrency is different from parallelism

The official docs describe Kotlin Coroutines as a tool "for asynchronous programming and more"; in particular, coroutines are supposed to support us with "asynchronous or non-blocking programming". What exactly does this mean? How is "asynchrony" related to "concurrency" and "parallelism", terms we hear a lot in this context as well? In this article, we will see that coroutines are mostly concerned with concurrency and not primarily with parallelism. Coroutines provide sophisticated means that help us structure code so that it can execute highly concurrently. They also enable parallelism, but that isn't the default behavior. If you don't understand the difference yet, don't worry; it will become clearer throughout the article. Many people, myself included, struggle to use these terms correctly. Let's learn more about coroutines and how they relate to these topics.

(You can find a general introduction to Kotlin coroutines in this article)

Asynchrony - A programming model

Asynchronous programming is a topic we've been reading and hearing about a lot in the last couple of years. It mainly refers to "the occurrence of events independent of the main program flow" and also "ways to deal with these events" (Wikipedia). One crucial aspect of asynchronous programming is that asynchronously started actions do not immediately block the program and take place concurrently. When programming asynchronously, we often find ourselves triggering some subroutine that immediately returns to the caller so that the main program flow can continue without waiting for the subroutine's result. Once the result is needed, you may run into two scenarios: 1) the result has already been computed and can simply be requested, or 2) you need to block your program until it is available. That is how futures or promises work. Another popular example of asynchrony is the way reactive streams work, as described in the Reactive Manifesto:

Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency. [...] Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.

Altogether, we can describe asynchrony, as defined in the domain of software engineering, as a programming model that enables non-blocking and concurrent programming. We dispatch tasks so that our program can continue doing something else until we receive a signal that the results are available. The following image illustrates this:

We want to continue reading a book and therefore let a machine do the washing for us.

Disclaimer: I took this and also the two following images from this Quora post which also describes the discussed terms.
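To make the future-based flow from this section a bit more concrete, here is a minimal sketch that uses the JDK's CompletableFuture instead of coroutines. The function doLaundry and its 200 ms sleep are made up for illustration and simply mirror the washing machine picture above.

```kotlin
import java.util.concurrent.CompletableFuture

// Hypothetical stand-in for a slow subroutine (the "washing machine").
fun doLaundry(): String {
    Thread.sleep(200) // simulate work that takes a while
    return "clean clothes"
}

fun main() {
    // supplyAsync returns immediately; doLaundry runs on a pool thread.
    val laundry: CompletableFuture<String> = CompletableFuture.supplyAsync { doLaundry() }

    // The main flow continues without waiting for the result.
    println("reading a book while the machine runs")

    // Scenario 1: the result is already there and can simply be fetched.
    // Scenario 2: it is not there yet, and join() blocks until it is.
    if (laundry.isDone) println("result was ready") else println("waiting for the result")
    println("got: ${laundry.join()}")
}
```

Which of the two scenarios you hit depends on how much other work the main flow does before it asks for the result.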

Concurrency - It's about structure

Now that we have learned what asynchrony refers to, let's see what concurrency is. Concurrency is not, as many people mistakenly believe, about running things "in parallel" or "at the same time". Rob Pike, a Google engineer best known for his work on Go, describes concurrency as a "composition of independently executing tasks", and he emphasizes that concurrency really is about structuring a program. That means that a concurrent program has multiple tasks in progress at the same time, but these tasks are not necessarily executed simultaneously. The work on all tasks may be interleaved in some arbitrary order, as nicely illustrated in this little image:

Concurrency is not parallelism. It tries to break down tasks which we don't necessarily need to execute at the same time. Its primary goal is structure, not parallelism.
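A small sketch can show this interleaving without any parallelism. It assumes the kotlinx.coroutines library is on the classpath; the function name interleave and the task names "A" and "B" are made up for illustration. Both tasks run on the single thread that calls runBlocking and hand over control with yield().

```kotlin
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Runs two tasks as coroutines on the one thread that calls runBlocking
// and records the order in which their steps were executed.
fun interleave(): List<String> {
    val steps = mutableListOf<String>() // only touched by a single thread
    runBlocking {
        for (task in listOf("A", "B")) {
            launch {
                for (step in 1..3) {
                    steps += "$task$step"
                    yield() // let the other coroutine continue
                }
            }
        }
    } // runBlocking returns once both child coroutines are done
    return steps
}

fun main() {
    // The steps of A and B alternate although nothing runs simultaneously.
    println(interleave())
}
```

Both tasks are "in progress" at the same time, yet at any given moment only one of them is actually being executed.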

Parallelism - It's about execution

Parallelism, often mistakenly used synonymously for concurrency, is about the simultaneous execution of multiple things. If concurrency is about structure, then parallelism is about the execution of multiple tasks. We can say that concurrency makes the use of parallelism easier, but it is not even a prerequisite since we can have parallelism without concurrency.

In summary, as Rob Pike describes it: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once". You can watch his talk "Concurrency is not Parallelism" on YouTube.

Coroutines in terms of concurrency and parallelism

Coroutines are about concurrency first of all. They provide great tools that let us break down tasks into various chunks which are not executed simultaneously by default. A simple example illustrating this is part of the Kotlin coroutines documentation:



    import kotlinx.coroutines.*
    import kotlin.system.measureTimeMillis

    fun main() = runBlocking<Unit> {
        val time = measureTimeMillis {
            // Both coroutines are started before either result is awaited
            val one = async { doSomethingUsefulOne() }
            val two = async { doSomethingUsefulTwo() }
            println("The answer is ${one.await() + two.await()}")
        }
        println("Completed in $time ms")
    }

    suspend fun doSomethingUsefulOne(): Int {
        delay(1000L) // suspends the coroutine without blocking the thread
        return 13
    }

    suspend fun doSomethingUsefulTwo(): Int {
        delay(1000L)
        return 29
    }

The example terminates in roughly 1000 milliseconds since both "somethingUseful" tasks take about 1 second each and we execute them asynchronously with the help of the async coroutine builder. Both tasks just use a simple non-blocking delay to simulate some reasonably long-running action. Let's see if the framework executes these tasks truly simultaneously. To find out, we add some log statements that tell us which threads the actions run on:

    [main] DEBUG logger - in runBlocking
    [main] DEBUG logger - in doSomethingUsefulOne
    [main] DEBUG logger - in doSomethingUsefulTwo

Since we use runBlocking from the main thread, it also runs on this one. The async builders do not specify a separate CoroutineScope or CoroutineContext, so they inherit the context of runBlocking and also run on main.
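For reference, a self-contained version of the example with thread logging might look like the following sketch. The log helper based on println is my stand-in for the logger that produced the output above, so the exact format will differ.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Stand-in for the logger (an assumption): prefixes each message with
// the name of the thread that is currently executing.
fun log(msg: String) = println("[${Thread.currentThread().name}] $msg")

suspend fun doSomethingUsefulOne(): Int {
    log("in doSomethingUsefulOne")
    delay(1000L) // suspends the coroutine, does not block the thread
    return 13
}

suspend fun doSomethingUsefulTwo(): Int {
    log("in doSomethingUsefulTwo")
    delay(1000L)
    return 29
}

fun main() = runBlocking<Unit> {
    log("in runBlocking")
    val time = measureTimeMillis {
        // No dispatcher is passed, so both coroutines inherit the
        // context of runBlocking.
        val one = async { doSomethingUsefulOne() }
        val two = async { doSomethingUsefulTwo() }
        println("The answer is ${one.await() + two.await()}")
    }
    println("Completed in $time ms")
}
```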

We have two tasks running on the same thread, and they finish after a 1-second delay. That is possible since delay only suspends the coroutine and does not block main. The example is, as correctly described, an example of concurrency that does not utilize parallelism. Let's change the functions to something that really takes its time and see what happens.

Parallel Coroutines

Instead of just delaying the coroutines, we let the doSomethingUseful functions calculate the next probable prime based on a randomly generated BigInteger, which happens to be a fairly expensive task (since this calculation starts from a random number, it will not run in deterministic time):

    fun doSomethingUsefulOne(): BigInteger {
        log.debug("in doSomethingUsefulOne")
        return BigInteger(1500, Random()).nextProbablePrime()
    }

Note that the suspend keyword is not necessary anymore and would actually be misleading. The function does not make use of other suspending functions and blocks the calling thread for the needed time. Running the code results in the following logs:

    22:22:04.716 [main] DEBUG logger - in runBlocking
    22:22:04.749 [main] DEBUG logger - in doSomethingUsefulOne
    22:22:05.595 [main] DEBUG logger - Prime calculation took 844 ms
    22:22:05.602 [main] DEBUG logger - in doSomethingUsefulOne
    22:22:08.241 [main] DEBUG logger - Prime calculation took 2638 ms
    Completed in 3520 ms

As we can easily see, the tasks are still structured concurrently with async coroutines, but they no longer overlap in time: the second calculation only starts after the first one has finished. The overall runtime is (roughly) the sum of both sub-calculations. After changing the suspending code to blocking code, the result changes and we no longer save any time during execution.
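The effect does not depend on prime numbers. The following sketch compares two suspending coroutines with two blocking ones on the single thread of runBlocking. The function names timeSuspending and timeBlocking are made up for this illustration, and Thread.sleep stands in for any blocking work.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Two coroutines that suspend: their waiting periods overlap.
fun timeSuspending(ms: Long): Long = runBlocking {
    measureTimeMillis {
        val one = async { delay(ms) }
        val two = async { delay(ms) }
        one.await()
        two.await()
    }
}

// Two coroutines that block the only available thread: they run one
// after the other.
fun timeBlocking(ms: Long): Long = runBlocking {
    measureTimeMillis {
        val one = async { Thread.sleep(ms) }
        val two = async { Thread.sleep(ms) }
        one.await()
        two.await()
    }
}

fun main() {
    println("suspending: ${timeSuspending(500)} ms") // roughly one delay
    println("blocking:   ${timeBlocking(500)} ms")   // roughly two sleeps
}
```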

Note on the example

Let me note that I find the example provided in the documentation slightly misleading, as it concludes with "This is twice as fast, because we have concurrent execution of two coroutines" after applying async coroutine builders to the previously sequentially executed code. It is only "twice as fast" because the concurrently executed coroutines merely delay in a non-blocking way. The example gives the impression that we get "parallelism" for free, although, as I see it, it's only meant to demonstrate asynchronous programming.

Now how can we make coroutines run in parallel? To fix our prime example from above, we need to dispatch these tasks on some worker threads to not block the main thread anymore. We have a few possibilities to make this work.

Making coroutines run in parallel

1. Run in GlobalScope

We can spawn a coroutine in the GlobalScope. That means that the coroutine is not bound to any Job and is only limited by the lifetime of the whole application. That is the behavior we know from spawning new threads. It's hard to keep track of global coroutines, and the whole approach seems naive and error-prone. Nonetheless, running in this global scope dispatches a coroutine onto Dispatchers.Default, a shared thread pool managed by the kotlinx.coroutines library. By default, the maximum number of threads used by this dispatcher is equal to the number of available CPU cores, but is at least two.

Applying this approach to our example is simple. Instead of running async in the scope of runBlocking, i.e., on the main thread, we spawn them in GlobalScope:

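    val time = measureTimeMillis {
        val one = GlobalScope.async { doSomethingUsefulOne() }
        val two = GlobalScope.async { doSomethingUsefulTwo() }
        // Await both results inside the measured block, as in the other variants
        println("The answer is ${one.await() + two.await()}")
    }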

The output verifies that we now run in roughly max(time(calc1), time(calc2)):

    22:42:19.375 [main] DEBUG logger - in runBlocking
    22:42:19.393 [DefaultDispatcher-worker-1] DEBUG logger - in doSomethingUsefulOne
    22:42:19.408 [DefaultDispatcher-worker-4] DEBUG logger - in doSomethingUsefulOne
    22:42:22.640 [DefaultDispatcher-worker-1] DEBUG logger - Prime calculation took 3245 ms
    22:42:23.330 [DefaultDispatcher-worker-4] DEBUG logger - Prime calculation took 3922 ms
    Completed in 3950 ms

We successfully applied parallelism to our concurrent example. As I said though, this fix is naive and can be improved further.

2. Specify a coroutine dispatcher

Instead of spawning async in the GlobalScope, we can still let them run in the scope of, i.e., as a child of, runBlocking. To get the same result, we explicitly set a coroutine dispatcher now:

    val time = measureTimeMillis {
        val one = async(Dispatchers.Default) { doSomethingUsefulOne() }
        val two = async(Dispatchers.Default) { doSomethingUsefulTwo() }
        println("The answer is ${one.await() + two.await()}")
    }

This adjustment leads to the same result as before while not losing the child-parent structure we want. We can still do better though. Wouldn't it be most desirable to have real suspending functions again? Instead of taking care of not blocking the main thread while executing blocking functions, it would be best only to call suspending functions that don't block the caller.
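Before moving on, the effect of passing a dispatcher can be checked directly. In this small sketch, the function name threadNames is made up for illustration: one coroutine inherits the context of runBlocking, the other one is dispatched onto Dispatchers.Default, and both simply report the thread they ran on.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking

// Returns the name of the thread used by a coroutine that inherits the
// context of runBlocking (first) and by one that is dispatched onto
// Dispatchers.Default (second).
fun threadNames(): Pair<String, String> = runBlocking {
    val inherited = async { Thread.currentThread().name }
    val dispatched = async(Dispatchers.Default) { Thread.currentThread().name }
    inherited.await() to dispatched.await()
}

fun main() {
    val (inherited, dispatched) = threadNames()
    println("inherited context:   $inherited")
    println("Dispatchers.Default: $dispatched")
}
```

Both coroutines are still children of runBlocking, so the parent-child structure stays intact even though the second one leaves the calling thread.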

3. Make blocking function suspending

We can use withContext which "immediately applies dispatcher from the new context, shifting execution of the block into the different thread inside the block, and back when it completes":

    suspend fun doSomethingUsefulOne(): BigInteger = withContext(Dispatchers.Default) {
        executeAndMeasureTimeMillis {
            log.debug("in doSomethingUsefulOne")
            BigInteger(1500, Random()).nextProbablePrime()
        }
    }.also {
        log.debug("Prime calculation took ${it.second} ms")
    }.first

With this approach, we confine the execution of dispatched tasks to the prime calculation inside the suspending function. The output nicely demonstrates that only the actual prime calculation happens on a different thread while everything else stays on main. When has multi-threading ever been that easy? I really like this solution the most.

(The function executeAndMeasureTimeMillis is a custom one that measures execution time and returns a pair of result and execution time)

    23:00:20.591 [main] DEBUG logger - in runBlocking
    23:00:20.648 [DefaultDispatcher-worker-1] DEBUG logger - in doSomethingUsefulOne
    23:00:20.714 [DefaultDispatcher-worker-2] DEBUG logger - in doSomethingUsefulOne
    23:00:21.132 [main] DEBUG logger - Prime calculation took 413 ms
    23:00:23.971 [main] DEBUG logger - Prime calculation took 3322 ms
    Completed in 3371 ms
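The helper executeAndMeasureTimeMillis is not shown in this article. A minimal sketch of how it could look, assuming it simply wraps a block and returns a Pair of result and elapsed milliseconds, is given below. The function nextPrime and the println-based logging are simplifications of mine, not the original code.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.math.BigInteger
import java.util.Random

// Assumed shape of the custom helper: runs the block and returns its
// result together with the elapsed time in milliseconds.
inline fun <T> executeAndMeasureTimeMillis(block: () -> T): Pair<T, Long> {
    val start = System.currentTimeMillis()
    val result = block()
    return result to (System.currentTimeMillis() - start)
}

// Only the calculation inside withContext runs on Dispatchers.Default;
// the also block runs after control has returned to the caller's context.
suspend fun nextPrime(bits: Int): BigInteger = withContext(Dispatchers.Default) {
    executeAndMeasureTimeMillis {
        println("[${Thread.currentThread().name}] calculating")
        BigInteger(bits, Random()).nextProbablePrime()
    }
}.also {
    println("[${Thread.currentThread().name}] Prime calculation took ${it.second} ms")
}.first

fun main() = runBlocking<Unit> {
    println("[${Thread.currentThread().name}] in runBlocking")
    println("Found a prime with ${nextPrime(512).bitLength()} bits")
}
```

Keeping the dispatcher switch inside the suspending function means callers do not have to know that the work is CPU-bound.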

Caution: We use Concurrency and Parallelism interchangeably although we should not

As already mentioned in the introductory part of this article, we often use the terms parallelism and concurrency as synonyms. I want to show you that even the Kotlin documentation does not clearly differentiate between the two terms. The section on "Shared mutable state and concurrency" (as of 11/5/2018, may change in the future) opens with:

Coroutines can be executed concurrently using a multi-threaded dispatcher like the Dispatchers.Default . It presents all the usual concurrency problems. The main problem being synchronization of access to shared mutable state. Some solutions to this problem in the land of coroutines are similar to the solutions in the multi-threaded world, but others are unique.

This sentence should really read "Coroutines can be executed in parallel using multi-threaded dispatchers like Dispatchers.Default ..."
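The distinction matters in practice: the synchronization problem mentioned in the quote shows up as soon as coroutines really run in parallel on several threads. The following sketch, with the made-up function name countInParallel, uses an AtomicInteger as one possible way to make a shared counter safe. A plain Int variable in its place could lose updates once several worker threads increment it at the same time.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Increments a shared counter from many coroutines that may run in
// parallel on the worker threads of Dispatchers.Default.
fun countInParallel(coroutines: Int, incrementsEach: Int): Int {
    val counter = AtomicInteger() // thread-safe shared state
    runBlocking {
        repeat(coroutines) {
            launch(Dispatchers.Default) {
                repeat(incrementsEach) { counter.incrementAndGet() }
            }
        }
    } // returns once all launched coroutines have completed
    return counter.get()
}

fun main() {
    println(countInParallel(coroutines = 100, incrementsEach = 1000))
}
```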

Conclusion

It's important to know the difference between concurrency and parallelism. We learned that concurrency is mainly about dealing with many things at once while parallelism is about executing many things at once. Coroutines provide sophisticated tools to enable concurrency but don't give us parallelism for free. In some situations, it will be necessary to dispatch blocking code onto some worker threads to let the main program flow continue. Please remember that we mostly need parallelism for CPU-intensive and performance-critical tasks. In most scenarios, it might be just fine not to worry about parallelism and to be happy about the fantastic concurrency we get from coroutines.

Lastly, let me say Thank you to Roman Elizarov who discussed these topics with me before I wrote the article. 🙏🏼