The news I’m most excited about in C++20 is the inclusion of coroutines. While there are other good steps forward, coroutines can significantly simplify the development of concurrent systems and help C++ tackle some of the complexity associated with them.

I decided to re-imagine some of the systems I’ve worked on in terms of C++ coroutines and see whether they become easier to reason about. To do that I’ll explore a grossly simplified model of loading an HTML page. In this model we only have to do three things:

1. Load an HTML document on any thread
2. Load a CSS document on any thread
3. Create the Document on a thread selected as “main”, after the HTML and CSS files are loaded

The example is simple but captures the key ingredients: it has both independent and dependent tasks, and one task, the Document creation, must run on a predetermined thread. The need to run some tasks on specific threads is common, especially when interfacing with third-party libraries such as graphics APIs or scripting VMs.

Task-based approach

The classic model of creating an async I/O system would be posting tasks on a thread pool and relying on callbacks to get notified when a specific task is complete. This is how the popular libuv library works. Callbacks unfortunately make the code difficult to follow and require manually passing resources across tasks, which complicates their management.

In the “cohtml” HTML rendering engine we developed at Coherent Labs, we built a task system where continuations have to be explicitly specified by posting a new task. The code itself is simplified by the use of lambdas. A lot of details on the task system can be found here.

Let’s get down to the code and see the approach in action.

Note that the code is simplified and shortened for the sake of clarity. The actual production interfaces are more complicated but are beyond the scope of this post.

The code is fairly self-explanatory. The new document loading does the three steps we mentioned earlier. It posts two tasks to any worker thread through the Enqueue method, and they load the text files. The cool thing is that lambdas improve readability: we see the logic that will be executed close to where the operation is initiated.

There are some downsides as well. The sample contains no logic to execute the MakeDOM method (the continuation) once both the HTML and CSS files are loaded. We solve this by calling a function, OnDocumentResourcesLoaded, that checks whether everything required is loaded and then posts the MakeDOM method to the Main thread. We also have to add synchronization to our Document fields, because we have to make sure that OnDocumentResourcesLoaded sees the changes applied by all threads.

Another downside is that tasks in this design can’t return a value – they only execute logic. We pass around the resulting Document as a shared pointer to record the new data.
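To make the shape of this approach concrete, here is a minimal sketch using only the standard library. It is not the cohtml task system: TaskQueue, LoadText, the pending counter and the LoadDocument driver are hypothetical stand-ins, and only the names Enqueue, MakeDOM and OnDocumentResourcesLoaded come from the description above. The atomic counter plays the role of the synchronization on the Document fields, so exactly one of the two loads posts the continuation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical task queue standing in for the real task system.
class TaskQueue {
public:
    void Enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
    // Blocks until a task is available, then runs it on the calling thread.
    void RunOne() {
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !tasks_.empty(); });
            task = std::move(tasks_.front());
            tasks_.pop();
        }
        task();
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
};

struct Document {
    std::string html, css;
    std::atomic<int> pending{2};  // loads still outstanding
    bool domReady = false;        // written only by MakeDOM
};

// Placeholder for real file I/O.
std::string LoadText(const std::string& path) { return "contents of " + path; }

void MakeDOM(Document& doc) { doc.domReady = true; }

// Called at the end of each load. Only the load that finishes last sees the
// counter drop from 1 to 0, so MakeDOM is posted to the main queue exactly once.
void OnDocumentResourcesLoaded(std::shared_ptr<Document> doc, TaskQueue& mainQueue,
                               std::thread::id mainId) {
    if (doc->pending.fetch_sub(1) == 1) {
        mainQueue.Enqueue([doc, mainId] {
            assert(std::this_thread::get_id() == mainId);  // continuation runs on "main"
            MakeDOM(*doc);
        });
    }
}

std::shared_ptr<Document> LoadDocument(const std::string& htmlPath, const std::string& cssPath) {
    TaskQueue workers, mainQueue;
    const std::thread::id mainId = std::this_thread::get_id();
    auto doc = std::make_shared<Document>();

    // Two worker threads, each of which runs one queued task.
    std::thread worker1([&workers] { workers.RunOne(); });
    std::thread worker2([&workers] { workers.RunOne(); });

    workers.Enqueue([doc, htmlPath, &mainQueue, mainId] {
        doc->html = LoadText(htmlPath);
        OnDocumentResourcesLoaded(doc, mainQueue, mainId);
    });
    workers.Enqueue([doc, cssPath, &mainQueue, mainId] {
        doc->css = LoadText(cssPath);
        OnDocumentResourcesLoaded(doc, mainQueue, mainId);
    });

    mainQueue.RunOne();  // the calling ("main") thread picks up MakeDOM
    worker1.join();
    worker2.join();
    return doc;
}
```

Calling LoadDocument("index.html", "style.css") from the thread that acts as “main” returns a Document with both texts filled in and domReady set. Note how the result has to travel through the shared pointer, since the tasks themselves return nothing.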

Coroutines approach

Coroutines in C++ offer the basis to build more complicated systems around them. This is in line with the philosophy of the language to provide the foundations and let the developers build libraries on top of them (not always followed unfortunately). Other languages offering coroutines like Go and Kotlin bundle a runtime for scheduling and executing them.

This design choice of the standardization committee means that we can’t simply start experimenting with coroutines; we first have to write a lot of code to make them truly usable. I decided to leverage the excellent cppcoro library by Lewis Baker. The library is still missing some core features, especially on Linux, but it is nevertheless a great example of how we can use coroutines for concurrent programming.

I re-wrote the previous example with cppcoro and this is the result:

I particularly like the CreateDocument function, which holds the gist of the logic. The other functions are helpers. static_thread_pool is a thread pool managed by cppcoro that allows us to run coroutines in parallel.

Let’s look in more detail at CreateDocument. In cppcoro, the coroutines are not started until they are awaited, so the calls to ReadFile only create cppcoro::task objects that contain the coroutine. When we co_await them on lines 34-36 they are scheduled and started on the thread pool. When both files are read, the CreateDocument coroutine is resumed on the Main thread. All the logic is nicely held within just one function. Overall the code is easier to follow and resembles a linear function, which is very nice.

Unfortunately, there is an inefficiency in the design as well. While we are loading the files, the Main thread just waits for them to be ready. A more efficient system would schedule one of the file loading coroutines on the Main thread and have it actively cooperate to get the needed result. The Go language scheduler implements such cooperation. This is doable in C++ but requires a more complicated scheduler.
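One simple form of that cooperation is a wait loop that runs pending work instead of blocking. The sketch below assumes a plain queue of callables rather than coroutines, and WorkQueue, HelpUntil and RunLoadsOnWaitingThread are hypothetical names. It is neither Go’s scheduler nor anything provided by cppcoro, just the basic idea of a waiting thread helping out.

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

using WorkItem = std::function<void()>;

// Hypothetical shared queue that both workers and the waiting thread can drain.
class WorkQueue {
public:
    void Push(WorkItem item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }
    // Returns the next item, or an empty optional if the queue is empty.
    std::optional<WorkItem> TryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        WorkItem item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::deque<WorkItem> items_;
};

// Instead of blocking, the waiting thread runs queued work until `done` is set.
void HelpUntil(WorkQueue& queue, const std::atomic<bool>& done) {
    while (!done.load()) {
        if (auto item = queue.TryPop()) {
            (*item)();
        } else {
            std::this_thread::yield();  // nothing to do right now
        }
    }
}

// Demo: with no worker threads at all, the waiting thread completes both
// "loads" itself. Returns how many of them ran on the calling thread.
int RunLoadsOnWaitingThread() {
    WorkQueue queue;
    std::atomic<int> remaining{2};
    std::atomic<bool> done{false};
    int ranHere = 0;
    const std::thread::id self = std::this_thread::get_id();

    auto load = [&] {
        if (std::this_thread::get_id() == self) ++ranHere;
        if (remaining.fetch_sub(1) == 1) done = true;  // last load signals completion
    };
    queue.Push(load);
    queue.Push(load);

    HelpUntil(queue, done);
    return ranHere;
}
```

With worker threads also draining the queue, the waiting thread would pick up whatever is left and return as soon as the last load completes. The busy loop with yield keeps the sketch short; a production scheduler would park the thread when the queue is empty.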

I’m very happy with the final result of the experiment. C++ coroutines did simplify the flow of my sample code. Unfortunately, wide adoption of C++ coroutines requires good libraries, and these are still few or incomplete. I hope this will soon change.