This article will explain how the async/await pair really works and why it could be better if real cooperative threading was used instead.

To those who have already read the article

If you have already read the article and only want to know what's new, click here. The second update is here.

Background

I already wrote the article Yield Return Could Be Better, and I must say that async/await could also be better if a stack-saving mechanism was implemented to do real cooperative threading. I am not saying that async/await is a bad thing, but it could be implemented without compiler changes (enabling any .NET compiler to use it), perhaps with keywords added only to make its usage explicit. Different from last time, I will not only talk about the advantages: I will provide a sample implementation of a stack-saver and show its benefits.

Understanding the async/await pair

The async/await was planned for .NET 5 but it is already available in the 4.5 CTP. Its promise is to make asynchronous code easier to write, which it indeed does.

But my problem with it is: Why do people want to use the asynchronous pattern to begin with?

The main reason is: To keep the UI responsive.

We can already maintain the UI responsive using secondary threads. So, what's the real difference?

Well, let's see this pseudo-code:

using (var reader = ExecuteReader())
    while (reader.ReadRecord())
        listbox.Items.Add(reader.Current);

Very simple, a reader is created and while there are records, they are added to a listbox. But imagine that it has 60 records, and that each ReadRecord takes one second to complete. If you put that code in the Click of a Button, your UI will freeze for an entire minute.

If you put that code in a secondary thread, you will have problems when adding the items to the listbox, so you will need to use something like listbox.Dispatcher.Invoke to really update the listbox.

With the new await keyword, your method will need to be marked as async and you will need to change the while line, like this:

while ( await reader.ReadRecordAsync())

And your UI will be responsive.
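Putting the pieces together, the Click handler might look like this (a hedged sketch: the reader type and a ReadRecordAsync returning Task<bool> are assumptions matching the pseudo-code, not a real API):

```csharp
// Hypothetical async version of the reader loop above.
// Assumes ExecuteReader() and a ReadRecordAsync() returning Task<bool>.
private async void button_Click(object sender, EventArgs e)
{
    using (var reader = ExecuteReader())
    {
        // Each await releases the UI thread until the next record arrives.
        while (await reader.ReadRecordAsync())
            listbox.Items.Add(reader.Current);
    }
}
```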

That's magic!

Your UI became responsive by a simple call to await ?

And what's that ReadRecordAsync ?

Well, here is where the complexity really lives. The await is, in fact, registering a continuation and then allowing the actual method to finish immediately (in the case of a Button Click, the thread is free to further process UI messages). Everything that comes after await will be stored in another method, and any data used before and after the await keyword will live in another class created by the compiler and passed as a parameter to that continuation.
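As a rough illustration of that transformation (greatly simplified: the real compiler emits a state machine with hoisted locals, not a ContinueWith call, but the shape of the continuation is similar):

```csharp
// What "while (await reader.ReadRecordAsync()) ..." roughly turns into.
// Simplified sketch only; assumes it is first called on the UI thread so
// FromCurrentSynchronizationContext captures the UI scheduler.
void MoveNext(Reader reader)
{
    reader.ReadRecordAsync().ContinueWith(t =>
    {
        if (t.Result)
        {
            listbox.Items.Add(reader.Current); // the code after the await
            MoveNext(reader);                  // loop back for the next record
        }
    }, TaskScheduler.FromCurrentSynchronizationContext());
}
```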

Then there is the implementation of ReadRecordAsync. This may be considered the hardest part, as it may use some kind of real asynchronous completion (like the Operating System's IO completion ports) or it may still use a secondary thread, like a ThreadPool thread.

Secondary threads

If it still uses secondary threads, you may wonder how it is going to be faster than a normal secondary thread.

Well... it is not going to be faster; it may even be a little slower, as by default it needs to send a message back to the UI thread when the process completes. But if you are going to update the UI, you already need to do that anyway.

Some speed advantage may come from the fact that the actual thread can start something else immediately (instead of waiting doing nothing), and also from the ThreadPool usually used by Tasks, which limits how many work items run concurrently. That is, some work items need to end before new work items (including Tasks) can start. With normal threads, we risk having too many threads trying to run at once (many more than the real processor count), when it would be faster to let some threads simply wait to start (too many threads also occupy too many OS resources).

Noticing the obvious

Independent of the benefits of the ThreadPool and the ease of use of the async keyword, did you notice that when you put an await in a method the actual thread is free to do another job (like processing further UI messages)?

And that at some point such await will receive a result and continue? With that you can very easily start five different jobs. Each one, at the end, will continue running on the same thread (probably the UI).

It is not hard to see those jobs as "slim" threads. As a Job , they start, they "block" awaiting, and they continue. The real thread can do other things in the "blocking" part, but the same already happens with the CPU when a real thread enters a blocking state (the CPU continues doing other things while the thread is blocked).

Such Jobs don't necessarily have priorities; they run as a simple queue in their manager thread, but every time one finishes or enters a "wait state", it allows the next job to run.

So, they will all run in the same real thread, and one Job must await or finish to allow others to run. That's cooperative threading.

It could be better

I said at the beginning that it could be better so, how?

Well, real cooperative threads do the same as the await keyword, but without the await keyword and without returning a Task , consequently making the code better prepared for future changes.

You may think that code using await is prepared for future changes, but do you remember my pseudo-code?

using (var reader = ExecuteReader())
    while (reader.ReadRecord())
        listbox.Items.Add(reader.Current);

Imagine that you update it to use the await keyword. At this moment, only the ReadRecord method is asynchronous, so the code ends up like this:

using (var reader = ExecuteReader())
    while (await reader.ReadRecordAsync())
        listbox.Items.Add(reader.Current);

But in the future, the ExecuteReader method (which is almost instantaneous today) may take 5 seconds to respond. What do I do then?

I should create an ExecuteReaderAsync that returns a Task and replace all the calls to ExecuteReader() with await ExecuteReaderAsync() . That is a giant breaking change.

Wouldn't it be better if the ExecuteReader itself was able to tell "I am going to sit and wait, so let another job run in my place"?
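With cooperative jobs, ExecuteReader could keep its synchronous signature and simply yield internally. A purely hypothetical sketch (CooperativeWaitEvent and the helper methods are assumptions, anticipating the classes introduced later in this article):

```csharp
// Hypothetical: ExecuteReader keeps its synchronous signature but
// "blocks" only the current Job, never the thread. No caller changes.
public Reader ExecuteReader()
{
    var waitEvent = new CooperativeWaitEvent();

    // Start the real asynchronous open (e.g. an IO completion port) and
    // signal the event when it finishes. StartRealAsyncOpen is hypothetical.
    StartRealAsyncOpen(reader => { _openedReader = reader; waitEvent.Set(); });

    waitEvent.WaitOne(); // yields this Job; other jobs run meanwhile
    return _openedReader;
}
```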

Pausing and resuming a Job

Here is where all the problems are concentrated, and here is the reason the await keyword exists. Well, I think people at Microsoft got so fascinated that they could change the compiler to manage secondary callstacks using objects and delegates (effectively creating the continuation) that they forgot they could create a full new callstack and switch to it.

If you don't know what the callstack is, you may have already seen it in the debugger window. It keeps track of all methods that are currently executing and their local variables. If method A calls method B, which then calls method C, it holds the exact position in method C, the position in B to resume when C returns, and also the position in A to resume when B returns.

A continuation is the hard version of this. In fact, simply continuing with another method is easy; the problem is having a try/catch block in method A and putting a continuation in B that is still covered by the same try/catch . The compiler ends up creating an entire try/catch in method A and another in method B, both executing the same code in the catch (probably with an additional method reused by the catch code).

If, instead of managing a "secondary callstack" in a continuation, they created a completely new callstack, replaced the thread's callstack with the new one and, at wait points, restored the original callstack, it would be much simpler, as all the code that uses the callstack would continue to use it. No additional methods or different control flows to deal with try/catch es.

Such an alternative callstack is what I called a StackSaver in the other article, but my original idea was misleading. It does not need to save and restore part of the callstack. It is a completely separate callstack that can be used in place of the normal callstack (and restores the original callstack on waits or as its last action). A "single pointer" change does the entire job (or even a single CPU register change).

Good theory, but it will not work

The .NET team did a lot of work to support the "compiler magic" that makes async work, and I claim that if we could simply create new callstacks, we could have the same benefits with code that is even easier to use and more maintainable; all we need is to be able to switch from one callstack to another.

That looks too simple and maybe you think that I am missing something, even if you don't know what, and so you believe it will not work.

Well, that's why I created my simulation of a StackSaver to prove that it works.

My simulation uses full threads to store the callstack; after all, there is no way to switch from one callstack to another at the moment. But this is a simulation, and it proves my point.

Even though they are full threads, I am not simply letting them run in parallel, as that would bring all the problems related to concurrency (and would be normal threading). The StackSaver class is fully synchronized with its main thread, so only one runs at a time.

This will give the sensation of:

Calling StackSaver.Execute starts executing the other callstack "in the actual thread";

When the action running in the StackSaver ends or calls StackSaver.YieldReturn , control goes back to the original callstack.

The only big difference of my StackSaver is that anything that uses the thread identity (like WPF) will notice that it is another thread. So it is not a real replacement, but it works for my simulation purposes and already allows creating a yield return replacement without any compiler tricks.

You didn't read that wrong, and I am not making a mistake: by default the StackSaver allows for a yield return replacement, not for an async/await replacement.

Doing the async/await replacement with the StackSaver

To use the StackSaver as an async/await replacement, we must have a thread that deals with one or more StackSaver s. I am calling the class that creates such a thread the CooperativeJobManager .

It runs as an eternal loop. If there are no jobs, it waits (a real thread wait, not a job wait). If there are one or more Job s, it dequeues a Job and makes it run. As soon as the Job returns (by a yield return or by finishing) and the original caller regains execution, the manager checks whether it should put the Job back in the queue (as the last one) or not.

The only problem then is waiting for something. When the Job requests a "blocking" operation, it must create a CooperativeWaitEvent , set up how the asynchronous part of the job really works (maybe using the ThreadPool , maybe using IO completion ports), mark itself as waiting, and yield return .

The main callstack, after seeing the Job is waiting, will not put it in the execution queue again. But when the real operation ends and "Sets" the wait event, the job is requeued.
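In code, such a "blocking" call might look like this (a hedged sketch: CooperativeWaitEvent is assumed to mark the job as waiting, yield return, and requeue the job when Set is called, exactly as described above):

```csharp
// Sketch of a job-blocking read built on the pattern just described.
// Only the Job "blocks"; the manager thread keeps running other jobs.
public int CooperativeRead(Stream stream, byte[] buffer)
{
    var waitEvent = new CooperativeWaitEvent();
    int bytesRead = 0;

    // Start the real asynchronous operation (classic APM Begin/End here).
    stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
    {
        bytesRead = stream.EndRead(asyncResult);
        waitEvent.Set(); // requeues the waiting job in its manager
    }, null);

    waitEvent.WaitOne(); // marks the job as waiting and yield returns
    return bytesRead;
}
```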

It is as simple as that, and here is the entire code of the CooperativeJobManager :

using System;
using System.Collections.Generic;
using System.Threading;

namespace Pfz.Threading.Cooperative
{
    public sealed class CooperativeJobManager : IDisposable
    {
        private readonly HashSet<CooperativeJob> _allTasks = new HashSet<CooperativeJob>();
        internal readonly Queue<CooperativeJob> _queuedTasks = new Queue<CooperativeJob>();
        internal bool _waiting;
        private bool _wasDisposed;

        public CooperativeJobManager()
        {
            var thread = new Thread(_RunAll);
            thread.Start();
        }

        public void Dispose()
        {
            lock (_queuedTasks)
            {
                _wasDisposed = true;

                if (_waiting)
                    Monitor.Pulse(_queuedTasks);
            }
        }

        public bool WasDisposed
        {
            get { return _wasDisposed; }
        }

        private void _RunAll()
        {
            CooperativeJob task = null;
            while (true)
            {
                lock (_queuedTasks)
                {
                    if (_queuedTasks.Count == 0)
                    {
                        if (task == null)
                        {
                            do
                            {
                                if (_wasDisposed && _allTasks.Count == 0)
                                    return;

                                _waiting = true;
                                Monitor.Wait(_queuedTasks);
                            }
                            while (_queuedTasks.Count == 0);
                        }
                    }
                    else
                    {
                        if (task != null)
                            _queuedTasks.Enqueue(task);
                    }

                    if (_queuedTasks.Count != 0)
                    {
                        _waiting = false;
                        task = _queuedTasks.Dequeue();
                    }
                }

                CooperativeJob._current = task;
                if (!task._Continue() || task._waiting)
                    task = null;
            }
        }

        public CooperativeJob Run(Action action)
        {
            if (action == null)
                throw new ArgumentNullException("action");

            var result = new CooperativeJob(this);
            var stackSaver = new StackSaver(() => _Run(result, action));
            result._stackSaver = stackSaver;

            lock (_queuedTasks)
            {
                _allTasks.Add(result);
                _queuedTasks.Enqueue(result);

                if (_waiting)
                    Monitor.Pulse(_queuedTasks);
            }

            return result;
        }

        private void _Run(CooperativeJob task, Action action)
        {
            try
            {
                CooperativeJob._current = task;
                action();
            }
            finally
            {
                CooperativeJob._current = null;

                // _allTasks is always guarded by _queuedTasks elsewhere,
                // so the same lock object is used here.
                lock (_queuedTasks)
                    _allTasks.Remove(task);
            }
        }
    }
}

With it, you can call Run passing an Action , and that action will start as a CooperativeJob .

If the action never calls CooperativeJob.YieldReturn or some cooperative blocking call, it effectively executes directly. If the action does some kind of yield or cooperative wait, then another job can run on its thread.
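Usage is a single call; Run returns immediately while the job runs cooperatively on the manager's thread. A minimal sketch using the class above (LoadDataCooperatively is a hypothetical job-blocking call):

```csharp
var manager = new CooperativeJobManager();

// The delegate looks fully synchronous; any cooperative "blocking"
// call inside it lets other jobs of the same manager run.
manager.Run(() =>
{
    var data = LoadDataCooperatively(); // "blocks" the Job, not the thread
    Console.WriteLine(data);
});
```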

Now imagine this in your old Windows Forms application. At each UI event, you call CooperativeJobManager.Run to execute the real code. In that code, any operation that may block (like accessing databases, files, or even Sleeps) allows another job to run. And that's all: you have fully asynchronous code that does not have the complications of multi-threading and really looks like synchronous code.

The source available for download targets .NET 3.5, and I am sure it could work even under .NET 1.0 (it may require some changes).

The real missing piece is the StackSaver class which, as I already told you, uses real threads in this implementation, so it is useful for demonstration purposes only.

Advantages of cooperative threading over async/await done by the compiler

It would be available to any .NET compiler if implemented in a class like the one presented here.

You will not cause a breaking change if one method that today does not "block" starts to "block" in the future.

You don't need an easier continuation style, because you can simply avoid continuations altogether. In any place you need a continuation, create a new Job that may "block" without affecting your thread's responsiveness.

The callstack is used normally, avoiding a CPU register used to store a reference to the "state" in addition to the one already used by the callstack, which should make things a little faster.

By having the callstack there, it will be easier to debug.

Advantages of the async/await done by the compiler over cooperative threading

I can only see one: it is explicit, so users can't say they faced an asynchronous problem when they wrote synchronous code.

But that can easily be solved in cooperative threading with flags that tell the CooperativeJob it cannot "block", raising an exception if a "blocking" call is made. It is certainly easier to mark an area as "must not run other jobs here" than to have to await 10 times to do 10 different reads or writes.

Blocking versus "Blocking"

From my writing, you may notice that a "blocking" call is not the same as a blocking call.

A "blocking" call blocks the actual job but lets the thread run freely. A real blocking call blocks the thread and, when it returns, it continues running the same job.

Surely it may be problematic if we have a framework full of both blocking and "blocking" calls. But Microsoft is already reinventing everything with Metro (and even Silverlight has a network API that is asynchronous only).

So, why not replace all thread-blocking calls with job-blocking calls and make programming async software as easy as normal blocking software?

Did you like the idea?

Then ask Microsoft to add real cooperative threading through a stack-saver by clicking on this link and then voting for it.

The sample

I only did a very simple sample to show the difference between a real thread-blocking call and a job-blocking call.

I am surely missing better samples, and maybe I will add them later. Do not let the simplicity of the sample hide the real potential of the callstack "switching" mechanism, which can enable better versions of asynchronous code and yield return , and also open a lot of new scenarios for cooperative programming, making it easier to write more isolated code that can both scale and be prepared for future improvements without breaking changes.

POLAR - The first implementation of a StackSaver

I am finally presenting the first version of a StackSaver for .NET itself (even if it is a simulation), but this is not the first time I have shown a working version of the concept. I already presented it working in my POLAR language.

The language is still a hybrid between compilation and interpretation, but it uses the stack-saver as a real callstack replacement, and it will be relatively easy to implement asynchronous calls in it using the Job concept instead of the await keyword. I don't have a date for it as I am doing too many things at the moment (like still adapting to a new country), but I can guarantee that it could work with such Job s without even knowing how to deal with secondary threads.

Coroutines and Fibers

When I started writing this article, I didn't really know what coroutines were and I had no idea what a fiber was.

Well, at this moment I am really considering renaming my StackSaver class to Coroutine, as that is what it really provides. And fibers are OS resources that allow saving the callstack and jumping to another one; they are the resource needed to create coroutines.

I did try to implement the StackSaver class using fibers through P/Invoke, but unfortunately unmanaged fibers don't really work in .NET. I really think it is related to garbage collection: after all, when searching for root objects, .NET will not see the "alternative callstacks" created by unmanaged fibers and will collect objects that are still alive, but unseen.

Either way, at this moment I will keep the names StackSaver and "Job", as a Job is similar to a task but does not cause confusion with the Task class.

Update - Trying to explain better

From the comments I understand that I did not give the best explanation and people are getting confused by my claims.

If you look at the source code of the StackSaver , you will see threads that block. So don't look at the code of the StackSaver . Look at its idea:

You create a StackSaver with a delegate. When you call stackSaver.Execute , it executes that delegate until it ends or until it reaches a stackSaver.YieldReturn / StackSaver.StaticYieldReturn .

When yielding, the original caller resumes its execution, and when it calls Execute again, the delegate continues at the statement just after the YieldReturn . This generates the exact same effect as the yield return used by enumerators.
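For example, an enumerator-like generator could be written with the class directly, without the compiler's yield return support. A sketch under assumptions (in particular, Execute is assumed here to return false once the delegate finishes, which is a hypothetical detail of the API):

```csharp
// A "generator" built on StackSaver instead of compiler-generated
// yield return. Each Execute() resumes the delegate until the next yield.
int current = 0;
var saver = new StackSaver(() =>
{
    for (int i = 1; i <= 3; i++)
    {
        current = i;
        StackSaver.StaticYieldReturn(); // suspend; control returns to caller
    }
});

while (saver.Execute()) // assumed to return false when the delegate ends
    Console.WriteLine(current);
```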

Then the async/await replacement is based on a kind of "scheduler" that I call CooperativeJobManager . That scheduler waits if it has 0 jobs scheduled, or runs one job after another when there are jobs scheduled.

The only thing missing by default is the capacity to "unschedule" a job while it is waiting and to reschedule it when the asynchronous part gets a result. That is done by marking the job as waiting and "yield returning". The scheduler then does not reschedule that job immediately; instead, when the "wait event" is signaled, the job is scheduled again.

If the scheduler used the ThreadPool , it would have the same capability as async/await in the sense that, after awaiting, the job may be continued by another thread.

If that is still not enough to understand, I am already considering creating a C++ version of the code that does not use Thread s in the StackSaver class. But the rest of the code (that uses the StackSaver ) would be the same... and I am not sure the C++ code would really help get the idea across.

A better example on why my proposed approach is more prepared for future changes

I said that my approach is better prepared for future changes, but the examples were too abstract. That may be one of the reasons for confusion. So, let's focus on something more real.

Let's imagine a very simple interface for getting ImageSource s. The interface has a Get method that receives a filename. Very simple, but let's see two completely different implementations. One loads all the images on startup, so when asking for an image, it is always there and returns immediately.

The other always loads images when asked to. It does not try to do any caching.

Now, let's imagine that when I click a button, I get all the images (let's say there are 100s of them), generate the thumbnails for all of them in a single image, and then save them. Here comes the problem with asynchronous code: How can the interface return an ImageSource if the image loading is asynchronous?

The answer is: The interface can't return an ImageSource . It should return a Task<ImageSource> .
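The forced change is visible in the interface itself. A sketch (the interface names are illustrative, not from a real library):

```csharp
// Synchronous version: fine for the preloaded cache, wrong as soon as
// any implementation actually needs to load asynchronously.
public interface IImageProvider
{
    ImageSource Get(string fileName);
}

// Task-based version: required for the slow-loading implementation, so
// even the in-memory cache must now wrap every result in a Task.
public interface IImageProviderAsync
{
    Task<ImageSource> GetAsync(string fileName);
}
```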

In the end, with the Task based asynchronous code, we will:

Create 100 Task s, even when using the implementation that has all images in memory.

One extra task will be created for the method that generates the thumbnails.

Finally, when saving, an extra task is generated for the file save (even if we don't use it, the asynchronous Write will create it).

In fact, there are some more tasks, as opening and reading are two different asynchronous things, as are creating and writing the files.

As you can see, there are a lot of tasks created here, even when the implementation has all things in memory.

It is possible to store the tasks themselves in the cache (and that avoids some of the async magic), but we will still have a higher overhead when reading the results from the cache that has everything in memory.

With my proposed "job synchronous/thread asynchronous code":

One job is created to execute all the code.

The 100 image gets will not "block" at all with the cache that has all images already loaded, or they will "block" the Job 100 times, not the Thread , when loading the images with the other implementation.

After getting or loading all images with "synchronous" semantics, it executes the thumbnail generation normally and then saves the images, "blocking" the Job again.

Then, by ending the method, the job ends.

Total jobs? 1. If we use the implementation that has all images in memory, we will have faster code, because we receive the ImageSource s directly as results, not Task s from which to then get the results.

Do you still think that Task based asynchrony is better?

If you believe that Task based asynchrony is better because it may use secondary threads if needed, then think again, as Job based asynchrony can too. The secondary threads, if any, are used by the real asynchronous code (when loading or reading a file, IO completion ports can be used). After the asynchronous action ends, it asks for the continuation Task to be scheduled (with a Job , the job is simply rescheduled).

If the image loading itself may use hardware acceleration to convert bytes into an image representation, and so returns a Task too, well, the Job can also start that asynchronous hardware code and be put to sleep, resuming its execution when the generated image is ready.

All the advantages of the Task based approach, which I can summarize as "can be continued later, be it on the same thread or on another thread", are there. Most of the disadvantages (like getting lost when a [ThreadStatic] value is not there anymore) are present too. But all your methods can continue to return the right value types (not Task s).

With my proposed solution, if some code may end up calling synchronous or asynchronous code (like the interface that may return images directly or load them), you don't need to generate extra Task s only to be sure that it will work when the code is asynchronous. Simply let the Job "block" and be rescheduled later.
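The job-based thumbnail code keeps the simple interface and the simple call site. A hedged sketch (jobManager, imageProvider, GenerateThumbnails, and SaveCooperatively are hypothetical names for the pieces described above):

```csharp
// One Job runs the whole operation with synchronous-looking code.
jobManager.Run(() =>
{
    var images = new List<ImageSource>();
    foreach (string fileName in fileNames)
        images.Add(imageProvider.Get(fileName)); // may "block" the Job, never the thread

    var sheet = GenerateThumbnails(images);     // pure CPU work, runs normally
    SaveCooperatively(sheet, "thumbnails.png"); // "blocks" the Job during IO
});
```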

I hope it makes more sense now.

Update 2 - Discussion with Eugene Sadovoi

After a lot of discussion with Eugene Sadovoi, I am sure I was not clear enough. So, for those who are still lost, I am sorry. I really tried to omit some things to make the article shorter and easier to read, but apparently I achieved the opposite.

And, for those who simply want more info, I will try to give it now. So, some new "viewpoints" on the matter:

Tasks versus Jobs... or, may I say... Jobs == Tasks

Not only may the words Job and Task have the same meaning, they are effectively the same thing. Throughout this article, I tried to use the word Job to represent a cooperative Job, while Task represents the .NET classes ( Task and Task<T> ).

But the only thing really needed for a Task to become a Job is the possibility to "pause" at any moment. With the await keyword, we can only pause the actual method, and only if it returns a Task . We can't pause the caller of the actual method.

If await were capable of pausing the actual Task , be it the Task returned by this method, the Task that called this method directly, or the Task that called an unknown number of methods before reaching the actual method, the Task would be a Job , and await would really mean "make the actual Task/Job wait and let the actual thread do something else".

Under the hood

So, all of my article is in fact "under the hood". How can we make the actual Task pause at any moment?

Returning a Task is an implementation detail. What users want is to use the await keyword... and, when using it, they really want to say: While waiting for this result, allow the actual thread to do something else.

With the actual compiler implementation, it is impossible for a method to return void and make the caller Task await. Instead, the actual task must "return a continuation to continue later". I think that is too many implementation details; users don't want that.

With cooperative threading, which is in fact based on some kind of stack saving/switching mechanism, we can really make a Task wait at any moment. It is not required to register a continuation and return from all methods on the call stack (and for those, in turn, to register continuations if needed). The code can simply say "await now, independent of how many things I have on the callstack" and have the continuation code as the next instruction. That affects a lot of the other methods (the callers), not the actual method.

Finally, what changes for users?

Task s are not created for every method that may await . They are created at "keypoints" only.

For a WPF or Windows Forms application, that means that every "UI event" must create a Task , so it can await at any moment.

As long as you don't need parallel execution, you simply write synchronous code that works with asynchronous sub-methods. But when you really want parallel execution, you create Task s over the calling methods (which become delegates) and use things like Task.WaitAny or Task.WaitAll .
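The split between the two styles can be sketched as follows (LoadImage is a hypothetical job-blocking call; in this model, the waits would "block" only the Job, not the thread):

```csharp
// Sequential by default: plain synchronous-looking calls inside one job.
var left = LoadImage("left.png");   // may "block" the Job, not the thread
var right = LoadImage("right.png");

// Parallelism only when explicitly requested, as the text describes:
var taskLeft = Task.Factory.StartNew(() => LoadImage("left.png"));
var taskRight = Task.Factory.StartNew(() => LoadImage("right.png"));
Task.WaitAll(taskLeft, taskRight);
```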

OK... let's compare

Maintenance - My solution works like any blocking code and does not require changes if an inner method that does not block today starts to block in the future.

Learning curve - As you don't really change the code, it is easy to learn.

Speed - Considering a Job can be a "pausable Task", all optimizations done to tasks can be done to jobs.

Speed 2 - With the state machine (used by the actual Task implementation), you always pay a small "cost" to return to the exact position in the method; that cost is fixed with the stack-saving mechanism (and I am not even sure there aren't optimizations or specific CPU instructions to save the stack/registers).

Memory - My approach may use more memory for the callstack, but it may end up allocating far fewer Task objects, so it may even end up using less memory. I consider the actual implementation and mine equivalent here; neither is really better.

Context switches - As with any await use, they will only happen when the Operating System reschedules the actual thread for another real thread (which is unavoidable) or when the actual "Task/Job" yields or enters some await state.

To compiler developers - It will not require other compilers to change, as Tasks will be a "pausable" and "awaitable" class. As there are no compiler tricks, there is no chance of one compiler generating a better state machine than another, and no chance of one compiler supporting it while other compilers don't.

Also, with the exact same implementation, all errors that users may face will be the same independent of the compiler used. With the compiler-based trick, it is possible that some compilers have one kind of issue while other compilers have others.