Some Background

Recently I created a program that downloads XML files from a remote FTP server, parses them, extracts their data, and inserts the extracted data (including all of the XML) into an Azure SQL Server. My first approach was (foolishly) to do it all synchronously. Grab a file from the FTP server, validate it, parse it, and insert it into the database. Then do it all again for the next file. One after the other. Slowly. Wasting valuable resources. Wasting time. There is a plethora of excuses as to why I approached it this way at first, but I think the better thing to focus on is that I fixed it.

Even more recently I needed to update the program to be more efficient (I know, right?!!) and cost-effective. The program runs in a Worker Role on an Azure Cloud Service, so it would be nice to use as little CPU, as infrequently, as possible. As I was redesigning the system, it became abundantly clear that I needed to make my code asynchronous.

Why Async?

I think Microsoft explains it succinctly enough for us to understand:

Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn’t depend on the web resource until the potentially blocking task finishes. (https://msdn.microsoft.com/en-us/library/hh191443.aspx)

The Task-based Asynchronous Pattern (TAP) is Microsoft’s newest solution to async programming. Again from Microsoft:

[TAP] introduces a simplified approach, async programming, that leverages asynchronous support in the .NET Framework 4.5 and the Windows Runtime. The compiler does the difficult work that the developer used to do, and your application retains a logical structure that resembles synchronous code. As a result, you get all the advantages of asynchronous programming with a fraction of the effort. (https://msdn.microsoft.com/en-us/library/hh191443.aspx)

Sounds good to me. I remember taking the SCJP for Java 6, and threads were complicated, to say the least. I like TAP better. So much better.

The Code

FileImporter

The following isn’t exactly how it was when it was synchronous, but it gives a pretty good picture of the situation.
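The original gist isn’t reproduced here, so what follows is only a minimal sketch of what a synchronous importer along these lines might look like. The method names, the file names, and the fixed 200 ms delay are assumptions for illustration, not the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class FileImporter
{
    // Imports each file in turn; the next file does not start
    // until the previous one has completely finished.
    public List<string> Run(IEnumerable<string> fileNames)
    {
        var imported = new List<string>();
        foreach (var fileName in fileNames)
        {
            var result = SomeLongRunningProcess(fileName);
            Console.WriteLine("Imported " + result);
            imported.Add(result);
        }
        return imported;
    }

    // Hypothetical stand-in for a download or database insert.
    // Thread.Sleep blocks the calling thread for the whole delay.
    private static string SomeLongRunningProcess(string fileName)
    {
        Thread.Sleep(TimeSpan.FromMilliseconds(200));
        return fileName;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up file names; total time is roughly 3 x 200 ms.
        new FileImporter().Run(new[] { "a.xml", "b.xml", "c.xml" });
    }
}
```

With this shape the total run time is the sum of every file’s delay, and the file names always print in input order.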

FileImporter is a simple example class that imports a list of files from some remote server into some kind of storage, such as a database or blob storage.

Downloading data takes time. Database transactions take time. Uploading to blob storage takes time. Each file has to wait in line to be imported. It has to wait for each process to complete before it can be used: before it can fulfill its purpose. This is wasteful. And frankly sad.

AsyncFileImporter

Let’s sweeten it up with some async goodness. Check out AsyncFileImporter below.
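The gist itself isn’t reproduced here, so this is a rough sketch of how such a class could be put together. The use of Task.WhenAll, the ImportAsync helper, and the configurable delay range in the constructor are my assumptions; only RunAsync, SomeLongRunningProcess, ToList() and the one-to-ten-second wait come from the description in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class AsyncFileImporter
{
    private readonly Random _random = new Random();
    private readonly int _minDelayMs;
    private readonly int _maxDelayMs;

    // Defaults give a wait of one to ten seconds; the parameters are an
    // assumption added so the delay can be shortened when experimenting.
    public AsyncFileImporter(int minDelayMs = 1000, int maxDelayMs = 10000)
    {
        _minDelayMs = minDelayMs;
        _maxDelayMs = maxDelayMs;
    }

    public async Task<string[]> RunAsync(IEnumerable<string> fileNames)
    {
        // Select is lazy; ToList() executes the query, which calls
        // ImportAsync for every file and so starts every Task.
        List<Task<string>> tasks = fileNames
            .Select(fileName => ImportAsync(fileName))
            .ToList();

        // Asynchronously wait for all imports to finish.
        return await Task.WhenAll(tasks);
    }

    private async Task<string> ImportAsync(string fileName)
    {
        var result = await SomeLongRunningProcess(fileName);
        Console.WriteLine("Imported " + result);
        return result;
    }

    // Fake of a real long-running operation such as a download.
    private async Task<string> SomeLongRunningProcess(string fileName)
    {
        // Next's upper bound is exclusive, hence the + 1.
        int delayMs = _random.Next(_minDelayMs, _maxDelayMs + 1);
        await Task.Delay(delayMs); // does not block a thread while waiting
        return fileName;
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Made-up file names for illustration.
        await new AsyncFileImporter().RunAsync(new[] { "a.xml", "b.xml", "c.xml" });
    }
}
```

Because all the delays overlap, the total run time is roughly the longest single delay rather than the sum. The console lines appear in completion order, while the array returned by Task.WhenAll keeps the input order.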

Some noteworthy points.

The RunAsync method builds a query that maps each file to be imported to a Task, then executes that query by calling ToList(). Because the query is lazy, it is the ToList() call that actually starts every Task. The method then waits for each Task to complete.

The method SomeLongRunningProcess is a fake of a real long-running task, such as a file download using StreamReader’s ReadToEndAsync method or an insert into a database using Entity Framework. Here we just wait a random one to ten seconds before returning the file name to the calling method. (This is one way to write your tests for async methods.)

Putting the async keyword on a method marks it as asynchronous, and it should return a Task or a Task<T> (async void is meant for event handlers). The keyword on its own doesn’t make anything asynchronous; the method needs to await something. When execution reaches an await on a Task that hasn’t finished yet, the method returns control to its caller, so the program isn’t blocked while the work is in flight. When the awaited Task completes, the method picks back up right after the await.
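To make that hand-off concrete, here is a small sketch that records the order in which things happen. Everything in it (AwaitDemo, DownloadAsync, the Log list, and the TaskCompletionSource standing in for slow I/O) is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitDemo
{
    public static readonly List<string> Log = new List<string>();

    public static async Task<string> DownloadAsync(Task<string> slowSource)
    {
        Log.Add("download started");
        // slowSource is not finished yet, so control returns to the caller here.
        string name = await slowSource;
        // Execution resumes here once slowSource completes.
        Log.Add("download finished");
        return name;
    }

    public static async Task RunAsync()
    {
        // Stands in for slow I/O that we complete by hand.
        var source = new TaskCompletionSource<string>();

        // Runs DownloadAsync up to its first await, then comes back.
        Task<string> download = DownloadAsync(source.Task);
        Log.Add("caller keeps working");

        source.SetResult("file.xml"); // the "I/O" finishes
        string name = await download;
        Log.Add("got " + name);
    }

    public static async Task Main()
    {
        await RunAsync();
        foreach (var line in Log) Console.WriteLine(line);
    }
}
```

The log comes out as "download started", "caller keeps working", "download finished", "got file.xml": the caller gets to do its own work between the start and the end of the download.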

Conclusion

This is a high-level view of it, but you get the idea. Your program is no longer blocked by long-running processes and can get more done in less time. This is more efficient and frankly pretty cool.

If you want more than gists, you can clone the GitHub repo and run the solution. Play around with it. You’ll notice when you run the project that the console outputs the file names out of order. You can replace the call to the async class with the synchronous class and see how it runs differently.

As a disclaimer of sorts, this may not be the absolute best way to do this. I am always looking to improve, so if you have a better and more awesome way of doing this, please let me know.

Thanks for reading and have fun coding.

Some resources