Introduction

Microsoft has made a significant investment in parallel computing features with Microsoft Visual Studio 2010 and the .NET Framework 4.0. Often the best way to understand a new feature set is to start with its core components, look inside them for the core concepts, and keep going deeper until you find a set of components that seem to be everywhere. The Task Parallel Library (TPL) is a core part of parallel computing in the .NET Framework, and the Task class is the heart of TPL. In this article I'm going to explain how a developer can use the Task class to leverage TPL.

A New Model

Although a developer could approximate some of what TPL offers with features that existed in prior versions of Microsoft Visual Studio and the .NET Framework, getting all the benefits of TPL requires architectural approaches that may be new to many .NET developers.

I think of TPL as following a new kind of model, one that extends familiar concepts like threads, delegates, and the ThreadPool. Models follow prescriptive patterns to solve problems within a particular context. The first step toward understanding this model is becoming acquainted with the ideas behind it. The basic ideas are as follows:

Delegates are references to functions. A delegate can be passed around an application on its own, so the code that invokes it does not need a reference to the class that defines the function. A developer can control the flow of an application by organizing delegates into data structures and invoking them.
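To make the idea concrete, here is a minimal sketch in which the order of work is decided by a list of delegates rather than by hard-coded method calls. The class name DelegateFlowDemo and the step names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example: program flow is defined by the delegates
// stored in a list, not by a fixed sequence of method calls.
public static class DelegateFlowDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();

        // Each entry is a delegate, i.e. a reference to a function.
        var steps = new List<Action>();
        steps.Add(() => log.Add("load"));
        steps.Add(() => log.Add("transform"));
        steps.Add(() => log.Add("save"));

        // Reordering or replacing entries changes what the program does
        // without touching the loop that invokes them.
        foreach (Action step in steps)
        {
            step();
        }
        return log;
    }
}
```

Calling DelegateFlowDemo.Run() returns the log entries in the order the delegates were stored: "load", "transform", "save".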

Delegates can be passed to threads in the ThreadPool or to threads the application creates itself. Developers use multiple threads to improve application performance and responsiveness.
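Both routes can be sketched with pre-TPL APIs: ThreadPool.QueueUserWorkItem for a pool thread and new Thread(...) for a dedicated one. The class name ThreadDelegateDemo and the values 20 and 22 are hypothetical; the ManualResetEvent is just one way of waiting for the pool work item to finish.

```csharp
using System;
using System.Threading;

// Hypothetical example: one delegate runs on a ThreadPool thread,
// another on a thread the application creates explicitly.
public static class ThreadDelegateDemo
{
    public static int Run()
    {
        int poolResult = 0;
        int threadResult = 0;

        // 1) Queue a delegate to the ThreadPool and signal when it is done.
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                poolResult = 20;
                done.Set();
            });
            done.WaitOne(); // block until the pool delegate has signaled
        }

        // 2) Pass a delegate to a dedicated Thread and wait with Join.
        var thread = new Thread(() => { threadResult = 22; });
        thread.Start();
        thread.Join();

        return poolResult + threadResult;
    }
}
```

Notice how much coordination code (an event, Start, Join) is needed even for two trivial delegates; that bookkeeping is part of what TPL takes over.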

Because threads can run concurrently, and data structures are often shared by two or more concurrently running threads, executing delegates must control how they access shared data. Before changing a shared data structure, a delegate must lock it or coordinate with the other threads through signaling.
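A minimal sketch of the locking case follows, assuming a simple shared counter (LockDemo and its parameter are hypothetical names). Without the lock, increments can be lost, because count++ is a read-modify-write sequence rather than a single atomic step.

```csharp
using System;
using System.Threading;

// Hypothetical example: two threads increment one shared counter,
// taking a lock around each change so no update is lost.
public static class LockDemo
{
    public static int Run(int incrementsPerThread)
    {
        int count = 0;
        var gate = new object(); // the object both threads lock on

        ThreadStart work = () =>
        {
            for (int i = 0; i < incrementsPerThread; i++)
            {
                // Only one thread at a time can execute this block.
                lock (gate)
                {
                    count++;
                }
            }
        };

        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return count;
    }
}
```

With the lock in place, LockDemo.Run(100000) always returns 200000. The price is that the two threads spend much of their time waiting on each other, which leads directly to the next point.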

Locking data structures and creating too many threads can introduce bottlenecks and consume resources, which can defeat the purpose of multithreading.
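One way TPL addresses the thread-count problem is that tasks started with the default TaskScheduler are queued to the ThreadPool instead of each getting a dedicated thread. The sketch below (ManyTasksDemo is a hypothetical name) starts 100 small tasks and waits for them all; the pool decides how many worker threads to use, typically far fewer than 100.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical example: 100 small units of work expressed as Tasks.
// The default scheduler queues them to the ThreadPool, which reuses
// a limited set of worker threads rather than one thread per item.
public static class ManyTasksDemo
{
    public static long Run()
    {
        const int count = 100;
        var tasks = new Task<long>[count];

        for (int i = 0; i < count; i++)
        {
            int n = i; // copy the loop variable so each closure sees its own value
            tasks[i] = Task.Factory.StartNew(() => (long)n * n);
        }

        // Block until every task has finished.
        Task.WaitAll(tasks);

        long sum = 0;
        foreach (var t in tasks)
        {
            sum += t.Result; // each task computed one square
        }
        return sum;
    }
}
```

The result is the sum of the squares of 0 through 99, which is 328350.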

Really grasping how to apply TPL requires code. For the remainder of the article I'm going to walk through some sample TPL code that demonstrates how Task works with other TPL classes.

TPL Core and Sample Code

Two namespaces contain many of the TPL classes: System.Threading.Tasks and System.Collections.Concurrent. Here is a list of some of the classes you'll find in these namespaces.

Task

TaskScheduler

TaskFactory

TaskCanceledException

BlockingCollection<T>
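The sketch below shows how several of these classes fit together, assuming a task that is canceled before it ever runs. TaskClassesDemo is a hypothetical name; TaskFactory, TaskScheduler.Default, CancellationTokenSource, and TaskCanceledException are the real .NET Framework 4 types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: a TaskFactory configured with a cancellation
// token and the default TaskScheduler, and the TaskCanceledException
// surfaced when waiting on a task that was canceled before it ran.
public static class TaskClassesDemo
{
    public static bool Run()
    {
        var cts = new CancellationTokenSource();
        var factory = new TaskFactory(
            cts.Token,
            TaskCreationOptions.None,
            TaskContinuationOptions.None,
            TaskScheduler.Default);

        // Cancel first, so the task below never starts running.
        cts.Cancel();
        Task task = factory.StartNew(() => Console.WriteLine("never printed"));

        try
        {
            task.Wait();
            return false; // not expected: the task should be canceled
        }
        catch (AggregateException ae)
        {
            // Wait wraps the cancellation in an AggregateException.
            return task.IsCanceled && ae.InnerException is TaskCanceledException;
        }
    }
}
```

BlockingCollection<T>, the remaining class in the list, is what connects the stages in the sample that follows.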

The code below, taken from page 55 of the "Patterns of Parallel Programming" whitepaper on the Microsoft Parallel Computing web site, uses classes from both namespaces.

using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static void ProcessFile(string inputPath, string outputPath)
{
    var inputLines = new BlockingCollection<string>();
    var processedLines = new BlockingCollection<string>();

    // Stage #1: read the input file and hand each line to stage #2
    var readLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in File.ReadLines(inputPath))
                inputLines.Add(line);
        }
        finally
        {
            // Tell consumers no more lines are coming
            inputLines.CompleteAdding();
        }
    });

    // Stage #2: replace each run of whitespace with ", "
    var processLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in inputLines.GetConsumingEnumerable()
                .Select(raw => Regex.Replace(raw, @"\s+", ", ")))
            {
                processedLines.Add(line);
            }
        }
        finally
        {
            processedLines.CompleteAdding();
        }
    });

    // Stage #3: write processed lines as they become available
    var writeLines = Task.Factory.StartNew(() =>
    {
        File.WriteAllLines(outputPath, processedLines.GetConsumingEnumerable());
    });

    // Block until all three stages have finished
    Task.WaitAll(readLines, processLines, writeLines);
}

As you can see, this is an interesting piece of code. It reads like sequential code, yet the subject of this article is parallel computing: the three stages run as separate tasks, and each stage consumes lines as soon as the previous stage produces them. If you've seen or written multithreaded applications before, you're probably not used to seeing a program structured this way. TPL hides much of the explicit thread creation, locking, and signaling you've seen in multithreaded and asynchronous code.
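The reason the sample can look sequential is the behavior of BlockingCollection<T>.GetConsumingEnumerable: it blocks while the collection is empty and finishes only after CompleteAdding has been called and every item has been taken. Here is a reduced two-stage sketch of the same pattern. PipelineStageDemo is a hypothetical name, and the bounded capacity of 2 is an assumption added to show back-pressure; the whitepaper sample uses unbounded collections.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical two-stage pipeline: the consumer's foreach reads like
// ordinary sequential code but runs concurrently with the producer.
public static class PipelineStageDemo
{
    public static List<int> Run()
    {
        var results = new List<int>();

        using (var queue = new BlockingCollection<int>(boundedCapacity: 2))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    for (int i = 1; i <= 5; i++)
                    {
                        queue.Add(i); // blocks while the bounded queue is full
                    }
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer's loop end
                }
            });

            var consumer = Task.Factory.StartNew(() =>
            {
                // Blocks while empty; ends after CompleteAdding and drain.
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    results.Add(item * 10); // only this task touches results
                }
            });

            Task.WaitAll(producer, consumer);
        }
        return results;
    }
}
```

Because the default backing store is a FIFO queue, the consumer sees the items in the order they were added, so the result is 10, 20, 30, 40, 50.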

Now let's break down how this code executes. As promised earlier in the article, I'm starting with the role of the Task class.
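As a warm-up, here is a minimal sketch of the Task members the pipeline relies on: Task.Factory.StartNew to queue work, Task<TResult>.Result to read a computed value, and Wait to block until work completes. TaskBasicsDemo is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical example of basic Task usage: one task with no result,
// one Task<int> that produces a value.
public static class TaskBasicsDemo
{
    public static int Run()
    {
        // A Task that performs work but returns nothing.
        Task logTask = Task.Factory.StartNew(() => Console.WriteLine("working..."));

        // A Task<int> that computes 1 + 2 + ... + 10.
        Task<int> sumTask = Task.Factory.StartNew(() =>
        {
            int sum = 0;
            for (int i = 1; i <= 10; i++) sum += i;
            return sum;
        });

        // Reading Result blocks until sumTask has finished.
        int total = sumTask.Result;

        logTask.Wait(); // make sure the first task has also completed
        return total;
    }
}
```

TaskBasicsDemo.Run() returns 55. In ProcessFile, the three StartNew calls play the role of logTask here, and Task.WaitAll replaces the individual Wait call.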