I firmly believe that supercomputing of today is the mainstream computing of tomorrow. A year and a half ago I wrote a blog about the future of concurrent programming based on new developments in systems and languages in the HPC (High-Performance Computing) community. Hopefully, this year I’ll learn more at the SC11 conference that’s taking place in Seattle in November (my employer, Corensic, will have a booth there). I’m especially interested in Chapel, the HPCS (High Productivity Computing Systems) language under development by Cray Inc., also here in Seattle. There will be a whole-day Chapel tutorial at SC11, which I’m going to attend.

Why Chapel? Whenever I go to a conference and hear about a new language development to support parallel programming, I immediately compare it with Chapel. Chapel does task-based parallelism better than Cilk, TBB or PPL; data-based parallelism better than AMP or ArBB; generic programming better than D (sorry, Andrei, I’m really partial to concepts) — the list goes on. It’s unfortunate that Chapel is pigeonholed as an HPC language, because it’s perfectly adequate for general purpose programming. In fact I installed it on my laptop and wrote a few programs in it.

A lot of HPC is dedicated to scientific computations, modeling of complex systems, and processing of large quantities of data. That’s where parallel and distributed programming shines. There is no doubt in my mind that the kind of computational power that’s used in scientific computations today will soon be available on game consoles, desktops, and then on tablets and smartphones, likely in concert with cloud computing. But we are not going to use our iPhones to simulate chain reactions in nuclear warheads or heat conduction in rocket engines, are we?

So what everyday tasks could benefit from this kind of power? Obviously the game industry has insatiable appetite for computing resources. Enhanced and virtual reality are peeking from around the corner. Speech recognition and natural language processing have already made inroads into smartphones. But I’m sure that, once the power is there, we will find plenty of new and unexpected applications — If you build it, they will come.

The question is: How do we write programs that can harness the power of multicores, GPUs, and distributed systems like the Cloud — possibly all three at the same time? One thing I know for sure: Not by painstakingly managing threads, locks, message passing, copying of data over the network, etc. And this is where the current C++ (C++11) is stuck, and Chapel blazes the trail.

The major advantage of Chapel, in my mind, is that it separates the logic of the algorithm from the details of its implementation on a particular system. In the ideal world you would write a program in a high-level language and the compiler plus runtime would figure out how to run in on a particular system — what can be run in parallel, which parts can be delegated to GPUs, which parts can migrate to other machines on the network, and so on. Well, we can always dream! In reality the programmer must still tell the compiler all those things. Yes, you can do this in C++ but you’ll make your code totally unreadable. The details of implementation would quickly obscure the heart of the algorithm.

In Chapel, you express parallelism in terms of tasks; not threads, thread pools, processes, or computers. You express communications in terms of shared global address space that can span whole clusters of computers. Separately, on the side, you describe the distribution of computations in terms of locales. Each node on the network is a separate locale. Each GPU is a locale (this feature is still under development). You define your data structure in global address space, but you separately describe how you would like it to be cut up and distributed between various locales.

You may see elements of this approach in other languages, libraries, and language extensions, but never in such comprehensive manner as in Chapel. Tasks, for instance, appear in Cilk, PPL (Parallel Pattern Library), and TBB (Threading Buildg Blocks), together with elements of data-driven parallelism. Intel extended its TBB library to ArBB (Array Building Blocks); Microsoft came up with a C++ extension, AMP (Accelerated Massive Parallelism); AMD put its weight behind OpenCL — everybody and his brother are trying to catch the wave of parallelism and high-throughput computing. It just so happens that the HPC crowd has been riding this wave for a long time and there’s a lot we can learn from them.

Which is why Seattle will be hot during the week of November 12-18, no matter what the weather reports predict.

Additional Links