Earlier this month, the OpenCL specification was released by the Khronos group. Khronos is a group made up of representatives from companies in the computing industry. The group focuses on creating and managing standards for graphics, multimedia and parallel computing on everything from mobile devices to desktop and workstation computers. Part of Khronos' charge is OpenGL and all it's relatives with the Open- prefix, so naming also makes sense.

The goal of OpenCL is to make certain types of parallel programming easier and to provide vendor agnostic hardware accelerated parallel execution of code. That's a bit of a mouth full, but the bottom line is that OpenCL will give developers a common set of easy to use tools to use to take advantage of any device with an OpenCL driver (processors, graphics cards, ect.) for the processing of parallel code.

While there are already tools available that enable parallel processing, these tools are largely dedicated to task parallel models. The task parallel model is built around the idea that parallelism can be extracted by constructing threads that each have their own goal or task to complete. While most parallel programming is task parallel, there is another form of parallelism that can greatly benefit from a different model.

In contrast to the task parallel model, data parallel programming runs the same block of code on hundreds (or thousands or millions or ...) of data points. Whereas my video game may have threads for handling AI, physics, audio, game state, rendering, and possibly more finely grained tasks if I'm up to the challenge, a data parallel program to do something like image processing may spawn millions of threads to do the processing on each pixel. The way these threads are actually grouped and handled will depend on both the way the program is written and the hardware the program is running on.

As we've said many times in the past, graphics is almost infinitely parallelizable. Millions of pixels on the screen can all act (mostly) independently of each other. Light weight threads handle the calculation of everything that has to do with a particular pixel. As pixels get smaller and we pack more on screens, there is more opportunity for parallel work. Graphics cards are currently the best data parallel processing engines we have available. And once OpenCL drivers are available, developers will have access to all that horsepower for any other data parallel tasks they see fit.

Now, it won't make sense to run a word processor on your graphics card, as there just isn't enough happening at once to take advantage of the hardware. Single threaded performance on a GPU isn't that great, especially compared to a general purpose CPU, and trying to run code that isn't massively parallel just isn't going to be a great idea. But there are plenty of things that can benefit from the GPU. Basically any multimedia processing can benefit, from video and audio decoding, editing, and encoding, to image manipulation, to helping speed up your math homework (brute force computation ala Maple, Matlab, and Mathematica could certainly benefit from the GPU). There could be some interesting encryption and/or compression techniques that are born out of the data parallel approach as well.

The best applications of data parallel computing have likely not been seriously considered at this point, as it takes time to get from the availability of tools to the finished product, let alone the conception of ideas that have heretofore been precluded by the realities of parallel programming. But OpenCL isn't a miracle that will make everything speed up. Rather it is a vehicle by which developers will be able to make a small subset of tasks orders of magnitude faster using hardware that is already in most people's computers. Which is certainly nice. But let's take a closer look.