GPU Programming For The Rest Of Us

One step closer to the "-gpu" option

I'm sure by now everyone has heard that you can run real code on GPUs (Graphics Processing Units). GPUs are the processors on the graphics card in your desktop, or even the graphics engines running your game consoles at home (never at work - right?). The potential performance improvement for codes or algorithms that can take advantage of the GPU's programming model and do most of their computation on the GPU is enormous. There are cases of over 100X performance improvement for some codes running on GPUs relative to CPUs.

But there are some limitations to using GPUs for computation. One of the critical limitations is that you have to take your code and rewrite it for the GPU, as in the case of Brook+ from AMD or OpenCL from the Khronos Group. Alternatively, you may have to "adapt" your C code to use some extra functions and data types (extensions), as in the case of CUDA from NVIDIA. Unfortunately, you can't just take your existing code and use a compiler option such as "-gpu" to magically build code for the GPU... or can you?

Writing Code for GPUs

I'm not sure how many people have written code or tried to write code for GPUs (I will use GPU in place of GP-GPU because it's easier on my carpal tunnel symptoms), but in general it's not as easy as it appears. If you are porting your application to GPUs then you have to take your code, understand the algorithms reasonably well, and then determine places where you think GPUs will shine. Then you have to either (1) rewrite the entire code, (2) rewrite targeted portions of the code, or (3) port the desired portions of the code to a new language. For newly written code, you can take your algorithm and frame it in a SIMD (Single Instruction, Multiple Data) context, and then write code. Whether it's old code, written long before GPUs, or new code, writing for GPUs is not easy. Let's take a look at what tools are available for writing GPU code.
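
To make the SIMD framing a little more concrete, here is a minimal sketch in plain C (no GPU required). The function names saxpy and prefix_sum are just illustrative choices, not part of any GPU toolkit. In the first loop every iteration touches only its own element, so the same instruction could be applied to many data elements at once. In the second loop each iteration needs the result of the one before it, so it will not map onto SIMD hardware without rethinking the algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* SIMD-friendly: each y[i] depends only on x[i] and y[i], so all
   iterations are independent and could run at the same time. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Not SIMD-friendly as written: each out[i] needs the running total
   from iteration i - 1, so every iteration waits for the previous one. */
void prefix_sum(size_t n, const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    float running = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

Finding loops of the first kind, and restructuring loops of the second kind where possible, is a large part of the "understand the algorithms reasonably well" step.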

If you want to get really hard core you can actually write GPU code using OpenGL. It is an API (Application Programming Interface), or language if you will, that allows you to write applications that run on GPUs. Originally OpenGL was designed for writing 2D and 3D computer graphics applications, but people have discovered that you can use it to run general programs that aren't necessarily graphically oriented. The catch is that you have to be able to "code" in OpenGL and write your algorithms using it. There are some simple tools that can help you get started, but in general, you have to think in graphical concepts such as textures, shaders, etc., and be able to express your algorithm in terms of these concepts using OpenGL. I like to think of this as the "Assembly Language approach" to coding for GPUs. That is, you are down in the low-level bowels of the language and the hardware to effectively write and run code on GPUs. In addition, such low-level approaches can limit the portability of the code from one platform to another.

While it is still very difficult for non-graphical programmers to write OpenGL code or for OpenGL coders to think about non-graphical algorithms, there are some success stories of applications. You can try this link or this link to read about some successful OpenGL applications that people have written.

Fairly early on people realized that GPUs, while showing huge potential, were not going to see widespread adoption given that they were so difficult to write code for. So higher level languages were developed. There is a whole laundry list of languages and I won't go over each of them here. But here is a list of the higher level languages and libraries that people are using or have used to write code for GPUs:

For all of these languages and libraries, you will need to rewrite or port your application. The degree of severity varies depending upon the specific option. Arguably, CUDA is one of the easiest because you can take existing C code and add GPU code to it along with some data passing function calls to move data between the host CPU and the GPU.
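
The shape of that "add GPU code plus data passing calls" pattern can be sketched in plain C. To be clear, nothing below is CUDA: the mock_ functions are hypothetical stand-ins I made up so that the sketch runs on any machine, with ordinary host memory playing the part of GPU memory. What it shows is the three-step rhythm (copy in, compute, copy out) that gets wrapped around the part of the code you want on the GPU.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a GPU runtime. On real hardware these would
   be calls into the vendor's toolkit; here they are plain host memory
   operations so the sketch runs anywhere. */
static float *mock_device_alloc(size_t n)
{
    return malloc(n * sizeof(float));
}

static void mock_copy_to_device(float *dev, const float *host, size_t n)
{
    memcpy(dev, host, n * sizeof(float));
}

static void mock_copy_to_host(float *host, const float *dev, size_t n)
{
    memcpy(host, dev, n * sizeof(float));
}

/* Stand-in for the GPU code: on a GPU each element would be handled by
   its own thread rather than by a loop. */
static void mock_scale_kernel(float *dev, float factor, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dev[i] *= factor;
}

/* Scale host_data in place by way of the mock device.
   Returns 0 on success, -1 if the device buffer could not be allocated. */
int scale_on_device(float *host_data, size_t n, float factor)
{
    assert(host_data != NULL);
    float *dev = mock_device_alloc(n);
    if (dev == NULL)
        return -1;
    mock_copy_to_device(dev, host_data, n);   /* 1. host -> device */
    mock_scale_kernel(dev, factor, n);        /* 2. compute on device */
    mock_copy_to_host(host_data, dev, n);     /* 3. device -> host */
    free(dev);
    return 0;
}
```

Steps 1 and 3 are pure overhead, which is why codes that "do most of their computation on the GPU" between copies are the ones that see the big speedups.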

On the other hand, languages and libraries such as BrookGPU, Brook+, and even OpenCL will require you to rewrite much of your application. However, the developers have tried to make them as close as possible to C. Some languages and libraries are available under various open-source licenses. Others are freely available but are not open-source. Then there are others that are commercial products.

Regardless of the language or package chosen, the amount of work that goes into porting or rewriting varies. I view all of the previously mentioned languages as something like Assembler+. That is, a step above something like Assembler, but not nearly the same as C, C++, or Fortran.

What developers really want is to continue to use their current development tools for developing for GPUs. They don't want to have to rewrite codes or learn new languages. They may adapt their codes somewhat, perhaps a small amount. But overall they just want to build codes for GPUs using their existing development tools and existing code base (as much as possible).

The Evolution of GPU Tools and Developers

Developers are looking for something easy or automatic that helps them run their code on GPUs. This is what I've been referring to as the magic "-gpu" option. The idea is that the compiler is all-seeing and all-knowing so that it can inspect your code, find the parts that look like SIMD-appropriate code, and create a CPU/GPU binary. I think people also want to be able to eat anything they want without gaining weight or endangering their health (at least that's my dream). But the point is that this is an almost impossible dream. However, we can move down the path in that direction.

This situation is not without precedent. If you've been around a few years you may remember the rise of the vector processor. At first, developers had to rewrite their codes to utilize vector processors. At the same time, the compiler vendors modified their compilers to help developers recognize opportunities for vectorization as well as to create good vectorized code. Over time, developers got better at writing vector code and the compilers became better at recognizing vector opportunities and generating really good vector code. The result after several years was a population of developers who, on average, understood pretty well how to write vector code and were armed with good compilers that could recognize vector code opportunities and generate very good vector code. In addition, the compilers produced code that performed well enough that developers no longer had to resort to the assembler code they had first used to reach a good portion of the potential performance. It took several iterations between developers and compiler creators to get to the end result. A better review of the history of vector compilers was written by Michael Wolfe from PGI (The Portland Group) for Linux Journal.

In many ways we are following the same steps with GPUs that we followed with vector processors. We are at the beginning of the cycle, where we once were with vector compilers and code development. We have some early tools for developing codes for GPUs, and developers are just starting to develop and, more importantly, understand how to develop codes for GPUs. But the next step in the evolution of tools (compilers) for GPUs was recently taken by The Portland Group.

PGI 8.0 Technology Preview

The Portland Group recently announced that they will have a technology preview version of their new 8.0 compilers. The preview will be given to a restricted group of testers initially and then expanded to other developers over time. The first customers should see the preview in early 2009.

So what is so special about this announcement from The Portland Group? I'm glad you asked :) What PGI has done is to add pragmas, or compiler directives, to their compilers. These pragmas allow the compiler to analyze the code and generate GPU code that is then sent to the NVIDIA compiler (the CUDA compiler is part of the freely available CUDA software). The PGI compiler continues to compile the CPU based code and links it to the CUDA-built GPU binary. Then you execute the resulting combination.
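
To give a feel for what a directive-based approach looks like in source code, here is a minimal sketch. The "#pragma acc region" spelling is an assumption on my part, borrowed from PGI's later published Accelerator programming model; the exact syntax in the 8.0 technology preview may differ. The function name scale_vector is made up for illustration. The nice property of pragmas is that a compiler that doesn't understand the directive simply skips it (possibly with a warning), so the same source still builds as ordinary C for the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of a[] by factor, writing the result to r[].
   ASSUMPTION: "#pragma acc region" follows PGI's Accelerator model
   syntax; a compiler without that support skips the directive and
   compiles the loop for the CPU. */
void scale_vector(size_t n, float factor, const float *restrict a,
                  float *restrict r)
{
    assert(a != NULL && r != NULL);
    #pragma acc region
    {
        /* Iterations are independent, which is what makes this loop a
           candidate for offloading. */
        for (size_t i = 0; i < n; ++i)
            r[i] = a[i] * factor;
    }
}
```

Notice that there are no explicit allocation or copy calls: with directives, working out what data has to move to and from the GPU becomes the compiler's job.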

Since we're all geeks here (well, at least I am), let's look at some details of how you code for GPUs now and what PGI's announcement does for us.