In Part 1 of this multi-part article on OpenMP, I described how OpenMP source code is processed by an OpenMP compiler. An OpenMP compiler follows the pipeline of a traditional compiler and augments it with OpenMP-specific stages and passes that transform OpenMP code into multi-threaded code (see Figure 1). In this installment, I explore how a parallel construct is transformed into code that can be handed over to a team of threads for execution. My focus is on POSIX threads, but the concepts shown here can easily be mapped to other threading APIs as well.

Before we dive into the low-level OpenMP transformations that create threaded code, we first need to look at how most threading APIs work in order to understand why these code transformations are necessary. Without too much loss of generality, I will start with a short example of how to execute a piece of code concurrently with a threading model such as POSIX threads. (Other threading APIs rely on similar concepts to implement threading.) This should be a good starting point for readers who are not experts in low-level thread programming.

Let's assume we want to write a POSIX version of the all-time favorite "Hello World". Here's the sequential code:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello World\n");
    return 0;
}

The following code fragment shows the POSIX version of the previous program. Each thread will print the "Hello World" string and then exit:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 8

void* run(void* arg) {
    printf("Hello World\n");
    return NULL;
}

int main(int argc, char** argv) {
    int i;
    pthread_t threads[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; i++) {
        pthread_create(/*1*/ &threads[i], /*2*/ NULL, /*3*/ run, /*4*/ NULL);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_join(/*1*/ threads[i], /*2*/ NULL);
    }
    return 0;
}

Multi-threading with POSIX threads relies on a so-called start routine that contains the code to be executed. A typical step in enabling a sequential program for POSIX threads is therefore to move the code in question out of the sequential code and into a newly created thread function. In the above example, we have moved the printf statement from the main function to a new function called run. This function must accept a single pointer to the arguments for the invocation, and it returns a pointer to its return value.

To execute the run function in a new thread, we invoke the POSIX call pthread_create in a for loop to create a set of threads. We pass four arguments to pthread_create: (1) a pointer to the thread handle that the creator can use to control the created child thread, (2) a pointer to thread attributes (we can ignore this here), (3) a pointer to the start routine (the run function in our case), and (4) the pointer that will be passed to the start routine as its argument. Once the for loop is done, NUM_THREADS threads have been created; they run concurrently with each other and with the very first thread, which executes the main function of our program.

At the end of the "Hello World" example, we need to wait until all threads have finished executing their work. We use the pthread_join call for this purpose. It suspends the calling thread until the target thread has finished execution. The target thread is identified by the thread handle, which is passed as the first argument of pthread_join. The second argument is a pointer to a variable that receives the pointer returned by the start routine; passing NULL, as we do here, discards that return value.