In this document, we concentrate on the C++ language in particular, and use GCC to compile the examples.

This document attempts to give a quick introduction to OpenMP (as of version 4.5), a simple C/C++/Fortran compiler extension that allows adding parallelism to existing source code without having to significantly rewrite it.

Note: If your GCC complains that "-fopenmp" is valid for D but not for C++ when you try to use it, or does not recognize the option at all, your GCC version is too old. If your linker complains about missing GOMP functions, you forgot to specify "-fopenmp" in the linking.

As CPU speeds no longer improve as significantly as they did before, multicore systems are becoming more popular.

To harness that power, it is becoming important for programmers to be knowledgeable in parallel programming: making a program execute multiple things simultaneously.

You can compile them like this:

There is also a runtime library that can be accessed through omp.h, but it is less often needed. If you need it, you can check the #define _OPENMP for conditional compilation in case of compilers that don't support OpenMP.
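A minimal sketch of such conditional compilation, assuming the standard `_OPENMP` macro and the `omp.h` runtime header. The function name `available_threads` is hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>   // OpenMP runtime library; only included when OpenMP is enabled
#endif

// Hypothetical helper: reports how many threads a parallel region may use.
// Falls back to 1 when the program is compiled without OpenMP support.
int available_threads()
{
#ifdef _OPENMP
    int count = omp_get_max_threads();
#else
    int count = 1;
#endif
    assert(count >= 1);   // a team always has at least one member
    return count;
}
```

Compiled without -fopenmp, the function simply returns 1, so callers do not need to care whether OpenMP is available.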

By C and C++ standards, if the compiler encounters a #pragma that it does not support, it will ignore it. So adding the OMP statements can be done safely without breaking compatibility with legacy compilers.

Only when the compiler interprets those #pragma lines does it become a parallel program. It really does calculate N values simultaneously, where N is the number of threads. In GCC, libgomp determines that from the number of processors.

As you can see, there is very little in the program that indicates that it runs in parallel. If you remove the #pragma lines, the result is still a valid C++ program that runs and does the expected thing.

This program can be improved in many different ways, but it is left simple for the sake of an introductory example.

This program calculates the classic Mandelbrot fractal at a low resolution and renders it with ASCII characters, calculating multiple pixels in parallel.

OpenMP 4.0 added support for offloading code to different devices, such as a GPU. Therefore there can be three layers of parallelism in a single program: a single thread processing multiple data; multiple threads running simultaneously; and multiple devices running the same program simultaneously.

This version requires compiler support for at least OpenMP 4.0, and the use of a parallel floating point library such as AMD ACML or Intel SVML (which can be used in GCC with e.g. ‑mveclibabi=svml).

This code divides the table initialization into multiple threads, which are run simultaneously. Each thread initializes a portion of the table.
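The listing this paragraph describes is not reproduced here. As a rough sketch of what such a table initialization could look like, the following assumes a sine table; the name `make_sine_table` and the choice of table contents are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical example: fills a table with one period of a sine wave.
// With OpenMP enabled, each thread initializes its own portion of the table.
std::vector<double> make_sine_table(int size)
{
    assert(size > 0);
    std::vector<double> table(size);
    const double pi = 3.14159265358979323846;

    #pragma omp parallel for
    for (int n = 0; n < size; ++n)
        table[n] = std::sin(2 * pi * n / size);

    return table;
}
```

Every iteration writes a different element, so the threads never touch the same memory and no synchronization is needed.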

OpenMP consists of a set of compiler #pragmas that control how the program works. The pragmas are designed so that even if the compiler does not support them, the program will still yield correct behavior, but without any parallelism.

The parallel construct

The parallel construct starts a parallel block. It creates a team of N threads (where N is determined at runtime, usually from the number of CPU cores, but may be affected by a few things), all of which execute the next statement (or the next block, if the statement is a {…}-enclosure). After the statement, the threads join back into one.

    #pragma omp parallel
    {
      // Code inside this region runs in parallel.
      printf("Hello!\n");
    }

This code creates a team of threads, and each thread executes the same code. It prints the text "Hello!" followed by a newline, as many times as there are threads in the team created. For a dual-core system, it will output the text twice. (Note: It may also output something like "HeHlellolo", depending on system, because the printing happens in parallel.) At the }, the threads are joined back into one, as if in a non-threaded program.

Internally, GCC implements this by creating a magic function and moving the associated code into that function, so that all the variables declared within that block become local variables of that function (and thus, locals to each thread). ICC, on the other hand, uses a mechanism resembling fork(), and does not create a magic function. Both implementations are, of course, valid, and semantically identical.

Variables shared from the context are handled transparently, sometimes by passing a reference and sometimes by using register variables which are flushed at the end of the parallel block (or whenever a flush is executed).

Parallelism conditionality clause: if

The parallelism can be made conditional by including an if clause in the parallel command, such as:

    extern int parallelism_enabled;
    #pragma omp parallel for if(parallelism_enabled)
    for(int c=0; c<n; ++c)
      handle(c);

In this case, if parallelism_enabled evaluates to a zero value, the number of threads in the team that processes the for loop will always be exactly one.
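To make the team size and the effect of the if clause observable, here is a small sketch. The function name `count_team_members` is hypothetical, and the atomic directive is used only to keep the shared counter update safe.

```cpp
#include <cassert>

// Hypothetical helper: counts how many threads execute a parallel region.
// When `enabled` is false, the if clause keeps the team at a single thread.
int count_team_members(bool enabled)
{
    int members = 0;

    #pragma omp parallel if(enabled)
    {
        // Every thread of the team runs this block exactly once.
        #pragma omp atomic
        ++members;
    }

    assert(members >= 1);   // the thread that entered the region is always a member
    return members;
}
```

Calling `count_team_members(false)` should always give 1, while `count_team_members(true)` gives the team size chosen by the runtime (or 1 if the program is compiled without OpenMP).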

Loop construct: for

The for construct splits the for-loop so that each thread in the current team handles a different portion of the loop.

    #pragma omp for
    for(int n=0; n<10; ++n)
    {
      printf(" %d", n);
    }
    printf(".\n");

This loop will output each number from 0…9 once. However, it may do it in arbitrary order. It may output, for example: 0 5 6 7 1 8 2 3 4 9.

Internally, the above loop becomes code equivalent to this:

    int this_thread = omp_get_thread_num(), num_threads = omp_get_num_threads();
    int my_start = (this_thread  ) * 10 / num_threads;
    int my_end   = (this_thread+1) * 10 / num_threads;
    for(int n=my_start; n<my_end; ++n)
      printf(" %d", n);

So each thread gets a different section of the loop, and they execute their own sections in parallel.

Note: #pragma omp for only delegates portions of the loop to different threads in the current team. A team is the group of threads executing the program. At program start, the team consists only of a single member: the master thread that runs the program.

To create a new team of threads, you need to specify the parallel keyword. It can be specified in the surrounding context:

    #pragma omp parallel
    {
      #pragma omp for
      for(int n=0; n<10; ++n) printf(" %d", n);
    }
    printf(".\n");

Equivalent shorthand is to specify it in the pragma itself, as #pragma omp parallel for:

    #pragma omp parallel for
    for(int n=0; n<10; ++n) printf(" %d", n);
    printf(".\n");

You can explicitly specify the number of threads to be created in the team, using the num_threads attribute:

    #pragma omp parallel num_threads(3)
    {
      // This code will be executed by three threads.

      // Chunks of this loop will be divided amongst
      // the (three) threads of the current team.
      #pragma omp for
      for(int n=0; n<10; ++n) printf(" %d", n);
    }

Note that OpenMP also works for C. However, in C (before C99), you need to explicitly set the loop variable as private, because C does not allow declaring it in the loop body:

    int n;
    #pragma omp for private(n)
    for(n=0; n<10; ++n) printf(" %d", n);
    printf(".\n");

See the "private and shared clauses" section for details.

In OpenMP 2.5, the iteration variable in for must be a signed integer variable type. In OpenMP 3.0, it may also be an unsigned integer variable type, a pointer type or a constant-time random access iterator type. In the latter case, std::distance() will be used to determine the number of loop iterations.

What are: parallel, for and a team

The difference between parallel, parallel for and for is as follows:

A team is the group of threads that execute currently. At the program beginning, the team consists of a single thread.

A parallel construct splits the current thread into a new team of threads for the duration of the next block/statement, after which the team merges back into one.

for divides the work of the for-loop among the threads of the current team. It does not create threads, it only divides the work amongst the threads of the currently executing team.

parallel for is a shorthand for two commands at once: parallel and for. Parallel creates a new team, and for splits that team to handle different portions of the loop.

If your program never contains a parallel construct, there is never more than one thread: the master thread that starts the program and runs it, as in non-threading programs.

Scheduling

The scheduling algorithm for the for-loop can be explicitly controlled.

    #pragma omp for schedule(static)
    for(int n=0; n<10; ++n) printf(" %d", n);
    printf(".\n");

There are five scheduling types: static, dynamic, guided, runtime, and (since OpenMP 3.0) auto. In addition, there are three scheduling modifiers (since OpenMP 4.5): monotonic, nonmonotonic, and simd.

static is the default schedule as shown above. Upon entering the loop, each thread independently decides which chunk of the loop it will process.

There is also the dynamic schedule:

    #pragma omp for schedule(dynamic)
    for(int n=0; n<10; ++n) printf(" %d", n);
    printf(".\n");

In the dynamic schedule, there is no predictable order in which the loop items are assigned to different threads. Each thread asks the OpenMP runtime library for an iteration number, then handles it, then asks for the next, and so on. This is most useful when used in conjunction with the ordered clause, or when the different iterations in the loop may take different time to execute.

The chunk size can also be specified to lessen the number of calls to the runtime library:

    #pragma omp for schedule(dynamic, 3)
    for(int n=0; n<10; ++n) printf(" %d", n);
    printf(".\n");

In this example, each thread asks for an iteration number, executes 3 iterations of the loop, then asks for another, and so on. The last chunk may be smaller than 3, though.

Internally, the loop above becomes code equivalent to this (illustration only, do not write code like this):

    int a,b;
    if(GOMP_loop_dynamic_start(0,10,1, 3, &a,&b))
    {
      do {
        for(int n=a; n<b; ++n) printf(" %d", n);
      } while(GOMP_loop_dynamic_next(&a,&b));
    }

The guided schedule appears to have the behavior of static, with the shortcomings of static fixed with dynamic-like traits: chunks are handed out on request as in dynamic, but the chunk size starts large and shrinks as the loop progresses.

The runtime option defers the choice to run time: the schedule is taken from the OMP_SCHEDULE environment variable. The auto option leaves the choice of schedule to the discretion of the compiler and its runtime library.

A scheduling modifier can be added to the clause, e.g.:

    #pragma omp for schedule(nonmonotonic:dynamic)

The modifiers are:

monotonic: Each thread executes chunks in an increasing iteration order.

nonmonotonic: Each thread executes chunks in an unspecified order.

simd: If the loop is a simd loop, this controls the chunk size for scheduling in a manner that is optimal for the hardware limitations according to how the compiler decides. This modifier is ignored for non-SIMD loops.

The ordered clause

The order in which the loop iterations are executed is unspecified, and depends on runtime conditions. However, it is possible to force certain events within the loop to happen in a predicted order, using the ordered clause.

    #pragma omp for ordered schedule(dynamic)
    for(int n=0; n<100; ++n)
    {
      files[n].compress();

      #pragma omp ordered
      send(files[n]);
    }

This loop "compresses" 100 files with some files being compressed in parallel, but ensures that the files are "sent" in a strictly sequential order.

If the thread assigned to compress file 7 is done but file 6 has not yet been sent, the thread will wait before sending, and before starting to compress another file. The ordered clause in the loop guarantees that there always exists one thread that is handling the lowest-numbered unhandled task.

Each file is compressed and sent exactly once, but the compression may happen in parallel.
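The compress-then-send pattern above can be tried out with a self-contained sketch. The names `expensive_square` and `squares_in_order` are hypothetical stand-ins for the "compress" and "send" steps; nothing here comes from the original listing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for expensive per-item work (the "compress" step).
static int expensive_square(int n) { return n * n; }

// Computes the squares in parallel, but appends them in strict iteration order.
std::vector<int> squares_in_order(int count)
{
    assert(count >= 0);
    std::vector<int> out;

    #pragma omp parallel for ordered schedule(dynamic)
    for (int n = 0; n < count; ++n)
    {
        int value = expensive_square(n);   // may run concurrently

        #pragma omp ordered
        out.push_back(value);              // one iteration at a time, in loop order
    }
    return out;
}
```

Because the ordered region is entered by one iteration at a time and in loop order, the shared vector needs no additional locking in this sketch.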
There may only be one ordered block per ordered loop, no less and no more. In addition, the enclosing for construct must contain the ordered clause.

OpenMP 4.5 added some modifiers and clauses to the ordered construct.

#pragma omp ordered threads means the same as #pragma omp ordered. It means the threads executing the loop execute the ordered regions sequentially in the order of loop iterations.

#pragma omp ordered simd can only be used in a for simd loop.

#pragma omp ordered depend(source) and #pragma omp ordered depend(sink: vectorvariable) also exist.

The collapse clause

When you have nested loops, you can use the collapse clause to apply the threading to multiple nested iterations. Example:

    #pragma omp parallel for collapse(2)
    for(int y=0; y<25; ++y)
      for(int x=0; x<80; ++x)
      {
        tick(x,y);
      }

The reduction clause

The reduction clause is a special directive that instructs the compiler to generate code that accumulates values from different loop iterations together in a certain manner. It is discussed in a separate chapter later in this article. Example:

    int sum=0;
    #pragma omp parallel for reduction(+:sum)
    for(int n=0; n<1000; ++n) sum += table[n];
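As a hedged illustration of how a schedule clause and a reduction clause combine on one loop, the sketch below sums a table. The function name `sum_table` and the chunk size of 3 are arbitrary choices for this example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Sums a table in parallel. Each thread accumulates a private partial sum,
// and OpenMP adds the partial sums together when the loop ends.
long sum_table(const std::vector<int>& table)
{
    // The loop below uses a signed int counter, so the size must fit in one.
    assert(table.size() <= static_cast<std::size_t>(INT_MAX));
    const int size = static_cast<int>(table.size());

    long sum = 0;
    #pragma omp parallel for schedule(dynamic, 3) reduction(+:sum)
    for (int n = 0; n < size; ++n)
        sum += table[n];

    return sum;
}
```

Without the reduction clause, all threads would update the same `sum` variable concurrently, which would be a data race.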

Sections

Sometimes it is handy to indicate that "this and this can run in parallel". The sections setting is just for that.

    #pragma omp sections
    {
      { Work1(); }
      #pragma omp section
      { Work2();
        Work3(); }
      #pragma omp section
      { Work4(); }
    }

This code indicates that any of the tasks Work1, Work2 + Work3 and Work4 may run in parallel, but that Work2 and Work3 must be run in sequence. Each work is done exactly once.

As usual, if the compiler ignores the pragmas, the result is still a correctly running program.

Internally, GCC implements this as a combination of the parallel for and a switch-case construct. Other compilers may implement it differently.

Note: #pragma omp sections only delegates the sections to different threads in the current team. To create a team, you need to specify the parallel keyword either in the surrounding context or in the pragma, as #pragma omp parallel sections.

Example:

    #pragma omp parallel sections // starts a new team
    {
      { Work1(); }
      #pragma omp section
      { Work2();
        Work3(); }
      #pragma omp section
      { Work4(); }
    }

or

    #pragma omp parallel // starts a new team
    {
      //Work0(); // this function would be run by all threads.

      #pragma omp sections // divides the team into sections
      {
        // everything herein is run only once.
        { Work1(); }
        #pragma omp section
        { Work2();
          Work3(); }
        #pragma omp section
        { Work4(); }
      }

      //Work5(); // this function would be run by all threads.
    }
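A small sketch that checks the "each work is done exactly once" property. The struct `SectionLog` and the function `run_sections` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical record of what each section did.
struct SectionLog { int first, second, third; };

SectionLog run_sections()
{
    SectionLog log = {0, 0, 0};

    #pragma omp parallel sections   // starts a new team
    {
        { log.first += 1; }
        #pragma omp section
        { log.second += 1;          // these two statements stay in sequence
          log.second *= 10; }
        #pragma omp section
        { log.third += 1; }
    }

    // Each section must have been executed exactly once.
    assert(log.first == 1 && log.second == 10 && log.third == 1);
    return log;
}
```

Each section writes to a different member of the struct, so the sections do not interfere with each other even when they run on different threads.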

The simd construct (OpenMP 4.0+)

OpenMP 4.0 added explicit SIMD parallelism (Single-Instruction, Multiple-Data). SIMD means that multiple calculations will be performed simultaneously by the processor, using special instructions that perform the same calculation on multiple values at once. This is often more efficient than regular instructions that operate on single data values. This is also sometimes called vector parallelism or vector operations (and is in fact the preferred term in OpenACC).

There are two use cases for the simd construct.

Firstly, #pragma omp simd can be used to declare that a loop will be utilizing SIMD.

    float a[8], b[8];
    ...
    #pragma omp simd
    for(int n=0; n<8; ++n) a[n] += b[n];

Secondly, #pragma omp declare simd can be used to indicate a function or procedure that is explicitly designed to take advantage of SIMD parallelism. The compiler may create multiple versions of the same function that use different parameter passing conventions for different CPU capabilities for SIMD processing.

    #pragma omp declare simd aligned(a,b:16)
    void add_arrays(float *__restrict__ a, float *__restrict__ b)
    {
      #pragma omp simd aligned(a,b:16)
      for(int n=0; n<8; ++n) a[n] += b[n];
    }

Without the pragma, the function will use the default non-SIMD-aware ABI, even though the function itself may do calculation using SIMD.
Since compilers of today attempt to do SIMD regardless of OpenMP simd directives, the simd directive can be thought of essentially as a directive to the compiler, saying: “Try harder”.

The collapse clause

The collapse clause can be added to bind the SIMDness into multiple nested loops. The example code below will direct the compiler into attempting to generate instructions that calculate 16 values simultaneously, if at all possible.

    #pragma omp simd collapse(2)
    for(int i=0; i<4; ++i)
      for(int j=0; j<4; ++j)
        a[j*4+i] += b[i*4+j];

The reduction clause

The reduction clause can be used with SIMD just like with parallel loops.

    int sum=0;
    #pragma omp simd reduction(+:sum)
    for(int n=0; n<1000; ++n) sum += table[n];

The aligned clause

The aligned attribute hints the compiler that each element listed is aligned to the given number of bytes. Use this attribute if you are sure that the alignment is guaranteed, and it will increase the performance of the code and make it shorter.

The attribute can be used in both the function declaration, and in the individual SIMD statements.

    #pragma omp declare simd aligned(a,b:16)
    void add_arrays(float *__restrict__ a, float *__restrict__ b)
    {
      #pragma omp simd aligned(a,b:16)
      for(int n=0; n<8; ++n) a[n] += b[n];
    }

The safelen clause

While the restrict keyword in C tells the compiler that it can assume that two pointers will not address the same data (and thus it is safe to change the ordering of reads and writes), the safelen clause in OpenMP provides much more fine-grained control over pointer aliasing.

In the example code below, the compiler is informed that a[x] and b[y] are independent as long as the difference between x and y is smaller than 4. In reality, the clause controls the upper limit of concurrent loop iterations. It means that only 4 items can be processed concurrently at most. The actual concurrency may be smaller, and depends on the compiler implementation and hardware limits.

    #pragma omp declare simd
    void add_arrays(float* a, float* b)
    {
      #pragma omp simd aligned(a,b:16) safelen(4)
      for(int n=0; n<8; ++n) a[n] += b[n];
    }

The simdlen clause (OpenMP 4.5+)

The simdlen clause can be added to a declare simd construct to limit how many elements of an array are passed in SIMD registers instead of using the normal parameter passing convention.

The uniform clause

The uniform clause declares one or more arguments to have an invariant value for all concurrent invocations of the function in the execution of a single SIMD loop.

The linear clause (OpenMP 4.5+)

The linear clause is similar to the firstprivate clause discussed later in this article.

Consider this example code:

    #include <stdio.h>

    int b = 10;
    int main()
    {
      int array[8];
      #pragma omp simd linear(b:2)
      for(int n=0; n<8; ++n) array[n] = b;
      for(int n=0; n<8; ++n) printf("%d\n", array[n]);
    }

What does this code print? If we ignore the SIMD constructs, we can see it should print the sequence 10,10,10,10,10,10,10,10.

But, if we enable the OpenMP SIMD construct, the program should now print 10,12,14,16,18,20,22,24. This is because the linear clause tells the compiler that the value of b inside each iteration of the loop should be a copy of the original value of b before the SIMD construct, plus the loop iteration number, times the linear scale, which is 2 in this case.

In essence, it should be equivalent to the following code:

    int b_original = b;
    for(int n=0; n<8; ++n) array[n] = b_original + n*2;

However, as of GCC version 6.1.0, the linear clause does not seem to be implemented correctly, at least according to my understanding of the specification, so I cannot do more experimentation.

The inbranch and notinbranch clauses

The inbranch clause specifies that the function will always be called from inside a conditional statement of a SIMD loop. The notinbranch clause specifies that the function will never be called from inside a conditional statement of a SIMD loop. The compiler may use this knowledge to optimize the code.
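To tie the declare simd, uniform and notinbranch clauses together, here is a minimal sketch. The function names `scale_and_offset` and `transform` are hypothetical, and whether the compiler actually emits vector instructions depends on the compiler and target.

```cpp
#include <cassert>

// Hypothetical element-wise function. declare simd asks the compiler to also
// generate a vectorized variant; `scale` is the same for every SIMD lane, and
// the function is never called from inside a conditional in the loop below.
#pragma omp declare simd uniform(scale) notinbranch
float scale_and_offset(float x, float scale)
{
    return x * scale + 1.0f;
}

// Applies scale_and_offset to every element of the array.
void transform(float* data, int count, float scale)
{
    assert(count >= 0);
    assert(data != nullptr || count == 0);

    #pragma omp simd
    for (int n = 0; n < count; ++n)
        data[n] = scale_and_offset(data[n], scale);
}
```

If the pragmas are ignored, the code is still an ordinary scalar loop with the same result.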

The for simd construct (OpenMP 4.0+)

The for and simd constructs can be combined, to divide the execution of a loop into multiple threads, and then execute those loop slices in parallel using SIMD.

    float sum(float* table)
    {
      float result=0;
      #pragma omp parallel for simd reduction(+:result)
      for(int n=0; n<1000; ++n) result += table[n];
      return result;
    }
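The same combination works for loops over more than one array. The sketch below is a dot product; the function name `dot` and its signature are assumptions made for this example.

```cpp
#include <cassert>

// Dot product of two arrays. The loop is split among threads, and each
// thread's slice may additionally be vectorized with SIMD instructions.
float dot(const float* a, const float* b, int count)
{
    assert(count >= 0);

    float result = 0;
    #pragma omp parallel for simd reduction(+:result)
    for (int n = 0; n < count; ++n)
        result += a[n] * b[n];

    return result;
}
```

Note that a floating point reduction may add the terms in a different order than a sequential loop would, so results can differ slightly for inputs that are not exactly representable.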

The task construct (OpenMP 3.0+)

When for and sections are too cumbersome, the task construct can be used. This is only supported in OpenMP 3.0 and later.

These examples are from the OpenMP 3.0 manual:

    struct node { node *left, *right; };
    extern void process(node* );

    void traverse(node* p)
    {
      if (p->left)
        #pragma omp task // p is firstprivate by default
        traverse(p->left);
      if (p->right)
        #pragma omp task // p is firstprivate by default
        traverse(p->right);
      process(p);
    }

In the next example, we force a postorder traversal of the tree by adding a taskwait directive. Now, we can safely assume that the left and right sons have been executed before we process the current node.

    struct node { node *left, *right; };
    extern void process(node* );

    void postorder_traverse(node* p)
    {
      if (p->left)
        #pragma omp task // p is firstprivate by default
        postorder_traverse(p->left);
      if (p->right)
        #pragma omp task // p is firstprivate by default
        postorder_traverse(p->right);
      #pragma omp taskwait
      process(p);
    }

The following example demonstrates how to use the task construct to process elements of a linked list in parallel. The pointer p is firstprivate by default on the task construct so it is not necessary to specify it in a firstprivate clause.

    struct node { int data; node* next; };
    extern void process(node* );

    void increment_list_items(node* head)
    {
      #pragma omp parallel
      {
        #pragma omp single
        {
          for(node* p = head; p; p = p->next)
          {
            #pragma omp task
            process(p); // p is firstprivate by default
          }
        }
      }
    }
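The examples above leave process() undefined. As a self-contained variation, the sketch below sums the values in a binary tree using tasks. The names `Node`, `tree_sum` and `parallel_tree_sum` are hypothetical and not part of the OpenMP manual examples.

```cpp
#include <cassert>

struct Node { int value; Node* left; Node* right; };

// Sums a binary tree; each subtree may be handled as a separate task.
int tree_sum(Node* p)
{
    if (!p) return 0;
    assert(p->left != p && p->right != p);   // a node must not be its own child

    int left_sum = 0, right_sum = 0;

    #pragma omp task shared(left_sum)    // p is firstprivate by default
    left_sum = tree_sum(p->left);

    #pragma omp task shared(right_sum)   // the results must be shared to be visible here
    right_sum = tree_sum(p->right);

    #pragma omp taskwait                 // wait for both child tasks to finish
    return p->value + left_sum + right_sum;
}

// Creates the team and lets a single thread start the recursion;
// the other threads of the team pick up the generated tasks.
int parallel_tree_sum(Node* root)
{
    int total = 0;
    #pragma omp parallel
    {
        #pragma omp single
        total = tree_sum(root);
    }
    return total;
}
```

The shared clauses matter here: local variables are firstprivate by default on a task, so without them each task would write to its own copy and the sums would be lost.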

All OpenMP constructs in C and C++ are indicated with a #pragma omp followed by parameters, ending in a newline. The pragma usually applies only to the statement immediately following it, except for the barrier and flush commands, which do not have associated statements.