OpenMP Parallel For

The parallel directive #pragma omp parallel makes the code parallel, that is, it forks the master thread into a team of parallel threads, but each thread then executes the entire block; it doesn't actually share out the work.
What we are really after is the parallel for directive, which is called a work-sharing construct. Consider

#include <iostream>
#include <omp.h>

using namespace std;

int main(void)
{
    int i;
    int foo;
#pragma omp parallel for
    for (i = 1; i < 10; i++) {
#pragma omp critical
        {
            foo = omp_get_thread_num();
            cout << "Loop number: " << i << " " << "Thread number: " << foo << endl;
        }
    }
    return 0;
}

The for directive applies to the for loop immediately following it. Notice that, in contrast to before, we don't have to outline a parallel region with curly braces {}; the directive attaches to the single loop that follows. This program yields:

[michael@michael lindonslog]$ ./openmp 
Loop number: 1 Thread number: 0
Loop number: 8 Thread number: 3
Loop number: 2 Thread number: 0
Loop number: 3 Thread number: 0
Loop number: 9 Thread number: 3
Loop number: 6 Thread number: 2
Loop number: 4 Thread number: 1
Loop number: 7 Thread number: 2
Loop number: 5 Thread number: 1
[michael@michael lindonslog]$ 

Notice that the iterations do not print in order: they are divided among the threads, which run concurrently. By default, the loop index, i.e. "i" in this context, is made private by the for directive.

At the end of the parallel for loop there is an implicit barrier, where all threads wait until every thread has finished. There are, however, some rules for the parallel for directive:

  1. The loop index, i, must be incremented by a fixed amount each iteration, e.g. i++ or i+=step.
  2. The start and end values must not change during the loop.
  3. There must be no "break" statements that jump out of the loop. Function calls are, however, permitted and run as you would expect.
  4. The comparison operator may be <, <=, >= or >.

There may be times when you want to perform some operation in the order of the iterations. This can be achieved with the ordered clause on the parallel for directive together with an ordered directive inside the loop. Each thread waits until the previous iteration has finished its ordered section before proceeding with its own.

#include <stdio.h>
#include <omp.h>

int expensive_function(int i);   /* some costly computation */

int main(void)
{
    int i, a[10];
#pragma omp parallel for ordered
    for (i = 0; i < 10; i++) {
        a[i] = expensive_function(i);
#pragma omp ordered
        printf("Thread ID: %d    Hello World %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

This will now print out the Hello Worlds in order. N.B. there is a penalty for this: each thread has to wait until the preceding iteration has finished with its ordered section of code. It is only worthwhile if expensive_function() really is expensive, so that the threads spend most of their time outside the ordered section.