CSC 5001 – High Performance Systems


OpenMP - Labs

Hello world

Write a sequential C program that prints "Hello world".
  • Compile and execute it (a minimal example follows this list).
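A minimal sequential starting point might look like this (the file name hello.c and the gcc command are assumptions, not imposed by the lab):

    /* hello.c - sequential version; compile with: gcc -o hello hello.c */
    #include <stdio.h>

    int main(void) {
        printf("Hello world\n");
        return 0;
    }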

Modify your program so that the message "Hello World" is printed by each thread taking part in the computation.
  • How many threads are launched by default?
  • Set the number of threads to launch to 4.
  • Include in the message the id of the thread actually doing the printing (a sketch of the result follows this list).
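One possible shape for the parallel version, assuming gcc with the -fopenmp option; the num_threads(4) clause is only one way to fix the team size:

    /* hello_omp.c - parallel version; compile with: gcc -fopenmp hello_omp.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* num_threads(4) fixes the team size to 4 threads;
           without it, the default (usually the number of cores) is used */
        #pragma omp parallel num_threads(4)
        {
            printf("Hello World from thread %d out of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }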

Check that your program remains compatible with its initial sequential version by compiling it without the OpenMP option. Modify your program to protect the OpenMP library calls if necessary.
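A sketch of one way to keep the sequential build working: the pragma is simply ignored when compiling without -fopenmp, and the _OPENMP macro guards the library calls:

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void) {
        #pragma omp parallel
        {
            int id = 0;                     /* sensible default for the sequential build */
    #ifdef _OPENMP
            id = omp_get_thread_num();      /* only available when compiled with -fopenmp */
    #endif
            printf("Hello World from thread %d\n", id);
        }
        return 0;
    }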

Variable scopes

Play with the private and firstprivate clauses:
  • Declare and initialize two integers.
  • In a parallel region with one private variable and one firstprivate variable, print the sum of the current thread id with each of those variables (a sketch follows this list).
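A minimal sketch (the variable names a and b and their initial values are arbitrary); reading the private copy before assigning it is done only to observe that it is not initialized:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int a = 10;  /* private: each thread gets an uninitialized copy */
        int b = 20;  /* firstprivate: each thread gets a copy initialized to 20 */

        #pragma omp parallel private(a) firstprivate(b)
        {
            int id = omp_get_thread_num();
            printf("thread %d: id + a = %d, id + b = %d\n", id, id + a, id + b);
        }
        return 0;
    }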

Play with the shared clause:
  • In the sequential region, declare and initialize a third variable.
  • In the parallel region, with this third variable declared as shared, print it and verify its value (see the sketch below).
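A possible continuation, assuming the third variable is called c:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int c = 30;  /* declared and initialized in the sequential region */

        #pragma omp parallel shared(c)
        {
            /* all threads see the same variable, so each prints the value 30 */
            printf("thread %d sees c = %d\n", omp_get_thread_num(), c);
        }
        return 0;
    }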

The for directive

Loop parallelization
  • Implement a sequential SAXPY program in C, where each cell of an array is updated by adding to it a scalar multiplied by the corresponding cell of a second array.
  • Parallelize it with OpenMP (a sketch follows this list).
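A minimal sketch, assuming the classic SAXPY form y[i] = a * x[i] + y[i] and arbitrary sizes and initial values:

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    int main(void) {
        float a = 2.0f;
        float *x = malloc(N * sizeof(float));
        float *y = malloc(N * sizeof(float));
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* iterations are independent, so the loop can be shared among threads */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);
        free(x); free(y);
        return 0;
    }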

Scheduling
  • Using a call to the omp_get_schedule function, determine the default scheduling policy on your system.
  • Play with the other available scheduling policies by printing the thread identifier and the loop iteration number, and observe their distribution (see the sketch after this list).
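A sketch of both steps; the dynamic,2 schedule is only an example to replace with the other policies (static, guided, auto):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_sched_t kind;
        int chunk;

        /* query the default runtime scheduling policy */
        omp_get_schedule(&kind, &chunk);
        printf("default schedule: kind = %d, chunk = %d\n", (int)kind, chunk);

        /* print which thread executes which iteration to observe the distribution */
        #pragma omp parallel for schedule(dynamic, 2)
        for (int i = 0; i < 16; i++)
            printf("iteration %2d executed by thread %d\n", i, omp_get_thread_num());

        return 0;
    }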

Synchronization
  • Duplicate the loop with a static scheduling and a chunk size fixed to array size / 2.
  • Print a message "After loop" after this second parallel section.
  • Set your OMP_NUM_THREADS environment variable to 4 and observe the scheduling with and without the nowait clause (see the sketch below).
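A possible structure, assuming an array size of 8 so that a chunk of size / 2 covers half of the iterations; run it with OMP_NUM_THREADS=4 ./a.out and compare the interleaving with and without nowait:

    #include <stdio.h>
    #include <omp.h>

    #define N 8

    int main(void) {
        #pragma omp parallel
        {
            /* chunks of N/2 iterations: with 4 threads only two of them get work */
            #pragma omp for schedule(static, N / 2) nowait
            for (int i = 0; i < N; i++)
                printf("first loop:  iteration %d on thread %d\n", i, omp_get_thread_num());

            /* without nowait above, every thread waits here before this loop starts */
            #pragma omp for schedule(static, N / 2)
            for (int i = 0; i < N; i++)
                printf("second loop: iteration %d on thread %d\n", i, omp_get_thread_num());
        }
        printf("After loop\n");
        return 0;
    }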

Reduction in a parallel loop
  • In a new loop, compute the sum of the array elements by accumulating them in a variable sum.
  • Parallelize it with OpenMP.
  • Compare the sequential and parallel execution times using the function double omp_get_wtime(void) (a sketch follows this list).
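A minimal sketch of the timed parallel reduction (the array contents are arbitrary; a sequential timing can be obtained by removing the pragma):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 10000000

    int main(void) {
        double *array = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) array[i] = 1.0;

        double start = omp_get_wtime();
        double sum = 0.0;
        /* each thread accumulates a private partial sum, combined at the end */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += array[i];
        double elapsed = omp_get_wtime() - start;

        printf("sum = %f, elapsed = %f s\n", sum, elapsed);
        free(array);
        return 0;
    }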

The critical directive

Even though it is inefficient, implement the same reduction of the elements of the same array with a critical directive, as sketched below.
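A sketch of the critical-section variant; every update is serialized, which is exactly what makes it slow:

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    int main(void) {
        double *array = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) array[i] = 1.0;

        double sum = 0.0;
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            /* only one thread at a time may execute this update */
            #pragma omp critical
            sum += array[i];
        }
        printf("sum = %f\n", sum);
        free(array);
        return 0;
    }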

Stencil - Homogeneous computation load

A stencil consists of passing a filter over the elements of a data structure in order to update their values according to their neighboring elements. In this exercise, the objective is to parallelize a 5-point stencil that updates each element of a matrix with the sum of 5 elements: the current element, the element to its left, the element to its right, the element above and the element below.

Starting from this sequential code, implement a parallel OpenMP version.
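A minimal sketch of the parallel update, assuming two N x N matrices in and out; the names, size and iteration count are assumptions, and the sequential code provided with the lab may be organized differently:

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1024
    #define STEPS 10

    int main(void) {
        double (*in)[N]  = malloc(sizeof(double[N][N]));
        double (*out)[N] = malloc(sizeof(double[N][N]));
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                in[i][j] = out[i][j] = 1.0;

        for (int s = 0; s < STEPS; s++) {
            /* within one step the rows are independent, so the outer loop is parallel */
            #pragma omp parallel for
            for (int i = 1; i < N - 1; i++)
                for (int j = 1; j < N - 1; j++)
                    out[i][j] = in[i][j] + in[i - 1][j] + in[i + 1][j]
                              + in[i][j - 1] + in[i][j + 1];
            /* swap the roles of the two matrices for the next step */
            double (*tmp)[N] = in; in = out; out = tmp;
        }
        printf("center = %f\n", in[N / 2][N / 2]);
        free(in); free(out);
        return 0;
    }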

Evaluate the performance obtained by varying the number of running threads and compare it with the performance of the sequential version.

Experiment with the collapse clause to refine the grain of your parallelization.
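With collapse, the loop nest of the sketch above could become the following (same update; the two loops are merged into one iteration space that the runtime distributes more finely):

    #define N 1024

    void stencil_step_collapse(double in[N][N], double out[N][N]) {
        /* collapse(2) merges the i and j loops into (N-2)*(N-2) iterations;
           the schedule clause then applies to this merged space */
        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                out[i][j] = in[i][j] + in[i - 1][j] + in[i + 1][j]
                          + in[i][j - 1] + in[i][j + 1];
    }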

What do you notice in terms of performance? Play with different scheduling strategies to achieve efficient parallelism.

In order to optimize the computational load and the access to cached data, modify the algorithm by creating computation tiles.
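One way to tile the computation, assuming a TILE x TILE block size that should be tuned to the cache:

    #define N 1024
    #define TILE 64   /* assumed block size; tune it experimentally */

    void stencil_step_tiled(double in[N][N], double out[N][N]) {
        /* each tile is processed entirely before moving on, so the data it
           touches stays in cache; tiles are distributed among the threads */
        #pragma omp parallel for collapse(2) schedule(static)
        for (int ii = 1; ii < N - 1; ii += TILE)
            for (int jj = 1; jj < N - 1; jj += TILE) {
                int imax = ii + TILE < N - 1 ? ii + TILE : N - 1;
                int jmax = jj + TILE < N - 1 ? jj + TILE : N - 1;
                for (int i = ii; i < imax; i++)
                    for (int j = jj; j < jmax; j++)
                        out[i][j] = in[i][j] + in[i - 1][j] + in[i + 1][j]
                                  + in[i][j - 1] + in[i][j + 1];
            }
    }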