CSC 5001– High Performance Systems

Portail informatique

MPI - Lab

Hello world

The aim of this exercise is to discover the MPI compilation and deployment chain.

Write the MPI program hello.c. This program prints "Hello World from task <rank> / <nb tasks> on <machine>". To do that:
  • Initialize MPI
  • Get the MPI rank of the current process
  • Get the number of MPI ranks
  • Get the name of the machine running the current process (see man gethostname)
  • Finalize MPI

  • Compile it with mpicc and run it with mpirun on one or several machines
  • Reminder on mpirun:
    • -np <X> specifies the number of MPI ranks
    • -f <hostfile> specifies the file listing the machines that run the program
You should get something like this:
$ mpirun -f hosts -np 4 ./hello1
Hello World from task 2 / 4 on b313-03
Hello World from task 3 / 4 on b313-04
Hello World from task 1 / 4 on b313-02
Hello World from task 0 / 4 on b313-01

Modify hello.c so that messages are sorted (i.e. rank 0 prints first, then rank 1, etc.)

To do this, each rank N waits until rank N-1 prints message N-1, then it prints message N, and then notifies process N+1.

You should get something like this:

$ mpirun -f hosts -np 4 ./hello2
Hello World from task 0 / 4 on b313-01
Hello World from task 1 / 4 on b313-02
Hello World from task 2 / 4 on b313-03
Hello World from task 3 / 4 on b313-04

Parallelizing an application

The stencil application computes a 2D diffusion (a kind of heat equation). A 2D matrix contains values (e.g. the temperature of a point in space), and each iteration computes a 5-point stencil: for each point (i,j), we compute

Vk+1(i,j) = Vk(i-1, j) + Vk(i, j-1) + Vk(i, j) + Vk(i+1, j) + Vk(i, j+1)

The program generates a random number of "hot points", computes several iterations, and writes the result in result.dat. The result can be visualized with plot.gp (you may need to install GNUplot):

$ mpirun -np 1 ./stencil_mpi
Initialization (problem size: 100)
Start computing (50 steps)
STEP 0...
STEP 1...
STEP 2...
[...]
STEP 49...
50 steps in 0.006758 sec (0.000135 sec/step)
dumping the result data in result.dat
$ ./plot.gp
$ evince result.pdf

The goal of this exercise is to parallelize this application with MPI. To simplify, each MPI rank processes an N x N matrix, and computes values from Vk+1(1,1) to Vk+1(N-2,N-2) based on Vk(0,0) to Vk(N-1,N-1).

Modify stencil_mpi.c to add MPI communications:
  • After each iteration, the MPI process r sends line 1 of cur_step to rank r-1, and line N-2 of cur_step to rank r+1
  • After each iteration, the MPI process r receives the line sent by r-1 and stores it in next_step[0], and receives the line sent by r+1 and stores it in next_step[N-1]
  • At the end of the program, each rank sends its matrix to rank 0 so that rank 0 can write the whole result in a file

Non-blocking communication

Instead of computing all the points of the matrix and then sending some lines to the neighbours, we want to use non-blocking communication in order to "hide" the cost of communication: we first compute the lines to be sent, start the communication, and compute the remaining lines, before waiting for the end of the communication.

Modify your program to implement this algorithm.