Using OpenMP on the HEC
What is OpenMP?
OpenMP is a set of compiler directives, library routines and environment variables that can be used to specify shared-memory parallelism in C, C++ and Fortran codes.
OpenMP compiler directives are inserted into source code to mark sections that can readily be parallelised, so the programmer can highlight the core sections of code that benefit from parallelism. When the code is compiled with the appropriate OpenMP compiler flag, the compiler creates a multi-threaded version of the application that automatically distributes the marked sections across different CPUs on the same node. On the current HEC, this offers up to eight-way parallelism.
It should be stressed that not all codes will benefit from such attempts at parallelism, and not all sections of code can be parallelised. Users should test serial and parallel versions of their code to ensure that the parallel version is making good use of the additional processors.
Programming and Compiling with OpenMP
A detailed explanation of the OpenMP compiler directives can be found in the guides for both the PGI and Intel compiler suites (see the Further Reading section).
To compile OpenMP code using the PGI compiler, compile with your normal set of PGI compiler flags, and add the compiler argument -mp.
To compile OpenMP code using the Intel compiler, compile with your normal set of Intel compiler flags, and add the compiler argument -openmp.
Don't forget to make sure that the correct module for your preferred compiler suite has already been added to your environment.
Submitting OpenMP batch jobs
The following job template will run an 8-processor version of the program my_program compiled with PGI compilers:
#BSUB -n 8
#BSUB -R 'span[ptile=8]'
#BSUB -o myjob.%J.out
#BSUB -e myjob.%J.err
#BSUB -J myjob
my_program < my_program.input
The additional scheduler directives above serve the same purpose as for MPI jobs: they specify the number of processes required and how they should be tiled (for OpenMP, all on a single node). Because the job requests all eight processors of a node, it also has that compute node to itself.
The number of processors that OpenMP code should use is set by the environment variable OMP_NUM_THREADS. Programs compiled for OpenMP automatically detect this environment variable and split themselves across the specified number of processors. Generally a choice of 8 processors is best, corresponding to the maximum number of processors on a single compute node.
To submit an OpenMP job, use the regular bsub command; the job script contains all the instructions required to inform the scheduler that this is a parallel job.
Further Reading
- The Official OpenMP home page
- Online Compiler Guides