Using OpenMP on the HPC
What is OpenMP?
OpenMP is a set of compiler directives, library routines and environment variables that can be used to specify shared-memory parallelism in C, C++ and Fortran codes.
The OpenMP compiler directives can be used to indicate to the compiler sections of code that can be readily parallelised, which allows a programmer to highlight the core sections of code that can benefit from parallelism. When compiled with a special compiler flag, the compiler will create a multi-threaded version of the application that can automatically distribute the highlighted parallel sections of code to different CPUs on the same node. On the current HPC, this offers up to four-way parallelism.
It should be stressed that not all codes will benefit from such attempts at parallelism, and not all sections of code can be parallelised. Users should test serial and parallel versions of their code to ensure that the parallel version is making good use of the additional CPUs, as OpenMP jobs will automatically reserve all 4 job slots on a node.
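One way to carry out the comparison suggested above is to time the same program at different thread counts and compute the speedup. The following is a minimal sketch rather than a site-provided tool: the helper names wall_seconds, speedup and efficiency are hypothetical, and it assumes the program reads its thread count from the environment variables described under Submitting OpenMP jobs.

```shell
# Hypothetical helpers for comparing serial and parallel runs.

# Run a command with the given thread count and print the elapsed wall
# time in whole seconds. Both thread-count variables from this guide are set.
wall_seconds() {
    ws_threads=$1
    shift
    ws_start=$(date +%s)
    OMP_NUM_THREADS=$ws_threads NCPUS=$ws_threads "$@" > /dev/null
    ws_end=$(date +%s)
    echo $(( ws_end - ws_start ))
}

# Speedup = serial time / parallel time. Fails if the parallel time is
# not positive (e.g. a run too short to measure in whole seconds).
speedup() {
    awk -v s="$1" -v p="$2" \
        'BEGIN { if (p <= 0) exit 1; printf "%.2f\n", s / p }'
}

# Parallel efficiency = speedup / number of threads (1.00 is ideal).
efficiency() {
    awk -v s="$1" -v p="$2" -v n="$3" \
        'BEGIN { if (p <= 0 || n <= 0) exit 1; printf "%.2f\n", s / (p * n) }'
}

# Example use (assumes the executable is in the current directory):
#   t1=$(wall_seconds 1 ./my_program < my_program.input)
#   t4=$(wall_seconds 4 ./my_program < my_program.input)
#   speedup "$t1" "$t4"
#   efficiency "$t1" "$t4" 4
```

For instance, a serial run of 120 seconds and a 4-thread run of 40 seconds give a speedup of 3.00 and an efficiency of 0.75. An efficiency well below 1 suggests that the code is not making good use of the node it reserves.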
Programming and Compiling with OpenMP
A detailed explanation of the OpenMP compiler directives can be found in the guides for both the PGI and Intel compiler suites (see the Further Reading section).
To compile OpenMP code using the PGI compiler, compile with your normal set of PGI compiler flags, and add the compiler argument -mp.
To compile OpenMP code using the Intel compiler, compile with your normal set of Intel compiler flags, and add the compiler argument -openmp.
Don't forget to make sure that the correct module for your preferred compiler suite has already been added to your environment.
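The two compiler flags above can be captured in a small helper for use in build scripts. This is only a sketch: the function name openmp_flag is hypothetical, and the commented compile lines assume the usual C driver names (pgcc for PGI, icc for Intel) and a source file called my_program.c, none of which are specified in this guide.

```shell
# Hypothetical helper: print the OpenMP flag for a compiler suite,
# using the flags given in this guide.
openmp_flag() {
    case "$1" in
        pgi)   echo "-mp" ;;
        intel) echo "-openmp" ;;   # newer Intel releases may spell this -qopenmp
        *)     echo "unknown compiler suite: $1" >&2
               return 1 ;;
    esac
}

# Example use (driver and file names are assumptions):
#   pgcc $(openmp_flag pgi)   -o my_program my_program.c
#   icc  $(openmp_flag intel) -o my_program my_program.c
```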
Submitting OpenMP jobs
The following job template will run a 4-processor version of the program my_program compiled with PGI compilers:
#$ -o $HOME/my_program_directory/my_program.stdout
#$ -e $HOME/my_program_directory/my_program.stderr
#$ -S /bin/bash
. /etc/profile
export OMP_NUM_THREADS=4
cd my_program_directory
time my_program < my_program.input
The following job template will run a 4-processor version of the program my_program compiled with Intel compilers:
#$ -o $HOME/my_program_directory/my_program.stdout
#$ -e $HOME/my_program_directory/my_program.stderr
#$ -S /bin/bash
. /etc/profile
module add intel
export NCPUS=4
cd my_program_directory
time my_program < my_program.input
The number of processors that OpenMP code should use is set by the NCPUS (for Intel) or OMP_NUM_THREADS (for PGI) environment variable. Programs compiled for OpenMP will automatically detect these environment variables and split themselves across the specified number of processors. Generally, a choice of 4 processors is best, corresponding to the maximum number of processors on a single execution node.
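Since a node offers at most 4 processors, a job script can guard against asking for more threads than that. The sketch below uses a hypothetical helper, clamp_threads, with the per-node limit of 4 taken from the text above. Setting both variables is an assumption made for convenience; OMP_NUM_THREADS is the variable defined by the OpenMP specification, so exporting it alongside NCPUS should be harmless.

```shell
# Hypothetical helper: cap a requested thread count at the number of
# processors per node (default 4, as stated in this guide).
clamp_threads() {
    ct_requested=$1
    ct_max=${2:-4}
    if [ "$ct_requested" -gt "$ct_max" ]; then
        echo "$ct_max"
    else
        echo "$ct_requested"
    fi
}

# Ask for 8 threads; the helper reduces this to the per-node limit.
threads=$(clamp_threads 8)
export OMP_NUM_THREADS="$threads"   # read by PGI-compiled programs
export NCPUS="$threads"             # read by Intel-compiled programs
```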
To submit an OpenMP job, use the following command:
qsub -pe smp 1 myjob.com
Note: The number of smp slots should always be 1, no matter what values are selected for NCPUS and OMP_NUM_THREADS. The HPC schedules all parallel jobs on a per-node basis: your parallel job will be assigned an execution node to itself, and no other jobs will run on that node for the duration.
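The two job templates differ only in the module line and the name of the thread-count variable, so they can be produced from a single generator. This is a sketch under stated assumptions: make_openmp_job is a hypothetical name, and the directory layout (a my_program_directory folder under $HOME) simply mirrors the templates above.

```shell
# Hypothetical generator: print a job script for the given compiler suite.
# Usage: make_openmp_job SUITE THREADS PROGRAM
make_openmp_job() {
    mj_suite=$1
    mj_threads=$2
    mj_prog=$3
    case "$mj_suite" in
        pgi)   mj_var=OMP_NUM_THREADS; mj_module="" ;;
        intel) mj_var=NCPUS;           mj_module="module add intel" ;;
        *)     echo "unknown compiler suite: $mj_suite" >&2
               return 1 ;;
    esac
    # \$ keeps the scheduler directives and $HOME literal in the output.
    cat <<EOF
#\$ -o \$HOME/${mj_prog}_directory/${mj_prog}.stdout
#\$ -e \$HOME/${mj_prog}_directory/${mj_prog}.stderr
#\$ -S /bin/bash
. /etc/profile
${mj_module}
export ${mj_var}=${mj_threads}
cd ${mj_prog}_directory
time ${mj_prog} < ${mj_prog}.input
EOF
}

# Example use, followed by the submission command from this guide:
#   make_openmp_job intel 4 my_program > myjob.com
#   qsub -pe smp 1 myjob.com
```

The number of smp slots in the qsub command stays at 1 regardless of the thread count passed to the generator.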
Further Reading
- The Official OpenMP home page
- Online Compiler Guides
  - For all PGI compilers, see the Users Guide for details on OpenMP
  - For Intel C/C++ and Fortran compilers, see the Optimisation Guides for details on OpenMP
©Lancaster University Computer User Agreement Privacy Statement