GPUs and CUDA 


Overview of GPUs and CUDA

The HEC currently supports 18 special-purpose compute nodes, each hosting two Tesla M2075 GPUs (Graphics Processing Units). Specially written codes can use these devices by offloading the computationally intensive parts of their workload onto the highly parallel GPUs. An increasing number of third-party applications, particularly scientific codes, offer GPU functionality. Users can also write their own GPU-parallel codes using the CUDA API (Application Programming Interface) or one of several CUDA-enabled libraries for tasks such as linear algebra and FFTs.

Submitting CUDA/GPU jobs

Important Note: The job template below does not include the '#BSUB -L /bin/bash' directive, as this interferes with GPU card setup procedures and will cause your job to fail. Be aware that without this directive your jobs will inherit the environment of your current login shell, rather than starting with a clean shell environment.

GPU-enabled nodes are currently available through the gpuall queue. Note: the queue has been configured with a lower job priority value, and jobs are limited to 24 hours of run time. This configuration allows the group contributing to the GPU nodes to meet their project workload deadlines while still allowing spare resources to be used by others; at some points in the year the GPUs may be busy with higher priority jobs from this group.

The following example will run the GPU-enabled nbody benchmark which comes installed with the CUDA environment:


#BSUB -q gpuall
#BSUB -a gpuexcl_p
#BSUB -J nbody

#BSUB -oo nbody.out
#BSUB -eo nbody.err

. /etc/profile
module add cuda

nbody -benchmark -n=100000

Note the BSUB directives. The -q gpuall directive ensures that the job is submitted to a queue containing only GPU-enabled nodes; CUDA jobs submitted to non-GPU nodes will fail at runtime. The -a gpuexcl_p directive ensures that a single GPU card is reserved and made available to the job's environment.

The nbody benchmark can be run across a number of cards on the same host, as specified by the -numdevices option. Here is a job template for a 2-card job:


#BSUB -q gpuall
#BSUB -a gpuexcl2_p
#BSUB -J nbody

#BSUB -oo nbody.out
#BSUB -eo nbody.err

. /etc/profile
module add cuda

nbody -benchmark -n=100000 -numdevices=2

The BSUB directive -a gpuexcl2_p ensures that two GPU cards will be reserved and available to the job's environment. Note that different applications will have different mechanisms for using more than one card. Some applications will have additional arguments, such as the -numdevices= argument for nbody, while others may simply use all the cards they detect. Please refer to your application's own documentation for details.
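The relationship between the number of cards and the -a directive can be captured in a small helper. The following is a minimal sketch, not part of the HEC tooling: the function name make_nbody_job and the output file names are hypothetical, and it assumes that nbody accepts -numdevices=1 for the single-card case.

```shell
# Sketch: write an nbody job script for one or two GPU cards.
# make_nbody_job and the output file name are hypothetical helpers.
make_nbody_job() {   # usage: make_nbody_job NUM_CARDS OUTPUT_FILE
    cards="$1"
    out="$2"

    # Pick the BSUB -a value matching the requested number of cards.
    case "$cards" in
        1) app="gpuexcl_p" ;;
        2) app="gpuexcl2_p" ;;
        *) echo "cards must be 1 or 2 (each node hosts two GPUs)" >&2
           return 1 ;;
    esac

    # Same layout as the templates above; no '#BSUB -L /bin/bash' line.
    cat > "$out" <<EOF
#BSUB -q gpuall
#BSUB -a $app
#BSUB -J nbody

#BSUB -oo nbody.out
#BSUB -eo nbody.err

. /etc/profile
module add cuda

nbody -benchmark -n=100000 -numdevices=$cards
EOF
}

# Example: generate the 2-card job script.
make_nbody_job 2 nbody2.job
```

On LSF systems a script containing #BSUB directives is typically submitted on standard input, for example bsub < nbody2.job, but check the general job submission guide for the exact procedure on the HEC.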

Programming with CUDA

CUDA tools and libraries are available via the cuda module:


module add cuda

The CUDA compiler is nvcc, which handles the CUDA extensions to C needed to compile CUDA kernels. CUDA codes can be compiled on the login node elysium once the cuda module is loaded.
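As an illustration of the compile step, the sketch below writes a very small CUDA program to a file and compiles it with nvcc. The file name scale.cu and the kernel name scale are hypothetical, and error checking of the CUDA calls is omitted for brevity. The script only attempts the compile when nvcc is on the path, that is, after the cuda module has been loaded.

```shell
# Sketch: create a minimal CUDA source file and compile it with nvcc.
# The names scale.cu and scale are hypothetical.
cat > scale.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread doubles one element of the array.
__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main()
{
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Copy the data to the GPU, run the kernel, and copy the result back.
    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, n);   // enough 256-thread blocks to cover n

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
EOF

# Compile only where nvcc is available (after 'module add cuda').
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o scale scale.cu
else
    echo "nvcc not found: load the cuda module first"
fi
```

The resulting executable needs a GPU at runtime, so it should be run through a job submitted to the gpuall queue rather than directly on the login node.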

The cuda module also makes a number of CUDA-enabled libraries available via the $CUDA_LIBDIR directory. These include cublas for CUDA-enabled BLAS routines and cufft for CUDA-enabled FFT functions. See below for details on the libraries.

The directory containing the header (include) files is available as $CUDA_INCDIR.
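The two variables are most useful when a host compiler rather than nvcc performs the build, since nvcc normally locates its own toolkit headers and libraries. The sketch below assembles such a command line; the function name cuda_link_cmd and the source file names are hypothetical, and the use of gcc as the host compiler is an assumption. It links the CUDA runtime library (cudart) alongside whichever CUDA libraries are requested.

```shell
# Sketch: build a gcc command line that uses the variables set by
# 'module add cuda'. cuda_link_cmd is a hypothetical helper.
cuda_link_cmd() {   # usage: cuda_link_cmd SOURCE OUTPUT LIBRARY...
    src="$1"
    out="$2"
    shift 2

    # Turn each library name (e.g. cublas, cufft) into a -l option.
    libs=""
    for lib in "$@"; do
        libs="$libs -l$lib"
    done

    # Headers come from CUDA_INCDIR, libraries from CUDA_LIBDIR.
    echo "gcc -I${CUDA_INCDIR} -o $out $src -L${CUDA_LIBDIR}$libs -lcudart"
}

# Example (prints the command; fft_host.c is a hypothetical source file):
#   cuda_link_cmd fft_host.c fft_host cufft
```

Printing the command instead of running it makes it easy to inspect the paths before building. For a CUDA source file compiled directly with nvcc, adding -lcublas or -lcufft to the nvcc command line is usually sufficient.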

Further Reading