GPU Clusters


GPUs provide enormous floating-point capacity and memory bandwidth, and can yield up to 10 times as much performance (science) per dollar, but they require specialized libraries or specialized programming techniques.

Programming model
The user's application runs on the host and invokes kernels that execute on the GPUs. These kernels are typically written in NVIDIA's CUDA language, a C-like language that makes the GPU appear to behave like thousands of cores running in parallel, each working on one piece of data (data parallelism). Key LQCD kernels ("level 3 routines") have been written in CUDA and wrapped with C and C++ interfaces into a single data-parallel library that is easily used by community LQCD codes. Parallel jobs typically use one MPI process per GPU, although other approaches are also used.
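The one-MPI-process-per-GPU pattern can be sketched as follows. This is an illustrative example, not code from the LQCD library; the round-robin rank-to-GPU mapping assumes MPI ranks are packed by node, and real launchers often expose a local-rank environment variable that should be preferred.

```cuda
// Sketch: binding one MPI rank to one GPU (illustrative assumptions noted above).
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ndev = 0;
    cudaGetDeviceCount(&ndev);      // GPUs visible on this node
    cudaSetDevice(rank % ndev);     // simple round-robin rank -> GPU mapping

    int dev;
    cudaGetDevice(&dev);
    printf("rank %d using GPU %d of %d\n", rank, dev, ndev);

    MPI_Finalize();
    return 0;
}
```

Each rank then runs its kernels on its assigned device, with MPI handling communication between processes.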

Considerable information is available online for writing custom CUDA routines, and then linking them into C or C++ applications.
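As a minimal sketch of what such a custom routine looks like (illustrative only; the function names are hypothetical, not part of the LQCD library), each GPU thread handles one array element, and an `extern "C"` wrapper lets a C application call it:

```cuda
// Sketch: a data-parallel CUDA kernel with a C-callable wrapper.
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) x[i] *= a;
}

extern "C" void scale_on_gpu(float *host_x, float a, int n) {
    float *dev_x;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dev_x, bytes);
    cudaMemcpy(dev_x, host_x, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    scale<<<blocks, threads>>>(dev_x, a, n);

    cudaMemcpy(host_x, dev_x, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_x);
}
```

The file is compiled with nvcc and the resulting object linked into the C or C++ application like any other library routine.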

Developer Tools

  • GCC
  • OpenMPI
  • CUDA

It is recommended that gcc 4.6.3 or later be used, since it supports the generation of AVX instructions for the host code.

Setting up the Environment

The following instructions are for the bash shell; tcsh users may need to adapt them.

To set up gcc-4.6.3 (if needed) and the other tools, add the following to your .bashrc or job script:

module load gcc-4.6.3
module load mvapich2-1.8
module load openmpi-1.6.3
module load cuda