Listed below are some useful tips on using the 21g cluster. This is a compilation of tips from current users of the cluster, and your mileage may vary. 21g hardware details are listed here. Should you have questions about 21g, please use the following support web page.
Posted 3/22/2022
Running the Grid Benchmark_ITT code on a single MI50 we get 530 GFlop/s at the "comparison point", slightly better performance than a P100. Running the same test on a single RTX 2080 card, we get 634.5 GFlop/s.
Posted 11/23/2021
To launch (for example) 2 processes, each of which uses 4 GPUs, the following environment variables were important:
CUDA_VISIBLE_DEVICES either has to be unset or has to be set to 0,1,2,3,4,5,6,7 (all eight devices).
ROCR_VISIBLE_DEVICES dictates which GPUs each process uses. (See the next post for more details on this.)
HIP_VISIBLE_DEVICES does not appear to make any difference.
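As a concrete sketch of the two-process layout above: the following runnable stand-in uses a shell `echo` in place of the real application binary (an assumption for illustration; in an actual job each launch line would invoke your own program):

```shell
#!/bin/sh
# Launch two background processes, each restricted to 4 of the node's 8 GPUs.
# ROCR_VISIBLE_DEVICES is set per process; CUDA_VISIBLE_DEVICES is left unset.
unset CUDA_VISIBLE_DEVICES
ROCR_VISIBLE_DEVICES=0,1,2,3 sh -c 'echo "process A sees GPUs $ROCR_VISIBLE_DEVICES"' &
ROCR_VISIBLE_DEVICES=4,5,6,7 sh -c 'echo "process B sees GPUs $ROCR_VISIBLE_DEVICES"' &
wait
```

Because the variable is set on the command line of each launch, it affects only that process, so the two processes see disjoint sets of GPUs.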
Posted 11/12/2021
Question: I need to launch a number of single-GPU jobs on 21g. Is there any way to run multiple instances of those single-GPU jobs on a single node?
Answer: There is no way to reserve just a single GPU on 21g. Instead, launch 8 separate programs (without srun), with each run configured to "see" a different GPU. That can be accomplished by setting ROCR_VISIBLE_DEVICES appropriately for each run, as in the example below:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -p 21g
export OMP_NUM_THREADS=16
# Start one background instance per GPU; each instance sees only its own device.
ROCR_VISIBLE_DEVICES=0 ./mybinary &
ROCR_VISIBLE_DEVICES=1 ./mybinary &
ROCR_VISIBLE_DEVICES=2 ./mybinary &
ROCR_VISIBLE_DEVICES=3 ./mybinary &
ROCR_VISIBLE_DEVICES=4 ./mybinary &
ROCR_VISIBLE_DEVICES=5 ./mybinary &
ROCR_VISIBLE_DEVICES=6 ./mybinary &
ROCR_VISIBLE_DEVICES=7 ./mybinary &
wait  # do not let the batch script exit before all 8 instances finish
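The eight explicit launch lines above can equivalently be written as a loop. The sketch below substitutes a shell `echo` for `./mybinary` (an assumption, so it can be run anywhere); in the real job script the loop body would invoke your own program:

```shell
#!/bin/sh
# One background instance per GPU; each instance is pinned to one device.
for gpu in 0 1 2 3 4 5 6 7; do
  ROCR_VISIBLE_DEVICES=$gpu sh -c 'echo "instance pinned to GPU $ROCR_VISIBLE_DEVICES"' &
done
wait  # block until all 8 background instances finish
```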
Posted 8/25/2021
Here is one simple way to compile a kernel for the MI100 on 21g: make sure to use --amdgpu-target=gfx906,gfx908, which plays a role similar to CUDA's sm_XX architecture flags. gfx908 targets the MI100; gfx906 targets the MI50.
[@qcdi2001]$> module load rocm
[@qcdi2001]$> hipcc --amdgpu-target=gfx906,gfx908 -o helloWorld helloWorld.cpp
Compile flags for hipcc can be obtained by executing hipconfig --cpp_config
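The helloWorld.cpp used in the command above is not shown in this post; a minimal HIP source that such a command could compile might look like the following sketch (the kernel name and contents are illustrative assumptions, not the original file, and running it of course requires a ROCm system with an AMD GPU):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Trivial kernel: each thread writes its global index into the output array.
__global__ void fillIds(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 8;
    int *d = nullptr;
    int h[n];
    hipMalloc(&d, n * sizeof(int));
    // Launch 1 block of n threads.
    hipLaunchKernelGGL(fillIds, dim3(1), dim3(n), 0, 0, d, n);
    hipMemcpy(h, d, n * sizeof(int), hipMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h[i]);
    printf("\n");
    hipFree(d);
    return 0;
}
```

Compiled with the hipcc command shown above, the resulting binary runs on either GPU type because both gfx906 and gfx908 code objects are embedded.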
If you need support or have questions please use the following support web page.