Listed below are some useful tips on using the 21g cluster. This is a compilation of tips from current users of the cluster, and your mileage may vary. 21g hardware details are listed here. Should you have questions about 21g, please use the following support web page.
Posted 3/22/2022
Running the Grid Benchmark_ITT code on a single MI50 we get 530 GFlop/s at the "comparison point", slightly better performance than a P100. Running the same test on a single RTX 2080 card, we get 634.5 GFlop/s.
Posted 11/23/2021
To launch (for example) 2 processes, each of which uses 4 GPUs, the following environment variables were important:
CUDA_VISIBLE_DEVICES either has to be unset or has to be set to 0,1,2,3,4,5,6,7 (all eight devices).
ROCR_VISIBLE_DEVICES dictates which GPUs each process uses. (See the next post for more details on this.)
HIP_VISIBLE_DEVICES does not appear to make any difference.
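As a concrete sketch of the two-process layout above: the following runnable stand-in uses a shell `echo` in place of the real application binary (an assumption for illustration; in an actual job each launch line would invoke your own program):

```shell
#!/bin/sh
# Launch two background processes, each restricted to 4 of the node's 8 GPUs.
# ROCR_VISIBLE_DEVICES is set per process; CUDA_VISIBLE_DEVICES is left unset.
unset CUDA_VISIBLE_DEVICES
ROCR_VISIBLE_DEVICES=0,1,2,3 sh -c 'echo "process A sees GPUs $ROCR_VISIBLE_DEVICES"' &
ROCR_VISIBLE_DEVICES=4,5,6,7 sh -c 'echo "process B sees GPUs $ROCR_VISIBLE_DEVICES"' &
wait
```

Because the variable is set on the command line of each launch, it affects only that process, so the two processes see disjoint sets of GPUs.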
Posted 11/12/2021
Question: I need to launch a number of single-GPU jobs on 21g. Is there any way to run multiple instances of those single-GPU jobs on a single node?
Answer: There is no way to reserve just a single GPU on 21g. Instead, launch 8 separate programs (without srun), with each run configured to "see" a different GPU. That can be accomplished by setting ROCR_VISIBLE_DEVICES appropriately for each run, as in the example below:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -p 21g
export OMP_NUM_THREADS=16
# Start one background instance per GPU; each instance sees only its own device.
ROCR_VISIBLE_DEVICES=0 ./mybinary &
ROCR_VISIBLE_DEVICES=1 ./mybinary &
ROCR_VISIBLE_DEVICES=2 ./mybinary &
ROCR_VISIBLE_DEVICES=3 ./mybinary &
ROCR_VISIBLE_DEVICES=4 ./mybinary &
ROCR_VISIBLE_DEVICES=5 ./mybinary &
ROCR_VISIBLE_DEVICES=6 ./mybinary &
ROCR_VISIBLE_DEVICES=7 ./mybinary &
wait  # do not let the batch script exit before all 8 instances finish
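The eight explicit launch lines above can equivalently be written as a loop. The sketch below substitutes a shell `echo` for `./mybinary` (an assumption, so it can be run anywhere); in the real job script the loop body would invoke your own program:

```shell
#!/bin/sh
# One background instance per GPU; each instance is pinned to one device.
for gpu in 0 1 2 3 4 5 6 7; do
  ROCR_VISIBLE_DEVICES=$gpu sh -c 'echo "instance pinned to GPU $ROCR_VISIBLE_DEVICES"' &
done
wait  # block until all 8 background instances finish
```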
Posted 8/25/2021
Here is one simple way to compile a kernel for the MI100 on 21g: make sure to use --amdgpu-target=gfx906,gfx908, which plays a role similar to CUDA's sm_XX architecture flags. gfx908 targets the MI100; gfx906 targets the MI50.
[@qcdi2001]$> module load rocm
[@qcdi2001]$> hipcc --amdgpu-target=gfx906,gfx908 -o helloWorld helloWorld.cpp
Compile flags for hipcc can be obtained by executing hipconfig --cpp_config
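The helloWorld.cpp used in the command above is not shown in this post; a minimal HIP source that such a command could compile might look like the following sketch (the kernel name and contents are illustrative assumptions, not the original file, and running it of course requires a ROCm system with an AMD GPU):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Trivial kernel: each thread writes its global index into the output array.
__global__ void fillIds(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 8;
    int *d = nullptr;
    int h[n];
    hipMalloc(&d, n * sizeof(int));
    // Launch 1 block of n threads.
    hipLaunchKernelGGL(fillIds, dim3(1), dim3(n), 0, 0, d, n);
    hipMemcpy(h, d, n * sizeof(int), hipMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h[i]);
    printf("\n");
    hipFree(d);
    return 0;
}
```

Compiled with the hipcc command shown above, the resulting binary runs on either GPU type because both gfx906 and gfx908 code objects are embedded.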
If you need support or have questions please use the following support web page.