Ensure that the shared libraries for OpenMP, MPI, C/C++, etc. are set up as described in the Compilers and Tools section.
Appropriate thread affinity and MPI process pinning are important for high performance. These factors are often intertwined. We first consider OpenMP thread affinity (as if we were running a non-MPI job) and then consider MPI.
There are two sets of environment variables controlling OpenMP: one set is Intel-specific, and the other is part of the OpenMP 4.0 standard. For further reference, please see the Fermilab pages on Intel OpenMP. Commonly useful variables are:
In particular, KMP_AFFINITY is a general option with several parts, which may be combined. Useful options are:
KMP_AFFINITY can be used in conjunction with KMP_PLACE_THREADS, for example to enable a compact ordering but with only 2 threads per core (bash shell):
# 64 cores and 128 threads so 2 threads per core
export KMP_PLACE_THREADS=1s,64c,2t
# compact ordering within the core, reporting thread bindings
export KMP_AFFINITY=verbose,compact,granularity=thread
# set number of threads
export OMP_NUM_THREADS=128
# run the job
./my_job.sh
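For comparison, a roughly similar placement can be sketched with the OpenMP 4.0 standard variables instead of the Intel-specific ones. This is a minimal sketch, not a tested recipe for this system: the shell variables `cores` and `threads_per_core` are hypothetical helpers introduced here, and the sketch assumes a node with at least 64 cores and a runtime that honours `OMP_PLACES`, `OMP_PROC_BIND` and `OMP_DISPLAY_ENV`.

```shell
#!/usr/bin/env bash
# Sketch only: assumes at least 64 cores and an OpenMP 4.0 runtime.
cores=64             # assumed number of cores to use (hypothetical helper)
threads_per_core=2   # assumed number of threads per core (hypothetical helper)

# 64 places, each place being one core
export OMP_PLACES="cores(${cores})"
# keep consecutively numbered threads on neighbouring places
export OMP_PROC_BIND=close
# 64 cores x 2 threads per core = 128 threads
export OMP_NUM_THREADS=$(( cores * threads_per_core ))
# ask the runtime to print its OpenMP settings at start-up
export OMP_DISPLAY_ENV=true

# show what was set before launching the job
echo "OMP_PLACES=${OMP_PLACES} OMP_PROC_BIND=${OMP_PROC_BIND} OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

With more threads than places, the close policy spreads the threads as evenly as possible over the places, so 128 threads over 64 core places gives two threads per core. Unlike granularity=thread, each thread is bound to a whole core rather than to a single hardware thread. The job would then be launched as before with ./my_job.sh.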