Intel OpenMP Support
The C/C++ and Fortran compilers support version 4.0 of the OpenMP standard. A critical issue for performance is ensuring that threads are bound appropriately to cores within the node. The standard OpenMP settings are available, along with environment variables starting with KMP, which are specific to the Intel compiler. The full list of environment variables can be found here. Below we discuss some of the more useful variables; see the manual for the rest.
OMP_NUM_THREADS=nthreads
Launch a job with up to nthreads threads. If not specified, the maximum number of threads is used, which can be as high as 256.
KMP_PLACE_THREADS=Xs,Yc,Zt
Bind threads to X sockets, Y cores and Z threads per core. For KNL, unless one uses the SNC cluster modes, X will always be 1.
KMP_AFFINITY=optionlist
This is a very versatile environment variable; it is discussed below.
It is worth noting that OMP_NUM_THREADS is implied by KMP_PLACE_THREADS. KMP_AFFINITY can be used to control how thread IDs are assigned to hardware threads. The value of KMP_AFFINITY is a comma-separated list of options. Options include, for example:
verbose
Print out a list of thread bindings at startup.
compact
Compact thread ordering. Thread IDs increase fastest among the hyperthreads of a core, and more slowly among the cores.
scatter
Scattered thread ordering. Thread IDs increase fastest among NUMA domains, then among cores, and slowest among hyperthreads.
granularity=thread
The finest level of allocation is a hardware thread.
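As a minimal sketch of how these options combine, the lines below select scattered ordering and ask for the binding report at startup. The thread count of 64 and the executable name my_app are assumptions chosen for illustration only; they do not come from this page.

```shell
# Scattered ordering with thread-level granularity; "verbose" asks the
# runtime to print the thread bindings when the program starts.
export KMP_AFFINITY=verbose,scatter,granularity=thread

# Assumed thread count, for illustration only.
export OMP_NUM_THREADS=64

# ./my_app    # hypothetical executable name; uncomment to launch a real program
```

Replacing scatter with compact in the option list switches to the compact ordering described above, leaving the other options unchanged.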
KMP_PLACE_THREADS and KMP_AFFINITY can be used together. For example, to run only 2 threads per core but still have the thread IDs increase fastest within a core, one could use:
# 128 threads = 64 cores x 2 threads per core
export KMP_PLACE_THREADS=1s,64c,2t
# Thread IDs increase fastest within a core's hyperthreads; print thread bindings at startup
export KMP_AFFINITY=verbose,compact,granularity=thread
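The thread count implied by a KMP_PLACE_THREADS setting is the product of its three fields, as in the comment "128 threads = 64 cores x 2 threads per core" above. The following is a rough sketch of that arithmetic in shell. The function name implied_threads is hypothetical and is not part of the Intel runtime, and the sketch assumes all three fields (Xs, Yc, Zt) are given explicitly.

```shell
# Hypothetical helper (not part of the Intel runtime): print the thread
# count implied by a KMP_PLACE_THREADS value of the form Xs,Yc,Zt.
# Assumes all three fields are present; a missing field is treated as 1
# here, which is a simplification for this sketch only.
implied_threads() (
    # The body runs in a subshell, so changing IFS does not affect the caller.
    IFS=,
    sockets=1 cores=1 threads=1
    for field in $1; do
        case $field in
            *s) sockets=${field%s} ;;   # strip the trailing "s"
            *c) cores=${field%c} ;;     # strip the trailing "c"
            *t) threads=${field%t} ;;   # strip the trailing "t"
        esac
    done
    echo $((sockets * cores * threads))
)

implied_threads 1s,64c,2t   # prints 128
```

This can serve as a quick sanity check that a placement string matches the thread count a job is expected to use.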