Intel OpenMP support

The C/C++ and Fortran compilers support version 4.0 of the OpenMP standard. A critical issue for performance is ensuring that threads are bound appropriately to cores within the node. The standard OpenMP settings are available, along with environment variables prefixed with KMP_ that are specific to the Intel compiler. The full list of environment variables can be found here. We discuss some of the more useful variables below and refer to the manual for the rest.

OMP_NUM_THREADS=nthreads: Launch a job with up to nthreads threads. If not specified, the maximum number of threads is used, which can be as high as 256.
KMP_PLACE_THREADS=Xs,Yc,Zt: Bind threads to X sockets, Y cores and Z threads per core. For KNL, unless one uses the SNC cluster modes, X will always be 1.
KMP_AFFINITY: This is a very versatile environment variable, and we discuss it below.

It is worth noting that OMP_NUM_THREADS is implied by KMP_PLACE_THREADS.
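As a rough illustration of why the thread count is implied, the sketch below multiplies out the socket, core and thread fields of a KMP_PLACE_THREADS value. The function name implied_threads is hypothetical and not part of any Intel tool, and the sketch assumes all three fields are given explicitly (a missing field is simply treated as 1 here, which is a simplification of the real runtime behaviour).

```shell
# Hypothetical helper: thread count implied by an "Xs,Yc,Zt" setting.
# Simplification: a missing field counts as 1 in this sketch.
implied_threads() (
    sockets=1; cores=1; threads=1
    IFS=','                      # split the argument on commas (subshell only)
    for field in $1; do
        case "$field" in
            *s) sockets="${field%s}" ;;   # strip the trailing unit letter
            *c) cores="${field%c}" ;;
            *t) threads="${field%t}" ;;
        esac
    done
    echo $(( sockets * cores * threads ))
)

# 1 socket x 16 cores x 4 threads per core
export KMP_PLACE_THREADS=1s,16c,4t
implied_threads "$KMP_PLACE_THREADS"    # prints 64
```

With this placement there is no need to set OMP_NUM_THREADS separately: the runtime already has 64 hardware thread slots to fill.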

KMP_AFFINITY can be used to control how thread IDs are assigned to hardware threads. The value of KMP_AFFINITY is a comma-separated list of options. Useful options include:

verbose: Print out a list of thread bindings at startup.
compact: Compact thread ordering. Thread IDs increase fastest amongst the hyperthreads of a core, and more slowly amongst the cores.
scatter: Scattered thread ordering. Thread IDs increase fastest between NUMA domains, then cores, and slowest among hyperthreads.
granularity=thread: The finest level of allocation is a hardware thread.
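To make the difference between the compact and scattered orderings concrete, the following sketch maps a thread ID to a (core, hardware thread) pair under each scheme. It is a simplified model and not the Intel runtime's actual placement algorithm: it assumes a single NUMA domain with 64 cores and 4 hardware threads per core, and the names CORES, THREADS_PER_CORE, compact_place and scatter_place are invented for illustration.

```shell
# Simplified model of thread placement; assumed node layout below.
CORES=64             # assumed number of cores in one NUMA domain
THREADS_PER_CORE=4   # assumed hardware threads per core

# Compact: consecutive thread IDs fill the hardware threads of one core first.
compact_place() {
    echo "$(( $1 / THREADS_PER_CORE )) $(( $1 % THREADS_PER_CORE ))"
}

# Scattered: consecutive thread IDs go to different cores first,
# and wrap onto the next hardware thread only once every core is used.
scatter_place() {
    echo "$(( $1 % CORES )) $(( $1 / CORES ))"
}

# Print "core hwthread" for the first few thread IDs under both orderings.
for tid in 0 1 2 3 4; do
    echo "thread $tid: compact -> $(compact_place "$tid"), scatter -> $(scatter_place "$tid")"
done
```

In this model thread 4 lands on core 1 under the compact ordering but on core 4 under the scattered ordering. Setting verbose in KMP_AFFINITY is the way to check the bindings the runtime really chose.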

KMP_PLACE_THREADS and KMP_AFFINITY can be used together. For example, to run only 2 threads per core but still have the thread IDs increase fastest within a core, one could use:

# 128 threads = 64 cores x 2 threads per core
export KMP_PLACE_THREADS=1s,64c,2t

# Thread IDs increase fastest within a core's hyperthreads; print thread bindings at startup
export KMP_AFFINITY=verbose,compact,granularity=thread