The C/C++ and Fortran compilers support version 4.0 of the OpenMP standard. A critical issue for performance is ensuring that threads are bound appropriately to cores within the node. The standard OpenMP settings are available, along with environment variables starting with KMP, which are specific to the Intel compiler. The full list of environment variables can be found here. We discuss some of the more useful variables below and refer to the manual for the rest.
It is worth noting that OMP_NUM_THREADS is implied by KMP_PLACE_THREADS.
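As a rough illustration of why the thread count is implied, the sketch below multiplies the counts in a KMP_PLACE_THREADS value. It assumes the simple form <sockets>s,<cores>c,<threads>t (the value used later in this section) and does not handle offsets or other variants; the variable name implied_threads is purely illustrative and is not read by the runtime.

```shell
# Illustrative only: derive the thread count that a KMP_PLACE_THREADS
# value of the form <sockets>s,<cores>c,<threads>t implies.
KMP_PLACE_THREADS=1s,64c,2t

implied_threads=1
old_ifs=$IFS
IFS=,                              # split the value on commas
for field in $KMP_PLACE_THREADS; do
    count=${field%[sct]}           # strip the trailing unit letter (s, c or t)
    implied_threads=$((implied_threads * count))
done
IFS=$old_ifs

echo "implied OMP_NUM_THREADS: $implied_threads"
```

With the value above this gives 1 x 64 x 2 = 128, so setting OMP_NUM_THREADS separately should not be necessary.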
KMP_AFFINITY can be used to control how thread IDs are assigned to hardware threads. The value of KMP_AFFINITY is a comma-separated list of options, for example verbose, compact and granularity=thread.
KMP_PLACE_THREADS and KMP_AFFINITY can be used together. For example, to run only 2 threads per core, but still have the thread IDs increase fastest within a core, one could use:
# 128 threads = 64 cores x 2 threads per core
export KMP_PLACE_THREADS=1s,64c,2t
# Thread IDs increase fastest within a core's hyperthreads, print thread bindings at startup
export KMP_AFFINITY=verbose,compact,granularity=thread
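For comparison, a similar layout can be approximated with the standard OpenMP 4.0 variables alone. This is a sketch, not an exact equivalent: it assumes a node with at least 64 cores, and it binds each pair of threads to a core rather than to an individual hardware thread.

```shell
# Sketch using only standard OpenMP 4.0 variables (assumes >= 64 cores per node).
export OMP_NUM_THREADS=128       # 64 cores x 2 threads per core
export OMP_PLACES="cores(64)"    # one place per core, 64 places in total
export OMP_PROC_BIND=close       # consecutive thread IDs are packed into the same place
```

With more threads than places, the close policy puts consecutive thread IDs in the same place, so thread IDs still increase fastest within a core. Do not combine these settings with the KMP variables above without checking the manual for how the two interact.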