The Intel compiler provides several ways of setting thread and core affinity through the KMP_AFFINITY environment variable.
There are several modes of binding threads, which are described in this article.
Some common options are:
    tid=0: core=0, smt_thread=0
    tid=1: core=0, smt_thread=1
    tid=2: core=0, smt_thread=2
    tid=3: core=0, smt_thread=3
    tid=4: core=1, smt_thread=0
    tid=5: core=1, smt_thread=1
    ...
    tid=0: core=0, smt_thread=0
    tid=1: core=1, smt_thread=0
    ...
    tid=59: core=59, smt_thread=0
    tid=60: core=0, smt_thread=1
    tid=61: core=1, smt_thread=1
    ...
    tid=120: core=0, smt_thread=2
    tid=121: core=1, smt_thread=2
    ...
but the IDs still run fastest within a core (like compact). A description can be found here
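The first two listings above follow simple arithmetic, which the sketch below reproduces. It is only a model of the printed mappings, not of the Intel runtime itself. It assumes 60 cores with 4 hardware threads each, and the function names `fill_core_first` and `round_robin_cores` are hypothetical labels chosen for illustration.

```python
CORES = 60             # assumption: core count of the system described here
THREADS_PER_CORE = 4   # assumption: hardware (SMT) threads per core


def fill_core_first(tid):
    """First listing: consecutive thread IDs fill one core before moving on.

    Returns (core, smt_thread) for an OpenMP thread ID.
    """
    return tid // THREADS_PER_CORE, tid % THREADS_PER_CORE


def round_robin_cores(tid):
    """Second listing: consecutive thread IDs land on consecutive cores.

    Returns (core, smt_thread) for an OpenMP thread ID.
    """
    return tid % CORES, tid // CORES


if __name__ == "__main__":
    for tid in (0, 1, 4, 59, 60, 120, 121):
        print(tid, fill_core_first(tid), round_robin_cores(tid))
```

In the first mapping, neighbouring thread IDs share a core; in the second, they sit on different cores.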
The list contains the O/S thread IDs to use, and the position in the
list decides the OpenMP thread ID. One can supply a granularity
qualifier to choose whether the OpenMP thread is bound to exactly the
desired O/S thread, or whether the runtime system can migrate it to
other threads within the same core as the desired thread.
A qualifier of granularity=core allows migration within the core of the desired O/S thread. A qualifier of granularity=thread does not allow such migration
and directs binding solely to the desired O/S thread. Below is an example using granularity=thread:
    KMP_AFFINITY="explicit,proclist=[0,3,5,9],granularity=thread"
    tid=0: on H/W thread 0 -- (on core 59)
    tid=1: on H/W thread 3 -- (on core 0)
    tid=2: on H/W thread 5 -- (on core 1)
    tid=3: on H/W thread 9 -- (on core 2)
When one sets granularity=core, each OpenMP thread can be scheduled on any H/W thread within the core that contains the corresponding H/W thread ID in the list. An example is below:
    # Default granularity is per core
    KMP_AFFINITY=explicit,proclist=[0,3,5,9]
    tid=0: on one of h/w threads (0,237,238,239) -- this is core 59 which contains H/W thread 0 (first in our list)
    tid=1: on one of h/w threads (1,2,3,4) -- this is core 0 which contains H/W thread 3 (second in our list)
    tid=2: on one of h/w threads (5,6,7,8) -- this is core 1 which contains H/W thread 5 (third in our list)
    tid=3: on one of h/w threads (9,10,11,12) -- this is core 2 which contains H/W thread 9 (last in our list)
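The effect of the granularity qualifier on an explicit list can be sketched as follows. This is an illustrative model, not the runtime's own logic. It assumes the O/S numbering seen on this system: O/S thread 0 sits on core 59, and O/S threads 1 to 239 fill cores 0 upwards four at a time. The helper names `core_of`, `procs_on_core` and `binding` are hypothetical.

```python
CORES = 60             # assumption: 60 cores
THREADS_PER_CORE = 4   # assumption: 4 hardware threads per core
TOTAL_PROCS = CORES * THREADS_PER_CORE


def core_of(os_proc):
    """Core holding a given O/S thread, under the assumed numbering."""
    if os_proc == 0:
        return CORES - 1  # O/S thread 0 lives on the last core (59)
    return (os_proc - 1) // THREADS_PER_CORE


def procs_on_core(core):
    """All O/S threads that share the given core, sorted ascending."""
    first = core * THREADS_PER_CORE + 1
    procs = range(first, first + THREADS_PER_CORE)
    # Number 240 does not exist; O/S thread 0 takes its place on core 59.
    return sorted(0 if p == TOTAL_PROCS else p for p in procs)


def binding(proclist, granularity):
    """Allowed O/S threads per OpenMP thread; list position is the OpenMP ID."""
    if granularity == "thread":
        return [[p] for p in proclist]
    if granularity == "core":
        return [procs_on_core(core_of(p)) for p in proclist]
    raise ValueError("granularity must be 'thread' or 'core'")


if __name__ == "__main__":
    for mode in ("thread", "core"):
        for tid, procs in enumerate(binding([0, 3, 5, 9], mode)):
            print(f"{mode}: tid={tid} may run on O/S threads {procs}")
```

With granularity "thread" each OpenMP thread gets exactly one O/S thread, while with "core" it gets the full set of four that share the core.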
Adding the verbose modifier, as in

    export OMP_NUM_THREADS=4
    export KMP_AFFINITY="verbose,explicit,proclist=[0,3,5,9],granularity=core"

will print out a lot of information when one subsequently runs an
OpenMP program. Some sample output (with our comments added) is below:
    OMP: Info #156: KMP_AFFINITY: 240 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #159: KMP_AFFINITY: 1 packages x 60 cores/pkg x 4 threads/core (60 total cores)
    OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0    # COMMENT: H/W thread 1 maps to core=0, thread=0
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 0 thread 1
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 0 thread 2
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 3
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 0
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 1 thread 1
    ...
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 59 thread 0    # COMMENT: H/W thread 0 is mapped to core 59, thread=0
    OMP: Info #171: KMP_AFFINITY: OS proc 237 maps to package 0 core 59 thread 1
    OMP: Info #171: KMP_AFFINITY: OS proc 238 maps to package 0 core 59 thread 2
    OMP: Info #171: KMP_AFFINITY: OS proc 239 maps to package 0 core 59 thread 3
    OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine    # COMMENT: Threads may migrate on core
    OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,237,238,239}    # COMMENT: OMP thread 0 is bound to core 59
    OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,2,3,4}    # COMMENT: OMP thread 1 is bound to core 0
    OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {5,6,7,8}    # COMMENT: OMP thread 2 is bound to core 1
    OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {9,10,11,12}    # COMMENT: OMP thread 3 is bound to core 2
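Since the verbose output is long, it can be convenient to extract the O/S thread to core map from it programmatically. The sketch below does this with a regular expression that matches the "OS proc ... maps to ..." lines as they appear in the sample above. The exact message wording may differ between runtime versions, so the pattern is an assumption, and `parse_proc_map` is a hypothetical helper name.

```python
import re

# Pattern assumed from the sample output; other runtime versions may differ.
MAP_LINE = re.compile(
    r"OS proc (\d+) maps to package (\d+) core (\d+) thread (\d+)"
)


def parse_proc_map(text):
    """Return {os_proc: (package, core, thread)} from verbose affinity output."""
    mapping = {}
    for match in MAP_LINE.finditer(text):
        proc, package, core, thread = (int(g) for g in match.groups())
        mapping[proc] = (package, core, thread)
    return mapping


# A few lines copied from the sample output, used as test input.
SAMPLE = """\
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 59 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 237 maps to package 0 core 59 thread 1
"""

if __name__ == "__main__":
    for proc, (package, core, thread) in sorted(parse_proc_map(SAMPLE).items()):
        print(f"O/S thread {proc}: package {package}, core {core}, thread {thread}")
```

Lines that do not match the pattern, such as the topology summary, are simply ignored.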
One final consideration is that on these 60-core systems, core 59 is
reserved for system functions. Explicitly scheduling threads on core 59
can slow down program execution. In principle, the scatter affinity will
schedule threads onto core 59. Your mileage may vary.
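One way to keep threads off core 59 is to build an explicit list that names only O/S threads on cores 0 to 58. The sketch below generates such a setting with one thread per core. It assumes the numbering from the sample output, in which the first O/S thread on core c (for c below 59) is 4*c + 1. The function names are hypothetical, and the result should be checked against the verbose output on the actual system.

```python
CORES = 60             # assumption: 60 cores
THREADS_PER_CORE = 4   # assumption: 4 hardware threads per core
RESERVED_CORE = 59     # core reserved for system functions, per the text


def first_proc_on_core(core):
    """First O/S thread on a core, under the assumed numbering (core < 59)."""
    return core * THREADS_PER_CORE + 1


def affinity_avoiding_reserved(num_threads):
    """Build a KMP_AFFINITY value placing one thread per non-reserved core."""
    usable = [c for c in range(CORES) if c != RESERVED_CORE]
    if num_threads > len(usable):
        raise ValueError("more threads requested than non-reserved cores")
    procs = [first_proc_on_core(c) for c in usable[:num_threads]]
    proclist = ",".join(str(p) for p in procs)
    return f"explicit,proclist=[{proclist}],granularity=thread"


if __name__ == "__main__":
    n = 4
    print(f"export OMP_NUM_THREADS={n}")
    print(f'export KMP_AFFINITY="{affinity_avoiding_reserved(n)}"')
```

The printed lines can be pasted into a job script; for four threads the list names O/S threads 1, 5, 9 and 13, which lie on cores 0 to 3.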