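The Intel compiler provides several ways of setting thread and core affinity through the KMP_AFFINITY environment variable. The available binding modes are described in this article; some common ones are illustrated below for a 60-core system with 4 hardware (SMT) threads per core, where tid is the OpenMP thread ID.
KMP_AFFINITY=compact places consecutive thread IDs on the same core until all of its hardware threads are used: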
tid=0: core=0, smt_thread=0
tid=1: core=0, smt_thread=1
tid=2: core=0, smt_thread=2
tid=3: core=0, smt_thread=3
tid=4: core=1, smt_thread=0
tid=5: core=1, smt_thread=1
...
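The compact mapping above can be modelled in a few lines. This is a sketch, not the runtime's code: it assumes a single package with 60 cores and 4 hardware threads per core, uses the logical core/SMT numbering of the table above, and simply rejects more threads than hardware threads. `compact_map` is a hypothetical helper.

```python
# Sketch of KMP_AFFINITY=compact on one package (assumed: 60 cores x 4 SMT threads).
# compact_map is a hypothetical helper, not part of any Intel API.


def compact_map(num_threads, cores=60, threads_per_core=4):
    """Return (core, smt_thread) for each OpenMP thread ID under compact."""
    if num_threads > cores * threads_per_core:
        # This sketch simply refuses to oversubscribe the hardware threads.
        raise ValueError("more OpenMP threads than hardware threads")
    # Consecutive IDs fill every SMT slot of a core before moving to the next core.
    return [divmod(tid, threads_per_core) for tid in range(num_threads)]


if __name__ == "__main__":
    for tid, (core, smt) in enumerate(compact_map(6)):
        print(f"tid={tid}: core={core}, smt_thread={smt}")
```

Running it reproduces the first six lines of the compact table.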
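KMP_AFFINITY=scatter distributes threads round-robin across the cores, so every core receives one thread before any core receives a second:
tid=0: core=0, smt_thread=0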
tid=1: core=1, smt_thread=0
...
tid=59: core=59, smt_thread=0
tid=60: core=0, smt_thread=1
tid=61: core=1, smt_thread=1
...
tid=120: core=0, smt_thread=2
tid=121: core=1, smt_thread=2
...
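The scatter table corresponds to a round-robin over cores in which the core index varies fastest. A sketch under the same assumptions (single package, 60 cores x 4 threads, logical numbering); `scatter_map` is a hypothetical helper:

```python
# Sketch of KMP_AFFINITY=scatter on one package (assumed: 60 cores x 4 SMT threads).
# scatter_map is a hypothetical helper, not part of any Intel API.


def scatter_map(num_threads, cores=60, threads_per_core=4):
    """Return (core, smt_thread) for each OpenMP thread ID under scatter."""
    if num_threads > cores * threads_per_core:
        raise ValueError("more OpenMP threads than hardware threads")
    # Round-robin over cores: the core index varies fastest, the SMT slot slowest.
    return [(tid % cores, tid // cores) for tid in range(num_threads)]


if __name__ == "__main__":
    mapping = scatter_map(122)
    for tid in (0, 1, 59, 60, 61, 120, 121):
        core, smt = mapping[tid]
        print(f"tid={tid}: core={core}, smt_thread={smt}")
```

The printed lines match the entries of the scatter table above.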
KMP_AFFINITY=balanced also spreads threads across all cores like scatter, but the IDs still run fastest within a core (like compact).
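The sketch below follows our reading of the balanced placement in the Intel/LLVM OpenMP runtime, so treat it as an illustration rather than a specification. With no more threads than cores it behaves like scatter; otherwise each core receives either floor(n/cores) or ceil(n/cores) threads with consecutive IDs, and we assume the extra threads go to the lowest-numbered cores. `balanced_map` is a hypothetical helper using the same 60 x 4 logical numbering as above.

```python
# Sketch of KMP_AFFINITY=balanced on one package (assumed: 60 cores x 4 SMT threads).
# Assumption: when threads do not divide evenly, the lowest-numbered cores
# receive one extra thread each. balanced_map is a hypothetical helper.


def balanced_map(num_threads, cores=60, threads_per_core=4):
    """Return (core, smt_thread) for each OpenMP thread ID under balanced."""
    if num_threads > cores * threads_per_core:
        raise ValueError("more OpenMP threads than hardware threads")
    if num_threads <= cores:
        # Like scatter: one thread per core, first SMT slot only.
        return [(tid, 0) for tid in range(num_threads)]
    chunk, big_cores = divmod(num_threads, cores)  # threads per core, leftover
    big_nth = (chunk + 1) * big_cores  # thread IDs placed on the "big" cores
    mapping = []
    for tid in range(num_threads):
        if tid < big_nth:
            # The first big_cores cores each hold chunk + 1 consecutive IDs.
            core, smt = divmod(tid, chunk + 1)
        else:
            # The remaining cores each hold chunk consecutive IDs.
            core, smt = divmod(tid - big_cores, chunk)
        mapping.append((core, smt))
    return mapping


if __name__ == "__main__":
    for tid, (core, smt) in enumerate(balanced_map(120)[:4]):
        print(f"tid={tid}: core={core}, smt_thread={smt}")
```

With 120 threads, tid=0 and tid=1 share core 0 while every core still gets two threads, which is the "scatter across cores, compact within a core" behaviour described above.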
KMP_AFFINITY=explicit takes a proclist of O/S thread IDs to use, and the position of an ID in the list decides the OpenMP thread ID. A granularity qualifier chooses whether each OpenMP thread is pinned to the desired O/S thread or whether the runtime may migrate it to other threads on the same core as the desired thread. A qualifier of granularity=core allows migration within the core of the desired O/S thread, while granularity=thread does not allow such migration and binds solely to the desired O/S thread. Below is an example using granularity=thread:
KMP_AFFINITY="explicit,proclist=[0,3,5,9],granularity=thread"
tid=0: on H/W thread 0 -- (on core 59)
tid=1: on H/W thread 3 -- (on core 0)
tid=2: on H/W thread 5 -- (on core 1)
tid=3: on H/W thread 9 -- (on core 2)
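Relating a proclist entry to a core requires the O/S-proc numbering of the card. The helper below encodes the numbering reported by the verbose output later in this article (OS proc 0 and OS procs 237-239 on core 59; OS procs 1-236 packed four per core starting at core 0). That numbering is an assumption about this particular 60-core system, and both function names are hypothetical.

```python
# Sketch of explicit binding with granularity=thread.
# Assumed O/S-proc numbering (taken from the verbose output in this article):
#   OS proc 0 and OS procs 237-239 -> core 59
#   OS procs 1-236                 -> cores 0-58, four consecutive procs per core
# Both helpers are hypothetical, written only to illustrate the mapping.


def os_proc_to_core(proc, cores=60, threads_per_core=4):
    """Return (core, smt_thread) for an O/S proc ID under the assumed numbering."""
    if not 0 <= proc < cores * threads_per_core:
        raise ValueError(f"OS proc {proc} is out of range")
    last_core = cores - 1
    boundary = last_core * threads_per_core  # 236 on the 60-core card
    if proc == 0:
        return (last_core, 0)
    if proc > boundary:
        return (last_core, proc - boundary)
    return divmod(proc - 1, threads_per_core)


def explicit_thread_binding(proclist, cores=60, threads_per_core=4):
    """granularity=thread: OpenMP thread i is pinned to proclist[i] and nothing else."""
    return {
        tid: (proc, os_proc_to_core(proc, cores, threads_per_core)[0])
        for tid, proc in enumerate(proclist)
    }


if __name__ == "__main__":
    for tid, (proc, core) in explicit_thread_binding([0, 3, 5, 9]).items():
        print(f"tid={tid}: on H/W thread {proc} -- (on core {core})")
```

For proclist=[0,3,5,9] this prints cores 59, 0, 1 and 2, in line with the table above.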
When one sets granularity=core, each thread can be scheduled on any H/W thread of the core that contains the corresponding H/W thread ID in the list. An example is below:
# Default granularity is per core
KMP_AFFINITY=explicit,proclist=[0,3,5,9]
tid=0: on one of H/W threads (0,237,238,239) -- core 59, which contains H/W thread 0 (first in our list)
tid=1: on one of H/W threads (1,2,3,4) -- core 0, which contains H/W thread 3 (second in our list)
tid=2: on one of H/W threads (5,6,7,8) -- core 1, which contains H/W thread 5 (third in our list)
tid=3: on one of H/W threads (9,10,11,12) -- core 2, which contains H/W thread 9 (last in our list)
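With granularity=core, each listed O/S proc is widened to the set of all H/W threads on its core. A sketch under the same assumed numbering as the verbose output in this article; `core_proc_set` and `explicit_core_binding` are hypothetical helpers.

```python
# Sketch of explicit binding with granularity=core, using the same assumed
# numbering as the verbose output in this article (core 59 = OS procs 0, 237-239;
# cores 0-58 = OS procs 1-236, four per core). Helper names are hypothetical.


def core_proc_set(proc, cores=60, threads_per_core=4):
    """Return every O/S proc that shares a core with `proc`."""
    total = cores * threads_per_core
    if not 0 <= proc < total:
        raise ValueError(f"OS proc {proc} is out of range")
    boundary = (cores - 1) * threads_per_core  # last proc on cores 0-58
    if proc == 0 or proc > boundary:
        # The last core owns proc 0 plus the highest-numbered procs.
        return [0] + list(range(boundary + 1, total))
    first = (proc - 1) // threads_per_core * threads_per_core + 1
    return list(range(first, first + threads_per_core))


def explicit_core_binding(proclist, cores=60, threads_per_core=4):
    """granularity=core: thread i may run on any H/W thread of proclist[i]'s core."""
    return {tid: core_proc_set(p, cores, threads_per_core) for tid, p in enumerate(proclist)}


if __name__ == "__main__":
    for tid, procs in explicit_core_binding([0, 3, 5, 9]).items():
        print(f"tid={tid}: on one of h/w threads {tuple(procs)}")
```

The resulting sets are the same ones the runtime reports in the verbose output that follows.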
Adding the verbose modifier shows how the runtime actually binds the threads. Setting
export OMP_NUM_THREADS=4
export KMP_AFFINITY="verbose,explicit,proclist=[0,3,5,9],granularity=core"
will then print the detected machine topology and the binding of every thread when an OpenMP program is subsequently run. Some sample output (with our comments added) is below:
OMP: Info #156: KMP_AFFINITY: 240 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 1 packages x 60 cores/pkg x 4 threads/core (60 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0 # COMMENT: H/W thread 1 maps to core=0, thread=0
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 0 thread 2
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 3
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 1 thread 1
...
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 59 thread 0 # COMMENT: H/W thread 0 is mapped to core 59, thread=0
OMP: Info #171: KMP_AFFINITY: OS proc 237 maps to package 0 core 59 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 238 maps to package 0 core 59 thread 2
OMP: Info #171: KMP_AFFINITY: OS proc 239 maps to package 0 core 59 thread 3
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine # COMMENT: Threads may migrate within a core
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,237,238,239} # COMMENT: OMP thread 0 is bound to core 59
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,2,3,4} # COMMENT: OMP thread 1 is bound to core 0
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {5,6,7,8} # COMMENT: OMP thread 2 is bound to core 1
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {9,10,11,12} # COMMENT: OMP thread 3 is bound to core 2
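When checking a run, it can help to extract the bindings from the verbose output programmatically. The sketch below assumes the exact line format shown above (`Internal thread N bound to OS proc set {...}`); other runtime versions may print the set differently, for example as ranges, in which case the regular expression would need adjusting. `parse_bindings` is a hypothetical helper.

```python
import re

# Matches the binding lines of KMP_AFFINITY=verbose in the format shown above.
# Assumption: the proc set is printed as a comma-separated list in braces.
BINDING_RE = re.compile(r"Internal thread (\d+) bound to OS proc set \{([\d,]+)\}")

SAMPLE_LOG = """\
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,237,238,239}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1,2,3,4}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {5,6,7,8}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {9,10,11,12}
"""


def parse_bindings(log_text):
    """Map each OpenMP thread ID to the list of O/S procs it is bound to."""
    bindings = {}
    for match in BINDING_RE.finditer(log_text):
        tid = int(match.group(1))
        bindings[tid] = [int(p) for p in match.group(2).split(",")]
    return bindings


if __name__ == "__main__":
    for tid, procs in sorted(parse_bindings(SAMPLE_LOG).items()):
        print(f"OpenMP thread {tid}: OS procs {procs}")
```

Comparing the parsed sets against the intended proclist is a quick way to confirm that a KMP_AFFINITY setting did what was expected.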
One final consideration is that on these 60-core systems, core 59 is reserved for system functions, so explicitly scheduling threads on core 59 can slow down program execution. The automatic modes can use it too: with scatter, for example, tid=59 lands on core 59 once 60 or more threads are requested. Your mileage may vary.
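One way to keep threads off core 59 is an explicit proclist that omits its H/W threads. The generator below is a hypothetical helper that assumes the numbering reported by the verbose output in this article (core 59 owns OS procs 0, 237, 238 and 239) and emits a scatter- or compact-ordered list over cores 0-58 with granularity=thread. It is a sketch, not a tuned recommendation.

```python
# Hypothetical helper: build a KMP_AFFINITY value whose explicit proclist skips
# core 59. Assumed numbering (from the verbose output in this article): core c in
# 0-58 owns OS procs 1 + 4*c .. 4 + 4*c; core 59 owns OS procs 0, 237, 238, 239.


def proclist_avoiding_last_core(cores=60, threads_per_core=4, order="scatter"):
    """Return an explicit KMP_AFFINITY string that never uses the last core."""
    usable_cores = range(cores - 1)  # cores 0-58

    def proc(core, smt):
        return 1 + core * threads_per_core + smt

    if order == "compact":
        # Fill each core's SMT slots before moving to the next core.
        procs = [proc(c, s) for c in usable_cores for s in range(threads_per_core)]
    elif order == "scatter":
        # One thread per core first, then the second SMT slot of every core, etc.
        procs = [proc(c, s) for s in range(threads_per_core) for c in usable_cores]
    else:
        raise ValueError(f"unknown order {order!r}")
    return "explicit,proclist=[" + ",".join(map(str, procs)) + "],granularity=thread"


if __name__ == "__main__":
    print(proclist_avoiding_last_core())
```

The printed value can be exported as KMP_AFFINITY. Since OpenMP thread i is bound to the i-th list entry, keeping OMP_NUM_THREADS no larger than the list (236 entries here) leaves core 59 free under the assumed numbering.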