Running Knights Landing Jobs



Setting up Shared Libraries

Ensure that the shared libraries for OpenMP, MPI, C/C++, etc. are set up as described in the Compilers and Tools section.

Affinity, Binding and MPI Pinning

Appropriate thread affinity and MPI process pinning are important for high performance, and the two are often intertwined. We first consider OpenMP thread affinity (as if we were running a non-MPI job) and then consider MPI.

OpenMP Thread Affinity

Two sets of environment variables control OpenMP threading: one set is Intel-specific and the other is part of the OpenMP 4.0 standard. For further reference, please see the Fermilab pages on Intel OpenMP. Commonly useful variables are:

 OMP_NUM_THREADS
The number of OpenMP threads

 KMP_PLACE_THREADS=1s,Xc,Yt
Bind threads to X cores on 1 socket, with Y threads per core

 KMP_AFFINITY
A very general option; see below

KMP_AFFINITY accepts several comma-separated settings, which may be combined. Useful settings are:

 verbose
Display thread assignments at startup

 compact
Compact the thread IDs: consecutive thread IDs vary fastest among the hardware (SMT) threads of a core and slowest among cores and sockets

 scatter
Scatter the thread IDs: consecutive thread IDs vary fastest among sockets (in SNC-4 mode) and cores, and slowest within a core

 granularity=thread
Treat individual hardware threads (hyperthreads) as the finest level of granularity when binding
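The difference between the compact and scatter orderings can be sketched with a toy model. The script below is only an illustration, not how the Intel runtime is implemented: it assumes a single socket, and the helper names `compact_slot` and `scatter_slot` and the 4-core, 2-threads-per-core layout are hypothetical choices made for this example.

```shell
# Toy model of the compact and scatter orderings (illustration only).
# Assumption: one socket, identical cores, no SNC or tile structure.

# compact_slot ID THREADS_PER_CORE -> prints "core hwthread"
# Consecutive IDs fill all hardware threads of a core before moving on.
compact_slot() {
  echo "$(( $1 / $2 )) $(( $1 % $2 ))"
}

# scatter_slot ID CORES -> prints "core hwthread"
# Consecutive IDs go to different cores first, then wrap around.
scatter_slot() {
  echo "$(( $1 % $2 )) $(( $1 / $2 ))"
}

cores=4   # hypothetical small layout so the table stays readable
tpc=2     # hardware threads used per core
id=0
while [ "$id" -lt $(( cores * tpc )) ]; do
  printf 'thread %d: compact -> (core hwthread) %s, scatter -> (core hwthread) %s\n' \
    "$id" "$(compact_slot "$id" "$tpc")" "$(scatter_slot "$id" "$cores")"
  id=$(( id + 1 ))
done
```

In this model, thread 1 shares core 0 with thread 0 under compact, but lands on core 1 under scatter. The actual placement on a given node is best confirmed with the verbose setting.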

KMP_AFFINITY can be used in conjunction with KMP_PLACE_THREADS, for example to enable a compact ordering with only 2 threads per core (bash shell):

   # 64 cores and 128 threads so 2 threads per core  

   export KMP_PLACE_THREADS=1s,64c,2t     

   # compact ordering within the core, reporting thread bindings  

   export KMP_AFFINITY=verbose,compact,granularity=thread  

   # set number of threads  

   export OMP_NUM_THREADS=128  

   # run the job  

   ./my_job.sh
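
In the example above, the thread count (128) has to agree with the placement (1 socket, 64 cores, 2 threads per core). One way to keep the two consistent is to derive OMP_NUM_THREADS from the placement string. The following is a minimal sketch: the helper name `threads_from_spec` is hypothetical, and it assumes the value has exactly the three-field form `<S>s,<C>c,<T>t` shown above; the runtime may accept other forms that this helper does not handle.

```shell
# threads_from_spec SPEC -> prints sockets * cores * threads-per-core
# Assumption: SPEC looks exactly like "1s,64c,2t" (three fields, this order).
# The body runs in a subshell so its variables do not leak to the caller.
threads_from_spec() (
  spec=$1
  sockets=${spec%%s,*}   # text before the first "s,"
  rest=${spec#*s,}       # drop the socket field
  cores=${rest%%c,*}     # text before "c,"
  rest=${rest#*c,}       # drop the core field
  threads=${rest%t}      # strip the trailing "t"
  echo $(( sockets * cores * threads ))
)

export KMP_PLACE_THREADS=1s,64c,2t
# Derive the thread count instead of hard-coding it
OMP_NUM_THREADS=$(threads_from_spec "$KMP_PLACE_THREADS")
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With the placement above, this sets OMP_NUM_THREADS to 128, matching the hand-written value.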