The simplest way to exploit MCDRAM is to use it as a cache. This requires the nodes to be
booted into Cache mode, or jobs to be submitted with the tag cache-quad.
No code changes are needed for this mode of operation, and it is probably the best place
to start for applications which do not fit entirely into the 16 GB of MCDRAM per node.
If the application and its dynamic working set fit entirely into MCDRAM, one can force
the application to perform its memory allocations from MCDRAM. This can be accomplished
using the numactl utility. In the flat-quadrant cluster mode, DDR memory is
NUMA domain 0 and MCDRAM is NUMA domain 1. Hence
numactl -m 1 ./executable
will force a process to use MCDRAM.
There are caveats with this approach: if MCDRAM is full, the job will fail. Also, this will force
allocations into MCDRAM, but I am not certain as to the location of automatic/stack memory.
Memkind and hbw_malloc are memory allocators based on jemalloc. Calls into these libraries
can explicitly allocate memory in the high-bandwidth MCDRAM region. The memkind library
and hbw_malloc are available through the Intel Compiler toolchain. For more information, see
for example the Colfax KNL MCDRAM tutorial.
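As a rough illustration of the explicit-allocation route, the sketch below wraps hbw_malloc and hbw_free from memkind's hbwmalloc.h behind a small helper. The names fast_buf, fast_alloc, fast_free and fill_and_sum, and the USE_HBW build macro, are hypothetical and chosen only for this example; they are not part of memkind. Without -DUSE_HBW the code falls back to plain malloc, so it also builds on machines that do not have memkind installed.

```c
/* Sketch only: explicit MCDRAM allocation through memkind's hbwmalloc API.
 * Assumed build lines (adjust for your toolchain):
 *   cc -DUSE_HBW example.c -lmemkind    (MCDRAM via hbw_malloc)
 *   cc example.c                        (plain malloc fallback)
 * USE_HBW is a hypothetical macro used only by this example. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef USE_HBW
#include <hbwmalloc.h>
#endif

/* Hypothetical helper type: remembers which allocator owns the buffer. */
typedef struct {
    double *data;
    int in_hbw; /* 1 if obtained from hbw_malloc, 0 if from malloc */
} fast_buf;

/* Allocate n doubles, trying high-bandwidth memory first when enabled. */
fast_buf fast_alloc(size_t n)
{
    fast_buf b = { NULL, 0 };
#ifdef USE_HBW
    /* hbw_check_available() returns 0 when high-bandwidth memory is usable. */
    if (hbw_check_available() == 0) {
        b.data = hbw_malloc(n * sizeof(double));
        if (b.data != NULL) {
            b.in_hbw = 1;
            return b;
        }
    }
#endif
    b.data = malloc(n * sizeof(double));
    return b;
}

/* Release the buffer with the allocator that created it. */
void fast_free(fast_buf *b)
{
    assert(b != NULL);
#ifdef USE_HBW
    if (b->in_hbw) {
        hbw_free(b->data);
        b->data = NULL;
        b->in_hbw = 0;
        return;
    }
#endif
    free(b->data);
    b->data = NULL;
    b->in_hbw = 0;
}

/* Demo kernel: fill a buffer with 1.0 and sum it; returns -1.0 on failure. */
double fill_and_sum(size_t n)
{
    fast_buf b = fast_alloc(n);
    double sum = 0.0;
    size_t i;
    if (b.data == NULL)
        return -1.0;
    for (i = 0; i < n; i++)
        b.data[i] = 1.0;
    for (i = 0; i < n; i++)
        sum += b.data[i];
    fast_free(&b);
    return sum;
}
```

Only the bandwidth-critical arrays need to go through a helper like fast_alloc; everything else can stay on the normal heap in DDR. This lets an application whose total footprint exceeds MCDRAM still place its hot data there.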