This section describes how to submit MPI jobs under the Slurm batch system. Here we assume that MPI jobs are compiled with one of the available versions of Intel Parallel Studio (Intel MPI) on KNL. Generally, there are two ways to launch an MPI job under Slurm: 1) mpiexec.hydra, from the Intel Parallel Studio command-line tool suite, and 2) srun, from the Slurm command utilities. Even though both methods work well under Slurm,
srun allows Slurm to control and clean up all the MPI processes easily, in addition to accounting for all MPI processes more accurately.
- Using mpiexec.hydra and ssh or rsh to launch an MPI job.
This is very similar to the way MPI jobs were launched under the old PBS batch system. Inside the job submission script, one has to determine the nodes allocated to the job using the scontrol command. The following is a simple sbatch script.
#!/bin/bash -l
#SBATCH -A youraccount
#SBATCH -p phi
#SBATCH -N 4
#SBATCH -t 00:30:00
#SBATCH -J yourjobname
#SBATCH --mail-type=END
#SBATCH -C 18p
# create a temp file
tmpfile=`mktemp`
# convert slurm compact form to regular form
/usr/bin/scontrol show hostnames $SLURM_JOB_NODELIST > $tmpfile
source /dist/intel/parallel_studio_xe_2017/parallel_studio_xe_2017.0.035/bin/psxevars.sh intel64
# number of nodes allocated to this job
numnodes=$SLURM_JOB_NUM_NODES
mpiexec.hydra -bootstrap rsh -PSM2 -f $tmpfile -np $numnodes -perhost 1 mpiprog
rm -f $tmpfile
Note: one can replace -bootstrap rsh with -bootstrap ssh if one's ssh keys for all the hosts are set up correctly.
The -PSM2 option tells Intel MPI to use the OPA (Omni-Path) fabric.
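For example, the ssh-based launch would look like the following (a sketch, reusing the $tmpfile and $numnodes variables and the placeholder program name mpiprog from the script above):
mpiexec.hydra -bootstrap ssh -PSM2 -f $tmpfile -np $numnodes -perhost 1 mpiprog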
- Using mpiexec.hydra with the -bootstrap slurm option to let Slurm manage an MPI job
This method is very similar to the above. The only difference is the bootstrap option: -bootstrap slurm. This option allows the MPI processes to be launched and managed by Slurm.
mpiexec.hydra -bootstrap slurm -PSM2 -f $tmpfile -np $numnodes -perhost 1 anothermpiprog
- Using the srun command to launch an MPI job
This is the preferred method according to the official Slurm documentation. With srun, one does not need to determine the allocated nodes as in the previous two methods. The following is a simple script.
#!/bin/bash -l
#SBATCH -A youraccount
#SBATCH -p phi
#SBATCH -N 4
#SBATCH -t 00:30:00
#SBATCH -J mightyrun
#SBATCH --mail-type=END
#SBATCH -C 18p
source /dist/intel/parallel_studio_xe_2017/parallel_studio_xe_2017.0.035/bin/psxevars.sh intel64
# The following 3 variables make sure the OPA fabric is used
export I_MPI_FABRICS_LIST=tmi
export I_MPI_TMI_PROVIDER=psm2
export I_MPI_FALLBACK=0
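# Optional (not part of the original script): raising I_MPI_DEBUG to 2 or higher makes
# Intel MPI print which fabric/provider was actually selected, which helps verify that
# the OPA/PSM2 settings above took effect.
# export I_MPI_DEBUG=2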
# Use Slurm's PMI library for process management
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun --mpi=pmi2 -n 4 fullpath_to_mpi_prog
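Assuming the script above is saved as, say, knl_mpi.sh (a placeholder name), it can be submitted and monitored with the standard Slurm commands:
sbatch knl_mpi.sh          # submit the job script; Slurm prints the assigned job ID
squeue -u $USER            # check the state of your jobs in the queue
scancel <jobid>            # cancel the job if necessary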