SLURM (Simple Linux Utility for Resource Management) is a powerful open-source, fault-tolerant, highly available, and highly scalable resource manager and job scheduling system, currently developed by SchedMD. Initially developed for large Linux clusters at Lawrence Livermore National Laboratory, SLURM is used extensively on many of the TOP500 supercomputers around the globe.
2. SLURM Commands
To run SLURM commands against the appropriate accounts and resources, you must first log in to the appropriate submit host (see Start Here in the graphics above).
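For example, you would typically log in over SSH. The username and submit host below are illustrative (the hostname matches the example value shown later on this page); substitute your own username and your site's submit host:

```bash
# Log in to a submit host before running any SLURM commands.
# "johndoe" and "ifarm1802.jlab.org" are illustrative values.
ssh johndoe@ifarm1802.jlab.org
```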
- scontrol and squeue: job control and monitoring.
- sbatch: batch job submission.
- salloc: request an interactive job session.
- srun: launch a job or job step.
- sinfo: node information and cluster status.
- sacct: accounting data for jobs and job steps.
- Useful environment variables are $SLURM_NODELIST and $SLURM_JOBID.
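As an illustration of how these commands fit together, here is a minimal batch script sketch. The job name, resource requests, and program path are assumptions for the example, not site defaults:

```bash
#!/bin/bash
#SBATCH --job-name=myjob          # job name
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks=4                # total number of tasks
#SBATCH --time=00:10:00           # wall-clock limit (hh:mm:ss)
#SBATCH --output=myjob_%j.out     # stdout file; %j expands to the job ID

# $SLURM_JOBID and $SLURM_NODELIST are set by SLURM at run time.
echo "Job $SLURM_JOBID running on $SLURM_NODELIST"

# srun launches the job step across the allocated resources.
srun ./my_program
```

Submit the script with sbatch myjob.sh, monitor it with squeue -u $USER, and inspect accounting data after completion with sacct -j <jobid>.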
3. SLURM User Accounts
To check your “default” SLURM account, use the following command:
```
[@ ~]$ sacctmgr list user name=johndoe
      User   Def Acct     Admin
---------- ---------- ----------
   johndoe    project      None
```
To check “all” the SLURM accounts you are associated with, use the following command:
```
[@ ~]$ sacctmgr list user name=johndoe withassoc
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
   johndoe   projectx      None    scicomp   projecta                    1                                                                                        normal    normal
```
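If you are associated with more than one account, you can charge a job to a non-default account with the --account option. The account name projecta below is taken from the example output above and is purely illustrative:

```bash
# Submit a batch job under a specific (non-default) SLURM account.
sbatch --account=projecta myjob.sh

# The same option works for interactive allocations.
salloc --account=projecta --nodes=1 --time=00:30:00
```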
4. SLURM Environment Variables
| Variable Name | Description | Example Value | PBS/Torque Analog |
|---|---|---|---|
| $SLURM_JOB_ID | Job ID | 5741192 | $PBS_JOBID |
| $SLURM_JOBID | Deprecated; same as $SLURM_JOB_ID | | |
| $SLURM_JOB_NAME | Job name | myjob | $PBS_JOBNAME |
| $SLURM_SUBMIT_DIR | Submit directory | /u/home/username | $PBS_O_WORKDIR |
| $SLURM_JOB_NODELIST | Nodes assigned to the job | farm190[1-5] | cat $PBS_NODEFILE |
| $SLURM_SUBMIT_HOST | Host submitted from | ifarm1802.jlab.org | $PBS_O_HOST |
| $SLURM_JOB_NUM_NODES | Number of nodes allocated to the job | 2 | $PBS_NUM_NODES |
| $SLURM_CPUS_ON_NODE | Number of cores per node | 8,3 | $PBS_NUM_PPN |
| $SLURM_NTASKS | Total number of cores for the job | 11 | $PBS_NP |
| $SLURM_NODEID | Index of the node running on, relative to the nodes assigned to the job | 0 | $PBS_O_NODENUM |
| $SLURM_LOCALID | Index of the core running on within the node | 4 | $PBS_O_VNODENUM |
| $SLURM_PROCID | Index of the task relative to the job | 0 | $PBS_O_TASKNUM - 1 |
| $SLURM_ARRAY_TASK_ID | Job array index | 0 | $PBS_ARRAYID |
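To see these variables in action, a small batch script such as the following sketch (the resource requests are arbitrary) can print them from inside a job:

```bash
#!/bin/bash
#SBATCH --job-name=envtest
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# Print the job-wide SLURM environment variables described in the table above.
echo "Job ID:      $SLURM_JOB_ID"
echo "Job name:    $SLURM_JOB_NAME"
echo "Submit dir:  $SLURM_SUBMIT_DIR"
echo "Node list:   $SLURM_JOB_NODELIST"
echo "Submit host: $SLURM_SUBMIT_HOST"
echo "Num nodes:   $SLURM_JOB_NUM_NODES"
echo "Total tasks: $SLURM_NTASKS"

# Per-task variables such as $SLURM_PROCID, $SLURM_NODEID, and
# $SLURM_LOCALID are set individually for each task launched by srun.
srun bash -c 'echo "task $SLURM_PROCID on node $SLURM_NODEID, local rank $SLURM_LOCALID"'
```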