SLURM Batch Scheduler (for Users)

  1. SLURM on Jefferson Lab Farm Clusters
  2. SLURM Commands
  3. SLURM User Accounts
  4. SLURM Environment Variables
  5. Additional useful information

1. SLURM on Jefferson Lab Farm Clusters

SLURM (Simple Linux Utility for Resource Management) is a powerful open-source, fault-tolerant, and highly scalable resource manager and job scheduling system, currently developed by SchedMD. Initially developed for large Linux clusters at Lawrence Livermore National Laboratory, SLURM is now used extensively on many of the Top 500 supercomputers around the globe.


2. SLURM Commands

To run SLURM commands with the appropriate accounts and resources, you must first log in to the appropriate submit host (see Start Here in the graphics above).
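For example, you can log in to a submit host with ssh. The hostname below is taken from the example value in the environment variable table in section 4, and johndoe is the example user from section 3; substitute the submit host and username for your own situation:

[@ ~]$ ssh johndoe@ifarm1802.jlab.org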

  • scontrol and squeue: Control and monitor jobs.
  • sbatch: Submit a batch job (see the example script after this list).
  • salloc: Request an interactive job session.
  • srun: Launch a job or a job step.
  • sinfo: Show node information and cluster status.
  • sacct: Report accounting data for jobs and job steps.
  • Useful environment variables include $SLURM_NODELIST and $SLURM_JOBID (see section 4).
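As an illustration, here is a minimal batch script and its submission with sbatch. The script name myjob.sh and the partition name production are hypothetical placeholders, not site defaults; the reported job ID matches the example value used in section 4:

[@ ~]$ cat myjob.sh
#!/bin/bash
#SBATCH --job-name=myjob        # job name ($SLURM_JOB_NAME)
#SBATCH --partition=production  # hypothetical partition name
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks=1              # number of tasks
#SBATCH --time=01:00:00         # wall-time limit (HH:MM:SS)
echo "Running on $(hostname)"
[@ ~]$ sbatch myjob.sh
Submitted batch job 5741192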

3. SLURM User Accounts

To check your “default” SLURM account, use the following command:

[@ ~]$ sacctmgr list user name=johndoe
      User   Def Acct       Admin  
----------  ---------- ----------
   johndoe    project       None 

To check all of the SLURM accounts you are associated with, use the following command:

[@ ~]$ sacctmgr list user name=johndoe withassoc       
User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS 
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------    
johndoe   projectx      None     scicomp   projecta                    1                                                                            normal       normal
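If you are associated with more than one account, you can submit a job under a non-default account using sbatch's -A (--account) option. The account name projecta below is taken from the example output above, and myjob.sh is the hypothetical script from section 2:

[@ ~]$ sbatch -A projecta myjob.sh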

4. SLURM Environment Variables

Variable Name          Description                                                Example Value       PBS/Torque analog
---------------------  ---------------------------------------------------------  ------------------  ------------------
$SLURM_JOB_ID          Job ID                                                     5741192             $PBS_JOBID
$SLURM_JOBID           Deprecated; same as $SLURM_JOB_ID
$SLURM_JOB_NAME        Job name                                                   myjob               $PBS_JOBNAME
$SLURM_SUBMIT_DIR      Directory the job was submitted from                       /u/home/username    $PBS_O_WORKDIR
$SLURM_JOB_NODELIST    Nodes assigned to the job                                  farm190[1-5]        cat $PBS_NODEFILE
$SLURM_SUBMIT_HOST     Host the job was submitted from                            ifarm1802.jlab.org  $PBS_O_HOST
$SLURM_JOB_NUM_NODES   Number of nodes allocated to the job                       2                   $PBS_NUM_NODES
$SLURM_CPUS_ON_NODE    Number of cores per node                                   8,3                 $PBS_NUM_PPN
$SLURM_NTASKS          Total number of cores for the job                          11                  $PBS_NP
$SLURM_NODEID          Index of the node, relative to nodes assigned to the job   0                   $PBS_O_NODENUM
$SLURM_LOCALID         Index of the core within the node                          4                   $PBS_O_VNODENUM
$SLURM_PROCID          Index of the task, relative to the job                     0                   $PBS_O_TASKNUM - 1
$SLURM_ARRAY_TASK_ID   Job array index                                            0                   $PBS_ARRAYID
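As a minimal sketch of how these variables might be used inside a batch script (the script name and output file name here are hypothetical examples):

[@ ~]$ cat envdemo.sh
#!/bin/bash
#SBATCH --job-name=envdemo
# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
# Record the job ID, job name, and assigned nodes in a per-job log file
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) on nodes: $SLURM_JOB_NODELIST" > "job_${SLURM_JOB_ID}.log"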

5. Additional useful information