Batch System (Slurm)

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters large and small. It is used at many supercomputing sites and data centers around the world. The JLab farm deployed Slurm in early 2019, although it was initially hidden from users behind the Auger system. Users can now access Slurm directly from the farm interactive nodes.

Submitting Batch Jobs:
You can submit jobs from one of the interactive nodes or from within a running batch script. Batch jobs are submitted using the Slurm sbatch command with a valid project account. You can specify options on the command line, or (recommended) put all of them into your batch script file. See the sample scripts in one of the following sections; a minimal sketch also appears after the list below. In your batch script, please specify at least the following, plus any other options useful to your workflow.
  1. account, using -A, --account=<account>
  2. partition, which contains a set of nodes and functions like a queue, using -p, --partition=<partition_names>
  3. resources needed (number of nodes, node features, etc.), using -C, --constraint=<list> for the features of the desired nodes, -N, --nodes=<num_node> for multi-node jobs, -n, --ntasks=1 --cpus-per-task=<numcores> for a single-node job using multiple cores, and --mem-per-cpu=<memsize> for the memory (in megabytes) required per core
  4. wall time (specifying this more tightly than the default will improve your throughput), using -t, --time=<time>
  • Note: a computing core denotes a virtual processing core (hyper-threading), not a physical core. A typical physical core contains two virtual processing cores.
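
Below is a minimal sketch of such a batch script. The account name, partition, core count, memory size, wall time, and program name are placeholders; substitute values appropriate to your project and workload.

    #!/bin/bash
    #SBATCH --account=myproject        # placeholder; use your project account
    #SBATCH --partition=production     # one of the configured partitions
    #SBATCH --ntasks=1                 # single task (single-node job)
    #SBATCH --cpus-per-task=4          # number of (virtual) cores for the task
    #SBATCH --mem-per-cpu=2048         # memory per core, in megabytes
    #SBATCH --time=01:00:00            # wall time limit (HH:MM:SS)
    #SBATCH --job-name=myjob           # placeholder job name
    #SBATCH --output=myjob-%j.out      # %j expands to the Slurm job ID

    # Placeholder workload; replace with the commands your job actually runs.
    echo "Running on $(hostname)"
    ./my_program

The script is then submitted with sbatch, for example: sbatch myjob.sh. On success, sbatch prints the job ID assigned to the job.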
Currently three partitions are configured: general, production, and priority. Please use the Scicomp portal Job page for the status of active and recently finished jobs, as well as the most current partition information.
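
The standard Slurm sinfo command can also be used from an interactive node to list the configured partitions and the state of their nodes, for example:

    sinfo                    # summary of all partitions and node states
    sinfo -p production      # limit the output to a single partition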

Job Status and Other Information:
Once jobs are submitted to Slurm, use the squeue command to check the status of your jobs, the scancel command to cancel one or a list of jobs, and the scontrol command to hold jobs. For detailed information about Slurm commands, please consult the official Slurm documentation.
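
A few common examples, with <jobid> and <username> standing in for actual values:

    squeue -u <username>           # list your jobs and their states
    squeue -j <jobid>              # show a specific job
    scancel <jobid>                # cancel a single job
    scancel <jobid1> <jobid2>      # cancel several jobs at once
    scontrol hold <jobid>          # hold a pending job
    scontrol release <jobid>       # release a held job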