Slurm Accounts:
A slurm account is used to charge farm CPU time to the correct computing project. Every user who works with the slurm system is a member of at least one slurm account. The account for a job must be included in the job submission script using "#SBATCH --account=account_name". The list of accounts and the users in each account can be found on the Slurm Accounts web page.
If you have not submitted jobs to slurm before, you may need to submit a request (or contact your hall computing coordinator) to have your username associated with a slurm account. Some users are associated with more than one account. This happens, for example, when a user works on experiments in more than one hall.
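For example, a minimal submission script that charges its CPU time to a particular account could look like the sketch below (the account name "myproject", the job name, and the payload commands are placeholders for illustration only):

    #!/bin/bash
    #SBATCH --job-name=account_example
    #SBATCH --account=myproject               # replace with one of your slurm accounts
    #SBATCH --output=account_example_%j.out   # %j expands to the slurm job ID

    # Payload: replace with the real work of the job
    hostname
    date

Submit the script with "sbatch account_example.sh"; slurm then charges the job's CPU time to the myproject account.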
Slurm Partitions:
A partition in slurm is a collection of machines with common capabilities for running jobs; it can be thought of as a queue, since jobs are submitted to a partition. There are three major partitions available: production, priority, and ifarm. The production partition, which is the default, should be used for most jobs. The priority partition is for quick turnaround of short-running jobs. The ifarm partition is used for interactive access to one or more cores (Note: an interactive session may not be available instantly, because an interactive slurm job competes with other interactive jobs for the available interactive computing resources). A job can request nodes in a particular partition by specifying the partition name in the sbatch submission script with the "#SBATCH --partition=partition_name" option. For the most up-to-date partition configuration, please refer to Slurm Partition Info.
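For instance, a short test job could be directed to the priority partition with directives like the following (a sketch; the account name and payload are placeholders):

    #!/bin/bash
    #SBATCH --job-name=quick_test
    #SBATCH --partition=priority    # quick turnaround for short-running jobs
    #SBATCH --account=myproject     # placeholder account name

    # Payload: a short-running task
    echo "Running on $(hostname)"

An interactive session on the ifarm partition can be requested with standard slurm syntax such as "srun --partition=ifarm --pty bash", keeping in mind the note above that the session may not start immediately.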
Resources:
The farm is not completely uniform. Nodes vary in the amount of memory, the amount of local disk space, and other hardware capabilities such as GPU accelerators. To match jobs that need specific resources to the farm nodes that provide them, slurm supports node features. Farm nodes have features assigned; specify which features a job requires using "#SBATCH --constraint=feature1,feature2". The features currently available are: general, centos77, gpu, T4, TitanRTX, xeon, amd, farm19, farm18, farm16, and farm14. This option is used to select the OS of the compute node (such as centos77), the hardware type (such as farm18, farm16, farm14), and a GPU node as well as the GPU type (such as gpu, T4, TitanRTX). Use the command /site/bin/slurmHosts to display the most up-to-date resource information for every node in the farm Slurm cluster. The node features can also be found on the Slurm Node Features page.
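As an illustration, a job that must run on farm18 hardware with the centos77 operating system could declare both features using the comma-separated form described above (a sketch; the account name and payload are placeholders):

    #!/bin/bash
    #SBATCH --account=myproject           # placeholder account name
    #SBATCH --partition=production
    #SBATCH --constraint=farm18,centos77  # require farm18 nodes running centos77

    # Payload: report which node was assigned
    hostname

Before choosing constraints, running /site/bin/slurmHosts is a convenient way to see which features each node actually advertises.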
Memory Request:
Request physical memory per core for a job by specifying "#SBATCH --mem-per-cpu=xyzM", where M stands for megabytes. This directive explicitly requests xyz megabytes of physical memory per core for the submitted job. If the option is not specified, the job gets the default memory per core of the partition to which it is submitted, which is very small.
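For example, a job that asks for 4 cores with 2000 megabytes per core (arbitrary illustrative values) would use directives like the following, for 8000 megabytes in total:

    #SBATCH --cpus-per-task=4     # 4 cores for the task
    #SBATCH --mem-per-cpu=2000M   # 2000 MB of physical memory per core, 8000 MB total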
Walltime Request:
Request a proper running time for a job by specifying "#SBATCH --time=<time>", where <time> can be in the form of "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", or "days-hours:minutes:seconds". Without a specified time limit, the job uses the default time limit of the partition to which it is submitted. Relying on the default time limit of a partition will likely delay the start of the job, because the scheduler must assume the job needs the full default time. The default time limits for the partitions can be found at Slurm Partition Info. If the specified time limit of a job is larger than the maximum time limit of the partition to which the job is submitted, the job will never run and will remain in a pending state.
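For example (illustrative values only), each of the following lines requests a specific wall time in one of the accepted formats:

    #SBATCH --time=30             # 30 minutes
    #SBATCH --time=2:30:00        # 2 hours and 30 minutes
    #SBATCH --time=1-12:00:00     # 1 day and 12 hours

Choosing a realistic time limit slightly above the expected run time helps the scheduler start the job sooner than relying on the long partition default would.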