Auger-Slurm

The farm has been gradually migrating to the Slurm workload manager over the past few months, replacing the legacy PBS system. We are planning the final transition of farm nodes for Tuesday, June 4th. For most users this will not be a significant change because the primary submission mechanism for the farm will continue to be Auger or SWIF.

The Slurm farm nodes are currently running CentOS 7.2. They will be updated to 7.6 once the Slurm migration has been completed; that update will be a subsequent change, announced in advance.

How to use this system?

The Auger-Slurm commands are installed under /site/scicomp/auger-slurm/bin (the versions in /site/bin will be linked to these after June 4) and can be accessed from ifarm or any host where /site is mounted.

The jsub and jkill commands have exactly the same interfaces as those in the old Auger-PBS, so existing jsub scripts will work unchanged in the new Auger-Slurm. slurmHosts replaces the old farmHost and prints very different information about the computing nodes. slurmJobs replaces jobstat from the old PBS system; the notable change is that its output only contains jobs already in Slurm and does not include jobs still held in the Auger queue, so a newly submitted job appears in the slurmJobs output only after Auger submits it to Slurm, normally within a few minutes when the user has fewer than a few hundred pending jobs. jobinfo is similar to the old jobinfo but outputs more useful information, such as MaxRSS, MaxVMSize, AveDiskRead and AveDiskWrite.
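
For example, from ifarm (a minimal sketch; myjob.jsub is a placeholder script name, and the exact command options may differ, so check each command's built-in help):

    /site/scicomp/auger-slurm/bin/jsub myjob.jsub   # submit an existing jsub script unchanged
    /site/scicomp/auger-slurm/bin/slurmJobs         # list your jobs that are already in Slurm
    /site/scicomp/auger-slurm/bin/slurmHosts        # print computing node information (replaces farmHost)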

If you use swif to submit jobs, add the -slurm option when calling add-job to direct the jobs to Slurm. The -slurm option will become the default after June 4, 2019.
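
For example (a sketch; the workflow name and command are placeholders, and the other add-job options are whatever you already use):

    swif add-job -workflow my_workflow -slurm [other add-job options] my_command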

Where are the .out/.err files?

In the Auger-Slurm system, a job's default .out and .err files are no longer copied to the user's home directory. Instead, they are written directly to a central location under /farm_out/<user>, with the name pattern JOB_NAME-AUGER_JOB_ID-HOSTNAME.out and JOB_NAME-AUGER_JOB_ID-HOSTNAME.err. This file system is managed by software that deletes all files created more than two months ago. The change makes the .out and .err files available while the job is running, so a user can easily examine his/her batch job's progress.
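
For example, to watch a running job's output (a sketch; substitute the actual file name, which follows the pattern above):

    ls /farm_out/$USER/
    tail -f /farm_out/$USER/JOB_NAME-AUGER_JOB_ID-HOSTNAME.out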

When changing the default log file location using the <Stdout> and <Stderr> tags, please make sure the log directory already exists and the user has permission to write to it; otherwise the job will fail with the error "Job failed due to invalid stdout/stderr".
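
For example, first create the directory, then point the tags at it (a sketch; the paths are placeholders, and the dest attribute is assumed from common Auger XML usage):

    mkdir -p /volatile/myexp/myuser/log

    <Stdout dest="/volatile/myexp/myuser/log/myjob.out"/>
    <Stderr dest="/volatile/myexp/myuser/log/myjob.err"/>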

Are Slurm jobs grouped by input files on the tape library?

Yes, this is the same as for farm jobs in the PBS cluster. To keep things simple, however, we make one assumption: the files requested in the PBS cluster and in the Slurm cluster are different sets of data, so the tape scheduler orders the files requested in the two clusters separately.

How does Slurm establish the environment for my job?

Unlike PBS, Slurm processes are not run under a login shell, so the ~/.profile and ~/.bashrc scripts are not executed as part of the process launch. Users have to explicitly prepare their environment before starting their work. Please refer to the Experimental Nuclear Physics Computing document for detailed information on which file to source. Remember to run one of the following commands (depending on your shell) before calling "module load" to load the necessary modules.

                   source /etc/profile.d/modules.sh        # bash,  sh
                   source /etc/profile.d/modules.csh      # csh, tcsh
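
For example, the start of a bash job script might look like this (a minimal sketch; the module name is only an illustration):

    #!/bin/bash
    # Slurm does not start a login shell, so set up the environment explicitly.
    source /etc/profile.d/modules.sh
    # Load whatever modules the job actually needs (the name below is illustrative).
    module load gcc
    # ... the real work of the job follows ...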

How to request memory?

Unlike PBS, Slurm uses real (resident) memory to schedule and kill jobs, so users can reduce the memory requested in their jsub scripts. For example, a user who had to request 60 GB for a 16-core job in PBS may find that 20 GB is enough for the same job running on Slurm.
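
A flat-format jsub script for such a job might contain the following (a sketch; the project, track, and command are placeholders):

    PROJECT: myproject
    TRACK: analysis
    CPU: 16
    MEMORY: 20 GB
    COMMAND: ./run_analysis.sh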

What OS and NODE_TAG to use?

Users can use the slurmHosts client to find out the features of each computing node; the OS and node tags are listed among those features. Users can then use OS and NODE_TAG in their jsub scripts to request the nodes they want their jobs to land on.
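
For example, run slurmHosts to list the available features, then request matching nodes in the flat-format jsub script (a sketch; the feature values centos7 and farm18 are illustrative, and OS/NODE_TAG are assumed to be the usual flat jsub keywords):

    slurmHosts

    OS: centos7
    NODE_TAG: farm18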

How to request an exclusive node for the job?

This is a new feature added to the Auger-Slurm interface (not available in Auger-PBS). When CPU: 0 or the <CPU core="0" /> tag is used, Auger submits the job to Slurm in exclusive mode without requesting any memory or disk resources. An exclusive job will only land on an empty computing node, and no other job will be scheduled to that node.
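
For example, a flat-format jsub script requesting a whole node simply sets CPU to 0 and omits any memory or disk request (a sketch; the other fields are placeholders):

    PROJECT: myproject
    TRACK: simulation
    CPU: 0
    COMMAND: ./run_whole_node.sh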

How are farm job emails handled?

We made a change to the email handling in Auger-Slurm in late May: if the email option is specified in the jsub script, the batch farm now sends each user at most one email per hour, summarizing the disposition of the jobs that completed during that hour. This change was expedited because of ongoing problems with email from jlab.org being rated as junk when a burst of messages is sent.

What new features have been added to Auger-Slurm?

These are the new features added to Auger-Slurm:
  1. Request a whole node for a job using CPU: 0 or <CPU core="0" /> (see the question above on exclusive nodes).
  2. The Slurm jsub can take more than one jsub script file when submitting jobs to Slurm.
  3. A new optional attribute 'copyOption' (copy or link) has been introduced for the <Input> XML tag (see the sketch after this list). Refer to the Auger documentation for details.
  4. In Slurm, the old PBS production queue is split into two partitions (queues): general and production. Jobs submitted by production users (such as gluex, clas12, etc.) are routed to the production partition, which has higher priority than the general partition.
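
As a sketch of item 3 (the file names are placeholders, and the src/dest attributes are assumed from common Auger XML usage):

    <Input src="mss:/mss/myexp/raw/run001.evio" dest="run001.evio" copyOption="link"/>
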
Project and Track?

Please continue to use the same project and track (the current tracks may be mapped to a different set of queues during the test).