Batch System (Auger)

The batch system provides a large computing resource to the JLab community. It is a high-throughput system rather than primarily an interactive one, although interactive nodes are available. It is tuned to get as much work done per day as possible, which sometimes means sacrificing turnaround time for a single user in order to achieve the highest overall throughput. The batch queuing system is configured to balance the competing demands on the system, and it is re-tuned after major changes in configuration or in science programs (e.g. installation of new hardware, start-up of a hall, an unexpected physics opportunity).

Auger

Auger is the software that manages the batch farm. It uses Slurm as the underlying batch queuing system; Slurm is an open-source resource manager that controls batch jobs and compute nodes. Auger is a front end that lets users submit jobs to the batch system and stage files to and from the compute nodes. The Auger commands are installed in /site/bin, which should already be in every CUE user's path.

In addition to Slurm queues and accounts, Auger introduces the notion of "tracks", such as "debug", "analysis" and "simulation". The track is used for internal computing-usage statistics, and it also maps to a batch system queue (partition).
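As a rough illustration of how tracks relate to partitions, the sketch below encodes only the track names mentioned on this page and the partitions they are said to go to. The table and the function name `partition_for_track` are hypothetical; they are not part of Auger, and Auger's real configuration may define more tracks than these.

```python
# Hypothetical track -> partition table, built only from the tracks named on
# this page. Auger's real configuration may contain additional tracks.
TRACK_TO_PARTITION = {
    "debug": "priority",
    "test": "priority",
    "analysis": "production",
    "simulation": "production",
    "reconstruction": "production",
    "one_pass": "production",
}


def partition_for_track(track: str) -> str:
    """Return the partition a track maps to (illustrative only)."""
    try:
        return TRACK_TO_PARTITION[track]
    except KeyError:
        raise ValueError(f"unknown track: {track!r}") from None


if __name__ == "__main__":
    for name in ("debug", "simulation", "one_pass"):
        print(name, "->", partition_for_track(name))
```

Raising on an unknown track, instead of guessing a default partition, is a choice made for this sketch.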

There are three partitions: production, priority and ifarm. You can view their status at https://scicomp.jlab.org/scicomp/#/slurmFarmJobs/queueInfo.

production Partition

Jobs submitted with tracks such as analysis, simulation, reconstruction and one_pass go to the production partition. The default walltime is 1 day (24 hours) and the maximum is 3 days (72 hours).
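The default/maximum walltime rule can be written as a small check. This is a sketch, not Auger's implementation: the table holds the default and maximum walltimes given on this page, `resolve_walltime` is a hypothetical name, and refusing an over-limit request (rather than capping it or leaving the job pending) is an assumption made for illustration.

```python
from datetime import timedelta
from typing import Optional

# (default, maximum) walltime per partition, as stated on this page.
WALLTIME_LIMITS = {
    "production": (timedelta(hours=24), timedelta(hours=72)),
    "priority": (timedelta(hours=4), timedelta(hours=24)),
    "ifarm": (timedelta(hours=4), timedelta(hours=24)),
}


def resolve_walltime(partition: str, requested: Optional[timedelta] = None) -> timedelta:
    """Return the walltime a job would get in `partition`.

    No request -> the partition default. A request above the maximum is
    rejected here; that rejection behavior is an assumption of this sketch.
    """
    default, maximum = WALLTIME_LIMITS[partition]
    if requested is None:
        return default
    if requested <= timedelta(0):
        raise ValueError("walltime must be positive")
    if requested > maximum:
        raise ValueError(f"{requested} exceeds the {partition} maximum of {maximum}")
    return requested


if __name__ == "__main__":
    print(resolve_walltime("production"))
    print(resolve_walltime("production", timedelta(hours=48)))
```

For example, a production job with no walltime request gets 24 hours, while a request for 96 hours is refused by this check.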

priority Partition

Jobs submitted with the debug and test tracks go to the priority partition. The maximum CPU time in the priority partition is 4 hours. The default walltime is 4 hours and the maximum is 24 hours. At most 32 priority jobs can run concurrently, and each user may have at most 16 jobs queued at any one time.
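The two job-count limits of the priority partition can be sketched as follows. This is a simplified model under stated assumptions: the 32-job cap is treated as partition-wide, "queued" is taken to mean jobs waiting to run, and pending jobs are started in list order. The real scheduler also weighs node resources and priority, which this sketch ignores. The function names are hypothetical.

```python
from collections import Counter
from typing import List

MAX_RUNNING_PRIORITY = 32  # concurrent priority jobs (assumed partition-wide)
MAX_QUEUED_PER_USER = 16   # queued jobs per user at any one time


def can_queue(user: str, queued_by_user: Counter) -> bool:
    """True if `user` may add another job to the priority queue."""
    return queued_by_user[user] < MAX_QUEUED_PER_USER


def jobs_that_can_start(pending: List[str], running: int) -> List[str]:
    """Return the pending jobs, in order, that fit under the running-job cap.

    Simplification: ignores node resources and scheduler priority.
    """
    free_slots = max(0, MAX_RUNNING_PRIORITY - running)
    return pending[:free_slots]


if __name__ == "__main__":
    queued = Counter({"alice": 16, "bob": 3})
    print(can_queue("alice", queued), can_queue("bob", queued))
    print(jobs_that_can_start(["j1", "j2", "j3"], running=30))
```

With 30 priority jobs already running, only two more can start; a user who already has 16 jobs queued cannot queue another.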

ifarm Partition

This partition is for interactive jobs: users can allocate a node to test and debug code interactively. The default walltime is 4 hours and the maximum is 1 day (24 hours).
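This page does not say which command JLab users should run to get an interactive node. Purely as an assumption, the sketch below shows how a request could look with plain Slurm `srun`, using its real `--partition`, `--time` and `--pty` options, and formats the walltime in Slurm's `[days-]hours:minutes:seconds` form. The function names are hypothetical, and the site may instead expect `salloc` or a local wrapper.

```python
from datetime import timedelta
from typing import List

IFARM_MAX = timedelta(hours=24)  # maximum ifarm walltime from this page


def slurm_time(walltime: timedelta) -> str:
    """Format a duration as Slurm's [days-]hours:minutes:seconds string."""
    total = int(walltime.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}-{hms}" if days else hms


def interactive_command(walltime: timedelta = timedelta(hours=4)) -> List[str]:
    """Build (but do not run) an srun command for an interactive ifarm shell.

    Assumption: plain srun is usable on the ifarm partition at this site.
    """
    if walltime <= timedelta(0) or walltime > IFARM_MAX:
        raise ValueError("ifarm walltime must be between 0 and 24 hours")
    return ["srun", "--partition=ifarm", f"--time={slurm_time(walltime)}", "--pty", "bash"]


if __name__ == "__main__":
    print(" ".join(interactive_command()))
```

The returned list could in principle be passed to `subprocess.run` on a host with the Slurm client tools installed; building the argument list separately keeps the limit check testable without touching the cluster.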