The Auger batch system provides a large computing resource to the JLab community. It is a high-throughput system, not primarily an interactive one, although interactive nodes are available. It is tuned to get as much work done per day as possible, which sometimes means compromising turnaround time for a single user in order to achieve the highest overall throughput. The batch queuing system is configured to balance the competing demands on the system, and is re-tuned after major changes in configuration or in the science program (e.g. installation of new hardware, start-up of a hall, or an unexpected physics opportunity).
Auger is the software that manages the batch farm. It uses SLURM, an open-source resource manager that provides control over batch jobs and compute nodes, as the underlying batch queuing system. Auger acts as a front end to SLURM: it lets users submit jobs and stages input files to and from the compute nodes. The Auger commands are installed under /site/bin, which should be in any CUE user's path.
In addition to SLURM queues and accounts, Auger introduces the notion of "tracks", such as "debug", "analysis", "reconstruction", and "simulation". The track is used for internal computing usage statistics and also maps to a batch system queue/partition.
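As a hedged illustration of how a job reaches the farm, the sketch below writes a small Auger command file and submits it through the jsub front end. The project name, script path, data paths, and memory request are placeholders, and the exact set of supported keywords should be confirmed against the current Auger documentation. Swapping `TRACK: analysis` for `TRACK: debug` would route the same job to the debug track for quick tests.

```shell
# Hypothetical Auger command file; project, paths, and resource values
# are placeholders for illustration only.
cat > analysis.jsub <<'EOF'
PROJECT: myproject
TRACK: analysis
JOBNAME: example_analysis
COMMAND: /home/user/run_analysis.sh
INPUT_FILES: /mss/hallx/run123/raw.evio
OUTPUT_DATA: results.root
OUTPUT_TEMPLATE: /volatile/hallx/user/results.root
MEMORY: 2 GB
EOF

# Submit through the Auger front end (installed under /site/bin)
/site/bin/jsub analysis.jsub
```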
There are many partitions; production, priority, and ifarm are a few examples. You can view their up-to-date status at https://scicomp.jlab.org/scicomp/slurmJob/slurmInfo, or query it from the command line as sketched after the table.
Detailed information about each partition is given below:
Partition Name | Job types | Max Concurrent Running Jobs | Max Queued Jobs per user | Max CPU Time (hours) | Default Walltime (hours) | Max Walltime (hours) |
---|---|---|---|---|---|---|
production | Analysis, simulation, reconstruction and one_pass jobs. | - | - | - | 24 | 72 |
priority | Debug and test track jobs. | 32 | 16 | 4 | 4 | 24 |
ifarm | Interactive jobs for testing and debugging code. | - | - | - | 4 | 24 |
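For a command-line view of the same information, the standard SLURM client tools can be used from an interactive node. The sketch below assumes only the partition names listed in the table; the exact output columns depend on the local SLURM configuration.

```shell
# Summarize the state of the partitions listed above (availability, time limits, nodes)
sinfo -p production,priority,ifarm

# Show the full set of limits configured for one partition
scontrol show partition production

# List your own pending and running jobs
squeue -u $USER
```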