Batch System

The batch system provides a large computing resource to the JLab community. It is a high-throughput system rather than primarily an interactive one, although interactive nodes are available. It is tuned to complete as much work per day as possible, which sometimes means sacrificing turnaround time for a single user in favor of the highest overall throughput. The batch queuing system is configured to balance the competing demands on the system, and it is re-tuned after major changes in configuration or in the science programs (e.g. installation of new hardware, start-up of a hall, or an unexpected physics opportunity).

Auger

Auger is the software that manages the batch farm. It uses PBS (Portable Batch System) as the underlying batch queuing system. PBS is an open source resource manager that provides control over batch jobs and computing nodes. Auger is a front end to the batch system that lets users submit jobs and stage input files to and from the compute nodes. The Auger commands are in /site/bin, which should be in any CUE user's path.

In addition to PBS queues and accounts, Auger introduces the notion of "tracks", such as "debug", "analysis" and "simulation". A track is used for internal computing usage statistics, and it also maps to a batch system queue.

There are three queues: prod64, priority and longJob. You can view their status at https://scicompnew.jlab.org/scicomp/#/operations/nodes.

prod64 Queue

Jobs with tracks such as analysis, simulation, reconstruction and one_pass go to the prod64 queue. The default walltime is 1 day (24 hours) and the maximum walltime is 3 days (72 hours).

priority Queue

Jobs submitted using the debug and test tracks go to the priority queue. The maximum CPU time for the priority queue is 4 hours. The default walltime is 4 hours and the maximum walltime is 24 hours. At most 32 priority jobs can run concurrently, and each user is limited to 16 queued jobs at any one time.

longJob Queue

Jobs submitted using the theory track go to the longJob queue. The maximum walltime for this queue is 7 days. Currently, the longJob queue is configured for a maximum of 50 jobs overall and 20 jobs per user. Use this queue only when the job must run longer than 3 days.
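
The track-to-queue rules and walltime limits described above can be summarized in a short Python sketch. This is only an illustration of the rules as stated on this page, not part of Auger: the names QueueLimits, QUEUES, TRACK_TO_QUEUE, queue_for_track and walltime_fits are hypothetical, and the track list for prod64 may be incomplete because the text only gives examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueLimits:
    """Walltime limits for one queue, in hours (hypothetical helper type)."""
    name: str
    default_walltime_h: Optional[int]  # None where this page states no default
    max_walltime_h: int


# Limits as stated on this page.
QUEUES = {
    "prod64": QueueLimits("prod64", 24, 72),
    "priority": QueueLimits("priority", 4, 24),
    "longJob": QueueLimits("longJob", None, 7 * 24),
}

# Track-to-queue mapping as stated on this page. The prod64 tracks are
# introduced with "such as", so this list is an assumption and may be partial.
TRACK_TO_QUEUE = {
    "analysis": "prod64",
    "simulation": "prod64",
    "reconstruction": "prod64",
    "one_pass": "prod64",
    "debug": "priority",
    "test": "priority",
    "theory": "longJob",
}


def queue_for_track(track: str) -> QueueLimits:
    """Return the limits of the queue that a given track maps to."""
    return QUEUES[TRACK_TO_QUEUE[track]]


def walltime_fits(track: str, walltime_hours: float) -> bool:
    """Check whether a requested walltime is within the queue's maximum."""
    return 0 < walltime_hours <= queue_for_track(track).max_walltime_h
```

For example, walltime_fits("simulation", 96) returns False because prod64 allows at most 72 hours, which is the case where the theory track and the longJob queue would be the appropriate choice.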