FAQ

How does Auger handle different types of input files?

There are two different ways that a file is staged to a farm node. 
  1. If an input file is from the tape library (/mss/xxx... or <Input src="mss:/mss/xxx...">), Auger will jcache the file first, then make a link in the PBS WORKING DIR to the file in /cache/mss/. In this case, the input file is not counted against the job's disk allocation.
  2. If an input file is on /home, /volatile, or /work, Auger will copy it to the PBS WORKING DIR on the local disk of a node. In this case the input file does count toward the job's disk usage, so please request enough disk space (to hold both input and output files) when submitting the job. A sketch combining both cases follows this list.
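For illustration, a flat-format jsub sketch that stages one file from tape and one from /work might look like the following. This is a sketch, not a tested script: the project, track, and file names are placeholders, and the INPUT_FILES and DISK_SPACE tag spellings are assumptions that should be checked against the Auger Examples.

  PROJECT: myproject
  TRACK: analysis
  INPUT_FILES: /mss/home/user/run1.dat /work/myexp/table.dat
  DISK_SPACE: 2 GB
  COMMAND: ./myprog run1.dat table.dat

Here run1.dat is jcached and linked rather than copied, so only table.dat (plus any output) needs to fit within the requested disk space.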

What is the best practice for submitting a large number of I/O-intensive jobs?

If a farm job reads and writes its input/output files intensively, it is recommended to access files stored on the local disk of the node. If an input file is on /home, /work, or /volatile, please use the jsub INPUT tag so that Auger will copy the file to the farm node. If the input file comes from the tape system and the job will access it repeatedly, it is recommended to copy the file to the assigned farm node yourself. One can easily accomplish this with two lines at the beginning of the job, as in the sketch below: remove the link Auger created, then copy the file from /cache/mss/. Otherwise, the job will read and write this file directly over Lustre, and I/O that is small but intensive will slow down the Lustre file system dramatically. This is especially important when running many copies of one type of job, since the local disks in aggregate have five times the bandwidth and performance of the entire Lustre system.
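For example, if Auger staged a tape file and linked it into the working directory as run1.dat, these lines at the top of the job script would replace the Lustre link with a local copy (the file name and the /cache/mss path are placeholders):

  # replace the symlink into /cache/mss with a real copy on the node's local disk
  rm run1.dat
  cp /cache/mss/myexp/run1.dat .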

Job memory - how much does my job need? Can I just ask for more than enough?

Users can increase system throughput by keeping their jobs' memory footprint at or below 1 GB, since many compute nodes are memory lean at 0.7 GB per job slot, with a minority offering as much as 2 GB per job slot (single thread). To check usage, see $HOME/.farm_out/JOB_NAME.JOB_ID.out, which contains both the job's memory request and its actual usage. Asking for much more memory than is actually needed reserves it and keeps other jobs from running on the node, even though the memory is actually available.

Use multi-threaded code rather than multiple single-threaded jobs to use the available memory efficiently; this also allows jobs with a footprint as large as an entire node's memory (currently the largest is 32 GB).
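For example, a single multi-threaded job using 8 cores could request the memory of all 8 slots at once (flat jsub format, using the CPU and MEMORY tags described elsewhere in this FAQ; the core count, memory value, and program name are only illustrative):

  CPU: 8
  MEMORY: 8000 MB
  COMMAND: ./my_threaded_program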

Do I have to clean the .farm_out directory periodically?

Yes. Auger always puts a file named JOB_NAME.JOB_ID.out in the .farm_out directory under the user's home directory after a job finishes. Although these files are small (normally a few KB, at most 2 MB), after tens of thousands of jobs these .out files add up to a significant size, which may cause unexpected problems (for example, exceeding your disk quota). Cleaning the .farm_out directory once a month, or after a large batch of jobs, is recommended.
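One convenient way to do this is with find; for instance, to delete .out and .err files older than 30 days (adjust the age to taste):

  find $HOME/.farm_out -name '*.out' -mtime +30 -delete
  find $HOME/.farm_out -name '*.err' -mtime +30 -delete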

Why is my job killed by the PBS server?

The PBS server will kill a job if it exceeds its requested resource limits. The most common cases are exceeding the requested time limit (walltime) and exceeding the requested memory size. In these cases an error message like the following will appear in $HOME/.farm_out/JOBNAME.JOBID.err:

 PBS: job killed: walltime xxx exceeded limit xxxx
The default walltime limit is 24 hours and the maximum is 72 hours (3 days). The TIME option must be used if the job will run for more than one day. The default memory allocation is 256 MB. If your job needs more memory, you must use the MEMORY option in the jsub file; otherwise the job will be killed by the PBS server and you will find this message in $HOME/.farm_out/JOBNAME.JOBID.err:
PBS: job killed: vmem xxxxx exceeded limit xxxxxxx
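To avoid both kinds of kills, request the resources explicitly in the jsub file. For example, for a job that needs about two days and 1 GB of memory (a sketch in the flat format; TIME is assumed here to be in minutes, so 2880 = 48 hours; check the jsub documentation for the exact unit):

  TIME: 2880
  MEMORY: 1024 MB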

How do I submit a multi-threaded job?

Use the CPU: X tag (or <CPU core="X"/> in an XML script) to request X cores for a job. See the Auger Examples for more information.
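For example, to request 4 cores, either form works:

  CPU: 4               (flat jsub file)
  <CPU core="4"/>      (XML jsub file)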

How do I run a java program in a farm job?

Q: When I use the default java (64-bit) to run a java program, it works on ifarm but fails in a farm job with this error:
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
A: The 64-bit java allocates a large amount of virtual memory (>10 GB) for its VM, while a batch job has a small memory limit because it runs inside the PBS batch system. Thus, please use the 32-bit java under /apps with the -Xmx option to specify a maximum heap size when submitting a java job. For example, to specify 512 MB:
 /apps/scicomp/java/jdk1.7/bin/java -Xmx512m -version
 /apps/scicomp/java/jdk1.8/bin/java -Xmx512m -version
In addition, please add a "MEMORY xxx MB" tag in your jsub script to request enough memory (about 200 MB more than the maximum VM heap size requested with the -Xmx option), as in the sketch below.
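Putting this together, a jsub script for a java job with a 512 MB heap might contain the following (a sketch in the flat format; the project, track, and jar names are placeholders):

  PROJECT: myproject
  TRACK: analysis
  MEMORY: 712 MB
  COMMAND: /apps/scicomp/java/jdk1.8/bin/java -Xmx512m -jar myanalysis.jar

Here 712 MB is the 512 MB heap plus the recommended 200 MB headroom.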

Alternatively, use multi-threading; as noted above, a multi-threaded job can have a footprint as large as an entire node's memory, so it can run a much larger java virtual machine.