Keywords
All keyword fields are specified as strings. Where a field can contain a list, the list elements should normally be separated by spaces (or new lines); several elements may appear on a single line, and the list may continue over several lines. There are only three mandatory keywords; all others are optional and default (where appropriate) to something reasonable. Blank lines are ignored, and lines with '#' in column 1 are treated as comments (and ignored).
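For example, the following fragment (with hypothetical file paths) illustrates a comment line, a blank line, and a list keyword whose elements continue over several lines:

```
# Lines starting with '#' in column 1 are comments and are ignored.

INPUT_FILES: /work/experiment/raw/run001.data /work/experiment/raw/run002.data
             /work/experiment/raw/run003.data
```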
Keyword | Default | Function |
---|---|---|
Mandatory | ||
PROJECT | none | Project to which the time should be accounted. See Valid Projects for a list of allowed projects. |
TRACK | none | See Batch Job Tracks for information on allowed tracks. |
COMMAND | none | The command to be run. Can be a system command (like 'ls') or a user script. For user commands specify the full path name. |
Optional | ||
OS | centos77 | Which OS type the jobs should execute on: centos77 or general (for any OS variant). |
NODE_TAG | none | Which type of machine the jobs should execute on: farm14, farm16, farm18, or farm19. |
COMMAND_COPY | none | If present, copy the file specified by the COMMAND keyword to the local disk. Useful if the command is an executable rather than a script. |
JOBNAME | none | Name for the jobs. Used only as a label. Can be any string, but must not include spaces, must be 50 characters or less, and must start with a letter. |
MAIL | none | List of e-mail addresses to which to send results. The first addressee will receive mails from each individual job running under LSF as well as a summary mail from the server; the remaining addressees will only receive the summary mail. If the keyword is missing (or contains no addresses), the submitting user will receive all mails. |
TIME | 1440 | Time limit (minutes) for each individual job. The default is 1440 minutes (24 hours); the maximum value allowed for tracks other than the debug/test/theory tracks is 72 hours. |
OPTIONS | none | Command line options associated with the command to be run. |
INPUT_FILES | none | A list of files to be processed. The full path names of the files should be given. The elements of the list should be separated with spaces; they may all be on one line or run over several lines. Each file will result in a job on the farm, and the file will be copied to the local disk for processing. Please see this note for details on how the farm can automatically cache the input files for your jobs by specifying "/mss" stub paths as the input. |
SINGLE_JOB | false | Specify this keyword (no parameters) to force a single job to process all the input files. The default is to process each input file in a separate job. See note 2 below. |
MULTI_JOBS | 1 | If only 1 or no input files are given, run the job this many times. If the input file list contains 2 or more files, this keyword is ignored. See note 2 below. |
OTHER_FILES | none | Any other files that (for efficiency) should be copied to the farm nodes. These files will be copied to all nodes. This may include, for instance, an executable program that is run by a user script given in the 'COMMAND' keyword. |
INPUT_DATA | none | The name of the input file on the farm node. Each file given in the 'INPUT_FILES' list will be copied to this name on the farm. The user program should take its input from this filename. It is local to the farm and no pathname should be given. If this keyword is not given, each input file in the 'INPUT_FILES' list will be copied to the local disk on the farm under its own name. See note 1 below. |
OUTPUT_DATA | none | The name of the output data file generated by the program (this is a local file on the farm; no pathname is needed). This keyword should be given in pairs with the OUTPUT_TEMPLATE keyword. If the filename contains a wildcard character (*), then all matching files will be copied; in that case only one OUTPUT_TEMPLATE should be given, and it must be in the form "/directory/path/@OUTPUT_DATA@". See note 3 below. |
OUTPUT_TEMPLATE | none | The template filename for each output file. Each file with the name given in 'OUTPUT_DATA' will be copied to a file with this name, with the file extension given by the input file. If output is going to tape, this template should be for the OSM stub files. If the filename part of the template is "@OUTPUT_DATA@", the created file will have the same name as the output file on the farm node. If the filename part of the template is "*" or "@INPUT_DATA@", the created file will have the same name as the input file. See the examples for clarity. See note 3 below. |
CPU | 1 | The number of CPU cores a job needs. |
DISK_SPACE | 4 GB | The amount of disk space that your job will require. Auger will ensure that the machine your job runs on has at least the specified amount of disk space available. The disk space value you specify must be an integer and must have a unit (MB or GB) after the number, with a space in between (e.g. 15 GB). |
MEMORY | 512 MB | The amount of memory that your job will require. Auger will ensure that the machine your job runs on will have at least the specified amount of memory available. The memory value you specify must be an integer and must have a unit (MB) after the number with a space in between. |
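Putting the keywords together, a command file for a typical job might look like the following sketch. The project, track, script path, options, and file names are hypothetical placeholders; see Valid Projects and Batch Job Tracks for the allowed values.

```
PROJECT: myproject
TRACK: analysis
JOBNAME: run001_analysis
OS: centos77
TIME: 720
DISK_SPACE: 10 GB
MEMORY: 1024 MB
COMMAND: /home/user/analyze.csh
OPTIONS: -v
INPUT_FILES: /mss/experiment/raw/run001.data
INPUT_DATA: fort.11
OUTPUT_DATA: results.dat
OUTPUT_TEMPLATE: /work/experiment/results/@OUTPUT_DATA@
```

Here the single input file is copied to the farm node as fort.11, and the output file results.dat is copied back to /work/experiment/results/results.dat when the job finishes.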
Notes:
1. Use of the INPUT_DATA keyword:
To copy a data file called /work/experiment/raw/run001.data to the local disk on the farm, specify:
INPUT_FILES: /work/experiment/raw/run001.data
To copy the data file /work/experiment/raw/run001.data to the local disk on the farm under a different name, specify:
INPUT_FILES: /work/experiment/raw/run001.data
INPUT_DATA: fort.11
Locally, the file will be named fort.11. This is useful if you want to generate many jobs from a list of input files. One job will be generated for each input file and run on a different farm machine, but all jobs will expect to read data from a file called "fort.11" (or whatever you care to call it), as sketched below.
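For instance, a fragment like the following (with hypothetical run numbers) would generate three jobs, and each job would copy its own raw file to the local disk as fort.11 before the command runs:

```
INPUT_FILES: /work/experiment/raw/run001.data
             /work/experiment/raw/run002.data
             /work/experiment/raw/run003.data
INPUT_DATA: fort.11
```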
2. Single vs. multiple jobs.
By default the system will generate one job for each data file you specify with the INPUT_FILES keyword. Use the following to change this behaviour:
SINGLE_JOB: Forces all the input files specified to be processed in one single job. All the specified files will be copied to the farm machine's local disk before the job starts executing. Remember that there is a limit of 4 GB of disk space per job. It probably does not make sense to specify the INPUT_DATA keyword in this case (see note 1 above).
MULTI_JOBS: If a single file is (or no files are) specified as INPUT_FILES, run the job this many times, one job per machine. This may be useful, for example, for simulations running from the same input file of generated events (it is up to you to make sure that the random number seeds are different...), or for testing.
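As a sketch (with hypothetical project, track, and paths), the following would run the same simulation script five times, each job on its own farm node, all reading the same event file:

```
PROJECT: myproject
TRACK: simulation
COMMAND: /home/user/run_sim.csh
INPUT_FILES: /work/experiment/sim/events.dat
MULTI_JOBS: 5
```

Each of the five jobs copies events.dat to its local disk; making the results statistically independent (for example, by varying the random number seed) is up to the user script.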
3. Output files
The keywords OUTPUT_DATA and OUTPUT_TEMPLATE can be used in pairs to control the disposition of output files. Each OUTPUT_DATA keyword given should have a matching OUTPUT_TEMPLATE. The OUTPUT_DATA is the name of the output file locally on the farm machine disk. The corresponding OUTPUT_TEMPLATE directs the copying of the output file to a mass storage device (the silo or a work area). There are several points to note: