Keywords
All keyword fields are specified as strings. Where a field can contain a list, the list elements should normally be separated by spaces (or new lines); several elements may appear on a single line, and the list may continue over several lines. There are only three mandatory keywords; all others are optional and default (where appropriate) to something reasonable. Blank lines are ignored, and lines with '#' in column 1 are treated as comments (and ignored).
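For example, the following fragment (with hypothetical file paths) illustrates a comment line, a blank line, and a list keyword whose elements continue over several lines:

```
# Lines starting with '#' in column 1 are comments and are ignored.

INPUT_FILES: /work/experiment/raw/run001.data /work/experiment/raw/run002.data
             /work/experiment/raw/run003.data
```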
Keyword | Default | Function |
---|---|---|
Mandatory | ||
PROJECT | none | Project to which the time should be accounted. See Valid Projects for a list of allowed projects. |
TRACK | none | See Batch Job Tracks for information on allowed tracks. |
COMMAND | none | The command to be run. Can be a system command (like 'ls') or a user script. For user commands specify the full path name. |
Optional | ||
OS | centos77 | Which OS type the jobs should execute on: centos77 or general (for any OS variant). |
NODE_TAG | none | Which type of machine the jobs should execute on: farm14, farm16, farm18, or farm19. |
COMMAND_COPY | none | If present, copy the file specified by the COMMAND keyword to the local disk. Useful if the command is an executable rather than a script. |
JOBNAME | none | Name for the jobs. Used only as a label. Can be any string, but must not include spaces, must be 50 characters or less, and must start with a letter. |
MAIL | none | List of e-mail addresses to which to send results. The first addressee will receive mails from each individual job running under LSF as well as a summary mail from the server; the remaining addressees will only receive the summary mail. If the keyword is missing (or contains no addresses), the submitting user will receive all mails. |
TIME | 1440 | Time limit (minutes) for each individual job. The default is 1440 minutes (24 hours); the maximum value allowed for tracks other than the debug/test/theory tracks is 72 hours. |
OPTIONS | none | Command line options associated with the command to be run. |
INPUT_FILES | none | A list of files to be processed. The full path names of the files should be given. The elements of the list should be separated with spaces; they may all be on one line or run over several lines. Each file will result in a job on the farm, and the file will be copied to the local disk for processing. Please see this note for details on how the farm can automatically cache the input files for your jobs by specifying "/mss" stub paths as the input. |
SINGLE_JOB | false | Specify this keyword (no parameters) to force a single job to process all the input files. The default is to process each input file in a separate job. See note 2 below. |
MULTI_JOBS | 1 | If only 1 or no input files are given, run the job this many times. If the input file list contains 2 or more files, this keyword is ignored. See note 2 below. |
OTHER_FILES | none | Any other files that (for efficiency) should be copied to the farm nodes. These files will be copied to all nodes. This may include, for instance, an executable program that is run by a user script given in the 'COMMAND' keyword. |
INPUT_DATA | none | The name of the input file on the farm node. Each file given in the 'INPUT_FILES' list will be copied to this name on the farm. The user program should take its input from this filename. It is local to the farm and no pathname should be given. If this keyword is not given, each input file in the 'INPUT_FILES' list will be copied to the local disk on the farm under its own name. See note 1 below. |
OUTPUT_DATA | none | The name of the output data file generated by the program (this is a local file on the farm; no pathname is needed). This keyword should be given in pairs with the OUTPUT_TEMPLATE keyword. If the filename contains a wildcard character (*), then all matching files will be copied; in that case only one OUTPUT_TEMPLATE should be given, and it must be in the form "/directory/path/@OUTPUT_DATA@". See note 3 below. |
OUTPUT_TEMPLATE | none | The template filename for each output file. Each file with the name given in 'OUTPUT_DATA' will be copied to a file with this name, with the file extension given by the input file. If output is going to tape, this template should be for the OSM stub files. If the filename part of the template is "@OUTPUT_DATA@", the created file will have the same name as the output file on the farm node. If the filename part of the template is "*" or "@INPUT_DATA@", the created file will have the same name as the input file. See the examples for clarity. See note 3 below. |
CPU | 1 | The number of CPU cores a job needs. |
DISK_SPACE | 4 GB | The amount of disk space that your job will require. Auger will ensure that the machine your job runs on has at least the specified amount of disk space available. The disk space value you specify must be an integer and must have a unit (MB or GB) after the number, with a space in between (e.g. 15 GB). |
MEMORY | 512 MB | The amount of memory that your job will require. Auger will ensure that the machine your job runs on will have at least the specified amount of memory available. The memory value you specify must be an integer and must have a unit (MB) after the number with a space in between. |
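Putting the keywords together, a command file for a typical job might look like the following sketch. The project, track, script path, options, and file names are hypothetical placeholders; see Valid Projects and Batch Job Tracks for the allowed values.

```
PROJECT: myproject
TRACK: analysis
JOBNAME: run001_analysis
OS: centos77
TIME: 720
DISK_SPACE: 10 GB
MEMORY: 1024 MB
COMMAND: /home/user/analyze.csh
OPTIONS: -v
INPUT_FILES: /mss/experiment/raw/run001.data
INPUT_DATA: fort.11
OUTPUT_DATA: results.dat
OUTPUT_TEMPLATE: /work/experiment/results/@OUTPUT_DATA@
```

Here the single input file is copied to the farm node as fort.11, and the output file results.dat is copied back to /work/experiment/results/results.dat when the job finishes.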
Notes:
1. Use of the INPUT_DATA keyword:
To copy a data file called /work/experiment/raw/run001.data to the local disk on the farm, specify:
INPUT_FILES: /work/experiment/raw/run001.data
To copy the data file /work/experiment/raw/run001.data to the local disk on the farm under a different name, specify:
INPUT_FILES: /work/experiment/raw/run001.data
INPUT_DATA: fort.11
Locally, the file will be named fort.11. This is useful if you want to generate many jobs from a list of input files. One job will be generated for each input file and run on a different farm machine, but all jobs will expect to read data from a file called "fort.11" (or whatever you care to call it), as sketched below.
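For instance, a fragment like the following (with hypothetical run numbers) would generate three jobs, and each job would copy its own raw file to the local disk as fort.11 before the command runs:

```
INPUT_FILES: /work/experiment/raw/run001.data
             /work/experiment/raw/run002.data
             /work/experiment/raw/run003.data
INPUT_DATA: fort.11
```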
2. Single vs. multiple jobs.
By default the system will generate one job for each data file you specify with the INPUT_FILES keyword. Use the following to change this behaviour:
SINGLE_JOB: Forces all the input files specified to be processed in one single job. All the specified files will be copied to the farm machine's local disk before the job starts executing. Remember that there is a limit of 4 GB of disk space per job. It probably does not make sense to specify the INPUT_DATA keyword in this case (see note 1 above).
MULTI_JOBS: If a single file is (or no files are) specified as INPUT_FILES, run the job this many times, one job per machine. This may be useful, for example, for simulations running from the same input file of generated events (it is up to you to make sure that the random number seeds are different...), or for testing.
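As a sketch (with hypothetical project, track, and paths), the following would run the same simulation script five times, each job on its own farm node, all reading the same event file:

```
PROJECT: myproject
TRACK: simulation
COMMAND: /home/user/run_sim.csh
INPUT_FILES: /work/experiment/sim/events.dat
MULTI_JOBS: 5
```

Each of the five jobs copies events.dat to its local disk; making the results statistically independent (for example, by varying the random number seed) is up to the user script.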
3. Output files
The keywords OUTPUT_DATA and OUTPUT_TEMPLATE can be used in pairs to control the disposition of output files. Each OUTPUT_DATA keyword given should have a matching OUTPUT_TEMPLATE. The OUTPUT_DATA is the name of the output file locally on the farm machine disk. The corresponding OUTPUT_TEMPLATE directs the copying of the output file to a mass storage device (the silo or a work area). There are several points to note: