Specifying Input Files

Fri, 10/04/2013 - 14:46 — chen

Here are examples of how to get input files to your farm jobs.

Analyzing One File

This will copy an individual file to the farm node for analysis.
PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /home/user/file.dat
The XML version of this script looks like:
<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>
An optional attribute, copyOption (copy or link), was added to <Input> in late 2018 to let users choose how a file is staged. By default, the file is copied to the farm node. To have Auger make a link to the source file instead of copying it, add copyOption="link" to the input file. Note that this attribute is only available on the Auger-Slurm cluster.
<Input src="/home/user/file.dat" dest="file.dat" copyOption="link"/>
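For context, here is a minimal sketch of a complete request that uses the link option. It reuses the placeholder project, track, and file path from the example above and assumes the job runs on the Auger-Slurm cluster.

```xml
<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <!-- Link to the source file instead of copying it to the farm node -->
    <Input src="/home/user/file.dat" dest="file.dat" copyOption="link"/>
  </Job>

</Request>
```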

Renaming Analysis File

This will copy an individual file to the farm node for analysis and give it a new local name.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /home/user/file.dat
INPUT_DATA: infile

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file.dat" dest="infile"/>
  </Job>

</Request>

Multiple Single File Analysis Jobs

This will create N farm jobs, one for each analysis file. Each job copies
its corresponding analysis file to the farm node and gives it a
common local name.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat 
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
INPUT_DATA: infile

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file1.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file2.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file3.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file4.dat" dest="infile"/>
  </Job>

</Request>

A more compact form of this, using List and ForEach, is as follows.

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <List name="files">
   file1.dat
   file2.dat
   file3.dat
   file4.dat
  </List>
  <ForEach list="files">
    <Job>
      <Input src="/home/user/${files}" dest="infile"/>
    </Job>
  </ForEach>

</Request>

Specifying Additional Files

This specifies additional files, common to all jobs, that are needed in addition to the analysis file.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat 
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
INPUT_DATA: infile
OTHER_FILES:
/home/user/script.sh
/home/user/configuration.dat

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Input src="/home/user/script.sh" dest="script.sh"/>
  <Input src="/home/user/configuration.dat" dest="configuration.dat"/>

  <List name="files">
   file1.dat
   file2.dat
   file3.dat
   file4.dat
  </List>
  <ForEach list="files">
    <Job>
      <Input src="/home/user/${files}" dest="infile"/>
    </Job>
  </ForEach>

</Request>
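
One way to read the request above is that every job receives the two common files plus its own analysis file. The fragment below sketches the explicit form for the first two jobs. It assumes that Input elements placed directly under Request behave as if they were repeated inside every Job; that equivalence is an assumption made for illustration and is not confirmed here.

```xml
<Job>
  <!-- Common files, assumed to be staged for every job -->
  <Input src="/home/user/script.sh" dest="script.sh"/>
  <Input src="/home/user/configuration.dat" dest="configuration.dat"/>
  <!-- Per-job analysis file taken from the "files" list -->
  <Input src="/home/user/file1.dat" dest="infile"/>
</Job>
<Job>
  <Input src="/home/user/script.sh" dest="script.sh"/>
  <Input src="/home/user/configuration.dat" dest="configuration.dat"/>
  <Input src="/home/user/file2.dat" dest="infile"/>
</Job>
```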

Multiple Analysis Files

This will create multiple farm jobs, with multiple analysis files
each. This can only be done using the XML based configuration file. When
specifying a time limit, the time applies to individual jobs and not
the total accumulation of jobs.

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/fileA1.dat" dest="infileA"/>
    <Input src="/home/user/fileB1.dat" dest="infileB"/>
    <Input src="/home/user/fileC1.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA2.dat" dest="infileA"/>
    <Input src="/home/user/fileB2.dat" dest="infileB"/>
    <Input src="/home/user/fileC2.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA3.dat" dest="infileA"/>
    <Input src="/home/user/fileB3.dat" dest="infileB"/>
    <Input src="/home/user/fileC3.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA4.dat" dest="infileA"/>
    <Input src="/home/user/fileB4.dat" dest="infileB"/>
    <Input src="/home/user/fileC4.dat" dest="infileC"/>
  </Job>


</Request>

The same request can be written more compactly using List and ForEach, as follows.

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <List name="suffix">1 2 3 4</List>
  <ForEach list="suffix">
    <Job>
      <Input src="/home/user/fileA${suffix}.dat" dest="infileA"/>
      <Input src="/home/user/fileB${suffix}.dat" dest="infileB"/>
      <Input src="/home/user/fileC${suffix}.dat" dest="infileC"/>
    </Job>
  </ForEach>

</Request>

Multiple Analysis Files Single Job

This will copy multiple files to the farm node for analysis in a single job.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
SINGLE_JOB: true

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file1.dat" dest="file1.dat"/>
    <Input src="/home/user/file2.dat" dest="file2.dat"/>
    <Input src="/home/user/file3.dat" dest="file3.dat"/>
    <Input src="/home/user/file4.dat" dest="file4.dat"/>
  </Job>

</Request>

Getting Files From Tape

To get a file from tape using the flat file format, reference the file like any other file, using its /mss/.../name path. By default, Auger will jget the file into /cache/.../name, then make a link from the job's working directory to /cache/...
PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /mss/home/user/file.dat
In the XML version, you must precede the file name with mss:.
<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="mss:/mss/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>
An optional attribute, copyOption (copy or link), was added to <Input> in late 2018
to let users choose how a file is staged. By default, for a file from tape,
Auger makes a link from the farm node to /cache/... To have Auger copy the file
to the farm node instead, add copyOption="copy" to the input file. Note that the copyOption attribute is only available on the Auger-Slurm cluster.
<Input src="mss:/mss/home/user/file.dat" dest="file.dat" copyOption="copy"/>
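
The tape syntax can presumably be combined with the List and ForEach form shown earlier to submit one job per tape file. The sketch below assumes that an mss: source may contain a ForEach variable in the same way as an ordinary path; this page does not confirm that combination. The file names in the list are hypothetical.

```xml
<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <!-- Hypothetical tape file names, for illustration only -->
  <List name="files">
   file1.dat
   file2.dat
  </List>
  <ForEach list="files">
    <Job>
      <!-- Each job reads one file from tape under a common local name -->
      <Input src="mss:/mss/home/user/${files}" dest="infile"/>
    </Job>
  </ForEach>

</Request>
```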

Getting Files From Cache

When input files from tape are already on cache, you should still
refer to the tape version of the file. Auger and Jasmine will be smart
enough to use the file from cache if it is there.
PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /mss/home/user/file.dat
The XML version of this script looks like:
<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="mss:/mss/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>
