Specifying Input Files

Here are examples of how to get input files to your farm jobs.


  • Analyzing One File

This will copy an individual file to the farm node for analysis.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /home/user/file.dat

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>

An optional attribute 'copyOption' (values: copy or link) was added to <Input> in late 2018 to let users choose how an input file is staged. The default behavior is to copy the file to the farm node. To make a link to the source file instead of copying it, add copyOption="link" to the input file; Auger will then not copy the file to the farm node. Note that this attribute is only available on the Auger-Slurm cluster.
<Input src="/home/user/file.dat" dest="file.dat" copyOption="link"/>
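The behavioral difference between the two staging modes can be sketched in Python. This is an illustration of the semantics only, not Auger's actual staging code; the file names are stand-ins for the example paths above.

```python
import os
import shutil
import tempfile

# Illustration of the two staging modes that copyOption selects:
# "copy" duplicates the file onto the node's local working directory,
# "link" creates a symlink pointing back at the shared-filesystem source.
def stage_input(src, dest, copy_option="copy"):
    if copy_option == "copy":
        shutil.copy2(src, dest)   # independent local copy
    elif copy_option == "link":
        os.symlink(src, dest)     # no data moved; reads go to the source
    else:
        raise ValueError(f"unknown copyOption: {copy_option}")

# Demo with a temporary file standing in for /home/user/file.dat.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file.dat")
with open(src, "w") as f:
    f.write("data")

stage_input(src, os.path.join(workdir, "copied.dat"), "copy")
stage_input(src, os.path.join(workdir, "linked.dat"), "link")
```

Linking avoids duplicating large files onto node-local disk, at the cost of every read going over the shared filesystem.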


  • Renaming Analysis File

This will copy an individual file to the farm node for analysis and give it a new local name.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /home/user/file.dat
INPUT_DATA: infile

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file.dat" dest="infile"/>
  </Job>

</Request>

  • Multiple Single File Analysis Jobs

This will create N farm jobs, one for each analysis file, copying the corresponding analysis file to the farm node and giving it a common local name.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat 
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
INPUT_DATA: infile

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file1.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file2.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file3.dat" dest="infile"/>
  </Job>
  <Job>
    <Input src="/home/user/file4.dat" dest="infile"/>
  </Job>

</Request>

A more compact form of this, using List and ForEach, looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <List name="files">
   file1.dat
   file2.dat
   file3.dat
   file4.dat
  </List>
  <ForEach list="files">
    <Job>
      <Input src="/home/user/${files}" dest="infile"/>
    </Job>
  </ForEach>

</Request>
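The List/ForEach construct is simply shorthand: Auger expands it into one <Job> element per list entry, substituting each entry for ${files}. The expansion can be sketched in Python (an illustration of the semantics, not Auger's implementation), using the file names from the example above:

```python
# One <Job> per list entry, with ${files} replaced by the entry.
files = ["file1.dat", "file2.dat", "file3.dat", "file4.dat"]

jobs = "\n".join(
    f'  <Job>\n'
    f'    <Input src="/home/user/{name}" dest="infile"/>\n'
    f'  </Job>'
    for name in files
)
print(jobs)
```

The printed output matches the four explicit <Job> elements in the longer XML form above.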

  • Specifying Additional Files

This specifies additional files, common to all jobs, that are needed in addition to the analysis files.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat 
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
INPUT_DATA: infile
OTHER_FILES:
/home/user/script.sh
/home/user/configuration.dat

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Input src="/home/user/script.sh" dest="script.sh"/>
  <Input src="/home/user/configuration.dat" dest="configuration.dat"/>

  <List name="files">
   file1.dat
   file2.dat
   file3.dat
   file4.dat
  </List>
  <ForEach list="files">
    <Job>
      <Input src="/home/user/${files}" dest="infile"/>
    </Job>
  </ForEach>

</Request>

  • Multiple Analysis Files

This will create multiple farm jobs, each with multiple analysis files. This can only be done using the XML-based configuration file. When specifying a time limit, the limit applies to each individual job, not to the total accumulated across jobs.

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/fileA1.dat" dest="infileA"/>
    <Input src="/home/user/fileB1.dat" dest="infileB"/>
    <Input src="/home/user/fileC1.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA2.dat" dest="infileA"/>
    <Input src="/home/user/fileB2.dat" dest="infileB"/>
    <Input src="/home/user/fileC2.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA3.dat" dest="infileA"/>
    <Input src="/home/user/fileB3.dat" dest="infileB"/>
    <Input src="/home/user/fileC3.dat" dest="infileC"/>
  </Job>

  <Job>
    <Input src="/home/user/fileA4.dat" dest="infileA"/>
    <Input src="/home/user/fileB4.dat" dest="infileB"/>
    <Input src="/home/user/fileC4.dat" dest="infileC"/>
  </Job>


</Request>

The same thing can be done more compactly using List and ForEach:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <List name="suffix">1 2 3 4</List>
  <ForEach list="suffix">
    <Job>
      <Input src="/home/user/fileA${suffix}.dat" dest="infileA"/>
      <Input src="/home/user/fileB${suffix}.dat" dest="infileB"/>
      <Input src="/home/user/fileC${suffix}.dat" dest="infileC"/>
    </Job>
  </ForEach>

</Request>

  • Multiple Analysis Files Single Job

This will copy multiple files to the farm node for analysis.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: 
/home/user/file1.dat
/home/user/file2.dat
/home/user/file3.dat
/home/user/file4.dat
SINGLE_JOB: true

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="/home/user/file1.dat" dest="file1.dat"/>
    <Input src="/home/user/file2.dat" dest="file2.dat"/>
    <Input src="/home/user/file3.dat" dest="file3.dat"/>
    <Input src="/home/user/file4.dat" dest="file4.dat"/>
  </Job>

</Request>
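For a single job with many inputs, keeping each file's own basename as its local name ensures the staged copies do not overwrite one another. A small sketch of building such a <Job> element programmatically (illustrative only; the paths are the ones from the example above):

```python
import os
import xml.etree.ElementTree as ET

# Build one <Job> that stages several inputs, each keeping its basename
# as the local (dest) name so the copies remain distinct on the node.
paths = [f"/home/user/file{i}.dat" for i in range(1, 5)]

job = ET.Element("Job")
for p in paths:
    ET.SubElement(job, "Input", src=p, dest=os.path.basename(p))

print(ET.tostring(job, encoding="unicode"))
```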

  • Getting Files From Tape

To get a file from tape using the flat file format, just reference the file like any other file, using its /mss/.../name path. By default, Auger will jget the file into /cache/.../name and then make a link from the job's working directory to the /cache/... copy.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /mss/home/user/file.dat

In the XML version, you must precede the file name with mss:.

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="mss:/mss/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>

The optional 'copyOption' attribute (values: copy or link) added to <Input> in late 2018 also applies here. The default behavior for tape files is to make a link from the farm node to the /cache/.. copy. To have Auger copy the file to the farm node instead, add copyOption="copy" to the input file. Note that the 'copyOption' attribute is only available on the Auger-Slurm cluster.
<Input src="mss:/mss/home/user/file.dat" dest="file.dat" copyOption="copy"/>


  • Getting Files From Cache

When input files from tape are already in cache, you should still refer to the tape version of the file. Auger and Jasmine are smart enough to use the copy from cache if it is there.

PROJECT: MyProject
TRACK: MyTrack
JOBNAME: MyJob
COMMAND: ls -alF
INPUT_FILES: /mss/home/user/file.dat

The XML version of this script looks like:

<Request>
  <Email email="user@jlab.org" request="false" job="true"/>
  <Project name="MyProject"/>
  <Track name="MyTrack"/>
  <Name name="MyJob"/>
  <Command><![CDATA[
   ls -alF
  ]]></Command>

  <Job>
    <Input src="mss:/mss/home/user/file.dat" dest="file.dat"/>
  </Job>

</Request>