Job Input, Output and Working Directory

The working directory of a running Slurm job is the directory in which the job's batch script executes on the allocated nodes. It can be specified with an sbatch option in the job submission script:
  • #SBATCH --chdir=particularDirectory
    • If the specified directory is not available on the host, the job will be executed in the /tmp directory.
If the above option is omitted, the working directory is the same as the job submission directory, which means the submission directory must be available on all Slurm nodes.
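For example, a minimal submission script that sets the working directory explicitly (the job name and directory path below are illustrative):

    #!/bin/bash
    #SBATCH --job-name=myjob
    #SBATCH --chdir=/volatile/myexp/run1    # job executes from this directory on the node

    pwd    # prints /volatile/myexp/run1, or /tmp if that directory is unavailable on the host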

Coming soon:
If the above option is not specified, or the specified directory resides in the NFS home directory, the submitted job will be executed in the directory /scratch/slurm/username, where username is the name of the user who submitted the job. The environment variable JLAB_SLURM_O_WORKDIR points to the working directory of the running job. Files created by a job in this working directory are removed several days after the job completes.
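Once this behavior is in place, a job script should be able to locate its scratch working directory through that environment variable; a minimal sketch:

    #!/bin/bash
    # JLAB_SLURM_O_WORKDIR is expected to point at the job's working
    # directory, e.g. /scratch/slurm/$USER
    echo "working directory: $JLAB_SLURM_O_WORKDIR"
    cd "$JLAB_SLURM_O_WORKDIR"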

Unlike Auger, Slurm does not manage input/output files for submitted jobs. Users must therefore manage a job's input and output files within the job itself, or before and after it runs. In the future we may provide capabilities similar to what Auger offers.

For a job's input files, the user first needs to jcache the files if they are not already on the cache disk. When the job starts, it can copy the files to the computing node under the directory /scratch/slurm/username (a job can create this directory on any computing node); do this if the files are small or will be accessed randomly. If instead the files are very large and can be accessed efficiently from the Lustre file system (e.g. read/written sequentially in large I/O blocks), leave them on the cache or volatile disk and access them within the job using full path names on the cache/volatile disk system.
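A sketch of both patterns, with illustrative paths and a hypothetical analysis program named myanalysis; the jcache staging shown in the comment happens before the job is submitted:

    #!/bin/bash
    # Before submitting, stage input from tape to the cache disk if needed,
    # e.g. jcache get /mss/hallx/run1/big_events.dat (illustrative path)

    # Small, randomly accessed files: copy to node-local scratch first
    WORKDIR=/scratch/slurm/$USER
    mkdir -p "$WORKDIR"
    cp /cache/hallx/run1/small_config.dat "$WORKDIR/"

    # Very large, sequentially read files: access in place on Lustre by full path
    myanalysis "$WORKDIR/small_config.dat" /cache/hallx/run1/big_events.dat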

For the output files generated by a job, the user needs to copy them to a proper location such as cache, volatile, or work. If the files need to go to the tape library, simply copy them to the correct location in the /cache system and they will eventually be migrated to tape. If one wishes to copy an output file to the tape library immediately, issue "jcache put" after the file has been copied to its proper location.
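For example, at the end of a job script (paths are illustrative):

    # Copy results to the cache disk; files under /cache migrate to tape eventually
    cp "$WORKDIR/results.root" /cache/hallx/run1/results.root

    # Optionally push the file to the tape library right away
    jcache put /cache/hallx/run1/results.root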

As with input and output files, a job's standard output and standard error can be captured at run time using the following sbatch options:
  • #SBATCH --output=<filename pattern>
  • #SBATCH --error=<filename pattern>
The above options instruct Slurm to connect the batch script's standard output and standard error directly to the file named by the specified filename pattern. By default, both standard output and standard error are directed to the same file. If the above options are not specified, the default file name is "slurm-%j.out", where "%j" is replaced by the job ID, and the file is created in the directory where the job was submitted. See the filename pattern section in the sbatch man page.
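For example (the file name prefix is illustrative):

    #SBATCH --output=myjob-%j.out    # %j expands to the job ID, e.g. myjob-12345.out
    #SBATCH --error=myjob-%j.err     # direct standard error to a separate file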

Coming soon:
If the above options are not specified, the standard output and standard error files are "%x-%j-%N.out" and "%x-%j-%N.err", respectively, where "%x" is replaced by the job name, "%j" by the job ID, and "%N" by the hostname of the node where the job runs. The files will be created in the directory /farm_out/username, which can be accessed from any JLab CUE machine.