Filesystems


  1. Filesystem Descriptions
  2. Automated Data Management Policies for /cache
  3. Automated Data Management Policies for /volatile
  4. Cache Manager Utilities

Figure 1. JLab LQCD cluster hardware and file-system layout


Each project is allocated space on the following shared storage resources which are visible from both interactive and compute nodes:

/home
    Description: Every user has a home directory. Intended to store non-data files, such as scripts and executables.
    Backed Up: YES
    High Quota Limit *: hard

/work
    Description: Designed for project or group use. Limited size; the area is managed by the project's users. Intended to store software distributions and small data sets which don't need to be backed up.
    Backed Up: NO
    High Quota Limit *: hard

/cache
    Description: Write-through cache to tape with auto-removal of the disk copy as needed. Intended to store data files (input and output) for batch jobs. /cache is implemented above a Lustre file system and is semi-managed, with automatic file migration to tape after some period of time and eventual automatic file removal from disk. Check the Automated Data Management Policies for /cache section for the updated backup and deletion policy.
    Backed Up: YES
    High Quota Limit *: soft

/volatile
    Description: Auto-managed global scratch space (never full). Intended to hold the output of one job that will later be consumed by a subsequent job. May also serve as an area to pack and unpack tar balls. In cases where users work with a large number of small files in one directory, this is the best place for that type of data. (Note that if the files need to persist on disk for a long time, /work is a good alternative.) /volatile is implemented above a Lustre file system. Check the Automated Data Management Policies for /volatile section for the updated deletion policy.
    Backed Up: NO
    High Quota Limit *: soft

/scratch
    Description: Local disk on each worker node. Suitable for writing all sorts of data created in a batch job. Any precious data needs to be copied off of this area before a job ends, since the area is automatically purged at the end of a job.
    Backed Up: NO
    High Quota Limit *: None

Globus
    Description: Globus End-Point named jlab#gw12, which you can use to transfer data in and out of the storage areas listed above. Please refer to the documentation on globus.org on setting up a Globus Connect Personal if needed.

The quota system has two types of thresholds. These, along with current usage, can be viewed through the cluster status portal under the File System menu:

* High Quota: this is a hard limit for /home and /work, and a soft limit for /cache and /volatile. A write to /cache or /volatile will always succeed, but if you are over your quota afterwards, older files will be deleted by in-house processes that run a few times a day.
Guarantee: this space is not oversubscribed, so your project will always have at least this much space available.

If a directory does not exist for your project, you may request one; if your quota is too small, you may request a larger quota (limited by the remaining available space). Quotas are oversubscribed, so projects cannot all use their full quota concurrently.


Automated Data Management Policies for /cache

When a project is over its High Quota limit, the following data backup and deletion policies will be enforced.

/cache Backup policy:

  • Files that are between 3MB and 1TB in size and 12 days old will be automatically copied to tape.

/cache Deletion policy:

  • The oldest files that satisfy the criterion pin count = 0 AND backed up = Yes will be deleted.

Note: Small files are neither written to tape nor deleted, but there is a soft limit of 1 million files per user. We strongly recommend that users store only large permanent files under /cache. If a project generates a lot of small files, please put them under the /work or /volatile disk areas.
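
The backup and deletion criteria above can be modelled in a few lines of Python. This is only an illustrative sketch, not the cache manager's implementation: the names CacheFile, eligible_for_tape and deletion_order and the /cache/proj/... paths are hypothetical, MB and TB are assumed to be decimal units, "12 days old" is read as "at least 12 days old", and "oldest" is assumed to refer to file age.

```python
from dataclasses import dataclass

MB = 10**6   # assumption: decimal megabyte
TB = 10**12  # assumption: decimal terabyte


@dataclass
class CacheFile:
    path: str
    size_bytes: int
    age_days: float
    pin_count: int = 0
    backed_up: bool = False


def eligible_for_tape(f: CacheFile) -> bool:
    """Backup criterion: size between 3 MB and 1 TB, and (assumed) at least 12 days old."""
    return 3 * MB <= f.size_bytes <= 1 * TB and f.age_days >= 12


def deletion_order(files):
    """Deletion criterion: pin count = 0 AND backed up = Yes, oldest files first."""
    candidates = [f for f in files if f.pin_count == 0 and f.backed_up]
    return sorted(candidates, key=lambda f: f.age_days, reverse=True)


# Hypothetical files, for illustration only.
files = [
    CacheFile("/cache/proj/a.dat", 5 * MB, 30, pin_count=0, backed_up=True),
    CacheFile("/cache/proj/b.dat", 5 * MB, 40, pin_count=1, backed_up=True),  # pinned
    CacheFile("/cache/proj/c.dat", 1 * MB, 50),                               # too small for tape
    CacheFile("/cache/proj/d.dat", 8 * MB, 60, pin_count=0, backed_up=True),
]
print([f.path for f in deletion_order(files)])
# ['/cache/proj/d.dat', '/cache/proj/a.dat']
```

In this model a pinned file (b.dat) and a file that was never backed up (c.dat) are never deletion candidates, which is why pinning and file size matter when planning what to keep under /cache.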


Automated Data Management Policies for /volatile

The /volatile area is managed by an automated process that cleans up periodically in multiple steps as listed below. In the rules mentioned below the target threshold is defined as the "aggregate used space by all projects".

  1. All files that have not been used for more than 6 months will be deleted.
  2. If a project exceeds its quota and the target threshold is exceeded (>75%), the least-recently-used (LRU) files will be deleted until the total usage of the project is below their High Quota and the target threshold is met (<75%).
  3. If the target threshold is exceeded and reaches a critical point, additional aggressive file deletions may be performed using the rule mentioned in #2.
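
Rules 1 and 2 can be sketched as a small Python simulation for a single project. This is a rough model under stated assumptions, not the actual cleanup process: VolatileFile and plan_cleanup are hypothetical names, "6 months" is approximated as 183 days, sizes are in arbitrary units, and deletion under rule 2 is assumed to continue until both the project quota and the 75% target threshold are satisfied. Rule 3 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class VolatileFile:
    path: str
    size: int              # arbitrary units, e.g. GB
    days_since_use: float  # days since the file was last used


def plan_cleanup(files, project_quota, total_used, total_capacity,
                 max_idle_days=183, threshold=0.75):
    """Return the paths one cleanup pass would delete for a single project."""
    limit = threshold * total_capacity

    # Rule 1: delete everything unused for more than ~6 months.
    deleted = [f for f in files if f.days_since_use > max_idle_days]
    kept = [f for f in files if f.days_since_use <= max_idle_days]
    total_used -= sum(f.size for f in deleted)
    project_used = sum(f.size for f in kept)

    # Rule 2: only if the project is over quota AND the threshold is exceeded.
    if project_used > project_quota and total_used > limit:
        # Least recently used files go first.
        for f in sorted(kept, key=lambda f: f.days_since_use, reverse=True):
            if project_used <= project_quota and total_used <= limit:
                break
            deleted.append(f)
            project_used -= f.size
            total_used -= f.size
    return [f.path for f in deleted]


# Hypothetical project contents, for illustration only.
files = [
    VolatileFile("/volatile/proj/old.tar", 10, 200),
    VolatileFile("/volatile/proj/a", 40, 100),
    VolatileFile("/volatile/proj/b", 40, 50),
    VolatileFile("/volatile/proj/c", 40, 1),
]
print(plan_cleanup(files, project_quota=60, total_used=800, total_capacity=1000))
# ['/volatile/proj/old.tar', '/volatile/proj/a', '/volatile/proj/b']
```

With a quota of 200 instead of 60, only old.tar would be removed, since the project would no longer be over its High Quota.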

Cache Manager Utilities

These "storage resource manager" utilities can query status or move files between disk and tape, preserving each file's canonical name; i.e. the file has the same name (path) on tape and on disk.

srmProjectInfo projectName
           get information for the project specified by projectName, including project quota, pin quota, available pin quota, etc.

srmGet [-t life_time] [-e email] file_path_1 file_path_2 ...
srmGet [-t life_time] [-e email] [-r] directory_path 
            get file(s) from the tape system to cache disk
            -t life_time     specify the lifetime of the pin on the file(s) (default is 10 days)
            -e email      request cacheManager to send email when all files in the srmGet request have finished.
            -r               recursively get all files under a named directory (recursion only goes one level deep)

srmPut [-d] file_path_1 file_path_2 ...
srmPut [-d] -r directory_path
            put file(s) with size larger than 1 MB into the tape silo system (without waiting for the normal time delay).
            -r      recursively put all files under a named directory (recursion only goes one level deep)
            -d     delete the files after they have been put on tape (frees space faster)

srmLs [options] cache_path1 cache_path2 cache_path3 ....
           list file properties (cacheManager related meta data; details on next page)

srmPin [-t lifetime] cache_file_path1 cache_file_path2
srmPin [-r] [-t lifetime] cache_dir_path1
            Pin or mark a file or files in a given directory as in use and not to be deleted
            -r                 recursive
            -t life_time   set the life time for the pin

srmPinStatus [-l]
            list the pin status of given file(s)
            -l  long format

srmRequest request_id1 request_id2 ...
            get status information of srmGet/srmPut request(s)

srmCancel request_id1 request_id2 ...
            cancel request(s) specified by request-id

srmPendingRequest [user]
            list all pending and active request(s) submitted by this user or a given user

srmDupilcatedFile [user]
            list all duplicated files created by this user or a given user

srmTapeRemove path1 path2 ...
            remove given files (not directories) from the tape library (files are marked deleted and will not be copied on a subsequent tape compress). The first time you use srmTapeRemove, you must get a JLab certificate. Please reference this page on how to get a new certificate. Finally, please scp the file .scicomp/keystore to .scicomp/keystore on qcdi.

srmChecksum path1
            get the crc32 checksum of a given file already on disk; this checksum is used by the Jasmine system to verify data integrity on read from tape (every file in the Jasmine tape system has a crc32 checksum value stored in its stub file).
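
For comparison, a CRC-32 of a local file can be computed with Python's standard zlib module. This is only a sketch: crc32_of_file is a hypothetical helper, and it is an assumption that srmChecksum uses the same CRC-32 variant and a comparable output format, so check against srmChecksum on a known file before relying on it.

```python
import os
import tempfile
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file incrementally, reading it in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            # Feed the running value back in so large files need little memory.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


# Demo on a temporary file containing the standard CRC test string.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"123456789")
demo_path = tmp.name
demo_crc = crc32_of_file(demo_path)
os.remove(demo_path)
print(f"{demo_crc:08x}")  # cbf43926
```

Reading in chunks keeps memory use constant, which matters for the large files that /cache is intended to hold.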


srmLs explained

 

Name

          srmLs - list file properties (cacheManager related meta data)

Syntax

    srmLs cache_path1 cache_path2 cache_path3 ....
    srmLs [options] cache_directory_path

Description

        If no argument is given, lists all large files that are on tape but not on disk and are owned by this user. Otherwise, lists the files in a given directory.

Parameters

        cache_path:  The full path of a file on cache disk; it should start with /cache/... Wild cards are supported in the file name only, not in the directory part of the path. When using a wild card, quote the path and put a backslash \ before each *.

Options

         [-h] or [--help]  -  to display the usage and exit.
         [-l]  -  to show long list including group and modification time.
         [--cache]  -  to list only the files in the cache 
         [--ncache]  -  to list only the files not in the cache
         [--silo]  -  to list only the files on tape
         [--nsilo]  -  to list only the files not on tape
         [--pin]  -  to list only the pinned files
         [--unpin]  -  to list only the files that are not pinned

Notes

Options [--cache], [--ncache], [--silo], [--nsilo], [--pin], [--unpin] can be used together. Any combination of the options is equivalent to an "AND" operation. For example, with the [--ncache] and [--silo] options, only the files that are on tape and not on disk will be listed. A file larger than 2MB is defined as a large file; files smaller than 2MB will not be backed up to the tape system and are not reported by the srmLs utility.
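
The "AND" behaviour of combined options can be illustrated with a small Python model. This is not how srmLs is implemented; FileState, FILTERS and select are hypothetical names, the file names are invented, and [--unpin] is assumed to mean "not pinned".

```python
from dataclasses import dataclass


@dataclass
class FileState:
    name: str
    in_cache: bool
    on_tape: bool
    pinned: bool


# Each option corresponds to one predicate over a file's state.
FILTERS = {
    "--cache": lambda f: f.in_cache,
    "--ncache": lambda f: not f.in_cache,
    "--silo": lambda f: f.on_tape,
    "--nsilo": lambda f: not f.on_tape,
    "--pin": lambda f: f.pinned,
    "--unpin": lambda f: not f.pinned,  # assumption: "not pinned"
}


def select(files, options):
    """Keep only the files that satisfy every given option (logical AND)."""
    return [f.name for f in files if all(FILTERS[o](f) for o in options)]


# Hypothetical file states, for illustration only.
files = [
    FileState("a", in_cache=True, on_tape=True, pinned=False),
    FileState("b", in_cache=False, on_tape=True, pinned=False),
    FileState("c", in_cache=True, on_tape=False, pinned=True),
]
print(select(files, ["--ncache", "--silo"]))  # ['b']
print(select(files, ["--cache", "--nsilo"]))  # ['c']
```

Contradictory combinations such as [--cache] with [--ncache] select nothing in this model, because no file can satisfy both predicates.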

Examples

  1. Get summary information:
    %srmLs - print the summary information of the files owned by this user.
    
  2. List file properties:
    %srmLs /cache/LHPC/NF0/aniso/test/foo - print specified file properties, such as pin count, space type, size, etc. 
    
  3. List some files' properties in a given cache directory:
    %srmLs --silo /cache/LHPC/NF0/aniso/test  - print the files that are in the tape system.
    
  4. List the files in a given directory that are on cache disk but not on tape:
    %srmLs --cache --nsilo /cache/LHPC/NF0/aniso/test - print the files that are on cache disk but not in the silo.
    

If you need support or have questions please use the following support web page.