Figure 1. JLab LQCD cluster hardware and file-system layout
Each project is allocated space on the following shared storage resources which are visible from both interactive and compute nodes:
File-System | Description | Backed Up | High Quota Limit * |
---|---|---|---|
/home | Every user has a home directory. Intended to store non-data files, such as scripts and executables. | YES | hard |
/work | Designed for project or group use. Limited in size and managed by the project's users. Intended to store software distributions and small data sets which don't need to be backed up. | NO | hard |
/cache | Write-through cache to tape with auto-removal of the disk copy as needed. Intended to store data files (input and output) for batch jobs. /cache is implemented on top of a Lustre file system and is semi-managed: files are automatically migrated to tape after some period of time and are eventually removed from disk automatically. Check the Automated Data Management Policies for /cache section for the current backup and deletion policy. | YES | soft |
/volatile | Auto-managed global scratch space (never full). | NO | soft |
/scratch | Local disk on each worker node. Suitable for writing all sorts of data created in a batch job. Any precious data needs to be copied off of this area before a job ends, since the area is automatically purged at the end of a job. | NO | None |
Globus | Globus End-Point named jlab#gw12, which you can use to transfer data in and out of the storage areas listed above. Please refer to the documentation on globus.org for setting up Globus Connect Personal if needed. | | |
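
As the /scratch row notes, anything worth keeping must be copied off the node-local disk before the job ends. A minimal sketch of that step follows. The function name `save_results` and the directory arguments are hypothetical, not part of the cluster's tooling, and the destination is assumed to be a directory on one of the shared areas such as /volatile.

```shell
#!/bin/sh
# Sketch: copy results off node-local /scratch before the job ends.
# save_results is a hypothetical helper, not a site-provided command.

save_results() {
    # $1 = source directory on local scratch
    # $2 = destination directory on shared storage (created if missing)
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "nothing to save: $src does not exist" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory's contents (including dot files);
    # -R recurses and -p keeps timestamps and permissions.
    cp -Rp "$src/." "$dest/"
}
```

In a job script this could be registered with `trap 'save_results "$SCRATCH_DIR" "$DEST_DIR"' EXIT` (both variable names are placeholders) so the copy also runs when the main command fails. Whether the batch system leaves enough time for the trap to finish is something to verify on the cluster.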
There are two types of thresholds within the quota system. Both thresholds, along with current usage, can be viewed through the cluster status portal under the File System menu:
* High Quota: a hard limit for /home and /work, and a soft limit for /cache and /volatile. A write to /cache or /volatile will always succeed, but if you are over your quota, older files will be deleted afterwards by in-house processes that run a few times a day.
* Guarantee: this is not oversubscribed, so you will always have at least this much space available to your project.
If a directory does not exist for your project, you may request one; if your quota is too small, you may request a larger quota (limited by the remaining available space). Quotas are oversubscribed, so projects cannot all use their full quota concurrently.
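
The hard/soft distinction can be summarised as a small decision rule. The sketch below is only a model of the behaviour described above, not a site tool: `quota_status` is a hypothetical helper, and the usage and quota figures are whole numbers you would read from the status portal yourself.

```shell
#!/bin/sh
# Model of the High Quota rule: hard for /home and /work,
# soft for /cache and /volatile. quota_status is hypothetical.

quota_status() {
    # $1 = file system
    # $2 = projected usage after the write
    # $3 = High Quota (same unit as $2, whole numbers only)
    fs=$1
    projected=$2
    quota=$3
    if [ "$projected" -le "$quota" ]; then
        echo "write succeeds"
        return 0
    fi
    case $fs in
        /home|/work)
            echo "write fails: hard limit" ;;
        /cache|/volatile)
            echo "write succeeds: soft limit, older files deleted later" ;;
        *)
            echo "no High Quota described for $fs" >&2
            return 2 ;;
    esac
}
```

For example, `quota_status /volatile 120 100` reports that the write succeeds but that older files become candidates for deletion, while the same figures on /work report a failed write.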
When a project is over its High Quota limit, the following data backup and deletion policies will be enforced.
/cache Backup policy:
/cache Deletion policy:
Note: Small files are not written to tape or deleted, but there is a soft limit of 1 million files per user. We strongly recommend that users store only large permanent files under /cache. If a project generates a lot of small files, please put them under the /work or /volatile disk areas.
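
To see how close a directory tree is to the file-count limit, and how many of its files are small, something like the following may help. It is a sketch using standard `find` and `wc` only; `small_file_report` is a hypothetical name, and the default 2 MiB threshold is an assumption taken from the "large file" definition given for srmLs elsewhere in this document (the srmPut description mentions 1 MB instead), so pass the threshold that applies to you.

```shell
#!/bin/sh
# Count all files under a directory and how many fall below a size threshold.
# small_file_report is a hypothetical helper, not a site-provided command.

small_file_report() {
    # $1 = directory to scan
    # $2 = size threshold in bytes (defaults to 2 MiB = 2097152, an assumption)
    dir=$1
    limit=${2:-2097152}
    # tr removes the padding that some wc implementations print.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # "-size -Nc" selects files strictly smaller than N bytes.
    small=$(find "$dir" -type f -size "-${limit}c" | wc -l | tr -d ' ')
    echo "total=$total small=$small"
}
```

File names containing newlines would be miscounted by this line-based approach. On a large Lustre tree the scan itself can take a while, so it is best run sparingly.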
The /volatile area is managed by an automated process that cleans up periodically in multiple steps as listed below. In the rules mentioned below the target threshold is defined as the "aggregate used space by all projects".
These "storage resource manager" utilities can query status or move files between disk and tape, preserving the file's canonical name; i.e. the file has the same name (path) on tape and on disk.
srmProjectInfo projectName
get information for the project specified by projectName, including project quota, pin quota, available pin quota, etc.
srmGet [-t life_time] [-e email] file_path_1 file_path_2 ...
srmGet [-t life_time] [-e email] [-r] directory_path
get file(s) from tape system to cache disk
-t life_time specify the lifetime of the pin placed on the file (default is 10 days)
-e email request that cacheManager send email when all files in the srmGet request have finished
-r recursively get all files under a named directory (recursive will only go one level)
srmPut [-d] file_path_1 file_path_2 ...
srmPut [-d] -r directory_path
put file(s) larger than 1 MB into the tape silo system (without waiting for the normal time delay).
-r recursively put all files under a named directory (recursive will only go one level)
-d delete the files after they have been put on tape (frees space faster)
srmLs [options] cache_path1 cache_path2 cache_path3 ....
list file properties (cacheManager related meta data; details below)
srmPin [-t lifetime] cache_file_path1 cache_file_path2
srmPin [-r] [-t lifetime] cache_dir_path1
Pin or mark a file or files in a given directory as in use and not to be deleted
-r recursive
-t life_time set the life time for the pin
srmPinStatus [-l]
list the pin status of given file(s)
-l long format
srmRequest request_id1 request_id2 ...
get status information of srmGet/srmPut request(s)
srmCancel request_id1 request_id2 ...
cancel request(s) specified by request-id
srmPendingRequest [user]
list all pending and active request(s) submitted by this user or a given user
srmDupilcatedFile [user]
list all duplicated files created by this user or a given user
srmTapeRemove path1 path2 ...
remove given files (not directories) from the tape library (files are marked deleted and will not be copied on a subsequent tape compress). The first time you use srmTapeRemove, you must get a JLab certificate. Please reference this page on how to get a new certificate. Finally, please scp the file .scicomp/keystore to qcdi .scicomp/keystore.
srmChecksum path1
get the crc32 checksum of a given file already on disk; this checksum is used by the Jasmine system to verify data integrity on read from tape (every file in the Jasmine tape system has a crc32 checksum value stored in its stub file).
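
A typical sequence with these utilities is to request a directory back from tape and then check which files are still missing from disk. The sketch below only assembles the command lines from the synopses above and, by default, prints them instead of running them. `run`, `stage_dir` and the `DRY_RUN` switch are hypothetical helpers, the e-mail address is a placeholder, and the unit of the `-t` value is assumed to be days because the default is described as 10 days. The output format of the tools is not documented here, so the sketch does not try to parse it.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands, so the sketch can be
# read and tried on a machine without the srm utilities installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Request every file directly under one /cache directory from tape, then
# list the files that are on tape but not (yet) on disk.
stage_dir() {
    # $1 = /cache directory, $2 = pin lifetime for -t, $3 = e-mail for -e
    dir=$1
    lifetime=$2
    email=$3
    case $dir in
        /cache/*) ;;
        *) echo "expected a path under /cache: $dir" >&2; return 1 ;;
    esac
    run srmGet -t "$lifetime" -e "$email" -r "$dir" || return 1
    run srmLs --ncache --silo "$dir"
}
```

Calling `stage_dir /cache/LHPC/NF0/aniso/test 5 user@example.org` with the default settings prints the two command lines. Setting `DRY_RUN=0` on the cluster would execute them. Because `-r` only descends one level, deeper trees would need one call per subdirectory.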
srmLs - list file properties (cacheManager related meta data)
srmLs cache_path1 cache_path2 cache_path3 ....
srmLs [options] cache_directory_path
If no argument is given, lists all large files on tape but not on disk that are owned by this user. Otherwise lists files in a given directory.
cache_path: The full path of a file on cache disk; it should start with /cache/... Wild cards in the file name are supported. When using a wild card, the path must be quoted, a backslash \ must be added before each *, and the wild card must be in the file name, not in the directory part of the path.
[-h] or [--help] - to display the usage and exit.
[-l] - to show long list including group and modification time.
[--cache] - to list only the files in the cache
[--ncache] - to list only the files not in the cache
[--silo] - to list only the files on tape
[--nsilo] - to list only the files not on tape
[--pin] - to list only the files that are pinned
[--unpin] - to list only the files that are not pinned
Options [--cache], [--ncache], [--silo], [--nsilo], [--pin], [--unpin] can be used together. Any combination of the options is equivalent to an "AND" operation. For example, with the [--ncache] and [--silo] options only the files on tape and not on disk will be listed. A file larger than 2 MB is defined as a large file; files smaller than 2 MB are not backed up to the tape system and are not reported by the srmLs utility.
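
The "AND" rule can be made concrete with a small model. This is not how srmLs is implemented. It is a hypothetical `matches` function that describes a file by three yes/no facts and checks that every option given holds. It assumes `--unpin` selects files that are not pinned.

```shell
#!/bin/sh
# Model of the srmLs filter options: a file is listed only if it
# satisfies every option given (logical AND). matches is hypothetical.

matches() {
    # $1 = on cache disk (yes/no), $2 = on tape (yes/no), $3 = pinned (yes/no)
    # remaining arguments = filter options
    cache=$1
    silo=$2
    pin=$3
    shift 3
    for opt in "$@"; do
        case $opt in
            --cache)  [ "$cache" = yes ] || return 1 ;;
            --ncache) [ "$cache" = no ]  || return 1 ;;
            --silo)   [ "$silo" = yes ]  || return 1 ;;
            --nsilo)  [ "$silo" = no ]   || return 1 ;;
            --pin)    [ "$pin" = yes ]   || return 1 ;;
            --unpin)  [ "$pin" = no ]    || return 1 ;;
            *) echo "unknown option: $opt" >&2; return 2 ;;
        esac
    done
    return 0
}
```

With this model, `matches no yes no --ncache --silo` succeeds (a file on tape only), while `matches yes yes no --ncache --silo` fails because the file is also on disk. Contradictory pairs such as `--cache --ncache` can never match any file.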
%srmLs - print the summary information of the files owned by this user.
%srmLs /cache/LHPC/NF0/aniso/test/foo - print specified file properties, such as pin count, space type, size, etc.
%srmLs --silo /cache/LHPC/NF0/aniso/test - print the files that are in the tape system
%srmLs --cache --nsilo /cache/LHPC/NF0/aniso/test - print the files that are on cache disk but not in the silo.
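
The wild card rule for cache_path (quote the path, put a backslash before each *, keep the wild card in the file name only) is easy to get wrong by hand. The sketch below follows one reading of that rule: `srmls_pattern` is a hypothetical helper that builds the argument string, and whether srmLs expects exactly this form should be confirmed on the cluster.

```shell
#!/bin/sh
# Build a wild card argument for srmLs: directory part without wild cards,
# file-name part with a backslash inserted before every *.
# srmls_pattern is a hypothetical helper, not a site-provided command.

srmls_pattern() {
    # $1 = directory under /cache (must not contain *)
    # $2 = file-name pattern, e.g. '*.dat'
    case $1 in
        *\**)
            echo "wild card not allowed in the directory part: $1" >&2
            return 1 ;;
    esac
    # sed replaces each literal * with \*
    escaped=$(printf '%s\n' "$2" | sed 's/\*/\\*/g')
    printf '%s/%s\n' "$1" "$escaped"
}
```

The result would then be passed as a single quoted argument, for example `srmLs "$(srmls_pattern /cache/LHPC/NF0/aniso/test '*.dat')"`, so that the local shell does not expand the pattern itself.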