/volatile disk pool

TL;DR (Summary):

  • DO use /volatile for large file storage
    • Largest "general-purpose" file system on-site (Petabyte scale)
    • High performance for large files
  • Analysis / Simulation output goes here
    • Check your files and, if good, push them to tape (and access them through /cache down the line)
  • Reserved space
    • If you stay within this limit, your files are ONLY subject to the 6-month 'stale file' deletion policy
    • You can ask your Hall Computing Coordinator to set Quota == Reserved to avoid unexpected file deletions (other than the 6-month limit)
  • Quota space
    • If you exceed this limit, then files will be deleted following policy below
  • NOT backed up; files are auto-cleaned if the quota fills
    • Files are auto-cleaned based on quota, reservation, and filesystem pressure (see details below).
  • Rough deletion algorithm (if over quota):
    • Files not accessed (atime) in the last 6 months are removed first,
    • then the oldest files (by creation time) are removed.

The volatile disk is for temporary storage of large files. It uses a Lustre filesystem and is thus inefficient for reading and writing of small files (those that are a few megabytes or smaller). Files are automatically deleted by a deletion algorithm that runs periodically. The deletion algorithm ensures that space is always available for new files to be written to volatile and that disk space is fairly shared among the experimental physics groups at the lab.

Definitions:
group (or hall): disk space on volatile is shared among several groups, for example, halla, hallb, etc.
subgroup (or project): Each group may or may not have subgroups that share the disk space of the group, for example a-apex and a-ar40 under the halla group. If a group has no subgroups, it is treated as the one and only subgroup of that group.
global maximum: the maximum amount of volatile disk space summed over all groups. The deletion algorithm tries to keep the actual space used below this.
subgroup quota: the maximum amount of usage allowed for a subgroup. The deletion algorithm will try to reduce the actual space used to below this. The sum of all subgroup quotas generally exceeds the global size of the volatile disk by a substantial amount, to allow burst usage by subgroups.
subgroup reservation: the amount of disk space exempt from deletion. If the actual space used by a subgroup is below this, none of its files are subject to deletion. If it is above this, files are subject to deletion by the deletion algorithm. The sum of all subgroup reservations is approximately equal to the global size of the volatile disk.
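The relationship between usage, reservation, and quota can be summarized in a small sketch (the class and field names here are illustrative, not part of any real tool):

```python
from dataclasses import dataclass

@dataclass
class Subgroup:
    """Hypothetical model of one subgroup's volatile-disk accounting."""
    name: str
    used_tb: float         # actual space currently used
    reservation_tb: float  # exempt from deletion while used <= reservation
    quota_tb: float        # limit the deletion algorithm enforces

    def files_protected(self) -> bool:
        # Below the reservation, no files in this subgroup are deleted.
        return self.used_tb <= self.reservation_tb

    def over_quota(self) -> bool:
        # Above the quota, files are deleted until usage falls below it.
        return self.used_tb > self.quota_tb
```

A subgroup bursting above its reservation but under its quota (e.g. used 40 TB, reservation 30 TB, quota 60 TB) is not over quota, but its files are no longer exempt from deletion.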

For a listing of groups, subgroups, the global maximum, quotas, and reservations see the SciComp Web Portal

The deletion algorithm runs once an hour. It deletes files to enforce the constraints imposed by the global maximum, the subgroup quotas, and the subgroup reservations. The algorithm proceeds in three steps:
Step 1: All files that have not been accessed in more than six months (by file access time, atime) are deleted.
Step 2: All subgroups are brought under their respective quotas. The oldest files by modification time are deleted first.
Step 3: If the total space used (summed over all groups/subgroups) is below the global maximum, nothing is done. If it exceeds the global maximum, files are deleted to bring the total space used below it. Files with the highest deletion index are deleted first. The deletion index is the product of two factors, the subgroup factor and the file factor.

  deletion index = (subgroup factor) * (file factor)

- The subgroup factor is calculated from the subgroup's space used, reservation, and quota, and is the same for all files in a subgroup.
- The file factor is the age of the file by modification time. Once the usage of a subgroup falls below its reservation, no more files from that subgroup are deleted during that run of the algorithm.
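The three steps can be sketched as follows. This is a simplified illustration, not the production code: the document does not give the subgroup-factor formula, so the one below (pressure above the reservation, scaled by the quota) is an assumption; the function and field names are likewise hypothetical.

```python
import time

SIX_MONTHS = 182 * 24 * 3600  # roughly six months, in seconds

def deletion_pass(files, subgroups, global_max, now=None):
    """Simplified sketch of one hourly run of the volatile deletion algorithm.

    files: list of dicts {"path", "subgroup", "size", "atime", "mtime"}
    subgroups: name -> {"reservation": ..., "quota": ...} (same units as size)
    Returns the set of paths selected for deletion.
    """
    now = time.time() if now is None else now
    doomed = set()

    def live(sg=None):
        return [f for f in files
                if f["path"] not in doomed and (sg is None or f["subgroup"] == sg)]

    def used(sg=None):
        return sum(f["size"] for f in live(sg))

    # Step 1: delete files not accessed (atime) in the last six months.
    for f in files:
        if now - f["atime"] > SIX_MONTHS:
            doomed.add(f["path"])

    # Step 2: bring each subgroup under its quota, oldest mtime first.
    for sg, lim in subgroups.items():
        for f in sorted(live(sg), key=lambda f: f["mtime"]):
            if used(sg) <= lim["quota"]:
                break
            doomed.add(f["path"])

    # Step 3: enforce the global maximum via the deletion index.
    def subgroup_factor(sg):
        # Assumed form only: the document says this factor is derived from
        # the subgroup's usage, reservation, and quota but gives no formula.
        lim = subgroups[sg]
        return max(0.0, (used(sg) - lim["reservation"]) / lim["quota"])

    def deletion_index(f):
        return subgroup_factor(f["subgroup"]) * (now - f["mtime"])  # file factor = age

    for f in sorted(live(), key=deletion_index, reverse=True):
        if used() <= global_max:
            break
        # A subgroup at or below its reservation is exempt from further deletion.
        if used(f["subgroup"]) <= subgroups[f["subgroup"]]["reservation"]:
            continue
        doomed.add(f["path"])

    return doomed
```

Note that files in a subgroup that is under its reservation are skipped in Step 3 even when the pool as a whole is over the global maximum, matching the exemption described above.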

Under rare circumstances the deletion algorithm may need to be run in addition to the regularly scheduled runs, for example after hardware re-configurations that reduce the total Lustre disk pool.