There are two thresholds used to control how the files are deleted.
- When Lustre production pool usage is below the first threshold (80%), no major deletion takes place, even when a disk pool's usage is over its allocation. There is one exception: when a project's usage exceeds 1.5 times its quota, its oldest files are deleted.
- When Lustre production pool usage is above the first threshold and below the second threshold (83%), only the disk pools whose usage is over their allocation delete files. The amount of data each pool removes is calculated by formula 1.
- When Lustre production pool usage is above the second threshold (83%), all pools delete files, up to ~30-40% of the excess amount every 5 minutes, even if their usage is below their allocation. The amount of data each pool removes is calculated by formula 2.
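The three regimes above can be summarized in a short Python sketch. The function and constant names are hypothetical, and the behavior at exactly 80% or 83% is an assumption, since the text only says "below" and "above".

```python
# Thresholds taken from the text; names are illustrative only.
FIRST_THRESHOLD = 0.80
SECOND_THRESHOLD = 0.83
PROJECT_QUOTA_FACTOR = 1.5  # the "1.5 of its quota" exception


def deletion_regime(lustre_usage_fraction):
    """Return which deletion regime applies for a given Lustre usage (0.0-1.0).

    Assumption: a value exactly on a threshold falls into the higher regime.
    """
    if lustre_usage_fraction < FIRST_THRESHOLD:
        return "exception-only"        # only projects far over quota lose files
    if lustre_usage_fraction < SECOND_THRESHOLD:
        return "over-allocation-pools"  # formula 1
    return "all-pools"                  # formula 2


def project_triggers_exception(project_used, project_quota):
    """True when a project is over 1.5 times its quota."""
    return project_used > PROJECT_QUOTA_FACTOR * project_quota
```

For example, `deletion_regime(0.81)` selects the formula 1 regime, while a project using 16 units against a quota of 10 triggers the exception even when Lustre usage is below 80%.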
formula 1:
x1 = min(diskPool_over_size, 0.3 * lustre_over_size)
x2 = min(0.04 * lustre_over_size, 0.02 * diskPool_total_size)
The deletion amount is max(x1, x2).
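As a worked illustration, formula 1 can be written as a small Python function. The function name is hypothetical, all sizes are assumed to share one unit (for example TB), and lustre_over_size is assumed to be the amount by which Lustre usage exceeds the threshold; the text does not define it.

```python
def formula1_deletion_amount(disk_pool_over_size, lustre_over_size, disk_pool_total_size):
    """Deletion amount for one over-allocation pool, per formula 1."""
    # x1: the pool's own excess, capped at 30% of the Lustre excess.
    x1 = min(disk_pool_over_size, 0.3 * lustre_over_size)
    # x2: a floor based on 4% of the Lustre excess or 2% of the pool size,
    # whichever is smaller.
    x2 = min(0.04 * lustre_over_size, 0.02 * disk_pool_total_size)
    return max(x1, x2)
```

With a pool 10 units over its allocation, a Lustre excess of 100, and a pool size of 200, x1 is 10 and x2 is about 4, so the pool is asked to remove 10 units. With a pool only 1 unit over and a pool size of 1000, x2 (about 4) dominates instead.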
formula 2:
Each pool has a deletion rate: pool_rate = 1 - (pool_size - pool_used) / pool_size
The deletion amount is 0.3 * pool_rate * lustre_over_size.
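Formula 2 can be sketched the same way. Note that pool_rate is algebraically equal to pool_used / pool_size, so fuller pools are asked to delete more. The function names are hypothetical, and the units and the meaning of lustre_over_size are assumptions as before.

```python
def pool_rate(pool_size, pool_used):
    """Deletion rate of a pool, written as in formula 2.

    Equivalent to pool_used / pool_size.
    """
    return 1 - (pool_size - pool_used) / pool_size


def formula2_deletion_amount(pool_size, pool_used, lustre_over_size):
    """Deletion amount for one pool when Lustre is above the second threshold."""
    return 0.3 * pool_rate(pool_size, pool_used) * lustre_over_size
```

For a pool of size 100 that is half full and a Lustre excess of 200, the rate is 0.5 and the deletion amount is about 30.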
Algorithm for selecting the files to delete in a pool:
step 1: delete all files in the preferred deletion file queue. These are the files cached with jcache by farm jobs (Scicomp cache only).
step 2: find all projects that are over their quota, and distribute the pool deletion size among these projects, up to each project's overuse. Delete the oldest files until the target is reached.
step 3: find all projects that are over their reserved (guaranteed) amount, collect the old files of each of these projects up to its over-guarantee amount, calculate the deletion index of every collected file, and order the files in a deletion queue. Delete the files in the deletion queue until the target is met.
File Deletion Index = (project_over_reserved / project_reserved) x file_ago
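The ordering in step 3 can be sketched in Python as follows. This is a minimal illustration, not the actual implementation. The function names and the tuple layout are hypothetical, and the sketch assumes that files with the highest deletion index are deleted first and that file_ago measures a file's age. It also skips the per-project cap on collected files.

```python
def file_deletion_index(project_over_reserved, project_reserved, file_ago):
    """File Deletion Index as given in the text."""
    return (project_over_reserved / project_reserved) * file_ago


def build_deletion_queue(files, projects):
    """Order candidate files by deletion index, highest first (assumed order).

    files:    list of (path, project, file_ago, size) tuples
    projects: dict mapping project -> (project_over_reserved, project_reserved)
    """
    def index_of(f):
        over, reserved = projects[f[1]]
        return file_deletion_index(over, reserved, f[2])

    return sorted(files, key=index_of, reverse=True)


def select_until_target(queue, target):
    """Walk the queue and pick files until the freed size meets the target."""
    selected, freed = [], 0.0
    for path, _project, _file_ago, size in queue:
        if freed >= target:
            break
        selected.append(path)
        freed += size
    return selected
```

For example, with projects `{"p1": (5.0, 10.0), "p2": (1.0, 10.0)}`, a file of age 10 in p1 has index 5, a file of age 4 in p1 has index 2, and a file of age 10 in p2 has index 1, so the p1 files are queued ahead of the p2 file even though the p2 file is as old as the oldest p1 file.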