Allocations & fair share

Allocations
Every active project is given an allocation, and consumption of these allocations is tracked throughout a project year, which runs from July 1 through June 30.  The size of each allocation determines the project's fair share as a percentage of the total resource.  We do not use Slurm to enforce an allocation limit, however.  As allocations are consumed, we adjust fair shares (roughly monthly).  If an allocation is completely consumed, we drastically lower the priority of the project so that, in effect, it can run jobs only when no other project with a remaining allocation has jobs in the queue.
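The relationship between allocation size and fair share can be sketched as follows. This is an illustration, not the production formula, and the project names and allocation figures are made up:

```python
# Hypothetical allocations in (weighted) core hours for three projects.
allocations = {"projA": 400_000, "projB": 100_000, "projC": 500_000}

# A project's fair share is its allocation as a fraction of the
# total allocated resource.
total = sum(allocations.values())
fair_share = {proj: alloc / total for proj, alloc in allocations.items()}

# Here projA's target is 40% of the machine, projB's 10%, projC's 50%.
```

As allocations are consumed during the year, these fractions are re-derived (roughly monthly) rather than recomputed continuously.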

We track allocations in units of x86 core hours for the standard cluster, GPU hours for GPU-accelerated nodes, and MIC hours for MIC-accelerated nodes.  Nodes of different generations carry different weights in proportion to their performance, and the basis of the weighting (which item has a weight of one) can change from one year to the next.  Most of the graphical displays in the portal show unnormalized hours, but allocation consumption includes the weights.  For example, each hour on a K20 GPU costs 2 GPU hours (it has a weight of 2 because it has twice the performance of this year's reference, the C2050).
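The weighted charging described above can be sketched in a few lines. The weights for the K20 and C2050 come from the example in the text; the helper function and its signature are illustrative, not part of any real accounting system:

```python
# Per-GPU-hour weights relative to this year's reference (C2050 = 1).
weights = {"C2050": 1.0, "K20": 2.0}

def gpu_hours_charged(node_type, wall_hours, n_gpus):
    """Charge against the allocation = wall-clock hours x GPUs used
    x the node type's performance weight."""
    return wall_hours * n_gpus * weights[node_type]

# One hour on one K20 costs 2 GPU hours; on one C2050 it costs 1.
k20_cost = gpu_hours_charged("K20", wall_hours=1, n_gpus=1)      # 2.0
c2050_cost = gpu_hours_charged("C2050", wall_hours=1, n_gpus=1)  # 1.0
```

The portal's graphical displays show the unweighted (wall-clock) hours; only the charge against the allocation includes the weight.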

Fair Share
A fair share value is assigned to each project (based upon its allocation), and Slurm attempts to deliver that fraction of the resource to the project in each fair share period (currently 5 days).  Job priority is high when you have not yet reached your target for the period, and low once you have exceeded it.  Priority is also influenced by how much you were over or under target in the last several periods, with the impact of history decaying exponentially with age.  Note that fair share is not achieved on an instantaneous basis (the fraction of the cores you are using at this very instant), but only over a window of time.
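The exponentially decayed influence of past periods can be illustrated with a small sketch. This is not Slurm's exact algorithm, and the decay factor of 0.5 per 5-day period is an assumption chosen purely for illustration:

```python
def decayed_usage(period_usage, decay=0.5):
    """Combine usage fractions from recent fair-share periods into one
    effective usage figure, with older periods down-weighted.

    period_usage[0] is the most recent period; period_usage[k] is
    k periods ago and is weighted by decay**k.
    """
    weights = [decay**age for age in range(len(period_usage))]
    total = sum(u * w for u, w in zip(period_usage, weights))
    return total / sum(weights)

# A project with a 10% target that overshot recently (20%) still looks
# "over", even though it was under target in the two older periods:
recent_heavy = decayed_usage([0.20, 0.05, 0.05])  # above 0.10
recent_light = decayed_usage([0.05, 0.05, 0.20])  # below 0.10
```

The design point is that recent behavior dominates: the same three usage values produce a higher effective usage when the overshoot is recent than when it is old.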

Within a single project, individual users can also have shares.  By default everyone has the same share, so Slurm will tend to share the resource equally within a group.
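With equal default shares, each user's effective target is simply the project's share divided by the number of users in the group; a minimal sketch, with made-up numbers:

```python
def user_share(project_share, n_users):
    """Effective per-user share when all users in a project have
    equal shares (the default)."""
    return project_share / n_users

# A project entitled to 40% of the machine with 4 equal users:
# each user's effective target is 10%.
per_user = user_share(0.40, 4)
```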

There are many parameters within Slurm that affect scheduling decisions, and there will always be cases where it seems not to be doing the right thing.  Just remember that over a longer period it does get it right, and if its behavior strays particularly far from what we want, we will re-tune it.  It is open source software, so perfection can't be expected.

Please use the LQCD Portal to check on the progress of your project work.