Allocations & fair share


Allocations
Every active project is given an allocation, and consumption of that
allocation is tracked throughout a project year, which runs from
July 1 through June 30.  The size of each allocation determines the
project's fair share as a percentage of the total resource.  We do
not use Slurm to enforce a hard allocation limit, however.  As
allocations are consumed, we adjust fair shares (roughly monthly).
If an allocation is completely consumed, we drastically lower the
priority of the project so that it can run jobs essentially only
when no other project with a remaining allocation has jobs in the
queue.
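As a rough illustration of how allocation size translates into a fair
share percentage, the Python sketch below simply divides each project's
allocation by the total awarded.  The project names and core-hour
figures are invented for the example.

    # Minimal sketch: fair share as a fraction of the total allocation.
    # Project names and core-hour figures are made up for illustration.
    allocations = {
        "proj_a": 1_200_000,   # x86 core hours awarded for the project year
        "proj_b": 600_000,
        "proj_c": 200_000,
    }

    total = sum(allocations.values())
    for project, hours in allocations.items():
        print(f"{project}: fair share = {hours / total:.1%}")
    # proj_a: 60.0%, proj_b: 30.0%, proj_c: 10.0%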

We track allocations in units of x86 core hours for standard cluster
nodes, GPU hours for GPU-accelerated nodes, and MIC hours for
MIC-accelerated nodes.  Nodes of different generations carry
different weights in proportion to their performance, and the basis
of the weighting (which item has a weight of one) can change from
one year to the next.  Most of the graphical displays in the portal
show unnormalized hours, but allocation consumption includes the
weights.  For example, each hour on a K20 GPU costs 2 GPU hours (it
has a weight of 2 because it has twice the performance of this
year's reference, the C2050).
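As a sketch of how that weighting enters the accounting, the fragment
below charges a job its wall hours multiplied by the weight of the
hardware it ran on.  Only the weights of 1 for the C2050 and 2 for the
K20 come from the example above; the job sizes are invented.

    # Minimal sketch of weighted allocation charging.
    # C2050 reference weight is 1 and a K20 is 2 (per the example above);
    # the job parameters below are illustrative only.
    WEIGHTS = {"c2050": 1.0, "k20": 2.0}

    def gpu_hours_charged(node_type, n_gpus, wall_hours):
        """Weighted GPU hours debited from the project's allocation."""
        return WEIGHTS[node_type] * n_gpus * wall_hours

    # A 6-hour job on 4 K20s is charged 4 * 6 * 2 = 48 GPU hours, even
    # though an unnormalized display would show 24 GPU hours.
    print(gpu_hours_charged("k20", 4, 6))    # 48.0
    print(gpu_hours_charged("c2050", 4, 6))  # 24.0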

Fair Share
A fair share value is assigned to each project (based upon its
allocation), and Slurm attempts to deliver that fraction of the
resource to the project in each fair share period (currently 5
days).  Job priority is high when you have not yet reached your
target for the period, and low once you have exceeded it.  Priority
is also influenced by how much you were over or under in the last
several periods, with the influence of older periods decaying
exponentially.  Note that fair share is not achieved on an
instantaneous basis (what fraction of the cores you are using at
this very instant), but only over a window of time.
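One way to picture the decayed history described above: compare a
weighted average of recent periods' usage against the project's target
share, with older periods counting for less.  The decay factor, target
share, and usage numbers in this sketch are assumptions made for
illustration, not the values actually configured in Slurm here.

    # Minimal sketch of exponentially decayed fair-share usage.
    # The decay factor and per-period usage figures are assumed for
    # illustration; they are not this cluster's Slurm settings.
    DECAY = 0.5          # weight applied per period of age (assumed)
    TARGET_SHARE = 0.25  # project's fair share of the resource (assumed)

    # Fraction of the resource used in each recent 5-day period,
    # most recent first.
    usage_history = [0.40, 0.10, 0.30]

    # Decayed average: recent periods count fully, older ones less.
    weights = [DECAY ** age for age in range(len(usage_history))]
    decayed = sum(u * w for u, w in zip(usage_history, weights)) / sum(weights)

    # Usage above the target pushes priority down; usage below pushes it up.
    print(f"decayed usage = {decayed:.2f}, target = {TARGET_SHARE:.2f}")
    print("priority:", "low" if decayed > TARGET_SHARE else "high")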

Within a single project, individual users can also be given shares;
by default everyone has the same share, so Slurm will tend to divide
the resource equally within a group.

There are many parameters within Slurm that affect scheduling
decisions, and there are always cases where it will seem like it is
not doing the right thing.  Just remember that over a longer period
it does get it right, and if it is behaving particularly far from
what we want, we will re-tune it.  It is open source software, so
perfection can't be expected.

Please use the
LQCD Portal to check on the progress of your project work.