Tape Scheduling

About once a minute, the global scheduler examines the current backlog of reads and writes.  Because the tape library contains a mix of media (currently LTO4, LTO5, and LTO6) and drives (LTO5 and LTO6), the scheduler's first task is to generate a capability matrix that maps each pending I/O request to the drives that can read or write its tape.  In addition to hardware compatibility, the scheduler considers dynamic constraints that can restrict certain tapes to certain drives in order to reduce hardware malfunctions.  Administrative constraints can further limit activity based on volume set, I/O type, etc.
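A minimal sketch of the capability-matrix step, assuming a simplified data model: each request is a hypothetical (request_id, op, tape_id, tape_gen) tuple, each drive is (drive_id, drive_gen), and the dynamic and administrative constraints are reduced to a hypothetical `banned` set of (tape, drive) pairs. The hardware rule is the standard LTO one for these generations: a drive reads its own generation and the two before it, and writes its own generation and the one before it. In this library that means LTO4 tapes can be read by both drive types but written only by LTO5 drives, and LTO6 tapes need an LTO6 drive.

```python
# Sketch only: names and tuple layouts are hypothetical, not the scheduler's data model.
#   requests: (request_id, op, tape_id, tape_gen), op is "jget" (read) or "jput" (write)
#   drives:   (drive_id, drive_gen)
#   banned:   (tape_id, drive_id) pairs excluded by dynamic/administrative constraints

def can_read(drive_gen: int, tape_gen: int) -> bool:
    # LTO drives of these generations read their own generation and the two before it.
    return drive_gen - 2 <= tape_gen <= drive_gen

def can_write(drive_gen: int, tape_gen: int) -> bool:
    # ...and write their own generation and the one before it.
    return drive_gen - 1 <= tape_gen <= drive_gen

def capability_matrix(requests, drives, banned=frozenset()):
    """Map each request id to the set of drive ids able to service it."""
    matrix = {req_id: set() for req_id, _, _, _ in requests}
    for req_id, op, tape_id, tape_gen in requests:
        check = can_read if op == "jget" else can_write
        for drive_id, drive_gen in drives:
            if check(drive_gen, tape_gen) and (tape_id, drive_id) not in banned:
                matrix[req_id].add(drive_id)
    return matrix

drives = [("d1", 5), ("d2", 6)]   # one LTO5 and one LTO6 drive
requests = [
    ("r1", "jget", "T1", 4),      # read an LTO4 tape
    ("r2", "jput", "T2", 4),      # write an LTO4 tape
    ("r3", "jget", "T3", 6),      # read an LTO6 tape
]
m = capability_matrix(requests, drives)
print({k: sorted(v) for k, v in m.items()})
# {'r1': ['d1', 'd2'], 'r2': ['d1'], 'r3': ['d2']}
```

Requests with no capable drive still appear in the matrix with an empty set, which makes "stuck" work easy to spot.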

Once the capability matrix is generated, data movers claim work based upon the system state as captured by the last scheduler run.
  • Work is apportioned to movers based upon the notion of a "job set".  A job set is the set of tape jobs that share the same request type (jget or jput), volume set, and volume serial.
  • Job sets are assigned priorities based upon the priorities of their constituent jobs, which are then adjusted dynamically.
  • A data mover chooses the highest priority job set that matches its capability matrix.
  • Having chosen a job set, the data mover will retain an affinity for it until one of the following happens:
    • No more jobs exist in the set (i.e., no new jobs with matching type, volume set, and volume serial)
    • It discovers that some other job set has higher priority.
  • The data mover claims enough jobs to make up about 10 GB of work, if possible.  It processes the tape I/O for those jobs to completion, or yields after 30 minutes, whichever comes first.
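The job-set rules above can be sketched as follows. This is an illustration, not the data mover's code: `Job`, `job_sets`, `choose_job_set`, and `claim_batch` are hypothetical names, a set's priority is assumed to be the smallest (most urgent) priority among its jobs with the dynamic adjustments left out, `can_service` stands in for the capability-matrix lookup, and the 30-minute yield is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    request_type: str   # "jget" or "jput"
    volume_set: str
    volser: str         # volume serial
    priority: int       # smaller value = higher priority
    size_bytes: int

def job_sets(jobs):
    """Group jobs into job sets keyed by (request type, volume set, volume serial)."""
    sets = defaultdict(list)
    for job in jobs:
        sets[(job.request_type, job.volume_set, job.volser)].append(job)
    return sets

def choose_job_set(jobs, can_service, current=None):
    """Pick a job set for this mover, keeping affinity for `current`.

    Affinity holds while `current` still has jobs and no serviceable set
    has a strictly higher priority (smaller number).
    """
    prios = {key: min(j.priority for j in js)
             for key, js in job_sets(jobs).items() if can_service(key)}
    if not prios:
        return None
    best = min(prios, key=prios.get)
    if current in prios and prios[current] <= prios[best]:
        return current
    return best

def claim_batch(job_set, target_bytes=10 * 10**9):
    """Claim jobs in priority order until roughly target_bytes (~10 GB) is reached."""
    batch, total = [], 0
    for job in sorted(job_set, key=lambda j: j.priority):
        if batch and total + job.size_bytes > target_bytes:
            break
        batch.append(job)
        total += job.size_bytes
    return batch

GB = 10**9
jobs = [
    Job(1, "jget", "vs-a", "T1", 19, 4 * GB),
    Job(2, "jget", "vs-a", "T1", 19, 5 * GB),
    Job(3, "jget", "vs-a", "T1", 19, 3 * GB),
    Job(4, "jget", "vs-b", "T2", 25, 8 * GB),
]
key = choose_job_set(jobs, can_service=lambda k: True)
batch = claim_batch(job_sets(jobs)[key])
print(key, [j.job_id for j in batch])  # ('jget', 'vs-a', 'T1') [1, 2]
```

The batch always contains at least one job, so a single file larger than 10 GB still makes progress; that is a design choice of the sketch.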


Job priorities:

  • Priorities are integers with smaller values indicating higher priority
  • The priority of each job consists of a base value plus administrative overrides
  • The base value of a jput job = 10
  • The base value of a jget job = 20
  • The base value is adjusted by up to ±3 through nudges for the job's associated user, volume set, and accounting category, which are configured using administrative tools (presently jadmin).
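A sketch of the static part of a job's priority. It assumes that the ±3 bound applies to each administrative nudge individually (the text could also be read as bounding their sum); the function and argument names are hypothetical, and in the real system the nudge values come from the jadmin configuration rather than from arguments.

```python
BASE_PRIORITY = {"jput": 10, "jget": 20}
NUDGE_LIMIT = 3  # assumption: each administrative nudge lies within -3..+3

def job_priority(request_type: str, user_nudge: int = 0,
                 vs_nudge: int = 0, cat_nudge: int = 0) -> int:
    """Base value for the request type plus user, volume set, and category nudges.

    Smaller results mean higher priority.
    """
    for nudge in (user_nudge, vs_nudge, cat_nudge):
        if not -NUDGE_LIMIT <= nudge <= NUDGE_LIMIT:
            raise ValueError(f"nudge {nudge} is outside ±{NUDGE_LIMIT}")
    return BASE_PRIORITY[request_type] + user_nudge + vs_nudge + cat_nudge

print(job_priority("jput", cat_nudge=-2))   # 8
print(job_priority("jget", user_nudge=-3))  # 17
```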

Dynamic job set priority adjustments:

  • Each (job set × user) gains a priority nudge based on how long the oldest unfulfilled request has been outstanding.  Specifically, the priority is nudged by
    • -log2(ceil(number of 15-minute intervals since the request was made))
  • Each (job set × user) loses a priority nudge based on the time (expressed in tape-minutes) that drives have spent doing its I/O since the oldest unfulfilled request was made.  Specifically, the priority is nudged by
    • log2(ceil(number of 15-tape-minute intervals since the request was made))
  • Each (user × volume set) loses a priority nudge for each tape drive it currently occupies
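The dynamic nudges can be sketched as below. The two log formulas follow the text; three details are assumptions: the log2 result is rounded to an integer (priorities are integers), the interval count is floored at 1 so a brand-new request does not hit log2(0), and each occupied drive is assumed to cost exactly one priority point. The function names are hypothetical.

```python
import math

INTERVAL = 15  # minutes (wait nudge) or tape-minutes (recent-usage nudge)

def log_nudge(amount: float) -> int:
    """round(log2(ceil(amount / 15))), with the interval count floored at 1."""
    intervals = max(1, math.ceil(amount / INTERVAL))
    return round(math.log2(intervals))  # rounding is an assumption

def wait_nudge(minutes_waiting: float) -> int:
    # Negative: the longer the oldest request waits, the higher the priority.
    return -log_nudge(minutes_waiting)

def recent_nudge(tape_minutes_used: float) -> int:
    # Positive: drive time already consumed lowers the priority.
    return log_nudge(tape_minutes_used)

def hog_nudge(drives_occupied: int, per_drive: int = 1) -> int:
    # per_drive = 1 is an assumption; the text only says "a nudge per drive".
    return per_drive * drives_occupied

print(wait_nudge(8 * 60), recent_nudge(60), hog_nudge(2))  # -5 2 2
```

Because the nudges grow with log2, doubling a job set's wait time (or its drive usage) moves its priority by only about one point.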

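The effect of these adjustments is to raise the priority of waiting jobs gradually (logarithmically in their wait time), tempered by their resource utilization.  You can see the effect of these policies in the following table generated by the jasmine scheduler, where each job's priority is the sum of base_priority and the user, category, volume set, recent, hog, and wait nudges: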

+--------------+----------+--------------------------+---------------+----------+---------------+---------------------+--------------+-------+------------+-----------+----------+--------------+-----------+------------+----------+
| request_type | user     | vs_name                  | category_name | vol_name | base_priority | submit              | bytes        | files | user_nudge | cat_nudge | vs_nudge | recent_nudge | hog_nudge | wait_nudge | priority |
+--------------+----------+--------------------------+---------------+----------+---------------+---------------------+--------------+-------+------------+-----------+----------+--------------+-----------+------------+----------+
| jput         | manos    | c-qweak-rootfiles-pass5b | production    |          |            10 | 2013-10-02 20:04:29 |   8149887350 |     1 |          0 |        -2 |        0 |            0 |         0 |          0 |        8 |
| jget         | stepanya | eg3a-pro                 | production    | 501601   |            20 | 2013-10-02 12:07:04 |  61073799728 |    87 |         -3 |         0 |        0 |            2 |         1 |         -5 |       15 |
| jget         | hewittc  | b-eg1dvcs-raw.lto5       | raw           | 503860   |            20 | 2013-10-02 13:52:05 |  10732011520 |     5 |         -1 |         0 |        0 |            2 |         1 |         -5 |       17 |
| jget         | goetz    | g12-pro                  | production    | 503559   |            20 | 2013-10-02 10:22:33 |  24820428621 |    16 |          0 |         0 |        0 |            2 |         2 |         -5 |       19 |
| jget         | goetz    | g12-pro                  | production    | 503597   |            20 | 2013-10-02 13:02:22 |  44936736014 |    30 |          0 |         0 |        0 |            2 |         2 |         -5 |       19 |
| jget         | goetz    | g12-pro                  | production    | 503594   |            20 | 2013-10-02 13:02:08 |  26226708461 |    18 |          0 |         0 |        0 |            2 |         2 |         -5 |       19 |
| jget         | clasg14  | b-g14b-raw               | raw           | 501138   |            20 | 2013-10-02 15:03:30 |  17171218432 |     8 |         -1 |         0 |        0 |            2 |         2 |         -4 |       19 |
| jget         | clasg14  | b-g14b-raw               | raw           | 501171   |            20 | 2013-10-02 19:03:32 |  93543923712 |    47 |         -1 |         0 |        0 |            1 |         2 |         -2 |       20 |
| jget         | fersch   | home                     | userdata      | 501807   |            20 | 2013-09-30 16:25:16 | 153985679360 |  4425 |          3 |         0 |        0 |            7 |         3 |         -8 |       25 |
| jget         | fersch   | home                     | userdata      | 501804   |            20 | 2013-09-30 16:25:05 | 255725273088 |  8845 |          3 |         0 |        0 |            7 |         3 |         -8 |       25 |
| jget         | fersch   | home                     | userdata      | 501796   |            20 | 2013-09-30 17:57:16 |  62728372224 |  1297 |          3 |         0 |        0 |            7 |         3 |         -8 |       25 |
| jget         | fersch   | home                     | userdata      | 501817   |            20 | 2013-09-30 16:31:09 |  25071529984 |  1773 |          3 |         0 |        0 |            7 |         3 |         -8 |       25 |
+--------------+----------+--------------------------+---------------+----------+---------------+---------------------+--------------+-------+------------+-----------+----------+--------------+-----------+------------+----------+
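Every priority in the table is the sum of base_priority and the six nudge columns. The check below copies six rows from the table and confirms that arithmetic; it is a reading aid, not part of the scheduler.

```python
# (label, base_priority, user_nudge, cat_nudge, vs_nudge,
#  recent_nudge, hog_nudge, wait_nudge, priority), copied from the table
rows = [
    ("jput/manos", 10,  0, -2, 0, 0, 0,  0,  8),
    ("501601",     20, -3,  0, 0, 2, 1, -5, 15),
    ("503860",     20, -1,  0, 0, 2, 1, -5, 17),
    ("501138",     20, -1,  0, 0, 2, 2, -4, 19),
    ("501171",     20, -1,  0, 0, 1, 2, -2, 20),
    ("501807",     20,  3,  0, 0, 7, 3, -8, 25),
]

for label, *parts, priority in rows:
    total = sum(parts)
    assert total == priority, f"{label}: {total} != {priority}"
    print(f"{label:>10}: {' + '.join(str(p) for p in parts)} = {total}")
```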