PCR 924 (A-924) - Implement Slack Groups
Summary: Implement Slack Groups
Status: NEW
Alias: A-924
Product: Kernel
Classification: Deos
Component: Kernel
Version: mainline
Hardware: TBD Deos
Priority: Hold
Severity: Enhancement
Target Milestone: mainline
Assignee: .Kernel
URL:
Whiteboard:
Depends on:
Blocks:
 
Reported: 2003-02-21 00:00 MST by bcronk
Modified: 2023-08-14 11:30 MST
CC List: 1 user

See Also:
Impact Assessment: ---
Organization: ---
bcronk: Requirements? (bcronk)
alarson: Code?
alarson: TestCases?
alarson: TestProcedures?
deosbugs.ccb: Other-


Attachments

Description alarson 2005-09-22 16:56:45 MST
Several of our SHARP components, as well as the general user community's use of slack, could benefit greatly from the addition of slack groups to the Deos scheduling paradigm.

This PCR is being raised to assess whether this feature will ever be added and, if so, to get an approximate indication of when.

RR20030227 - posted from email
A while back I was thinking about this feature, and I think I have a solution I call "budget pools" that would solve the user's problem. I need to talk with Joe some more about my idea, but basically it works like this:

In the IT, a user specifies a group of threads plus a budget pool (a thread cannot be a member of more than one group). At run time, when a thread is switched in it gets its own budget plus the pool's budget. When a thread completes, its remaining budget is put in the pool (instead of going to slack). When a thread waits on an event, it works just like thread budget transfer except that the budget is transferred to the pool.

Some limitations of my solution:
- All threads sharing a budget pool would have to execute at the same rate.
- Unlike slack, a thread would never wait for budget to be added to the pool.
- Unused pool budget would not be given up to slack (with Joe's latest slack improvements this is not an issue).
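
A minimal C sketch of the budget-pool bookkeeping described above. All names here (budget_pool_t, thread_t, and the switch-in/completion hooks) are illustrative assumptions for this PCR only, not actual Deos kernel types or interfaces:

/* Illustrative sketch only: these are not Deos kernel types or entry points. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ticks_t;

typedef struct budget_pool {
    ticks_t remaining;               /* budget currently available to the group   */
} budget_pool_t;

typedef struct thread {
    ticks_t        remaining_budget; /* this thread's own budget left this period */
    budget_pool_t *pool;             /* at most one pool per thread               */
} thread_t;

/* On switch-in, the thread may run on its own budget plus the pool's budget
 * ("it gets its budget + the pool's budget").  The pool's budget is claimed by
 * the running thread; whatever it does not use is returned on completion. */
static ticks_t budget_on_switch_in(thread_t *t)
{
    ticks_t available = t->remaining_budget;
    if (t->pool != NULL) {
        available += t->pool->remaining;
        t->pool->remaining = 0;
    }
    return available;
}

/* On completion (or on an event wait, per the budget-transfer analogy), the
 * unused budget goes back into the pool instead of being donated to slack. */
static void on_thread_complete(thread_t *t, ticks_t unused)
{
    if (t->pool != NULL) {
        t->pool->remaining += unused;
    }
    /* threads with no pool would donate to slack as today (not modeled here) */
    t->remaining_budget = 0;
}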

JAS20030715
Slack group notes

Purpose: Allow a user to specify a collection of threads which consume slack generated by other threads in the collection. Ensure that slack generated by a thread within the collection is not consumed by a thread outside the collection. (Or, at least ensure the slack is not consumed until all threads in the collection somehow relinquish their claims to the slack.)

Initial thoughts on implementation:
(This section is my [jas] attempt at summarizing a discussion on the topic among myself, Ryan, and Dennis on 7/14/03.)

1.) Eliminate variable slack waiting times.  This is already a problem on the current system.  Usually, when threads wait for slack, they are waiting for at least CSPD worth of slack.  The only exception is a thread needing slack to lock a mutex, in which case the thread is waiting for an amount of slack equal to the priority inheritance time of the mutex.  This leads to a case where a slack requestor trying to lock a particularly long mutex can "eclipse" other slack requestors at the same rate.  That's because the scheduler, when examining the slack requestor list as part of choosing the next thread to run, only looks at the head of the list for a given priority - if the available slack is insufficient, it goes on to the next priority.  It does that because to do otherwise (looking on down the given priority list) is an O(n) operation.

We didn't push all the way through the solution to this problem, but for the most part assumed it could be done.  It seems that a thread attempting to convert slack to lock a mutex could still wait for CSPD slack; when granted, it then looks to see if available slack is sufficient to lock the mutex, just like in the case where it discovers its remaining fixed budget is insufficient.
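
As a rough illustration of the uniform CSPD-sized wait outlined above, here is a hedged C sketch; cspd_for_rate, slack_available, and wait_for_slack are hypothetical placeholders assumed for this sketch, not actual Deos kernel calls:

/* Illustrative sketch only: the helper functions are hypothetical placeholders. */
#include <stdint.h>

typedef uint32_t ticks_t;

extern ticks_t cspd_for_rate(int rate);      /* minimum useful grant (CSPD)       */
extern ticks_t slack_available(int rate);    /* slack usable at this rate now     */
extern void    wait_for_slack(int rate, ticks_t amount);   /* blocks the caller   */

/* Every slack wait is for exactly CSPD.  Only after being granted CSPD (and
 * running again) does the requestor check whether the available slack covers
 * the mutex's priority inheritance time - the same re-check it already does
 * when its remaining fixed budget turns out to be insufficient.  This removes
 * the variable-size waits that let a long mutex "eclipse" other requestors. */
static void wait_for_mutex_slack(int rate, ticks_t priority_inheritance_time)
{
    while (slack_available(rate) < priority_inheritance_time) {
        wait_for_slack(rate, cspd_for_rate(rate));   /* uniform, CSPD-sized wait */
    }
}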

2.) Change the semantics of the slack requestor list slightly.  Today, being on the slack requestor list simply means there wasn't enough slack for me when I last requested it, but says nothing about the current state of things - which is why the slack requestor list must be consulted during scheduling.  The proposed semantic change is that being on the slack requestor list means "I need a slack donation to occur before bothering with whether I should run again".  What this semantic change would allow us to do is to move the onus of evaluating the slack requestor list from scheduling time to slack donation time.  When a thread donates at least CSPD to slack (or a cumulative donation now adds up to at least CSPD), go ahead and move the head of the slack requestor list to the runnable list and convert CSPD to fixed budget.  When the thread runs, it quickly times out (just like today) and attempts to get available slack.  If the thread completes, the remaining budget is another slack donation which moves the next requestor onto runnable.  If the thread again times out, it requests slack again.  Once accomplished, this means the scheduler would no longer need to consult the slack requestor list at scheduling time.
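
A hedged C sketch of the donation-time evaluation in 2.).  The list handling and names (slack_level_t, make_runnable, etc.) are simplified stand-ins assumed for illustration, not the real kernel structures:

/* Simplified stand-ins for the real kernel structures; illustrative only. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ticks_t;

typedef struct thread thread_t;
struct thread {
    thread_t *next;           /* link on the slack requestor list (SRL)        */
    ticks_t   fixed_budget;
};

typedef struct {
    thread_t *srl_head;       /* highest-priority waiting slack requestor      */
    ticks_t   accumulated;    /* donations not yet handed out                  */
    ticks_t   cspd;           /* minimum grant worth waking a requestor for    */
} slack_level_t;

extern void make_runnable(thread_t *t);   /* placeholder for the runnable merge */

/* New semantics: being on the SRL means "wake me only after a slack donation".
 * The work therefore moves from scheduling time to donation time: once the
 * (cumulative) donation reaches CSPD, the head of the SRL is moved to runnable
 * and CSPD is converted to fixed budget.  The scheduler never scans the SRL. */
static void donate_slack(slack_level_t *lvl, ticks_t amount)
{
    lvl->accumulated += amount;
    if (lvl->srl_head != NULL && lvl->accumulated >= lvl->cspd) {
        thread_t *t = lvl->srl_head;
        lvl->srl_head = t->next;           /* O(1): only the list head moves  */
        t->next = NULL;
        t->fixed_budget += lvl->cspd;      /* convert CSPD of slack to budget */
        lvl->accumulated -= lvl->cspd;
        make_runnable(t);
    }
}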

3.) Eliminate events' slack requestor list.  First, note that SDT waits already have nothing to do with this list.  So we're only interested in long duration or indefinite waits ("long waits" for short :).  Long waiters which are slack consumers go on the event's SRL (slack requestor list) - regardless of whether they're currently executing on fixed budget or on slack - because long waiters by definition give up claims to any fixed budget.  They also go on their SOP (start of period) list, so that they have an opportunity each period to decrement their timeout counters.  The reason to not *just* put them on their SOP lists is to give them an opportunity to run the same period the pulse occurs, even if the pulse occurs later than the timeout processing.  The reason the event's SRL, upon event pulsing, can't be merged directly to runnable is that the waiters are not required to have CSPD fixed budget.  But what if we did require them to?  Then, long waiters could be placed on the same list as SDT waiters and merged directly to runnable when the pulse occurs.  I think this might lead to situations where the same thread is on runnable twice, but I'm not sure this is logically an error.  (One of them is selected first; ensuring it's not on any list is sufficient.)  I think the saving grace is that different list nodes are used - I think this makes it work even if the thread is logically its own successor - but this needs to be checked.  If the thread doing the long wait doesn't have CSPD fixed, or can't get it from slack, we could either a.) just put it on the SOP list (after all, there was no guarantee it would get attention the same period the pulse occurred anyway), or b.) have it wait for slack immediately and not go on the event's list until after it had secured CSPD.  (I think this leads to a difficult test condition - when returning from the slack wait, we'd have to check whether the event had been pulsed in the meantime.)
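
A hedged C sketch of the placement rule contemplated in 3.), assuming we do require CSPD fixed budget before a long waiter goes on the event's pulse list, with option a.) covering the thread that cannot get it.  The list helpers, try_get_cspd, and the two separate list nodes per thread are illustrative assumptions only:

/* Illustrative only: hypothetical helpers and types, not the Deos API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ticks_t;

typedef struct list_node { struct list_node *next; } list_node_t;

typedef struct thread {
    list_node_t event_node;   /* distinct node for the event's pulse list     */
    list_node_t sop_node;     /* distinct node for the start-of-period list   */
    ticks_t     fixed_budget;
} thread_t;

typedef struct {
    list_node_t *pulse_list;  /* merged directly to runnable on a pulse       */
} event_t;

extern bool try_get_cspd(thread_t *t);      /* from fixed budget or from slack   */
extern void list_push(list_node_t **list, list_node_t *node);
extern void add_to_sop_list(thread_t *t);   /* for per-period timeout processing */

/* A long waiter always goes on its SOP list so its timeout counter can be
 * decremented each period.  It additionally goes on the event's pulse list
 * (the same list SDT waiters use) only if it holds CSPD of fixed budget, so
 * the pulse list can be merged straight to runnable without budget checks. */
static void begin_long_wait(event_t *ev, thread_t *t)
{
    add_to_sop_list(t);
    if (try_get_cspd(t)) {
        list_push(&ev->pulse_list, &t->event_node);
    }
    /* else: option a.) above - this period the thread waits via the SOP list only */
}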

4.) Think of a slack requestor list and slack accumulators for each slack group.  That means, given the semantic change of 2.), that a thread making a donation to a given group can cause the highest priority waiting thread in that group to get moved to runnable in O(1) time.

5.) We need some mechanism to link the logical slack requestor lists from each group into a single list, so that the slack requestor to runnable merge (that occurs each period boundary) can occur in O(1) time.
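
A hedged C sketch combining 4.) and 5.): per-group requestor lists and accumulators, with the groups chained into one system-wide list so the period-boundary merge can stay O(1).  The structures and the assumed NUM_RATES constant are hypothetical, not Deos types:

/* Hypothetical per-slack-group structures; not actual Deos types. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ticks_t;

typedef struct thread thread_t;
struct thread {
    thread_t *next;
    ticks_t   fixed_budget;
};

#define NUM_RATES 8                     /* assumed rate count; illustrative only */

typedef struct slack_group slack_group_t;
struct slack_group {
    thread_t      *srl_head;            /* 4.) the group's own slack requestor list */
    ticks_t        accumulated[NUM_RATES]; /* 4.) per-group, per-rate accumulators  */
    ticks_t        cspd;
    slack_group_t *next_group;          /* 5.) groups chained into a single system  */
    slack_group_t *prev_group;          /*     list so the period-boundary SRL-to-  */
};                                      /*     runnable merge can stay O(1)         */

extern void make_runnable(thread_t *t);

/* Combining 2.) and 4.): a donation into a group inspects only that group's SRL
 * head, so the highest-priority waiter in the group can be woken in O(1). */
static void donate_to_group(slack_group_t *g, int rate, ticks_t amount)
{
    g->accumulated[rate] += amount;
    if (g->srl_head != NULL && g->accumulated[rate] >= g->cspd) {
        thread_t *t = g->srl_head;
        g->srl_head = t->next;
        t->next = NULL;
        t->fixed_budget += g->cspd;
        g->accumulated[rate] -= g->cspd;
        make_runnable(t);
    }
}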

6.) There are some issues that arise when combining the concepts of slack groups with the slack concepts new to Agave.  The new concept is that there comes a point in any given period, when there are no longer any threads *pending* (i.e., no threads with an outstanding promise of fixed budget access), at which it's OK for slack consumers in that period to consume slack accumulated in a slower period.  When applied to a slack group, the first inclination is to say that whenever all fixed budget promises within the group are satisfied, we'll allow slack consumers within the group to consume from slower period slack within the group.  However, the notion of holding off slower period slack consumption until the period is no longer pending is a concept that must be honored system-wide.  This raises the notion of deferring the use of a slack donation.  The deferral scenario is basically this:  slack has accumulated in a slack group's slow period accumulator.  At the group's fast period, the last thread consumes or relinquishes its fixed budget, thus creating a "slack donation" in the form of making the slow slack available to the fast period.  There is a fast thread in the group waiting for slack.  It is not OK for the fast thread to be moved directly to runnable, because the fast period (from a system point of view) is still pending.  However, there will not be another donation to the group which will cause the thread to then get moved to runnable.  The solution was to have such a donation move the thread to a common deferred list, and then merge the deferred list to runnable when the period becomes not pending.  (Ed note: I think this behavior is ill-defined anyway -- see the following discussion.)
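
For illustration only, a minimal C sketch of the deferred-list mechanism the discussion arrived at (the follow-up below questions whether it actually resolves the issue); period_pending, make_runnable, and the single shared list are assumptions made for this sketch:

/* Hypothetical deferral bookkeeping for point 6.); not actual Deos code. */
#include <stdbool.h>
#include <stddef.h>

typedef struct thread thread_t;
struct thread { thread_t *next; };

static thread_t *deferred_list;          /* common list shared by all slack groups */

extern bool period_pending(int rate);    /* any outstanding fixed-budget promises?  */
extern void make_runnable(thread_t *t);

/* A donation created by making slower-period slack available may not wake the
 * waiter directly while the (system-wide) fast period is still pending, so the
 * waiter parks on a common deferred list instead. */
static void defer_or_wake(thread_t *waiter, int rate)
{
    if (period_pending(rate)) {
        waiter->next = deferred_list;
        deferred_list = waiter;
    } else {
        make_runnable(waiter);
    }
}

/* When the period becomes not pending, the deferred list merges to runnable. */
static void on_period_not_pending(void)
{
    while (deferred_list != NULL) {
        thread_t *t = deferred_list;
        deferred_list = t->next;
        t->next = NULL;
        make_runnable(t);
    }
}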

========= End of discussion summary ==========================

After further consideration, I believe there's no resolution to the issue raised in 6.) above.  With the new Agave behavior, it is possible to ripple slack forward, across numerous short periods, until late in a long period.  With the ability to consume from a slower rate, we can certainly have the situation where the available slack exceeds the time remaining until deadline for a given (faster) consumer.  If this faster thread is the only consumer in town, this simply means it is now too late to consume all the slack that was generated.  When viewed in the context of slack groups, we could easily have two groups in this state - a thread in one group would get slack at the expense of the other.

Not a very clear description, I admit.  I think the bottom line is this -- slack groups cannot be kept out of each other's hair if they exhibit Agave-like behavior.  However, I think they can be kept clear if slack groups exhibit Cardon-like slack behavior, and if system slack does not consider a period to be non-pending until all fixed budget commitments have been met (for the period) and all slack group requestors (for the period) have been satisfied.

I'm not sure whether this restriction is too limiting or not.  In the case where the slack group is a process, the solution seems pretty straightforward -- give all the process' budget to the fastest thread (probably main).

from Jeff Novacek (BRGA) 20030826:

Have there been any requests to allow CPU Budget Transfers between threads of different rates?

If not, I believe that Platforms (CBIT in particular) could take advantage of this feature if it could transfer budget from an 80Hz thread to a 1Hz thread.

Let me know if I missed something and this is already supported, or if this request is something that DEOS will take under consideration.

Thanks,
Jeff Novacek
Office: (602) 436-4551
Pager: (602) 360-8202
jeff.novacek@honeywell.com


TRACK Comments:

MD20030831: Defer until after Agave certification.  SAS classification - Enhancement

20030716 Estimate:
  rapid prototype:
    1.) 3 days
    2.) 2 days
    3.) 3 days (need semaphore's list also)
    4/5.) 2 days
    misc) 5 days
  total: 3 wks

  reqs: 1 wk
  tracing / code cleanup: 1 wk
  tests: ? wks
  reviews: 3 wks
Total: ? wks => ~3 mos

CCB20030701: CCB needs estimate
MD20030623: SHARP maintainers would like to see this in the 6.14 cert.
SHARP Components need a disposition of this PCR by 28-FEB-2003.


PCR moved from Track to bugzilla by Aaron.Larson@Honeywell.com 2005-09-22 16:56
Comment 1 sb 2006-05-04 12:07:45 MST
Update Target Milestone per 3/2006 Deos ADIRU Final EASA audit.
Comment 2 deosbugs.ccb 2010-02-03 15:22:11 MST
CCB visited this PCR on 2010-02-03
Comment 3 deosbugs.ccb 2010-02-25 13:35:52 MST
CCB visited this PCR on 2010-02-25.
Comment 4 deosbugs.ccb 2010-03-31 12:11:52 MST
CCB 2 visited this PCR on  2010-03-31
Comment 5 deosbugs.ccb 2010-05-11 15:05:09 MST
CCB visited this PCR on 2010-05-11
Comment 6 deosbugs.ccb 2010-09-30 14:51:05 MST
CCB 2 visited this PCR on 2010-09-30.
Comment 7 deosbugs.ccb 2010-12-17 10:17:04 MST
CCB visited this PCR on 2010-12-17.
Comment 8 deosbugs.ccb 2011-03-07 17:38:44 MST
CCB visited this PCR on 2011-03-07
Comment 9 deosbugs.ccb 2011-04-11 14:37:18 MST
CCB visited this PCR on 2011-04-11
Comment 10 deosbugs.ccb 2011-05-16 15:19:53 MST
CCB2 visited this PCR on 2011-05-16
Comment 11 deosbugs.ccb 2011-08-17 15:10:14 MST
CCB visited this PCR on 2011-08-17
Comment 12 deosbugs.ccb 2011-11-04 15:52:03 MST
CCB 2 visited this PCR on 2011-11-04
Comment 13 deosbugs.ccb 2012-01-31 13:45:00 MST
CCB visited this PCR on 2012-01-31
Comment 14 deosbugs.ccb 2012-05-15 08:39:52 MST
CCB visited this PCR on 2012-05-15
Comment 15 deosbugs.ccb 2012-06-08 13:16:12 MST
CCB 2 visited this PCR on 2012-06-08
Comment 16 deosbugs.ccb 2012-08-24 09:17:03 MST
CCB visited this PCR on 2012-08-24
Comment 17 deosbugs.ccb 2012-09-07 10:38:05 MST
CCB visited this PCR on 2012-09-07
Comment 18 rroffelsen 2012-09-17 15:59:57 MST
CCB visited this PCR on 2012-09-17
Comment 19 deosbugs.ccb 2012-11-19 12:19:13 MST
CCB visited this PCR on 2012-11-19
Comment 20 deosbugs.ccb 2012-11-26 17:45:26 MST
CCB visited this PCR on 2012-11-26
Comment 21 deosbugs.ccb 2013-02-12 21:18:36 MST
CCB visited this PCR on 2013-02-12
Comment 22 deosbugs.ccb 2013-03-18 14:34:55 MST
CCB visited this PCR on 2013-03-18
Comment 23 deosbugs.ccb 2013-05-15 10:11:47 MST
CCB visited this PCR on 2013-05-15
Comment 24 deosbugs.ccb 2013-07-19 11:25:37 MST
CCB visited this PCR on 2013-07-19
Comment 25 deosbugs.ccb 2013-11-15 17:39:52 MST
CCB visited this PCR on 2013-11-15
Comment 26 deosbugs.ccb 2014-05-20 10:13:45 MST
CCB 2 visited this PCR on 2014-05-20
Comment 27 deosbugs.ccb 2014-07-14 15:11:28 MST
CCB visited this PCR on 2014-07-14
Comment 28 deosbugs.ccb 2014-11-11 13:16:25 MST
CCB 2 visited this PCR on 2014-11-11
Comment 29 deosbugs.ccb 2014-11-17 08:57:34 MST
CCB visited this PCR on 2014-11-17
Comment 30 deosbugs.ccb 2016-04-18 12:56:09 MST
CCB visited this PCR on 2016-04-18
Comment 31 deosbugs.ccb 2016-06-20 13:09:03 MST
CCB visited this PCR on 2016-06-20
Comment 32 deosbugs.ccb 2017-02-01 09:38:19 MST
CCB visited this PCR on 2017-02-01-59501
Comment 33 deosbugs.ccb 2017-06-28 13:01:24 MST
CCB visited this PCR on 2017-06-28-69227
Comment 34 deosbugs.ccb 2017-07-06 11:07:23 MST
CCB visited this PCR on 2017-07-06-58325
Comment 35 deosbugs.ccb 2021-03-26 09:36:49 MST
CCB visited this PCR on 2021-03-26-57787
Comment 36 deosbugs.ccb 2021-04-05 09:53:55 MST
CCB visited this PCR on 2021-04-05-59141
Comment 37 deosbugs.ccb 2023-08-14 11:06:26 MST
CCB visited this PCR on 2023-08-14-64795
Comment 38 deosbugs.ccb 2023-08-14 11:30:13 MST
PCR to remain on HOLD for kismet.