
Condition of opening work packets #774

Open
@wks

Description


I recently found a synchronisation issue: #770. Although a workaround exists (#772), it reveals some problems in the design of the work packet system.

  • When can more buckets be opened?
  • How do we prevent opening new work packets while more work packets are being added?

Currently, buckets form a linear (totally ordered) relation. A bucket can be opened if (1) all workers are idle, and (2) the coordinator doesn't have pending work, and (3) all previous buckets are empty.

Condition (1) exists because a bucket is only considered finished if all of its work packets are finished, but a worker may still be executing one of its work packets.

Conditions (1) and (2) exist because one work packet (a coordinator packet or an ordinary packet) may add more work packets to buckets. If a subsequent bucket is opened before work packets are added to a prior bucket, it will be an error. That's why we use the counter pending_coordinator_packets to prevent buckets from being opened while there are pending coordinator packets.

Conditions (1), (2) and (3) are reasonably sound, but they are too complicated to test. As can be seen in the GCWorkScheduler::poll_slow function, the condition variable worker_monitor.1 is not guarded by any concrete variable. This is very different from typical Condvar use cases like the following:

// Typical Condvar usage: the condition being waited for is an explicit
// variable guarded by the same Mutex the Condvar is paired with.
let mut guard = mutex.lock().unwrap();
while !guard.can_open_more_buckets {
    guard = condvar.wait(guard).unwrap();
}

In fact, all the lines of code in poll_slow other than the worker_monitor.1.wait() call itself are computing the condition "can_open_more_buckets". Those lines of code include:

  1. Getting schedulable work (self.poll_schedulable_work(worker))
  2. Getting worker-specific work (self.worker_group.has_designated_work())
  3. Checking if any open bucket has sentinels (self.schedule_sentinels())
  4. Checking if the opening condition of any bucket is met (self.update_buckets()). Currently, the opening condition is scheduler.are_buckets_drained(&cur_stages) where cur_stages includes all previous stages.

All of them need to be checked every time the last worker parks. Even with such care, we still encounter synchronisation bugs like #770.
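To illustrate, here is a hedged sketch of that conjunction of checks. It is not the actual mmtk-core code; the closure parameters are hypothetical stand-ins for items 1-4 above.

struct Work; // placeholder for a work packet

// Sketch only: the "wait condition" is the conjunction of several checks
// rather than a single flag guarded by the mutex.
fn poll_slow_sketch(
    mut poll_schedulable_work: impl FnMut() -> Option<Work>,
    mut has_designated_work: impl FnMut() -> bool,
    mut schedule_sentinels: impl FnMut() -> bool, // true if any sentinel was scheduled
    mut update_buckets: impl FnMut() -> bool,     // true if any bucket was opened
    mut park_and_wait: impl FnMut(),              // wait on worker_monitor.1
) -> Work {
    loop {
        if let Some(work) = poll_schedulable_work() {
            return work;
        }
        if has_designated_work() || schedule_sentinels() || update_buckets() {
            continue; // something changed; look for schedulable work again
        }
        park_and_wait(); // no explicit variable guards this wait
    }
}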

I think the root problem is that the opening condition is far more complicated than it needs to be.

A previous issue #546 identified that the synchronisation is too complicated, and proposed a major refactoring where there is only one BinaryHeap to hold all poll-able work.

Here I am making an orthogonal proposal about when to open new buckets.

Proposal

Under this proposal, each bucket has the following states:

  • For closing a bucket:
    • pending_packets: AtomicUsize The number of work packets added to the bucket but not yet finished.
    • sentinel: Option<GCWork> The sentinel of the bucket (if it exists)
    • finished: bool A boolean variable that explicitly records whether this bucket has finished.
  • For opening a bucket:
    • unfinished_predecessors: usize How many predecessors need to finish before this bucket can be opened? (Only count immediate predecessors.)
    • is_open: bool A boolean variable that explicitly records whether this bucket is open.
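A minimal sketch of these fields on a bucket (types follow the list above; Work is a placeholder for a boxed work packet, and the locking around the non-atomic fields is omitted):

use std::sync::atomic::AtomicUsize;

struct Work; // stands in for a boxed work packet

struct WorkBucket {
    // For closing the bucket:
    pending_packets: AtomicUsize,   // packets added to the bucket but not yet finished
    sentinel: Option<Work>,         // sentinel of the bucket, if any
    finished: bool,                 // explicitly records that this bucket has finished
    // For opening the bucket:
    unfinished_predecessors: usize, // immediate predecessors that have not yet finished
    is_open: bool,                  // explicitly records that this bucket is open
}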

Every time a packet is added to a bucket, increment pending_packets by one. Every time a work packet is removed from a bucket for execution, do not decrement anything! Every time a worker finishes executing a work packet, decrement pending_packets by one and test whether it has reached zero. When it reaches zero, set finished to true. The sentinel is counted in pending_packets: when setting the sentinel, increment the counter by one; when the bucket is empty but pending_packets is still 1, there must be a sentinel.

To prevent pending_packets from being incremented from zero after a bucket has been opened (which would resurrect a finished bucket), work packets are only allowed to be added to buckets that are not yet open, or to the same bucket the current packet belongs to. It is forbidden to add more work packets to finished buckets. It is always possible to add packets to the special "Unconstrained" bucket because it is never considered finished.
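Here is a hedged sketch of the counting rules in the two paragraphs above. The method names are mine, not mmtk-core's, and the finished flag is made atomic only to keep the example self-contained.

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

struct Bucket {
    pending_packets: AtomicUsize,
    finished: AtomicBool,
}

impl Bucket {
    // Adding a packet (or the sentinel) increments the counter. Per the rule
    // above, this is only legal for buckets that are not yet open, or from a
    // packet that belongs to this very bucket, so the counter can never go
    // from 0 back to 1 once the bucket has been opened and drained.
    fn add_packet(&self) {
        debug_assert!(!self.finished.load(Ordering::SeqCst));
        self.pending_packets.fetch_add(1, Ordering::SeqCst);
    }

    // Taking a packet out of the bucket for execution decrements nothing.

    // When a worker finishes executing one of this bucket's packets:
    fn on_packet_finished(&self) {
        if self.pending_packets.fetch_sub(1, Ordering::SeqCst) == 1 {
            // The last pending packet just finished.
            self.finished.store(true, Ordering::SeqCst);
            // ...then notify the successors (next paragraph).
        }
    }
}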

When a bucket is finished, decrement the unfinished_predecessors counter of all its successors by one. When a bucket's unfinished_predecessors counter is decremented to zero, open that bucket.
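Continuing the sketch, the opening rule could look like this (again with hypothetical names, and with the predecessor counter made atomic so that any finishing bucket can decrement it):

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

struct BucketNode {
    unfinished_predecessors: AtomicUsize,
    is_open: AtomicBool,
    successors: Vec<usize>, // indices of the successor buckets
}

// When a bucket finishes, decrement each successor's predecessor count and
// open every successor whose count reaches zero. No worker needs to be
// parked for this to happen.
fn on_bucket_finished(buckets: &[BucketNode], finished: usize) {
    for &s in &buckets[finished].successors {
        if buckets[s].unfinished_predecessors.fetch_sub(1, Ordering::SeqCst) == 1 {
            buckets[s].is_open.store(true, Ordering::SeqCst);
            // Packets already sitting in this bucket now become schedulable.
        }
    }
}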

By doing this, we no longer require all workers to park before opening new work packets.

This solves the "opening buckets" problem. To wake up parked workers, we can use a global pending_packets counter, and the condition variable that workers wait on can be notified when the global counter is incremented from zero.
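A minimal sketch of that wake-up mechanism, assuming a std Mutex/Condvar pair (my illustration, not a proposed mmtk-core API):

use std::sync::{Condvar, Mutex};

// A global count of schedulable packets, guarded by a Mutex so that the
// Condvar wait has a concrete condition to check.
struct WorkAvailable {
    pending: Mutex<usize>,
    cond: Condvar,
}

impl WorkAvailable {
    // Called whenever packets become schedulable; waking is only needed on
    // the 0 -> non-zero transition.
    fn add(&self, n: usize) {
        let mut pending = self.pending.lock().unwrap();
        let was_zero = *pending == 0;
        *pending += n;
        if was_zero {
            self.cond.notify_all();
        }
    }

    // Parked workers wait here until the counter becomes non-zero.
    fn wait_for_work(&self) {
        let mut pending = self.pending.lock().unwrap();
        while *pending == 0 {
            pending = self.cond.wait(pending).unwrap();
        }
    }
}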

Extending beyond the linear bucket structure

With a predecessor count attached to each bucket, it is possible to express the dependencies between work buckets in a non-linear manner.

Some buckets have unusual opening conditions. For example,

  • The Prepare bucket is opened when all mutators have stopped.
  • The Closure bucket is opened after the StopMutators coordinator work is finished. Currently, we prevent the Closure bucket from opening by requiring the pending_coordinator_packets counter to be zero. I think we can express the same thing as "the Closure bucket requires not only the Prepare bucket to finish, but also the StopMutators coordinator packet to finish". We can express this by setting the unfinished_predecessors of the Closure bucket to 2: one for the Prepare bucket and one for the StopMutators coordinator packet.
  • The ScheduleCollection coordinator packet basically prevents all other work buckets from opening. We can express this by setting the unfinished_predecessors of Prepare to 2: one for "mutators stopped" and one for "the ScheduleCollection coordinator packet is finished".

So the actual work bucket dependency graph of our current system is as follows (simplified; the built-in weak processing buckets and the mark-compact buckets are omitted):


  ┌──────────────────┐
  │ScheduleCollection│
  │packet finished   ├─┐
  └──────────────────┘ │
                       │
  ┌───────────────┐    │  ┌───────┐
  │mutator stopped├────┴──►Prepare│
  └───────────────┘       └──┬────┘
                             │
                             │
  ┌───────────────┐       ┌──▼────┐
  │StopMutators   ├───────►Closure│
  │packet finished│       └──┬────┘
  └───────────────┘          │
                             │
                          ┌──▼─────────┐
                          │VMRefClosure│
                          └──┬─────────┘
                             │
                             │
                          ┌──▼────┐
                          │Release│
                          └──┬────┘
                             │
                             │
                          ┌──▼──┐
                          │Final│
                          └─────┘

Labels: A-work-queue (Area: Work queue), C-enhancement (Category: Enhancement), P-low (Priority: Low)
