-
Notifications
You must be signed in to change notification settings - Fork 76
Workaround a synchronization bug. #772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure. It is a workaround because a full solution should explicitly address the need to prevent opening more work packets when the coordinator (or any worker) is adding work packets to related buckets.
I would say it's better that we don't close the issue if the fix is just a workaround. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a pretty hacky workaround. But if this is necessary to make sure we do not break bindings, it is okay with me.
src/scheduler/controller.rs
Outdated
// packets for the workers. The only two other coordinator, i.e. `ScheduleCollection` and | ||
// `StopMutators`, belong to that category. `EndOfGC` doesn't add new work packets, so it | ||
// doesn't need to prevent the workers from opening buckets (as there are none to be | ||
// opened.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we still want to use initiate_coordinator_work
here, right? Can you put the link to both issues in the comments, and add a FIXME
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. The initiate_coordinator_work
function is intended for preventing workers from opening more packets. Maybe run_coordinator_work_and_prevent_opening_buckets
is a more meaningful (but verbose) function name. From the diagram I drew in #774, ScheduleCollection
and StopMutators
are preconditions of some buckets, but EndOfGC
is not a precondition of anything. So it may not make sense to wrap all CoordinatorWork with the function currently named initiate_coordinator_work
.
Anyway, I'll add links to both issues so that we can fix it more properly later.
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure.
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure.
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure. (cherry picked from commit 7750fc0)
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure. (cherry picked from commit 7750fc0)
If a worker thread spuriously wakes up, it may observe the pending coordinator work counter going from 0 to 1 when the EndOfGC work packet starts to be executed. That'll trigger an assertion failure.
It is a workaround because a full solution should explicitly address the need to prevent opening more work packets when the coordinator (or any worker) is adding work packets to related buckets.
Closes: #770