Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve thread scheduler memory ordering #43418
Improve thread scheduler memory ordering #43418
Changes from all commits
7b636ff
9c7cfa9
6df6105
0eaf35a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you technically need
acq_rel
ordering or a manual fence for the success path for a happens-before edge tojl_atomic_store_relaxed(&ptls->sleep_check_state, sleeping)
injl_task_get_next
. But it would be very counter-intuitive if you need one...(I also wondered if it can disrupt the Dekker-like pattern in
jl_task_get_next
andjl_wakeup_thread
. But IIUC this is fine since the RMW edge extends the release sequence edge.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not release this elsewhere, so I think that ordering would be invalid here. In fact, I think the only valid ordering for this operation is relaxed, because we use relaxed stores elsewhere.
The relaxed swap is not permitted to fail spuriously, so if we get here, we know at least one thread will complete the transition from sleeping->not_sleeping, and that thread will also ensure that to wake the condition signal. (we might get more than one thread completing this transition, if we managed to be very slow here, and the thread was already awake for other reasons, finished processing the work, and then went back to sleep already. But we are not worried about spurious wake-ups.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right..., I just realized that fence for
jl_atomic_store_relaxed(&ptls->sleep_check_state, sleeping)
is on the opposite side so it can't start a synchronizes-with edge (unless we put a fence beforejl_atomic_store_relaxed(&ptls->sleep_check_state, sleeping)
as well or strengthen its ordering).I believe it's OK. We can get happens-before edges even with mixed memory orderings as long as they are at least relaxed (not non-atomic) and we put fences before the write and after the read. See fence-atomic and atomic-fence synchronizations in https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence
I think the difference between strong and weak CASes is orthogonal to the memory ordering, at least for the memory model of the abstract machine.
compare_exchange_weak
andcompare_exchange_strong
both have memory ordering arguments. Also, in the formal declarative memory model (by Lahav et al. 2017, which is the basis of C++20 memory model), the distinction of strong and weak CAS do not exist; IIUC, failed weak CASes just appear in the execution graph as repeated read immediately followed by a write (i.e., failed writes are ignored).I was more worried about the case that somehow the wake-up is missed. Intuitively, I thought you need to send the signal "after" you know that
jl_atomic_store_relaxed(&ptls->sleep_check_state, sleeping)
has happened. But since relaxed memory read/write doesn't enforce any ordering, we technically can't assume what the other task (that is supposedly about to sleep) is actually doing. I cannot help wondering if I'm either taking C/C+ committee's FUD way too seriously and/or I'm still misunderstanding how C++20 memory model works. It's really bizarre that we can't rely on the "causality" through relaxed memory accesses alone. IIRC, Hans Boehm has a very intricate example involving GVN and speculative execution but I don't know if that applies here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That page on fences doesn't appear to mention seq_cst, and that is what we require here. Merely knowing acquire and release is insufficient to provide any interesting constraint here on the behavior of the program. What we need to prove is that the release on thread 1 (the store to sleep_check_state) has a global consistent ordering with the release on thread 2 (the store to the workqueue) relative to the subsequent checks on those threads for the contents of the workqueue or the sleep_check_state, respectively.
For the seq_cst properties, we have that the description of the fence on https://en.cppreference.com/w/cpp/atomic/memory_order:
Once we reach this point, we are into the land of locks and condition objects, which are hopefully much simpler to analyze and declare safe in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a bit stale info on cppreference side. That's actually one of the major improvements in the C++20 memory model. P0668R4: Revising the C++ memory model includes a patch to remove similar sentences from the standard. In particular, the first paragraphs in the section Strengthen SC fences provides nice background information.
[Notes for people who want to dig into the original resource; e.g., future me] The relevant definition in Lahav et al. is:
The
[E^sc]
prefix and suffix inpsc_base
indicate that arbitrary SC evens, including reads and writes, can start/end the partial SC relation, provided that read/write starts or ends the SC-before relationscb
. On the other hand, fences ([F^sc]
) can work in more general setting wherescb
is prefixed/suffixed by the happens-before relationhb
(i.e., which is indicated by[F^sc]; hb^?
andhb^?; [F^sc]
). For example, this indicates that reads-beforerb
relation starting with a SC read and ends with a SC fence ([E^sc]; rb; [F^sc]
) is a part ofpsc_base
and hencepsc
.Yes, I agree. I was just curious to know how do I reason about this if I want to do it rigorously.