RFC: Reduce atomics macro API contracts for future extensions #41269
Conversation
It forbids it because those are not quite equivalent syntactical expressions, not because it is invalid to use a 64-bit load. In particular, it sounds like you are thinking of the C memory model's definition of volatile (which constrains the number of bytes loaded, and is not exposed in Julia), which is distinct from atomic (which constrains the happens-before ordering of the bytes loaded, and is the operation defined here). As I also asserted in #37847, this seems likely already legal to describe a 64-bit atomic load:
I don't think volatile is relevant, because volatile in the C/C++ sense has nothing to do with atomics. If you meant to point out that this is not incorporated into the C/C++ memory model yet, I agree. That's why I was linking the ongoing discussion in the C++ committee. But my point is that it is a conceivable extension, and the current API does not allow adding it later in a non-breaking manner.
This is a 128-bit load followed by field extraction of the first 64-bit component. It requires
Expressions inside a macro can (and very commonly do) have non-standard semantics. I think it's reasonable to keep the opportunity to provide concise expressions for the useful atomic operations.
Huh, that seems like an LLVM problem. Even for

it refuses to eliminate that dead load instruction. But there is no happens-before or other ordering (it is even `nosync`) that requires it to exist.
The C++ proposal I linked seems to insist that this is not optimizable.

But re-reading your comments makes me realize that I don't understand exactly why this is the case.
The C++ proposal explains at the bottom that this is not a valid optimization (or, in fact, a valid portion of the proposal). I also thought aligned AVX instructions would probably give a non-tearable read, but I can understand the reluctance to assume that is promised architecturally, and to instead use cx16.
Thanks. So, in principle, it seems like the compiler can optimize out

```cpp
intptr_t old_l = n->load(std::memory_order_relaxed).l;
intptr_t old_r = n->load(std::memory_order_relaxed).r;
Node old(old_l, old_r);
```

But what about tearable stores? It seems hard to express tearable stores

```julia
@atomic :monotonic xs[i].key = key
@atomic :monotonic xs[i].value = value
```

by

```julia
p0 = @atomic :monotonic xs[i]
p1 = (key = key, value = p0.value)
@atomic :monotonic xs[i] = p1
p2 = @atomic :monotonic xs[i]
p3 = (key = p2.key, value = value)
@atomic :monotonic xs[i] = p3
```

(expecting the optimizations). This is used in the work-stealing deque example. The C++ proposal mentions that the performance advantage "is more likely to be significant" (which sounds easy to expect). But, actually, going back to the main topic in the OP, if we have a direct way to express tearable atomics like

```julia
@atomic :tearable xs[i] = (; key, value)  # or `:nonatomic` as in C++
```

then arguably it's cleaner than assigning

Maybe a more problematic part is that
This would be a breaking change now
Currently,

```julia
@atomicreplace a[1].x expected => desired
```

is equivalent to

However, it forbids future extensions such as lowering

to a 64-bit (not 128-bit) load.
As I explained in #37847, this kind of "torn read" is a common technique when dealing with 128-bit entries (e.g., https://doi.org/10.1145/3309206, p. 6, and https://doi.org/10.1145/2442516.2442527, Fig. 3, line 37). It looks like this is discussed as "Tearable Atomics" in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html
Other expressions such as `x.y.z` and `f(x).y` also have a possible well-defined interpretation as memory locations, using the notion of a lens à la Accessors.jl.

The point of this PR is not to discuss whether the above suggestions are adequate (although I understand that dismissing them as out of the scope of Julia is an unfortunate but possible outcome). Rather, I suggest avoiding expanding the API contracts prematurely and leaving these expressions unsupported, to allow for possible extensions in the future.