Description
Proposed Changes
Worked on this proposal with @Snektron
Add an optional scope to atomic builtins.
```zig
pub const AtomicScope = enum {
    thread, // see "Why do we need a single thread's scope?"
    subgroup, // equivalent terms: warp, wavefront
    group, // equivalent terms: workgroup, thread block
    device,
    global, // cross-device
};

pub const AtomicOptions = struct {
    scope: AtomicScope = .global, // see "Why is the default scope global?"
};

@atomicLoad(usize, ptr, .relaxed, .{ .scope = .subgroup })
```
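To make the shape of the API concrete, here is a minimal sketch that compiles against current Zig (assuming 0.12-or-later naming such as `.acquire`). It copies `AtomicScope` and `AtomicOptions` from the proposal. `atomicLoadScoped` is a hypothetical userland shim, not part of the proposal: it accepts the options argument but forwards to today's `@atomicLoad`, which has no scope parameter, so the scope is accepted and then ignored. It only illustrates the call shape and the `.global` default.

```zig
const std = @import("std");

// Types copied from the proposal.
pub const AtomicScope = enum {
    thread,
    subgroup, // warp, wavefront
    group, // workgroup, thread block
    device,
    global, // cross-device
};

pub const AtomicOptions = struct {
    scope: AtomicScope = .global, // strongest guarantees by default
};

/// Hypothetical shim (assumption, not part of the proposal): takes the
/// proposed options argument, but today's @atomicLoad has no scope
/// parameter, so the scope is discarded here.
pub fn atomicLoadScoped(
    comptime T: type,
    ptr: *const T,
    comptime order: std.builtin.AtomicOrder,
    comptime options: AtomicOptions,
) T {
    _ = options; // would be forwarded to the builtin under the proposal
    return @atomicLoad(T, ptr, order);
}
```

A call then reads much like the proposed builtin, e.g. `atomicLoadScoped(usize, &x, .acquire, .{ .scope = .subgroup })`, and passing `.{}` yields the `.global` scope.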
Rationale
Unlike CPUs, where all cores are roughly equal, GPUs have a hierarchical execution structure. These 'levels' have different kinds of shared memory, and atomics need different synchronization primitives depending on the scope. Without this proposal, every GPU backend (currently amdgpu, nvptx, and spirv) would need something similar, additional builtins, or inline assembly to expose scoped atomics.
Some atomic operations, such as `@cmpxchgStrong`, have two memory orders: one for success and one for failure. We argue these do not need separate atomic scopes, since both outcomes access the same memory location. By the same argument, `std.atomic.Value(...)` would need only a single scope.
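The one-scope argument can be sketched as a hypothetical `ScopedValue` wrapper (a name invented here, not the actual `std.atomic.Value` API): the scope is fixed once as a comptime parameter of the type, while memory orders are still chosen per operation, including separate success and failure orders for `@cmpxchgStrong`. Since today's builtins take no scope, the sketch only records it in `atomic_scope`; the comments mark where the proposed options argument would go. It assumes Zig 0.12-or-later ordering names.

```zig
const std = @import("std");
const AtomicOrder = std.builtin.AtomicOrder;

// Mirrors the proposed enum so this block stands alone.
pub const AtomicScope = enum { thread, subgroup, group, device, global };

/// Hypothetical wrapper: one scope per value, fixed at the type level,
/// while orders vary per operation (and per cmpxchg outcome).
pub fn ScopedValue(comptime T: type, comptime scope: AtomicScope) type {
    return struct {
        raw: T,

        const Self = @This();
        /// Every operation on this value would use this single scope.
        pub const atomic_scope = scope;

        pub fn init(value: T) Self {
            return .{ .raw = value };
        }

        pub fn load(self: *const Self, comptime order: AtomicOrder) T {
            // Proposed: @atomicLoad(T, &self.raw, order, .{ .scope = atomic_scope })
            return @atomicLoad(T, &self.raw, order);
        }

        /// Two orders, one scope: success and failure touch the same location.
        pub fn cmpxchgStrong(
            self: *Self,
            expected: T,
            new: T,
            comptime success: AtomicOrder,
            comptime failure: AtomicOrder,
        ) ?T {
            // Proposed: ..., success, failure, .{ .scope = atomic_scope })
            return @cmpxchgStrong(T, &self.raw, expected, new, success, failure);
        }
    };
}
```

Returning `null` on success and the observed value on failure follows today's `@cmpxchgStrong` semantics; only the scope handling is hypothetical.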
Why is the default scope global?
The default should be the scope that gives the strongest guarantees. Currently, we consider only PCIe transactions global in scope.
Why do we need a single thread's scope?
This is niche, but it would let a user prevent the compiler from reordering memory accesses without littering the code with `@fence`.