Description
Continuing a discussion in #321 that starts around here and ends here.
> I think that this proposal would work, but it is essentially equivalent to the read-dont-modify-write approach.
I agree it's equivalent to read-dont-modify-write. But phrasing it the way I did provides insight about how it can be efficiently implemented. I claim that a load-release can be implemented on most architectures as a fence instruction followed by a load instruction, and store-acquire can be implemented as a store instruction followed by a fence instruction. Notably:
- This matches how Linux actually implements seqlocks: "load-release", "store-acquire". (edit: no it doesn't; it requires a stronger fence than what Linux uses, see below.)
- It puts the fence on the opposite side of the memory access compared to load-acquire and store-release.
- However, it may require a different type of fence instruction depending on which instructions the architecture offers.
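To make the "fence on the opposite side" shape concrete, here is a minimal sketch in C++. The helper names `load_release` and `store_acquire` are mine, not from the proposal, and I use a `seq_cst` fence as a conservative placeholder since the required fence strength is exactly the open question; note also that this sketches the intended instruction-level mapping, not a construction that the C++ memory model itself blesses with release/acquire semantics.

```cpp
#include <atomic>

// Hypothetical "load-release": fence *before* the load -- the mirror
// image of an acquire load, which orders against later accesses.
template <typename T>
T load_release(const std::atomic<T>& a) {
    std::atomic_thread_fence(std::memory_order_seq_cst); // fence first...
    return a.load(std::memory_order_relaxed);            // ...then load
}

// Hypothetical "store-acquire": fence *after* the store -- the mirror
// image of a release store, which fences before the store instead.
template <typename T>
void store_acquire(std::atomic<T>& a, T v) {
    a.store(v, std::memory_order_relaxed);               // store first...
    std::atomic_thread_fence(std::memory_order_seq_cst); // ...then fence
}
```

The point of the shape is that both helpers compile down to a plain load or store plus a fence, with no read-modify-write instruction anywhere.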
> It implies, for example, that readers also synchronize with each other, i.e. R3 -sw> R1', which linearizes the accesses and corresponds to bouncing an exclusive cache line around, and to a loss of the unlimited read-side scaling property that is the whole point of seqlocks.
I could have made a mistake, but this is intended to match seqlocks' implementation, and to not require exclusive cache line access on typical CPUs – unlike a naive implementation of read-dont-modify-write in terms of compare_exchange or fetch_add, which would require it. The idea is that on those typical CPUs, memory accesses are actually implemented in a way that provides full sequential consistency, in terms of when the accesses actually occur; it's just that the order of "when the accesses actually occur" differs from program order thanks to reordering. (But this is just a motivation, not an attempt to prove correctness. The proof of correctness would come from the memory models in each architecture manual.)
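To illustrate why the read side stays scalable, here is a sketch of a seqlock built this way (my illustration, not code from the thread; the non-atomic payload read is technically a data race under the C++ model, which is the very gap the proposal is trying to close). The reader performs only loads plus fences, so it never needs the cache line in exclusive state; a naive read-dont-modify-write such as `seq.fetch_add(0)` would instead issue an RMW and bounce the line between reading cores.

```cpp
#include <atomic>

struct SeqLock {
    std::atomic<unsigned> seq{0};
    int data = 0; // protected payload (plain, non-atomic for brevity)

    int read() const {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            if (s0 & 1) continue;             // odd: writer in progress, retry
            int d = data;                     // speculative read of payload
            std::atomic_thread_fence(std::memory_order_acquire);
            unsigned s1 = seq.load(std::memory_order_relaxed);
            if (s0 == s1) return d;           // no writer intervened
        }
    }

    void write(int v) {
        seq.fetch_add(1, std::memory_order_relaxed);  // make count odd
        std::atomic_thread_fence(std::memory_order_release);
        data = v;
        std::atomic_thread_fence(std::memory_order_release);
        seq.fetch_add(1, std::memory_order_relaxed);  // make count even
    }
};
```

Note that the reader's trailing `acquire` fence followed by a relaxed load is the same fence-then-load shape as the "load-release" above, just spelled with whatever fence strength turns out to be sufficient.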