Description
The LLVM LangRef doesn't document how !nontemporal stores are intended to interact with concurrency primitives. The current interactions are extremely surprising, basically making !nontemporal stores even less ordered than "non-atomic" stores:
Thread A:
store i32 %v, ptr %p, !nontemporal !13
fence release
// set some global flag (relaxed write)
Thread B:
// wait till global flag is set (relaxed read)
fence acquire
%_0 = load i32, ptr %p, !noundef !11
According to all the usual concurrency rules, that last load must see the store. However, the way LLVM compiles this program on x86, it has a data race: the fences become NOPs and the relaxed accesses become regular MOV, so we end up with MOVNT; MOV in thread A, which the CPU is allowed to reorder (see e.g. this long and detailed post on MOVNT), meaning that thread B might see the flag write but then fail to see the data store!
In other words, MOVNT violates TSO, but the compilation scheme LLVM (and everyone else) uses for release/acquire synchronization relies on TSO. Together this leads to rather unpredictable semantics. Are nontemporal stores meant to completely bypass normal memory model rules (in which case they are super dangerous to use anywhere), or are they meant to follow the usual rules (in which case LLVM needs to ensure there is an sfence between each nontemporal store and later release operations)?
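
For illustration, here is a rough source-level sketch in C++ of what the second option amounts to: the nontemporal store is followed by an sfence before the release fence and the relaxed flag write. This is only an assumption about what "following the usual rules" would require on x86, not a description of what LLVM does today. The names nontemporal_store_then_fence and publish_and_read are hypothetical, and on non-x86-64 targets the sketch falls back to a plain store.

```cpp
#include <atomic>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE)
#define HAVE_X86_NT 1
#else
#define HAVE_X86_NT 0
#endif

// Hypothetical helper: a nontemporal store immediately followed by the
// store fence that restores ordering with respect to later stores.
inline void nontemporal_store_then_fence(int* p, int v) {
#if HAVE_X86_NT
    _mm_stream_si32(p, v);  // weakly ordered store, may be reordered past later MOVs
    _mm_sfence();           // orders the streaming store before later stores
#else
    *p = v;                 // fallback on other targets: ordinary store
#endif
}

// Mirrors the two threads from the issue: thread A publishes a value and
// sets a flag, thread B waits for the flag and then reads the value.
int publish_and_read(int value) {
    alignas(64) int data = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        nontemporal_store_then_fence(&data, value);
        std::atomic_thread_fence(std::memory_order_release);  // no instruction on x86
        ready.store(true, std::memory_order_relaxed);         // plain MOV on x86
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // no instruction on x86
        seen = data;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With the _mm_sfence() call in place, publish_and_read(42) is expected to return 42. Removing that call reproduces the MOVNT; MOV sequence described above, where the hardware is allowed to make the flag write visible before the data store.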