Description
The LowerWGScope pass generates copies between private and shared memory.
The idea is to share a private value from the leader work item with the
other work items through shared memory. Example in pseudocode:
...
if (Leader work item)
store %PrivateValue to @SharedGlobal -> leader shares the value
memory_barrier()
load %PrivateValue from @SharedGlobal -> all WIs load the shared value
...
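To make the intended semantics concrete, here is a minimal Python sketch. It is an illustration only, not how LowerWGScope or a GPU runtime actually works: threads stand in for work items, a dict entry stands in for @SharedGlobal, and threading.Barrier stands in for memory_barrier(). The names leader_broadcast, shared and work_item are hypothetical.

```python
import threading


def leader_broadcast(num_work_items: int, leader_value: int) -> list:
    """Model of the leader-broadcast pattern: the leader stores its private
    value to shared memory, all work items wait at a barrier, then every
    work item loads the shared value. Threads are an illustrative stand-in
    for GPU work items."""
    shared = {"value": None}                      # stands in for @SharedGlobal
    barrier = threading.Barrier(num_work_items)   # stands in for memory_barrier()
    results = [None] * num_work_items

    def work_item(wi_id: int) -> None:
        # Only the leader (work item 0) holds the meaningful private value.
        private_value = leader_value if wi_id == 0 else -1
        if wi_id == 0:                    # "if (Leader work item)"
            shared["value"] = private_value
        barrier.wait()                    # nobody proceeds until the leader has stored
        results[wi_id] = shared["value"]  # every work item loads the shared value

    threads = [threading.Thread(target=work_item, args=(i,))
               for i in range(num_work_items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(leader_broadcast(4, 42))  # [42, 42, 42, 42]
```

The barrier is what makes the final load safe: without it, a non-leader thread could read `shared["value"]` before the leader has written it.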
The generated load and store operations must not be moved across the
memory barrier. However, barrier intrinsics such as @llvm.nvvm.barrier0()
are not handled specially by LLVM middle-end passes; they are recognized
only by the PTX backend. As a result, middle-end optimizations can move
code so that the load executes before the store. For example, GVN can
perform LoadPRE based on GlobalsAA results:
...
crit_edge:
load %PrivateValue from @SharedGlobal -> all WIs load the shared value
if (Leader work item)
store %PrivateValue to @SharedGlobal -> leader shares the value
memory_barrier()
...
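Why this reordering breaks the program can be seen with a small deterministic model. The sketch below makes its own simplifying assumptions: work items execute in SIMT-style lockstep (every work item finishes an op before the next op starts), @SharedGlobal initially holds 0, and the op names are hypothetical labels for the pseudocode steps. Running the original order and the post-GVN order side by side shows every work item reading the stale value once the load is hoisted above the leader's store.

```python
def simulate(program, num_work_items, leader_value, initial_shared=0):
    """Execute `program` in lockstep: each op is run by every work item
    before the next op starts. "barrier" is a no-op in this model, because
    lockstep execution already keeps all work items together."""
    shared = initial_shared          # stands in for @SharedGlobal
    loaded = [None] * num_work_items
    for op in program:
        for wi in range(num_work_items):
            if op == "store_if_leader" and wi == 0:
                shared = leader_value    # leader shares the value
            elif op == "load":
                loaded[wi] = shared      # work item loads the shared value
            # "barrier": nothing to do in lockstep
    return loaded


original = ["store_if_leader", "barrier", "load"]
after_gvn = ["load", "store_if_leader", "barrier"]   # load hoisted into crit_edge

print(simulate(original, 4, 42))   # [42, 42, 42, 42]
print(simulate(after_gvn, 4, 42))  # [0, 0, 0, 0]
```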
It turns out that LLVM does not really have barrier intrinsics; it relies on the "fence" instruction instead.
For example, the llvm.nvvm.barrier0() call appears to be recognized only by the PTX backend. To the LLVM middle end, it looks like any other LLVM intrinsic call.
I have attached a real example.
Input IR:
before_gvn.txt
Output IR:
after_gvn.txt
opt -globals-aa -gvn before_gvn.ll -S > after_gvn.ll
vimdiff before_gvn.ll after_gvn.ll