Remove unnecessary memory fence after a CUDA memory barrier
(__syncthreads).
The emitted `bar.sync 0` PTX instruction ensures that all prior memory
accesses of the threads participating in barrier `0` have been performed
and that no new memory accesses happen before the barrier completes.
The removed memory fence therefore reduced performance without adding
anything to the memory-ordering guarantees the barrier already provides.
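
As an illustration only (this is not the code changed by this commit), the
sketch below shows a user-level kernel that relies solely on
`__syncthreads()` to make shared-memory writes visible within a block,
with no `__threadfence_block()` after the barrier. The names `block_sum`,
`run_block_sum`, and `kThreads` are hypothetical, and error checking of the
CUDA runtime calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // hypothetical block size (power of two)

// Sums kThreads inputs within one block using a shared-memory tree reduction.
__global__ void block_sum(const int* in, int* out) {
    __shared__ int scratch[kThreads];
    const int tid = threadIdx.x;

    scratch[tid] = in[tid];
    // Barrier only: the writes above are visible to every thread in the
    // block once __syncthreads() returns, so no extra fence is issued.
    __syncthreads();

    for (int stride = kThreads / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            scratch[tid] += scratch[tid + stride];
        }
        // Reached by all threads, since it sits outside the conditional.
        __syncthreads();
    }

    if (tid == 0) {
        *out = scratch[0];
    }
}

// Host helper: fills the input with ones, runs one block, returns the sum.
int run_block_sum() {
    int host_in[kThreads];
    for (int i = 0; i < kThreads; ++i) {
        host_in[i] = 1;
    }

    int* dev_in = nullptr;
    int* dev_out = nullptr;
    cudaMalloc(&dev_in, sizeof(host_in));
    cudaMalloc(&dev_out, sizeof(int));
    cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    block_sum<<<1, kThreads>>>(dev_in, dev_out);

    int result = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dev_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_in);
    cudaFree(dev_out);
    return result;
}

int main() {
    const int sum = run_block_sum();
    std::printf("block sum = %d\n", sum);
    return sum == kThreads ? 0 : 1;
}
```

With every input set to 1, the reduction should yield the block size.
Inserting `__threadfence_block()` after each `__syncthreads()` in this
kernel would not change the result, which is the redundancy described above.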
Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Co-authored-by: Victor Lomuller <victor@codeplay.com>