
Commit e2fc1b8

[SYCL][CUDA] Remove unnecessary memfence (#1935)
Remove the unnecessary memory fence after the CUDA barrier (`__syncthreads`).

The emitted `bar.sync 0` PTX instruction ensures that all prior memory accesses of the threads participating in barrier `0` have been performed and that those threads make no new memory accesses before the barrier completes. The removed memory fence reduced performance without adding any functionality to the barrier's memory behavior.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Co-authored-by: Victor Lomuller <victor@codeplay.com>
1 parent b7a34be commit e2fc1b8

1 file changed, 0 additions, 1 deletion

libclc/ptx-nvidiacl/libspirv/synchronization/barrier.cl

Lines changed: 0 additions & 1 deletion
@@ -17,5 +17,4 @@ _CLC_OVERLOAD _CLC_DEF _CLC_CONVERGENT void
 __spirv_ControlBarrier(unsigned int scope, unsigned int memory,
                        unsigned int semantics) {
   __syncthreads();
-  __spirv_MemoryBarrier(memory, semantics);
 }
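
For illustration only (this is not part of the commit), the barrier behavior described in the commit message can be sketched in plain CUDA. The kernel below relies on `__syncthreads()` alone to make shared-memory writes visible to the other threads of the block, with no `__threadfence_block()` after it. The names `rotate_within_block`, `run_rotate_check`, and `kBlockSize` are hypothetical and chosen for this sketch.

```cuda
#include <cuda_runtime.h>

// Hypothetical block size chosen for this sketch.
constexpr int kBlockSize = 256;

// Each thread stores one value into shared memory, then reads the value
// stored by its right-hand neighbour. The only synchronization is
// __syncthreads(); no separate memory fence follows it.
__global__ void rotate_within_block(const int *in, int *out) {
  __shared__ int tile[kBlockSize];
  const int tid = threadIdx.x;
  const int gid = blockIdx.x * blockDim.x + tid;

  tile[tid] = in[gid];
  // Waits for every thread of the block; shared and global memory accesses
  // made before the barrier are visible to all threads of the block after it.
  __syncthreads();
  out[gid] = tile[(tid + 1) % kBlockSize];
}

// Launches one block and returns the number of mismatching elements,
// or -1 if a CUDA runtime call fails.
int run_rotate_check() {
  int h_in[kBlockSize];
  int h_out[kBlockSize];
  for (int i = 0; i < kBlockSize; ++i) h_in[i] = i;

  int *d_in = nullptr;
  int *d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
    cudaFree(d_in);
    return -1;
  }
  if (cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) != cudaSuccess) {
    cudaFree(d_in);
    cudaFree(d_out);
    return -1;
  }

  rotate_within_block<<<1, kBlockSize>>>(d_in, d_out);

  // This blocking copy also waits for the kernel to finish.
  const cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;

  int mismatches = 0;
  for (int i = 0; i < kBlockSize; ++i) {
    if (h_out[i] != (i + 1) % kBlockSize) ++mismatches;
  }
  return mismatches;
}
```

Calling `run_rotate_check()` from a host `main` on a machine with a CUDA-capable device is expected to return 0, which is consistent with the commit's claim that the barrier by itself already provides the needed memory ordering within the block.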
