[BUG] Missing threadfence in grid sync #4157

Open
@AKKamath

Description

In /cpp/src_prims/common/grid_sync.cuh, only the masterThread() executes a threadfence (line 191).
As per the CUDA documentation, a threadfence only orders the memory accesses of the calling thread.
For the other threads, there is therefore no guarantee that the global writes they made before the barrier are visible to all threads after it.
The threadfence should be moved out of the if condition, to line 189, so that every thread participating in the barrier executes it.

An example can be seen in CUB's equivalent implementation of a grid barrier.
https://github.com/NVIDIA/cub/blob/main/cub/grid/grid_barrier.cuh
You'll notice a threadfence at line 78 executed by all threads participating in the barrier.
The comment at line 76 also confirms that this fence is to ensure the visibility of global writes.
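
To illustrate the proposed ordering, here is a minimal sketch of a single-use grid barrier in which every thread fences before its block announces arrival. This is not the code from grid_sync.cuh or from CUB: the names `gridBarrierSketch`, `producerConsumer`, and `arrived` are hypothetical, the counter is assumed to be zero-initialized, and all blocks are assumed to be co-resident on the device (otherwise the spin loop can deadlock).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical single-use grid barrier (illustration only, not the cuML code).
// Assumes *arrived == 0 on entry and that all blocks are co-resident.
__device__ void gridBarrierSketch(unsigned int* arrived)
{
  // Every thread fences its own earlier global writes, not just the master thread.
  __threadfence();
  // Wait until all threads of this block have issued their fence.
  __syncthreads();

  if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
    const unsigned int numBlocks = gridDim.x * gridDim.y * gridDim.z;
    // Announce that this block has arrived.
    atomicAdd(arrived, 1u);
    // Spin until every block has arrived; volatile forces a fresh read each time.
    while (*((volatile unsigned int*)arrived) < numBlocks) {
    }
    // Order the counter read before the loads that follow the barrier.
    __threadfence();
  }
  // Release the remaining threads of this block.
  __syncthreads();
}

// Each thread writes one element, then reads an element written by another block.
__global__ void producerConsumer(int* data, int* result, unsigned int* arrived)
{
  const int tid   = blockIdx.x * blockDim.x + threadIdx.x;
  const int total = gridDim.x * blockDim.x;
  data[tid] = tid;
  gridBarrierSketch(arrived);
  result[tid] = data[(tid + blockDim.x) % total];
}

int main()
{
  // A small grid, chosen so that all blocks can be resident at once.
  const int blocks = 4, threads = 64, n = blocks * threads;
  int* data;
  int* result;
  unsigned int* arrived;
  cudaMalloc(&data, n * sizeof(int));
  cudaMalloc(&result, n * sizeof(int));
  cudaMalloc(&arrived, sizeof(unsigned int));
  cudaMemset(arrived, 0, sizeof(unsigned int));

  producerConsumer<<<blocks, threads>>>(data, result, arrived);

  int host[n];
  cudaError_t err = cudaMemcpy(host, result, n * sizeof(int), cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Count elements that do not match the value the neighbouring block wrote.
  int mismatches = 0;
  for (int i = 0; i < n; ++i) {
    if (host[i] != (i + threads) % n) { ++mismatches; }
  }
  printf("mismatches: %d\n", mismatches);

  cudaFree(data);
  cudaFree(result);
  cudaFree(arrived);
  return 0;
}
```

If the first `__threadfence()` were executed only by the master thread, the writes to `data` made by the other threads would have no ordering guarantee relative to the arrival counter, which is the situation this issue describes.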
