
Commit

Added missing synchronization to avoid WAR hazards between tiles. (NV…
kerrmudgeon authored Dec 20, 2021
1 parent 0dc3ba6 commit 288af36
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions include/cutlass/gemm/kernel/gemm_grouped.h
@@ -546,6 +546,9 @@ struct GemmGrouped {
// Compute threadblock-scoped matrix multiply-add
int gemm_k_iterations = (problem_size.k() + Mma::Shape::kK - 1) / Mma::Shape::kK;

// Wait for all threads to finish their epilogue phases from the previous tile.
__syncthreads();

// Compute threadblock-scoped matrix multiply-add
mma(
gemm_k_iterations,
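The hazard this commit addresses can be illustrated with a toy kernel. This is a minimal sketch, not CUTLASS code: `process_tiles`, `count_mismatches`, `kThreads`, and `kTiles` are hypothetical names, and the "mainloop" and "epilogue" are reduced to one shared-memory write and one shared-memory read. It assumes, as in the grouped kernel, that a single threadblock walks several tiles and reuses the same shared-memory buffer for each one. Without the barrier at the top of the loop, a fast warp could begin writing tile `i + 1` into shared memory while a slower warp is still reading tile `i` (a write-after-read hazard).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // threads per block (hypothetical value)
constexpr int kTiles = 64;     // tiles processed by the one block (hypothetical value)

// One threadblock processes num_tiles tiles, reusing a single shared buffer.
__global__ void process_tiles(const int* in, int* out, int num_tiles) {
  __shared__ int smem[kThreads];
  const int t = threadIdx.x;

  for (int tile = 0; tile < num_tiles; ++tile) {
    // Wait for all threads to finish reading smem for the previous tile
    // before anyone overwrites it (prevents the WAR hazard).
    __syncthreads();

    // Stand-in for the mainloop: stage this tile in shared memory.
    smem[t] = in[tile * kThreads + t];

    // Make every thread's write visible before any thread reads (RAW).
    __syncthreads();

    // Stand-in for the epilogue: read an element written by another thread.
    out[tile * kThreads + t] = smem[kThreads - 1 - t];
  }
}

// Runs the kernel and returns the number of wrong outputs, or -1 on a CUDA error.
int count_mismatches() {
  const int n = kThreads * kTiles;
  std::vector<int> h_in(n), h_out(n, -1);
  for (int i = 0; i < n; ++i) h_in[i] = i;

  int* d_in = nullptr;
  int* d_out = nullptr;
  if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
    cudaFree(d_in);
    return -1;
  }

  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  process_tiles<<<1, kThreads>>>(d_in, d_out, kTiles);
  // This blocking copy also waits for the kernel and reports its errors.
  cudaError_t err =
      cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;

  // Each output should be the mirrored element of the same tile.
  int bad = 0;
  for (int tile = 0; tile < kTiles; ++tile) {
    for (int t = 0; t < kThreads; ++t) {
      const int expected = h_in[tile * kThreads + (kThreads - 1 - t)];
      if (h_out[tile * kThreads + t] != expected) ++bad;
    }
  }
  return bad;
}

int main() {
  const int bad = count_mismatches();
  if (bad < 0) {
    std::printf("CUDA error or no device available\n");
    return 1;
  }
  std::printf("mismatches: %d\n", bad);
  return bad == 0 ? 0 : 1;
}
```

The barrier at the top of the loop is redundant on the first iteration but harmless, and it mirrors the placement in the diff above, where `__syncthreads()` runs before each tile's `mma(...)` call. Removing it does not guarantee wrong results on any given run, since the race depends on warp scheduling, which is why hazards of this kind can go unnoticed.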
