Fix bug in single source GEMM with residual + streamk (NVIDIA#1249)
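Follow-up to NVIDIA#1224.

Since CUTLASS 3.3, the stream-k threadblock swizzle constructor no longer takes a KernelTraits object. It instead expects the byte sizes of the A, B, and C elements and the epilogue's accumulator fragment count. The single-source GEMM with fused epilogue and stream-k still used the old call and therefore broke. The multi-source variant had already been corrected.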

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
alihassanijr and Ali Hassani authored Dec 7, 2023
1 parent f188f9b commit f4a0216
Showing 1 changed file with 5 additions and 2 deletions.
@@ -1552,14 +1552,17 @@ struct GemmStreamkWithFusedEpilogue<Mma_, Epilogue_, ThreadblockSwizzle_, true>
 
     // Initialize the block mapping structure
     block_mapping = ThreadblockSwizzle(
-      typename ThreadblockSwizzle::template KernelTraits<GemmStreamkWithFusedEpilogue>(),
       args.mode,
       args.problem_size,
       {ThreadblockShape::kM, ThreadblockShape::kN, ThreadblockShape::kK},
       args.batch_count,
       sm_occupancy,
       device_sms,
-      avail_sms);
+      avail_sms,
+      sizeof(ElementA),
+      sizeof(ElementB),
+      sizeof(ElementC),
+      Epilogue::kAccumulatorFragments);
   }
 
   /// Returns the workspace size (in bytes) needed for these parameters
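To illustrate why the call site has to change, here is a minimal, self-contained C++ sketch. MockStreamKSwizzle, MockEpilogue, and MockKernel are hypothetical stand-ins, not CUTLASS types. They model only the trailing arguments this commit adds, each of which the kernel can derive at compile time from its element types and its epilogue; the real swizzle constructor also takes the mode, problem size, tile shape, batch count, SM occupancy, and device SM count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in (NOT the CUTLASS class): models only the trailing
// constructor parameters that the fixed call site now supplies.
struct MockStreamKSwizzle {
  int avail_sms = 0;
  std::size_t element_A_bytes = 0;
  std::size_t element_B_bytes = 0;
  std::size_t element_C_bytes = 0;
  int epilogue_acc_fragments = 0;

  MockStreamKSwizzle() = default;

  MockStreamKSwizzle(int avail_sms_,
                     std::size_t element_A_bytes_,
                     std::size_t element_B_bytes_,
                     std::size_t element_C_bytes_,
                     int epilogue_acc_fragments_)
      : avail_sms(avail_sms_),
        element_A_bytes(element_A_bytes_),
        element_B_bytes(element_B_bytes_),
        element_C_bytes(element_C_bytes_),
        epilogue_acc_fragments(epilogue_acc_fragments_) {}
};

// Hypothetical epilogue exposing a compile-time fragment count, mirroring
// the Epilogue::kAccumulatorFragments member used in the diff.
template <int Fragments>
struct MockEpilogue {
  static constexpr int kAccumulatorFragments = Fragments;
};

// Hypothetical kernel: every new argument comes from a compile-time
// property of the kernel, as in the fixed single-source path.
template <typename ElementA, typename ElementB, typename ElementC, typename Epilogue>
struct MockKernel {
  MockStreamKSwizzle block_mapping;

  void init(int avail_sms) {
    // Assumed precondition for this sketch: at least one SM is available.
    assert(avail_sms > 0);
    block_mapping = MockStreamKSwizzle(
        avail_sms,
        sizeof(ElementA),
        sizeof(ElementB),
        sizeof(ElementC),
        Epilogue::kAccumulatorFragments);
  }
};
```

For example, MockKernel<std::uint16_t, std::uint16_t, float, MockEpilogue<4>> (with std::uint16_t standing in for a 16-bit half type) forwards 2, 2, and 4 bytes on typical platforms, plus a fragment count of 4. Because these values are derived from the kernel's own template parameters, a call site written against the pre-3.3 signature cannot pick them up automatically. It has to be updated by hand, which is what this commit does for the single-source path.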