Skip to content

Conversation

dhernandez0
Copy link
Contributor

@dhernandez0 dhernandez0 commented Oct 3, 2025

Motivation

Fix some bugs related to RockPipeline pass. This will be needed for attention pipelining (outer loop).

Technical Details

  • RockMultibuffer.h: Make cpp and .h name of variables match
  • GridwiseGemmToBlockwise.cpp: Use different registers for input and output (postProcessFirstGemmSplat), otherwise multibuffering had some issues.
  • RockMultibuffer.cpp: Remove limitation of not multibuffering non-i8 + one dimensions buffers. This was to only let LDS buffers get multibuffered (I guess), because we define LDS buffers in this particular way.
  • RockPipeline.cpp: Fix bug due to uninitialized thisMultiBuffers[key], if the key doesn't exist, it's init to 0, instead of 1. Also added some checks to make the pass fail.
  • loweringUtils.cpp: Fix bug when handling rock.extract_multibuffer, it must do modulo of the index.

Test Plan

Tests pass.

Test Result

All tests pass.

Submission Checklist

@dhernandez0 dhernandez0 self-assigned this Oct 3, 2025
@dhernandez0 dhernandez0 requested a review from causten as a code owner October 3, 2025 11:14
@dhernandez0 dhernandez0 requested review from stefankoncarevic, umangyadav, pabloantoniom and Copilot and removed request for causten October 3, 2025 11:14
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes multiple bugs in the RockPipeline pass that are needed for attention pipelining functionality. The fixes address issues with multibuffering, variable naming consistency, and proper handling of register allocation.

  • Fix modulo bug in rock.extract_multibuffer index handling
  • Remove restrictions on multibuffering non-i8 buffers to support LDS buffers
  • Use separate input/output registers in postProcessFirstGemmSplat to avoid multibuffering conflicts
  • Improve error handling and variable naming consistency

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
mlir/lib/Dialect/Rock/utility/loweringUtils.cpp Fixes modulo operation for multibuffer index and adds static specifier
mlir/lib/Dialect/Rock/Transforms/RockMultibuffer.cpp Removes i8 buffer restrictions and updates comments
mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp Separates input/output buffers and improves register handling
mlir/lib/Dialect/Rock/Transforms/RockPipeline.cpp Adds error handling and fixes dependency handling
mlir/include/mlir/Dialect/Rock/Transforms/RockMultibuffer.h Updates parameter names for consistency
mlir/test/Dialect/Rock/test-fusion-and-pipeline.mlir Updates test expectations for new multibuffer behavior
mlir/test/Dialect/Rock/gridwise_attention_accel_lowering.mlir Updates test expectations for buffer naming

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

unsigned multiBufferingFactor,
bool skipOverrideAnalysis) {
LLVM_DEBUG(DBGS() << "Start multibuffering: " << allocOp << "\n");
if (!allocOp.getType().getElementType().isInteger(8)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a MLIR test checking that now multi buffering works for non int8 inputs?

LLVM_DEBUG(DBGS() << "-- Not a int8 buffer -> fail\n");
return failure();
}
if (allocOp.getType().getShape().size() != 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a MLIR test checking that now multi buffering works for inputs with more than one dimension?


template <typename AllocType>
FailureOr<AllocType> findAlloc(Value value) {
static FailureOr<AllocType> findAlloc(Value value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy suggested to make it static. My understanding is that this function is used only in loweringUtils.cpp, so this makes sure that's the case. So, the compiler knows it's only supposed to be used here, and could do some optimizations. Or at least it's not visible outside of this file.

Copy link
Contributor

@justinrosner justinrosner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good!

@dhernandez0 dhernandez0 force-pushed the fix_pipeline_bugs branch 3 times, most recently from 5a9e343 to a481cfb Compare October 6, 2025 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants