Skip to content

Conversation

dhernandez0
Copy link
Contributor

@dhernandez0 dhernandez0 commented Sep 15, 2025

Motivation

The purpose of this PR is to encapsulate the logic of loads from memory (+ store to LDS and load from LDS). The goal is to be able to use the same logic for attention.

Technical Details

  • make LDS inputs optional for BlockwiseGemmAccelOp
  • GemmLoadTileType to describe loading behaviors (Default, DoubleBuffer, BypassLDS)
  • new BlockwiseLoadTileOp to encapsulate the loading logic
  • BlockwiseLoadTileToThreadwisePass pass to lower BlockwiseLoadTileOp to the desired loading behavior (based on GemmLoadTileType)
  • ToBlockwise: refactor gemm and attention to use common functions and use BlockwiseLoadTileOp to encapsulate loading for gemm.

Test Plan

  • Add tests for new pass
  • Add test for ToBlockwise (as it has changed significantly)
  • Compare performance with develop branch. To verify no slowdown is introduced by this PR.

Test Result

Tests pass

Submission Checklist

@dhernandez0 dhernandez0 self-assigned this Sep 15, 2025
Base automatically changed from remove_reversegrid to develop September 15, 2025 16:08
@dhernandez0 dhernandez0 force-pushed the 1947-part1 branch 2 times, most recently from 9479ede to 5d9b627 Compare September 23, 2025 11:21
@dhernandez0 dhernandez0 force-pushed the 1947-part1 branch 2 times, most recently from bf499a0 to 0308ca1 Compare October 6, 2025 09:35
RockGemmFeaturesInterface>]>,
Arguments<(ins MemRefOf<LdsBufferTypes>:$matrixA,
MemRefOf<LdsBufferTypes>:$matrixB, I32Attr:$inMPerThread,
: Rock_Op<"blockwise_gemm_accel", [AttrSizedOperandSegments,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want for this to also implement MemoryEffectsOpInterface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point, GemmOp and ThreadwiseAccelGemmOp do implement it, not sure why this one doesn't. I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and BlockwiseGemmOp implements it as well. This looks like a bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

//
//===----------------------------------------------------------------------===//
//
// This pass lowers `rock.blockwise_load_tile` to rock.threadwise_* ops.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a more detailed description of what this pass does here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@pabloantoniom
Copy link
Contributor

Code LGTM! Did you check that there are is no performance variation?

@dhernandez0
Copy link
Contributor Author

dhernandez0 commented Oct 9, 2025

Code LGTM! Did you check that there are is no performance variation?

Yes, I just finished doing that. Most of the tier1 gemm list problem configs generate the exact same assembly. For some problem configs (~18%), there are some with minor assembly changes but no performance difference.

@dhernandez0 dhernandez0 merged commit a9de961 into develop Oct 14, 2025
8 of 16 checks passed
@dhernandez0 dhernandez0 deleted the 1947-part1 branch October 14, 2025 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants