-
Notifications
You must be signed in to change notification settings - Fork 49
Add rock.blockwise_load_tile to encapsulate the logic of loads #1988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
878b9bd
to
94053d5
Compare
9479ede
to
5d9b627
Compare
bf499a0
to
0308ca1
Compare
RockGemmFeaturesInterface>]>, | ||
Arguments<(ins MemRefOf<LdsBufferTypes>:$matrixA, | ||
MemRefOf<LdsBufferTypes>:$matrixB, I32Attr:$inMPerThread, | ||
: Rock_Op<"blockwise_gemm_accel", [AttrSizedOperandSegments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want for this to also implement MemoryEffectsOpInterface
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, GemmOp and ThreadwiseAccelGemmOp do implement it, not sure why this one doesn't. I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and BlockwiseGemmOp implements it as well. This looks like a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This pass lowers `rock.blockwise_load_tile` to rock.threadwise_* ops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more detailed description of what this pass does here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Code LGTM! Did you check that there are is no performance variation? |
6f08a8c
to
86578df
Compare
Yes, I just finished doing that. Most of the tier1 gemm list problem configs generate the exact same assembly. For some problem configs (~18%), there are some with minor assembly changes but no performance difference. |
86578df
to
2c72ac3
Compare
…from device memory
2c72ac3
to
85413bb
Compare
Motivation
The purpose of this PR is to encapsulate the logic of loads from memory (+ store to LDS and load from LDS). The goal is to be able to use the same logic for attention.
Technical Details
Test Plan
Test Result
Tests pass
Submission Checklist