[Blackwell] Refactor/slightly generalize warp specialization #6597
Merged
61 commits
7fc06c0 start introducing tokens (Mogball)
4023111 hoist tmem alloc (Mogball)
56e1c8c cleanup (Mogball)
225f241 add test for sinking into conditional (Mogball)
94d991c fix tests and some bugs (Mogball)
9b742a0 fix repl token (Mogball)
720a700 fix aws test (Mogball)
107daf0 fix test (Mogball)
19e37d4 fixing tests, remove TMEM tokens (Mogball)
c09d3c5 separate pass for removing TMEM tokens (Mogball)
d2acd19 fix tests (Mogball)
8521ccf schedule loops (Mogball)
db65c51 bench (Mogball)
4a9a5ed fix compile only test (Mogball)
c7cd1c4 delete dead code (Mogball)
ad6d7de unused forward decl (Mogball)
303df26 Merge remote-tracking branch 'origin/main' into mogball/tmem_toks (Mogball)
7bc72fc [Blackwell] Support DescriptorLoadOp when deciding to use shared memo… (csullivan)
7e608e3 [Bench][Blackwell] Support optional scale TMAs in warp specialization… (csullivan)
0dffe75 hoisttmemalloc checks that tokens are present (Mogball)
8aa9165 add doc about tokens to op definitions (Mogball)
1349758 Merge branch 'mogball/tmem_toks' into mogball/fmha (Mogball)
54884c4 Merge remote-tracking branch 'origin/csullivan/support_block_scales_i… (Mogball)
7767409 simplify util (Mogball)
9d4ebe2 refactor LoadMMASpecialization to support any number of loads (Mogball)
9e018e5 fix handling cycle in user partition (Mogball)
4a72ab8 refactor loads into loadgroups (Mogball)
695eb2a Merge branch 'main' into mogball/tmem_toks (Mogball)
d63cb82 fix (Mogball)
bffcb5b cleanup packLL utilities (Mogball)
e8f28b4 Merge branch 'main' into mogball/tmem_toks (Mogball)
d6a78f4 WIP refactoring... (Mogball)
63b7da0 Merge remote-tracking branch 'origin/main' into mogball/fmha (Mogball)
9f1fc29 Revert "[Blackwell] Support DescriptorLoadOp when deciding to use sha… (Mogball)
b9b73f9 Revert "[Bench][Blackwell] Support optional scale TMAs in warp specia… (Mogball)
fa8b255 Merge branch 'main' into mogball/tmem_toks (Mogball)
c5f8cb4 fix conflict (Mogball)
8342d7a Merge branch 'mogball/tmem_toks' into mogball/fmha (Mogball)
2f74f02 loads work (Mogball)
6e5b526 mmas are a pain (Mogball)
2af0311 Merge remote-tracking branch 'origin/main' into mogball/tmem_toks (Mogball)
5860abe Merge branch 'mogball/tmem_toks' into mogball/fmha (Mogball)
1f507ac done but does it work? (Mogball)
dda423b it deadlocks (Mogball)
d965b3b works but ends too early (Mogball)
b4cb1af fix regular matmul (Mogball)
1140e1c fix (Mogball)
4eff58e fixed (Mogball)
38072a6 forgot to handle P (Mogball)
8358c0d fix optzn (Mogball)
50c3b55 dep dialect (Mogball)
b4a8612 savepoint: OAI benchmarks look good (Mogball)
2b90525 rename op (Mogball)
a912113 put scales into smem (Mogball)
698e94f put local load in user partition (Mogball)
7ffbd86 add another test (Mogball)
ee8eda3 add another test (Mogball)
93e42ba Merge remote-tracking branch 'origin/main' into HEAD (Mogball)
4fba3f6 refactor pipelineMMA (Mogball)
1615916 handle peeled wait (Mogball)
d8887ac Merge branch 'main' into mogball/fmha (Mogball)
@@ -288,6 +288,7 @@ bool mlir::triton::getDisallowAccMultiBuffer(scf::ForOp forOp) {
 std::pair<OpResult, int64_t>
 mlir::triton::getDefinitionAndDistance(scf::ForOp forOp, Value value) {
   int64_t distance = 0;
+  DenseSet<Value> seen;
   while (auto arg = dyn_cast<BlockArgument>(value)) {
     // Ignore implicit captures.
     if (arg.getOwner() != forOp.getBody())
@@ -297,6 +298,8 @@ mlir::triton::getDefinitionAndDistance(scf::ForOp forOp, Value value) {
       return {nullptr, 0};
     ++distance;
     value = forOp.getYieldedValues()[arg.getArgNumber() - 1];
+    if (!seen.insert(value).second)
+      return {nullptr, 0};
   }
   return {cast<OpResult>(value), distance};
 }

Review comment: This also doesn't feel like refactoring :]
Reply: Some of the refactoring exposed a bug :P
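
As a rough illustration of why the `seen` set matters, the sketch below models the same backward walk outside of MLIR. `ToyValue` and `definitionAndDistance` are hypothetical stand-ins invented for this example, not Triton or MLIR APIs: a value is either a loop block argument (argument 0 playing the role of the induction variable) or an op result, and `yielded[i]` is the value the loop yields for argument `i + 1`. If two iteration arguments simply swap each iteration, the walk never reaches an op result, so without the guard it would not terminate. It assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SSA value inside a loop body.
// isArg == true:  block argument number `index` (0 is the induction variable).
// isArg == false: an op result identified by `index`.
struct ToyValue {
  bool isArg;
  int index;
};

// Follow loop-carried values backwards until an op result is reached.
// Returns {result id, distance in iterations}, or std::nullopt when the
// chain hits the induction variable, runs out of bounds, or revisits an
// argument (a cycle among the yielded values).
std::optional<std::pair<int, std::int64_t>>
definitionAndDistance(const std::vector<ToyValue> &yielded, ToyValue value) {
  std::int64_t distance = 0;
  std::unordered_set<int> seen; // argument numbers already visited
  while (value.isArg) {
    if (value.index == 0)
      return std::nullopt; // induction variable has no defining op
    std::size_t slot = static_cast<std::size_t>(value.index - 1);
    if (slot >= yielded.size())
      return std::nullopt; // defensive bound check for the toy model
    ++distance;
    value = yielded[slot];
    // The guard: bail out instead of spinning forever on a cycle.
    if (value.isArg && !seen.insert(value.index).second)
      return std::nullopt;
  }
  return std::make_pair(value.index, distance);
}
```

With `yielded = {arg2, result 9}` the walk from `arg1` ends at result 9 with distance 2, while `yielded = {arg2, arg1}` is rejected rather than looping. The toy version only records block arguments in `seen`, whereas the patch inserts every visited `Value`; the termination argument is the same.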
@@ -358,14 +361,15 @@ Value mlir::triton::createScalarAlloc(ImplicitLocOpBuilder &rewriter, Type type,
 }

 // Create an allocation and init the mbarriers.
-Value mlir::triton::createBarrierAlloc(scf::ForOp forOp, int numBarriers) {
+Value mlir::triton::createBarrierAlloc(scf::ForOp forOp, int numBarriers,
+                                       int arriveCount) {
   ImplicitLocOpBuilder rewriter(forOp.getLoc(), forOp);

   Value barrierAlloc =
       createScalarAlloc(rewriter, rewriter.getI64Type(), numBarriers);
   for (unsigned i = 0; i < numBarriers; i++) {
     Value barrierView = createSingleBufferView(rewriter, barrierAlloc, i);
-    rewriter.create<ttng::InitBarrierOp>(barrierView, 1);
+    rewriter.create<ttng::InitBarrierOp>(barrierView, arriveCount);
   }
   // Invalidate and deallocate the barriers.
   rewriter.setInsertionPointAfter(forOp);
Review comment: Is this part of refactoring? Or is it addressing a separate issue?
Reply: This is part of the refactor. Load groups can have multiple consumers.
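
To make the link between "multiple consumers" and a configurable arrive count concrete, here is a small host-side model of an arrival-counting barrier. `ToyBarrier` is a hypothetical class written for this example; it is not the `ttng::InitBarrierOp` lowering or the hardware mbarrier, and it only mirrors the general idea that a phase completes once the expected number of arrivals has been recorded. Presumably, when a load group feeds several consumers, the buffer should only be handed back after all of them have arrived, which an arrive count of 1 cannot express.

```cpp
// Hypothetical model of a barrier initialized with an arrive count.
// A phase completes only after `arriveCount` calls to arrive().
struct ToyBarrier {
  int arriveCount; // arrivals required per phase
  int pending;     // arrivals still missing in the current phase
  int phase = 0;   // flips between 0 and 1 each time a phase completes

  explicit ToyBarrier(int count) : arriveCount(count), pending(count) {}

  // Record one arrival. Returns true if this arrival completed the phase.
  bool arrive() {
    if (--pending > 0)
      return false;
    pending = arriveCount; // re-arm for the next phase
    phase ^= 1;
    return true;
  }
};
```

In this model, `ToyBarrier(1)` flips its phase on every arrival, while `ToyBarrier(2)` flips only after both of two consumers have arrived.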