-
Notifications
You must be signed in to change notification settings - Fork 290
[Refactor] Merge bulk copy into copy and improve layout inference for bulk copy #746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
3d4be5c
[Refactor] Merge bulk copy into copy and refactor layout inference fo…
LeiWang1999 c9a6980
lint fix
LeiWang1999 eb098c6
Merge branch 'main' of https://github.com/tile-ai/tilelang into tma_0821
LeiWang1999 ca4a10f
Fix typos in intrinsic names and remove unused print statement in blo…
LeiWang1999 92fa010
remove bulk copy
LeiWang1999 a55f951
Refactor copy and atomic add operations to support TMA lower configur…
LeiWang1999 5871edb
Enhance TMA bulk copy logic in `LowerBulkCopy` method
LeiWang1999 e33e343
lint fix
LeiWang1999 f51eb48
Remove fallback logging for non-swizzled global layout in `LowerBulkC…
LeiWang1999 3f167ef
Enhance reshape kernel compilation in `run_reshape` and `run_reshape_…
LeiWang1999 cf807cb
Update `native_sparse_attention` function to include TMA configuratio…
LeiWang1999 d54d1aa
Refactor JIT decorator formatting in `native_sparse_attention` function
LeiWang1999 7d0bd4b
Enhance thread management and logging in TileLang compilation
LeiWang1999 b268e47
Add warp specialization scope and refactor register management in Til…
LeiWang1999 00b2fd8
Refactor test for InjectSetMaxNReg pass in TileLang
LeiWang1999 ee01a93
Enhance bulk copy and store checks in `Copy` class
LeiWang1999 0f11130
lint fix
LeiWang1999 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Replacing specialized AtomicAdd vectorizer with generic VectorizeLoop likely breaks vectorization semantics
The previous path used
VectorizeAtomicAddwhich:call_extern("AtomicAdd", ...)intoAtomicAddx2/AtomicAddx4with pointer-to-src conventions.Switching to
VectorizeLoop(thread_loop)bypasses that rewriter. UnlessVectorizeLoopknows how to handle AtomicAdd and emitAtomicAddx{2,4}, vectorization will either be skipped or produce invalid IR for the extern call.Reinstate the specialized vectorizer here until
VectorizeLoopsubsumes it:Confirm that
AtomicAddx2/AtomicAddx4are defined/linked in the runtime:If you intend to keep
VectorizeLoop, point me to the lowering that handlesAtomicAddso I can review that path.🏁 Script executed:
Length of output: 206
🏁 Script executed:
Length of output: 4981
🏁 Script executed:
Length of output: 6630
Reinstate the specialized AtomicAdd vectorizer to preserve correct semantics
The change from
VectorizeAtomicAddto a plainVectorizeLoopbypasses the custom rewriting that emitsAtomicAddx2/AtomicAddx4, leading to either skipped vectorization or invalid IR for the extern call.• We verified that
AtomicAddx2andAtomicAddx4are indeed defined and used in the codebase:tilelang/language/customize.py.src/transform/atomicadd_vectorize.ccpushes"AtomicAddx2"/"AtomicAddx4"into the call arguments based on vector size .src/tl_templates/cuda/common.h, underlined byTL_DEVICE void AtomicAddx2(…)and…AtomicAddx4(…).• The generic
VectorizeLoopimplementation (insrc/transform/loop_vectorize.ccandloop_vectorize.cc) contains no special handling for extern calls to"AtomicAdd"—it merely vectorizes based on arithmetic patterns, so it won’t rewrite or emit the correctAtomicAddxNcalls.Action: Revert to the specialized vectorizer call in
src/op/atomic_add.ccuntil the generic path is extended to coverAtomicAdd.📝 Committable suggestion
🤖 Prompt for AI Agents