-
Notifications
You must be signed in to change notification settings - Fork 290
[Enhancement] Update async intrinsic handling in inject_fence_proxy #1068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughMarks tile-level wgmma intrinsics Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Traversal as IR traversal
participant Intrinsic as IsAsyncIntrinsic
participant ExternMap as Extern mapper
Note over Traversal: Inspect call nodes (call_intrin / call_extern)
Traversal->>Intrinsic: is call_intrin name == tl_gemm / tl_gemm_sp?
alt intrinsic is wgmma
Intrinsic-->>Traversal: Async
else not wgmma
Traversal->>ExternMap: is call_extern known?
alt known extern
ExternMap-->>Traversal: MappedKind (Async/Generic/etc.)
else unknown extern
ExternMap-->>Traversal: Generic (fallback)
end
end
Note over Traversal: Resolved kind drives fence-proxy insertion decision
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)testing/python/transform/test_tilelang_transform_inject_fence_proxy.py (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
* Added support for wgmma async intrinsics in IsAsyncIntrinsic function. * Changed handling of unknown externs to treat them as Generic instead of Async, improving accuracy in proxy kind determination.
|
It seems that we have to fix legacy tests, where gemm is still represented as extern🤔 |
|
I'll take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py (1)
34-38: Use T.evaluate for consistency (optional).The switch to tl.tl_gemm intrinsic looks right for async handling. For consistency with other spots in this file (e.g., Lines 97–101 and 131), consider wrapping the intrinsic in T.evaluate.
Apply this minimal change:
- T.call_intrin("handle", tir.op.Op.get("tl.tl_gemm"), + T.evaluate(T.call_intrin("handle", tir.op.Op.get("tl.tl_gemm"), "tl::gemm_ss<128, 128, 32, 4, 1, 0, 0, 0, 32, 128, 0, 0, true>", T.tvm_access_ptr(T.type_annotation("float16"), A_shared.data, 0, 2048, 1), T.tvm_access_ptr(T.type_annotation("float16"), B_shared.data, 0, 2048, 1), - T.tvm_access_ptr(T.type_annotation("float32"), C_local.data, 0, 32, 3)) + T.tvm_access_ptr(T.type_annotation("float32"), C_local.data, 0, 32, 3)))Optionally add a sibling test using tl_gemm_sp to lock in its async classification as well.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py (2)
tilelang/language/tir/op.py (1)
call_intrin(119-144)src/transform/inject_fence_proxy.cc (16)
op(145-159)op(145-145)op(176-201)op(176-176)op(203-225)op(203-203)op(227-231)op(227-227)op(233-242)op(233-233)op(244-250)op(244-244)op(252-257)op(252-252)op(259-269)op(259-259)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (1)
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py (1)
49-53: After-path matches intended fence insertion before async tl_gemm.The updated intrinsic call in the “after” IR matches the transform’s new behavior and keeps the expected fence placement. LGTM.
testing/python/transform/test_tilelang_transform_inject_fence_proxy.py
Outdated
Show resolved
Hide resolved
really appreciate your help |
…proxy.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Fix #1002
This pull request updates the logic for identifying and handling asynchronous intrinsic calls in the
inject_fence_proxytransformation. The main improvements are to more accurately classify certain intrinsic operations and to refine the handling of unknown external calls, reducing unnecessary conservatism.Key changes include:
Intrinsic classification improvements:
tl_gemmandtl_gemm_spas asynchronous intrinsics inIsAsyncIntrinsic, ensuring these are properly treated as async operations.Extern call handling refinement:
AsynctoGenericinProxyFenceInjector, based on the guarantee that GEMM operations are always represented as intrinsic calls and not as externs. This prevents over-insertion of fences for extern calls that are not actually asynchronous.Summary by CodeRabbit
Bug Fixes
Tests