- 
                Notifications
    You must be signed in to change notification settings 
- Fork 292
[Bugfix] Do not force inline let stmt #947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Caution Review failedThe pull request is closed. WalkthroughAdds tests and docs for LetStmt inlining, introduces a TL_FORCE_LET_INLINE config flag and conditional inlining, parameterizes examples/tests, and implements extensive transform and lowering changes for wrapper-aware pipeline injection/planning, buffer remapping/layout handling, shared-memory merge logic, multi-version buffer rewriting, and LetStmt simplification heuristics. Changes
 Sequence Diagram(s)sequenceDiagram
  autonumber
  participant For as ForLoop
  participant Planner as PipelinePlanner
  participant Unwrapper as WrapperUnwrapper
  participant Builder as PipelineBuilder
  participant Injector as Injector
  Note over For,Injector: Wrapper-aware pipeline injection (new flow)
  For->>Planner: identify pipeline_body_root
  Planner->>Unwrapper: traverse Let/If/Attr wrappers to find SeqStmt
  Unwrapper-->>Planner: SeqStmt + wrappers(stack)
  Planner->>Builder: build stages from SeqStmt
  Builder->>Injector: transformed pipeline body
  Injector->>Injector: reapply wrappers LIFO around body
  Injector-->>For: attach wrapped pipeline body
sequenceDiagram
  autonumber
  participant Lower as lower_tile_op
  participant Remap as RemapResolver
  participant Layout as LayoutFinder
  participant Analyzer as Analyzer
  participant IR as IRTransformer
  Note over Lower,IR: Buffer remap/layout resolution and Let bindings
  Lower->>Remap: Resolve BufferLoad/Var -> remapped buffer?
  Remap-->>Layout: query layout info
  alt remap found
    Lower->>IR: rewrite access_ptr/offset/mma with remapped expr
  else
    Lower->>IR: keep original access
  end
  Lower->>Analyzer: bind BufferLoad values into let_bindings_ for LetStmt scope
  Analyzer-->>Lower: supply bound values for later passes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
 Suggested reviewers
 Poem
 Pre-merge checks and finishing touches✅ Passed checks (3 passed)
 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (5)
 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment  | 
| 👋 Hi! Thank you for contributing to the TileLang project. Please remember to run  We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
testing/python/issue/test_tilelang_issue_814.py (2)
7-21: Consider adding a docstring.Adding a docstring would clarify the purpose of this kernel factory function and its parameters.
def test_tmp_var(N, block_N, dtype="float"): + """ + Create a kernel that tests temporary variable handling in TileLang. + + The kernel computes tmp = max(A[idx], 1) and uses it twice to ensure + the temporary is not over-inlined, which would cause redundant computation. + + Parameters + ---------- + N : int + Tensor size + block_N : int + Elements processed per block + dtype : str + Data type (default: "float") + """
14-19: Align thread count to block_N and drop manual bounds guard
- Safe memory access checks are automatically inserted by the
LegalizeSafeMemoryAccesspass, so a manualif idx < Nguard isn’t needed.- Replace
threads=128withthreads=block_Nto maintain one-to-one thread-to-element mapping and avoid mismatch. [line 14]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- testing/python/issue/test_tilelang_issue_814.py(1 hunks)
- tilelang/engine/phase.py(0 hunks)
- tilelang/language/utils.py(0 hunks)
💤 Files with no reviewable changes (2)
- tilelang/engine/phase.py
- tilelang/language/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/issue/test_tilelang_issue_814.py (4)
tilelang/language/parallel.py (1)
Parallel(8-28)tilelang/jit/__init__.py (1)
compile(33-86)tilelang/transform/pass_config.py (1)
PassConfigKey(6-101)tilelang/utils/tensor.py (1)
torch_assert_close(220-312)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-amd
- GitHub Check: bot-task
- GitHub Check: format-check
🔇 Additional comments (2)
testing/python/issue/test_tilelang_issue_814.py (2)
24-49: LGTM! Test logic is sound.The test correctly validates temporary variable handling:
- Compiles the kernel with appropriate pass configs (disabling warp specialization and TMA lowering)
- Creates reference computation using PyTorch
- Validates both output tensors with reasonable tolerances
- The key test:
tmpis computed once and used twice (for both B and A updates), ensuring the compiler doesn't over-inline the let binding
52-54: LGTM! Clear test entry point.The docstring accurately describes the test's purpose.
…checks - Changed SKV parameter from 32768 to 8192 in sparse MLA backward and forward tests. - Added check_correctness parameter to test functions for validation of outputs. - Updated test cases to reflect new SKV values and correctness checks.
…in example_fusedmoe_tilelang.py - Added a new section for compiler internals in the documentation. - Refactored the main function in example_fusedmoe_tilelang.py to accept parameters for hidden dimensions, expert configurations, and batch/sequence sizes, improving flexibility and readability.
| a fix for issue #814 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (2)
examples/fusedmoe/example_fusedmoe_tilelang.py (2)
1-1: Fix the formatting issue flagged by CI.The pipeline failures indicate this file needs reformatting. Please run the project's formatter and commit the changes.
536-536: Aligngenerate_inputparameters with config keys
The function signature uses names likedhidden,dexpert,nroutedexperts, etc., while the config dict usesd_hidden,d_expert,n_routed_experts, etc., causing keyword argument mismatches. Rename either the config keys or the function parameters so they match exactly.
🧹 Nitpick comments (4)
docs/compiler_internals/letstmt_inline.md (4)
11-22: Clarify current pipeline behavior (no forced inlining).Given this PR stops forcing Let inlining in LowerAndLegalize, add a short note here that Simplify may still inline under these rules, but the pipeline no longer forces it. This avoids confusion about “when” inlining happens.
15-21: Avoid fragile line-number references; link to source or name symbols instead.Hardcoding source line numbers will drift. Replace “(lines 315–326)”, “line 357”, and “(lines 168–206)” with:
- Symbol names (e.g., CanInlineLetStmt, used_in_buffer_def_, VarUseDefAnalyzer)
- A GitHub permalink to the file/anchor
Also applies to: 47-55, 75-79
1-8: Add a brief “As of PR #947” note for traceability.Add one sentence noting that the pipeline no longer forces Let inlining as of PR #947, and this page documents the Simplify pass rules.
141-144: Use GitHub permalinks for Related Files Replace the plain paths under “Related Files” in docs/compiler_internals/letstmt_inline.md with clickable permalinks to the exact lines in src/transform/simplify.cc (e.g. line 315 for CanInlineLetStmt) and in testing/python/issue/test_tilelang_issue_814.py.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/compiler_internals/letstmt_inline.md(1 hunks)
- docs/index.md(1 hunks)
- examples/fusedmoe/example_fusedmoe_tilelang.py(1 hunks)
- examples/fusedmoe/test_example_fusedmoe.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/fusedmoe/test_example_fusedmoe.py (1)
examples/fusedmoe/example_fusedmoe_tilelang.py (1)
main(524-545)
examples/fusedmoe/example_fusedmoe_tilelang.py (1)
examples/warp_specialize/example_warp_specialize_flashmla.py (1)
main(385-399)
🪛 GitHub Actions: CI Test on AMD
examples/fusedmoe/test_example_fusedmoe.py
[error] 1-1: Reformatted files. Please review and stage the changes.
examples/fusedmoe/example_fusedmoe_tilelang.py
[error] 1-1: Reformatted files. Please review and stage the changes.
🪛 GitHub Actions: CI Test on Metal
examples/fusedmoe/test_example_fusedmoe.py
[error] 1-1: Code reformatted by formatter. Changes not staged for commit. Please run 'git add' and commit the changes, or revert formatting if undesired.
examples/fusedmoe/example_fusedmoe_tilelang.py
[error] 1-1: Code reformatted by formatter. Changes not staged for commit. Please run 'git add' and commit the changes, or revert formatting if undesired.
🔇 Additional comments (1)
docs/index.md (1)
38-44: Section addition looks good; verify link resolves.New COMPILER INTERNALS toctree is consistent with existing style. Please confirm docs/compiler_internals/letstmt_inline.md builds and the link renders in the sidebar.
| If this causes issues (e.g., `A[idx]` being read twice with different values due to the first write), it indicates a potential problem with the inlining heuristic or the code pattern. | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revise the hazard explanation; current sequence doesn’t demonstrate a read-after-write issue.
In the example, both reads occur before the write to A[idx], so “due to the first write” isn’t accurate. Consider a case where A[idx] is written between two reads to illustrate why duplicated loads from inlining can change behavior, or rephrase to note duplicate loads impact performance but not semantics here.
-If this causes issues (e.g., `A[idx]` being read twice with different values due to the first write)...
+Be aware that inlining duplicates loads. If a write to the same location occurs between two uses, re‑evaluation can observe updated memory and change behavior. In this example, both reads occur before the write, so semantics are unchanged, though duplicated loads may affect performance.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| If this causes issues (e.g., `A[idx]` being read twice with different values due to the first write), it indicates a potential problem with the inlining heuristic or the code pattern. | |
| Be aware that inlining duplicates loads. If a write to the same location occurs between two uses, re-evaluation can observe updated memory and change behavior. In this example, both reads occur before the write, so semantics are unchanged, though duplicated loads may affect performance. | 
🤖 Prompt for AI Agents
In docs/compiler_internals/letstmt_inline.md around lines 126 to 127, the hazard
explanation is incorrect because the shown sequence has both reads before the
write so it does not demonstrate a read-after-write semantic change; either
replace the example with one where A[idx] is written between two reads (so the
first write can change the value seen by the second read) to illustrate the
semantic hazard, or rephrase the sentence to say that duplicated loads in this
specific sequence only affect performance (not semantics) and clarify when
duplicated loads can change behavior (i.e., when a write to the same location
occurs between the reads).
| def main(d_hidden=7168, d_expert=2048, n_routed_experts=8, n_shared_experts=1, n_experts_per_token=4, batch_size=1, seq_len=8192): | ||
| config = { | ||
| "dhidden": 7168, | ||
| "dexpert": 2048, | ||
| "nroutedexperts": 8, | ||
| "nsharedexperts": 1, | ||
| "nexpertspertoken": 4, | ||
| "bs": 1, | ||
| "seqlen": 8192, | ||
| "dhidden": d_hidden, | ||
| "dexpert": d_expert, | ||
| "nroutedexperts": n_routed_experts, | ||
| "nsharedexperts": n_shared_experts, | ||
| "nexpertspertoken": n_experts_per_token, | ||
| "bs": batch_size, | ||
| "seqlen": seq_len, | ||
| "seed": 81394 | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Config dictionary keys don't match usage throughout the codebase.
The config dictionary uses lowercase keys without underscores ("dhidden", "dexpert", "nroutedexperts", etc.), but the Expert, MoEGate, and MoE classes expect keys with underscores ("d_hidden", "d_expert", "n_routed_experts", etc.). This will cause KeyError exceptions at runtime.
For example:
- Line 282: self.d_hidden: int = config["d_hidden"]
- Line 283: self.d_expert: int = config["d_expert"]
- Line 300: self.top_k: int = config["n_experts_per_token"]
- Line 301: self.num_experts: int = config["n_routed_experts"]
Apply this diff to fix the config keys:
 def main(d_hidden=7168, d_expert=2048, n_routed_experts=8, n_shared_experts=1, n_experts_per_token=4, batch_size=1, seq_len=8192):
     config = {
-        "dhidden": d_hidden,
-        "dexpert": d_expert,
-        "nroutedexperts": n_routed_experts,
-        "nsharedexperts": n_shared_experts,
-        "nexpertspertoken": n_experts_per_token,
-        "bs": batch_size,
-        "seqlen": seq_len,
+        "d_hidden": d_hidden,
+        "d_expert": d_expert,
+        "n_routed_experts": n_routed_experts,
+        "n_shared_experts": n_shared_experts,
+        "n_experts_per_token": n_experts_per_token,
+        "batch_size": batch_size,
+        "seq_len": seq_len,
         "seed": 81394
     }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def main(d_hidden=7168, d_expert=2048, n_routed_experts=8, n_shared_experts=1, n_experts_per_token=4, batch_size=1, seq_len=8192): | |
| config = { | |
| "dhidden": 7168, | |
| "dexpert": 2048, | |
| "nroutedexperts": 8, | |
| "nsharedexperts": 1, | |
| "nexpertspertoken": 4, | |
| "bs": 1, | |
| "seqlen": 8192, | |
| "dhidden": d_hidden, | |
| "dexpert": d_expert, | |
| "nroutedexperts": n_routed_experts, | |
| "nsharedexperts": n_shared_experts, | |
| "nexpertspertoken": n_experts_per_token, | |
| "bs": batch_size, | |
| "seqlen": seq_len, | |
| "seed": 81394 | |
| } | |
| def main(d_hidden=7168, d_expert=2048, n_routed_experts=8, n_shared_experts=1, n_experts_per_token=4, batch_size=1, seq_len=8192): | |
| config = { | |
| "d_hidden": d_hidden, | |
| "d_expert": d_expert, | |
| "n_routed_experts": n_routed_experts, | |
| "n_shared_experts": n_shared_experts, | |
| "n_experts_per_token": n_experts_per_token, | |
| "batch_size": batch_size, | |
| "seq_len": seq_len, | |
| "seed": 81394 | |
| } | 
🤖 Prompt for AI Agents
In examples/fusedmoe/example_fusedmoe_tilelang.py around lines 524 to 534, the
config dict uses keys like "dhidden", "dexpert", "nroutedexperts", etc., but the
rest of the code expects underscored keys (e.g., "d_hidden", "d_expert",
"n_routed_experts", "n_shared_experts", "n_experts_per_token", "bs", "seqlen",
"seed"); update the dict keys to match those expected names (replace
"dhidden"->"d_hidden", "dexpert"->"d_expert",
"nroutedexperts"->"n_routed_experts", "nsharedexperts"->"n_shared_experts",
"nexpertspertoken"->"n_experts_per_token") and keep "bs", "seqlen", and "seed"
as-is so the Expert/MoEGate/MoE classes access config without KeyError.
- Changed the condition for buffer access from less than (<) to less than or equal to (<=) to allow access at the same scope level. - Adjusted the logic for determining the access level when touching buffers to ensure correct handling of scope levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/transform/merge_shared_memory_allocations.cc(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/merge_shared_memory_allocations.cc (1)
src/transform/storage_rewrite.cc (6)
buf(247-254)
buf(247-247)
buf(505-522)
buf(505-505)
buf(1715-1740)
buf(1715-1715)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: format-check
- GitHub Check: format-check
- GitHub Check: bot-task
🔇 Additional comments (2)
src/transform/merge_shared_memory_allocations.cc (2)
219-220: LGTM!The relaxation of the boundary check is consistent with the fix in
BufferLoadNodeand correctly handles buffer variable references at the same scope level. The comment clearly explains the change.
197-201: LGTM with consistency check The update in merge_shared_memory_allocations.cc is correct. storage_rewrite.cc still uses strictICHECK_LT(it->second.level, scope_.size()). Confirm whether it should also be relaxed toICHECK_LEand use theaccess_levellogic to prevent out-of-bounds.
| // When accessing at the same level, use that level | ||
| size_t access_level = std::min(it->second.level, scope_.size() - 1); | ||
| scope_[access_level].touched.push_back(buf); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against unsigned underflow in access_level calculation.
The expression scope_.size() - 1 can underflow when scope_.size() == 0 (wrapping to SIZE_MAX for size_t). While the ICHECK_LE guard makes this scenario unlikely in practice, relying on wraparound behavior is a code smell that could mask bugs or cause issues in unexpected edge cases.
Consider using a safer calculation that explicitly handles the zero case:
         if (enable_aggressive_merge) {
           scope_[scope_.size() - 1].touched.push_back(buf);
         } else {
           // When accessing at the same level, use that level
-          size_t access_level = std::min(it->second.level, scope_.size() - 1);
+          size_t access_level = (scope_.size() == 0) ? 0 : std::min(it->second.level, scope_.size() - 1);
           scope_[access_level].touched.push_back(buf);
         }Apply the same fix to both locations (lines 207-209 and lines 226-228).
Alternatively, add a runtime check:
           // When accessing at the same level, use that level
+          ICHECK_GT(scope_.size(), 0) << "Scope stack should not be empty when accessing buffers";
           size_t access_level = std::min(it->second.level, scope_.size() - 1);Also applies to: 226-228
🤖 Prompt for AI Agents
In src/transform/merge_shared_memory_allocations.cc around lines 207-209 and
226-228, the computation scope_.size() - 1 can underflow when scope_.size() ==
0; change the calculation to guard the zero case (e.g. compute size_t max_index
= scope_.empty() ? 0 : scope_.size() - 1; then use std::min(it->second.level,
max_index) for access_level) and replace both occurrences; optionally keep or
add a runtime check/ICHECK that scope_ is non-empty where logically required to
catch unexpected states.
- inline let-bound BufferLoad nodes via resolver helpers and structured return - remap layouts/buffers using original data vars and only rewrite when needed - update pipeline planner to understand let-bound address_of buffers - document the new inline behaviour in docs/let_inline_fix.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/transform/lower_tile_op.cc (1)
8-8: Remove unused#include <tvm/tir/op.h>(line 8). Notvm::tiroperator functions or symbols from this header are referenced inlower_tile_op.cc.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/transform/lower_tile_op.cc(10 hunks)
- src/transform/pipeline_planning.cc(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/transform/pipeline_planning.cc (1)
src/transform/inject_pipeline.cc (20)
buffer_region(113-132)
buffer_region(114-114)
op(166-182)
op(166-166)
op(184-197)
op(184-184)
op(199-212)
op(199-199)
op(214-220)
op(214-214)
op(840-1020)
op(840-840)
op(1022-1039)
op(1022-1022)
op(1041-1058)
op(1041-1041)
buffer(398-453)
buffer(398-399)
buffer(461-470)
buffer(461-461)
src/transform/lower_tile_op.cc (3)
src/transform/layout_inference.cc (4)
expr(389-407)
expr(389-389)
buffer(409-415)
buffer(409-409)src/transform/inject_pipeline.cc (2)
call(134-164)
call(134-135)src/transform/pipeline_planning.cc (14)
op(48-57)
op(48-48)
op(87-143)
op(87-87)
op(166-184)
op(166-166)
op(186-204)
op(186-186)
op(206-260)
op(206-206)
op(262-272)
op(262-262)
op(703-712)
op(703-703)
🔇 Additional comments (10)
src/transform/pipeline_planning.cc (4)
209-222: LGTM! Enhanced buffer region resolution.The refactored
address_ofhandling now properly resolves bothBufferLoadandVarcases, making buffer region tracking more robust. The defensive programming with the early checkif (buffer_region.defined())prevents unnecessary additions.
413-449: LGTM! Robust unwrapping of pipeline body wrappers.The new unwrapping loop properly handles
LetStmtandIfThenElse(without else) wrappers around the pipeline body, aligning with the PR objective of not forcing inline let statements. The error message accurately describes the supported wrapper types.
455-455: Verify thatpipeline_body_rootis used consistently.Ensure that downstream processing uses
pipeline_body_seqfor iterating the pipeline stages, aspipeline_body_rootmay still contain wrappers. Based on line 458, this appears correct.
5-5: Retain the <tvm/tir/op.h> include – it provides definitions for tir::builtin::if_then_else() and tir::attr constants.src/transform/lower_tile_op.cc (6)
323-426: LGTM! Well-structured refactoring with AccessPtrResult.The introduction of
AccessPtrResultstruct and the refactored flow clearly separates cases where rewriting occurs from cases where the original expression is preserved. The use of helper functions (ResolveBufferLoad,FindRemapBuffer,FindLayout) improves modularity and readability.
428-440: LGTM! ResolveBufferLoad properly handles let-bound expressions.This helper correctly resolves variables to their let-bound
BufferLoadexpressions when available, enabling support for non-inlined let statements as per the PR objective.
442-478: LGTM! Comprehensive buffer and layout resolution.Both
FindRemapBufferandFindLayoutimplement thorough lookup strategies:
- Direct map lookup
- Lookup via
buffer_map_by data var- Fallback to same_as comparisons by data or name
This handles various buffer aliasing scenarios robustly.
502-543: LGTM! Consistent use of ResolveBufferLoad in PTX paths.The ptx_ldmatrix and mma_store handling now properly resolve let-bound buffer loads before transformation, ensuring correct remapping when buffers are wrapped in let statements.
594-616: LGTM! Proper LetStmt handling with scope management.The new
VisitStmt_forLetStmtNode:
- Records
BufferLoadvalues inlet_bindings_for later resolution- Binds pure expressions in the analyzer for constant folding
- Properly erases bindings after visiting the body to maintain correct scope
This implementation correctly supports non-inlined let statements as intended by the PR.
708-709: LGTM! let_bindings_ properly scoped.The
let_bindings_member is correctly managed:
- Added in
VisitStmt_(LetStmtNode)when visiting the let statement- Erased after visiting the body to maintain lexical scope
- Used by
ResolveBufferLoadfor resolutionNo memory leaks or scope issues detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/transform/multi_version_buffer_rewriter.cc(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
PR: tile-ai/tilelang#794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
- src/transform/multi_version_buffer_rewriter.cc
🧬 Code graph analysis (1)
src/transform/multi_version_buffer_rewriter.cc (1)
src/transform/warp_specialized_rewriter.cc (36)
seq_stmt(917-990)
seq_stmt(918-919)
seq_stmt(1033-1090)
seq_stmt(1033-1033)
op(38-43)
op(38-38)
op(73-83)
op(73-73)
op(85-93)
op(85-85)
op(95-100)
op(95-95)
op(102-110)
op(102-102)
op(134-146)
op(134-134)
op(148-177)
op(148-148)
op(179-189)
op(179-179)
stmt(51-62)
stmt(51-51)
stmt(121-124)
stmt(121-121)
stmt(126-130)
stmt(126-126)
stmt(132-132)
stmt(132-132)
stmt(230-230)
stmt(230-230)
stmt(271-271)
stmt(271-271)
stmt(302-306)
stmt(302-302)
stmt(333-339)
stmt(333-334)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-test-metal
- GitHub Check: build-test-amd
| bool in_scope = false; | ||
| for (const StmtNode *ancestor : stmt_stack_) { | ||
| if (ancestor == lca) { | ||
| in_scope = true; | ||
| break; | ||
| } | ||
| } | ||
| if (!in_scope) | ||
| continue; | ||
| // Only double-buffer shared allocations; locals do not need versioning. | ||
| auto scope = buffer.scope(); | ||
| if (!(scope == "shared" || scope == "shared.dyn")) | ||
| continue; | ||
| if (seen.insert(buffer.get()).second) { | ||
| scoped_buffers.push_back(buffer); | ||
| } | ||
| } | ||
| for (auto it = stmt_stack_.rbegin(); it != stmt_stack_.rend(); ++it) { | ||
| if (!(*it)->IsInstance<BlockNode>()) | ||
| continue; | ||
| const auto *block = static_cast<const BlockNode *>(*it); | ||
| auto map_it = block_alloc_buffers_.find(block); | ||
| if (map_it == block_alloc_buffers_.end()) | ||
| continue; | ||
| for (const Buffer &buffer : map_it->second) { | ||
| auto scope = buffer.scope(); | ||
| if (!(scope == "shared" || scope == "shared.dyn")) | ||
| continue; | ||
| if (seen.insert(buffer.get()).second) { | ||
| scoped_buffers.push_back(buffer); | ||
| } | ||
| } | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix LCA scope tracking for Let/If wrappers.
We only push For/Block nodes into stmt_stack_, so when DetectBufferAccessLCA says a buffer’s LCA is a LetStmt or IfThenElse (exactly the wrappers we now keep instead of inlining), the ancestor scan never sees it. in_scope stays false, the buffer drops out of scoped_buffers, and we silently skip versioning—regressing double buffering for the new “non-inlined let” pipelines. Please push those wrapper nodes before visiting their bodies.
+  Stmt VisitStmt_(const LetStmtNode *op) final {
+    stmt_stack_.push_back(op);
+    Stmt stmt = StmtExprMutator::VisitStmt_(op);
+    stmt_stack_.pop_back();
+    return stmt;
+  }
+
+  Stmt VisitStmt_(const IfThenElseNode *op) final {
+    stmt_stack_.push_back(op);
+    Stmt stmt = StmtExprMutator::VisitStmt_(op);
+    stmt_stack_.pop_back();
+    return stmt;
+  }🤖 Prompt for AI Agents
In src/transform/multi_version_buffer_rewriter.cc around lines 301 to 333, the
ancestor scan for LCAs always checks stmt_stack_ which only contains For/Block
nodes, so LCAs that are LetStmt or IfThenElse wrappers are missed causing
buffers to be excluded from scoped_buffers; update the traversal to push LetStmt
and IfThenElse nodes onto stmt_stack_ before visiting their bodies (and pop them
after visiting) so DetectBufferAccessLCA finds wrapper LCAs and buffers get
properly considered for double-buffering.
| PR Summary via GPT-5 Pro – LetStmt-Aware Pipeline Fixes (Issue #814)Overview
 Compiler Pass Updates
 Examples & Tests
 Verification
 | 
| have to modify some codes for aws related pass, @chengyupku should take a look. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/transform/multi_version_buffer_rewriter.cc (1)
286-291: Fix LCA scope tracking: push Let/If/Attr onto stmt_stack_.When LCA is a LetStmt or IfThenElse (now preserved), in-scope detection fails because stmt_stack_ only tracks For/Block. Buffers with wrapper LCAs get skipped for versioning.
Add visitors that push/pop these wrappers, mirroring Block/For.
Stmt VisitStmt_(const BlockNode *op) final { stmt_stack_.push_back(op); Stmt stmt = StmtExprMutator::VisitStmt_(op); stmt_stack_.pop_back(); return stmt; } Stmt VisitStmt_(const ForNode *op) final { stmt_stack_.push_back(op); ... stmt_stack_.pop_back(); return for_node; } + + Stmt VisitStmt_(const LetStmtNode *op) final { + stmt_stack_.push_back(op); + Stmt stmt = StmtExprMutator::VisitStmt_(op); + stmt_stack_.pop_back(); + return stmt; + } + + Stmt VisitStmt_(const IfThenElseNode *op) final { + stmt_stack_.push_back(op); + Stmt stmt = StmtExprMutator::VisitStmt_(op); + stmt_stack_.pop_back(); + return stmt; + } + + Stmt VisitStmt_(const AttrStmtNode *op) final { + stmt_stack_.push_back(op); + Stmt stmt = StmtExprMutator::VisitStmt_(op); + stmt_stack_.pop_back(); + return stmt; + }Also applies to: 293-406, 474-485
🧹 Nitpick comments (5)
examples/blocksparse_attention/test_example_blocksparse_attention.py (1)
10-11: Consider parameterizing the remaining tests for consistency.Tests
test_block_sparse_attn_triton,test_example_tilelang_block_sparse_attn,test_example_triton_sparse_gqa_decode_varlen_indice, andtest_example_triton_sparse_gqa_decode_varlen_maskstill callmain()without explicit parameters. If their default configurations also result in long CI runtimes, consider applying similar parameterization for consistency and efficiency.Also applies to: 14-15, 26-31
testing/python/transform/test_tilelang_transform_multi_version_buffer.py (2)
108-135: Good regression test for let-bound producer/consumer.Covers non-inlined let across pipeline stages and matches expected double buffering (k % 2). Consider adding complementary cases:
- Single-branch IfThenElse wrapper around the pipeline body.
- AttrStmt wrapper (e.g., pragma) above SeqStmt.
This will guard the additional wrapper paths tightened in the pass.
114-121: Optional: zero‑init accum to avoid undefined reads if executed.If this TIR is ever run, accum[i] += shared[...] reads undefined initial values. You can add a small init loop before k.
accum = T.alloc_buffer((8,), "float32", scope="local") + for i in T.serial(8): + accum[i] = T.float32(0) for k in T.serial(4, annotations={"num_stages": T.int32(2)}):Also applies to: 126-134
src/transform/multi_version_buffer_rewriter.cc (2)
142-173: Consider unwrapping single-branch IfThenElse in collect_stmts.collect_stmts flattens Let/Attr/Block wrappers but leaves IfThenElse. For trivial guards inserted around stages, flattening the then_case (when else is absent) can stabilize stage detection and access analysis.
Optional improvement: unwrap IfThenElse without else same as Let/Attr.
if (const auto *let = stmt.as<LetStmtNode>()) { collect_stmts(let->body); return; } if (const auto *attr = stmt.as<AttrStmtNode>()) { collect_stmts(attr->body); return; } + if (const auto *ite = stmt.as<IfThenElseNode>()) { + if (!ite->else_case.defined()) { + collect_stmts(ite->then_case); + return; + } + }
434-472: Also rewrite address_of to include version offset (optional).Only tvm_access_ptr is adjusted. Calls like tir.address_of(buffer[...]) or builtin::tvm_address_of may appear after letting/intrinsic lowering and need the version stride added.
Add a branch similar to tvm_access_ptr to rewrite address_of indices.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- examples/blocksparse_attention/test_example_blocksparse_attention.py(1 hunks)
- src/transform/multi_version_buffer_rewriter.cc(6 hunks)
- src/transform/simplify.cc(2 hunks)
- testing/python/transform/test_tilelang_transform_multi_version_buffer.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
PR: tile-ai/tilelang#794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
- src/transform/multi_version_buffer_rewriter.cc
🧬 Code graph analysis (4)
src/transform/simplify.cc (2)
src/transform/lower_tile_op.cc (2)
expr(428-440)
expr(428-428)src/transform/layout_inference.cc (2)
expr(389-407)
expr(389-389)
testing/python/transform/test_tilelang_transform_multi_version_buffer.py (2)
tilelang/language/ast/ir.py (1)
alloc_buffer(441-508)src/transform/storage_rewrite.cc (1)
i(705-841)
examples/blocksparse_attention/test_example_blocksparse_attention.py (2)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (2)
main(173-186)
main(381-468)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (2)
main(162-173)
main(371-449)
src/transform/multi_version_buffer_rewriter.cc (1)
src/transform/warp_specialized_rewriter.cc (17)
stmt(51-62)
stmt(51-51)
stmt(121-124)
stmt(121-121)
stmt(126-130)
stmt(126-126)
stmt(132-132)
stmt(132-132)
stmt(230-230)
stmt(230-230)
stmt(271-271)
stmt(271-271)
seq_stmt(917-990)
seq_stmt(918-919)
seq_stmt(1033-1090)
seq_stmt(1033-1033)
marker(1181-1181)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-metal
🔇 Additional comments (5)
examples/blocksparse_attention/test_example_blocksparse_attention.py (2)
18-19: LGTM! Explicit parameterization reduces CI runtime.The explicit parameters (
batch=1,max_cache_seqlen=2048) reduce the test workload from the defaults (batch=8, max_cache_seqlen=8192), aligning with the PR's goal of parameterizing examples for faster CI runs.
22-23: LGTM! Consistent parameterization with the other test.The parameters match line 19, ensuring consistency across the modified tests.
src/transform/simplify.cc (3)
8-8: LGTM!The inclusion of
tvm/tir/analysis.hsupports the new buffer alias analysis functionality.
15-15: LGTM!The
<optional>header is appropriate for C++17 optional types used in this file.
356-370: Verify the non-unit dimension counting aligns with intended removal criteria.The current logic counts non-unit dimensions to decide on buffer alias removal. Please confirm:
Should a single-dimension vectorized access (e.g.,
buffer[ramp(0, 1, 4)]representing a 4-lane vector) trigger removal? The comment mentions "or carries vector lanes" as a separate condition, but the currentnon_unit_dims > 1check would NOT remove such cases.
Is the threshold
non_unit_dims > 1(2+ dims) intentional, or should it be>= 2or some other value?If single-dimension vectorized accesses should also be removed, consider detecting Ramp nodes explicitly:
Array<Range> ranges = get_ranges(value); if (!ranges.empty()) { int non_unit_dims = 0; + bool has_ramp = false; for (const Range &range : ranges) { PrimExpr extent = analyzer_->Simplify(range->extent); if (is_const_int(extent, 1) || analyzer_->CanProveEqual(extent, 1)) { continue; } + // Check if this came from a Ramp (vectorized access) + if (value.as<BufferLoadNode>()) { + auto load = value.as<BufferLoadNode>(); + for (const PrimExpr &index : load->indices) { + if (index.as<RampNode>()) { + has_ramp = true; + } + } + } ++non_unit_dims; if (non_unit_dims > 1) { remove_buffer_alias = true; break; } } + if (has_ramp) { + remove_buffer_alias = true; + } }
| const SeqStmtNode *pipeline_body_seq = nullptr; | ||
| { | ||
| // Traverse trivial wrappers (let/if) to find the actual SeqStmt body. | ||
| Stmt current = pipeline_body_root; | ||
| while (true) { | ||
| if (const auto *seq_stmt = current.as<SeqStmtNode>()) { | ||
| pipeline_body_seq = seq_stmt; | ||
| break; | ||
| } | ||
| if (const auto *if_then_else = current.as<IfThenElseNode>()) { | ||
| ICHECK(!if_then_else->else_case.defined()) | ||
| << "MultiVersionBuffer: Can't handle the body of the loop " | ||
| "because the IfThenElse node has an else branch"; | ||
| current = if_then_else->then_case; | ||
| continue; | ||
| } | ||
| if (const auto *let_stmt = current.as<LetStmtNode>()) { | ||
| current = let_stmt->body; | ||
| continue; | ||
| } | ||
| LOG(FATAL) | ||
| << "MultiVersionBuffer: Can't handle the body of the loop because " | ||
| << "it is not a SeqStmt, IfThenElse without else, " | ||
| << "or LetStmt wrapping them, but got " << current->GetTypeKey(); | ||
| } | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unwrap AttrStmt while locating pipeline body.
The finder unwraps IfThenElse and LetStmt but not AttrStmt. If For->body is an AttrStmt wrapping a SeqStmt (common after LetInline removal), this path LOG(FATAL)s.
Add AttrStmt unwrapping alongside Let/If.
       while (true) {
         if (const auto *seq_stmt = current.as<SeqStmtNode>()) {
           pipeline_body_seq = seq_stmt;
           break;
         }
         if (const auto *if_then_else = current.as<IfThenElseNode>()) {
           ICHECK(!if_then_else->else_case.defined())
               << "MultiVersionBuffer: Can't handle the body of the loop "
                  "because the IfThenElse node has an else branch";
           current = if_then_else->then_case;
           continue;
         }
+        if (const auto *attr_stmt = current.as<AttrStmtNode>()) {
+          current = attr_stmt->body;
+          continue;
+        }
         if (const auto *let_stmt = current.as<LetStmtNode>()) {
           current = let_stmt->body;
           continue;
         }
         LOG(FATAL)
             << "MultiVersionBuffer: Can't handle the body of the loop because "
             << "it is not a SeqStmt, IfThenElse without else, "
-            << "or LetStmt wrapping them, but got " << current->GetTypeKey();
+            << "or LetStmt/AttrStmt wrapping them, but got "
+            << current->GetTypeKey();
       }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const SeqStmtNode *pipeline_body_seq = nullptr; | |
| { | |
| // Traverse trivial wrappers (let/if) to find the actual SeqStmt body. | |
| Stmt current = pipeline_body_root; | |
| while (true) { | |
| if (const auto *seq_stmt = current.as<SeqStmtNode>()) { | |
| pipeline_body_seq = seq_stmt; | |
| break; | |
| } | |
| if (const auto *if_then_else = current.as<IfThenElseNode>()) { | |
| ICHECK(!if_then_else->else_case.defined()) | |
| << "MultiVersionBuffer: Can't handle the body of the loop " | |
| "because the IfThenElse node has an else branch"; | |
| current = if_then_else->then_case; | |
| continue; | |
| } | |
| if (const auto *let_stmt = current.as<LetStmtNode>()) { | |
| current = let_stmt->body; | |
| continue; | |
| } | |
| LOG(FATAL) | |
| << "MultiVersionBuffer: Can't handle the body of the loop because " | |
| << "it is not a SeqStmt, IfThenElse without else, " | |
| << "or LetStmt wrapping them, but got " << current->GetTypeKey(); | |
| } | |
| } | |
| const SeqStmtNode *pipeline_body_seq = nullptr; | |
| { | |
| // Traverse trivial wrappers (let/if) to find the actual SeqStmt body. | |
| Stmt current = pipeline_body_root; | |
| while (true) { | |
| if (const auto *seq_stmt = current.as<SeqStmtNode>()) { | |
| pipeline_body_seq = seq_stmt; | |
| break; | |
| } | |
| if (const auto *if_then_else = current.as<IfThenElseNode>()) { | |
| ICHECK(!if_then_else->else_case.defined()) | |
| << "MultiVersionBuffer: Can't handle the body of the loop " | |
| "because the IfThenElse node has an else branch"; | |
| current = if_then_else->then_case; | |
| continue; | |
| } | |
| if (const auto *attr_stmt = current.as<AttrStmtNode>()) { | |
| current = attr_stmt->body; | |
| continue; | |
| } | |
| if (const auto *let_stmt = current.as<LetStmtNode>()) { | |
| current = let_stmt->body; | |
| continue; | |
| } | |
| LOG(FATAL) | |
| << "MultiVersionBuffer: Can't handle the body of the loop because " | |
| << "it is not a SeqStmt, IfThenElse without else, " | |
| << "or LetStmt/AttrStmt wrapping them, but got " | |
| << current->GetTypeKey(); | |
| } | |
| } | 
🤖 Prompt for AI Agents
In src/transform/multi_version_buffer_rewriter.cc around lines 319 to 344, the
traversal that unwraps trivial wrappers when locating the SeqStmt body handles
IfThenElse and LetStmt but omits AttrStmt, causing a fatal error when the loop
body is an AttrStmt-wrapped SeqStmt; add a branch that recognizes AttrStmtNode
and sets current = attr_stmt->body (continuing the loop) alongside the existing
Let/If handling so AttrStmt wrappers are unwrapped before the SeqStmt check.
| // TileLang emits aliases like `X_shared = buffer[0:128, 0:32]` to annotate | ||
| // fragment types. TVM currently reinterprets vectorized/shared accesses as | ||
| // Let-bound BufferLoad/BufferRegion nodes. If these bindings survive, later | ||
| // passes (Layout rewrite, FlattenBuffer) substitute them with vector lanes | ||
| // that our layout can't handle. Force-inline (by dropping the let) whenever | ||
| // the alias spans more than 2 dims or carries vector lanes. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the comment to match implementation logic.
The comment states removal triggers when the alias "spans more than 2 dims or carries vector lanes", but the implementation at line 365 checks non_unit_dims > 1, which means 2+ non-unit dimensions. This creates two mismatches:
- "More than 2 dims" typically means 3+, but the code triggers on 2+.
- The "or carries vector lanes" suggests a vectorized single-dimension access should also trigger removal, but it won't because a single Ramp index contributes only 1 non-unit dimension.
Apply this diff to align the comment with the actual logic:
-    // TileLang emits aliases like `X_shared = buffer[0:128, 0:32]` to annotate
-    // fragment types. TVM currently reinterprets vectorized/shared accesses as
-    // Let-bound BufferLoad/BufferRegion nodes. If these bindings survive, later
-    // passes (Layout rewrite, FlattenBuffer) substitute them with vector lanes
-    // that our layout can't handle. Force-inline (by dropping the let) whenever
-    // the alias spans more than 2 dims or carries vector lanes.
+    // TileLang emits aliases like `X_shared = buffer[0:128, 0:32]` to annotate
+    // fragment types. TVM currently reinterprets vectorized/shared accesses as
+    // Let-bound BufferLoad/BufferRegion nodes. If these bindings survive, later
+    // passes (Layout rewrite, FlattenBuffer) substitute them with vector lanes
+    // that our layout can't handle. Force-inline (by dropping the let) whenever
+    // the alias has 2 or more non-unit dimensions (counting vector lanes).Alternatively, if the intent is to also catch single-dimension vectorized accesses, update the logic:
     if (!ranges.empty()) {
       int non_unit_dims = 0;
+      bool has_vector_lanes = false;
       for (const Range &range : ranges) {
         PrimExpr extent = analyzer_->Simplify(range->extent);
         if (is_const_int(extent, 1) || analyzer_->CanProveEqual(extent, 1)) {
           continue;
         }
+        if (is_const_int(extent) && *as_const_int(extent) > 1) {
+          has_vector_lanes = true;
+        }
         ++non_unit_dims;
         if (non_unit_dims > 1) {
           remove_buffer_alias = true;
           break;
         }
       }
+      if (has_vector_lanes) {
+        remove_buffer_alias = true;
+      }
     }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // TileLang emits aliases like `X_shared = buffer[0:128, 0:32]` to annotate | |
| // fragment types. TVM currently reinterprets vectorized/shared accesses as | |
| // Let-bound BufferLoad/BufferRegion nodes. If these bindings survive, later | |
| // passes (Layout rewrite, FlattenBuffer) substitute them with vector lanes | |
| // that our layout can't handle. Force-inline (by dropping the let) whenever | |
| // the alias spans more than 2 dims or carries vector lanes. | |
| // TileLang emits aliases like `X_shared = buffer[0:128, 0:32]` to annotate | |
| // fragment types. TVM currently reinterprets vectorized/shared accesses as | |
| // Let-bound BufferLoad/BufferRegion nodes. If these bindings survive, later | |
| // passes (Layout rewrite, FlattenBuffer) substitute them with vector lanes | |
| // that our layout can't handle. Force-inline (by dropping the let) whenever | |
| // the alias has 2 or more non-unit dimensions (counting vector lanes). | 
| if (remove_buffer_alias) { | ||
| Stmt body = this->VisitStmt(op->body); | ||
| bool used = UsesVar( | ||
| body, [&](const VarNode *var) { return var == op->var.get(); }); | ||
| ICHECK(!used) << "Let binding of BufferLoad is expected to be unused " | ||
| "before removal " | ||
| << op->var << " : " << op->value << " ."; | ||
| return body; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ICHECK could crash on valid code; consider graceful fallback.
The ICHECK at line 375 will crash the compiler if the let-bound variable is used in the body. While the comment states "expected to be unused", this expectation may not always hold—especially during incremental refactoring or in edge cases not covered by tests. A compiler crash is a poor user experience.
Consider one of these alternatives:
Option 1 (preferred): Fall back to normal inlining if the variable is used:
     if (remove_buffer_alias) {
       Stmt body = this->VisitStmt(op->body);
       bool used = UsesVar(
           body, [&](const VarNode *var) { return var == op->var.get(); });
-      ICHECK(!used) << "Let binding of BufferLoad is expected to be unused "
-                       "before removal "
-                    << op->var << " : " << op->value << " .";
-      return body;
+      if (!used) {
+        return body;
+      }
+      // Variable is used; fall back to normal inlining logic below
+      remove_buffer_alias = false;
     }Option 2: Perform actual inlining (substitute the variable) instead of just removal:
     if (remove_buffer_alias) {
-      Stmt body = this->VisitStmt(op->body);
-      bool used = UsesVar(
-          body, [&](const VarNode *var) { return var == op->var.get(); });
-      ICHECK(!used) << "Let binding of BufferLoad is expected to be unused "
-                       "before removal "
-                    << op->var << " : " << op->value << " .";
-      return body;
+      // Inline the buffer alias by substituting it in the body
+      analyzer_->Bind(op->var, value);
+      Stmt body = this->VisitStmt(op->body);
+      return body;
     }Option 3: If the ICHECK is truly an invariant, add a comment explaining why the variable is guaranteed to be unused (e.g., due to upstream pass behavior).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (remove_buffer_alias) { | |
| Stmt body = this->VisitStmt(op->body); | |
| bool used = UsesVar( | |
| body, [&](const VarNode *var) { return var == op->var.get(); }); | |
| ICHECK(!used) << "Let binding of BufferLoad is expected to be unused " | |
| "before removal " | |
| << op->var << " : " << op->value << " ."; | |
| return body; | |
| } | |
| if (remove_buffer_alias) { | |
| Stmt body = this->VisitStmt(op->body); | |
| bool used = UsesVar( | |
| body, [&](const VarNode *var) { return var == op->var.get(); }); | |
| if (!used) { | |
| return body; | |
| } | |
| // Variable is used; fall back to normal inlining logic below | |
| remove_buffer_alias = false; | |
| } | 
🤖 Prompt for AI Agents
In src/transform/simplify.cc around lines 371-379, replace the ICHECK that
aborts when the let-bound variable is actually used with a graceful fallback:
detect when used==true and in that case do not crash but return a
preserved/updated Let binding (e.g., reconstruct the original Let with the
visited value and visited body or simply return a let node built from op->var,
the visited value and the visited body) so the pass continues; only remove the
let when used==false. Ensure no crash occurs and the original behavior (removal
when unused) is kept.
| ci has failed due to limited storage, but tests and examples have been passed in my local environment. | 
- register the new config in builtin headers/registration - add helper to pipeline enabling LetInline based on pass context - document LetStmt inlining controls and usage
as title.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests