- 
                Notifications
    You must be signed in to change notification settings 
- Fork 292
          [Language] Support tilelang alloc_var(dtype, init=x) 
          #1092
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…attach it to
    generated Allocates and the PrimFunc attrs
  - thread the map through FlattenBuffer and StorageRewrite so flattened/merged
    allocations keep their tl.local_var_init annotations
  - teach annotation handling to accept scalar initializers, resolve buffers, and merge
    with existing stat
    | 👋 Hi! Thank you for contributing to the TileLang project. Please remember to run  We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 | 
| WalkthroughAdds a new attribute constant  Changes
 Sequence Diagram(s)sequenceDiagram
    participant User
    participant TileLang
    participant Passes
    participant Codegen
    User->>TileLang: alloc_var(dtype, init=expr)
    TileLang->>TileLang: parse args, attach attr(kLocalVarInit)
    TileLang->>Passes: PrimFunc (with attrs.kLocalVarInit)
    rect rgba(200,220,240,0.3)
    Note over Passes: Transform passes extract and forward<br/>local_var_init_map_ through flatten, opaque lowering, storage rewrite
    Passes->>Passes: populate local_var_init_map_
    Passes->>Passes: attach annotations to Allocate nodes / buffers
    Passes->>Codegen: lowered IR with Allocate annotations
    end
    rect rgba(220,200,240,0.3)
    Note over Codegen: CUDA codegen checks kLocalVarInit for local.var
    Codegen->>Codegen: if present -> cast init -> emit initializer
    Codegen->>User: Generated CUDA kernel with custom init
    end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested reviewers
 Poem
 Pre-merge checks and finishing touches❌ Failed checks (1 warning)
 ✅ Passed checks (2 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
tilelang/language/allocate.py (1)
70-117: Consider simplifying the argument parsing logic.The current implementation handles multiple calling patterns for backward compatibility, but the complexity makes it harder to maintain and understand. The logic is correct, but consider these alternatives:
Option 1 (Recommended): Deprecate the string-as-scope positional pattern in favor of explicit keyword arguments:
def alloc_var(dtype, init=None, scope="local.var"): buffer = T.alloc_buffer([1], dtype, scope=scope) if init is not None: block_attr({"tl.local_var_init": {buffer.data: init}}) return bufferThis is cleaner and more explicit, though it breaks backward compatibility with
alloc_var(dtype, "shared").
Option 2: Keep the current approach but add validation to ensure string initializers aren't silently treated as scope:
if len(args) == 1: arg = args[0] if isinstance(arg, str): if parsed_init is None and scope == "local.var": parsed_scope = arg else: raise TypeError("String arguments are treated as scope for backward compatibility. Use init= keyword for initializers.")The current implementation is functional, but the backward compatibility burden may outweigh the benefits. Consider documenting the deprecation path in the docstring.
src/transform/lower_opaque_block.cc (1)
219-261: Clarify the buffer resolution fallback behavior.The
ResolveLocalVarBufferlogic uses a fallback strategy that may be surprising:
- First, it looks for a buffer with
local.varscope (lines 251-255)- Then, it falls back to the last buffer in
alloc_buffersif no local.var is found (lines 257-259)Potential Issue: If a block allocates multiple buffers (e.g.,
sharedandlocalbuffers), and the user provides a barePrimExprinitializer, the fallback may attach the initializer to the wrong buffer.Example scenario:
with T.block(): shared_buf = alloc_shared((128,), "float32") local_buf = alloc_local((16,), "float32") block_attr({"tl.local_var_init": 1.0}) # Which buffer gets initialized?The fallback would choose
local_buf(last buffer), which may or may not be the intended target.Recommendations:
- Document this fallback behavior clearly in the code comments and user documentation
- Consider making the bare
PrimExprform require that there is exactly onelocal.varbuffer, failing with a clear error otherwise- Alternatively, always require the explicit
Map<Var, PrimExpr>form when multiple buffers are presentThe current implementation logs a warning (line 229), which is good, but users may not see warnings during compilation. An error would be safer for ambiguous cases.
To verify the current behavior, you could add a test case with multiple buffer allocations to confirm the fallback selects the expected buffer:
@T.prim_func def test_multi_buffer_init(): with T.block(): buf1 = T.alloc_buffer((128,), "float32", scope="shared") buf2 = T.alloc_buffer((16,), "float32", scope="local") T.block_attr({"tl.local_var_init": 42.0}) # Which buffer is initialized?testing/python/language/test_tilelang_language_alloc.py (3)
97-101: Redundant copy operation preceding overwrite.Line 99 copies
AtoB, but lines 100-101 immediately overwrite all elements ofBwithtmp. The copy serves no purpose in this test and may confuse readers about the test's intent.Consider removing the redundant copy:
with T.Kernel(T.ceildiv(N, block_N), threads=block_N) as bx: tmp = T.alloc_var(dtype, init_value) - T.copy(A[bx * block_N], B[bx * block_N]) for i in T.Parallel(block_N): B[bx * block_N + i] = tmp
136-141: Redundant copy operation preceding overwrite.Similar to the single-variable test, line 139 copies
AtoB, but line 141 immediately overwrites all elements ofB. The copy is unnecessary.Apply this diff:
with T.Kernel(T.ceildiv(N, block_N), threads=block_N) as bx: tmp0 = T.alloc_var(dtype, 1) tmp1 = T.alloc_var(dtype, 2) - T.copy(A[bx * block_N], B[bx * block_N]) for i in T.Parallel(block_N): B[bx * block_N + i] = tmp0 + tmp1
106-118: String-based code verification is fragile.Lines 117, 156, and 157 verify initializers by searching for exact string patterns in generated code (e.g.,
"= 5;","= 1;"). This approach is brittle—minor formatting changes in code generation would break tests despite correct functionality.Consider adding runtime execution tests that verify actual behavior rather than relying solely on string matching. For example:
def test_alloc_var_with_initializer_runtime(): import numpy as np N, block_N, init_value = 256, 64, 5 program = alloc_var_with_initializer(N, block_N, "int32", init_value) kernel = tilelang.compile(program, out_idx=[1]) a_np = np.random.randint(0, 100, (N,), dtype="int32") b_np = np.zeros((N,), dtype="int32") kernel(a_np, b_np) # Verify that all elements of b_np are set to init_value assert np.all(b_np == init_value), f"Expected all {init_value}, got {b_np}"If code-level verification is still desired, consider using a more flexible pattern like
re.search(rf'\b\w+\s*=\s*{init_value}\s*;', code)to tolerate whitespace variations.Also applies to: 146-158
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/op/builtin.h(1 hunks)
- src/target/codegen_cuda.cc(1 hunks)
- src/transform/flatten_buffer.cc(4 hunks)
- src/transform/lower_opaque_block.cc(7 hunks)
- src/transform/storage_rewrite.cc(9 hunks)
- testing/python/language/test_tilelang_language_alloc.py(1 hunks)
- tilelang/language/allocate.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/transform/flatten_buffer.cc (2)
src/transform/simplify.cc (2)
func(222-274)
func(223-225)tilelang/language/ast/ir.py (1)
init(361-369)
tilelang/language/allocate.py (1)
tilelang/language/ast/ir.py (3)
block_attr(430-438)
init(361-369)
alloc_buffer(441-508)
src/transform/lower_opaque_block.cc (2)
src/transform/inject_fence_proxy.cc (4)
f(133-140)
f(133-133)
f(166-173)
f(166-166)src/transform/legalize_safe_memory_access.cc (12)
f(286-298)
f(286-286)
buffer(87-95)
buffer(87-87)
buffer(98-138)
buffer(98-99)
buffer(256-260)
buffer(256-256)
buffer(262-265)
buffer(262-262)
buffer(267-270)
buffer(267-267)
testing/python/language/test_tilelang_language_alloc.py (2)
tilelang/language/allocate.py (1)
alloc_var(70-117)tilelang/language/copy.py (1)
copy(10-86)
src/transform/storage_rewrite.cc (1)
src/transform/lower_opaque_block.cc (12)
annotations(208-248)
annotations(209-212)
op(62-109)
op(62-62)
op(110-116)
op(110-110)
op(118-151)
op(118-118)
op(153-166)
op(153-153)
f(45-59)
f(45-45)
🪛 Ruff (0.14.1)
tilelang/language/allocate.py
98-98: Avoid specifying long messages outside the exception class
(TRY003)
102-102: Avoid specifying long messages outside the exception class
(TRY003)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
108-109: Avoid specifying long messages outside the exception class
(TRY003)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (15)
src/op/builtin.h (1)
30-30: LGTM! Clean constant definition.The new
kLocalVarInitattribute key is well-placed in thetl::attrnamespace and follows the established naming convention.src/target/codegen_cuda.cc (1)
2204-2213: LGTM! Initialization logic is sound.The implementation correctly:
- Defaults to zero initialization for backward compatibility
- Extracts user-provided initializers from annotations
- Handles type mismatches with explicit casting
- Emits the initialization in the generated CUDA code
src/transform/flatten_buffer.cc (4)
28-28: LGTM! Necessary includes for attribute handling.The new includes support reading local variable initializers from function attributes and using the
kLocalVarInitconstant.Also applies to: 36-36
52-55: LGTM! Correct initialization of local_var_init_map_.The pass properly extracts the existing local variable initializer map from function attributes when present, enabling preservation through buffer flattening.
208-214: LGTM! Proper propagation of initializers to Allocate nodes.The code correctly attaches local variable initializers to the Allocate node's annotations, ensuring they are preserved through the transformation.
372-373: LGTM! Well-documented member variable.The
local_var_init_map_member is clearly documented and follows naming conventions.tilelang/language/allocate.py (2)
19-21: LGTM! Necessary imports for initialization support.The new imports enable type hints, initializer expressions, and block attribute handling for local variable initialization.
116-116: LGTM! Correct attribute attachment.The block attribute is properly structured as
{buffer.data: init}to match the C++Map<Var, PrimExpr>type expected by the transform passes.src/transform/lower_opaque_block.cc (3)
25-25: LGTM! Necessary includes added.The new includes support attribute handling, string operations, and the
kLocalVarInitconstant.Also applies to: 29-29, 32-32
45-59: LGTM! Correct PrimFunc-level handling of LocalVarInit.The refactored
Rewritemethod properly:
- Operates on PrimFunc to access and update attributes
- Extracts existing local_var_init_map_ from function attributes
- Collects initializers during lowering
- Reattaches the map as a function attribute for downstream passes
This ensures LocalVarInit annotations are preserved throughout the compilation pipeline.
96-100: LGTM! Proper propagation to Allocate nodes.The initializers are correctly attached to the Allocate node's annotations, making them available to later passes (storage_rewrite, codegen).
src/transform/storage_rewrite.cc (4)
700-710: LGTM! Well-structured helper for annotation management.The
MakeAllocateAnnotationsmethod correctly extracts initializers from the map and constructs appropriate annotations. The defensive check fordefined()and the const qualifier are good practices.
843-847: Verify initializer behavior when allocations are merged.When multiple allocations are merged into shared storage (lines 802-857 and 862-904), only
e->alloc_var(the first allocation's buffer_var) is used to look up the initializer. If multiple merged allocations have different initializers, only the first would be preserved in the annotations.This is acceptable for the primary use case (
local.varscope), since:
- Lines 1161-1169 exclude local-scope allocations from reuse/merging
- Lines 1086-1094 exclude small allocations (≤32 bits) from reuse
local.varvariables are typically small, register-mapped allocationsHowever, if this feature is later used with storage scopes that allow merging (e.g., shared memory), the loss of non-first initializers could cause incorrect behavior.
Consider documenting this limitation, or add a check to warn/error if merging occurs between allocations with conflicting initializers:
// In NewAllocTagMerged or the merged allocation path if (local_var_init_map_.defined()) { for (StorageEntry* child : e->merged_children) { for (const AllocateNode* op : child->allocs) { if (local_var_init_map_.count(op->buffer_var)) { LOG(WARNING) << "Local variable initializer for " << op->buffer_var->name_hint << " may be lost due to allocation merging"; } } } }Also applies to: 896-899
1966-1975: Correct extraction and forwarding of local_var_init_map.The StorageRewrite pass properly extracts the
kLocalVarInitattribute from the function and forwards it to theStoragePlanRewriter. The use ofstd::moveavoids unnecessary copies.
1820-1820: Annotations correctly preserved during vector type rewriting.The
VectorTypeRewriterproperly forwardsop->annotationswhen creating the rewrittenAllocatenode, ensuring initializer annotations survive the transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/transform/storage_rewrite.cc(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/storage_rewrite.cc (1)
src/transform/lower_opaque_block.cc (12)
annotations(208-248)
annotations(209-212)
op(62-109)
op(62-62)
op(110-116)
op(110-110)
op(118-151)
op(118-118)
op(153-166)
op(153-153)
f(45-59)
f(45-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (6)
src/transform/storage_rewrite.cc (6)
28-28: LGTM: Include added for annotations support.The
<tvm/ir/attrs.h>header is required for theMap<String, ffi::Any>annotations functionality.
471-475: LGTM: Backward-compatible signature extension.The signature extension correctly adds support for passing initializer maps while maintaining backward compatibility with a default empty map. Good use of
std::movefor efficiency.
700-710: LGTM: Well-designed helper function.The
MakeAllocateAnnotationshelper properly checks if the map is defined before access and cleanly encapsulates the logic for creating allocation annotations with initializers.
1201-1202: LGTM: Member variable properly declared.The
local_var_init_map_member is clearly named, documented, and uses the appropriate type.
1966-1975: LGTM: Proper attribute extraction and propagation.The pass correctly extracts the
kLocalVarInitattribute from the function and passes it to the storage plan rewriter. The Optional handling is appropriate.
1820-1820: LGTM: Annotations correctly preserved during type rewriting.Forwarding
op->annotationsensures that local variable initializers are preserved when theVectorTypeRewriterconverts scalar buffer types to vectorized types.
| Map<String, ffi::Any> annotations = | ||
| MakeAllocateAnnotations(e->alloc_var); | ||
| e->alloc_nest.push_back(Allocate( | ||
| e->alloc_var, alloc_type, e->allocs[0]->extents, | ||
| e->allocs[0]->condition, Evaluate(0), std::move(annotations))); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify that initialized allocations are not inappropriately merged.
The implementation consistently applies MakeAllocateAnnotations across all allocation paths. However, when allocations are merged (especially in NewAllocTagMerged at line 887 where child->alloc_var = e->alloc_var), only the primary allocation's init value is preserved since all children share the same alloc_var.
If multiple local.var allocations with distinct init values are merged, the child init values would be lost. Verify that allocations requiring initialization are not merged, or document this limitation.
Run the following script to check if the storage reuse logic excludes initialized variables:
Also applies to: 843-847, 896-899
🏁 Script executed:
#!/bin/bash
# Check if storage reuse considers initialization when deciding to merge allocations
# Search for reuse/merge decision logic that might consider local_var_init
rg -A5 -B5 "FindAlloc|NewAlloc|sym_free_list_|const_free_map_" src/transform/storage_rewrite.cc | rg -C3 "local_var_init|kLocalVarInit"
# Check if small local.var allocations (which need init) are excluded from reuse
rg -A10 "is_small_array|kWarp" src/transform/storage_rewrite.ccLength of output: 1040
🏁 Script executed:
#!/bin/bash
# Search for local.var and scope.rank assignment to understand allocation rank
# Also search for init value handling in merge/reuse logic
# Find where local.var scope is defined
rg -n "local\.var|local_var" src/transform/storage_rewrite.cc | head -20
# Find where alloc_var is assigned scope rank
rg -B10 -A10 "alloc_var.*StorageRank|scope\.rank.*=" src/transform/storage_rewrite.cc | head -50
# Search for NewAllocTagMerged to see how child alloc_var is set
rg -B5 -A15 "NewAllocTagMerged" src/transform/storage_rewrite.cc | grep -A15 "child->alloc_var"Length of output: 2341
🏁 Script executed:
#!/bin/bash
# Search for NewAllocTagMerged function definition and its logic
rg -n "NewAllocTagMerged|MergeAllocates" src/transform/storage_rewrite.cc
# Look for the actual merge decision logic - search around line 843 and 896
rg -B20 "e->allocs\[0\]->condition" src/transform/storage_rewrite.cc | tail -80
# Search for where initialized allocations might be excluded from merge
rg -B5 "local_var_init_map_" src/transform/storage_rewrite.ccLength of output: 1804
🏁 Script executed:
#!/bin/bash
# Search for how allocations are added to e->allocs list (the merge candidates)
rg -B10 -A5 "e->allocs.push_back|allocs.emplace" src/transform/storage_rewrite.cc | head -60
# Look for how local_var_init_map_ is populated - what determines if an alloc is included
rg -B5 "local_var_init_map\[" src/transform/storage_rewrite.cc
# Search for allocation handling that might exclude init values from reuse
rg -B3 -A3 "local_var_init|init_map" src/transform/storage_rewrite.cc | grep -v "^--$" | head -80Length of output: 2293
🏁 Script executed:
#!/bin/bash
# Find how e->alloc_var is created/assigned - is it a new variable or an existing one?
rg -B15 "e->alloc_var\s*=" src/transform/storage_rewrite.cc | grep -A15 "StorageEntry"
# Look at FindAlloc to see if it groups different buffer_vars into same StorageEntry
rg -B5 -A30 "StorageEntry\* FindAlloc" src/transform/storage_rewrite.cc | head -80
# Search for whether init_map entries are created for merged allocations
rg -B3 -A3 "kLocalVarInit" src/transform/storage_rewrite.ccLength of output: 981
🏁 Script executed:
#!/bin/bash
# Search for the core logic that decides which allocations go into the same StorageEntry
rg -B20 "dst_entry->allocs.emplace_back" src/transform/storage_rewrite.cc | head -60
# Check whether different buffer_vars with same storage scope are grouped together
rg -B30 "FindAlloc\(" src/transform/storage_rewrite.cc | grep -A30 "FindAlloc.*StorageRank"Length of output: 968
🏁 Script executed:
#!/bin/bash
# Find FindAlloc implementation - search for function definition
rg -A50 "StorageEntry\* FindAlloc\(" src/transform/storage_rewrite.cc | head -100
# Also check if there's a separate path for local.var that bypasses the reuse logic
rg -B5 -A10 "\.var\"|kWarp|local.var" src/transform/storage_rewrite.cc | grep -B5 -A10 "NewAlloc\|FindAlloc"Length of output: 2680
Verify that initialized allocations are not inappropriately merged.
The implementation consistently applies MakeAllocateAnnotations across allocation paths, but a critical gap exists: .var-tagged allocations are NOT excluded by the is_small_array check (which requires scope.tag.empty()). This means multiple local.var allocations with different buffer variables and init values can be grouped into the same StorageEntry and merged.
When merged, only the first allocation's init value (from e->allocs[0]->buffer_var) is looked up and preserved. If different local.var allocations have distinct init values, the subsequent ones are silently dropped during the merge at lines 843-847 and 896-899.
Confirm that allocations with distinct init values are excluded from storage reuse, or document this limitation.
🤖 Prompt for AI Agents
In src/transform/storage_rewrite.cc around lines 783 to 787, the code always
applies MakeAllocateAnnotations when creating Allocate nodes, but allocations
with a ".var" scope tag are not being excluded by the is_small_array check
(which requires scope.tag.empty()), causing local.var allocations with different
buffer vars and distinct init values to be grouped into the same StorageEntry
and merged; to fix, ensure allocations with non-empty scope.tag (e.g., ".var")
or allocations that carry init values are treated as non-mergeable by either (1)
updating the storage-entry grouping logic to compare and require identical init
values (and buffer_var) before merging, or (2) explicitly skip adding
MakeAllocateAnnotations and skip grouping for allocations whose scope.tag is
non-empty or that have an initialization expression, so they are not
reused/merged; make the minimal change that prevents combining allocations with
differing init values (or add a clear comment documenting that such allocations
are not eligible for reuse).
This pull request introduces support for specifying initial values for local variable allocations (
alloc_var) in TileLang and its underlying TVM code generation pipeline. The changes propagate user-specified initializers from the frontend through buffer flattening, opaque block lowering, storage rewriting, and finally into CUDA code generation. Comprehensive tests are added to verify the correct code emission for initializers.Support for local variable initializers:
tl.local_var_init(kLocalVarInit) to represent initial values for local variable allocations.local.varallocations, falling back to zero if not provided.IR transformation and propagation:
flatten_buffer,lower_opaque_block, andstorage_rewritepasses to propagate and apply thetl.local_var_initattribute, ensuring initializers are carried through all relevant transformations:Testing:
test_tilelang_language_alloc.pyto verify that alloc_var initializers are correctly emitted and that multiple variables can be initialized with different values.These changes ensure that user-specified initial values for local variables are correctly handled and reflected in the generated CUDA code, improving expressiveness and correctness of TileLang programs.
Summary by CodeRabbit
New Features
Tests