Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Oct 21, 2025

This pull request introduces support for specifying initial values for local variable allocations (alloc_var) in TileLang and its underlying TVM code generation pipeline. The changes propagate user-specified initializers from the frontend through buffer flattening, opaque block lowering, storage rewriting, and finally into CUDA code generation. Comprehensive tests are added to verify the correct code emission for initializers.

Support for local variable initializers:

  • Added a new attribute key tl.local_var_init (kLocalVarInit) to represent initial values for local variable allocations.
  • Modified the CUDA code generator to emit user-specified initial values for local.var allocations, falling back to zero if not provided.

IR transformation and propagation:

  • Updated flatten_buffer, lower_opaque_block, and storage_rewrite passes to propagate and apply the tl.local_var_init attribute, ensuring initializers are carried through all relevant transformations:
    • Buffer flattener and opaque block lowerer now collect, preserve, and attach initializers from block annotations to allocation nodes. [1] [2] [3] [4] [5] [6] [7] [8]
    • Storage plan rewriter attaches initializers to allocation annotations and ensures correct propagation during storage rewriting. [1] [2] [3] [4] [5] [6] [7]
    • Necessary includes and attribute key wiring added throughout transformation files. [1] [2] [3]

Testing:

  • Added new tests in test_tilelang_language_alloc.py to verify that alloc_var initializers are correctly emitted and that multiple variables can be initialized with different values.

These changes ensure that user-specified initial values for local variables are correctly handled and reflected in the generated CUDA code, improving expressiveness and correctness of TileLang programs.

Summary by CodeRabbit

  • New Features

    • Added support for custom initialization of local variables: local allocations can now be initialized to a user-provided value instead of default zero.
    • Updated alloc_var() API to accept optional initializers while preserving backward-compatible scope argument handling.
  • Tests

    • Added tests verifying single and multiple local-variable initializers and their correct propagation into generated kernels.

…attach it to

    generated Allocates and the PrimFunc attrs
  - thread the map through FlattenBuffer and StorageRewrite so flattened/merged
    allocations keep their tl.local_var_init annotations
  - teach annotation handling to accept scalar initializers, resolve buffers, and merge
    with existing stat
@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 21, 2025

Walkthrough

Adds a new attribute constant kLocalVarInit and implements propagation and handling of local-variable initializers across the pipeline: TileLang allocation API, transform passes (flatten, opaque-block lowering, storage rewrite), CUDA codegen, and tests to verify emitted kernel initializers.

Changes

Cohort / File(s) Summary
Attribute Definition
src/op/builtin.h
Added static constexpr const char *kLocalVarInit = "tl.local_var_init"; in namespace tvm::tl::attr.
TileLang API
tilelang/language/allocate.py
Updated alloc_var signature to accept flexible positional args and init (supports forms: init, scope, or init+scope). Emits tl.local_var_init block attribute when initializer provided; added argument validation.
Transform Passes (annotation propagation)
src/transform/flatten_buffer.cc, src/transform/lower_opaque_block.cc, src/transform/storage_rewrite.cc
Added storage for and propagation of a local_var_init_map_ extracted from PrimFunc attrs (kLocalVarInit); updated APIs/signatures to accept and forward the map; Attach initializer annotations to Allocate nodes and flattened buffers; refactored opaque-block lowering to operate on PrimFunc and preserve/reattach initializers.
Codegen (CUDA)
src/target/codegen_cuda.cc
When emitting local (scope="local.var") allocations, check for kLocalVarInit annotation and emit the annotated initializer (with cast to target dtype) instead of unconditional zero initialization.
Tests
testing/python/language/test_tilelang_language_alloc.py
Added helpers and tests for single and multiple local variable initializers; tests compile and assert the generated kernel source contains the provided initializer(s).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TileLang
    participant Passes
    participant Codegen

    User->>TileLang: alloc_var(dtype, init=expr)
    TileLang->>TileLang: parse args, attach attr(kLocalVarInit)
    TileLang->>Passes: PrimFunc (with attrs.kLocalVarInit)

    rect rgba(200,220,240,0.3)
    Note over Passes: Transform passes extract and forward<br/>local_var_init_map_ through flatten, opaque lowering, storage rewrite
    Passes->>Passes: populate local_var_init_map_
    Passes->>Passes: attach annotations to Allocate nodes / buffers
    Passes->>Codegen: lowered IR with Allocate annotations
    end

    rect rgba(220,200,240,0.3)
    Note over Codegen: CUDA codegen checks kLocalVarInit for local.var
    Codegen->>Codegen: if present -> cast init -> emit initializer
    Codegen->>User: Generated CUDA kernel with custom init
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • chengyupku
  • xysmlx

Poem

🐰
I nibble bytes and bind a var,
A little init to set the bar.
From Python hop through passes wide,
To CUDA kernels where values bide.
Hooray — local seeds now sprout with pride!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "[Language] Support tilelang alloc_var(dtype, init=x)" is directly and clearly related to the main change in the changeset. The core user-facing feature is updating the alloc_var function signature in tilelang/language/allocate.py to accept an optional init parameter for variable initialization, which is precisely what the title describes. While the PR includes supporting infrastructure changes across multiple transformation passes and code generators to propagate and emit the initializer, these are all in service of enabling the primary feature highlighted in the title. The title is concise, specific, and clearly conveys what a teammate would need to understand about the primary change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
tilelang/language/allocate.py (1)

70-117: Consider simplifying the argument parsing logic.

The current implementation handles multiple calling patterns for backward compatibility, but the complexity makes it harder to maintain and understand. The logic is correct, but consider these alternatives:

  1. Option 1 (Recommended): Deprecate the string-as-scope positional pattern in favor of explicit keyword arguments:

    def alloc_var(dtype, init=None, scope="local.var"):
        buffer = T.alloc_buffer([1], dtype, scope=scope)
        if init is not None:
            block_attr({"tl.local_var_init": {buffer.data: init}})
        return buffer

    This is cleaner and more explicit, though it breaks backward compatibility with alloc_var(dtype, "shared").

  2. Option 2: Keep the current approach but add validation to ensure string initializers aren't silently treated as scope:

    if len(args) == 1:
        arg = args[0]
        if isinstance(arg, str):
            if parsed_init is None and scope == "local.var":
                parsed_scope = arg
            else:
                raise TypeError("String arguments are treated as scope for backward compatibility. Use init= keyword for initializers.")

The current implementation is functional, but the backward compatibility burden may outweigh the benefits. Consider documenting the deprecation path in the docstring.

src/transform/lower_opaque_block.cc (1)

219-261: Clarify the buffer resolution fallback behavior.

The ResolveLocalVarBuffer logic uses a fallback strategy that may be surprising:

  1. First, it looks for a buffer with local.var scope (lines 251-255)
  2. Then, it falls back to the last buffer in alloc_buffers if no local.var is found (lines 257-259)

Potential Issue: If a block allocates multiple buffers (e.g., shared and local buffers), and the user provides a bare PrimExpr initializer, the fallback may attach the initializer to the wrong buffer.

Example scenario:

with T.block():
    shared_buf = alloc_shared((128,), "float32")
    local_buf = alloc_local((16,), "float32")
    block_attr({"tl.local_var_init": 1.0})  # Which buffer gets initialized?

The fallback would choose local_buf (last buffer), which may or may not be the intended target.

Recommendations:

  1. Document this fallback behavior clearly in the code comments and user documentation
  2. Consider making the bare PrimExpr form require that there is exactly one local.var buffer, failing with a clear error otherwise
  3. Alternatively, always require the explicit Map<Var, PrimExpr> form when multiple buffers are present

The current implementation logs a warning (line 229), which is good, but users may not see warnings during compilation. An error would be safer for ambiguous cases.

To verify the current behavior, you could add a test case with multiple buffer allocations to confirm the fallback selects the expected buffer:

@T.prim_func
def test_multi_buffer_init():
    with T.block():
        buf1 = T.alloc_buffer((128,), "float32", scope="shared")
        buf2 = T.alloc_buffer((16,), "float32", scope="local")
        T.block_attr({"tl.local_var_init": 42.0})
        # Which buffer is initialized?
testing/python/language/test_tilelang_language_alloc.py (3)

97-101: Redundant copy operation preceding overwrite.

Line 99 copies A to B, but lines 100-101 immediately overwrite all elements of B with tmp. The copy serves no purpose in this test and may confuse readers about the test's intent.

Consider removing the redundant copy:

     with T.Kernel(T.ceildiv(N, block_N), threads=block_N) as bx:
         tmp = T.alloc_var(dtype, init_value)
-        T.copy(A[bx * block_N], B[bx * block_N])
         for i in T.Parallel(block_N):
             B[bx * block_N + i] = tmp

136-141: Redundant copy operation preceding overwrite.

Similar to the single-variable test, line 139 copies A to B, but line 141 immediately overwrites all elements of B. The copy is unnecessary.

Apply this diff:

     with T.Kernel(T.ceildiv(N, block_N), threads=block_N) as bx:
         tmp0 = T.alloc_var(dtype, 1)
         tmp1 = T.alloc_var(dtype, 2)
-        T.copy(A[bx * block_N], B[bx * block_N])
         for i in T.Parallel(block_N):
             B[bx * block_N + i] = tmp0 + tmp1

106-118: String-based code verification is fragile.

Lines 117, 156, and 157 verify initializers by searching for exact string patterns in generated code (e.g., "= 5;", "= 1;"). This approach is brittle—minor formatting changes in code generation would break tests despite correct functionality.

Consider adding runtime execution tests that verify actual behavior rather than relying solely on string matching. For example:

def test_alloc_var_with_initializer_runtime():
    import numpy as np
    N, block_N, init_value = 256, 64, 5
    program = alloc_var_with_initializer(N, block_N, "int32", init_value)
    
    kernel = tilelang.compile(program, out_idx=[1])
    a_np = np.random.randint(0, 100, (N,), dtype="int32")
    b_np = np.zeros((N,), dtype="int32")
    
    kernel(a_np, b_np)
    
    # Verify that all elements of b_np are set to init_value
    assert np.all(b_np == init_value), f"Expected all {init_value}, got {b_np}"

If code-level verification is still desired, consider using a more flexible pattern like re.search(rf'\b\w+\s*=\s*{init_value}\s*;', code) to tolerate whitespace variations.

Also applies to: 146-158

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdc67fc and 9479bf1.

📒 Files selected for processing (7)
  • src/op/builtin.h (1 hunks)
  • src/target/codegen_cuda.cc (1 hunks)
  • src/transform/flatten_buffer.cc (4 hunks)
  • src/transform/lower_opaque_block.cc (7 hunks)
  • src/transform/storage_rewrite.cc (9 hunks)
  • testing/python/language/test_tilelang_language_alloc.py (1 hunks)
  • tilelang/language/allocate.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/transform/flatten_buffer.cc (2)
src/transform/simplify.cc (2)
  • func (222-274)
  • func (223-225)
tilelang/language/ast/ir.py (1)
  • init (361-369)
tilelang/language/allocate.py (1)
tilelang/language/ast/ir.py (3)
  • block_attr (430-438)
  • init (361-369)
  • alloc_buffer (441-508)
src/transform/lower_opaque_block.cc (2)
src/transform/inject_fence_proxy.cc (4)
  • f (133-140)
  • f (133-133)
  • f (166-173)
  • f (166-166)
src/transform/legalize_safe_memory_access.cc (12)
  • f (286-298)
  • f (286-286)
  • buffer (87-95)
  • buffer (87-87)
  • buffer (98-138)
  • buffer (98-99)
  • buffer (256-260)
  • buffer (256-256)
  • buffer (262-265)
  • buffer (262-262)
  • buffer (267-270)
  • buffer (267-267)
testing/python/language/test_tilelang_language_alloc.py (2)
tilelang/language/allocate.py (1)
  • alloc_var (70-117)
tilelang/language/copy.py (1)
  • copy (10-86)
src/transform/storage_rewrite.cc (1)
src/transform/lower_opaque_block.cc (12)
  • annotations (208-248)
  • annotations (209-212)
  • op (62-109)
  • op (62-62)
  • op (110-116)
  • op (110-110)
  • op (118-151)
  • op (118-118)
  • op (153-166)
  • op (153-153)
  • f (45-59)
  • f (45-45)
🪛 Ruff (0.14.1)
tilelang/language/allocate.py

98-98: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


108-109: Avoid specifying long messages outside the exception class

(TRY003)


112-112: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (15)
src/op/builtin.h (1)

30-30: LGTM! Clean constant definition.

The new kLocalVarInit attribute key is well-placed in the tl::attr namespace and follows the established naming convention.

src/target/codegen_cuda.cc (1)

2204-2213: LGTM! Initialization logic is sound.

The implementation correctly:

  • Defaults to zero initialization for backward compatibility
  • Extracts user-provided initializers from annotations
  • Handles type mismatches with explicit casting
  • Emits the initialization in the generated CUDA code
src/transform/flatten_buffer.cc (4)

28-28: LGTM! Necessary includes for attribute handling.

The new includes support reading local variable initializers from function attributes and using the kLocalVarInit constant.

Also applies to: 36-36


52-55: LGTM! Correct initialization of local_var_init_map_.

The pass properly extracts the existing local variable initializer map from function attributes when present, enabling preservation through buffer flattening.


208-214: LGTM! Proper propagation of initializers to Allocate nodes.

The code correctly attaches local variable initializers to the Allocate node's annotations, ensuring they are preserved through the transformation.


372-373: LGTM! Well-documented member variable.

The local_var_init_map_ member is clearly documented and follows naming conventions.

tilelang/language/allocate.py (2)

19-21: LGTM! Necessary imports for initialization support.

The new imports enable type hints, initializer expressions, and block attribute handling for local variable initialization.


116-116: LGTM! Correct attribute attachment.

The block attribute is properly structured as {buffer.data: init} to match the C++ Map<Var, PrimExpr> type expected by the transform passes.

src/transform/lower_opaque_block.cc (3)

25-25: LGTM! Necessary includes added.

The new includes support attribute handling, string operations, and the kLocalVarInit constant.

Also applies to: 29-29, 32-32


45-59: LGTM! Correct PrimFunc-level handling of LocalVarInit.

The refactored Rewrite method properly:

  • Operates on PrimFunc to access and update attributes
  • Extracts existing local_var_init_map_ from function attributes
  • Collects initializers during lowering
  • Reattaches the map as a function attribute for downstream passes

This ensures LocalVarInit annotations are preserved throughout the compilation pipeline.


96-100: LGTM! Proper propagation to Allocate nodes.

The initializers are correctly attached to the Allocate node's annotations, making them available to later passes (storage_rewrite, codegen).

src/transform/storage_rewrite.cc (4)

700-710: LGTM! Well-structured helper for annotation management.

The MakeAllocateAnnotations method correctly extracts initializers from the map and constructs appropriate annotations. The defensive check for defined() and the const qualifier are good practices.


843-847: Verify initializer behavior when allocations are merged.

When multiple allocations are merged into shared storage (lines 802-857 and 862-904), only e->alloc_var (the first allocation's buffer_var) is used to look up the initializer. If multiple merged allocations have different initializers, only the first would be preserved in the annotations.

This is acceptable for the primary use case (local.var scope), since:

  • Lines 1161-1169 exclude local-scope allocations from reuse/merging
  • Lines 1086-1094 exclude small allocations (≤32 bits) from reuse
  • local.var variables are typically small, register-mapped allocations

However, if this feature is later used with storage scopes that allow merging (e.g., shared memory), the loss of non-first initializers could cause incorrect behavior.

Consider documenting this limitation, or add a check to warn/error if merging occurs between allocations with conflicting initializers:

// In NewAllocTagMerged or the merged allocation path
if (local_var_init_map_.defined()) {
  for (StorageEntry* child : e->merged_children) {
    for (const AllocateNode* op : child->allocs) {
      if (local_var_init_map_.count(op->buffer_var)) {
        LOG(WARNING) << "Local variable initializer for " 
                     << op->buffer_var->name_hint 
                     << " may be lost due to allocation merging";
      }
    }
  }
}

Also applies to: 896-899


1966-1975: Correct extraction and forwarding of local_var_init_map.

The StorageRewrite pass properly extracts the kLocalVarInit attribute from the function and forwards it to the StoragePlanRewriter. The use of std::move avoids unnecessary copies.


1820-1820: Annotations correctly preserved during vector type rewriting.

The VectorTypeRewriter properly forwards op->annotations when creating the rewritten Allocate node, ensuring initializer annotations survive the transformation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9479bf1 and 7d14deb.

📒 Files selected for processing (1)
  • src/transform/storage_rewrite.cc (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/transform/storage_rewrite.cc (1)
src/transform/lower_opaque_block.cc (12)
  • annotations (208-248)
  • annotations (209-212)
  • op (62-109)
  • op (62-62)
  • op (110-116)
  • op (110-110)
  • op (118-151)
  • op (118-118)
  • op (153-166)
  • op (153-153)
  • f (45-59)
  • f (45-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (6)
src/transform/storage_rewrite.cc (6)

28-28: LGTM: Include added for annotations support.

The <tvm/ir/attrs.h> header is required for the Map<String, ffi::Any> annotations functionality.


471-475: LGTM: Backward-compatible signature extension.

The signature extension correctly adds support for passing initializer maps while maintaining backward compatibility with a default empty map. Good use of std::move for efficiency.


700-710: LGTM: Well-designed helper function.

The MakeAllocateAnnotations helper properly checks if the map is defined before access and cleanly encapsulates the logic for creating allocation annotations with initializers.


1201-1202: LGTM: Member variable properly declared.

The local_var_init_map_ member is clearly named, documented, and uses the appropriate type.


1966-1975: LGTM: Proper attribute extraction and propagation.

The pass correctly extracts the kLocalVarInit attribute from the function and passes it to the storage plan rewriter. The Optional handling is appropriate.


1820-1820: LGTM: Annotations correctly preserved during type rewriting.

Forwarding op->annotations ensures that local variable initializers are preserved when the VectorTypeRewriter converts scalar buffer types to vectorized types.

Comment on lines +783 to +787
Map<String, ffi::Any> annotations =
MakeAllocateAnnotations(e->alloc_var);
e->alloc_nest.push_back(Allocate(
e->alloc_var, alloc_type, e->allocs[0]->extents,
e->allocs[0]->condition, Evaluate(0), std::move(annotations)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that initialized allocations are not inappropriately merged.

The implementation consistently applies MakeAllocateAnnotations across all allocation paths. However, when allocations are merged (especially in NewAllocTagMerged at line 887 where child->alloc_var = e->alloc_var), only the primary allocation's init value is preserved since all children share the same alloc_var.

If multiple local.var allocations with distinct init values are merged, the child init values would be lost. Verify that allocations requiring initialization are not merged, or document this limitation.

Run the following script to check if the storage reuse logic excludes initialized variables:

Also applies to: 843-847, 896-899


🏁 Script executed:

#!/bin/bash
# Check if storage reuse considers initialization when deciding to merge allocations

# Search for reuse/merge decision logic that might consider local_var_init
rg -A5 -B5 "FindAlloc|NewAlloc|sym_free_list_|const_free_map_" src/transform/storage_rewrite.cc | rg -C3 "local_var_init|kLocalVarInit"

# Check if small local.var allocations (which need init) are excluded from reuse
rg -A10 "is_small_array|kWarp" src/transform/storage_rewrite.cc

Length of output: 1040


🏁 Script executed:

#!/bin/bash
# Search for local.var and scope.rank assignment to understand allocation rank
# Also search for init value handling in merge/reuse logic

# Find where local.var scope is defined
rg -n "local\.var|local_var" src/transform/storage_rewrite.cc | head -20

# Find where alloc_var is assigned scope rank
rg -B10 -A10 "alloc_var.*StorageRank|scope\.rank.*=" src/transform/storage_rewrite.cc | head -50

# Search for NewAllocTagMerged to see how child alloc_var is set
rg -B5 -A15 "NewAllocTagMerged" src/transform/storage_rewrite.cc | grep -A15 "child->alloc_var"

Length of output: 2341


🏁 Script executed:

#!/bin/bash
# Search for NewAllocTagMerged function definition and its logic
rg -n "NewAllocTagMerged|MergeAllocates" src/transform/storage_rewrite.cc

# Look for the actual merge decision logic - search around line 843 and 896
rg -B20 "e->allocs\[0\]->condition" src/transform/storage_rewrite.cc | tail -80

# Search for where initialized allocations might be excluded from merge
rg -B5 "local_var_init_map_" src/transform/storage_rewrite.cc

Length of output: 1804


🏁 Script executed:

#!/bin/bash
# Search for how allocations are added to e->allocs list (the merge candidates)
rg -B10 -A5 "e->allocs.push_back|allocs.emplace" src/transform/storage_rewrite.cc | head -60

# Look for how local_var_init_map_ is populated - what determines if an alloc is included
rg -B5 "local_var_init_map\[" src/transform/storage_rewrite.cc

# Search for allocation handling that might exclude init values from reuse
rg -B3 -A3 "local_var_init|init_map" src/transform/storage_rewrite.cc | grep -v "^--$" | head -80

Length of output: 2293


🏁 Script executed:

#!/bin/bash
# Find how e->alloc_var is created/assigned - is it a new variable or an existing one?
rg -B15 "e->alloc_var\s*=" src/transform/storage_rewrite.cc | grep -A15 "StorageEntry"

# Look at FindAlloc to see if it groups different buffer_vars into same StorageEntry
rg -B5 -A30 "StorageEntry\* FindAlloc" src/transform/storage_rewrite.cc | head -80

# Search for whether init_map entries are created for merged allocations
rg -B3 -A3 "kLocalVarInit" src/transform/storage_rewrite.cc

Length of output: 981


🏁 Script executed:

#!/bin/bash
# Search for the core logic that decides which allocations go into the same StorageEntry
rg -B20 "dst_entry->allocs.emplace_back" src/transform/storage_rewrite.cc | head -60

# Check whether different buffer_vars with same storage scope are grouped together
rg -B30 "FindAlloc\(" src/transform/storage_rewrite.cc | grep -A30 "FindAlloc.*StorageRank"

Length of output: 968


🏁 Script executed:

#!/bin/bash
# Find FindAlloc implementation - search for function definition
rg -A50 "StorageEntry\* FindAlloc\(" src/transform/storage_rewrite.cc | head -100

# Also check if there's a separate path for local.var that bypasses the reuse logic
rg -B5 -A10 "\.var\"|kWarp|local.var" src/transform/storage_rewrite.cc | grep -B5 -A10 "NewAlloc\|FindAlloc"

Length of output: 2680


Verify that initialized allocations are not inappropriately merged.

The implementation consistently applies MakeAllocateAnnotations across allocation paths, but a critical gap exists: .var-tagged allocations are NOT excluded by the is_small_array check (which requires scope.tag.empty()). This means multiple local.var allocations with different buffer variables and init values can be grouped into the same StorageEntry and merged.

When merged, only the first allocation's init value (from e->allocs[0]->buffer_var) is looked up and preserved. If different local.var allocations have distinct init values, the subsequent ones are silently dropped during the merge at lines 843-847 and 896-899.

Confirm that allocations with distinct init values are excluded from storage reuse, or document this limitation.

🤖 Prompt for AI Agents
In src/transform/storage_rewrite.cc around lines 783 to 787, the code always
applies MakeAllocateAnnotations when creating Allocate nodes, but allocations
with a ".var" scope tag are not being excluded by the is_small_array check
(which requires scope.tag.empty()), causing local.var allocations with different
buffer vars and distinct init values to be grouped into the same StorageEntry
and merged; to fix, ensure allocations with non-empty scope.tag (e.g., ".var")
or allocations that carry init values are treated as non-mergeable by either (1)
updating the storage-entry grouping logic to compare and require identical init
values (and buffer_var) before merging, or (2) explicitly skip adding
MakeAllocateAnnotations and skip grouping for allocations whose scope.tag is
non-empty or that have an initialization expression, so they are not
reused/merged; make the minimal change that prevents combining allocations with
differing init values (or add a clear comment documenting that such allocations
are not eligible for reuse).

@LeiWang1999 LeiWang1999 merged commit bddb125 into tile-ai:main Oct 21, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant