Skip to content

Conversation

@yyttt6
Copy link
Contributor

@yyttt6 yyttt6 commented Oct 27, 2025

Summary by CodeRabbit

  • New Features
    • Device-side assertion support: two intrinsics (assert and assert-with-message) enable runtime checks inside kernels with optional messages.
    • CUDA codegen emits these device assertions so failures are reported from device execution.
    • TileLang printing now uses the new intrinsics to produce device-side asserts.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2025

Walkthrough

Adds two TL intrinsics (tl.device_assert, tl.device_assert_with_msg), registers them in the TL builtin registry, emits device-side assert/printf code in CUDA codegen, removes prior device-side utility implementations, and updates tilelang printing to call the new intrinsics.

Changes

Cohort / File(s) Summary
TL Builtin Declarations
src/op/builtin.h
Added declarations TVM_DLL const Op &device_assert() and TVM_DLL const Op &device_assert_with_msg() in tvm::tl.
TL Builtin Implementations & Registrations
src/op/builtin.cc
Implemented and registered tl.device_assert (arity 1) and tl.device_assert_with_msg (arity 2) as opaque intrinsics in the TL builtin registry.
CUDA Codegen: Declaration
src/target/codegen_cuda.h
Added private method declaration void PrintDeviceAssert(const CallNode *call) to CodeGenTileLangCUDA.
CUDA Codegen: Implementation
src/target/codegen_cuda.cc
Added PrintDeviceAssert(const CallNode*) and integrated handling in the EvaluateNode visitor to emit optional printf and device assert(cond) for the new intrinsics, short-circuiting default CodeGenC emission.
Removed Device Utilities
src/tl_templates/cuda/debug.h
Removed prior TL_DEVICE void device_assert(bool) and TL_DEVICE void device_assert_with_msg(bool, const char*) definitions.
TileLang Printing
tilelang/language/print.py
Replaced tir.call_extern(...) usage with T.call_intrin('tl.device_assert', ...) and T.call_intrin('tl.device_assert_with_msg', ...) for assertion emission.

Sequence Diagram(s)

sequenceDiagram
    participant TLPrint as tilelang/print.py
    participant IR as TVM IR
    participant Codegen as CodeGenTileLangCUDA
    participant GPU as CUDA device

    alt no message
        TLPrint->>IR: T.call_intrin('tl.device_assert', cond)
    else with message
        TLPrint->>IR: T.call_intrin('tl.device_assert_with_msg', cond, msg)
    end

    IR->>IR: builtin ops registered (src/op/builtin.cc)
    IR->>Codegen: EvaluateNode visits CallNode
    Note over Codegen: matches tl.device_assert / tl.device_assert_with_msg
    Codegen->>Codegen: PrintDeviceAssert(call)  %%# (fill:#f7fbff)
    Codegen->>GPU: emit optional printf(msg) and assert(cond)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Files needing extra attention:
    • src/target/codegen_cuda.cc — correctness and escaping of emitted CUDA code, message handling, and visitor short-circuit.
    • src/op/builtin.cc — registration attributes (opaque effect, arity).
    • tilelang/language/print.py — correct intrinsic call formation and argument ordering.

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

🐰 I hopped through ops and CUDA night,
Tiny asserts now spring to light.
From Python call to GPU's shout,
Messages printed, bugs called out—
A carrot-coded cheer, so bright! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "[Refactor]:Move device_assert from extern_call to intrin_call" accurately captures the main change across the changeset. The PR refactors device assertion functionality by converting from extern function calls to intrinsic operations, which is evidenced by: (1) adding new intrinsic definitions in builtin files, (2) removing old extern-based implementations from debug.h, (3) updating call sites in print.py to use intrinsics instead of extern calls, and (4) implementing CUDA codegen support for the new intrinsics. The title is concise, clear, and specific enough that a teammate reviewing history would immediately understand the refactoring's purpose, though it represents both device_assert and device_assert_with_msg which are handled as a cohesive change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 307778e and 2f93c5a.

📒 Files selected for processing (6)
  • src/op/builtin.cc (1 hunks)
  • src/op/builtin.h (2 hunks)
  • src/target/codegen_cuda.cc (2 hunks)
  • src/target/codegen_cuda.h (1 hunks)
  • src/tl_templates/cuda/debug.h (0 hunks)
  • tilelang/language/print.py (2 hunks)
💤 Files with no reviewable changes (1)
  • src/tl_templates/cuda/debug.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/target/codegen_cuda.h
🧰 Additional context used
🧬 Code graph analysis (3)
src/op/builtin.h (1)
tilelang/language/print.py (1)
  • device_assert (144-155)
tilelang/language/print.py (1)
tilelang/language/tir/op.py (1)
  • call_intrin (120-145)
src/op/builtin.cc (1)
tilelang/language/print.py (1)
  • device_assert (144-155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (5)
tilelang/language/print.py (2)

8-8: LGTM: Import added for intrinsic call support.

The import of tilelang.language as T is necessary for the intrinsic-based device assertion calls.


151-155: LGTM: Correct migration to intrinsic-based assertions.

The switch from tir.call_extern to T.call_intrin with registered intrinsics (tl.device_assert and tl.device_assert_with_msg) is correct and aligns with the PR objective.

src/op/builtin.h (1)

517-529: LGTM: Intrinsic declarations are correct.

The declarations for device_assert() and device_assert_with_msg() follow the established pattern for TL builtin intrinsics. Documentation is clear and concise.

src/target/codegen_cuda.cc (1)

2313-2320: LGTM: Correct interception of device assert intrinsics.

The modification to VisitStmt_ correctly identifies device assertion intrinsics by name and routes them to the specialized PrintDeviceAssert handler, preventing them from being processed by the standard codegen path.

src/op/builtin.cc (1)

304-312: LGTM: Intrinsic registrations are correct.

Both device_assert and device_assert_with_msg are properly registered with the correct number of inputs (1 and 2 respectively) and appropriate kOpaque call effect kind, which is correct for operations with observable side effects.

Comment on lines +2287 to +2297
void CodeGenTileLangCUDA::PrintDeviceAssert(const CallNode *call) {
std::string cond = PrintExpr(call->args[0]);
if (call->args.size() == 2) {
if (auto str = call->args[1].as<StringImmNode>()) {
this->PrintIndent();
stream << "if (!(" << cond << ")) printf(\"" << str->value << "\\n\");\n";
}
}
this->PrintIndent();
stream << "assert(" << cond << ");\n";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verify printf format string safety.

The implementation correctly emits device-side assertions. However, on line 2292, the message is passed directly to printf without format specifiers. If the message contains % characters, this could cause undefined behavior or crashes.

Consider using a format specifier to safely print the message:

-      stream << "if (!(" << cond << ")) printf(\"" << str->value << "\\n\");\n";
+      stream << "if (!(" << cond << ")) printf(\"%s\\n\", \"" << str->value << "\");\n";

Or escape any % characters in the message string before emitting it.

🤖 Prompt for AI Agents
In src/target/codegen_cuda.cc around lines 2287-2297, the emitted device printf
uses the message string directly which is unsafe if the message contains '%'
characters; change the emission to use a format specifier (e.g. printf("%s\n",
...)) or escape any '%' characters in the StringImmNode value before embedding
it so the runtime printf never interprets stray '%' sequences; update the branch
that handles call->args[1] to either print with "%s\\n" and pass the literal as
the second argument or preprocess the literal to replace each '%' with "%%"
prior to emitting the printf call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant