
Conversation

@Cunxiao2002 Cunxiao2002 commented Oct 3, 2025

Summary by CodeRabbit

  • New Features

    • GEMM example now runs an integrated benchmark and prints measured latency.
    • Profiler benchmarking adds selectable backends (including CUPTI), quieter profiling output, configurable quantiles, and selectable return modes (min/max/mean/median).
  • Tests

    • Added end-to-end profiler tests that run and validate GEMM profiling across backends.
  • Documentation

    • Adapter description updated to indicate the adapter uses cython rather than ctypes.

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adds runtime benchmarking to the GEMM example and extends the profiler with two backends ("event", "cupti"), new options (backend, quantiles, return_mode), CUPTI stdout/stderr suppression, corresponding tests, and a minor docstring wording change.

Changes

  • Example benchmarking usage (examples/gemm/example_gemm.py): adds runtime benchmarking by obtaining the kernel profiler, calling do_bench on gemm_kernel, and printing the measured latency (keeps an alternative commented call).
  • Profiler public surface (tilelang/profiler/__init__.py): adds a Literal import and expands the do_bench/Profiler.do_bench signatures to accept and propagate backend, quantiles, and return_mode.
  • Benchmark engines & utilities (tilelang/profiler/bench.py): adds a suppress_stdout_stderr utility, implements backend dispatch for do_bench with "event" and "cupti" paths, adds _bench_with_cuda_events and _bench_with_cupti, introduces an L2-flush buffer and quantiles/return-mode handling, and updates returns/errors accordingly.
  • Tests: profiler integration (testing/python/profiler/test_tilelang_profiler.py): a new test module that defines a tiled GEMM kernel, validates correctness against PyTorch, and exercises both the cupti and event backends via the kernel's profiler, printing latency.
  • Docs / wording (tilelang/jit/adapter/cython/adapter.py): minor docstring wording update replacing "ctypes" with "cython".

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as Example Script
  participant K as gemm_kernel
  participant P as Profiler
  participant E as EventBench
  participant C as CUPTIBench
  participant CUDA as CUDA / NSYS Runtime

  User->>K: build kernel
  User->>P: profiler = kernel.get_profiler()
  User->>P: profiler.do_bench(func=K, backend=...)
  alt backend == "event"
    P->>E: _bench_with_cuda_events(func=K, ...)
    E->>CUDA: warmup + timed iterations (CUDA events)
    CUDA-->>E: event timings
    E-->>P: aggregated latency (quantiles/return_mode)
  else backend == "cupti"
    P->>C: _bench_with_cupti(func=K, ...)
    C->>CUDA: run iterations (stdout/stderr suppressed)
    CUDA-->>C: per-kernel records
    C-->>P: per-kernel latency aggregated (quantiles/return_mode)
  end
  P-->>User: latency value(s)
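A minimal usage sketch of the flow above, based on the API surface described in this PR (kernel.get_profiler() and do_bench with backend, quantiles, and return_mode); the kernel object is assumed to be an already compiled TileLang kernel such as the GEMM in examples/gemm/example_gemm.py, and the argument names follow the walkthrough and tests rather than a verified signature:

# `kernel` is assumed: a compiled TileLang kernel (e.g. the GEMM from examples/gemm/example_gemm.py)
profiler = kernel.get_profiler()

# CUDA-event timing (default backend), aggregated with the requested return_mode
event_ms = profiler.do_bench(backend="event", return_mode="mean")

# CUPTI/Kineto timing: sums per-kernel device time, excluding launch overhead
cupti_ms = profiler.do_bench(backend="cupti")

# The event backend can also report quantiles instead of a single aggregate
p50, p95 = profiler.do_bench(backend="event", quantiles=[0.5, 0.95])

print(f"event: {event_ms:.4f} ms, cupti: {cupti_ms:.4f} ms, p50/p95: {p50:.4f}/{p95:.4f} ms")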

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested reviewers

  • LeiWang1999
  • tzj-fxz

Poem

I hopped through kernels, soft and spry,
Timers ticking as the flops fly by.
Event or cupti, quiet and keen,
I gather latencies, tidy and clean.
Carrot-count printed — a small performance scene. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and concisely conveys the primary change (adding CUPTI support to the profiler) without unnecessary detail or vague wording.
✨ Finishing touches
  • 📝 Generate docstrings

github-actions bot commented Oct 3, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/profiler/__init__.py (1)

218-233: Fix API typing + defaults: avoid overriding dynamic warmup/repeat.

  • input_tensors should be Optional.
  • kernel_names should be Optional[Union[str, Tuple[str, ...]]].
  • trace_path should be Optional[str].
  • Return type isn’t always float (event quantiles => List[float]; kineto multi-kernel => Tuple[float, ...]).
  • n_warmup/n_repeat should default to 0 to enable dynamic estimation (current 1 forces single-iteration timing).

Apply:

-from typing import List, Optional, Callable, Any, Literal
+from typing import List, Optional, Callable, Any, Literal, Union, Tuple
@@
     def do_bench(
         self,
         func: Optional[Callable] = None,
         warmup: int = 25,
         rep: int = 100,
-        n_warmup: int = 1,
-        n_repeat: int = 1,
-        input_tensors: List[torch.Tensor] = None,
-        kernel_names=None,
+        n_warmup: int = 0,
+        n_repeat: int = 0,
+        input_tensors: Optional[List[torch.Tensor]] = None,
+        kernel_names: Optional[Union[str, Tuple[str, ...]]] = None,
         grad_to_none: Optional[List[torch.Tensor]] = None,
         quantiles: Optional[List[float]] = None,
         return_mode: Literal["min", "max", "mean", "median"] = "mean",
-        trace_path: str = None,
+        trace_path: Optional[str] = None,
         suppress_kineto_output: bool = False,
         with_multiple_kernels: bool = False,
-    ) -> float:
+    ) -> Union[float, List[float], Tuple[float, ...]]:

Also update the docstring “Returns” accordingly. [Based on learnings]

🧹 Nitpick comments (6)
tilelang/profiler/__init__.py (1)

255-275: Forward parity and doc correctness when using Kineto path.

  • do_bench may return a tuple when kernel_names is a tuple; update doc/typing as suggested above.
  • If you add grad clearing to Kineto (recommended for backward benches), forward grad_to_none to do_bench_kineto.

If Kineto supports grad clearing:

-            ) if kernel_names is None else do_bench_kineto(
+            ) if kernel_names is None else do_bench_kineto(
                 bench_func,
                 kernel_names=kernel_names,
                 warmup=warmup,
                 rep=rep,
                 _n_warmup=n_warmup,
                 _n_repeat=n_repeat,
                 return_mode=return_mode,
                 trace_path=trace_path,
                 suppress_kineto_output=suppress_kineto_output,
                 with_multiple_kernels=with_multiple_kernels,
+                grad_to_none=grad_to_none,
             )
tilelang/profiler/bench.py (4)

114-148: Make stdout/stderr suppression robust on non-filelike streams.

sys.stdout/err may lack fileno() (e.g., Jupyter), raising exceptions. Fall back gracefully instead of crashing.

+import io
@@
 class suppress_stdout_stderr:
 
     def __enter__(self):
-        self.outnull_file = open(os.devnull, 'w')
-        self.errnull_file = open(os.devnull, 'w')
-
-        self.old_stdout_fileno_undup = sys.stdout.fileno()
-        self.old_stderr_fileno_undup = sys.stderr.fileno()
-
-        self.old_stdout_fileno = os.dup(sys.stdout.fileno())
-        self.old_stderr_fileno = os.dup(sys.stderr.fileno())
-
-        self.old_stdout = sys.stdout
-        self.old_stderr = sys.stderr
-
-        os.dup2(self.outnull_file.fileno(), self.old_stdout_fileno_undup)
-        os.dup2(self.errnull_file.fileno(), self.old_stderr_fileno_undup)
-
-        sys.stdout = self.outnull_file
-        sys.stderr = self.errnull_file
-        return self
+        self._disabled = False
+        try:
+            self.outnull_file = open(os.devnull, 'w')
+            self.errnull_file = open(os.devnull, 'w')
+
+            self.old_stdout_fileno_undup = sys.stdout.fileno()
+            self.old_stderr_fileno_undup = sys.stderr.fileno()
+
+            self.old_stdout_fileno = os.dup(sys.stdout.fileno())
+            self.old_stderr_fileno = os.dup(sys.stderr.fileno())
+
+            self.old_stdout = sys.stdout
+            self.old_stderr = sys.stderr
+
+            os.dup2(self.outnull_file.fileno(), self.old_stdout_fileno_undup)
+            os.dup2(self.errnull_file.fileno(), self.old_stderr_fileno_undup)
+
+            sys.stdout = self.outnull_file
+            sys.stderr = self.errnull_file
+        except (AttributeError, io.UnsupportedOperation, OSError):
+            # Fall back gracefully when fileno() is unavailable
+            self._disabled = True
+        return self
 
     def __exit__(self, *_):
-        sys.stdout = self.old_stdout
-        sys.stderr = self.old_stderr
-
-        os.dup2(self.old_stdout_fileno, self.old_stdout_fileno_undup)
-        os.dup2(self.old_stderr_fileno, self.old_stderr_fileno_undup)
-
-        os.close(self.old_stdout_fileno)
-        os.close(self.old_stderr_fileno)
-
-        self.outnull_file.close()
-        self.errnull_file.close()
+        if getattr(self, "_disabled", False):
+            return
+        sys.stdout = self.old_stdout
+        sys.stderr = self.old_stderr
+        os.dup2(self.old_stdout_fileno, self.old_stdout_fileno_undup)
+        os.dup2(self.old_stderr_fileno, self.old_stderr_fileno_undup)
+        os.close(self.old_stdout_fileno)
+        os.close(self.old_stderr_fileno)
+        self.outnull_file.close()
+        self.errnull_file.close()

200-209: Support gradient clearing in Kineto path (parity with event timing).

Avoid gradient accumulation for backward benches.

-        with profiler:
+        with profiler:
             for _i in range(2):
                 for _ in range(n_repeat):
                     cache.zero_()
+                    # clear grads if provided (backward compatibility with event path)
+                    # grad_to_none: Optional[List[torch.Tensor]]
+                    try:
+                        grad_list = grad_to_none  # if parameter is added
+                    except NameError:
+                        grad_list = None
+                    if grad_list is not None:
+                        for x in grad_list:
+                            x.grad = None
                     fn()

Note: If you adopt this, add grad_to_none: Optional[List[torch.Tensor]] = None to the function signature and forward it from the caller.


221-229: Remove unused noqa and harden kernel name validation.

  • The # noqa: SIM101 is flagged as unused by Ruff. Remove it.
  • Consider accepting Sequence[str] instead of only tuple for ergonomics (optional).
-    assert isinstance(kernel_names, str) or isinstance(kernel_names, tuple)  # noqa: SIM101
+    assert isinstance(kernel_names, str) or isinstance(kernel_names, tuple)

236-251: Expand unit parsing to include seconds and nanoseconds.

Covers more profiler formats and avoids silent mis-scaling.

-    units = {'ms': 1e3, 'us': 1e6}
+    units = {'s': 1, 'ms': 1e3, 'us': 1e6, 'ns': 1e9}
examples/gemm/example_gemm.py (1)

54-59: Suppress profiler chatter in example output.

Default Kineto may print noisy lines in some setups. Suppress for a cleaner example.

-    latency = profiler.do_bench(kernel_names="gemm_kernel")
+    latency = profiler.do_bench(kernel_names="gemm_kernel", suppress_kineto_output=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ccac4f and e5e77eb.

📒 Files selected for processing (3)
  • examples/gemm/example_gemm.py (1 hunks)
  • tilelang/profiler/__init__.py (4 hunks)
  • tilelang/profiler/bench.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
examples/gemm/example_gemm.py (2)
tilelang/jit/kernel.py (1)
  • get_profiler (360-376)
tilelang/profiler/__init__.py (1)
  • do_bench (218-296)
tilelang/profiler/__init__.py (1)
tilelang/profiler/bench.py (2)
  • do_bench_event (9-102)
  • do_bench_kineto (151-253)
tilelang/profiler/bench.py (1)
tilelang/env.py (1)
  • get (136-139)
🪛 Ruff (0.13.2)
tilelang/profiler/__init__.py

230-230: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

tilelang/profiler/bench.py

160-160: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


221-221: Unused noqa directive (non-enabled: SIM101)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-amd
  • GitHub Check: bot-task

Comment on lines 150 to 164
# from https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py
def do_bench_kineto(
    fn: Callable,
    kernel_names,
    warmup: float = 25,
    rep: float = 100,
    _n_warmup: int = 0,
    _n_repeat: int = 0,
    fast_flush: bool = True,
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
    trace_path: str = None,
    suppress_kineto_output: bool = False,
    with_multiple_kernels: bool = False,
) -> Union[float, List[float]]:
    """Benchmarks the runtime of a PyTorch function using Kineto."""

🛠️ Refactor suggestion | 🟠 Major

Correct Kineto API typing and Optional usage (RUF013).

  • kernel_names supports str or tuple; return can be float or tuple.
  • trace_path should be Optional[str].
-from typing import Callable, List, Literal, Optional, Union
+from typing import Callable, List, Literal, Optional, Union, Tuple
@@
-def do_bench_kineto(
-    fn: Callable,
-    kernel_names,
+def do_bench_kineto(
+    fn: Callable,
+    kernel_names: Union[str, Tuple[str, ...]],
@@
-    return_mode: Literal["min", "max", "mean", "median"] = "mean",
-    trace_path: str = None,
+    return_mode: Literal["min", "max", "mean", "median"] = "mean",
+    trace_path: Optional[str] = None,
     suppress_kineto_output: bool = False,
     with_multiple_kernels: bool = False,
-) -> Union[float, List[float]]:
+) -> Union[float, Tuple[float, ...]]:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# from https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py
def do_bench_kineto(
    fn: Callable,
    kernel_names,
    warmup: float = 25,
    rep: float = 100,
    _n_warmup: int = 0,
    _n_repeat: int = 0,
    fast_flush: bool = True,
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
    trace_path: str = None,
    suppress_kineto_output: bool = False,
    with_multiple_kernels: bool = False,
) -> Union[float, List[float]]:
    """Benchmarks the runtime of a PyTorch function using Kineto."""

from typing import Callable, List, Literal, Optional, Union, Tuple
# from https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py
def do_bench_kineto(
    fn: Callable,
    kernel_names: Union[str, Tuple[str, ...]],
    warmup: float = 25,
    rep: float = 100,
    _n_warmup: int = 0,
    _n_repeat: int = 0,
    fast_flush: bool = True,
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
    trace_path: Optional[str] = None,
    suppress_kineto_output: bool = False,
    with_multiple_kernels: bool = False,
) -> Union[float, Tuple[float, ...]]:
    """Benchmarks the runtime of a PyTorch function using Kineto."""
🧰 Tools
🪛 Ruff (0.13.2)

160-160: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

🤖 Prompt for AI Agents
In tilelang/profiler/bench.py around lines 150 to 164, the function signature
uses overly narrow typing: update kernel_names to accept either a string or a
tuple of strings (e.g., Union[str, Tuple[str, ...]]), change the return
annotation to reflect that it may return a float or a tuple of floats (e.g.,
Union[float, Tuple[float, ...], List[float]]), and make trace_path
Optional[str]; also add any missing typing imports (Optional, Tuple, Union) at
top of the file. Ensure no runtime behavior changes, only update the type hints
and imports.

@LeiWang1999
Member

It would be better if we could combine do_bench_kineto and do_bench_event into a single do_bench API.
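A rough sketch of what such a unified entry point could look like, dispatching on a backend argument to the existing helpers; do_bench_event and do_bench_kineto are the helpers from tilelang/profiler/bench.py, and the exact signatures here are illustrative rather than the final implementation:

from typing import Callable, List, Literal, Optional, Union


def do_bench(
    fn: Callable,
    warmup: float = 25,
    rep: float = 100,
    quantiles: Optional[List[float]] = None,
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
    backend: Literal["event", "cupti"] = "event",
    **backend_kwargs,
) -> Union[float, List[float]]:
    # One public API; `backend` selects the timing strategy.
    if backend == "event":
        # CUDA-event timing: supports quantiles and return_mode aggregation.
        return do_bench_event(
            fn, warmup=warmup, rep=rep, quantiles=quantiles,
            return_mode=return_mode, **backend_kwargs)
    if backend == "cupti":
        # Kineto/CUPTI timing: per-kernel device time (kernel_names etc. via kwargs).
        return do_bench_kineto(
            fn, warmup=warmup, rep=rep, return_mode=return_mode, **backend_kwargs)
    raise ValueError(f"Unknown profiler backend: {backend}")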

@Cunxiao2002 Cunxiao2002 closed this Oct 3, 2025
@Cunxiao2002 Cunxiao2002 reopened this Oct 3, 2025
@Cunxiao2002 Cunxiao2002 marked this pull request as draft October 3, 2025 17:38
@Cunxiao2002 Cunxiao2002 marked this pull request as ready for review October 4, 2025 09:29
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tilelang/profiler/bench.py (2)

159-160: Fix Optional typing for trace_path.

trace_path defaults to None, so the annotation must be Optional[str] to satisfy PEP 484 and Ruff (RUF013). Update the signature accordingly.


270-270: Restore trailing newline.

CI is failing with Ruff W292 because the file lacks a trailing newline. Please add the terminal newline so formatting passes.

tilelang/profiler/__init__.py (1)

230-231: Align trace_path annotation with default.

trace_path defaults to None; annotate it as Optional[str] to silence Ruff (RUF013) and accurately describe the API.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5e77eb and 17bdaba.

📒 Files selected for processing (4)
  • examples/gemm/example_gemm.py (1 hunks)
  • tilelang/jit/adapter/cython/adapter.py (1 hunks)
  • tilelang/profiler/__init__.py (3 hunks)
  • tilelang/profiler/bench.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tilelang/jit/adapter/cython/adapter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/gemm/example_gemm.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/profiler/bench.py (1)
tilelang/profiler/__init__.py (1)
  • do_bench (218-285)
🪛 GitHub Actions: CI Test on AMD
tilelang/profiler/bench.py

[error] 270-270: Ruff: W292 No newline at end of file. 1 fixable with the --fix option.

🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

159-159: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


247-247: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

tilelang/profiler/__init__.py

227-227: Unused method argument: grad_to_none

(ARG002)


228-228: Unused method argument: quantiles

(ARG002)


230-230: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/profiler/bench.py (1)

159-159: Fix implicit Optional type hint.

The trace_path parameter should be explicitly typed as Optional[str] to comply with PEP 484.

Apply this diff:

-    trace_path: str = None,
+    trace_path: Optional[str] = None,
🧹 Nitpick comments (2)
tilelang/profiler/bench.py (2)

151-162: Enhance docstring to document Kineto-specific behavior.

The docstring is minimal. Consider documenting the Kineto/CUPTI-specific behavior, the purpose of trace_path, suppress_kineto_output, and the handling of the DG_NSYS_PROFILING environment variable.


236-270: Add docstring to document dispatcher behavior.

The do_bench function now acts as a dispatcher but lacks a docstring. Consider adding documentation that explains:

  • The backend parameter and its effect on profiling method
  • Which parameters are specific to each backend (e.g., grad_to_none and quantiles only apply to "event" backend, trace_path and suppress_kineto_output only apply to "cupti" backend)
  • The return value and how it differs between backends

Example docstring structure:

def do_bench(
    fn: Callable,
    warmup: float = 25,
    rep: float = 100,
    _n_warmup: int = 0,
    _n_repeat: int = 0,
    grad_to_none: Optional[List[torch.Tensor]] = None,
    quantiles: Optional[List[float]] = None,
    fast_flush: bool = True,
    backend: Literal["event", "cupti"] = "event",
    return_mode: Literal["min", "max", "mean", "median"] = "mean",
    trace_path: Optional[str] = None,
    suppress_kineto_output: bool = False,
) -> Union[float, List[float]]:
    """Benchmarks the runtime of a PyTorch function.
    
    Dispatches to either event-based or Kineto/CUPTI-based profiling
    depending on the backend parameter.
    
    Args:
        fn: Function to benchmark
        warmup: Target warmup time in milliseconds
        rep: Target number of repetitions
        _n_warmup: Override for number of warmup iterations
        _n_repeat: Override for number of timing iterations
        grad_to_none: Tensors whose gradients should be cleared (event backend only)
        quantiles: Performance percentiles to compute (event backend only)
        fast_flush: Whether to use faster L2 cache flushing
        backend: Profiling backend ("event" for CUDA events, "cupti" for Kineto/CUPTI)
        return_mode: How to aggregate timing results
        trace_path: Path to save chrome trace (cupti backend only)
        suppress_kineto_output: Whether to suppress Kineto output (cupti backend only)
        
    Returns:
        Aggregated runtime in milliseconds, or list of quantiles if requested
    """
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17bdaba and 5b721cc.

📒 Files selected for processing (1)
  • tilelang/profiler/bench.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/profiler/bench.py (1)
tilelang/profiler/__init__.py (1)
  • do_bench (218-285)
🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

159-159: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


247-247: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-amd
  • GitHub Check: format-check
  • GitHub Check: bot-task
🔇 Additional comments (4)
tilelang/profiler/bench.py (4)

5-6: LGTM!

The new imports are necessary for the output suppression functionality.


9-102: LGTM!

The refactoring to make this an internal function is appropriate. The enhanced docstring provides clear documentation of the function's behavior and parameters.


105-147: LGTM!

The context managers are well-implemented. The suppress_stdout_stderr class properly manages file descriptors with duplication and restoration. The empty_suppress no-op context manager is a clean pattern for conditional suppression.


215-216: Clarify NSYS profiling return value. Returning a hardcoded 1 when using_nsys is enabled is ambiguous; document this sentinel value or replace it with a meaningful indicator (e.g., float('nan')) or raise NotImplementedError.
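If the sentinel route is taken, a small sketch of the guard, written as an excerpt in the style of the surrounding do_bench code; the choice between NaN and an exception, and the message text, are illustrative:

    if using_nsys:
        # Timing is collected externally by nsys, so no in-process latency exists.
        # Returning float('nan') would keep the float return type; raising makes the
        # limitation explicit instead of handing back a misleading hardcoded 1.
        raise NotImplementedError(
            "backend='cupti' under nsys profiling does not produce an in-process latency")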

fast_flush: bool = True,
backend: Literal["event", "cupti"] = "event",
return_mode: Literal["min", "max", "mean", "median"] = "mean",
trace_path: str = None,

⚠️ Potential issue | 🟡 Minor

Fix implicit Optional type hint.

The trace_path parameter should be explicitly typed as Optional[str] to comply with PEP 484.

Apply this diff:

-    trace_path: str = None,
+    trace_path: Optional[str] = None,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
trace_path: str = None,
trace_path: Optional[str] = None,
🧰 Tools
🪛 Ruff (0.13.3)

247-247: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

🤖 Prompt for AI Agents
In tilelang/profiler/bench.py around line 247, the parameter trace_path is
currently declared as `trace_path: str = None`, which creates an implicit
Optional; change its annotation to `trace_path: Optional[str] = None` and ensure
`from typing import Optional` is imported at the top of the module (add the
import if missing). This makes the optional nature explicit and compliant with
PEP 484.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tilelang/profiler/bench.py (2)

9-17: Consider using contextlib.nullcontext for the no-op case.

While the custom empty_suppress implementation is correct, Python's standard library provides contextlib.nullcontext for this exact use case, which would eliminate the need for custom code.

Apply this diff to use the standard library:

-# from https://github.com/deepseek-ai/DeepGEMM/blob/main/deep_gemm/testing/bench.py
-class empty_suppress:
-
-    def __enter__(self):
-        return self
-
-    def __exit__(self, *_):
-        pass
+from contextlib import nullcontext as empty_suppress

Then update lines 157 and 163 to use it:

-        suppress = suppress_stdout_stderr if not using_nsys else empty_suppress
+        suppress = suppress_stdout_stderr if not using_nsys else empty_suppress
         with suppress():
             schedule = torch.profiler.schedule(
-                wait=1, warmup=0, active=1, repeat=1) if not using_nsys else None
+                wait=1, warmup=0, active=1, repeat=1) if not using_nsys else None
             profiler = torch.profiler.profile(
                 activities=[torch.profiler.ProfilerActivity.CUDA],
-                schedule=schedule) if not using_nsys else empty_suppress()
+                schedule=schedule) if not using_nsys else empty_suppress()

189-190: Consider extracting error message to comply with TRY003.

While the error handling is functionally correct, the static analysis tool flags the long error message inline. This is a minor style issue.

Apply this diff to comply with TRY003:

+    UNKNOWN_BACKEND_MSG = "Unknown profiler backend: {}"
+
     else:
-        raise ValueError(f"Unknown profiler backend: {backend}")
+        raise ValueError(UNKNOWN_BACKEND_MSG.format(backend))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b721cc and 4cfa04d.

📒 Files selected for processing (2)
  • tilelang/profiler/__init__.py (3 hunks)
  • tilelang/profiler/bench.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/profiler/bench.py (1)
tilelang/env.py (1)
  • get (136-139)
🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

190-190: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check
🔇 Additional comments (7)
tilelang/profiler/bench.py (4)

5-6: LGTM!

The os and sys imports are necessary for the stdout/stderr suppression utilities introduced in this PR.


19-52: LGTM!

The suppress_stdout_stderr implementation correctly redirects stdout and stderr to /dev/null and properly restores the original state. File descriptor management and resource cleanup are handled correctly.


64-64: LGTM!

The backend parameter addition with Literal["event", "cupti"] provides clear type safety for backend selection.


126-153: LGTM!

The event backend preserves the original timing logic with CUDA events. The implementation correctly handles gradient clearing, L2 cache flushing, and quantile computation.

tilelang/profiler/__init__.py (3)

3-3: LGTM!

The Literal import addition supports the new backend and return_mode parameter typing.


226-229: LGTM!

The new parameter additions (backend, grad_to_none, quantiles, return_mode) are properly typed and align with the updated do_bench signature in bench.py.


252-262: LGTM! Past review feedback addressed.

All parameters (grad_to_none, quantiles, backend, return_mode) are now properly forwarded to the underlying do_bench call, resolving the previous review concern about missing parameter propagation.

n_repeat: int = 1,
input_tensors: List[torch.Tensor] = None,
backend: Literal["event", "cupti"] = "event",
grad_to_none: Optional[List[torch.Tensor]] = None,
Member

Do we really need the grad_to_none parameter?
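For context, grad_to_none mainly matters when the benchmarked function runs a backward pass: clearing the listed gradients between timed iterations keeps every run identical instead of accumulating into .grad. A small illustrative sketch; the layer, the loss, and the commented do_bench call are examples, not code from this PR:

import torch

layer = torch.nn.Linear(1024, 1024).cuda().half()
x = torch.randn(64, 1024, device="cuda", dtype=torch.half)


def train_step():
    loss = layer(x).square().mean()
    loss.backward()


# Without grad_to_none, each timed iteration accumulates into layer.weight.grad,
# adding an in-place add to later iterations and skewing the measurement.
# latency = do_bench(train_step, grad_to_none=[layer.weight, layer.bias], backend="event")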

        return getattr(torch, return_mode)(times).item()

    elif backend == "cupti":
        using_nsys = int(os.environ.get('DG_NSYS_PROFILING', 0))
Member

DG means DeepGEMM; it would be better to remove this env var and always set using_nsys = False.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cfa04d and f0509ab.

📒 Files selected for processing (1)
  • tilelang/profiler/bench.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

175-175: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check

Comment on lines 117 to 172
if backend == "event":
start_event = [torch.cuda.Event(enable_timing=True) for i in range(n_repeat)]
end_event = [torch.cuda.Event(enable_timing=True) for i in range(n_repeat)]
for i in range(n_repeat):
# we don't want `fn` to accumulate gradient values
# if it contains a backward pass. So we clear the
# provided gradients
if grad_to_none is not None:
for x in grad_to_none:
x.grad = None
# we clear the L2 cache before each run
cache.zero_()
# record time of `fn`
start_event[i].record()
fn()
end_event[i].record()
# Record clocks
torch.cuda.synchronize()
times = torch.tensor(
[s.elapsed_time(e) for s, e in zip(start_event, end_event)],
dtype=torch.float,
)
if quantiles is not None:
ret = torch.quantile(times, torch.tensor(quantiles, dtype=torch.float)).tolist()
if len(ret) == 1:
ret = ret[0]
return ret
return getattr(torch, return_mode)(times).item()

elif backend == "cupti":
with suppress_stdout_stderr():
schedule = torch.profiler.schedule(
wait=1, warmup=0, active=1, repeat=1)
profiler = torch.profiler.profile(
activities=[torch.profiler.ProfilerActivity.CUDA],
schedule=schedule)
with profiler:
for _i in range(2):
for _ in range(n_repeat):
cache.zero_()
fn()

profiler.step()

# Return average kernel times
total_cuda_time = 0.0
excluded_time = 0.0
excluded_kernels = "at::native::vectorized_elementwise"
for e in profiler.key_averages():
total_cuda_time += e.self_device_time_total
if excluded_kernels in e.key:
excluded_time += e.self_device_time_total

kernel_time = (total_cuda_time - excluded_time) / n_repeat

return kernel_time * 1e-3

⚠️ Potential issue | 🟠 Major

Honor grad/aggregation options for the CUPTI path.

When backend="cupti", we drop the gradient reset (Line 124), quantile handling (Line 139), and return_mode aggregation (Line 144). Any caller supplying grad_to_none or return_mode="min"/"max"/"median" still gets the CUPTI branch, but those settings are silently ignored, so backward benches will accumulate grads and the returned statistic is always the mean. This diverges from the API contract established by the event backend and will yield wrong results for existing call sites. Please either implement the equivalent handling in the CUPTI branch or reject unsupported combinations up front (e.g. raise with a clear message when grad_to_none/quantiles/non-mean return modes are requested).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tilelang/profiler/bench.py (2)

155-155: Consider making the excluded kernel configurable.

The hardcoded exclusion of "at::native::vectorized_elementwise" kernels should be documented or made into a parameter. Users may want to:

  • Include these kernels in their measurements
  • Exclude different kernels

Option 1: Document why this specific kernel is excluded:

-        excluded_kernels = "at::native::vectorized_elementwise"
+        # Exclude vectorized_elementwise as it's an overhead kernel not part of the measured computation
+        excluded_kernels = "at::native::vectorized_elementwise"

Option 2: Add a parameter for excluded kernels:

def do_bench(
    fn: Callable,
    ...
    backend: Literal["event", "cupti"] = "event",
    excluded_kernels: Optional[List[str]] = None,
    ...
):

166-166: Minor: Consider extracting the error message.

Static analysis suggests avoiding long messages inside exception constructors (TRY003). While this is a minor style issue, you could extract it to a constant if the message grows.

+UNKNOWN_BACKEND_ERROR = "Unknown profiler backend: {backend}. Supported backends: 'event', 'cupti'"
 else:
-    raise ValueError(f"Unknown profiler backend: {backend}")
+    raise ValueError(UNKNOWN_BACKEND_ERROR.format(backend=backend))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0509ab and 7e64b8e.

📒 Files selected for processing (2)
  • tilelang/profiler/__init__.py (3 hunks)
  • tilelang/profiler/bench.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

166-166: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tilelang/profiler/bench.py (1)

9-43: LGTM on the suppression utility.

The suppress_stdout_stderr context manager correctly duplicates file descriptors, redirects stdout/stderr to /dev/null, and restores them on exit. This is a clean approach for silencing profiler output.

tilelang/profiler/__init__.py (1)

226-260: LGTM! Parameters are correctly propagated.

The new backend, quantiles, and return_mode parameters are properly:

  1. Added to the function signature with appropriate types
  2. Propagated to the underlying do_bench call
  3. Aligned with the implementation in bench.py

The parameter forwarding is complete and correct.

Comment on lines 139 to 163
elif backend == "cupti":
with suppress_stdout_stderr():
schedule = torch.profiler.schedule(wait=1, warmup=0, active=1, repeat=1)
profiler = torch.profiler.profile(
activities=[torch.profiler.ProfilerActivity.CUDA], schedule=schedule)
with profiler:
for _i in range(2):
for _ in range(n_repeat):
cache.zero_()
fn()

profiler.step()

# Return average kernel times
total_cuda_time = 0.0
excluded_time = 0.0
excluded_kernels = "at::native::vectorized_elementwise"
for e in profiler.key_averages():
total_cuda_time += e.self_device_time_total
if excluded_kernels in e.key:
excluded_time += e.self_device_time_total

kernel_time = (total_cuda_time - excluded_time) / n_repeat

return kernel_time * 1e-3

⚠️ Potential issue | 🔴 Critical

Critical: Honor ignored parameters and make the unit conversion explicit in the CUPTI backend.

The CUPTI backend has several issues:

  1. Implicit unit conversion (Line 163): self_device_time_total is reported in microseconds, while the event backend returns milliseconds via elapsed_time. kernel_time * 1e-3 performs the required microseconds-to-milliseconds conversion, but nothing documents it, so it is easy to misread as a scaling bug.
  2. Ignored quantiles parameter: Callers requesting specific quantiles will get the mean instead, breaking the API contract.
  3. Ignored return_mode parameter: Callers requesting "min", "max", or "median" will always get "mean", again breaking the contract.
  4. Undocumented loop (Line 145): The outer loop running twice corresponds to the profiler schedule's wait and active phases but lacks explanation.

Make the unit conversion explicit:

-        return kernel_time * 1e-3
+        return kernel_time * 1e-3  # self_device_time_total is in microseconds; convert to milliseconds

Honor quantiles and return_mode or reject them up front:

Option 1: Validate and reject unsupported combinations at the start of the function:

if backend == "cupti":
    if quantiles is not None:
        raise ValueError("quantiles parameter is not supported with cupti backend")
    if return_mode != "mean":
        raise ValueError(f"cupti backend only supports return_mode='mean', got '{return_mode}'")

Option 2: Implement support by collecting per-iteration times in the CUPTI path (more complex but provides feature parity).

Document the profiler schedule loop:

         with profiler:
+            # Run twice: once for profiler wait phase, once for active measurement
             for _i in range(2):

Based on learnings: The reviewer asked if do_bench_kineto and do_bench_event could be combined into a single API. This PR achieves that by adding a backend parameter to do_bench.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/profiler/bench.py (1)

57-129: API contract violation: CUPTI backend silently ignores quantiles and return_mode parameters.

When backend="cupti" is selected, the function accepts quantiles and return_mode parameters but silently ignores them (Line 126 calls _bench_with_cupti which doesn't accept these parameters). This breaks the API contract:

  • Callers requesting quantiles=[0.5, 0.95] will get a single mean value
  • Callers requesting return_mode="min" will get the mean instead

This creates silent behavioral divergence from the event backend.

Solution 1 (recommended): Validate parameters up front and reject unsupported combinations:

     assert return_mode in ["min", "max", "mean", "median"], \
         f"Invalid return_mode: {return_mode}"
+    
+    # Validate CUPTI backend constraints
+    if backend == "cupti":
+        if quantiles is not None:
+            raise ValueError("quantiles parameter is not supported with backend='cupti'")
+        if return_mode != "mean":
+            raise ValueError(f"backend='cupti' only supports return_mode='mean', got '{return_mode}'")

Solution 2 (ideal but more complex): Implement full parameter support in _bench_with_cupti by collecting per-iteration times and computing quantiles/aggregations.

Based on learnings: This addresses the past review comment about honoring quantiles and return_mode for the CUPTI path.

🧹 Nitpick comments (1)
tilelang/profiler/bench.py (1)

180-184: Document the profiler schedule loop.

The outer loop running twice corresponds to the profiler's wait (1) and active (1) phases from the schedule, but this isn't documented.

Add a comment:

         with profiler:
+            # Loop twice: once for wait phase, once for active measurement phase
             for _ in range(2):
                 for _ in range(n_repeat):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e64b8e and 9f55d3b.

📒 Files selected for processing (1)
  • tilelang/profiler/bench.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/profiler/bench.py (1)
tilelang/profiler/__init__.py (1)
  • do_bench (218-281)
🪛 Ruff (0.13.3)
tilelang/profiler/bench.py

128-128: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-amd
  • GitHub Check: bot-task
🔇 Additional comments (3)
tilelang/profiler/bench.py (3)

10-54: LGTM! Clean implementation of stdout/stderr suppression.

The context manager correctly:

  • Saves original file descriptors using os.dup
  • Redirects streams to /dev/null
  • Restores everything in __exit__

This is a standard pattern for fd redirection.


131-163: LGTM! Solid CUDA events implementation.

The function correctly:

  • Creates timing events per iteration
  • Clears L2 cache between runs
  • Synchronizes before collecting measurements
  • Handles both quantile and aggregation modes

189-194: Make excluded kernel pattern configurable and use regex

  • Replace the hardcoded "at::native::vectorized_elementwise" with a user-provided list of regex patterns (e.g. via a CLI flag --exclude-kernel-regex, defaulting to r"vectorized_elementwise")
  • Match only on the substring vectorized_elementwise (or other future names) rather than the full namespace to tolerate PyTorch renames
  • Optionally document how to verify the regex against profiler.key_averages() output across PyTorch releases
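A sketch of the configurable exclusion; excluded_kernel_regexes is a hypothetical parameter name, while e.key and e.self_device_time_total are the fields already used in the CUPTI branch:

import re
from typing import List, Optional


def _included_device_time_us(events, excluded_kernel_regexes: Optional[List[str]] = None) -> float:
    # `events` is the result of profiler.key_averages(); returns the summed
    # self device time (microseconds) of kernels that match no exclusion pattern.
    patterns = [re.compile(p) for p in (excluded_kernel_regexes or [r"vectorized_elementwise"])]
    total_us = 0.0
    for e in events:
        if any(p.search(e.key) for p in patterns):
            continue  # e.g. the elementwise fill used for the L2-flush buffer
        total_us += e.self_device_time_total
    return total_us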

@Cunxiao2002
Contributor Author

Some benchmarks comparing the two profiler backends:

  1. example/gemm/example_gemm.py
     • event Latency: 0.027775999158620834ms
     • cupti Latency: 0.023777000000000045ms
  2. example/flash_attention/example_mha_fwd_bhsd.py
     • event Latency: 0.01020800042897462845ms
     • cupti Latency: 0.00630400000000008731ms
  3. example/grouped_gemm/example_grouped_gemm_fwd.py
     • event Latency: 0.4819520115852356ms
     • cupti Latency: 0.4754229999999989ms

LeiWang1999 previously approved these changes Oct 6, 2025
@LeiWang1999 LeiWang1999 merged commit 91d5ef5 into tile-ai:main Oct 6, 2025
4 of 5 checks passed
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f55d3b and aa70ebd.

📒 Files selected for processing (1)
  • testing/python/profiler/test_tilelang_profiler.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/profiler/test_tilelang_profiler.py (7)
src/tl_templates/cuda/reduce.h (1)
  • T (75-147)
tilelang/jit/__init__.py (1)
  • jit (240-313)
tilelang/jit/kernel.py (2)
  • out_idx (446-447)
  • get_profiler (360-376)
tilelang/language/allocate.py (2)
  • alloc_shared (21-36)
  • alloc_fragment (53-64)
tilelang/language/fill.py (1)
  • clear (24-48)
tilelang/language/pipeline.py (1)
  • Pipelined (9-46)
tilelang/language/copy.py (1)
  • copy (84-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-amd
  • GitHub Check: bot-task

Comment on lines +35 to +50
    a = torch.randn(1024, 1024).cuda().half()
    b = torch.randn(1024, 1024).cuda().half()

    c = kernel(a, b)
    ref_c = a @ b
    torch.testing.assert_close(c, ref_c, rtol=1e-2, atol=1e-2)

    # benchmark
    profiler = kernel.get_profiler()

    # use cupti backend
    cupti_latency = profiler.do_bench(backend="cupti")

    # use event backend
    event_latency = profiler.do_bench(backend="event")
    print(f"cupti Latency: {cupti_latency}ms")

⚠️ Potential issue | 🟠 Major

Guard the test when CUDA is unavailable.

torch.randn(...).cuda() will raise immediately on CPU-only hosts, so CI without GPUs will fail before exercising the profiler. Add a CUDA-availability check (e.g., skip via pytest.skip / pytest.mark.skipif) before moving tensors to CUDA.

Apply this diff (pytest is commonly available in the test suite):

+    import pytest
+    if not torch.cuda.is_available():
+        pytest.skip("Profiler test requires CUDA.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    a = torch.randn(1024, 1024).cuda().half()
    b = torch.randn(1024, 1024).cuda().half()
    c = kernel(a, b)
    ref_c = a @ b
    torch.testing.assert_close(c, ref_c, rtol=1e-2, atol=1e-2)
    # benchmark
    profiler = kernel.get_profiler()
    # use cupti backend
    cupti_latency = profiler.do_bench(backend="cupti")
    # use event backend
    event_latency = profiler.do_bench(backend="event")
    print(f"cupti Latency: {cupti_latency}ms")

    import pytest
    if not torch.cuda.is_available():
        pytest.skip("Profiler test requires CUDA.")
    a = torch.randn(1024, 1024).cuda().half()
    b = torch.randn(1024, 1024).cuda().half()
    c = kernel(a, b)
    ref_c = a @ b
    torch.testing.assert_close(c, ref_c, rtol=1e-2, atol=1e-2)
    # benchmark
    profiler = kernel.get_profiler()
    # use cupti backend
    cupti_latency = profiler.do_bench(backend="cupti")
    # use event backend
    event_latency = profiler.do_bench(backend="event")
    print(f"cupti Latency: {cupti_latency}ms")
🤖 Prompt for AI Agents
testing/python/profiler/test_tilelang_profiler.py lines 35-50: the test
unconditionally calls .cuda() which fails on CPU-only hosts; add a guard at the
top of this block that checks torch.cuda.is_available() and skips the test when
False (use pytest.skip or @pytest.mark.skipif) before creating tensors or
calling kernel.get_profiler(), so no tensors are moved to CUDA on CI without
GPUs.
