[Backend] Add metal backend #799

oraluben · 2025-09-10T09:44:05Z

Add metal backend via tvm's metal codegen and torch's torch.mps.compile_shader, and create a new torch backend, who only supports MPS for now. Some tasks are left for future works.

Performance numbers:

PyTorch's MPS backend uses matrixMultiplicationWithPrimaryTensor by default, which is faster than default implementation.
For pytorch's native MPS mm impl (enable with PYTORCH_MPS_PREFER_METAL=1), we can observe significant speedup.

In general, tilelang is 2-3x faster than torch native impl and 2-3x slower than MPSGraph impl. tilelang impl could be optimized via implementating T.gemm by matrixMultiplicationWithPrimaryTensor.

Outliners in the table below seem to be error and disappear after rerun.

config                     tl_run1    tl_run2        torch    torch_native
----------------------  ----------  ---------  -----------  --------------
1024_1024_1024,float32    0.203224   0.119603    0.0732187        0.454423
1024_1024_1024,float16    0.118059   0.171631    0.0509571        0.329795
1024_1024_2048,float32    0.21623    0.277052    0.0956315        1.01536
1024_1024_2048,float16    0.163326   0.245721    0.0952914        0.738723
1024_1024_4096,float32    0.364069   0.58848     0.18605          2.02241
1024_1024_4096,float16    0.321528   0.519458    0.164556         1.51384
1024_2048_1024,float32    0.17366    0.236656    0.0963473        0.991771
1024_2048_1024,float16    0.174649   0.251009    0.0902327        0.663332
1024_2048_2048,float32    0.350103   0.55103     0.177765         1.99991
1024_2048_2048,float16    0.322436   0.529719    0.160975         1.3848
1024_2048_4096,float32    0.711705   1.1608      0.3412           3.81918
1024_2048_4096,float16    0.649519   1.15448     0.308582         2.75274
1024_4096_1024,float32    0.348693   0.502921    0.17556          1.92968
1024_4096_1024,float16    0.320613   0.553743    0.162928         1.35127
1024_4096_2048,float32    0.705229   1.10684     0.335839         3.74767
1024_4096_2048,float16    0.634583   1.01857     0.305798         2.6813
1024_4096_4096,float32    1.42189    2.32234     0.657488         7.39324
1024_4096_4096,float16    1.23835    2.05512     0.589203         5.62581
2048_1024_1024,float32    0.172703   0.229385    0.0933813        0.96074
2048_1024_1024,float16    0.23833    0.337618    0.0884824        0.637932
2048_1024_2048,float32    0.349552   0.551636    0.174772         2.06767
2048_1024_2048,float16    0.309059   0.555771    0.159407         1.44814
2048_1024_4096,float32    0.705804   1.17274     0.341789         3.78212
2048_1024_4096,float16    0.623397   1.36904     0.30442          2.81867
2048_2048_1024,float32    0.340474   0.526456    0.176063         2.03803
2048_2048_1024,float16    0.474886   0.787487    0.159341         1.40929
2048_2048_2048,float32    0.702884   1.19228     0.347602       264.746
2048_2048_2048,float16    0.617617   1.03019     0.316195         2.74656
2048_2048_4096,float32    1.42331    2.59355     0.673089         7.18734
2048_2048_4096,float16    1.25659    2.48737     0.587454         5.39229
2048_4096_1024,float32    0.698205   1.09426     0.342761         3.74332
2048_4096_1024,float16    0.610077   1.15637     0.302213         3.20815
2048_4096_2048,float32    1.41329    2.46777     0.670037         7.25803
2048_4096_2048,float16    1.39938    2.26266     0.5874           5.24247
2048_4096_4096,float32    2.96685    3.01786     1.3243         676.942
2048_4096_4096,float16    2.55003    2.4255      1.16313          6.98329
4096_1024_1024,float32    0.346517   0.346101  267.917            1.24304
4096_1024_1024,float16    0.301433   0.302083    0.157588         0.881915
4096_1024_2048,float32    0.921969   0.691183    0.330206         2.44251
4096_1024_2048,float16    1.25368    0.609252    0.613118         1.76924
4096_1024_4096,float32    2.56901    1.37706     1.2552           4.88777
4096_1024_4096,float16    2.15145    1.21733     1.08383          3.47484
4096_2048_1024,float32    1.62391    0.675179    0.539141         2.43388
4096_2048_1024,float16    1.08711    0.599315    0.604424         1.7381
4096_2048_2048,float32    2.83558    1.36514     1.05902          4.84539
4096_2048_2048,float16    4.39186    1.21936     1.17894          3.50558
4096_2048_4096,float32   13.5588     2.83373     4.48778          9.69309
4096_2048_4096,float16    4.50082    2.42589     1.63996          6.96601
4096_4096_1024,float32    2.2144     1.36068     1.12703          4.85186
4096_4096_1024,float16  209.909      1.20646     0.910777         3.47439
4096_4096_2048,float32    4.83127    2.76113     2.04769          9.67094
4096_4096_2048,float16    4.50811    2.40558     1.90847          6.92157
4096_4096_4096,float32   10.6532     5.53237     4.63781         19.3358
4096_4096_4096,float16    8.64457    4.83226     4.26353         13.8529

script: https://gist.github.com/oraluben/89e54b0b687a1fa483c38429442a6d1e

TODOs:

Fix hard-coded torch.mps device and unify cuda with torch.accelerator (or we should use cuda by default and do this later?)
Benchmark gemm against pytorch/mlx (need to compile a patched version of pytorch due to a bug)
Fix bug, current add kernel failed when size > 1024 (invalid launch issue of wrong #thread)
Use proper metal/metallib backend
Add mac aarch64 cicd

Future work:

Verify quantization support
Support cache by compile with something like torch extension
waiting for [MPS] Extension segfault after #152210 (and >= 2.8) pytorch/pytorch#163721
~~(performance) Dispatch to MPSGraph (e.g. mm) for supported ops for better performance~~
Implement T.gemm via matrixMultiplicationWithPrimaryTensor

Summary by CodeRabbit

New Features
- Apple Metal (MPS) backend with a Torch-backed execution path and Metal kernel adapter; Metal target exposed broadly.
Improvements
- Automatic Metal detection and macOS-aware build/link behavior.
- Dynamic runtime resolution of prebuilt TVM libraries.
- Centralized device detection, broader backend warnings, and an additional JIT backend option (nvrtc).
Tests
- Added Metal-targeted GEMM verification tests.
Chores
- Metal install script and macOS CI workflow.
Documentation
- README updated announcing Metal support.

coderabbitai · 2025-09-10T09:44:11Z

Walkthrough

Add Apple Metal (MPS) support across build, JIT, adapters, lowering, runtime, tests, CI and packaging; resolve prebuilt TVM library locations at configure time; add device utilities and generalize profiler/autotuner/cache behavior for CUDA and MPS backends.

Changes

Cohort / File(s)	Summary
Build & CI `CMakeLists.txt`, `install_metal.sh`, `.github/workflows/metal_ci.yml`, `setup.py`	Resolve prebuilt TVM libs via `find_library` variables; conditionally collect/link Metal sources; add `install_metal.sh`; add macOS Metal CI workflow; add platform/backend flags and gating in `setup.py`.
New Metal runtime source `src/target/rt_mod_metal.cc`	Add placeholder Metal runtime source file (included when `USE_METAL` enabled).
Targets & device utilities `tilelang/utils/target.py`, `tilelang/utils/device.py`, `tilelang/contrib/cc.py`	Add Metal target detection and availability checks; add `IS_CUDA`/`IS_MPS` and `get_current_device()`; add `is_darwin()` helper.
Lowering / TVM interaction `tilelang/engine/lower.py`	Add Metal branch in `device_codegen_without_compile` calling `tvm.ffi.get_global_func("target.build.metal")`.
JIT core & kernel flow `tilelang/jit/__init__.py`, `tilelang/jit/kernel.py`	Enforce Metal targets require `execution_backend="torch"`; expand allowed backends to include `"nvrtc"` and `"torch"`; route `"torch"` backend to Metal adapter.
JIT adapters & wrappers `tilelang/jit/adapter/__init__.py`, `tilelang/jit/adapter/utils.py`, `tilelang/jit/adapter/wrapper.py`, `tilelang/jit/adapter/cython/adapter.py`, `tilelang/jit/adapter/torch/__init__.py`, `tilelang/jit/adapter/torch/metal.py`	Add `is_metal_target`; implement `MetalKernelAdapter` and `TLMetalSourceWrapper`; add macOS linker flag selection; route cython adapter device to MPS; expose Metal adapter via torch package.
Architecture modules `tilelang/carver/arch/__init__.py`, `tilelang/carver/arch/metal.py`, `tilelang/carver/arch/*`	Add `METAL` arch class and `is_metal_arch`; auto-infer Metal via `torch.mps.is_available()`; standardize `__all__` exports across arch modules.
Testing `testing/python/metal/test_metal_codegen.py`, `tilelang/testing/__init__.py`	Add Metal GEMM tests targeting MPS; explicitly expose `requires_metal` and testing symbols via `__all__`.
Runtime utilities, profiler, autotuner, cache, tensor `tilelang/utils/tensor.py`, `tilelang/utils/device.py`, `tilelang/profiler/bench.py`, `tilelang/autotuner/tuner.py`, `tilelang/cache/kernel_cache.py`	Introduce device helper usage; generalize profiler/timing for CUDA/MPS; ensure per-device compile context in autotuner; broaden kernel_cache disk-skip to include `"torch"`; tensor utilities use `get_current_device()`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant JIT as tilelang.jit
  participant WR as TL Wrapper
  participant LWR as Lowering
  participant TVM as TVM (target.build.metal)
  participant AD as MetalKernelAdapter
  participant MPS as torch.mps runtime

  Dev->>JIT: jit(func, target="metal", execution_backend="torch")
  JIT->>JIT: validate is_metal_target -> require backend=="torch"
  JIT->>WR: wrap(scheduled_ir, source, target=metal)
  WR-->>JIT: TLMetalSourceWrapper
  JIT->>LWR: device_codegen_without_compile(target=metal)
  LWR->>TVM: call target.build.metal(device_mod, target)
  TVM-->>JIT: device_mod + kernel_source
  JIT->>AD: create MetalKernelAdapter(func_or_mod, device_mod, kernel_source)
  Dev->>AD: call kernel(...)
  AD->>AD: on first call: torch.mps.compile_shader(kernel_source)
  AD->>MPS: launch compiled kernel with threads/group_size
  MPS-->>Dev: result
  note over AD: compiled launcher cached for subsequent calls

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

enhancement

Suggested reviewers

tzj-fxz

Poem

I thump my paw on macOS ground,
New Metal paths and kernels found.
Torch compiles shaders, threads align,
MPS hops swift — the carrots shine. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “[Backend] Add metal backend” succinctly and accurately summarizes the primary change of integrating a Metal backend into the project, directly reflecting the main focus of the extensive modifications without introducing extraneous details. It is concise, clear, and enables teammates to quickly understand the core purpose of the pull request.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19fd7dc and 2451073.

📒 Files selected for processing (1)

tilelang/profiler/bench.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tilelang/profiler/bench.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-test-metal
GitHub Check: build-test-amd
GitHub Check: format-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-09-10T09:44:14Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

oraluben · 2025-09-12T03:50:10Z

While there're tasks remain, the main implementation is ready for review @LeiWang1999

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (14)

tilelang/carver/arch/cdna.py (1)

35-38: Sort all (RUF022).

Keep exports alphabetized for consistency and to satisfy Ruff.

Apply:
-__all__ = [
-    'is_cdna_arch',
-    'CDNA',
-]
+__all__ = [
+    'CDNA',
+    'is_cdna_arch',
+]

tilelang/carver/arch/cpu.py (1)

23-26: Sort all (RUF022).

Align with other arch modules.

-__all__ = [
-    'is_cpu_arch',
-    'CPU',
-]
+__all__ = [
+    'CPU',
+    'is_cpu_arch',
+]

tilelang/contrib/cc.py (1)

93-95: Add typing/docstring to is_mac().

Small polish; helps usage elsewhere.

-def is_mac():
-    return platform.system() == 'Darwin'
+def is_mac() -> bool:
+    "True if running on macOS (Darwin)."
+    return platform.system() == "Darwin"

tilelang/carver/arch/metal.py (1)

12-15: Sort all (RUF022).

Match the pattern used in other arch modules.
-__all__ = [
-    'is_metal_arch',
-    'METAL',
-]
+__all__ = [
+    'METAL',
+    'is_metal_arch',
+]

tilelang/carver/arch/cuda.py (1)

150-159: Sort all (RUF022).

Keep exports alphabetized.

-__all__ = [
-    'is_cuda_arch',
-    'is_volta_arch',
-    'is_ampere_arch',
-    'is_ada_arch',
-    'is_hopper_arch',
-    'is_tensorcore_supported_precision',
-    'has_mma_support',
-    "CUDA",
-]
+__all__ = [
+    "CUDA",
+    'has_mma_support',
+    'is_ada_arch',
+    'is_ampere_arch',
+    'is_cuda_arch',
+    'is_hopper_arch',
+    'is_tensorcore_supported_precision',
+    'is_volta_arch',
+]

tilelang/utils/target.py (3)

1-1: Import platform.machine() for fallback.

Some Python builds return empty arch from mac_ver(); fall back to machine().

-from platform import mac_ver
+from platform import mac_ver, machine

46-52: Harden Metal detection.

Use machine() as a fallback when mac_ver()[2] is empty.

-def check_metal_availability() -> bool:
-    mac_release, _, arch = mac_ver()
-    if not mac_release:
-        return False
-    # todo: check torch version?
-    return arch == 'arm64'
+def check_metal_availability() -> bool:
+    mac_release, _, arch = mac_ver()
+    if not mac_release:
+        return False
+    # Fallback for environments where mac_ver() doesn't report machine.
+    arch = arch or machine()
+    return arch == "arm64"

84-87: Error message: say “Metal” (and keep it short per TRY003).

Also consider updating determine_target docstring to mention Metal.

-        else:
-            raise ValueError("No CUDA or HIP or MPS available on this system.")
+        else:
+            raise ValueError("No CUDA, HIP, or Metal available on this system.")

Docstring tweak (outside hunk):

"""
Determine the appropriate target for compilation (CUDA, HIP, Metal, or manual selection).
"""

tilelang/engine/lower.py (1)

184-186: Parity: add Metal branch to device_codegen as well (or explicitly gate).

device_codegen_without_compile supports "metal", but device_codegen lacks a corresponding branch; enabling device compile for Metal will currently raise.

Proposed addition (outside this hunk) in device_codegen():
elif target.kind.name == "metal":
    device_mod = tvm.ffi.get_global_func("target.build.metal")(device_mod, target)

install_metal.sh (1)

20-22: Consider more conservative parallelism settings

Using half the logical CPU cores might be too aggressive for some systems. Consider using a safer default or allowing configuration via environment variable.
-CORES=$(sysctl -n hw.logicalcpu)
-MAKE_JOBS=$(( CORES / 2 ))
-make -j${MAKE_JOBS}
+CORES=$(sysctl -n hw.logicalcpu)
+MAKE_JOBS=${MAKE_JOBS:-$(( CORES / 2 ))}
+echo "Building with ${MAKE_JOBS} parallel jobs"
+make -j${MAKE_JOBS}

tilelang/carver/arch/__init__.py (1)

40-55: Sort all list for better maintainability

The static analysis correctly identified that all should be sorted for consistency and easier maintenance.

 __all__ = [
-    'is_cpu_arch',
-    'is_cuda_arch',
-    'is_volta_arch',
-    'is_ampere_arch',
-    'is_ada_arch',
-    'is_hopper_arch',
-    'is_tensorcore_supported_precision',
-    'has_mma_support',
-    'is_cdna_arch',
-    'is_metal_arch',
+    'CDNA',
+    'CPU',
     'CUDA',
-    'CDNA',
     'METAL',
-    'CPU',
+    'has_mma_support',
+    'is_ada_arch',
+    'is_ampere_arch',
+    'is_cdna_arch',
+    'is_cpu_arch',
+    'is_cuda_arch',
+    'is_hopper_arch',
+    'is_metal_arch',
+    'is_tensorcore_supported_precision',
+    'is_volta_arch',
 ]

tilelang/jit/adapter/wrapper.py (1)

1008-1028: Consider adding docstring and error handling.

The new TLMetalSourceWrapper class lacks documentation and error handling. Consider adding:

A class docstring explaining its purpose
Type hints validation
Error handling for Metal-specific operations

 class TLMetalSourceWrapper(object):
+    """
+    A wrapper class for the TileLang Metal backend.
+    
+    This wrapper handles Metal shader code generation for Apple Silicon GPUs.
+    Unlike CUDA/HIP wrappers, it performs minimal processing as Metal compilation
+    is handled downstream by PyTorch MPS.
+    """
 
     def __init__(self,
                  scheduled_ir_module: IRModule,
                  source: str,
                  target: Target,
                  device_mod: Optional[IRModule] = None,
                  host_mod: Optional[IRModule] = None,
                  pass_configs: Optional[Dict[str, Any]] = None):
+        if not is_metal_target(target):
+            raise ValueError(f"Expected Metal target, got {target.kind.name}")
         self.mod = scheduled_ir_module
         self.target = target
         self.source = source
         self.pass_configs = pass_configs
         self.device_mod = device_mod
         self.host_mod = host_mod
         self.lib_code = self.update_lib_code(source)
 
     def update_lib_code(self, code: str):
+        """Update and return the library code without modification.
+        
+        Metal shaders don't require the host function wrapping that
+        CUDA/HIP backends need.
+        """
         self.lib_code = code
         return self.lib_code

tilelang/jit/adapter/torch/metal.py (1)

14-26: Clean up commented parameters or document why they're excluded.

The constructor has many commented-out parameters. Either remove them entirely or add a comment explaining why they're excluded from the Metal implementation.

     def __init__(
         self,
-        #  params: List[KernelParam],
-        #  result_idx: List[int],
-        #  target: Union[str, Target],
         func_or_mod: Union[tir.PrimFunc, tvm.IRModule],
-        #  host_mod: Optional[tvm.IRModule] = None,
         device_mod: Optional[tvm.IRModule] = None,
         kernel_global_source: Optional[str] = None,
         verbose: bool = False,
-        #  pass_configs: Optional[Dict[str, Any]] = None,
-        #  compile_flags: Optional[List[str]] = None
     ):
+        """Initialize MetalKernelAdapter for PyTorch MPS execution.
+        
+        Note: Unlike other adapters, Metal doesn't require params, result_idx,
+        or compilation flags as these are handled by PyTorch MPS.
+        
+        Args:
+            func_or_mod: The function or module to compile
+            device_mod: Device module containing kernel metadata
+            kernel_global_source: Metal shader source code
+            verbose: Enable verbose logging
+        """

testing/python/metal/test_metal_codegen.py (1)

70-72: Consider more appropriate tolerance for float16.

An absolute tolerance of 1.0 for float16 seems too permissive for GEMM accuracy testing. Consider using a relative tolerance or a smaller absolute tolerance.
 def test_gemm_float16():
-    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)
+    # Use relative tolerance for float16 to better handle different magnitude results
+    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1e-2)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 409ab83 and 0eba92f6ad86fca30149d937543b01069784eafd.

📒 Files selected for processing (19)

CMakeLists.txt (3 hunks)
install_metal.sh (1 hunks)
testing/python/metal/test_metal_codegen.py (1 hunks)
tilelang/cache/kernel_cache.py (1 hunks)
tilelang/carver/arch/__init__.py (3 hunks)
tilelang/carver/arch/cdna.py (1 hunks)
tilelang/carver/arch/cpu.py (1 hunks)
tilelang/carver/arch/cuda.py (1 hunks)
tilelang/carver/arch/metal.py (1 hunks)
tilelang/contrib/cc.py (2 hunks)
tilelang/engine/lower.py (1 hunks)
tilelang/jit/__init__.py (3 hunks)
tilelang/jit/adapter/__init__.py (1 hunks)
tilelang/jit/adapter/cython/adapter.py (3 hunks)
tilelang/jit/adapter/torch/metal.py (1 hunks)
tilelang/jit/adapter/utils.py (1 hunks)
tilelang/jit/adapter/wrapper.py (3 hunks)
tilelang/jit/kernel.py (4 hunks)
tilelang/utils/target.py (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (10)

tilelang/jit/adapter/utils.py (1)

tilelang/language/ast/ir.py (1)

target (1682-1713)

tilelang/carver/arch/metal.py (2)

tilelang/carver/arch/arch_base.py (1)

TileDevice (4-37)

tilelang/carver/template/base.py (1)

arch (152-159)

tilelang/jit/adapter/cython/adapter.py (3)

tilelang/jit/adapter/utils.py (4)

is_cuda_target (51-52)

is_hip_target (55-56)

is_cpu_target (59-60)

is_metal_target (63-64)

tilelang/utils/target.py (1)

determine_target (54-96)

tilelang/contrib/cc.py (2)

get_cplus_compiler (68-90)

is_mac (93-94)

tilelang/jit/adapter/__init__.py (1)

tilelang/jit/adapter/torch/metal.py (1)

MetalKernelAdapter (12-67)

tilelang/jit/kernel.py (3)

tilelang/jit/__init__.py (1)

jit (241-314)

tilelang/jit/adapter/utils.py (1)

is_metal_target (63-64)

tilelang/jit/adapter/torch/metal.py (1)

MetalKernelAdapter (12-67)

tilelang/carver/arch/__init__.py (1)

tilelang/carver/arch/metal.py (1)

METAL (8-9)

tilelang/jit/__init__.py (2)

tilelang/jit/adapter/utils.py (1)

is_metal_target (63-64)

tilelang/language/ast/ir.py (1)

target (1682-1713)

testing/python/metal/test_metal_codegen.py (6)

tilelang/jit/__init__.py (1)

jit (241-314)

tilelang/language/allocate.py (2)

alloc_shared (21-36)

alloc_fragment (53-64)

tilelang/language/fill.py (1)

clear (24-48)

tilelang/language/pipeline.py (1)

Pipelined (9-46)

tilelang/language/copy.py (1)

copy (84-152)

tilelang/jit/kernel.py (1)

kernel_source (470-471)

tilelang/jit/adapter/torch/metal.py (2)

tilelang/jit/adapter/base.py (2)

BaseKernelAdapter (8-55)

_post_init (54-55)

tilelang/jit/adapter/nvrtc/adapter.py (1)

_convert_torch_func (240-241)

tilelang/jit/adapter/wrapper.py (4)

tilelang/jit/adapter/utils.py (1)

is_metal_target (63-64)

tilelang/jit/adapter/cython/adapter.py (1)

lib_code (511-513)

tilelang/jit/adapter/ctypes/adapter.py (1)

lib_code (281-283)

tilelang/jit/adapter/libgen.py (1)

update_lib_code (53-54)

🪛 Ruff (0.12.2)

tilelang/carver/arch/cpu.py

23-26: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/carver/arch/cuda.py

150-159: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/utils/target.py

87-87: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/carver/arch/metal.py

12-15: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/carver/arch/cdna.py

35-38: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/jit/adapter/__init__.py

5-5: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tilelang/carver/arch/__init__.py

2-2: from .cuda import * used; unable to detect undefined names

(F403)

3-3: from .cpu import * used; unable to detect undefined names

(F403)

4-4: from .cdna import * used; unable to detect undefined names

(F403)

5-5: from .metal import * used; unable to detect undefined names

(F403)

22-22: METAL may be undefined, or defined from star imports

(F405)

40-55: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

41-41: is_cpu_arch may be undefined, or defined from star imports

(F405)

42-42: is_cuda_arch may be undefined, or defined from star imports

(F405)

43-43: is_volta_arch may be undefined, or defined from star imports

(F405)

44-44: is_ampere_arch may be undefined, or defined from star imports

(F405)

45-45: is_ada_arch may be undefined, or defined from star imports

(F405)

46-46: is_hopper_arch may be undefined, or defined from star imports

(F405)

47-47: is_tensorcore_supported_precision may be undefined, or defined from star imports

(F405)

48-48: has_mma_support may be undefined, or defined from star imports

(F405)

49-49: is_cdna_arch may be undefined, or defined from star imports

(F405)

50-50: is_metal_arch may be undefined, or defined from star imports

(F405)

51-51: CUDA may be undefined, or defined from star imports

(F405)

52-52: CDNA may be undefined, or defined from star imports

(F405)

53-53: METAL may be undefined, or defined from star imports

(F405)

54-54: CPU may be undefined, or defined from star imports

(F405)

tilelang/jit/adapter/torch/metal.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)

tilelang/carver/arch/metal.py (1)
8-9: Confirm TileDevice API expectations for METAL.

METAL is currently a stub. If any path calls TileDevice.get_avaliable_tensorintrin_shapes(), this will raise NotImplementedError. If that call path can occur, add a trivial override returning [] to avoid surprises.

Example (outside this hunk):
class METAL(TileDevice):
    def get_avaliable_tensorintrin_shapes(self):
        return []
tilelang/utils/target.py (1)

16-16: LGTM: add 'metal' to AVALIABLE_TARGETS.

This aligns the public set with the new Metal path.

tilelang/jit/adapter/__init__.py (1)

1-6: LGTM!

The addition of MetalKernelAdapter to the adapter exports is clean and follows the existing pattern.

tilelang/jit/adapter/utils.py (1)

63-65: LGTM!

The is_metal_target function follows the established pattern and correctly identifies Metal targets.

tilelang/jit/adapter/cython/adapter.py (2)

156-158: LGTM! Platform-specific linker flags handled correctly

The dynamic linker flag selection based on the platform is appropriate for handling undefined symbols across macOS and Linux.

455-456: LGTM! Correct device mapping for Metal target

The Metal target correctly maps to the "mps" device in PyTorch.

tilelang/jit/adapter/wrapper.py (1)

1068-1069: LGTM! Metal target routing is correctly integrated.

The Metal target detection and wrapper selection follows the existing pattern for other backends.

tilelang/jit/adapter/torch/metal.py (1)

61-61: Verify thread calculation logic.

The threads calculation multiplies block and grid dimensions element-wise. This seems unusual - typically threads would be just the block dimensions, and the total work would be block * grid.

Could you verify if this is the correct thread configuration for PyTorch MPS? The typical pattern would be:

threads = block dimensions (threads per group)

group_size = grid dimensions (number of groups)

The current implementation passes threads=[block[i] * grid[i] for i in range(3)] which seems to represent total threads across all groups.

testing/python/metal/test_metal_codegen.py (1)

79-80: Good practice: MPS availability check.

The check for MPS availability before running tests is appropriate for Metal backend testing.

coderabbitai · 2025-09-12T04:04:26Z

CMakeLists.txt

+  find_library(TVM_LIBRARY_LOCATION
+    NAMES tvm
+    HINTS "${TVM_PREBUILD_PATH}"
+  )
  set_target_properties(tvm PROPERTIES
-    IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm.so"
+    IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"
    INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
  )


⚠️ Potential issue

Add error handling for missing TVM libraries

The find_library commands should check if the libraries were actually found before proceeding.

find_library(TVM_LIBRARY_LOCATION NAMES tvm HINTS "${TVM_PREBUILD_PATH}" ) +if(NOT TVM_LIBRARY_LOCATION) + message(FATAL_ERROR "Could not find TVM library in ${TVM_PREBUILD_PATH}") +endif() set_target_properties(tvm PROPERTIES IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}" INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include" )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

find_library(TVM_LIBRARY_LOCATION

NAMES tvm

HINTS "${TVM_PREBUILD_PATH}"

)

set_target_properties(tvm PROPERTIES

IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm.so"

IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"

INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"

)

find_library(TVM_LIBRARY_LOCATION

NAMES tvm

HINTS "${TVM_PREBUILD_PATH}"

)

if(NOT TVM_LIBRARY_LOCATION)

message(FATAL_ERROR "Could not find TVM library in ${TVM_PREBUILD_PATH}")

endif()

set_target_properties(tvm PROPERTIES

IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"

INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"

)

🤖 Prompt for AI Agents

In CMakeLists.txt around lines 102 to 109, the find_library call for TVM does not verify that TVM_LIBRARY_LOCATION was found; add a check after find_library such as if(NOT TVM_LIBRARY_LOCATION) message(FATAL_ERROR "Could not find tvm library in ${TVM_PREBUILD_PATH}; please set TVM_PREBUILD_PATH or install tvm") to stop configuration when missing, and only call set_target_properties for the imported tvm target if the library path exists (or create the imported target conditionally); ensure the error message includes the searched path for easier debugging.

coderabbitai · 2025-09-12T04:04:27Z

CMakeLists.txt

+  find_library(TVM_RUNTIME_LIBRARY_LOCATION
+    NAMES tvm_runtime
+    HINTS "${TVM_PREBUILD_PATH}"
+  )
  set_target_properties(tvm_runtime PROPERTIES
-    IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm_runtime.so"
+    IMPORTED_LOCATION "${TVM_RUNTIME_LIBRARY_LOCATION}"
    INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
  )


⚠️ Potential issue

Add error handling for missing TVM runtime library

Similar to the TVM library, the runtime library location should be validated.

find_library(TVM_RUNTIME_LIBRARY_LOCATION NAMES tvm_runtime HINTS "${TVM_PREBUILD_PATH}" ) +if(NOT TVM_RUNTIME_LIBRARY_LOCATION) + message(FATAL_ERROR "Could not find TVM runtime library in ${TVM_PREBUILD_PATH}") +endif() set_target_properties(tvm_runtime PROPERTIES IMPORTED_LOCATION "${TVM_RUNTIME_LIBRARY_LOCATION}" INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include" )

🤖 Prompt for AI Agents

In CMakeLists.txt around lines 111 to 118, the find_library result for TVM_RUNTIME_LIBRARY_LOCATION is not validated; add the same existence check and failure handling used for the TVM library: after find_library, test if TVM_RUNTIME_LIBRARY_LOCATION is NOTFOUND (or empty) and call message(FATAL_ERROR "...unable to locate tvm_runtime library at ${TVM_PREBUILD_PATH}..." ) so configuration stops with a clear error; only set_target_properties for tvm_runtime and INTERFACE_INCLUDE_DIRECTORIES when the library was successfully found (mirror the pattern used for the TVM library block).

testing/python/metal/test_metal_codegen.py

coderabbitai · 2025-09-12T04:04:27Z

tilelang/cache/kernel_cache.py

+        if execution_backend in ("dlpack", "torch"):
+            self.logger.warning("DLPack or torch backend does not support cache saving to disk.")
        else:


🛠️ Refactor suggestion

Include 'torch' in type hints and clarify log.

Execution backend Literal types in this file exclude 'torch' (class attr and method params). This will trip type-checkers now that you gate disk-cache on 'torch'.

Apply minimal log tweak here:

- if execution_backend in ("dlpack", "torch"): - self.logger.warning("DLPack or torch backend does not support cache saving to disk.") + if execution_backend in ("dlpack", "torch"): + self.logger.warning("Backends 'dlpack' and 'torch' do not support saving cache to disk.")

Also update type hints elsewhere in this file (outside the changed hunk):

# line ~42 execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython" # line ~71 execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython", # line ~117 execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython", # line ~323 execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython",

🤖 Prompt for AI Agents

In tilelang/cache/kernel_cache.py around lines 189-191, the warning message and the execution_backend type hints omit 'torch', causing type-check failures and unclear logs; update the warning to mention "DLPack or torch backend does not support saving cache to disk" (or similar clearer phrasing) and add "torch" to the Literal union type hints at the listed locations: ~42, ~71, ~117, ~323 so those class attributes and method params read Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython" (or the equivalent default), ensuring consistency across the file.

coderabbitai · 2025-09-12T04:04:27Z

tilelang/carver/arch/__init__.py

+from .cuda import *
+from .cpu import *
+from .cdna import *
+from .metal import *


🛠️ Refactor suggestion

Wildcard imports reduce code clarity and maintainability

The wildcard imports make it difficult to track which symbols come from which module and can lead to namespace pollution. Consider explicit imports for better maintainability.

🧰 Tools

🪛 Ruff (0.12.2)

2-2: from .cuda import * used; unable to detect undefined names

(F403)

3-3: from .cpu import * used; unable to detect undefined names

(F403)

4-4: from .cdna import * used; unable to detect undefined names

(F403)

5-5: from .metal import * used; unable to detect undefined names

(F403)

🤖 Prompt for AI Agents

In tilelang/carver/arch/__init__.py around lines 2 to 5, the module currently uses wildcard imports (from .cuda import * etc.), which pollutes the package namespace and obscures which symbols are exported; replace these with explicit imports of the public symbols you need from each submodule (or import the modules themselves and re-export a curated __all__ list) so callers see a clear API and accidental name collisions are avoided. Ensure you list each function/class/constant you want to expose (or import submodules as namespaced attributes) and update or add __all__ to reflect the exported names.

coderabbitai · 2025-09-12T04:04:27Z

tilelang/jit/__init__.py

+    if is_metal_target(target):
+        assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'


⚠️ Potential issue

Fix inconsistent backend naming

The assertion expects 'torch' backend but the public API and documentation use 'nvrtc'. This inconsistency is confusing.

The error message should clarify that Metal targets require the "nvrtc" backend (which internally uses torch for Metal):

if is_metal_target(target): - assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`' + assert execution_backend == 'nvrtc', 'Currently metal target only supports `tl.jit(execution_backend="nvrtc")`'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if is_metal_target(target):

assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'

if is_metal_target(target):

assert execution_backend == 'nvrtc', 'Currently metal target only supports `tl.jit(execution_backend="nvrtc")`'

🤖 Prompt for AI Agents

In tilelang/jit/__init__.py around lines 78-79, the assertion currently checks for execution_backend == 'torch' but the public API/doc use 'nvrtc'; change the assertion to require 'nvrtc' instead and update the assertion message to state that Metal targets require the "nvrtc" backend (which internally uses torch) so the message aligns with public API and avoids confusion.

coderabbitai · 2025-09-12T04:04:27Z

tilelang/jit/adapter/torch/metal.py

+        self.kernel_name = func_or_mod.__name__ + '_kernel'
+        self.verbose = verbose


⚠️ Potential issue

Potential issue with kernel name generation.

The kernel name is generated by appending '_kernel' to the function name. This assumes func_or_mod has a __name__ attribute, which may not be true for all IRModule instances.

self.kernel_global_source = kernel_global_source - self.kernel_name = func_or_mod.__name__ + '_kernel' + if hasattr(func_or_mod, '__name__'): + self.kernel_name = func_or_mod.__name__ + '_kernel' + elif hasattr(func_or_mod, 'get_global_vars'): + # Handle IRModule case + global_vars = func_or_mod.get_global_vars() + if global_vars: + self.kernel_name = global_vars[0].name_hint + '_kernel' + else: + raise ValueError("Cannot determine kernel name from IRModule") + else: + raise ValueError(f"Cannot determine kernel name from {type(func_or_mod)}") self.verbose = verbose

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

self.kernel_name = func_or_mod.__name__ + '_kernel'

self.verbose = verbose

self.kernel_global_source = kernel_global_source

if hasattr(func_or_mod, '__name__'):

self.kernel_name = func_or_mod.__name__ + '_kernel'

elif hasattr(func_or_mod, 'get_global_vars'):

# Handle IRModule case

global_vars = func_or_mod.get_global_vars()

if global_vars:

self.kernel_name = global_vars[0].name_hint + '_kernel'

else:

raise ValueError("Cannot determine kernel name from IRModule")

else:

raise ValueError(f"Cannot determine kernel name from {type(func_or_mod)}")

self.verbose = verbose

🤖 Prompt for AI Agents

In tilelang/jit/adapter/torch/metal.py around lines 28-29, kernel_name is set by blindly using func_or_mod.__name__ which can fail for IRModule or other objects without a __name__; change to safely derive a base name: try getattr(func_or_mod, "__name__", None), then getattr(func_or_mod, "name", None), and if both are missing fall back to a stable fallback such as f"{type(func_or_mod).__name__}_{id(func_or_mod):x}"; sanitize the base to be a valid identifier if needed and then set self.kernel_name = f"{base}_kernel".

coderabbitai · 2025-09-12T04:04:27Z

tilelang/jit/adapter/torch/metal.py

+        for var, func in device_mod.functions.items():
+            assert var.name_hint == self.kernel_name
+            thread_extent = func.attrs['thread_extent']
+            for tag, extent in thread_extent.items():
+                if "threadIdx" in tag:
+                    self.block_info["xyz".index(tag[-1])] = extent
+                elif "blockIdx" in tag:
+                    self.grid_info["xyz".index(tag[-1])] = extent
+            break
+        else:
+            raise AssertionError(f'no kernel with name {func_or_mod.__name__}')
+


⚠️ Potential issue

Fix critical issue: iteration over device module functions doesn't handle multiple functions correctly.

The loop breaks after processing the first function regardless of whether it matches kernel_name. This could silently process the wrong kernel if multiple functions exist.

for var, func in device_mod.functions.items(): - assert var.name_hint == self.kernel_name - thread_extent = func.attrs['thread_extent'] - for tag, extent in thread_extent.items(): - if "threadIdx" in tag: - self.block_info["xyz".index(tag[-1])] = extent - elif "blockIdx" in tag: - self.grid_info["xyz".index(tag[-1])] = extent - break + if var.name_hint == self.kernel_name: + thread_extent = func.attrs.get('thread_extent', {}) + for tag, extent in thread_extent.items(): + if "threadIdx" in tag: + self.block_info["xyz".index(tag[-1])] = extent + elif "blockIdx" in tag: + self.grid_info["xyz".index(tag[-1])] = extent + break else: - raise AssertionError(f'no kernel with name {func_or_mod.__name__}') + raise AssertionError(f'No kernel with name {self.kernel_name} found in device module')

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for var, func in device_mod.functions.items():

assert var.name_hint == self.kernel_name

thread_extent = func.attrs['thread_extent']

for tag, extent in thread_extent.items():

if "threadIdx" in tag:

self.block_info["xyz".index(tag[-1])] = extent

elif "blockIdx" in tag:

self.grid_info["xyz".index(tag[-1])] = extent

break

else:

raise AssertionError(f'no kernel with name {func_or_mod.__name__}')

for var, func in device_mod.functions.items():

if var.name_hint == self.kernel_name:

thread_extent = func.attrs.get('thread_extent', {})

for tag, extent in thread_extent.items():

if "threadIdx" in tag:

self.block_info["xyz".index(tag[-1])] = extent

elif "blockIdx" in tag:

self.grid_info["xyz".index(tag[-1])] = extent

break

else:

raise AssertionError(f'No kernel with name {self.kernel_name} found in device module')

🧰 Tools

🪛 Ruff (0.12.2)

44-44: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In tilelang/jit/adapter/torch/metal.py around lines 34 to 45, the loop over device_mod.functions currently asserts the first function has the desired kernel_name and breaks unconditionally, which can process the wrong function when multiple functions exist; change the logic to search for a function whose var.name_hint equals self.kernel_name (don't assert immediately), extract its thread_extent and populate block_info/grid_info, break when found, and after the loop raise an AssertionError if no matching kernel was found (use self.kernel_name in the error message) so only the correct kernel is processed.

coderabbitai · 2025-09-12T04:04:28Z

tilelang/jit/adapter/torch/metal.py

+
+        if self._kernel is None:
+
+            _kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)


🛠️ Refactor suggestion

Add error handling for shader compilation.

The shader compilation could fail but there's no error handling. Also, the compiled shader might not contain the expected kernel.

- _kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name) + try: + compiled_shader = torch.mps.compile_shader(self.kernel_global_source) + if not hasattr(compiled_shader, self.kernel_name): + raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'") + _kernel = getattr(compiled_shader, self.kernel_name) + except Exception as e: + raise RuntimeError(f"Failed to compile Metal shader: {e}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

_kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)

try:

compiled_shader = torch.mps.compile_shader(self.kernel_global_source)

if not hasattr(compiled_shader, self.kernel_name):

raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'")

_kernel = getattr(compiled_shader, self.kernel_name)

except Exception as e:

raise RuntimeError(f"Failed to compile Metal shader: {e}")

🤖 Prompt for AI Agents

In tilelang/jit/adapter/torch/metal.py around line 54, wrap the call to torch.mps.compile_shader and subsequent attribute access in a try/except and validate the returned object contains the expected kernel; specifically, catch exceptions from compile_shader, raise or log a clear error that includes the original exception and the kernel name/source context, and after compilation check with hasattr or explicit lookup that the compiled object exposes self.kernel_name, raising a descriptive error if missing so callers know compilation succeeded but the kernel was not found.

tilelang/jit/kernel.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tilelang/jit/adapter/__init__.py (1)
5-6: Remove unused # noqa: F401 (RUF100) on new exports.

Ruff flags these as unused because F401 isn’t enabled; just drop the directives.
-from .nvrtc import NVRTCKernelAdapter  # noqa: F401
-from .torch import MetalKernelAdapter  # noqa: F401
+from .nvrtc import NVRTCKernelAdapter
+from .torch import MetalKernelAdapter
Optional: make exports explicit to avoid needing noqa anywhere and document the public API.
# add near the top of this module
__all__ = [
    "BaseKernelAdapter",
    "TorchDLPackKernelAdapter",
    "CtypesKernelAdapter",
    "CythonKernelAdapter",
    "NVRTCKernelAdapter",
    "MetalKernelAdapter",
]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0eba92f6ad86fca30149d937543b01069784eafd and 7514cf545a9b9ec3d999fc1b1f6bd098397d19e1.

📒 Files selected for processing (2)

tilelang/jit/adapter/__init__.py (1 hunks)
tilelang/jit/adapter/torch/__init__.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

tilelang/jit/adapter/torch/init.py

🧰 Additional context used

🧬 Code graph analysis (1)

tilelang/jit/adapter/__init__.py (2)

tilelang/jit/adapter/nvrtc/adapter.py (1)

NVRTCKernelAdapter (30-246)

tilelang/jit/adapter/torch/metal.py (1)

MetalKernelAdapter (12-67)

🪛 Ruff (0.12.2)

tilelang/jit/adapter/__init__.py

5-5: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: bot-task

@LeiWang1999

Removed external library DLL names for macOS. found during tile-ai/tilelang#799 cc @LeiWang1999

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

setup.py (3)

345-347: Wheel may miss runtime libs: include tilelang/lib/.so|.dylib in package_data.

Without this, copied libs might not land in wheels/sdists.

 package_data = {
-    "tilelang": ["py.typed", "*pyx"],
+    "tilelang": ["py.typed", "*pyx", "lib/*.so", "lib/*.dylib"],
 }

805-807: Do not pass CMAKE_CUDA_COMPILER unless CUDA is enabled.

On Metal-only/mac builds, this inserts a bogus nvcc path and breaks configure.

-        if not USE_ROCM:
-            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")
+        if USE_CUDA and CUDA_HOME:
+            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")

826-839: Respect USE_CUDA toggle in config.cmake generation.

Current logic enables CUDA whenever CUDA_HOME is set, even if USE_CUDA=False.

-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]

🧹 Nitpick comments (9)

setup.py (9)

48-50: Harden Metal detection (Rosetta/process-arch pitfalls).

Use Darwin + machine for reliability.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+IS_DARWIN = platform.system() == 'Darwin'
+MAYBE_METAL = IS_DARWIN and platform.machine() == 'arm64'

52-61: Defaulting USE_CUDA=True on Linux is user-hostile (raises when CUDA_HOME missing).

Consider defaulting to False or detecting from CUDA_HOME post-import to avoid hard failures on non-GPU Linux dev boxes.

If you want a minimal tweak now, set default False:

-USE_CUDA = _read_bool_env("USE_CUDA", IS_LINUX and not USE_ROCM)
+USE_CUDA = _read_bool_env("USE_CUDA", False)

99-103: Tighten exception text (Ruff TRY003) and clarity.

Shorten the message; keep actionable content.

-    raise ValueError(
-        "CUDA support is enabled by default on linux if `USE_ROCM=False`," \
-        " but CUDA_HOME is not set or detected.")
+    raise ValueError("USE_CUDA=True but CUDA_HOME is not set or detected.")

105-108: Same as above: shorten/standardize error message.

-    raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
-    )
+    raise ValueError("Neither CUDA_HOME nor ROCM_HOME detected on Linux; set one explicitly.")

482-489: Guard dSYM deletion.

Use isdir check to avoid accidental rmtree on files with matching names.

-            for dir in potential_dirs:
-                candidate = os.path.join(dir, item)
-                if os.path.exists(candidate):
-                    shutil.rmtree(candidate)
-                    break
+            for d in potential_dirs:
+                candidate = os.path.join(d, item)
+                if os.path.isdir(candidate):
+                    shutil.rmtree(candidate)
+                    break

664-664: Copy-from-glob likely no-ops in non-inplace builds.

CMake outputs to extdir/build_temp, not CWD. Either search extdir/build_temp or drop this block (TileLangBuilPydCommand already handles copying).

270-273: Handle missing CMake gracefully (None from shutil.which).

os.path.exists(None) raises TypeError; raise a clear FileNotFoundError instead.

-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
-        raise Exception("CMake is not installed, please install it first.")
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
+        raise FileNotFoundError("CMake is not installed or not on PATH.")

149-168: Make ROCm version detection robust if rocminfo is missing.

Without try/except, subprocess errors abort the build even when the cmake file can be parsed.

-def get_rocm_version():
+def get_rocm_version():
     """Get the ROCM version from rocminfo."""
-    rocm_output = subprocess.check_output(["rocminfo"], universal_newlines=True)
-    # Parse ROCM version from output
-    # Example output: ROCM version: x.y.z-...
-    match = re.search(r'ROCm Version: (\d+\.\d+\.\d+)', rocm_output)
-    if match:
-        return Version(match.group(1))
-    else:
+    try:
+        rocm_output = subprocess.check_output(["rocminfo"], universal_newlines=True)
+        match = re.search(r'ROCm Version: (\d+\.\d+\.\d+)', rocm_output)
+        if match:
+            return Version(match.group(1))
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        pass
     rocm_path = os.environ.get("ROCM_PATH", "/opt/rocm")
     rocm_version_file = os.path.join(rocm_path, "lib", "cmake", "rocm",
                                      "rocm-config-version.cmake")
     if os.path.exists(rocm_version_file):
         with open(rocm_version_file, "r") as f:
             content = f.read()
             match = re.search(r'set\(PACKAGE_VERSION "(\d+\.\d+\.\d+)"', content)
             if match:
                 return Version(match.group(1))
     # return a default
     return Version("5.0.0")

882-895: Update PyPI metadata for Metal/macOS builds.

Expose macOS in platforms/classifiers when USE_METAL is on; optionally add/remove CUDA/ROCm env classifiers based on toggles.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7514cf545a9b9ec3d999fc1b1f6bd098397d19e1 and e1e18873a51c22fce0d9495e989c7253e871f99f.

📒 Files selected for processing (1)

setup.py (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

setup.py (1)

tilelang/env.py (1)

get (136-139)

🪛 Ruff (0.12.2)

setup.py

100-102: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: bot-task

🔇 Additional comments (1)

setup.py (1)

390-392: Linux-only rpath patch exit is correct.

Early-return avoids patchelf on macOS. LGTM.

setup.py

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

setup.py (3)
808-811: Fix unconditional CUDA compiler flag (breaks Metal/macOS and CUDA-less builds).

-DCMAKE_CUDA_COMPILER is appended whenever ROCm is disabled, even if CUDA is not requested or CUDA_HOME is unset. On macOS/Metal this can crash with a TypeError (joining path with None) and on Linux CPU/ROCm builds it forces CUDA.

Apply:
-        if not USE_ROCM:
-            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")
+        # Only set CUDA compiler when CUDA is explicitly enabled and detected
+        if USE_CUDA and CUDA_HOME:
+            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")
667-676: Editable install: also copy .so on macOS.

Python C-extensions on macOS commonly end with .so. Current logic only grabs .dylib on macOS, causing missing libs in editable mode.
-        files = glob.glob("*.so" if IS_LINUX else "*.dylib")
+        files = sorted(set(glob.glob("*.so") + glob.glob("*.dylib")))
269-277: Handle missing CMake path; avoid TypeError on os.path.exists(None).

shutil.which("cmake") may return None; calling os.path.exists(None) raises a TypeError.
 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path

♻️ Duplicate comments (1)

setup.py (1)

36-44: Boolean env parsing still brittle (whitespace/yes-no not handled; empty string coerced to False).

You addressed the earlier issue but missed .strip() and common synonyms, and you now treat an empty string as False rather than “unset → default”.

Apply:

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ("1", "true", "on", "yes", "y"):
+        return True
+    if s in ("0", "false", "off", "no", "n"):
+        return False
+    return default

🧹 Nitpick comments (4)

setup.py (4)
51-53: Make Metal default detection more robust.

Use explicit Darwin + machine check; platform.mac_ver()[2] is brittle and can vary.
-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+MAYBE_METAL = (platform.system() == 'Darwin' and platform.machine() == 'arm64')
829-843: When enabling Metal, explicitly disable CUDA/ROCm in config.cmake for clarity.

Prevents accidental dual-backend config if upstream defaults change.
-        if USE_METAL:
-            content_lines += [
-                "set(USE_METAL ON)",
-            ]
+        if USE_METAL:
+            content_lines += [
+                "set(USE_METAL ON)",
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]
893-898: Reflect macOS support in Trove classifiers.

Now that Metal/macOS is supported, advertise it.
         "Programming Language :: Python :: 3.8",
         "License :: OSI Approved :: MIT License",
-        "Operating System :: OS Independent",
+        "Operating System :: OS Independent",
+        "Operating System :: MacOS :: MacOS X",
         "Intended Audience :: Developers",
         "Intended Audience :: Science/Research",
102-111: Minor: tighten error messages and conditions (style).

Long exception messages trigger Ruff TRY003; consider shorter texts or helper constants. Also consider allowing Linux CPU-only builds in the future by gating the “require CUDA/ROCM” check behind an explicit flag.

If you want, I can open a follow-up to align linter expectations and make the Linux GPU requirement opt-in via an env (e.g., REQUIRE_GPU=1).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1e18873a51c22fce0d9495e989c7253e871f99f and cc91c3e22afffbb80f1dbec199fd55d91748adc5.

📒 Files selected for processing (1)

setup.py (6 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: bot-task

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

tilelang/carver/arch/metal.py (1)
17-20: Sort all to satisfy Ruff RUF022.
 __all__ = [
-    'is_metal_arch',
-    'METAL',
+    'METAL',
+    'is_metal_arch',
 ]
.github/workflows/metal_ci.yml (2)
12-14: Tighten permissions and fix trailing space.

contents: write isn’t needed; also removes the trailing space flagged by yamllint.
-    permissions:
-      contents: write 
+    permissions:
+      contents: read
96-97: Avoid double installation of the project.

You install in “Ensure venv” and again in “Install project.” Keep one to reduce time and flakiness.
-          pip install .
-          touch "$MARKER"
+          touch "$MARKER"
(Keep the dedicated “Install project” step.)

Also applies to: 100-104

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc91c3e22afffbb80f1dbec199fd55d91748adc5 and 12ca66077563253c09d1530c948b9fcf28cbbdd1.

📒 Files selected for processing (2)

.github/workflows/metal_ci.yml (1 hunks)
tilelang/carver/arch/metal.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tilelang/carver/arch/metal.py (2)

tilelang/language/ast/ir.py (1)

target (1682-1713)

tilelang/carver/arch/arch_base.py (1)

TileDevice (4-37)

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

tilelang/carver/arch/metal.py

17-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: format-check
GitHub Check: bot-task

🔇 Additional comments (2)

tilelang/carver/arch/metal.py (1)

6-7: Helper reads well.

is_metal_arch is straightforward and correct.
.github/workflows/metal_ci.yml (1)
1-1: Rename workflow and use Metal CMake flag.
CMakeLists.txt contains a USE_METAL guard — switch -DUSE_ROCM=ON to -DUSE_METAL=ON and rename the workflow.
-name: CI Test on AMD
+name: Metal CI
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..

.github/workflows/metal_ci.yml

coderabbitai · 2025-09-12T06:53:58Z

tilelang/carver/arch/metal.py

+    def __init__(self, target: Target | str):
+        if isinstance(target, str):
+            target = Target(target)
+        self.target = target
+


⚠️ Potential issue

Call the base initializer to populate TileDevice fields.

Without super().__init__(), METAL instances won't have the base attributes (reg_cap, smem_cap, etc.), which can break consumers expecting them.

Apply:

class METAL(TileDevice): def __init__(self, target: Target | str): + super().__init__() if isinstance(target, str): target = Target(target) self.target = target

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def __init__(self, target: Target | str):

if isinstance(target, str):

target = Target(target)

self.target = target

def __init__(self, target: Target | str):

super().__init__()

if isinstance(target, str):

target = Target(target)

self.target = target

🤖 Prompt for AI Agents

In tilelang/carver/arch/metal.py around lines 11 to 15, the METAL class __init__ does not call the base class initializer so base TileDevice fields (e.g., reg_cap, smem_cap) are not populated; update the constructor to call super().__init__(target) (after converting a string target to Target if needed) so the parent initializer runs and base attributes are set, ensuring the instance has the expected fields.

@LeiWang1999

Removed external library DLL names for macOS. found during tile-ai/tilelang#799 cc @LeiWang1999

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (8)

tilelang/carver/arch/metal.py (1)

11-14: Call TileDevice.init to populate base fields (duplicate of prior feedback).

Without super().__init__(), instances miss required attributes (reg_cap, smem_cap, etc.), leading to runtime errors when accessed.

Apply:
 class METAL(TileDevice):
 
     def __init__(self, target: Target | str):
+        super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target

.github/workflows/metal_ci.yml (7)

21-25: Update setup-python to v5 (v2 is deprecated on GH runners).

actionlint flags v2 as too old; move to v5.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

29-31: Fix hashing on macOS: use shasum -a 256.

sha256sum isn’t available on macOS. Use shasum.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
         MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

40-43: Remove invalid pip flag --no-user.

pip doesn’t support --no-user here (even in venvs).

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

75-79: Update setup-python to v5 here as well.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

83-85: Use a stable requirements hash (shasum) instead of cat.

cat doesn’t produce a fixed-length digest and weakens cache keys.

-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
         MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

93-97: Remove invalid --no-user and avoid double install.

Fix flags; also consider skipping the early install since you reinstall later.

           source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
           [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
-          pip install . --no-user
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
+          # Defer project install to the dedicated step below

100-104: Remove --no-user in the wheel install step.

-        pip install . --no-user -v
+        pip install . -v

🧹 Nitpick comments (6)

tilelang/carver/arch/metal.py (3)
17-20: Sort all to satisfy Ruff RUF022.
 __all__ = [
-    'is_metal_arch',
-    'METAL',
+    'METAL',
+    'is_metal_arch',
 ]
11-14: Optionally validate Target kind is metal.

Prevents accidental misuse with a non-Metal TVM target.
     def __init__(self, target: Target | str):
         super().__init__()
         if isinstance(target, str):
             target = Target(target)
+        # Ensure we were given a Metal target
+        if getattr(getattr(target, "kind", None), "name", None) != "metal":
+            raise ValueError(f"METAL requires a TVM Target(kind='metal'), got: {getattr(getattr(target, 'kind', None), 'name', None)}")
         self.target = target
Please confirm the TVM version exposes target.kind.name; if not, we can adapt the check.

9-15: Confirm whether get_avaliable_tensorintrin_shapes is required for Metal paths.

If invoked downstream, not overriding it will raise NotImplementedError from the base class.

Proposed minimal safe stub:
 class METAL(TileDevice):
 
     def __init__(self, target: Target | str):
         super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target
+
+    def get_avaliable_tensorintrin_shapes(self):
+        # Metal/MPS: no tensor intrinsics exposed yet
+        return []
.github/workflows/metal_ci.yml (3)
12-14: Trim trailing space and drop unnecessary write permission.

Set minimal perms and remove the trailing space flagged by yamllint.
     permissions:
-      contents: write 
+      contents: read
10-10: Pin runner to Apple Silicon if you require MPS/Metal.

macos-latest can drift; pin to macos-14 (Apple Silicon) to ensure MPS availability.
-    runs-on: [macos-latest]
+    runs-on: macos-14
(Apply to both jobs.)

Also applies to: 63-63

26-45: Rethink venv “persistence”; use actions/cache instead.

runner.tool_cache doesn’t persist across jobs/runs on hosted macOS. Use actions/cache keyed by Python and requirements hash.

Example step to add before “Ensure venv”:
- name: Cache venv
  uses: actions/cache@v4
  with:
    path: ${{ runner.tool_cache }}/${{ env.VENV_DIR }}
    key: venv-${{ runner.os }}-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt', 'pyproject.toml', 'setup.cfg', 'setup.py') }}
Then create venv if missing and install deps; drop the custom MARKER logic.

Also applies to: 80-98

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12ca66077563253c09d1530c948b9fcf28cbbdd1 and d546487de7e3a2031c9106e89210669f666bbccd.

📒 Files selected for processing (2)

.github/workflows/metal_ci.yml (1 hunks)
tilelang/carver/arch/metal.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tilelang/carver/arch/metal.py (2)

tilelang/carver/arch/arch_base.py (1)

TileDevice (4-37)

tilelang/carver/template/base.py (1)

arch (152-159)

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

tilelang/carver/arch/metal.py

17-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🔇 Additional comments (3)

tilelang/carver/arch/metal.py (1)

5-6: Helper looks good.
.github/workflows/metal_ci.yml (2)
51-51: Drop ROCm flag in Metal CI configure step.

This job targets Metal; forcing -DUSE_ROCM=ON on macOS is likely wrong and may fail. Prefer no ROCm flag (or a Metal-specific one if your CMake supports it).
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON; cd ..
If your build supports it, consider adding: -DUSE_METAL=ON.

105-111: pytest-timeout declared — no change required

pytest-timeout is listed in requirements-test.txt (line 16); --timeout is supported so keep the workflow as-is.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

setup.py (2)
269-276: Fix possible TypeError when CMake isn't installed.

shutil.which can return None; os.path.exists(None) crashes.
 def get_cmake_path() -> str:
     """Return the path to the CMake compiler.
     """
-    # found which cmake is used
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path
829-844: Ensure CUDA is disabled when building for Metal.

Without explicitly setting USE_CUDA OFF here, CMake template defaults could enable CUDA inadvertently.
 if USE_METAL:
     content_lines += [
         "set(USE_METAL ON)",
         "set(USE_ROCM OFF)",
+        "set(USE_CUDA OFF)",
     ]

♻️ Duplicate comments (8)

setup.py (1)

36-43: Boolean env parsing still wrong; empty/unknown values fall back to default and '' branch is unreachable.

Normalize and handle both true/false tokens (incl. empty string). Current logic can flip GPU toggles unexpectedly.

Apply:

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

.github/workflows/metal_ci.yml (7)

93-97: Remove --no-user and avoid duplicate installs.

Install requirements here; leave project install to the dedicated step below.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
           [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
-          pip install . --no-user
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

21-25: Upgrade setup-python to v5.

v2 is deprecated and fails on current runners.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

76-79: Upgrade setup-python to v5 (second job).

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

29-31: sha256sum isn’t available on macOS runners; use shasum -a 256.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

83-85: Consistently hash requirements; don’t use cat.

-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

40-43: Drop invalid pip flag --no-user.

This option doesn’t exist and can fail even inside venvs.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

101-104: Remove --no-user (wheel install step).

-        pip install . --no-user -v
+        pip install . -v

🧹 Nitpick comments (5)

setup.py (4)

51-53: More robust Metal default detection.

Prefer platform checks that don’t rely on mac_ver quirks.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+MAYBE_METAL = (platform.system() == 'Darwin' and platform.machine() == 'arm64')

102-106: Shorten exception message (Ruff TRY003) and keep details in logs.

Long inline messages trigger TRY003 and add noise.

-if USE_CUDA and not CUDA_HOME:
-    raise ValueError(
-        "CUDA support is enabled by default on linux if `USE_ROCM=False`, but CUDA_HOME is not set or detected.")
+if USE_CUDA and not CUDA_HOME:
+    raise ValueError("CUDA enabled but CUDA_HOME not set/detected.")

463-484: Use logger.warning instead of “WARNING:” via info.

This preserves severity and integrates with log filters.

-                logger.info(f"WARNING: {item} not found in any expected directories!")
+                logger.warning(f"{item} not found in any expected directories!")

886-889: Update metadata to reflect macOS/Metal support.

Expose correct OS in metadata; remove Linux-only bias when targeting Metal.

-    platforms=[
-        "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
-    ],
+    platforms=[
+        ("Environment :: GPU :: NVIDIA CUDA" if USE_CUDA else
+         "Environment :: GPU :: AMD ROCm" if USE_ROCM else
+         "Environment :: GPU"),
+        "Operating System :: MacOS :: MacOS X" if USE_METAL else "Operating System :: POSIX :: Linux",
+    ],

-        "Operating System :: OS Independent",
+        ("Operating System :: MacOS :: MacOS X" if USE_METAL else "Operating System :: POSIX :: Linux"),

Also applies to: 893-900

.github/workflows/metal_ci.yml (1)

13-13: Trailing space.

Minor YAML lint fix.
-      contents: write 
+      contents: write

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d546487de7e3a2031c9106e89210669f666bbccd and 5b306179cb8b24f66118ff9847bad7a1847e82ff.

📒 Files selected for processing (2)

.github/workflows/metal_ci.yml (1 hunks)
setup.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

setup.py (1)

tilelang/env.py (1)

get (136-139)

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: format-check

🔇 Additional comments (1)

setup.py (1)

108-111: Linux-only accelerator requirement is fine.

The guard correctly avoids enforcing CUDA/ROCm on macOS/Metal builds.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

.github/workflows/ci.yml (2)

22-25: Upgrade to setup-python v5.

actions/setup-python@v2 is deprecated and flagged by actionlint.

-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5

Also applies to: 78-81

40-45: Remove invalid pip install --no-user.

--no-user is not a valid pip flag (inside venvs it’s unnecessary anyway).

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
+          pip install flash_attn==2.5.8 --no-build-isolation
-          pip install . --no-user
+          pip install .
-        pip install . --no-user -v
+        pip install . -v

Also applies to: 96-101, 107-107

.github/workflows/amd_ci.yml (1)

23-26: Upgrade to setup-python v5.

v2 is deprecated; actionlint will flag this.
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
Also applies to: 77-80

♻️ Duplicate comments (3)

.github/workflows/metal_ci.yml (3)

21-25: Update setup-python to v5.

Same as earlier bot comment; apply in both jobs.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

Also applies to: 76-79

29-31: Fix hashing on macOS (no sha256sum).

Use shasum -a 256 (portable on macOS). Apply to both blocks.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

Also applies to: 83-85

40-44: Remove invalid pip install --no-user.

Not a valid flag and unnecessary in venvs. Apply in all occurrences.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-        pip install . --no-user -v
+        pip install . -v

Also applies to: 93-96, 102-102

🧹 Nitpick comments (4)

.github/workflows/ci.yml (1)
12-14: Tighten permissions.

Format check doesn’t push commits; prefer read-only.
-    permissions:
-      contents: write 
+    permissions:
+      contents: read
.github/workflows/metal_ci.yml (3)
13-13: Trim trailing space.

YAMLlint flags trailing spaces.
-      contents: write 
+      contents: write
10-10: Target Apple Silicon explicitly for MPS.

To avoid Intel mac runners without MPS, prefer macOS 14 (Apple Silicon).
-    runs-on: [macos-latest]
+    runs-on: [macos-14]
Please confirm your org has access to macOS-14 ARM runners; otherwise keep latest and skip MPS tests when unavailable.

Also applies to: 63-63

104-109: Add a quick MPS sanity check before tests.

Fail fast if MPS isn’t available.
     - name: Run metal test
       run: |
         source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
+        python - <<'PY'
+import sys
+try:
+    import torch
+    print("torch:", torch.__version__, "MPS available:", torch.backends.mps.is_available())
+    assert torch.backends.mps.is_available(), "MPS not available on this runner"
+except Exception as e:
+    print("Precheck failed:", e, file=sys.stderr)
+    sys.exit(1)
+PY
         cd testing/python
         unset PYTHONPATH
         python -m pytest -k metal -v -r fE --durations=0 --timeout=3600

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b306179cb8b24f66118ff9847bad7a1847e82ff and 65ffe111a4df5414ce44a0a14c5113c6491e35c8.

📒 Files selected for processing (3)

.github/workflows/amd_ci.yml (1 hunks)
.github/workflows/ci.yml (1 hunks)
.github/workflows/metal_ci.yml (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build-test-metal

.github/workflows/amd_ci.yml

.github/workflows/ci.yml

coderabbitai · 2025-09-12T09:22:57Z

.github/workflows/metal_ci.yml

+        git submodule update --init --recursive --recommend-shallow
+        mkdir -p build
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..


🛠️ Refactor suggestion

Guard --recommend-shallow for older Git on macOS.

Add a fallback to keep runners with older Git working.

- git submodule update --init --recursive --recommend-shallow + if git submodule update -h 2>&1 | grep -q -- '--recommend-shallow'; then + git submodule update --init --recursive --recommend-shallow + else + git submodule update --init --recursive + fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

git submodule update --init --recursive --recommend-shallow

mkdir -p build

cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..

if git submodule update -h 2>&1 | grep -q -- '--recommend-shallow'; then

git submodule update --init --recursive --recommend-shallow

else

git submodule update --init --recursive

fi

mkdir -p build

cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (5)

.github/workflows/metal_ci.yml (5)
16-16: Remove trailing whitespace (yamllint violation).

Line 16 has a trailing space after "write".
-      contents: write 
+      contents: write
40-50: Make format-check fail deterministically on formatting drift.

Currently you fail only when format.sh exits non‑zero; if the script exits 0 but emits the sentinel text, the job passes. Harden the logic.
-        if ! output=$(./format.sh 2>&1); then
-          echo "------------------------------------"
-          echo "message:"
-          echo "$output"
-          printf '%s\n' "$output" | grep "Please review and stage the changes."
-          echo "------------------------------------"
-          exit 1
-        fi
+        set -euo pipefail
+        status=0
+        output="$(./format.sh 2>&1)" || status=$?
+        if [ "$status" -ne 0 ] || printf '%s\n' "$output" | grep -q "Please review and stage the changes."; then
+          echo "------------------------------------"
+          echo "message:"
+          printf '%s\n' "$output"
+          echo "------------------------------------"
+          exit 1
+        fi
13-13: Pin to Apple Silicon runners for MPS/Metal determinism.

To ensure MPS availability and consistent behavior, prefer macOS 14 (Apple Silicon) explicitly rather than the moving alias macos-latest.
-    runs-on: [macos-latest]
+    runs-on: macos-14
Apply to both jobs after confirming that your Metal/MPS tests require Apple Silicon on GitHub-hosted runners.

Also applies to: 54-54

15-17: Use least-privilege permissions (write not needed).

This job doesn’t push changes; read is sufficient.
-    permissions:
-      contents: write
+    permissions:
+      contents: read
82-86: Use working-directory instead of cd; keep PYTHONPATH unset.

Cleaner, avoids subshell state and makes logs clearer.
-    - name: Run metal test
-      run: |
-        cd testing/python
-        unset PYTHONPATH
-        python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
+    - name: Run metal test
+      working-directory: testing/python
+      run: |
+        unset PYTHONPATH
+        python -m pytest -k metal -v -r fE --durations=0 --timeout=3600

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65ffe111a4df5414ce44a0a14c5113c6491e35c8 and ca6546ac34e89cde32673641f20a0aca904533c8.

📒 Files selected for processing (1)

.github/workflows/metal_ci.yml (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

28-28: property "os" is not defined in object type {}

(expression)

68-68: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 16-16: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: format-check
GitHub Check: bot-task

.github/workflows/metal_ci.yml

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

.github/workflows/metal_ci.yml (1)
63-63: Fix invalid cache key reference to matrix.os (actionlint failure).

Replace undefined matrix.os with runner.os and include runner.arch for uniqueness.
-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}

🧹 Nitpick comments (5)

.github/workflows/metal_ci.yml (5)

16-16: Remove trailing space (yamllint).

-      contents: write 
+      contents: write

15-17: Scope down permissions to least privilege.

This job doesn’t appear to need write. Prefer read.

     permissions:
-      contents: write
+      contents: read

33-46: Make format check fail on detected changes even if script exits 0.

Currently failure only triggers when format.sh returns non-zero. Also fail on the marker text and on any git diff.

     - name: Run format check
       run: |
-        source .venv/bin/activate
-        mkdir -p build
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..
-        if ! output=$(./format.sh 2>&1); then
-          echo "------------------------------------"
-          echo "message:"
-          echo "$output"
-          printf '%s\n' "$output" | grep "Please review and stage the changes."
-          echo "------------------------------------"
-          exit 1
-        fi
-        rm -rf build
+        set -euo pipefail
+        source .venv/bin/activate
+        mkdir -p build
+        (cd build && cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON)
+        output=$(./format.sh 2>&1) || true
+        echo "------------------------------------"
+        echo "$output"
+        echo "------------------------------------"
+        if printf '%s\n' "$output" | grep -q "Please review and stage the changes."; then
+          exit 1
+        fi
+        if ! git diff --quiet; then
+          git --no-pager diff
+          exit 1
+        fi
+        rm -rf build

13-13: Consider pinning to macos-14 for stable Apple Silicon/MPS runners.

Reduces surprises from future macos-latest changes.

-    runs-on: [macos-latest]
+    runs-on: macos-14

Also applies to: 49-49

77-81: Optional: print ccache stats after tests.

Quick visibility into cache effectiveness.

     - name: Run metal test
       run: |
         cd testing/python
         unset PYTHONPATH
         python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
+
+    - name: ccache stats
+      run: ccache -s || true

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca6546ac34e89cde32673641f20a0aca904533c8 and e9cd15f744d1cb61534c907cad248305982953e0.

📒 Files selected for processing (1)

.github/workflows/metal_ci.yml (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

63-63: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 16-16: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: bot-task

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

.github/workflows/metal_ci.yml (1)
64-64: Fix cache key: matrix.os is undefined in this workflow
Using matrix.os without a matrix breaks actionlint and can affect cache keys. Use runner context.
-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}

🧹 Nitpick comments (3)

.github/workflows/metal_ci.yml (3)
14-14: Trim trailing whitespace (yamllint error)

There’s a trailing space after write flagged by YAMLlint.
-      contents: write 
+      contents: write
11-11: Pin macOS runner to Apple Silicon image for stable MPS/Metal
macos-latest can shift. Pin to macos-14 (or your validated version) to ensure consistent MPS/Metal availability.
-    runs-on: [macos-latest]
+    runs-on: macos-14
(Apply to both jobs.)

Also applies to: 47-47

13-15: Restrict permissions to least privilege
This job doesn’t push changes; contents: read suffices.
-    permissions:
-      contents: write
+    permissions:
+      contents: read

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9cd15f744d1cb61534c907cad248305982953e0 and 98336a39372308bb8964379727b6155dc665cdb3.

📒 Files selected for processing (2)

.github/workflows/metal_ci.yml (1 hunks)
format.sh (1 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

64-64: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 14-14: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: format-check
GitHub Check: bot-task

.github/workflows/metal_ci.yml

format.sh

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

.github/workflows/ci.yml (3)
124-125: Stabilize Metal job: pin Apple Silicon runner and preflight MPS availability.

To avoid drift if macos-latest changes, pin to an Apple Silicon image (e.g., macos-14). Add a quick MPS check to fail fast with a clear message if MPS isn’t available.

Apply:
-    runs-on: [macos-latest]
+    runs-on: macos-14
Insert before tests:
+    - name: Verify MPS availability
+      run: |
+        python - <<'PY'
+        import sys
+        try:
+          import torch
+          ok = getattr(torch.backends, "mps", None) and torch.backends.mps.is_available()
+        except Exception:
+          ok = False
+        print(f"MPS available: {ok}")
+        sys.exit(0 if ok else 1)
+        PY
Also applies to: 156-160

133-136: Improve checkout completeness for submodules on macOS.

Ensure the whole history is available when submodules/versions require it.

Apply:
       with:
         submodules: recursive
+        fetch-depth: 0
12-14: Least-privilege: format-check likely only needs read permission.

Unless you auto-commit formatting, drop to contents: read.

Apply:
-    permissions:
-      contents: write 
+    permissions:
+      contents: read

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98336a39372308bb8964379727b6155dc665cdb3 and fd8a47bf287afd3a2435815e3b9370d4e2ec4d49.

📒 Files selected for processing (1)

.github/workflows/ci.yml (2 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/ci.yml

141-141: property "os" is not defined in object type {}

(expression)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: format-check
GitHub Check: bot-task

coderabbitai · 2025-09-12T11:20:18Z

.github/workflows/ci.yml

@@ -7,7 +7,7 @@ env:

 jobs:
  format-check:
-    runs-on: [self-hosted, nvidia]
+    runs-on: [ubuntu-latest]


⚠️ Potential issue

Make format-check runner GPU-agnostic: drop flash-attn build and don’t force CUDA in CMake.

On ubuntu-latest, building flash_attn commonly fails (no CUDA toolchain). Also forcing -DUSE_CUDA=ON can make CMake error out on non-GPU runners. These can break formatting CI unrelated to GPU.

Apply:

- pip install flash_attn==2.5.8 --no-user --no-build-isolation + # GPU-specific deps are not needed for format check; avoid fragile builds + true

- cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd .. + # Generate compile_commands.json without assuming CUDA + cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=OFF; cd ..

Also applies to: 41-45, 53-54

🤖 Prompt for AI Agents

.github/workflows/ci.yml around line 10 (and also update the same changes at lines 41-45 and 53-54): the format-check runner is currently tied to GPU tooling — remove the step that builds flash-attn and stop forcing CUDA in CMake for this runner; update the job matrix/steps so the format-check job does not invoke building flash-attn and remove or conditionally set any -DUSE_CUDA=ON CMake flags (only enable CUDA on GPU-capable jobs), ensuring the workflow uses CPU-only defaults so formatting CI will run on ubuntu-latest without requiring a CUDA toolchain.

coderabbitai · 2025-09-12T11:20:18Z

.github/workflows/ci.yml

+      uses: hendrikmuhs/ccache-action@v1.2
+      with:
+        create-symlink: true
+        key: ${{ github.job }}-${{ matrix.os }}
+


⚠️ Potential issue

Fix ccache key: matrix.os is undefined (will fail at runtime).

matrix.os isn’t defined in this job (no matrix), which actionlint already flags. Use runner.os and include Python/version and requirements hash to improve cache hits.

Apply:

- with: - create-symlink: true - key: ${{ github.job }}-${{ matrix.os }} + with: + create-symlink: true + key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements-*.txt') }} + restore-keys: | + ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

uses: hendrikmuhs/ccache-action@v1.2

with:

create-symlink: true

key: ${{ github.job }}-${{ matrix.os }}

uses: hendrikmuhs/ccache-action@v1.2

with:

create-symlink: true

key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements-*.txt') }}

restore-keys: |

${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-

🧰 Tools

🪛 actionlint (1.7.7)

141-141: property "os" is not defined in object type {}

(expression)

🤖 Prompt for AI Agents

In .github/workflows/ci.yml around lines 138 to 142 the ccache key uses matrix.os which is undefined and will fail; replace matrix.os with runner.os and expand the key to include stable identifiers like the Python version and a hash of requirements (or lockfile) to improve cache correctness and hit rate — e.g. construct the key as a combination of github.job, runner.os, the python-version variable, and a requirements.txt (or poetry/poetry.lock) hash so the cache is unique per OS/Python/deps and will not error at runtime.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.github/workflows/amd_ci.yml (1)
64-120: Remove invalid pip flag and modernize Python setup

pip --no-user is invalid and will fail; remove it (both install locations).

Update actions/setup-python to v5 in this job too.
-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
@@
-          USE_ROCM=True pip install . --no-user
+          USE_ROCM=True pip install .
@@
-        USE_ROCM=True pip install . --no-user
+        USE_ROCM=True pip install .
.github/workflows/metal_ci.yml (1)
88-94: Install the built wheel explicitly

Avoid picking up sdists (which could rebuild without USE_METAL). Install only the wheel.
-        uv pip install dist/*
+        uv pip install dist/*.whl

♻️ Duplicate comments (2)

.github/workflows/metal_ci.yml (2)
82-87: Don’t source .venv with setup-uv; ensure Metal build flag set

setup-uv already activates the env; .venv/bin/activate may not exist and will fail. Also ensure Metal is enabled during build.
-        source .venv/bin/activate
-        uv pip install build wheel ninja cmake
-        python -mbuild -w --no-isolation
+        uv pip install build wheel ninja cmake
+        USE_METAL=1 python -m build -w --no-isolation
65-70: Fix undefined matrix.os in ccache key (actionlint)

matrix.os isn’t defined; use runner context to keep keys valid and unique.
-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}

🧹 Nitpick comments (4)

setup.py (1)
102-106: Improve error message clarity.

The error message for missing CUDA_HOME uses a backslash for line continuation which could be confusing. Consider using parentheses for cleaner multi-line strings.

Apply this diff to improve readability:
-        "CUDA support is enabled by default on linux if `USE_ROCM=False`," \
-        " but CUDA_HOME is not set or detected.")
+        ("CUDA support is enabled by default on linux if `USE_ROCM=False`, "
+         "but CUDA_HOME is not set or detected."))
.github/workflows/amd_ci.yml (1)
9-62: Gating with if: false — consider a toggle and update setup-python

Replace hardcoded if: false with a single env/input (e.g., ENABLE_AMD_CI) so both jobs can be enabled together without editing YAML.

Update actions/setup-python from v2 to v5 to avoid deprecation issues.
-  format-check:
-    if: false
+  format-check:
+    if: ${{ env.ENABLE_AMD_CI == 'true' }}
@@
-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
.github/workflows/metal_ci.yml (2)
10-11: Pin runner to Apple Silicon

Metal/MPS tests may require Apple Silicon. Prefer macos-14 (arm64) over macos-latest to avoid Intel runners.
-    runs-on: [macos-latest]
+    runs-on: macos-14
12-14: Tighten permissions and fix trailing space

Use least-privilege and remove trailing space flagged by yamllint.
-    permissions:
-      contents: write 
+    permissions:
+      contents: read

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd8a47bf287afd3a2435815e3b9370d4e2ec4d49 and 32edb2446da0b9e707397767677eb68b032d9962.

📒 Files selected for processing (4)

.github/workflows/amd_ci.yml (2 hunks)
.github/workflows/ci.yml (2 hunks)
.github/workflows/metal_ci.yml (1 hunks)
setup.py (11 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/ci.yml

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-test-metal
GitHub Check: bot-task

🔇 Additional comments (9)

setup.py (8)

36-43: LGTM! Boolean environment parsing is now more robust.

The updated _read_bool_env function correctly handles both truthy and falsy values explicitly, including empty strings as false. This prevents the previous issue where explicit false values like "0" or "false" would incorrectly return the default.

65-81: LGTM! Platform-specific library naming is correctly handled.

The prebuild items lists are properly differentiated between Linux (.so) and macOS (.dylib) shared libraries, with appropriate deletion items for macOS debug symbols.

614-622: LGTM! CMakeExtension constructor is properly updated.

The addition of **kwargs parameter allows for flexible extension configuration while maintaining backward compatibility.

669-669: LGTM! Platform-specific file globbing is correctly implemented.

The dynamic selection between .so and .dylib files based on the platform is appropriate for cross-platform compatibility.

700-710: Document Metal-specific Cython compilation issue.

The comment explains the issue with Cython compilation on Metal/macOS requiring libpython linkage. This is a reasonable workaround until the issue is investigated further.

761-763: Document the macOS Cython linking issue.

The FIXME comment correctly identifies that aarch64 Darwin needs additional Python library linking. This is consistent with the earlier Metal-specific handling.

822-857: LGTM! CMake configuration logic properly handles Metal backend.

The Metal-first configuration approach is well-implemented:

Metal takes precedence when USE_METAL is enabled

Proper fallback to ROCM/CUDA backends

Correct mutual exclusion of backends

487-494: LGTM! Cleanup logic for platform-specific artifacts is properly implemented.

The loop handles deletion of macOS-specific build artifacts (like .dSYM files) that need to be removed during the build process.

.github/workflows/metal_ci.yml (1)

35-49: Format step looks good

CMake prep with USE_METAL=ON and clean failure output are clear.

coderabbitai · 2025-09-12T13:35:43Z

setup.py

+IS_LINUX = platform.system() == 'Linux'
+MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+
 # Add LLVM control environment variable
-USE_LLVM = os.environ.get("USE_LLVM", "False").lower() == "true"
+USE_LLVM = _read_bool_env('USE_LLVM')
+# Add ROCM control environment variable
+USE_ROCM = _read_bool_env("USE_ROCM")
+# Add ROCM control environment variable
+USE_METAL = _read_bool_env("USE_METAL", MAYBE_METAL)
 # Add ROCM control environment variable
-USE_ROCM = os.environ.get("USE_ROCM", "False").lower() == "true"
+USE_CUDA = _read_bool_env("USE_CUDA", IS_LINUX and not USE_ROCM)
 # Build with Debug mode


💡 Verification agent

🧩 Analysis chain

Verify Metal availability detection logic.

The MAYBE_METAL variable checks for arm64 architecture on macOS, but this may not be sufficient for determining Metal support availability. Consider if additional checks are needed for Metal framework availability.

🏁 Script executed:

#!/bin/bash # Description: Check if Metal detection logic in other files is consistent with setup.py # Search for Metal availability checks in the codebase rg -n "check_metal_availability|metal.*availability" --type=py -A3 -B3 # Look for platform-specific Metal checks rg -n "mac_ver|arm64|MAYBE_METAL" --type=py -A2 -B2

Length of output: 2480

Replace arch-only Metal detection with feature-based detection

Both setup.py (MAYBE_METAL = platform.mac_ver()[2] == 'arm64') and tilelang/utils/target.py (check_metal_availability returning arch == 'arm64') assume Metal only on arm64 and thus exclude Intel Macs that may support Metal. Replace these ad-hoc checks with a single helper that verifies macOS release and Metal framework availability (e.g., presence of /System/Library/Frameworks/Metal.framework or attempting to load the framework) and use it both in setup.py and target selection.

Locations: setup.py (MAYBE_METAL, ~lines 51–62); tilelang/utils/target.py (check_metal_availability, ~lines 46–51; used at ~line 84).

🤖 Prompt for AI Agents

In setup.py around lines 51–62 and tilelang/utils/target.py around lines 46–51 and usage at ~84, replace the arch-only Metal detection with a feature-based helper: add a cross-file helper is_metal_available() (in a shared util module or a small platform_helpers.py) that returns True only if platform.system() == 'Darwin', macOS release is new enough (parse platform.mac_ver()[0] or platform.release()), and the Metal framework is present (check existence of /System/Library/Frameworks/Metal.framework or attempt to load it via ctypes.CDLL); then change MAYBE_METAL in setup.py to call is_metal_available() instead of checking arm64, and replace check_metal_availability / arch == 'arm64' in tilelang/utils/target.py to call this new helper so both files use the same robust feature-detection logic.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

.github/workflows/amd_ci.yml (2)

41-45: Remove invalid pip flag --no-user.

pip install --no-user isn’t a valid option and will fail.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

98-105: Remove --no-user in remaining pip installs.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
@@
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-rocm.txt
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-rocm.txt
@@
-          USE_ROCM=True pip install . --no-user
+          USE_ROCM=True pip install .
@@
-        USE_ROCM=True pip install . --no-user
+        USE_ROCM=True pip install .

Also applies to: 112-112

setup.py (2)

271-279: get_cmake_path can raise TypeError when CMake is missing.

Guard None before os.path.exists.

 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path

843-857: Honor USE_CUDA=False and explicitly disable CUDA when using Metal.

Current logic enables CUDA if CUDA_HOME is set even when USE_CUDA=0, and Metal path doesn’t force USE_CUDA OFF.

-        if USE_METAL:
+        if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
                 "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
-        elif USE_ROCM:
+        elif USE_ROCM:
             content_lines += [
                 f"set(USE_ROCM {ROCM_HOME})",
                 "set(USE_CUDA OFF)",
             ]
-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]

♻️ Duplicate comments (4)

.github/workflows/ci.yml (1)

41-45: Remove invalid --no-user and avoid fragile GPU deps in format-check.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
+          # GPU-specific deps not needed for formatting
+          true
@@
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
-          pip install . --no-user
+          # Skip GPU-only deps here
+          pip install .
@@
-        pip install . --no-user -v
+        pip install . -v

-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd ..
+        # Do not force CUDA for formatting
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=OFF; cd ..

Also applies to: 101-103, 109-109

.github/workflows/metal_ci.yml (2)

84-86: Don’t source .venv when using setup-uv with activation.

The UV action already activates the environment; .venv may not exist.

-        source .venv/bin/activate
-        uv pip install -v --no-build-isolation .
+        uv pip install -v --no-build-isolation .

69-69: Fix cache key: matrix.os is undefined (actionlint error).

Use runner context to avoid runtime failure.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-${{ runner.arch }}-py${{ env.PYTHON_VERSION }}
+        restore-keys: |
+          ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-

setup.py (1)

36-44: Boolean env parser misinterprets explicit false values (and ignores empty string).

if env := os.environ.get(name): skips ''; elif env in ['', ...] is unreachable. This can enable backends unintentionally.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ("1", "true", "on", "yes", "y"):
+        return True
+    if s in ("0", "false", "off", "no", "n", ""):
+        return False
+    return default

🧹 Nitpick comments (13)

.github/workflows/amd_ci.yml (4)
11-11: Don’t hard-disable AMD CI; gate via repo var or input.

Replace if: false with a controllable switch so maintainers can re-enable without code changes.
-    if: false
+    if: ${{ vars.DISABLE_AMD_CI != '1' }}
65-65: Same here: avoid permanent disable.

Use a variable-gated condition instead of if: false.
-    if: false
+    if: ${{ vars.DISABLE_AMD_CI != '1' }}
24-26: Upgrade setup-python to v5.

actions/setup-python@v2 is deprecated.
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}
Also applies to: 79-81

15-15: Trim trailing whitespace in permissions.
-      contents: write 
+      contents: write
.github/workflows/ci.yml (4)
10-10: Avoid permanent disable; gate via a repo variable.
-    if: false
+    if: ${{ vars.DISABLE_NVIDIA_CI != '1' }}
66-66: Same for build-test-nvidia: use a toggle instead of if: false.
-    if: false
+    if: ${{ vars.DISABLE_NVIDIA_CI != '1' }}
23-25: Update to actions/setup-python@v5.
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}
Also applies to: 80-82

14-14: Trailing space after contents: write.
-      contents: write 
+      contents: write
.github/workflows/metal_ci.yml (2)
13-13: Trim trailing whitespace.
-      contents: write 
+      contents: write
40-40: Consider generator parity with setup.py.

If the build expects Ninja elsewhere, specify -G Ninja here too for consistency, or rely on default uniformly.
setup.py (3)
51-53: Metal detection: prefer platform.system()/platform.machine().

platform.mac_ver() is brittle; simplify and be explicit.
-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == "Linux"
+MAYBE_METAL = (platform.system() == "Darwin" and platform.machine() == "arm64")
900-903: Classifiers/platforms: include macOS when Metal builds are possible.
-    platforms=[
-        "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
-    ],
+    platforms=[
+        ("Environment :: GPU :: NVIDIA CUDA" if (USE_CUDA and not USE_ROCM) else
+         "Environment :: GPU :: AMD ROCm" if USE_ROCM else
+         "Environment :: GPU :: Apple Metal"),
+        ("Operating System :: MacOS" if USE_METAL else "Operating System :: POSIX :: Linux"),
+    ],
103-106: Long exception message (TRY003).

Consider shortening or using a constant; non-blocking.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd8a47bf287afd3a2435815e3b9370d4e2ec4d49 and 608fef209e5a99d4fe6a6cc6d1feb5b1abcf7092.

📒 Files selected for processing (4)

.github/workflows/amd_ci.yml (2 hunks)
.github/workflows/ci.yml (2 hunks)
.github/workflows/metal_ci.yml (1 hunks)
setup.py (11 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

coderabbitai · 2025-09-12T13:41:14Z

.github/workflows/metal_ci.yml

+        python-version: ${{ env.PYTHON_VERSION }}
+
+    - name: Ensure venv (local & persistent)
+      run: uv pip install -r requirements-test.txt -r requirements-build.txt


⚠️ Potential issue

Guard optional requirements to avoid failures if the file is missing.

- run: uv pip install -r requirements-test.txt -r requirements-build.txt + run: | + set -e + [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt + [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

run: uv pip install -r requirements-test.txt -r requirements-build.txt

run: |

set -e

[[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt

[[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt

🤖 Prompt for AI Agents

In .github/workflows/metal_ci.yml around line 80, the step currently tries to install optional requirement files unconditionally which will fail the job if those files are missing; change the run command to detect each optional requirements file (e.g., requirements-test.txt and requirements-build.txt) and only include them in the pip install command if they exist (use a shell conditional to build the -r arguments or test -f before calling pip), ensuring that missing files are skipped but real install errors still fail the job.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

setup.py (1)

271-279: get_cmake_path can raise TypeError when cmake is missing

shutil.which() returns None; os.path.exists(None) throws. Guard None explicitly.

 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
-        raise Exception("CMake is not installed, please install it first.")
-    return cmake_path
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
+        raise RuntimeError("CMake is not installed or not on PATH.")
+    return cmake_path

♻️ Duplicate comments (5)

.github/workflows/metal_ci.yml (3)

84-85: Don’t source .venv with setup-uv; it’s already activated

.setup-uv with activate-environment: true exports the env; .venv may not exist.
-        source .venv/bin/activate
-        uv pip install -v --no-build-isolation .
+        uv pip install -v --no-build-isolation .
69-69: Fix actionlint error: undefined matrix.os in cache key

No matrix is defined; this breaks actionlint and can impact cache reuse.
-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}
79-81: Guard optional requirement files

Unconditionally installing both files fails if one is absent.
-      run: uv pip install -r requirements-test.txt -r requirements-build.txt
+      run: |
+        set -euo pipefail
+        [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
+        [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt

setup.py (2)

51-62: Prefer feature-based Metal detection over arch-only heuristic

Intel Macs can support Metal; arch-only check excludes them. Centralize detection (Darwin + Metal.framework presence) and reuse.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+
+def _has_metal() -> bool:
+    if platform.system() != 'Darwin':
+        return False
+    return os.path.exists('/System/Library/Frameworks/Metal.framework')
+
+MAYBE_METAL = _has_metal()

36-43: Boolean env parsing bug: empty string never reaches the “false” branch

The truthy guard prevents '' from being treated as false; explicit “0/false/off/''” can incorrectly return default.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

🧹 Nitpick comments (4)

.github/workflows/metal_ci.yml (2)

12-14: Drop unnecessary write permission and fix trailing space

This job doesn't publish artifacts; reduce to read-only and remove trailing space on Line 13.
-    permissions:
-      contents: write 
+    permissions:
+      contents: read
71-78: Ensure Ninja is present or fall back to default generator

CMake is invoked with “-G Ninja” in setup.py; missing Ninja will fail on some mac runners.
     - name: Install python via uv
       uses: astral-sh/setup-uv@v6
       with:
         enable-cache: true
         ignore-nothing-to-cache: true
         activate-environment: true
         python-version: ${{ env.PYTHON_VERSION }}
+
+    - name: Ensure Ninja
+      run: |
+        if ! command -v ninja >/dev/null 2>&1; then
+          brew update && brew install ninja
+        fi
Optionally, make setup.py fall back to default generator if Ninja is absent.

setup.py (2)

823-825: Prefer Python3_EXECUTABLE for CMake

FindPython prefers Python3_EXECUTABLE; improves compatibility across CMake toolchains.

-            f"-DPython_EXECUTABLE={sys.executable}",
+            f"-DPython3_EXECUTABLE={sys.executable}",

844-859: Always emit mutually exclusive GPU flags

When CUDA is selected, set USE_ROCM OFF; when ROCm is selected, set USE_CUDA OFF; when neither is selected, set both OFF explicitly to avoid stale cache effects.

         if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
-                "set(USE_ROCM OFF)",
+                "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
         elif USE_ROCM:
             content_lines += [
                 f"set(USE_ROCM {ROCM_HOME})",
                 "set(USE_CUDA OFF)",
             ]
         elif CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 608fef209e5a99d4fe6a6cc6d1feb5b1abcf7092 and 4c42d7eea70d878a010888bc4944585c059109e7.

📒 Files selected for processing (2)

.github/workflows/metal_ci.yml (1 hunks)
setup.py (11 hunks)

🧰 Additional context used

🪛 actionlint (1.7.7)

.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)

.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)

setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-test-metal
GitHub Check: build-test-amd
GitHub Check: bot-task

🔇 Additional comments (2)

setup.py (2)
102-111: Linux-only GPU requirement may be too strict

This hard-errors on Linux without CUDA/ROCM even for CPU-only users or CI. Confirm this is intentional; otherwise downgrade to warning or gate behind env.

If relaxing is desired:
-if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
-    raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
-    )
+if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
+    logger.warning("No CUDA/ROCM detected; building CPU-only components.")
700-710: Remove CythonExtension from ext_modules when MAYBE_METAL is true

setup.py returns early for Metal without producing the Cython artifact, but ext_modules still declares CythonExtension — this can break packaging/editable installs.

File: setup.py (around lines 700–710)
-    ext_modules=[
-        CMakeExtension("TileLangCXX", sourcedir="."),
-        CythonExtension("TileLangCython", sourcedir="."),
-    ],
+    ext_modules=[
+        CMakeExtension("TileLangCXX", sourcedir="."),
+    ] + ([] if MAYBE_METAL else [CythonExtension("TileLangCython", sourcedir=".")]),
Alternative: keep the extension but provide a stub .so or mark the extension optional so build_ext is satisfied.

.github/workflows/metal_ci.yml

coderabbitai · 2025-09-12T14:43:38Z

setup.py

+                            # fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`
                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
+                            # logger.info(command)
                            os.system(command)


🛠️ Refactor suggestion

Fix Darwin link flags for Cython build (non-Metal macs)

Using “-shared” on macOS is incorrect; use -dynamiclib and allow undefined symbols to Python.

- command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}" - # logger.info(command) - os.system(command) + if sys.platform == "darwin": + ldflags = "-dynamiclib -undefined dynamic_lookup" + else: + ldflags = "-shared -pthread -fPIC" + command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}" + logger.info(command) + subprocess.check_call(command, shell=True)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`

command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"

# logger.info(command)

os.system(command)

# fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`

if sys.platform == "darwin":

ldflags = "-dynamiclib -undefined dynamic_lookup"

else:

ldflags = "-shared -pthread -fPIC"

command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"

logger.info(command)

subprocess.check_call(command, shell=True)

🧰 Tools

🪛 Ruff (0.12.2)

765-765: Starting a process with a shell, possible injection detected

(S605)

🤖 Prompt for AI Agents

In setup.py around lines 762-765, the build command uses the Linux-style "-shared" which is incorrect on macOS; change the link flags for Darwin to use "-dynamiclib" and allow undefined symbols to be resolved by Python by adding "-undefined dynamic_lookup" (and keep other flags as-is or remove -fPIC if desired for macOS), so on macOS construct the command with -dynamiclib and -undefined dynamic_lookup instead of -shared before invoking os.system(command).

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

setup.py (2)

674-683: Copy macOS-built Python extension modules (.so) too; not just .dylib.

On macOS, C extensions are typically .so. Include both patterns.

-import glob
-files = glob.glob("*.so" if IS_LINUX else "*.dylib")
+import glob
+files = glob.glob("*.so")
+if not IS_LINUX:
+    files += glob.glob("*.dylib")

839-854: CMake GPU toggles: explicitly disable unused backends and honor USE_CUDA flag.

In Metal branch, also set USE_CUDA OFF.
In CUDA branch, respect USE_CUDA rather than presence of CUDA_HOME alone.

-        if USE_METAL:
+        if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
                 "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
         elif USE_ROCM:
@@
-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_METAL OFF)",
+                "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
+            ]

♻️ Duplicate comments (2)

setup.py (2)

36-44: Boolean env parsing still mishandles empty strings and unrecognized values; treat true/false explicitly.

Current truthiness gate skips empty strings; also missing yes/no aliases. Use explicit parsing.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

753-761: Fix Darwin link flags for Cython artifact and avoid shell-based os.system.

Use -dynamiclib and -undefined dynamic_lookup on macOS; prefer subprocess.check_call without shell.

-                            cython = get_cython_compiler()
+                            cython = get_cython_compiler()
                             if cython is None:
                                 raise Exception("Cython is not installed, please install it first.")
-                            os.system(f"{cython} {cython_wrapper_path} --cplus -o {source_path}")
+                            subprocess.check_call([cython, str(cython_wrapper_path), "--cplus", "-o", str(source_path)])
                             python_include_path = sysconfig.get_path("include")
                             cc = get_cplus_compiler()
-                            if MAYBE_METAL:
-                                cc += ' -Wl,-undefined,dynamic_lookup'
-                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
-                            logger.info(command)
-                            os.system(command)
+                            if sys.platform == "darwin":
+                                cmd = [
+                                    cc, "-dynamiclib", "-undefined", "dynamic_lookup",
+                                    "-fwrapv", "-O2", "-Wall", "-fno-strict-aliasing",
+                                    "-I", python_include_path, str(source_path), "-o", str(temp_path)
+                                ]
+                            else:
+                                cmd = [
+                                    cc, "-shared", "-pthread", "-fPIC",
+                                    "-fwrapv", "-O2", "-Wall", "-fno-strict-aliasing",
+                                    "-I", python_include_path, str(source_path), "-o", str(temp_path)
+                                ]
+                            logger.info(" ".join(cmd))
+                            subprocess.check_call(cmd)

🧹 Nitpick comments (3)

setup.py (3)
56-66: Refine Metal default detection; avoid platform.mac_ver()[2] and gate by OS/arch/version.

Arm64-only check is brittle. Prefer feature/OS checks and MPS minimum macOS.
-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+def _is_apple_silicon_macos(min_ver: str = "12.3") -> bool:
+    if platform.system() != "Darwin":
+        return False
+    try:
+        from packaging.version import Version
+        mac_ver = platform.mac_ver()[0] or "0"
+        return platform.machine().lower() in ("arm64", "aarch64") and Version(mac_ver) >= Version(min_ver)
+    except Exception:
+        return platform.machine().lower() in ("arm64", "aarch64")
+MAYBE_METAL = _is_apple_silicon_macos()
Also consider centralizing this helper (shared with target selection) to keep detection consistent across files.

920-930: Optional: expand classifiers/platforms for Metal/macOS builds.

Consider adding macOS classifier and GPU backend markers reflecting Metal when enabled.

46-69: Optional: document env var precedence and defaults in README.

Clarify how USE_CUDA/USE_ROCM/USE_METAL interact and defaults per platform to reduce surprise.

I can draft a short “Build matrix and env vars” section for README if helpful.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c42d7eea70d878a010888bc4944585c059109e7 and 7037ce81e29c74ac7a31a48b1e795aca5254e496.

📒 Files selected for processing (1)

setup.py (11 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

setup.py (1)

tilelang/env.py (1)

get (136-139)

🪛 Ruff (0.12.2)

setup.py

108-110: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-test-metal
GitHub Check: format-check
GitHub Check: format-check

🔇 Additional comments (3)

setup.py (3)

400-401: LGTM: Linux-only rpath patch guard.

70-86: LGTM: platform-specific prebuilt artifact lists and .dSYM cleanup.

619-623: LGTM: Extension kwargs passthrough for CMakeExtension.

setup.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

testing/python/metal/test_metal_codegen.py (1)

52-53: Tensor dimensions are correct for matrix multiplication.

The tensor dimensions in the test are properly set up for matrix multiplication A @ B:

a has shape (M, K)

b has shape (K, N)

c has shape (M, N)

This correctly implements the mathematical operation C = A × B where the inner dimension K is consistent.

Also applies to: 55-57

🧹 Nitpick comments (2)

testing/python/metal/test_metal_codegen.py (2)
18-20: Consider memory alignment for shared memory allocations.

The shared memory allocations for A_shared and B_shared may benefit from memory alignment optimizations to improve performance on Metal. Consider aligning the allocation sizes to powers of 2 or Metal's preferred alignment boundaries.

79-80: Add error message when MPS is not available.

The test silently does nothing when MPS is not available. Consider adding an informative message or skip marker to clarify why tests aren't running.
 if __name__ == "__main__":
     if torch.mps.is_available():
         tilelang.testing.main()
+    else:
+        print("MPS not available. Skipping Metal backend tests.")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7037ce81e29c74ac7a31a48b1e795aca5254e496 and 5dafa7782e286ebec4ba8a41cc41060259260d34.

📒 Files selected for processing (1)

testing/python/metal/test_metal_codegen.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

testing/python/metal/test_metal_codegen.py (6)

tilelang/jit/__init__.py (1)

jit (241-314)

tilelang/language/allocate.py (2)

alloc_shared (21-36)

alloc_fragment (53-64)

tilelang/language/fill.py (1)

clear (24-48)

tilelang/language/pipeline.py (1)

Pipelined (9-46)

tilelang/language/copy.py (1)

copy (84-152)

tilelang/jit/kernel.py (1)

kernel_source (470-471)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-test-amd
GitHub Check: build-test-metal
GitHub Check: bot-task

🔇 Additional comments (2)

testing/python/metal/test_metal_codegen.py (2)

63-63: Good test coverage for kernel source verification.

The assertion that kernel_source is not None properly validates that the Metal backend generates source code as expected.

25-26: Verify coalesced_width=2 for Metal backend.
T.copy uses coalesced_width=2 in testing/python/metal/test_metal_codegen.py (lines 25–26, 31). Confirm that 2 is the optimal coalescing width for the Metal backend and the tensor element type; if not, update to the backend-specific value or make it configurable.

coderabbitai · 2025-09-13T04:24:51Z

testing/python/metal/test_metal_codegen.py

+
+
+def test_gemm_float16():
+    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)


🛠️ Refactor suggestion

High tolerance values for float16 and int32 tests.

The tolerance value of atol=1 for float16 and int32 tests seems quite high. For int32, the results should match exactly (atol=0), and for float16, a smaller tolerance might be more appropriate to catch potential precision issues.

def test_gemm_float16(): - assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1) + assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1e-2) def test_gemm_int32(): - assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='int32', atol=1) + assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='int32', atol=0)

Also applies to: 75-75

🤖 Prompt for AI Agents

In testing/python/metal/test_metal_codegen.py around lines 71 and 75, the tests use an overly large absolute tolerance (atol=1) for float16 and int32; change the int32 assertions to use atol=0 (exact match) and tighten the float16 tolerance (e.g., atol=1e-2 or another small value appropriate for float16 precision) so the tests detect precision regressions while still allowing expected float16 rounding error.

oraluben · 2025-10-06T07:14:10Z

Github hosted runner has a 2000 minutes limit per month, and macos runner even applies a 10x factor of its time.

By default, macos aarch runner only have 3 CPUs so building full tvm and tilelang will instantly run that out.

Unfortunately ccache is not an optional optimization, but necessary to make the test works.

(Actually that's about ~100 runs per month, which does not sounds very sufficient. Should we only run Metal test on main?)

I also uses uv to get a faster venv preparation.

douyixuan · 2025-10-20T12:13:57Z

Currently metal target only support tl.jit(execution_backend="torch")

* Reset * Fix other CUDA issue * fmt * fmt * fix cuda error * fix * fix * fmt * cleanup * fix * remove copyright * trivial update * readme update * lint fix --------- Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com> Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>

oraluben force-pushed the metal branch 3 times, most recently from 68f58dd to 82dbdc0 Compare September 12, 2025 03:37

oraluben marked this pull request as ready for review September 12, 2025 03:49

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

oraluben added a commit to oraluben/tvm that referenced this pull request Sep 12, 2025

Clear ext_lib_dll_names for macOS platform

207b705

Removed external library DLL names for macOS. found during tile-ai/tilelang#799 cc @LeiWang1999

oraluben mentioned this pull request Sep 12, 2025

Clear ext_lib_dll_names for macOS platform apache/tvm#18304

Merged

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

setup.py Show resolved Hide resolved

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

FrozenGene pushed a commit to apache/tvm that referenced this pull request Sep 12, 2025

Clear ext_lib_dll_names for macOS platform (#18304)

85dc1d7

Removed external library DLL names for macOS. found during tile-ai/tilelang#799 cc @LeiWang1999

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

.github/workflows/metal_ci.yml Outdated Show resolved Hide resolved

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

oraluben force-pushed the metal branch 3 times, most recently from 9f7474c to 98336a3 Compare September 12, 2025 10:21

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

.github/workflows/metal_ci.yml Outdated Show resolved Hide resolved

format.sh Outdated Show resolved Hide resolved

coderabbitai bot reviewed Sep 12, 2025

View reviewed changes

coderabbitai bot reviewed Sep 13, 2025

View reviewed changes

setup.py Show resolved Hide resolved

setup.py Show resolved Hide resolved

coderabbitai bot reviewed Sep 13, 2025

View reviewed changes

trivial update

79b1941

oraluben force-pushed the metal branch from 17302f7 to 79b1941 Compare October 6, 2025 07:31

LeiWang1999 previously approved these changes Oct 7, 2025

View reviewed changes

Merge branch 'main' of https://github.com/tile-ai/tilelang into metal

41d579e

LeiWang1999 dismissed their stale review via 41d579e October 7, 2025 05:39

readme update

19fd7dc

LeiWang1999 previously approved these changes Oct 7, 2025

View reviewed changes

lint fix

2451073

LeiWang1999 dismissed their stale review via 2451073 October 7, 2025 06:39

LeiWang1999 merged commit 7fb0677 into tile-ai:main Oct 7, 2025
7 of 8 checks passed

oraluben deleted the metal branch October 7, 2025 07:18

coderabbitai bot mentioned this pull request Oct 9, 2025

[Build] Migrate to scikit-build-core #939

Merged

21 tasks

oraluben mentioned this pull request Oct 14, 2025

[CI][Refactor] Merge test CI workflow files into one #973

Merged

This was referenced Oct 24, 2025

[Language] Initial version of tilelang frontend v2 #1120

Merged

[BugFix] alloc_var init failed to handle complex expression #1144

Merged

This was referenced Nov 12, 2025

[Fix] Fix a type that make wrong T.macro backtrace #1234

Merged

[Language] Add type stubs for tir op #1239

Merged

coderabbitai bot mentioned this pull request Nov 14, 2025

[FFI] Use tvm ffi as the default execution backend #1259

Merged

This was referenced Nov 21, 2025

[Feat] Add missing support for uint32x2, add unsigned implicit cast in bitwise op, add T.Ref as macro annotation #1302

Closed

[Fix] Remove unused let_bindings_ in CodeGenC to fix #1300 #1305

Merged

[Fix] Fix frame scope error in T.macro #1308

Merged

		if is_metal_target(target):
		assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'

		self.kernel_name = func_or_mod.__name__ + '_kernel'
		self.verbose = verbose

-        self.kernel_name = func_or_mod.__name__ + '_kernel'
-        self.verbose = verbose
+        self.kernel_global_source = kernel_global_source
+        if hasattr(func_or_mod, '__name__'):
+            self.kernel_name = func_or_mod.__name__ + '_kernel'
+        elif hasattr(func_or_mod, 'get_global_vars'):
+            # Handle IRModule case
+            global_vars = func_or_mod.get_global_vars()
+            if global_vars:
+                self.kernel_name = global_vars[0].name_hint + '_kernel'
+            else:
+                raise ValueError("Cannot determine kernel name from IRModule")
+        else:
+            raise ValueError(f"Cannot determine kernel name from {type(func_or_mod)}")
+        self.verbose = verbose


		if self._kernel is None:

		_kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)

-            _kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)
+            try:
+                compiled_shader = torch.mps.compile_shader(self.kernel_global_source)
+                if not hasattr(compiled_shader, self.kernel_name):
+                    raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'")
+                _kernel = getattr(compiled_shader, self.kernel_name)
+            except Exception as e:
+                raise RuntimeError(f"Failed to compile Metal shader: {e}")

-      run: uv pip install -r requirements-test.txt -r requirements-build.txt
+      run: |
+        set -e
+        [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
+        [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt



		def test_gemm_float16():
		assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)

[Backend] Add metal backend #799

[Backend] Add metal backend #799

Uh oh!

Conversation

oraluben commented Sep 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Sep 10, 2025

Uh oh!

oraluben commented Sep 12, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

oraluben commented Sep 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 10, 2025 •

edited

Loading

oraluben commented Oct 6, 2025 •

edited

Loading