Skip to content

Conversation

@oraluben
Copy link
Contributor

@oraluben oraluben commented Sep 10, 2025

Add metal backend via tvm's metal codegen and torch's torch.mps.compile_shader, and create a new torch backend, who only supports MPS for now. Some tasks are left for future works.

Performance numbers:

PyTorch's MPS backend uses matrixMultiplicationWithPrimaryTensor by default, which is faster than default implementation.
For pytorch's native MPS mm impl (enable with PYTORCH_MPS_PREFER_METAL=1), we can observe significant speedup.

In general, tilelang is 2-3x faster than torch native impl and 2-3x slower than MPSGraph impl. tilelang impl could be optimized via implementating T.gemm by matrixMultiplicationWithPrimaryTensor.

Outliners in the table below seem to be error and disappear after rerun.

config                     tl_run1    tl_run2        torch    torch_native
----------------------  ----------  ---------  -----------  --------------
1024_1024_1024,float32    0.203224   0.119603    0.0732187        0.454423
1024_1024_1024,float16    0.118059   0.171631    0.0509571        0.329795
1024_1024_2048,float32    0.21623    0.277052    0.0956315        1.01536
1024_1024_2048,float16    0.163326   0.245721    0.0952914        0.738723
1024_1024_4096,float32    0.364069   0.58848     0.18605          2.02241
1024_1024_4096,float16    0.321528   0.519458    0.164556         1.51384
1024_2048_1024,float32    0.17366    0.236656    0.0963473        0.991771
1024_2048_1024,float16    0.174649   0.251009    0.0902327        0.663332
1024_2048_2048,float32    0.350103   0.55103     0.177765         1.99991
1024_2048_2048,float16    0.322436   0.529719    0.160975         1.3848
1024_2048_4096,float32    0.711705   1.1608      0.3412           3.81918
1024_2048_4096,float16    0.649519   1.15448     0.308582         2.75274
1024_4096_1024,float32    0.348693   0.502921    0.17556          1.92968
1024_4096_1024,float16    0.320613   0.553743    0.162928         1.35127
1024_4096_2048,float32    0.705229   1.10684     0.335839         3.74767
1024_4096_2048,float16    0.634583   1.01857     0.305798         2.6813
1024_4096_4096,float32    1.42189    2.32234     0.657488         7.39324
1024_4096_4096,float16    1.23835    2.05512     0.589203         5.62581
2048_1024_1024,float32    0.172703   0.229385    0.0933813        0.96074
2048_1024_1024,float16    0.23833    0.337618    0.0884824        0.637932
2048_1024_2048,float32    0.349552   0.551636    0.174772         2.06767
2048_1024_2048,float16    0.309059   0.555771    0.159407         1.44814
2048_1024_4096,float32    0.705804   1.17274     0.341789         3.78212
2048_1024_4096,float16    0.623397   1.36904     0.30442          2.81867
2048_2048_1024,float32    0.340474   0.526456    0.176063         2.03803
2048_2048_1024,float16    0.474886   0.787487    0.159341         1.40929
2048_2048_2048,float32    0.702884   1.19228     0.347602       264.746
2048_2048_2048,float16    0.617617   1.03019     0.316195         2.74656
2048_2048_4096,float32    1.42331    2.59355     0.673089         7.18734
2048_2048_4096,float16    1.25659    2.48737     0.587454         5.39229
2048_4096_1024,float32    0.698205   1.09426     0.342761         3.74332
2048_4096_1024,float16    0.610077   1.15637     0.302213         3.20815
2048_4096_2048,float32    1.41329    2.46777     0.670037         7.25803
2048_4096_2048,float16    1.39938    2.26266     0.5874           5.24247
2048_4096_4096,float32    2.96685    3.01786     1.3243         676.942
2048_4096_4096,float16    2.55003    2.4255      1.16313          6.98329
4096_1024_1024,float32    0.346517   0.346101  267.917            1.24304
4096_1024_1024,float16    0.301433   0.302083    0.157588         0.881915
4096_1024_2048,float32    0.921969   0.691183    0.330206         2.44251
4096_1024_2048,float16    1.25368    0.609252    0.613118         1.76924
4096_1024_4096,float32    2.56901    1.37706     1.2552           4.88777
4096_1024_4096,float16    2.15145    1.21733     1.08383          3.47484
4096_2048_1024,float32    1.62391    0.675179    0.539141         2.43388
4096_2048_1024,float16    1.08711    0.599315    0.604424         1.7381
4096_2048_2048,float32    2.83558    1.36514     1.05902          4.84539
4096_2048_2048,float16    4.39186    1.21936     1.17894          3.50558
4096_2048_4096,float32   13.5588     2.83373     4.48778          9.69309
4096_2048_4096,float16    4.50082    2.42589     1.63996          6.96601
4096_4096_1024,float32    2.2144     1.36068     1.12703          4.85186
4096_4096_1024,float16  209.909      1.20646     0.910777         3.47439
4096_4096_2048,float32    4.83127    2.76113     2.04769          9.67094
4096_4096_2048,float16    4.50811    2.40558     1.90847          6.92157
4096_4096_4096,float32   10.6532     5.53237     4.63781         19.3358
4096_4096_4096,float16    8.64457    4.83226     4.26353         13.8529

script: https://gist.github.com/oraluben/89e54b0b687a1fa483c38429442a6d1e

TODOs:

  • Fix hard-coded torch.mps device and unify cuda with torch.accelerator (or we should use cuda by default and do this later?)
  • Benchmark gemm against pytorch/mlx (need to compile a patched version of pytorch due to a bug)
  • Fix bug, current add kernel failed when size > 1024 (invalid launch issue of wrong #thread)
  • Use proper metal/metallib backend
  • Add mac aarch64 cicd

Future work:

Summary by CodeRabbit

  • New Features

    • Apple Metal (MPS) backend with a Torch-backed execution path and Metal kernel adapter; Metal target exposed broadly.
  • Improvements

    • Automatic Metal detection and macOS-aware build/link behavior.
    • Dynamic runtime resolution of prebuilt TVM libraries.
    • Centralized device detection, broader backend warnings, and an additional JIT backend option (nvrtc).
  • Tests

    • Added Metal-targeted GEMM verification tests.
  • Chores

    • Metal install script and macOS CI workflow.
  • Documentation

    • README updated announcing Metal support.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Add Apple Metal (MPS) support across build, JIT, adapters, lowering, runtime, tests, CI and packaging; resolve prebuilt TVM library locations at configure time; add device utilities and generalize profiler/autotuner/cache behavior for CUDA and MPS backends.

Changes

Cohort / File(s) Summary
Build & CI
CMakeLists.txt, install_metal.sh, .github/workflows/metal_ci.yml, setup.py
Resolve prebuilt TVM libs via find_library variables; conditionally collect/link Metal sources; add install_metal.sh; add macOS Metal CI workflow; add platform/backend flags and gating in setup.py.
New Metal runtime source
src/target/rt_mod_metal.cc
Add placeholder Metal runtime source file (included when USE_METAL enabled).
Targets & device utilities
tilelang/utils/target.py, tilelang/utils/device.py, tilelang/contrib/cc.py
Add Metal target detection and availability checks; add IS_CUDA/IS_MPS and get_current_device(); add is_darwin() helper.
Lowering / TVM interaction
tilelang/engine/lower.py
Add Metal branch in device_codegen_without_compile calling tvm.ffi.get_global_func("target.build.metal").
JIT core & kernel flow
tilelang/jit/__init__.py, tilelang/jit/kernel.py
Enforce Metal targets require execution_backend="torch"; expand allowed backends to include "nvrtc" and "torch"; route "torch" backend to Metal adapter.
JIT adapters & wrappers
tilelang/jit/adapter/__init__.py, tilelang/jit/adapter/utils.py, tilelang/jit/adapter/wrapper.py, tilelang/jit/adapter/cython/adapter.py, tilelang/jit/adapter/torch/__init__.py, tilelang/jit/adapter/torch/metal.py
Add is_metal_target; implement MetalKernelAdapter and TLMetalSourceWrapper; add macOS linker flag selection; route cython adapter device to MPS; expose Metal adapter via torch package.
Architecture modules
tilelang/carver/arch/__init__.py, tilelang/carver/arch/metal.py, tilelang/carver/arch/*
Add METAL arch class and is_metal_arch; auto-infer Metal via torch.mps.is_available(); standardize __all__ exports across arch modules.
Testing
testing/python/metal/test_metal_codegen.py, tilelang/testing/__init__.py
Add Metal GEMM tests targeting MPS; explicitly expose requires_metal and testing symbols via __all__.
Runtime utilities, profiler, autotuner, cache, tensor
tilelang/utils/tensor.py, tilelang/utils/device.py, tilelang/profiler/bench.py, tilelang/autotuner/tuner.py, tilelang/cache/kernel_cache.py
Introduce device helper usage; generalize profiler/timing for CUDA/MPS; ensure per-device compile context in autotuner; broaden kernel_cache disk-skip to include "torch"; tensor utilities use get_current_device().

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant JIT as tilelang.jit
  participant WR as TL Wrapper
  participant LWR as Lowering
  participant TVM as TVM (target.build.metal)
  participant AD as MetalKernelAdapter
  participant MPS as torch.mps runtime

  Dev->>JIT: jit(func, target="metal", execution_backend="torch")
  JIT->>JIT: validate is_metal_target -> require backend=="torch"
  JIT->>WR: wrap(scheduled_ir, source, target=metal)
  WR-->>JIT: TLMetalSourceWrapper
  JIT->>LWR: device_codegen_without_compile(target=metal)
  LWR->>TVM: call target.build.metal(device_mod, target)
  TVM-->>JIT: device_mod + kernel_source
  JIT->>AD: create MetalKernelAdapter(func_or_mod, device_mod, kernel_source)
  Dev->>AD: call kernel(...)
  AD->>AD: on first call: torch.mps.compile_shader(kernel_source)
  AD->>MPS: launch compiled kernel with threads/group_size
  MPS-->>Dev: result
  note over AD: compiled launcher cached for subsequent calls
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

enhancement

Suggested reviewers

  • tzj-fxz

Poem

I thump my paw on macOS ground,
New Metal paths and kernels found.
Torch compiles shaders, threads align,
MPS hops swift — the carrots shine. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “[Backend] Add metal backend” succinctly and accurately summarizes the primary change of integrating a Metal backend into the project, directly reflecting the main focus of the extensive modifications without introducing extraneous details. It is concise, clear, and enables teammates to quickly understand the core purpose of the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19fd7dc and 2451073.

📒 Files selected for processing (1)
  • tilelang/profiler/bench.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/profiler/bench.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-metal
  • GitHub Check: build-test-amd
  • GitHub Check: format-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@oraluben oraluben force-pushed the metal branch 3 times, most recently from 68f58dd to 82dbdc0 Compare September 12, 2025 03:37
@oraluben oraluben marked this pull request as ready for review September 12, 2025 03:49
@oraluben
Copy link
Contributor Author

While there're tasks remain, the main implementation is ready for review @LeiWang1999

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (14)
tilelang/carver/arch/cdna.py (1)

35-38: Sort all (RUF022).

Keep exports alphabetized for consistency and to satisfy Ruff.

Apply:

-__all__ = [
-    'is_cdna_arch',
-    'CDNA',
-]
+__all__ = [
+    'CDNA',
+    'is_cdna_arch',
+]
tilelang/carver/arch/cpu.py (1)

23-26: Sort all (RUF022).

Align with other arch modules.

-__all__ = [
-    'is_cpu_arch',
-    'CPU',
-]
+__all__ = [
+    'CPU',
+    'is_cpu_arch',
+]
tilelang/contrib/cc.py (1)

93-95: Add typing/docstring to is_mac().

Small polish; helps usage elsewhere.

-def is_mac():
-    return platform.system() == 'Darwin'
+def is_mac() -> bool:
+    "True if running on macOS (Darwin)."
+    return platform.system() == "Darwin"
tilelang/carver/arch/metal.py (1)

12-15: Sort all (RUF022).

Match the pattern used in other arch modules.

-__all__ = [
-    'is_metal_arch',
-    'METAL',
-]
+__all__ = [
+    'METAL',
+    'is_metal_arch',
+]
tilelang/carver/arch/cuda.py (1)

150-159: Sort all (RUF022).

Keep exports alphabetized.

-__all__ = [
-    'is_cuda_arch',
-    'is_volta_arch',
-    'is_ampere_arch',
-    'is_ada_arch',
-    'is_hopper_arch',
-    'is_tensorcore_supported_precision',
-    'has_mma_support',
-    "CUDA",
-]
+__all__ = [
+    "CUDA",
+    'has_mma_support',
+    'is_ada_arch',
+    'is_ampere_arch',
+    'is_cuda_arch',
+    'is_hopper_arch',
+    'is_tensorcore_supported_precision',
+    'is_volta_arch',
+]
tilelang/utils/target.py (3)

1-1: Import platform.machine() for fallback.

Some Python builds return empty arch from mac_ver(); fall back to machine().

-from platform import mac_ver
+from platform import mac_ver, machine

46-52: Harden Metal detection.

Use machine() as a fallback when mac_ver()[2] is empty.

-def check_metal_availability() -> bool:
-    mac_release, _, arch = mac_ver()
-    if not mac_release:
-        return False
-    # todo: check torch version?
-    return arch == 'arm64'
+def check_metal_availability() -> bool:
+    mac_release, _, arch = mac_ver()
+    if not mac_release:
+        return False
+    # Fallback for environments where mac_ver() doesn't report machine.
+    arch = arch or machine()
+    return arch == "arm64"

84-87: Error message: say “Metal” (and keep it short per TRY003).

Also consider updating determine_target docstring to mention Metal.

-        else:
-            raise ValueError("No CUDA or HIP or MPS available on this system.")
+        else:
+            raise ValueError("No CUDA, HIP, or Metal available on this system.")

Docstring tweak (outside hunk):

"""
Determine the appropriate target for compilation (CUDA, HIP, Metal, or manual selection).
"""
tilelang/engine/lower.py (1)

184-186: Parity: add Metal branch to device_codegen as well (or explicitly gate).

device_codegen_without_compile supports "metal", but device_codegen lacks a corresponding branch; enabling device compile for Metal will currently raise.

Proposed addition (outside this hunk) in device_codegen():

elif target.kind.name == "metal":
    device_mod = tvm.ffi.get_global_func("target.build.metal")(device_mod, target)
install_metal.sh (1)

20-22: Consider more conservative parallelism settings

Using half the logical CPU cores might be too aggressive for some systems. Consider using a safer default or allowing configuration via environment variable.

-CORES=$(sysctl -n hw.logicalcpu)
-MAKE_JOBS=$(( CORES / 2 ))
-make -j${MAKE_JOBS}
+CORES=$(sysctl -n hw.logicalcpu)
+MAKE_JOBS=${MAKE_JOBS:-$(( CORES / 2 ))}
+echo "Building with ${MAKE_JOBS} parallel jobs"
+make -j${MAKE_JOBS}
tilelang/carver/arch/__init__.py (1)

40-55: Sort all list for better maintainability

The static analysis correctly identified that all should be sorted for consistency and easier maintenance.

 __all__ = [
-    'is_cpu_arch',
-    'is_cuda_arch',
-    'is_volta_arch',
-    'is_ampere_arch',
-    'is_ada_arch',
-    'is_hopper_arch',
-    'is_tensorcore_supported_precision',
-    'has_mma_support',
-    'is_cdna_arch',
-    'is_metal_arch',
+    'CDNA',
+    'CPU',
     'CUDA',
-    'CDNA',
     'METAL',
-    'CPU',
+    'has_mma_support',
+    'is_ada_arch',
+    'is_ampere_arch',
+    'is_cdna_arch',
+    'is_cpu_arch',
+    'is_cuda_arch',
+    'is_hopper_arch',
+    'is_metal_arch',
+    'is_tensorcore_supported_precision',
+    'is_volta_arch',
 ]
tilelang/jit/adapter/wrapper.py (1)

1008-1028: Consider adding docstring and error handling.

The new TLMetalSourceWrapper class lacks documentation and error handling. Consider adding:

  1. A class docstring explaining its purpose
  2. Type hints validation
  3. Error handling for Metal-specific operations
 class TLMetalSourceWrapper(object):
+    """
+    A wrapper class for the TileLang Metal backend.
+    
+    This wrapper handles Metal shader code generation for Apple Silicon GPUs.
+    Unlike CUDA/HIP wrappers, it performs minimal processing as Metal compilation
+    is handled downstream by PyTorch MPS.
+    """
 
     def __init__(self,
                  scheduled_ir_module: IRModule,
                  source: str,
                  target: Target,
                  device_mod: Optional[IRModule] = None,
                  host_mod: Optional[IRModule] = None,
                  pass_configs: Optional[Dict[str, Any]] = None):
+        if not is_metal_target(target):
+            raise ValueError(f"Expected Metal target, got {target.kind.name}")
         self.mod = scheduled_ir_module
         self.target = target
         self.source = source
         self.pass_configs = pass_configs
         self.device_mod = device_mod
         self.host_mod = host_mod
         self.lib_code = self.update_lib_code(source)
 
     def update_lib_code(self, code: str):
+        """Update and return the library code without modification.
+        
+        Metal shaders don't require the host function wrapping that
+        CUDA/HIP backends need.
+        """
         self.lib_code = code
         return self.lib_code
tilelang/jit/adapter/torch/metal.py (1)

14-26: Clean up commented parameters or document why they're excluded.

The constructor has many commented-out parameters. Either remove them entirely or add a comment explaining why they're excluded from the Metal implementation.

     def __init__(
         self,
-        #  params: List[KernelParam],
-        #  result_idx: List[int],
-        #  target: Union[str, Target],
         func_or_mod: Union[tir.PrimFunc, tvm.IRModule],
-        #  host_mod: Optional[tvm.IRModule] = None,
         device_mod: Optional[tvm.IRModule] = None,
         kernel_global_source: Optional[str] = None,
         verbose: bool = False,
-        #  pass_configs: Optional[Dict[str, Any]] = None,
-        #  compile_flags: Optional[List[str]] = None
     ):
+        """Initialize MetalKernelAdapter for PyTorch MPS execution.
+        
+        Note: Unlike other adapters, Metal doesn't require params, result_idx,
+        or compilation flags as these are handled by PyTorch MPS.
+        
+        Args:
+            func_or_mod: The function or module to compile
+            device_mod: Device module containing kernel metadata
+            kernel_global_source: Metal shader source code
+            verbose: Enable verbose logging
+        """
testing/python/metal/test_metal_codegen.py (1)

70-72: Consider more appropriate tolerance for float16.

An absolute tolerance of 1.0 for float16 seems too permissive for GEMM accuracy testing. Consider using a relative tolerance or a smaller absolute tolerance.

 def test_gemm_float16():
-    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)
+    # Use relative tolerance for float16 to better handle different magnitude results
+    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1e-2)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 409ab83 and 0eba92f6ad86fca30149d937543b01069784eafd.

📒 Files selected for processing (19)
  • CMakeLists.txt (3 hunks)
  • install_metal.sh (1 hunks)
  • testing/python/metal/test_metal_codegen.py (1 hunks)
  • tilelang/cache/kernel_cache.py (1 hunks)
  • tilelang/carver/arch/__init__.py (3 hunks)
  • tilelang/carver/arch/cdna.py (1 hunks)
  • tilelang/carver/arch/cpu.py (1 hunks)
  • tilelang/carver/arch/cuda.py (1 hunks)
  • tilelang/carver/arch/metal.py (1 hunks)
  • tilelang/contrib/cc.py (2 hunks)
  • tilelang/engine/lower.py (1 hunks)
  • tilelang/jit/__init__.py (3 hunks)
  • tilelang/jit/adapter/__init__.py (1 hunks)
  • tilelang/jit/adapter/cython/adapter.py (3 hunks)
  • tilelang/jit/adapter/torch/metal.py (1 hunks)
  • tilelang/jit/adapter/utils.py (1 hunks)
  • tilelang/jit/adapter/wrapper.py (3 hunks)
  • tilelang/jit/kernel.py (4 hunks)
  • tilelang/utils/target.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
tilelang/jit/adapter/utils.py (1)
tilelang/language/ast/ir.py (1)
  • target (1682-1713)
tilelang/carver/arch/metal.py (2)
tilelang/carver/arch/arch_base.py (1)
  • TileDevice (4-37)
tilelang/carver/template/base.py (1)
  • arch (152-159)
tilelang/jit/adapter/cython/adapter.py (3)
tilelang/jit/adapter/utils.py (4)
  • is_cuda_target (51-52)
  • is_hip_target (55-56)
  • is_cpu_target (59-60)
  • is_metal_target (63-64)
tilelang/utils/target.py (1)
  • determine_target (54-96)
tilelang/contrib/cc.py (2)
  • get_cplus_compiler (68-90)
  • is_mac (93-94)
tilelang/jit/adapter/__init__.py (1)
tilelang/jit/adapter/torch/metal.py (1)
  • MetalKernelAdapter (12-67)
tilelang/jit/kernel.py (3)
tilelang/jit/__init__.py (1)
  • jit (241-314)
tilelang/jit/adapter/utils.py (1)
  • is_metal_target (63-64)
tilelang/jit/adapter/torch/metal.py (1)
  • MetalKernelAdapter (12-67)
tilelang/carver/arch/__init__.py (1)
tilelang/carver/arch/metal.py (1)
  • METAL (8-9)
tilelang/jit/__init__.py (2)
tilelang/jit/adapter/utils.py (1)
  • is_metal_target (63-64)
tilelang/language/ast/ir.py (1)
  • target (1682-1713)
testing/python/metal/test_metal_codegen.py (6)
tilelang/jit/__init__.py (1)
  • jit (241-314)
tilelang/language/allocate.py (2)
  • alloc_shared (21-36)
  • alloc_fragment (53-64)
tilelang/language/fill.py (1)
  • clear (24-48)
tilelang/language/pipeline.py (1)
  • Pipelined (9-46)
tilelang/language/copy.py (1)
  • copy (84-152)
tilelang/jit/kernel.py (1)
  • kernel_source (470-471)
tilelang/jit/adapter/torch/metal.py (2)
tilelang/jit/adapter/base.py (2)
  • BaseKernelAdapter (8-55)
  • _post_init (54-55)
tilelang/jit/adapter/nvrtc/adapter.py (1)
  • _convert_torch_func (240-241)
tilelang/jit/adapter/wrapper.py (4)
tilelang/jit/adapter/utils.py (1)
  • is_metal_target (63-64)
tilelang/jit/adapter/cython/adapter.py (1)
  • lib_code (511-513)
tilelang/jit/adapter/ctypes/adapter.py (1)
  • lib_code (281-283)
tilelang/jit/adapter/libgen.py (1)
  • update_lib_code (53-54)
🪛 Ruff (0.12.2)
tilelang/carver/arch/cpu.py

23-26: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/carver/arch/cuda.py

150-159: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/utils/target.py

87-87: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/carver/arch/metal.py

12-15: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/carver/arch/cdna.py

35-38: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/jit/adapter/__init__.py

5-5: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tilelang/carver/arch/__init__.py

2-2: from .cuda import * used; unable to detect undefined names

(F403)


3-3: from .cpu import * used; unable to detect undefined names

(F403)


4-4: from .cdna import * used; unable to detect undefined names

(F403)


5-5: from .metal import * used; unable to detect undefined names

(F403)


22-22: METAL may be undefined, or defined from star imports

(F405)


40-55: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


41-41: is_cpu_arch may be undefined, or defined from star imports

(F405)


42-42: is_cuda_arch may be undefined, or defined from star imports

(F405)


43-43: is_volta_arch may be undefined, or defined from star imports

(F405)


44-44: is_ampere_arch may be undefined, or defined from star imports

(F405)


45-45: is_ada_arch may be undefined, or defined from star imports

(F405)


46-46: is_hopper_arch may be undefined, or defined from star imports

(F405)


47-47: is_tensorcore_supported_precision may be undefined, or defined from star imports

(F405)


48-48: has_mma_support may be undefined, or defined from star imports

(F405)


49-49: is_cdna_arch may be undefined, or defined from star imports

(F405)


50-50: is_metal_arch may be undefined, or defined from star imports

(F405)


51-51: CUDA may be undefined, or defined from star imports

(F405)


52-52: CDNA may be undefined, or defined from star imports

(F405)


53-53: METAL may be undefined, or defined from star imports

(F405)


54-54: CPU may be undefined, or defined from star imports

(F405)

tilelang/jit/adapter/torch/metal.py

44-44: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)
tilelang/carver/arch/metal.py (1)

8-9: Confirm TileDevice API expectations for METAL.

METAL is currently a stub. If any path calls TileDevice.get_avaliable_tensorintrin_shapes(), this will raise NotImplementedError. If that call path can occur, add a trivial override returning [] to avoid surprises.

Example (outside this hunk):

class METAL(TileDevice):
    def get_avaliable_tensorintrin_shapes(self):
        return []
tilelang/utils/target.py (1)

16-16: LGTM: add 'metal' to AVALIABLE_TARGETS.

This aligns the public set with the new Metal path.

tilelang/jit/adapter/__init__.py (1)

1-6: LGTM!

The addition of MetalKernelAdapter to the adapter exports is clean and follows the existing pattern.

tilelang/jit/adapter/utils.py (1)

63-65: LGTM!

The is_metal_target function follows the established pattern and correctly identifies Metal targets.

tilelang/jit/adapter/cython/adapter.py (2)

156-158: LGTM! Platform-specific linker flags handled correctly

The dynamic linker flag selection based on the platform is appropriate for handling undefined symbols across macOS and Linux.


455-456: LGTM! Correct device mapping for Metal target

The Metal target correctly maps to the "mps" device in PyTorch.

tilelang/jit/adapter/wrapper.py (1)

1068-1069: LGTM! Metal target routing is correctly integrated.

The Metal target detection and wrapper selection follows the existing pattern for other backends.

tilelang/jit/adapter/torch/metal.py (1)

61-61: Verify thread calculation logic.

The threads calculation multiplies block and grid dimensions element-wise. This seems unusual - typically threads would be just the block dimensions, and the total work would be block * grid.

Could you verify if this is the correct thread configuration for PyTorch MPS? The typical pattern would be:

  • threads = block dimensions (threads per group)
  • group_size = grid dimensions (number of groups)

The current implementation passes threads=[block[i] * grid[i] for i in range(3)] which seems to represent total threads across all groups.

testing/python/metal/test_metal_codegen.py (1)

79-80: Good practice: MPS availability check.

The check for MPS availability before running tests is appropriate for Metal backend testing.

Comment on lines +102 to 118
find_library(TVM_LIBRARY_LOCATION
NAMES tvm
HINTS "${TVM_PREBUILD_PATH}"
)
set_target_properties(tvm PROPERTIES
IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm.so"
IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"
INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for missing TVM libraries

The find_library commands should check if the libraries were actually found before proceeding.

 find_library(TVM_LIBRARY_LOCATION
   NAMES tvm
   HINTS "${TVM_PREBUILD_PATH}"
 )
+if(NOT TVM_LIBRARY_LOCATION)
+  message(FATAL_ERROR "Could not find TVM library in ${TVM_PREBUILD_PATH}")
+endif()
 set_target_properties(tvm PROPERTIES
   IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"
   INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
find_library(TVM_LIBRARY_LOCATION
NAMES tvm
HINTS "${TVM_PREBUILD_PATH}"
)
set_target_properties(tvm PROPERTIES
IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm.so"
IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"
INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
)
find_library(TVM_LIBRARY_LOCATION
NAMES tvm
HINTS "${TVM_PREBUILD_PATH}"
)
if(NOT TVM_LIBRARY_LOCATION)
message(FATAL_ERROR "Could not find TVM library in ${TVM_PREBUILD_PATH}")
endif()
set_target_properties(tvm PROPERTIES
IMPORTED_LOCATION "${TVM_LIBRARY_LOCATION}"
INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
)
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 102 to 109, the find_library call for TVM does
not verify that TVM_LIBRARY_LOCATION was found; add a check after find_library
such as if(NOT TVM_LIBRARY_LOCATION) message(FATAL_ERROR "Could not find tvm
library in ${TVM_PREBUILD_PATH}; please set TVM_PREBUILD_PATH or install tvm")
to stop configuration when missing, and only call set_target_properties for the
imported tvm target if the library path exists (or create the imported target
conditionally); ensure the error message includes the searched path for easier
debugging.

Comment on lines +111 to 127
find_library(TVM_RUNTIME_LIBRARY_LOCATION
NAMES tvm_runtime
HINTS "${TVM_PREBUILD_PATH}"
)
set_target_properties(tvm_runtime PROPERTIES
IMPORTED_LOCATION "${TVM_PREBUILD_PATH}/libtvm_runtime.so"
IMPORTED_LOCATION "${TVM_RUNTIME_LIBRARY_LOCATION}"
INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for missing TVM runtime library

Similar to the TVM library, the runtime library location should be validated.

 find_library(TVM_RUNTIME_LIBRARY_LOCATION
   NAMES tvm_runtime
   HINTS "${TVM_PREBUILD_PATH}"
 )
+if(NOT TVM_RUNTIME_LIBRARY_LOCATION)
+  message(FATAL_ERROR "Could not find TVM runtime library in ${TVM_PREBUILD_PATH}")
+endif()
 set_target_properties(tvm_runtime PROPERTIES
   IMPORTED_LOCATION "${TVM_RUNTIME_LIBRARY_LOCATION}"
   INTERFACE_INCLUDE_DIRECTORIES "${TVM_PREBUILD_PATH}/../include"
 )
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 111 to 118, the find_library result for
TVM_RUNTIME_LIBRARY_LOCATION is not validated; add the same existence check and
failure handling used for the TVM library: after find_library, test if
TVM_RUNTIME_LIBRARY_LOCATION is NOTFOUND (or empty) and call message(FATAL_ERROR
"...unable to locate tvm_runtime library at ${TVM_PREBUILD_PATH}..." ) so
configuration stops with a clear error; only set_target_properties for
tvm_runtime and INTERFACE_INCLUDE_DIRECTORIES when the library was successfully
found (mirror the pattern used for the TVM library block).

Comment on lines +189 to 191
if execution_backend in ("dlpack", "torch"):
self.logger.warning("DLPack or torch backend does not support cache saving to disk.")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Include 'torch' in type hints and clarify log.

Execution backend Literal types in this file exclude 'torch' (class attr and method params). This will trip type-checkers now that you gate disk-cache on 'torch'.

Apply minimal log tweak here:

-        if execution_backend in ("dlpack", "torch"):
-            self.logger.warning("DLPack or torch backend does not support cache saving to disk.")
+        if execution_backend in ("dlpack", "torch"):
+            self.logger.warning("Backends 'dlpack' and 'torch' do not support saving cache to disk.")

Also update type hints elsewhere in this file (outside the changed hunk):

# line ~42
execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython"

# line ~71
execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython",

# line ~117
execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython",

# line ~323
execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython",
🤖 Prompt for AI Agents
In tilelang/cache/kernel_cache.py around lines 189-191, the warning message and
the execution_backend type hints omit 'torch', causing type-check failures and
unclear logs; update the warning to mention "DLPack or torch backend does not
support saving cache to disk" (or similar clearer phrasing) and add "torch" to
the Literal union type hints at the listed locations: ~42, ~71, ~117, ~323 so
those class attributes and method params read Literal["dlpack", "ctypes",
"cython", "nvrtc", "torch"] = "cython" (or the equivalent default), ensuring
consistency across the file.

Comment on lines +2 to +5
from .cuda import *
from .cpu import *
from .cdna import *
from .metal import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Wildcard imports reduce code clarity and maintainability

The wildcard imports make it difficult to track which symbols come from which module and can lead to namespace pollution. Consider explicit imports for better maintainability.

🧰 Tools
🪛 Ruff (0.12.2)

2-2: from .cuda import * used; unable to detect undefined names

(F403)


3-3: from .cpu import * used; unable to detect undefined names

(F403)


4-4: from .cdna import * used; unable to detect undefined names

(F403)


5-5: from .metal import * used; unable to detect undefined names

(F403)

🤖 Prompt for AI Agents
In tilelang/carver/arch/__init__.py around lines 2 to 5, the module currently
uses wildcard imports (from .cuda import * etc.), which pollutes the package
namespace and obscures which symbols are exported; replace these with explicit
imports of the public symbols you need from each submodule (or import the
modules themselves and re-export a curated __all__ list) so callers see a clear
API and accidental name collisions are avoided. Ensure you list each
function/class/constant you want to expose (or import submodules as namespaced
attributes) and update or add __all__ to reflect the exported names.

Comment on lines +78 to +79
if is_metal_target(target):
assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix inconsistent backend naming

The assertion expects 'torch' backend but the public API and documentation use 'nvrtc'. This inconsistency is confusing.

The error message should clarify that Metal targets require the "nvrtc" backend (which internally uses torch for Metal):

 if is_metal_target(target):
-    assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'
+    assert execution_backend == 'nvrtc', 'Currently metal target only supports `tl.jit(execution_backend="nvrtc")`'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if is_metal_target(target):
assert execution_backend == 'torch', 'Currently metal target only support `tl.jit(execution_backend="torch")`'
if is_metal_target(target):
assert execution_backend == 'nvrtc', 'Currently metal target only supports `tl.jit(execution_backend="nvrtc")`'
🤖 Prompt for AI Agents
In tilelang/jit/__init__.py around lines 78-79, the assertion currently checks
for execution_backend == 'torch' but the public API/doc use 'nvrtc'; change the
assertion to require 'nvrtc' instead and update the assertion message to state
that Metal targets require the "nvrtc" backend (which internally uses torch) so
the message aligns with public API and avoids confusion.

Comment on lines +28 to +30
self.kernel_name = func_or_mod.__name__ + '_kernel'
self.verbose = verbose
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential issue with kernel name generation.

The kernel name is generated by appending '_kernel' to the function name. This assumes func_or_mod has a __name__ attribute, which may not be true for all IRModule instances.

         self.kernel_global_source = kernel_global_source
-        self.kernel_name = func_or_mod.__name__ + '_kernel'
+        if hasattr(func_or_mod, '__name__'):
+            self.kernel_name = func_or_mod.__name__ + '_kernel'
+        elif hasattr(func_or_mod, 'get_global_vars'):
+            # Handle IRModule case
+            global_vars = func_or_mod.get_global_vars()
+            if global_vars:
+                self.kernel_name = global_vars[0].name_hint + '_kernel'
+            else:
+                raise ValueError("Cannot determine kernel name from IRModule")
+        else:
+            raise ValueError(f"Cannot determine kernel name from {type(func_or_mod)}")
         self.verbose = verbose
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.kernel_name = func_or_mod.__name__ + '_kernel'
self.verbose = verbose
self.kernel_global_source = kernel_global_source
if hasattr(func_or_mod, '__name__'):
self.kernel_name = func_or_mod.__name__ + '_kernel'
elif hasattr(func_or_mod, 'get_global_vars'):
# Handle IRModule case
global_vars = func_or_mod.get_global_vars()
if global_vars:
self.kernel_name = global_vars[0].name_hint + '_kernel'
else:
raise ValueError("Cannot determine kernel name from IRModule")
else:
raise ValueError(f"Cannot determine kernel name from {type(func_or_mod)}")
self.verbose = verbose
🤖 Prompt for AI Agents
In tilelang/jit/adapter/torch/metal.py around lines 28-29, kernel_name is set by
blindly using func_or_mod.__name__ which can fail for IRModule or other objects
without a __name__; change to safely derive a base name: try
getattr(func_or_mod, "__name__", None), then getattr(func_or_mod, "name", None),
and if both are missing fall back to a stable fallback such as
f"{type(func_or_mod).__name__}_{id(func_or_mod):x}"; sanitize the base to be a
valid identifier if needed and then set self.kernel_name = f"{base}_kernel".

Comment on lines +34 to +46
for var, func in device_mod.functions.items():
assert var.name_hint == self.kernel_name
thread_extent = func.attrs['thread_extent']
for tag, extent in thread_extent.items():
if "threadIdx" in tag:
self.block_info["xyz".index(tag[-1])] = extent
elif "blockIdx" in tag:
self.grid_info["xyz".index(tag[-1])] = extent
break
else:
raise AssertionError(f'no kernel with name {func_or_mod.__name__}')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix critical issue: iteration over device module functions doesn't handle multiple functions correctly.

The loop breaks after processing the first function regardless of whether it matches kernel_name. This could silently process the wrong kernel if multiple functions exist.

         for var, func in device_mod.functions.items():
-            assert var.name_hint == self.kernel_name
-            thread_extent = func.attrs['thread_extent']
-            for tag, extent in thread_extent.items():
-                if "threadIdx" in tag:
-                    self.block_info["xyz".index(tag[-1])] = extent
-                elif "blockIdx" in tag:
-                    self.grid_info["xyz".index(tag[-1])] = extent
-            break
+            if var.name_hint == self.kernel_name:
+                thread_extent = func.attrs.get('thread_extent', {})
+                for tag, extent in thread_extent.items():
+                    if "threadIdx" in tag:
+                        self.block_info["xyz".index(tag[-1])] = extent
+                    elif "blockIdx" in tag:
+                        self.grid_info["xyz".index(tag[-1])] = extent
+                break
         else:
-            raise AssertionError(f'no kernel with name {func_or_mod.__name__}')
+            raise AssertionError(f'No kernel with name {self.kernel_name} found in device module')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for var, func in device_mod.functions.items():
assert var.name_hint == self.kernel_name
thread_extent = func.attrs['thread_extent']
for tag, extent in thread_extent.items():
if "threadIdx" in tag:
self.block_info["xyz".index(tag[-1])] = extent
elif "blockIdx" in tag:
self.grid_info["xyz".index(tag[-1])] = extent
break
else:
raise AssertionError(f'no kernel with name {func_or_mod.__name__}')
for var, func in device_mod.functions.items():
if var.name_hint == self.kernel_name:
thread_extent = func.attrs.get('thread_extent', {})
for tag, extent in thread_extent.items():
if "threadIdx" in tag:
self.block_info["xyz".index(tag[-1])] = extent
elif "blockIdx" in tag:
self.grid_info["xyz".index(tag[-1])] = extent
break
else:
raise AssertionError(f'No kernel with name {self.kernel_name} found in device module')
🧰 Tools
🪛 Ruff (0.12.2)

44-44: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In tilelang/jit/adapter/torch/metal.py around lines 34 to 45, the loop over
device_mod.functions currently asserts the first function has the desired
kernel_name and breaks unconditionally, which can process the wrong function
when multiple functions exist; change the logic to search for a function whose
var.name_hint equals self.kernel_name (don't assert immediately), extract its
thread_extent and populate block_info/grid_info, break when found, and after the
loop raise an AssertionError if no matching kernel was found (use
self.kernel_name in the error message) so only the correct kernel is processed.


if self._kernel is None:

_kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for shader compilation.

The shader compilation could fail but there's no error handling. Also, the compiled shader might not contain the expected kernel.

-            _kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)
+            try:
+                compiled_shader = torch.mps.compile_shader(self.kernel_global_source)
+                if not hasattr(compiled_shader, self.kernel_name):
+                    raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'")
+                _kernel = getattr(compiled_shader, self.kernel_name)
+            except Exception as e:
+                raise RuntimeError(f"Failed to compile Metal shader: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)
try:
compiled_shader = torch.mps.compile_shader(self.kernel_global_source)
if not hasattr(compiled_shader, self.kernel_name):
raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'")
_kernel = getattr(compiled_shader, self.kernel_name)
except Exception as e:
raise RuntimeError(f"Failed to compile Metal shader: {e}")
🤖 Prompt for AI Agents
In tilelang/jit/adapter/torch/metal.py around line 54, wrap the call to
torch.mps.compile_shader and subsequent attribute access in a try/except and
validate the returned object contains the expected kernel; specifically, catch
exceptions from compile_shader, raise or log a clear error that includes the
original exception and the kernel name/source context, and after compilation
check with hasattr or explicit lookup that the compiled object exposes
self.kernel_name, raising a descriptive error if missing so callers know
compilation succeeded but the kernel was not found.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tilelang/jit/adapter/__init__.py (1)

5-6: Remove unused # noqa: F401 (RUF100) on new exports.

Ruff flags these as unused because F401 isn’t enabled; just drop the directives.

-from .nvrtc import NVRTCKernelAdapter  # noqa: F401
-from .torch import MetalKernelAdapter  # noqa: F401
+from .nvrtc import NVRTCKernelAdapter
+from .torch import MetalKernelAdapter

Optional: make exports explicit to avoid needing noqa anywhere and document the public API.

# add near the top of this module
__all__ = [
    "BaseKernelAdapter",
    "TorchDLPackKernelAdapter",
    "CtypesKernelAdapter",
    "CythonKernelAdapter",
    "NVRTCKernelAdapter",
    "MetalKernelAdapter",
]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0eba92f6ad86fca30149d937543b01069784eafd and 7514cf545a9b9ec3d999fc1b1f6bd098397d19e1.

📒 Files selected for processing (2)
  • tilelang/jit/adapter/__init__.py (1 hunks)
  • tilelang/jit/adapter/torch/__init__.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tilelang/jit/adapter/torch/init.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/jit/adapter/__init__.py (2)
tilelang/jit/adapter/nvrtc/adapter.py (1)
  • NVRTCKernelAdapter (30-246)
tilelang/jit/adapter/torch/metal.py (1)
  • MetalKernelAdapter (12-67)
🪛 Ruff (0.12.2)
tilelang/jit/adapter/__init__.py

5-5: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: bot-task

oraluben added a commit to oraluben/tvm that referenced this pull request Sep 12, 2025
Removed external library DLL names for macOS.

found during tile-ai/tilelang#799

cc @LeiWang1999
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
setup.py (3)

345-347: Wheel may miss runtime libs: include tilelang/lib/.so|.dylib in package_data.

Without this, copied libs might not land in wheels/sdists.

 package_data = {
-    "tilelang": ["py.typed", "*pyx"],
+    "tilelang": ["py.typed", "*pyx", "lib/*.so", "lib/*.dylib"],
 }

805-807: Do not pass CMAKE_CUDA_COMPILER unless CUDA is enabled.

On Metal-only/mac builds, this inserts a bogus nvcc path and breaks configure.

-        if not USE_ROCM:
-            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")
+        if USE_CUDA and CUDA_HOME:
+            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")

826-839: Respect USE_CUDA toggle in config.cmake generation.

Current logic enables CUDA whenever CUDA_HOME is set, even if USE_CUDA=False.

-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]
🧹 Nitpick comments (9)
setup.py (9)

48-50: Harden Metal detection (Rosetta/process-arch pitfalls).

Use Darwin + machine for reliability.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+IS_DARWIN = platform.system() == 'Darwin'
+MAYBE_METAL = IS_DARWIN and platform.machine() == 'arm64'

52-61: Defaulting USE_CUDA=True on Linux is user-hostile (raises when CUDA_HOME missing).

Consider defaulting to False or detecting from CUDA_HOME post-import to avoid hard failures on non-GPU Linux dev boxes.

If you want a minimal tweak now, set default False:

-USE_CUDA = _read_bool_env("USE_CUDA", IS_LINUX and not USE_ROCM)
+USE_CUDA = _read_bool_env("USE_CUDA", False)

99-103: Tighten exception text (Ruff TRY003) and clarity.

Shorten the message; keep actionable content.

-    raise ValueError(
-        "CUDA support is enabled by default on linux if `USE_ROCM=False`," \
-        " but CUDA_HOME is not set or detected.")
+    raise ValueError("USE_CUDA=True but CUDA_HOME is not set or detected.")

105-108: Same as above: shorten/standardize error message.

-    raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
-    )
+    raise ValueError("Neither CUDA_HOME nor ROCM_HOME detected on Linux; set one explicitly.")

482-489: Guard dSYM deletion.

Use isdir check to avoid accidental rmtree on files with matching names.

-            for dir in potential_dirs:
-                candidate = os.path.join(dir, item)
-                if os.path.exists(candidate):
-                    shutil.rmtree(candidate)
-                    break
+            for d in potential_dirs:
+                candidate = os.path.join(d, item)
+                if os.path.isdir(candidate):
+                    shutil.rmtree(candidate)
+                    break

664-664: Copy-from-glob likely no-ops in non-inplace builds.

CMake outputs to extdir/build_temp, not CWD. Either search extdir/build_temp or drop this block (TileLangBuilPydCommand already handles copying).


270-273: Handle missing CMake gracefully (None from shutil.which).

os.path.exists(None) raises TypeError; raise a clear FileNotFoundError instead.

-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
-        raise Exception("CMake is not installed, please install it first.")
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
+        raise FileNotFoundError("CMake is not installed or not on PATH.")

149-168: Make ROCm version detection robust if rocminfo is missing.

Without try/except, subprocess errors abort the build even when the cmake file can be parsed.

-def get_rocm_version():
+def get_rocm_version():
     """Get the ROCM version from rocminfo."""
-    rocm_output = subprocess.check_output(["rocminfo"], universal_newlines=True)
-    # Parse ROCM version from output
-    # Example output: ROCM version: x.y.z-...
-    match = re.search(r'ROCm Version: (\d+\.\d+\.\d+)', rocm_output)
-    if match:
-        return Version(match.group(1))
-    else:
+    try:
+        rocm_output = subprocess.check_output(["rocminfo"], universal_newlines=True)
+        match = re.search(r'ROCm Version: (\d+\.\d+\.\d+)', rocm_output)
+        if match:
+            return Version(match.group(1))
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        pass
     rocm_path = os.environ.get("ROCM_PATH", "/opt/rocm")
     rocm_version_file = os.path.join(rocm_path, "lib", "cmake", "rocm",
                                      "rocm-config-version.cmake")
     if os.path.exists(rocm_version_file):
         with open(rocm_version_file, "r") as f:
             content = f.read()
             match = re.search(r'set\(PACKAGE_VERSION "(\d+\.\d+\.\d+)"', content)
             if match:
                 return Version(match.group(1))
     # return a default
     return Version("5.0.0")

882-895: Update PyPI metadata for Metal/macOS builds.

Expose macOS in platforms/classifiers when USE_METAL is on; optionally add/remove CUDA/ROCm env classifiers based on toggles.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7514cf545a9b9ec3d999fc1b1f6bd098397d19e1 and e1e18873a51c22fce0d9495e989c7253e871f99f.

📒 Files selected for processing (1)
  • setup.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
setup.py (1)
tilelang/env.py (1)
  • get (136-139)
🪛 Ruff (0.12.2)
setup.py

100-102: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: bot-task
🔇 Additional comments (1)
setup.py (1)

390-392: Linux-only rpath patch exit is correct.

Early-return avoids patchelf on macOS. LGTM.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
setup.py (3)

808-811: Fix unconditional CUDA compiler flag (breaks Metal/macOS and CUDA-less builds).

-DCMAKE_CUDA_COMPILER is appended whenever ROCm is disabled, even if CUDA is not requested or CUDA_HOME is unset. On macOS/Metal this can crash with a TypeError (joining path with None) and on Linux CPU/ROCm builds it forces CUDA.

Apply:

-        if not USE_ROCM:
-            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")
+        # Only set CUDA compiler when CUDA is explicitly enabled and detected
+        if USE_CUDA and CUDA_HOME:
+            cmake_args.append(f"-DCMAKE_CUDA_COMPILER={os.path.join(CUDA_HOME, 'bin', 'nvcc')}")

667-676: Editable install: also copy .so on macOS.

Python C-extensions on macOS commonly end with .so. Current logic only grabs .dylib on macOS, causing missing libs in editable mode.

-        files = glob.glob("*.so" if IS_LINUX else "*.dylib")
+        files = sorted(set(glob.glob("*.so") + glob.glob("*.dylib")))

269-277: Handle missing CMake path; avoid TypeError on os.path.exists(None).

shutil.which("cmake") may return None; calling os.path.exists(None) raises a TypeError.

 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path
♻️ Duplicate comments (1)
setup.py (1)

36-44: Boolean env parsing still brittle (whitespace/yes-no not handled; empty string coerced to False).

You addressed the earlier issue but missed .strip() and common synonyms, and you now treat an empty string as False rather than “unset → default”.

Apply:

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ("1", "true", "on", "yes", "y"):
+        return True
+    if s in ("0", "false", "off", "no", "n"):
+        return False
+    return default
🧹 Nitpick comments (4)
setup.py (4)

51-53: Make Metal default detection more robust.

Use explicit Darwin + machine check; platform.mac_ver()[2] is brittle and can vary.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+MAYBE_METAL = (platform.system() == 'Darwin' and platform.machine() == 'arm64')

829-843: When enabling Metal, explicitly disable CUDA/ROCm in config.cmake for clarity.

Prevents accidental dual-backend config if upstream defaults change.

-        if USE_METAL:
-            content_lines += [
-                "set(USE_METAL ON)",
-            ]
+        if USE_METAL:
+            content_lines += [
+                "set(USE_METAL ON)",
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]

893-898: Reflect macOS support in Trove classifiers.

Now that Metal/macOS is supported, advertise it.

         "Programming Language :: Python :: 3.8",
         "License :: OSI Approved :: MIT License",
-        "Operating System :: OS Independent",
+        "Operating System :: OS Independent",
+        "Operating System :: MacOS :: MacOS X",
         "Intended Audience :: Developers",
         "Intended Audience :: Science/Research",

102-111: Minor: tighten error messages and conditions (style).

Long exception messages trigger Ruff TRY003; consider shorter texts or helper constants. Also consider allowing Linux CPU-only builds in the future by gating the “require CUDA/ROCM” check behind an explicit flag.

If you want, I can open a follow-up to align linter expectations and make the Linux GPU requirement opt-in via an env (e.g., REQUIRE_GPU=1).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1e18873a51c22fce0d9495e989c7253e871f99f and cc91c3e22afffbb80f1dbec199fd55d91748adc5.

📒 Files selected for processing (1)
  • setup.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: bot-task

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
tilelang/carver/arch/metal.py (1)

17-20: Sort all to satisfy Ruff RUF022.

 __all__ = [
-    'is_metal_arch',
-    'METAL',
+    'METAL',
+    'is_metal_arch',
 ]
.github/workflows/metal_ci.yml (2)

12-14: Tighten permissions and fix trailing space.

contents: write isn’t needed; also removes the trailing space flagged by yamllint.

-    permissions:
-      contents: write 
+    permissions:
+      contents: read

96-97: Avoid double installation of the project.

You install in “Ensure venv” and again in “Install project.” Keep one to reduce time and flakiness.

-          pip install .
-          touch "$MARKER"
+          touch "$MARKER"

(Keep the dedicated “Install project” step.)

Also applies to: 100-104

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc91c3e22afffbb80f1dbec199fd55d91748adc5 and 12ca66077563253c09d1530c948b9fcf28cbbdd1.

📒 Files selected for processing (2)
  • .github/workflows/metal_ci.yml (1 hunks)
  • tilelang/carver/arch/metal.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/carver/arch/metal.py (2)
tilelang/language/ast/ir.py (1)
  • target (1682-1713)
tilelang/carver/arch/arch_base.py (1)
  • TileDevice (4-37)
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
tilelang/carver/arch/metal.py

17-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: format-check
  • GitHub Check: bot-task
🔇 Additional comments (2)
tilelang/carver/arch/metal.py (1)

6-7: Helper reads well.

is_metal_arch is straightforward and correct.

.github/workflows/metal_ci.yml (1)

1-1: Rename workflow and use Metal CMake flag.
CMakeLists.txt contains a USE_METAL guard — switch -DUSE_ROCM=ON to -DUSE_METAL=ON and rename the workflow.

-name: CI Test on AMD
+name: Metal CI
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..

Comment on lines +11 to +15
def __init__(self, target: Target | str):
if isinstance(target, str):
target = Target(target)
self.target = target

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Call the base initializer to populate TileDevice fields.

Without super().__init__(), METAL instances won't have the base attributes (reg_cap, smem_cap, etc.), which can break consumers expecting them.

Apply:

 class METAL(TileDevice):
     def __init__(self, target: Target | str):
+        super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, target: Target | str):
if isinstance(target, str):
target = Target(target)
self.target = target
def __init__(self, target: Target | str):
super().__init__()
if isinstance(target, str):
target = Target(target)
self.target = target
🤖 Prompt for AI Agents
In tilelang/carver/arch/metal.py around lines 11 to 15, the METAL class __init__
does not call the base class initializer so base TileDevice fields (e.g.,
reg_cap, smem_cap) are not populated; update the constructor to call
super().__init__(target) (after converting a string target to Target if needed)
so the parent initializer runs and base attributes are set, ensuring the
instance has the expected fields.

FrozenGene pushed a commit to apache/tvm that referenced this pull request Sep 12, 2025
Removed external library DLL names for macOS.

found during tile-ai/tilelang#799

cc @LeiWang1999
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (8)
tilelang/carver/arch/metal.py (1)

11-14: Call TileDevice.init to populate base fields (duplicate of prior feedback).

Without super().__init__(), instances miss required attributes (reg_cap, smem_cap, etc.), leading to runtime errors when accessed.

Apply:

 class METAL(TileDevice):
 
     def __init__(self, target: Target | str):
+        super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target
.github/workflows/metal_ci.yml (7)

21-25: Update setup-python to v5 (v2 is deprecated on GH runners).

actionlint flags v2 as too old; move to v5.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

29-31: Fix hashing on macOS: use shasum -a 256.

sha256sum isn’t available on macOS. Use shasum.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
         MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

40-43: Remove invalid pip flag --no-user.

pip doesn’t support --no-user here (even in venvs).

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

75-79: Update setup-python to v5 here as well.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

83-85: Use a stable requirements hash (shasum) instead of cat.

cat doesn’t produce a fixed-length digest and weakens cache keys.

-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
         MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

93-97: Remove invalid --no-user and avoid double install.

Fix flags; also consider skipping the early install since you reinstall later.

           source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
           [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
-          pip install . --no-user
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
+          # Defer project install to the dedicated step below

100-104: Remove --no-user in the wheel install step.

-        pip install . --no-user -v
+        pip install . -v
🧹 Nitpick comments (6)
tilelang/carver/arch/metal.py (3)

17-20: Sort all to satisfy Ruff RUF022.

 __all__ = [
-    'is_metal_arch',
-    'METAL',
+    'METAL',
+    'is_metal_arch',
 ]

11-14: Optionally validate Target kind is metal.

Prevents accidental misuse with a non-Metal TVM target.

     def __init__(self, target: Target | str):
         super().__init__()
         if isinstance(target, str):
             target = Target(target)
+        # Ensure we were given a Metal target
+        if getattr(getattr(target, "kind", None), "name", None) != "metal":
+            raise ValueError(f"METAL requires a TVM Target(kind='metal'), got: {getattr(getattr(target, 'kind', None), 'name', None)}")
         self.target = target

Please confirm the TVM version exposes target.kind.name; if not, we can adapt the check.


9-15: Confirm whether get_avaliable_tensorintrin_shapes is required for Metal paths.

If invoked downstream, not overriding it will raise NotImplementedError from the base class.

Proposed minimal safe stub:

 class METAL(TileDevice):
 
     def __init__(self, target: Target | str):
         super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target
+
+    def get_avaliable_tensorintrin_shapes(self):
+        # Metal/MPS: no tensor intrinsics exposed yet
+        return []
.github/workflows/metal_ci.yml (3)

12-14: Trim trailing space and drop unnecessary write permission.

Set minimal perms and remove the trailing space flagged by yamllint.

     permissions:
-      contents: write 
+      contents: read

10-10: Pin runner to Apple Silicon if you require MPS/Metal.

macos-latest can drift; pin to macos-14 (Apple Silicon) to ensure MPS availability.

-    runs-on: [macos-latest]
+    runs-on: macos-14

(Apply to both jobs.)

Also applies to: 63-63


26-45: Rethink venv “persistence”; use actions/cache instead.

runner.tool_cache doesn’t persist across jobs/runs on hosted macOS. Use actions/cache keyed by Python and requirements hash.

Example step to add before “Ensure venv”:

- name: Cache venv
  uses: actions/cache@v4
  with:
    path: ${{ runner.tool_cache }}/${{ env.VENV_DIR }}
    key: venv-${{ runner.os }}-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt', 'pyproject.toml', 'setup.cfg', 'setup.py') }}

Then create venv if missing and install deps; drop the custom MARKER logic.

Also applies to: 80-98

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12ca66077563253c09d1530c948b9fcf28cbbdd1 and d546487de7e3a2031c9106e89210669f666bbccd.

📒 Files selected for processing (2)
  • .github/workflows/metal_ci.yml (1 hunks)
  • tilelang/carver/arch/metal.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/carver/arch/metal.py (2)
tilelang/carver/arch/arch_base.py (1)
  • TileDevice (4-37)
tilelang/carver/template/base.py (1)
  • arch (152-159)
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
tilelang/carver/arch/metal.py

17-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🔇 Additional comments (3)
tilelang/carver/arch/metal.py (1)

5-6: Helper looks good.

.github/workflows/metal_ci.yml (2)

51-51: Drop ROCm flag in Metal CI configure step.

This job targets Metal; forcing -DUSE_ROCM=ON on macOS is likely wrong and may fail. Prefer no ROCm flag (or a Metal-specific one if your CMake supports it).

-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON; cd ..

If your build supports it, consider adding: -DUSE_METAL=ON.


105-111: pytest-timeout declared — no change required

pytest-timeout is listed in requirements-test.txt (line 16); --timeout is supported so keep the workflow as-is.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
setup.py (2)

269-276: Fix possible TypeError when CMake isn't installed.

shutil.which can return None; os.path.exists(None) crashes.

 def get_cmake_path() -> str:
     """Return the path to the CMake compiler.
     """
-    # found which cmake is used
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path

829-844: Ensure CUDA is disabled when building for Metal.

Without explicitly setting USE_CUDA OFF here, CMake template defaults could enable CUDA inadvertently.

 if USE_METAL:
     content_lines += [
         "set(USE_METAL ON)",
         "set(USE_ROCM OFF)",
+        "set(USE_CUDA OFF)",
     ]
♻️ Duplicate comments (8)
setup.py (1)

36-43: Boolean env parsing still wrong; empty/unknown values fall back to default and '' branch is unreachable.

Normalize and handle both true/false tokens (incl. empty string). Current logic can flip GPU toggles unexpectedly.

Apply:

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default
.github/workflows/metal_ci.yml (7)

93-97: Remove --no-user and avoid duplicate installs.

Install requirements here; leave project install to the dedicated step below.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
           [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
-          pip install . --no-user
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

21-25: Upgrade setup-python to v5.

v2 is deprecated and fails on current runners.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

76-79: Upgrade setup-python to v5 (second job).

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

29-31: sha256sum isn’t available on macOS runners; use shasum -a 256.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

83-85: Consistently hash requirements; don’t use cat.

-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

40-43: Drop invalid pip flag --no-user.

This option doesn’t exist and can fail even inside venvs.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

101-104: Remove --no-user (wheel install step).

-        pip install . --no-user -v
+        pip install . -v
🧹 Nitpick comments (5)
setup.py (4)

51-53: More robust Metal default detection.

Prefer platform checks that don’t rely on mac_ver quirks.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+MAYBE_METAL = (platform.system() == 'Darwin' and platform.machine() == 'arm64')

102-106: Shorten exception message (Ruff TRY003) and keep details in logs.

Long inline messages trigger TRY003 and add noise.

-if USE_CUDA and not CUDA_HOME:
-    raise ValueError(
-        "CUDA support is enabled by default on linux if `USE_ROCM=False`, but CUDA_HOME is not set or detected.")
+if USE_CUDA and not CUDA_HOME:
+    raise ValueError("CUDA enabled but CUDA_HOME not set/detected.")

463-484: Use logger.warning instead of “WARNING:” via info.

This preserves severity and integrates with log filters.

-                logger.info(f"WARNING: {item} not found in any expected directories!")
+                logger.warning(f"{item} not found in any expected directories!")

886-889: Update metadata to reflect macOS/Metal support.

Expose correct OS in metadata; remove Linux-only bias when targeting Metal.

-    platforms=[
-        "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
-    ],
+    platforms=[
+        ("Environment :: GPU :: NVIDIA CUDA" if USE_CUDA else
+         "Environment :: GPU :: AMD ROCm" if USE_ROCM else
+         "Environment :: GPU"),
+        "Operating System :: MacOS :: MacOS X" if USE_METAL else "Operating System :: POSIX :: Linux",
+    ],
-        "Operating System :: OS Independent",
+        ("Operating System :: MacOS :: MacOS X" if USE_METAL else "Operating System :: POSIX :: Linux"),

Also applies to: 893-900

.github/workflows/metal_ci.yml (1)

13-13: Trailing space.

Minor YAML lint fix.

-      contents: write 
+      contents: write
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d546487de7e3a2031c9106e89210669f666bbccd and 5b306179cb8b24f66118ff9847bad7a1847e82ff.

📒 Files selected for processing (2)
  • .github/workflows/metal_ci.yml (1 hunks)
  • setup.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
setup.py (1)
tilelang/env.py (1)
  • get (136-139)
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check
🔇 Additional comments (1)
setup.py (1)

108-111: Linux-only accelerator requirement is fine.

The guard correctly avoids enforcing CUDA/ROCm on macOS/Metal builds.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
.github/workflows/ci.yml (2)

22-25: Upgrade to setup-python v5.

actions/setup-python@v2 is deprecated and flagged by actionlint.

-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5

Also applies to: 78-81


40-45: Remove invalid pip install --no-user.

--no-user is not a valid pip flag (inside venvs it’s unnecessary anyway).

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
+          pip install flash_attn==2.5.8 --no-build-isolation
-          pip install . --no-user
+          pip install .
-        pip install . --no-user -v
+        pip install . -v

Also applies to: 96-101, 107-107

.github/workflows/amd_ci.yml (1)

23-26: Upgrade to setup-python v5.

v2 is deprecated; actionlint will flag this.

-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5

Also applies to: 77-80

♻️ Duplicate comments (3)
.github/workflows/metal_ci.yml (3)

21-25: Update setup-python to v5.

Same as earlier bot comment; apply in both jobs.

-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

Also applies to: 76-79


29-31: Fix hashing on macOS (no sha256sum).

Use shasum -a 256 (portable on macOS). Apply to both blocks.

-        REQS_HASH=$(sha256sum requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")
-        REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
+        REQS_HASH=$(shasum -a 256 requirements-test.txt 2>/dev/null | awk '{print $1}' || echo "no_requirements")

Also applies to: 83-85


40-44: Remove invalid pip install --no-user.

Not a valid flag and unnecessary in venvs. Apply in all occurrences.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-        pip install . --no-user -v
+        pip install . -v

Also applies to: 93-96, 102-102

🧹 Nitpick comments (4)
.github/workflows/ci.yml (1)

12-14: Tighten permissions.

Format check doesn’t push commits; prefer read-only.

-    permissions:
-      contents: write 
+    permissions:
+      contents: read
.github/workflows/metal_ci.yml (3)

13-13: Trim trailing space.

YAMLlint flags trailing spaces.

-      contents: write 
+      contents: write

10-10: Target Apple Silicon explicitly for MPS.

To avoid Intel mac runners without MPS, prefer macOS 14 (Apple Silicon).

-    runs-on: [macos-latest]
+    runs-on: [macos-14]

Please confirm your org has access to macOS-14 ARM runners; otherwise keep latest and skip MPS tests when unavailable.

Also applies to: 63-63


104-109: Add a quick MPS sanity check before tests.

Fail fast if MPS isn’t available.

     - name: Run metal test
       run: |
         source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
+        python - <<'PY'
+import sys
+try:
+    import torch
+    print("torch:", torch.__version__, "MPS available:", torch.backends.mps.is_available())
+    assert torch.backends.mps.is_available(), "MPS not available on this runner"
+except Exception as e:
+    print("Precheck failed:", e, file=sys.stderr)
+    sys.exit(1)
+PY
         cd testing/python
         unset PYTHONPATH
         python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b306179cb8b24f66118ff9847bad7a1847e82ff and 65ffe111a4df5414ce44a0a14c5113c6491e35c8.

📒 Files selected for processing (3)
  • .github/workflows/amd_ci.yml (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/metal_ci.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

22-22: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


76-76: the runner of "actions/setup-python@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-metal

Comment on lines 49 to 40
git submodule update --init --recursive --recommend-shallow
mkdir -p build
cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Guard --recommend-shallow for older Git on macOS.

Add a fallback to keep runners with older Git working.

-        git submodule update --init --recursive --recommend-shallow
+        if git submodule update -h 2>&1 | grep -q -- '--recommend-shallow'; then
+          git submodule update --init --recursive --recommend-shallow
+        else
+          git submodule update --init --recursive
+        fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
git submodule update --init --recursive --recommend-shallow
mkdir -p build
cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..
if git submodule update -h 2>&1 | grep -q -- '--recommend-shallow'; then
git submodule update --init --recursive --recommend-shallow
else
git submodule update --init --recursive
fi
mkdir -p build
cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
.github/workflows/metal_ci.yml (5)

16-16: Remove trailing whitespace (yamllint violation).

Line 16 has a trailing space after "write".

-      contents: write 
+      contents: write

40-50: Make format-check fail deterministically on formatting drift.

Currently you fail only when format.sh exits non‑zero; if the script exits 0 but emits the sentinel text, the job passes. Harden the logic.

-        if ! output=$(./format.sh 2>&1); then
-          echo "------------------------------------"
-          echo "message:"
-          echo "$output"
-          printf '%s\n' "$output" | grep "Please review and stage the changes."
-          echo "------------------------------------"
-          exit 1
-        fi
+        set -euo pipefail
+        status=0
+        output="$(./format.sh 2>&1)" || status=$?
+        if [ "$status" -ne 0 ] || printf '%s\n' "$output" | grep -q "Please review and stage the changes."; then
+          echo "------------------------------------"
+          echo "message:"
+          printf '%s\n' "$output"
+          echo "------------------------------------"
+          exit 1
+        fi

13-13: Pin to Apple Silicon runners for MPS/Metal determinism.

To ensure MPS availability and consistent behavior, prefer macOS 14 (Apple Silicon) explicitly rather than the moving alias macos-latest.

-    runs-on: [macos-latest]
+    runs-on: macos-14

Apply to both jobs after confirming that your Metal/MPS tests require Apple Silicon on GitHub-hosted runners.

Also applies to: 54-54


15-17: Use least-privilege permissions (write not needed).

This job doesn’t push changes; read is sufficient.

-    permissions:
-      contents: write
+    permissions:
+      contents: read

82-86: Use working-directory instead of cd; keep PYTHONPATH unset.

Cleaner, avoids subshell state and makes logs clearer.

-    - name: Run metal test
-      run: |
-        cd testing/python
-        unset PYTHONPATH
-        python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
+    - name: Run metal test
+      working-directory: testing/python
+      run: |
+        unset PYTHONPATH
+        python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65ffe111a4df5414ce44a0a14c5113c6491e35c8 and ca6546ac34e89cde32673641f20a0aca904533c8.

📒 Files selected for processing (1)
  • .github/workflows/metal_ci.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

28-28: property "os" is not defined in object type {}

(expression)


68-68: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 16-16: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: format-check
  • GitHub Check: bot-task

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/metal_ci.yml (1)

63-63: Fix invalid cache key reference to matrix.os (actionlint failure).

Replace undefined matrix.os with runner.os and include runner.arch for uniqueness.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}
🧹 Nitpick comments (5)
.github/workflows/metal_ci.yml (5)

16-16: Remove trailing space (yamllint).

-      contents: write 
+      contents: write

15-17: Scope down permissions to least privilege.

This job doesn’t appear to need write. Prefer read.

     permissions:
-      contents: write
+      contents: read

33-46: Make format check fail on detected changes even if script exits 0.

Currently failure only triggers when format.sh returns non-zero. Also fail on the marker text and on any git diff.

     - name: Run format check
       run: |
-        source .venv/bin/activate
-        mkdir -p build
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON; cd ..
-        if ! output=$(./format.sh 2>&1); then
-          echo "------------------------------------"
-          echo "message:"
-          echo "$output"
-          printf '%s\n' "$output" | grep "Please review and stage the changes."
-          echo "------------------------------------"
-          exit 1
-        fi
-        rm -rf build
+        set -euo pipefail
+        source .venv/bin/activate
+        mkdir -p build
+        (cd build && cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_METAL=ON)
+        output=$(./format.sh 2>&1) || true
+        echo "------------------------------------"
+        echo "$output"
+        echo "------------------------------------"
+        if printf '%s\n' "$output" | grep -q "Please review and stage the changes."; then
+          exit 1
+        fi
+        if ! git diff --quiet; then
+          git --no-pager diff
+          exit 1
+        fi
+        rm -rf build

13-13: Consider pinning to macos-14 for stable Apple Silicon/MPS runners.

Reduces surprises from future macos-latest changes.

-    runs-on: [macos-latest]
+    runs-on: macos-14

Also applies to: 49-49


77-81: Optional: print ccache stats after tests.

Quick visibility into cache effectiveness.

     - name: Run metal test
       run: |
         cd testing/python
         unset PYTHONPATH
         python -m pytest -k metal -v -r fE --durations=0 --timeout=3600
+
+    - name: ccache stats
+      run: ccache -s || true
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca6546ac34e89cde32673641f20a0aca904533c8 and e9cd15f744d1cb61534c907cad248305982953e0.

📒 Files selected for processing (1)
  • .github/workflows/metal_ci.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

63-63: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 16-16: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: bot-task

@oraluben oraluben force-pushed the metal branch 3 times, most recently from 9f7474c to 98336a3 Compare September 12, 2025 10:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/workflows/metal_ci.yml (1)

64-64: Fix cache key: matrix.os is undefined in this workflow
Using matrix.os without a matrix breaks actionlint and can affect cache keys. Use runner context.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}
🧹 Nitpick comments (3)
.github/workflows/metal_ci.yml (3)

14-14: Trim trailing whitespace (yamllint error)

There’s a trailing space after write flagged by YAMLlint.

-      contents: write 
+      contents: write

11-11: Pin macOS runner to Apple Silicon image for stable MPS/Metal
macos-latest can shift. Pin to macos-14 (or your validated version) to ensure consistent MPS/Metal availability.

-    runs-on: [macos-latest]
+    runs-on: macos-14

(Apply to both jobs.)

Also applies to: 47-47


13-15: Restrict permissions to least privilege
This job doesn’t push changes; contents: read suffices.

-    permissions:
-      contents: write
+    permissions:
+      contents: read
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9cd15f744d1cb61534c907cad248305982953e0 and 98336a39372308bb8964379727b6155dc665cdb3.

📒 Files selected for processing (2)
  • .github/workflows/metal_ci.yml (1 hunks)
  • format.sh (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

64-64: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 14-14: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: format-check
  • GitHub Check: bot-task

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
.github/workflows/ci.yml (3)

124-125: Stabilize Metal job: pin Apple Silicon runner and preflight MPS availability.

To avoid drift if macos-latest changes, pin to an Apple Silicon image (e.g., macos-14). Add a quick MPS check to fail fast with a clear message if MPS isn’t available.

Apply:

-    runs-on: [macos-latest]
+    runs-on: macos-14

Insert before tests:

+    - name: Verify MPS availability
+      run: |
+        python - <<'PY'
+        import sys
+        try:
+          import torch
+          ok = getattr(torch.backends, "mps", None) and torch.backends.mps.is_available()
+        except Exception:
+          ok = False
+        print(f"MPS available: {ok}")
+        sys.exit(0 if ok else 1)
+        PY

Also applies to: 156-160


133-136: Improve checkout completeness for submodules on macOS.

Ensure the whole history is available when submodules/versions require it.

Apply:

       with:
         submodules: recursive
+        fetch-depth: 0

12-14: Least-privilege: format-check likely only needs read permission.

Unless you auto-commit formatting, drop to contents: read.

Apply:

-    permissions:
-      contents: write 
+    permissions:
+      contents: read
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98336a39372308bb8964379727b6155dc665cdb3 and fd8a47bf287afd3a2435815e3b9370d4e2ec4d49.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml (2 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/ci.yml

141-141: property "os" is not defined in object type {}

(expression)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: format-check
  • GitHub Check: bot-task

@@ -7,7 +7,7 @@ env:

jobs:
format-check:
runs-on: [self-hosted, nvidia]
runs-on: [ubuntu-latest]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Make format-check runner GPU-agnostic: drop flash-attn build and don’t force CUDA in CMake.

On ubuntu-latest, building flash_attn commonly fails (no CUDA toolchain). Also forcing -DUSE_CUDA=ON can make CMake error out on non-GPU runners. These can break formatting CI unrelated to GPU.

Apply:

-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
+          # GPU-specific deps are not needed for format check; avoid fragile builds
+          true
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd ..
+        # Generate compile_commands.json without assuming CUDA
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=OFF; cd ..

Also applies to: 41-45, 53-54

🤖 Prompt for AI Agents
.github/workflows/ci.yml around line 10 (and also update the same changes at
lines 41-45 and 53-54): the format-check runner is currently tied to GPU tooling
— remove the step that builds flash-attn and stop forcing CUDA in CMake for this
runner; update the job matrix/steps so the format-check job does not invoke
building flash-attn and remove or conditionally set any -DUSE_CUDA=ON CMake
flags (only enable CUDA on GPU-capable jobs), ensuring the workflow uses
CPU-only defaults so formatting CI will run on ubuntu-latest without requiring a
CUDA toolchain.

Comment on lines 138 to 142
uses: hendrikmuhs/ccache-action@v1.2
with:
create-symlink: true
key: ${{ github.job }}-${{ matrix.os }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix ccache key: matrix.os is undefined (will fail at runtime).

matrix.os isn’t defined in this job (no matrix), which actionlint already flags. Use runner.os and include Python/version and requirements hash to improve cache hits.

Apply:

-      with:
-        create-symlink: true
-        key: ${{ github.job }}-${{ matrix.os }}
+      with:
+        create-symlink: true
+        key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements-*.txt') }}
+        restore-keys: |
+          ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uses: hendrikmuhs/ccache-action@v1.2
with:
create-symlink: true
key: ${{ github.job }}-${{ matrix.os }}
uses: hendrikmuhs/ccache-action@v1.2
with:
create-symlink: true
key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements-*.txt') }}
restore-keys: |
${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-
🧰 Tools
🪛 actionlint (1.7.7)

141-141: property "os" is not defined in object type {}

(expression)

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 138 to 142 the ccache key uses
matrix.os which is undefined and will fail; replace matrix.os with runner.os and
expand the key to include stable identifiers like the Python version and a hash
of requirements (or lockfile) to improve cache correctness and hit rate — e.g.
construct the key as a combination of github.job, runner.os, the python-version
variable, and a requirements.txt (or poetry/poetry.lock) hash so the cache is
unique per OS/Python/deps and will not error at runtime.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/amd_ci.yml (1)

64-120: Remove invalid pip flag and modernize Python setup

  • pip --no-user is invalid and will fail; remove it (both install locations).
  • Update actions/setup-python to v5 in this job too.
-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
@@
-          USE_ROCM=True pip install . --no-user
+          USE_ROCM=True pip install .
@@
-        USE_ROCM=True pip install . --no-user
+        USE_ROCM=True pip install .
.github/workflows/metal_ci.yml (1)

88-94: Install the built wheel explicitly

Avoid picking up sdists (which could rebuild without USE_METAL). Install only the wheel.

-        uv pip install dist/*
+        uv pip install dist/*.whl
♻️ Duplicate comments (2)
.github/workflows/metal_ci.yml (2)

82-87: Don’t source .venv with setup-uv; ensure Metal build flag set

setup-uv already activates the env; .venv/bin/activate may not exist and will fail. Also ensure Metal is enabled during build.

-        source .venv/bin/activate
-        uv pip install build wheel ninja cmake
-        python -mbuild -w --no-isolation
+        uv pip install build wheel ninja cmake
+        USE_METAL=1 python -m build -w --no-isolation

65-70: Fix undefined matrix.os in ccache key (actionlint)

matrix.os isn’t defined; use runner context to keep keys valid and unique.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}
🧹 Nitpick comments (4)
setup.py (1)

102-106: Improve error message clarity.

The error message for missing CUDA_HOME uses a backslash for line continuation which could be confusing. Consider using parentheses for cleaner multi-line strings.

Apply this diff to improve readability:

-        "CUDA support is enabled by default on linux if `USE_ROCM=False`," \
-        " but CUDA_HOME is not set or detected.")
+        ("CUDA support is enabled by default on linux if `USE_ROCM=False`, "
+         "but CUDA_HOME is not set or detected."))
.github/workflows/amd_ci.yml (1)

9-62: Gating with if: false — consider a toggle and update setup-python

  • Replace hardcoded if: false with a single env/input (e.g., ENABLE_AMD_CI) so both jobs can be enabled together without editing YAML.
  • Update actions/setup-python from v2 to v5 to avoid deprecation issues.
-  format-check:
-    if: false
+  format-check:
+    if: ${{ env.ENABLE_AMD_CI == 'true' }}
@@
-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Python
+      uses: actions/setup-python@v5
.github/workflows/metal_ci.yml (2)

10-11: Pin runner to Apple Silicon

Metal/MPS tests may require Apple Silicon. Prefer macos-14 (arm64) over macos-latest to avoid Intel runners.

-    runs-on: [macos-latest]
+    runs-on: macos-14

12-14: Tighten permissions and fix trailing space

Use least-privilege and remove trailing space flagged by yamllint.

-    permissions:
-      contents: write 
+    permissions:
+      contents: read
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd8a47bf287afd3a2435815e3b9370d4e2ec4d49 and 32edb2446da0b9e707397767677eb68b032d9962.

📒 Files selected for processing (4)
  • .github/workflows/amd_ci.yml (2 hunks)
  • .github/workflows/ci.yml (2 hunks)
  • .github/workflows/metal_ci.yml (1 hunks)
  • setup.py (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/ci.yml
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-metal
  • GitHub Check: bot-task
🔇 Additional comments (9)
setup.py (8)

36-43: LGTM! Boolean environment parsing is now more robust.

The updated _read_bool_env function correctly handles both truthy and falsy values explicitly, including empty strings as false. This prevents the previous issue where explicit false values like "0" or "false" would incorrectly return the default.


65-81: LGTM! Platform-specific library naming is correctly handled.

The prebuild items lists are properly differentiated between Linux (.so) and macOS (.dylib) shared libraries, with appropriate deletion items for macOS debug symbols.


614-622: LGTM! CMakeExtension constructor is properly updated.

The addition of **kwargs parameter allows for flexible extension configuration while maintaining backward compatibility.


669-669: LGTM! Platform-specific file globbing is correctly implemented.

The dynamic selection between .so and .dylib files based on the platform is appropriate for cross-platform compatibility.


700-710: Document Metal-specific Cython compilation issue.

The comment explains the issue with Cython compilation on Metal/macOS requiring libpython linkage. This is a reasonable workaround until the issue is investigated further.


761-763: Document the macOS Cython linking issue.

The FIXME comment correctly identifies that aarch64 Darwin needs additional Python library linking. This is consistent with the earlier Metal-specific handling.


822-857: LGTM! CMake configuration logic properly handles Metal backend.

The Metal-first configuration approach is well-implemented:

  • Metal takes precedence when USE_METAL is enabled
  • Proper fallback to ROCM/CUDA backends
  • Correct mutual exclusion of backends

487-494: LGTM! Cleanup logic for platform-specific artifacts is properly implemented.

The loop handles deletion of macOS-specific build artifacts (like .dSYM files) that need to be removed during the build process.

.github/workflows/metal_ci.yml (1)

35-49: Format step looks good

CMake prep with USE_METAL=ON and clean failure output are clear.

Comment on lines 51 to 67
IS_LINUX = platform.system() == 'Linux'
MAYBE_METAL = platform.mac_ver()[2] == 'arm64'

# Add LLVM control environment variable
USE_LLVM = os.environ.get("USE_LLVM", "False").lower() == "true"
USE_LLVM = _read_bool_env('USE_LLVM')
# Add ROCM control environment variable
USE_ROCM = _read_bool_env("USE_ROCM")
# Add ROCM control environment variable
USE_METAL = _read_bool_env("USE_METAL", MAYBE_METAL)
# Add ROCM control environment variable
USE_ROCM = os.environ.get("USE_ROCM", "False").lower() == "true"
USE_CUDA = _read_bool_env("USE_CUDA", IS_LINUX and not USE_ROCM)
# Build with Debug mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify Metal availability detection logic.

The MAYBE_METAL variable checks for arm64 architecture on macOS, but this may not be sufficient for determining Metal support availability. Consider if additional checks are needed for Metal framework availability.


🏁 Script executed:

#!/bin/bash
# Description: Check if Metal detection logic in other files is consistent with setup.py

# Search for Metal availability checks in the codebase
rg -n "check_metal_availability|metal.*availability" --type=py -A3 -B3

# Look for platform-specific Metal checks
rg -n "mac_ver|arm64|MAYBE_METAL" --type=py -A2 -B2

Length of output: 2480


Replace arch-only Metal detection with feature-based detection

Both setup.py (MAYBE_METAL = platform.mac_ver()[2] == 'arm64') and tilelang/utils/target.py (check_metal_availability returning arch == 'arm64') assume Metal only on arm64 and thus exclude Intel Macs that may support Metal. Replace these ad-hoc checks with a single helper that verifies macOS release and Metal framework availability (e.g., presence of /System/Library/Frameworks/Metal.framework or attempting to load the framework) and use it both in setup.py and target selection.

Locations: setup.py (MAYBE_METAL, ~lines 51–62); tilelang/utils/target.py (check_metal_availability, ~lines 46–51; used at ~line 84).

🤖 Prompt for AI Agents
In setup.py around lines 51–62 and tilelang/utils/target.py around lines 46–51
and usage at ~84, replace the arch-only Metal detection with a feature-based
helper: add a cross-file helper is_metal_available() (in a shared util module or
a small platform_helpers.py) that returns True only if platform.system() ==
'Darwin', macOS release is new enough (parse platform.mac_ver()[0] or
platform.release()), and the Metal framework is present (check existence of
/System/Library/Frameworks/Metal.framework or attempt to load it via
ctypes.CDLL); then change MAYBE_METAL in setup.py to call is_metal_available()
instead of checking arm64, and replace check_metal_availability / arch ==
'arm64' in tilelang/utils/target.py to call this new helper so both files use
the same robust feature-detection logic.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
.github/workflows/amd_ci.yml (2)

41-45: Remove invalid pip flag --no-user.

pip install --no-user isn’t a valid option and will fail.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt

98-105: Remove --no-user in remaining pip installs.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
@@
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-rocm.txt
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-rocm.txt
@@
-          USE_ROCM=True pip install . --no-user
+          USE_ROCM=True pip install .
@@
-        USE_ROCM=True pip install . --no-user
+        USE_ROCM=True pip install .

Also applies to: 112-112

setup.py (2)

271-279: get_cmake_path can raise TypeError when CMake is missing.

Guard None before os.path.exists.

 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
         raise Exception("CMake is not installed, please install it first.")
     return cmake_path

843-857: Honor USE_CUDA=False and explicitly disable CUDA when using Metal.

Current logic enables CUDA if CUDA_HOME is set even when USE_CUDA=0, and Metal path doesn’t force USE_CUDA OFF.

-        if USE_METAL:
+        if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
                 "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
-        elif USE_ROCM:
+        elif USE_ROCM:
             content_lines += [
                 f"set(USE_ROCM {ROCM_HOME})",
                 "set(USE_CUDA OFF)",
             ]
-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
♻️ Duplicate comments (4)
.github/workflows/ci.yml (1)

41-45: Remove invalid --no-user and avoid fragile GPU deps in format-check.

-          python -m pip install --upgrade pip --no-user
+          python -m pip install --upgrade pip
-          [[ -f requirements-test.txt ]] && \
-            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
+          [[ -f requirements-test.txt ]] && \
+            PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
+          # GPU-specific deps not needed for formatting
+          true
@@
-          pip install flash_attn==2.5.8 --no-user --no-build-isolation
-          pip install . --no-user
+          # Skip GPU-only deps here
+          pip install .
@@
-        pip install . --no-user -v
+        pip install . -v
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd ..
+        # Do not force CUDA for formatting
+        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=OFF; cd ..

Also applies to: 101-103, 109-109

.github/workflows/metal_ci.yml (2)

84-86: Don’t source .venv when using setup-uv with activation.

The UV action already activates the environment; .venv may not exist.

-        source .venv/bin/activate
-        uv pip install -v --no-build-isolation .
+        uv pip install -v --no-build-isolation .

69-69: Fix cache key: matrix.os is undefined (actionlint error).

Use runner context to avoid runtime failure.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-${{ runner.arch }}-py${{ env.PYTHON_VERSION }}
+        restore-keys: |
+          ${{ github.workflow }}-${{ github.job }}-${{ runner.os }}-
setup.py (1)

36-44: Boolean env parser misinterprets explicit false values (and ignores empty string).

if env := os.environ.get(name): skips ''; elif env in ['', ...] is unreachable. This can enable backends unintentionally.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ("1", "true", "on", "yes", "y"):
+        return True
+    if s in ("0", "false", "off", "no", "n", ""):
+        return False
+    return default
🧹 Nitpick comments (13)
.github/workflows/amd_ci.yml (4)

11-11: Don’t hard-disable AMD CI; gate via repo var or input.

Replace if: false with a controllable switch so maintainers can re-enable without code changes.

-    if: false
+    if: ${{ vars.DISABLE_AMD_CI != '1' }}

65-65: Same here: avoid permanent disable.

Use a variable-gated condition instead of if: false.

-    if: false
+    if: ${{ vars.DISABLE_AMD_CI != '1' }}

24-26: Upgrade setup-python to v5.

actions/setup-python@v2 is deprecated.

-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

Also applies to: 79-81


15-15: Trim trailing whitespace in permissions.

-      contents: write 
+      contents: write
.github/workflows/ci.yml (4)

10-10: Avoid permanent disable; gate via a repo variable.

-    if: false
+    if: ${{ vars.DISABLE_NVIDIA_CI != '1' }}

66-66: Same for build-test-nvidia: use a toggle instead of if: false.

-    if: false
+    if: ${{ vars.DISABLE_NVIDIA_CI != '1' }}

23-25: Update to actions/setup-python@v5.

-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ env.PYTHON_VERSION }}

Also applies to: 80-82


14-14: Trailing space after contents: write.

-      contents: write 
+      contents: write
.github/workflows/metal_ci.yml (2)

13-13: Trim trailing whitespace.

-      contents: write 
+      contents: write

40-40: Consider generator parity with setup.py.

If the build expects Ninja elsewhere, specify -G Ninja here too for consistency, or rely on default uniformly.

setup.py (3)

51-53: Metal detection: prefer platform.system()/platform.machine().

platform.mac_ver() is brittle; simplify and be explicit.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == "Linux"
+MAYBE_METAL = (platform.system() == "Darwin" and platform.machine() == "arm64")

900-903: Classifiers/platforms: include macOS when Metal builds are possible.

-    platforms=[
-        "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
-    ],
+    platforms=[
+        ("Environment :: GPU :: NVIDIA CUDA" if (USE_CUDA and not USE_ROCM) else
+         "Environment :: GPU :: AMD ROCm" if USE_ROCM else
+         "Environment :: GPU :: Apple Metal"),
+        ("Operating System :: MacOS" if USE_METAL else "Operating System :: POSIX :: Linux"),
+    ],

103-106: Long exception message (TRY003).

Consider shortening or using a constant; non-blocking.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd8a47bf287afd3a2435815e3b9370d4e2ec4d49 and 608fef209e5a99d4fe6a6cc6d1feb5b1abcf7092.

📒 Files selected for processing (4)
  • .github/workflows/amd_ci.yml (2 hunks)
  • .github/workflows/ci.yml (2 hunks)
  • .github/workflows/metal_ci.yml (1 hunks)
  • setup.py (11 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

python-version: ${{ env.PYTHON_VERSION }}

- name: Ensure venv (local & persistent)
run: uv pip install -r requirements-test.txt -r requirements-build.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Guard optional requirements to avoid failures if the file is missing.

-      run: uv pip install -r requirements-test.txt -r requirements-build.txt
+      run: |
+        set -e
+        [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
+        [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run: uv pip install -r requirements-test.txt -r requirements-build.txt
run: |
set -e
[[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
[[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt
🤖 Prompt for AI Agents
In .github/workflows/metal_ci.yml around line 80, the step currently tries to
install optional requirement files unconditionally which will fail the job if
those files are missing; change the run command to detect each optional
requirements file (e.g., requirements-test.txt and requirements-build.txt) and
only include them in the pip install command if they exist (use a shell
conditional to build the -r arguments or test -f before calling pip), ensuring
that missing files are skipped but real install errors still fail the job.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
setup.py (1)

271-279: get_cmake_path can raise TypeError when cmake is missing

shutil.which() returns None; os.path.exists(None) throws. Guard None explicitly.

 def get_cmake_path() -> str:
@@
-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
-        raise Exception("CMake is not installed, please install it first.")
-    return cmake_path
+    cmake_path = shutil.which("cmake")
+    if not cmake_path or not os.path.exists(cmake_path):
+        raise RuntimeError("CMake is not installed or not on PATH.")
+    return cmake_path
♻️ Duplicate comments (5)
.github/workflows/metal_ci.yml (3)

84-85: Don’t source .venv with setup-uv; it’s already activated

.setup-uv with activate-environment: true exports the env; .venv may not exist.

-        source .venv/bin/activate
-        uv pip install -v --no-build-isolation .
+        uv pip install -v --no-build-isolation .

69-69: Fix actionlint error: undefined matrix.os in cache key

No matrix is defined; this breaks actionlint and can impact cache reuse.

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}

79-81: Guard optional requirement files

Unconditionally installing both files fails if one is absent.

-      run: uv pip install -r requirements-test.txt -r requirements-build.txt
+      run: |
+        set -euo pipefail
+        [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
+        [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt
setup.py (2)

51-62: Prefer feature-based Metal detection over arch-only heuristic

Intel Macs can support Metal; arch-only check excludes them. Centralize detection (Darwin + Metal.framework presence) and reuse.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+
+def _has_metal() -> bool:
+    if platform.system() != 'Darwin':
+        return False
+    return os.path.exists('/System/Library/Frameworks/Metal.framework')
+
+MAYBE_METAL = _has_metal()

36-43: Boolean env parsing bug: empty string never reaches the “false” branch

The truthy guard prevents '' from being treated as false; explicit “0/false/off/''” can incorrectly return default.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default
🧹 Nitpick comments (4)
.github/workflows/metal_ci.yml (2)

12-14: Drop unnecessary write permission and fix trailing space

This job doesn't publish artifacts; reduce to read-only and remove trailing space on Line 13.

-    permissions:
-      contents: write 
+    permissions:
+      contents: read

71-78: Ensure Ninja is present or fall back to default generator

CMake is invoked with “-G Ninja” in setup.py; missing Ninja will fail on some mac runners.

     - name: Install python via uv
       uses: astral-sh/setup-uv@v6
       with:
         enable-cache: true
         ignore-nothing-to-cache: true
         activate-environment: true
         python-version: ${{ env.PYTHON_VERSION }}
+
+    - name: Ensure Ninja
+      run: |
+        if ! command -v ninja >/dev/null 2>&1; then
+          brew update && brew install ninja
+        fi

Optionally, make setup.py fall back to default generator if Ninja is absent.

setup.py (2)

823-825: Prefer Python3_EXECUTABLE for CMake

FindPython prefers Python3_EXECUTABLE; improves compatibility across CMake toolchains.

-            f"-DPython_EXECUTABLE={sys.executable}",
+            f"-DPython3_EXECUTABLE={sys.executable}",

844-859: Always emit mutually exclusive GPU flags

When CUDA is selected, set USE_ROCM OFF; when ROCm is selected, set USE_CUDA OFF; when neither is selected, set both OFF explicitly to avoid stale cache effects.

         if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
-                "set(USE_ROCM OFF)",
+                "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
         elif USE_ROCM:
             content_lines += [
                 f"set(USE_ROCM {ROCM_HOME})",
                 "set(USE_CUDA OFF)",
             ]
         elif CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_CUDA OFF)",
+                "set(USE_ROCM OFF)",
+            ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 608fef209e5a99d4fe6a6cc6d1feb5b1abcf7092 and 4c42d7eea70d878a010888bc4944585c059109e7.

📒 Files selected for processing (2)
  • .github/workflows/metal_ci.yml (1 hunks)
  • setup.py (11 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/metal_ci.yml

[error] 13-13: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
setup.py

103-105: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-metal
  • GitHub Check: build-test-amd
  • GitHub Check: bot-task
🔇 Additional comments (2)
setup.py (2)

102-111: Linux-only GPU requirement may be too strict

This hard-errors on Linux without CUDA/ROCM even for CPU-only users or CI. Confirm this is intentional; otherwise downgrade to warning or gate behind env.

If relaxing is desired:

-if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
-    raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
-    )
+if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
+    logger.warning("No CUDA/ROCM detected; building CPU-only components.")

700-710: Remove CythonExtension from ext_modules when MAYBE_METAL is true

setup.py returns early for Metal without producing the Cython artifact, but ext_modules still declares CythonExtension — this can break packaging/editable installs.

File: setup.py (around lines 700–710)

-    ext_modules=[
-        CMakeExtension("TileLangCXX", sourcedir="."),
-        CythonExtension("TileLangCython", sourcedir="."),
-    ],
+    ext_modules=[
+        CMakeExtension("TileLangCXX", sourcedir="."),
+    ] + ([] if MAYBE_METAL else [CythonExtension("TileLangCython", sourcedir=".")]),

Alternative: keep the extension but provide a stub .so or mark the extension optional so build_ext is satisfied.

setup.py Outdated
Comment on lines 762 to 760
# fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
# logger.info(command)
os.system(command)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Fix Darwin link flags for Cython build (non-Metal macs)

Using “-shared” on macOS is incorrect; use -dynamiclib and allow undefined symbols to Python.

-                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
-                            # logger.info(command)
-                            os.system(command)
+                            if sys.platform == "darwin":
+                                ldflags = "-dynamiclib -undefined dynamic_lookup"
+                            else:
+                                ldflags = "-shared -pthread -fPIC"
+                            command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
+                            logger.info(command)
+                            subprocess.check_call(command, shell=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
# logger.info(command)
os.system(command)
# fixme: aarch64 darwin needs something like `-Lxxx -lpython3.12`
if sys.platform == "darwin":
ldflags = "-dynamiclib -undefined dynamic_lookup"
else:
ldflags = "-shared -pthread -fPIC"
command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
logger.info(command)
subprocess.check_call(command, shell=True)
🧰 Tools
🪛 Ruff (0.12.2)

765-765: Starting a process with a shell, possible injection detected

(S605)

🤖 Prompt for AI Agents
In setup.py around lines 762-765, the build command uses the Linux-style
"-shared" which is incorrect on macOS; change the link flags for Darwin to use
"-dynamiclib" and allow undefined symbols to be resolved by Python by adding
"-undefined dynamic_lookup" (and keep other flags as-is or remove -fPIC if
desired for macOS), so on macOS construct the command with -dynamiclib and
-undefined dynamic_lookup instead of -shared before invoking os.system(command).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
setup.py (2)

674-683: Copy macOS-built Python extension modules (.so) too; not just .dylib.

On macOS, C extensions are typically .so. Include both patterns.

-import glob
-files = glob.glob("*.so" if IS_LINUX else "*.dylib")
+import glob
+files = glob.glob("*.so")
+if not IS_LINUX:
+    files += glob.glob("*.dylib")

839-854: CMake GPU toggles: explicitly disable unused backends and honor USE_CUDA flag.

  • In Metal branch, also set USE_CUDA OFF.
  • In CUDA branch, respect USE_CUDA rather than presence of CUDA_HOME alone.
-        if USE_METAL:
+        if USE_METAL:
             content_lines += [
                 "set(USE_METAL ON)",
                 "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
             ]
         elif USE_ROCM:
@@
-        elif CUDA_HOME:
+        elif USE_CUDA and CUDA_HOME:
             content_lines += [
                 f"set(USE_CUDA {CUDA_HOME})",
                 "set(USE_ROCM OFF)",
             ]
+        else:
+            content_lines += [
+                "set(USE_METAL OFF)",
+                "set(USE_ROCM OFF)",
+                "set(USE_CUDA OFF)",
+            ]
♻️ Duplicate comments (2)
setup.py (2)

36-44: Boolean env parsing still mishandles empty strings and unrecognized values; treat true/false explicitly.

Current truthiness gate skips empty strings; also missing yes/no aliases. Use explicit parsing.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

753-761: Fix Darwin link flags for Cython artifact and avoid shell-based os.system.

Use -dynamiclib and -undefined dynamic_lookup on macOS; prefer subprocess.check_call without shell.

-                            cython = get_cython_compiler()
+                            cython = get_cython_compiler()
                             if cython is None:
                                 raise Exception("Cython is not installed, please install it first.")
-                            os.system(f"{cython} {cython_wrapper_path} --cplus -o {source_path}")
+                            subprocess.check_call([cython, str(cython_wrapper_path), "--cplus", "-o", str(source_path)])
                             python_include_path = sysconfig.get_path("include")
                             cc = get_cplus_compiler()
-                            if MAYBE_METAL:
-                                cc += ' -Wl,-undefined,dynamic_lookup'
-                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
-                            logger.info(command)
-                            os.system(command)
+                            if sys.platform == "darwin":
+                                cmd = [
+                                    cc, "-dynamiclib", "-undefined", "dynamic_lookup",
+                                    "-fwrapv", "-O2", "-Wall", "-fno-strict-aliasing",
+                                    "-I", python_include_path, str(source_path), "-o", str(temp_path)
+                                ]
+                            else:
+                                cmd = [
+                                    cc, "-shared", "-pthread", "-fPIC",
+                                    "-fwrapv", "-O2", "-Wall", "-fno-strict-aliasing",
+                                    "-I", python_include_path, str(source_path), "-o", str(temp_path)
+                                ]
+                            logger.info(" ".join(cmd))
+                            subprocess.check_call(cmd)
🧹 Nitpick comments (3)
setup.py (3)

56-66: Refine Metal default detection; avoid platform.mac_ver()[2] and gate by OS/arch/version.

Arm64-only check is brittle. Prefer feature/OS checks and MPS minimum macOS.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+def _is_apple_silicon_macos(min_ver: str = "12.3") -> bool:
+    if platform.system() != "Darwin":
+        return False
+    try:
+        from packaging.version import Version
+        mac_ver = platform.mac_ver()[0] or "0"
+        return platform.machine().lower() in ("arm64", "aarch64") and Version(mac_ver) >= Version(min_ver)
+    except Exception:
+        return platform.machine().lower() in ("arm64", "aarch64")
+MAYBE_METAL = _is_apple_silicon_macos()

Also consider centralizing this helper (shared with target selection) to keep detection consistent across files.


920-930: Optional: expand classifiers/platforms for Metal/macOS builds.

Consider adding macOS classifier and GPU backend markers reflecting Metal when enabled.


46-69: Optional: document env var precedence and defaults in README.

Clarify how USE_CUDA/USE_ROCM/USE_METAL interact and defaults per platform to reduce surprise.

I can draft a short “Build matrix and env vars” section for README if helpful.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c42d7eea70d878a010888bc4944585c059109e7 and 7037ce81e29c74ac7a31a48b1e795aca5254e496.

📒 Files selected for processing (1)
  • setup.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
setup.py (1)
tilelang/env.py (1)
  • get (136-139)
🪛 Ruff (0.12.2)
setup.py

108-110: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-metal
  • GitHub Check: format-check
  • GitHub Check: format-check
🔇 Additional comments (3)
setup.py (3)

400-401: LGTM: Linux-only rpath patch guard.


70-86: LGTM: platform-specific prebuilt artifact lists and .dSYM cleanup.


619-623: LGTM: Extension kwargs passthrough for CMakeExtension.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
testing/python/metal/test_metal_codegen.py (1)

52-53: Tensor dimensions are correct for matrix multiplication.

The tensor dimensions in the test are properly set up for matrix multiplication A @ B:

  • a has shape (M, K)
  • b has shape (K, N)
  • c has shape (M, N)

This correctly implements the mathematical operation C = A × B where the inner dimension K is consistent.

Also applies to: 55-57

🧹 Nitpick comments (2)
testing/python/metal/test_metal_codegen.py (2)

18-20: Consider memory alignment for shared memory allocations.

The shared memory allocations for A_shared and B_shared may benefit from memory alignment optimizations to improve performance on Metal. Consider aligning the allocation sizes to powers of 2 or Metal's preferred alignment boundaries.


79-80: Add error message when MPS is not available.

The test silently does nothing when MPS is not available. Consider adding an informative message or skip marker to clarify why tests aren't running.

 if __name__ == "__main__":
     if torch.mps.is_available():
         tilelang.testing.main()
+    else:
+        print("MPS not available. Skipping Metal backend tests.")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7037ce81e29c74ac7a31a48b1e795aca5254e496 and 5dafa7782e286ebec4ba8a41cc41060259260d34.

📒 Files selected for processing (1)
  • testing/python/metal/test_metal_codegen.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/metal/test_metal_codegen.py (6)
tilelang/jit/__init__.py (1)
  • jit (241-314)
tilelang/language/allocate.py (2)
  • alloc_shared (21-36)
  • alloc_fragment (53-64)
tilelang/language/fill.py (1)
  • clear (24-48)
tilelang/language/pipeline.py (1)
  • Pipelined (9-46)
tilelang/language/copy.py (1)
  • copy (84-152)
tilelang/jit/kernel.py (1)
  • kernel_source (470-471)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-amd
  • GitHub Check: build-test-metal
  • GitHub Check: bot-task
🔇 Additional comments (2)
testing/python/metal/test_metal_codegen.py (2)

63-63: Good test coverage for kernel source verification.

The assertion that kernel_source is not None properly validates that the Metal backend generates source code as expected.


25-26: Verify coalesced_width=2 for Metal backend.
T.copy uses coalesced_width=2 in testing/python/metal/test_metal_codegen.py (lines 25–26, 31). Confirm that 2 is the optimal coalescing width for the Metal backend and the tensor element type; if not, update to the backend-specific value or make it configurable.



def test_gemm_float16():
assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

High tolerance values for float16 and int32 tests.

The tolerance value of atol=1 for float16 and int32 tests seems quite high. For int32, the results should match exactly (atol=0), and for float16, a smaller tolerance might be more appropriate to catch potential precision issues.

 def test_gemm_float16():
-    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1)
+    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='float16', atol=1e-2)
 
 
 def test_gemm_int32():
-    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='int32', atol=1)
+    assert_gemm(1024, 1024, 1024, 16, 16, 16, dtype='int32', atol=0)

Also applies to: 75-75

🤖 Prompt for AI Agents
In testing/python/metal/test_metal_codegen.py around lines 71 and 75, the tests
use an overly large absolute tolerance (atol=1) for float16 and int32; change
the int32 assertions to use atol=0 (exact match) and tighten the float16
tolerance (e.g., atol=1e-2 or another small value appropriate for float16
precision) so the tests detect precision regressions while still allowing
expected float16 rounding error.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
setup.py (1)

279-284: Fix None-path check for CMake discovery.

os.path.exists(None) raises TypeError if cmake not found.

Apply:

-    cmake_path = shutil.which("cmake")
-    if not os.path.exists(cmake_path):
-        raise Exception("CMake is not installed, please install it first.")
-    return cmake_path
+    cmake_path = shutil.which("cmake")
+    if not cmake_path:
+        raise Exception("CMake is not installed, please install it first.")
+    return cmake_path
tilelang/jit/kernel.py (1)

47-48: Type hint mismatch: include "torch" in execution_backend Literal.

The code accepts "torch" at runtime but type hints exclude it.

Apply:

-        execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc"] = "cython",
+        execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc", "torch"] = "cython",

Also update the same Literal in from_database signature if you plan to support "torch" there later, or explicitly document it’s unsupported.

🧹 Nitpick comments (12)
tilelang/carver/arch/cpu.py (1)

23-26: Sort all to satisfy Ruff RUF022.

Keep exports alphabetically ordered.

Apply:

 __all__ = [
-    'is_cpu_arch',
-    'CPU',
+    'CPU',
+    'is_cpu_arch',
 ]

Based on static analysis hints (RUF022).

test.sh (1)

1-3: Add shebang and strict mode; fail fast on errors.

Improves portability and avoids running the second command if the first fails.

Apply:

+#!/usr/bin/env bash
+set -euo pipefail
+
 python test_metal_perf.py
 
 PYTORCH_MPS_PREFER_METAL=1 python test_metal_perf.py
tilelang/carver/arch/cdna.py (1)

35-38: Sort all to satisfy Ruff RUF022.

Keep exports alphabetically ordered.

Apply:

 __all__ = [
-    'is_cdna_arch',
-    'CDNA',
+    'CDNA',
+    'is_cdna_arch',
 ]

Based on static analysis hints (RUF022).

tilelang/carver/arch/metal.py (1)

17-20: Sort all to satisfy Ruff RUF022.

Keep exports alphabetically ordered.

Apply:

 __all__ = [
-    'is_metal_arch',
-    'METAL',
+    'METAL',
+    'is_metal_arch',
 ]

Based on static analysis hints (RUF022).

setup.py (2)

756-761: Use correct Darwin link mode and avoid os.system; compile Cython wrapper robustly.

On macOS, use -dynamiclib -undefined dynamic_lookup; prefer subprocess.check_call.

Apply:

-                            cc = get_cplus_compiler()
-                            if MAYBE_METAL:
-                                cc += ' -Wl,-undefined,dynamic_lookup'
-                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
-                            logger.info(command)
-                            os.system(command)
+                            cc = get_cplus_compiler()
+                            if not cc:
+                                raise Exception("No C++ compiler found for Cython JIT adapter.")
+                            if sys.platform == "darwin":
+                                ldflags = "-dynamiclib -undefined dynamic_lookup"
+                            else:
+                                ldflags = "-shared -pthread -fPIC"
+                            command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
+                            logger.info(command)
+                            subprocess.check_call(command, shell=True)

903-905: Update platform classifiers to include macOS when Metal is enabled.

Add MacOS classifier to reflect new backend support.

Apply:

     platforms=[
         "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
+        "Operating System :: POSIX :: Linux",
+        "Operating System :: MacOS",
     ],
tilelang/jit/adapter/__init__.py (1)

6-6: Avoid importing heavy torch path at adapter package import time.

Importing MetalKernelAdapter here pulls torch for all users (even non-Metal). Prefer lazy import at call sites to reduce import-time side effects.

tilelang/carver/arch/cuda.py (1)

150-159: LGTM: Explicit public exports.

Explicit all improves API clarity. Optional: sort entries.

tilelang/utils/target.py (2)

46-52: Strengthen Metal availability detection beyond arch check.

Relying on mac_ver arm64 is brittle; check macOS and Metal framework presence to better reflect availability.

Apply:

-from platform import mac_ver
+from platform import mac_ver, system
+import ctypes.util
@@
-def check_metal_availability() -> bool:
-    mac_release, _, arch = mac_ver()
-    if not mac_release:
-        return False
-    # todo: check torch version?
-    return arch == 'arm64'
+def check_metal_availability() -> bool:
+    if system() != 'Darwin':
+        return False
+    # Feature check: Metal framework present
+    try:
+        return bool(ctypes.util.find_library("Metal"))
+    except Exception:
+        return True

84-88: Error message consistency: say “Metal” (not MPS) to match target names.

User-visible clarity; target name is "metal".

Apply:

-        else:
-            raise ValueError("No CUDA or HIP or MPS available on this system.")
+        else:
+            raise ValueError("No CUDA, HIP, or Metal available on this system.")
tilelang/jit/adapter/cython/adapter.py (1)

24-29: Import-only improvement: this module compiles at import; avoid forcing C++ compiler on mac/metal users.

Top-level compilation triggers even when backend isn’t used. Consider lazy-compiling inside CythonKernelAdapter init.

tilelang/carver/arch/__init__.py (1)

34-35: Guard MPS detection for robustness across builds.
Some builds may not expose torch.mps; a defensive hasattr check avoids attribute errors.

Apply:

-    elif torch.mps.is_available():
+    elif hasattr(torch, "mps") and torch.mps.is_available():
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e73725d315d04c2b39c10ea4284b10bf352f41d7 and 6226096.

📒 Files selected for processing (30)
  • .github/workflows/metal_ci.yml (1 hunks)
  • CMakeLists.txt (3 hunks)
  • install_metal.sh (1 hunks)
  • setup.py (11 hunks)
  • src/target/rt_mod_metal.cc (1 hunks)
  • test.sh (1 hunks)
  • test_metal_perf.py (1 hunks)
  • testing/python/metal/test_metal_codegen.py (1 hunks)
  • tilelang/autotuner/tuner.py (2 hunks)
  • tilelang/cache/kernel_cache.py (1 hunks)
  • tilelang/carver/arch/__init__.py (3 hunks)
  • tilelang/carver/arch/cdna.py (1 hunks)
  • tilelang/carver/arch/cpu.py (1 hunks)
  • tilelang/carver/arch/cuda.py (1 hunks)
  • tilelang/carver/arch/metal.py (1 hunks)
  • tilelang/contrib/cc.py (2 hunks)
  • tilelang/engine/lower.py (1 hunks)
  • tilelang/jit/__init__.py (3 hunks)
  • tilelang/jit/adapter/__init__.py (1 hunks)
  • tilelang/jit/adapter/cython/adapter.py (3 hunks)
  • tilelang/jit/adapter/torch/__init__.py (1 hunks)
  • tilelang/jit/adapter/torch/metal.py (1 hunks)
  • tilelang/jit/adapter/utils.py (1 hunks)
  • tilelang/jit/adapter/wrapper.py (3 hunks)
  • tilelang/jit/kernel.py (4 hunks)
  • tilelang/profiler/bench.py (4 hunks)
  • tilelang/testing/__init__.py (1 hunks)
  • tilelang/utils/device.py (1 hunks)
  • tilelang/utils/target.py (4 hunks)
  • tilelang/utils/tensor.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/target/rt_mod_metal.cc
🚧 Files skipped from review as they are similar to previous changes (12)
  • tilelang/autotuner/tuner.py
  • tilelang/utils/tensor.py
  • tilelang/contrib/cc.py
  • tilelang/utils/device.py
  • install_metal.sh
  • tilelang/jit/adapter/utils.py
  • tilelang/profiler/bench.py
  • testing/python/metal/test_metal_codegen.py
  • tilelang/cache/kernel_cache.py
  • CMakeLists.txt
  • tilelang/jit/init.py
  • tilelang/jit/adapter/torch/init.py
🧰 Additional context used
🧬 Code graph analysis (9)
tilelang/jit/adapter/wrapper.py (3)
tilelang/jit/adapter/utils.py (1)
  • is_metal_target (63-64)
tilelang/jit/adapter/cython/adapter.py (1)
  • lib_code (511-513)
tilelang/jit/adapter/libgen.py (1)
  • update_lib_code (54-55)
tilelang/jit/adapter/torch/metal.py (2)
tilelang/jit/adapter/base.py (1)
  • BaseKernelAdapter (8-55)
tilelang/engine/param.py (1)
  • KernelParam (12-103)
tilelang/carver/arch/__init__.py (1)
tilelang/carver/arch/metal.py (1)
  • METAL (9-14)
tilelang/jit/kernel.py (3)
tilelang/jit/__init__.py (1)
  • jit (241-314)
tilelang/jit/adapter/utils.py (1)
  • is_metal_target (63-64)
tilelang/jit/adapter/torch/metal.py (1)
  • MetalKernelAdapter (13-70)
tilelang/engine/lower.py (1)
tilelang/language/ast/ir.py (1)
  • target (1682-1713)
tilelang/jit/adapter/cython/adapter.py (2)
tilelang/jit/adapter/utils.py (4)
  • is_cuda_target (51-52)
  • is_hip_target (55-56)
  • is_cpu_target (59-60)
  • is_metal_target (63-64)
tilelang/contrib/cc.py (2)
  • get_cplus_compiler (68-90)
  • is_mac (93-94)
test_metal_perf.py (7)
tilelang/autotuner/tuner.py (1)
  • autotune (712-805)
tilelang/jit/__init__.py (1)
  • jit (241-314)
tilelang/language/allocate.py (2)
  • alloc_shared (21-36)
  • alloc_fragment (53-64)
tilelang/language/fill.py (1)
  • clear (24-48)
tilelang/language/pipeline.py (1)
  • Pipelined (9-46)
tilelang/language/copy.py (1)
  • copy (84-152)
tilelang/language/parallel.py (1)
  • Parallel (8-28)
tilelang/carver/arch/metal.py (3)
tilelang/language/ast/ir.py (1)
  • target (1682-1713)
tilelang/carver/arch/arch_base.py (1)
  • TileDevice (4-37)
tilelang/carver/template/base.py (1)
  • arch (152-159)
tilelang/jit/adapter/__init__.py (1)
tilelang/jit/adapter/torch/metal.py (1)
  • MetalKernelAdapter (13-70)
🪛 Ruff (0.13.1)
tilelang/carver/arch/cpu.py

23-26: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/jit/adapter/torch/metal.py

45-45: Avoid specifying long messages outside the exception class

(TRY003)

setup.py

108-110: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/carver/arch/__init__.py

2-2: from .cuda import * used; unable to detect undefined names

(F403)


3-3: from .cpu import * used; unable to detect undefined names

(F403)


4-4: from .cdna import * used; unable to detect undefined names

(F403)


5-5: from .metal import * used; unable to detect undefined names

(F403)


22-22: METAL may be undefined, or defined from star imports

(F405)


40-55: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


41-41: is_cpu_arch may be undefined, or defined from star imports

(F405)


42-42: is_cuda_arch may be undefined, or defined from star imports

(F405)


43-43: is_volta_arch may be undefined, or defined from star imports

(F405)


44-44: is_ampere_arch may be undefined, or defined from star imports

(F405)


45-45: is_ada_arch may be undefined, or defined from star imports

(F405)


46-46: is_hopper_arch may be undefined, or defined from star imports

(F405)


47-47: is_tensorcore_supported_precision may be undefined, or defined from star imports

(F405)


48-48: has_mma_support may be undefined, or defined from star imports

(F405)


49-49: is_cdna_arch may be undefined, or defined from star imports

(F405)


50-50: is_metal_arch may be undefined, or defined from star imports

(F405)


51-51: CUDA may be undefined, or defined from star imports

(F405)


52-52: CDNA may be undefined, or defined from star imports

(F405)


53-53: METAL may be undefined, or defined from star imports

(F405)


54-54: CPU may be undefined, or defined from star imports

(F405)

tilelang/carver/arch/cdna.py

35-38: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

test_metal_perf.py

93-93: Do not assign a lambda expression, use a def

Rewrite torch_add as a def

(E731)


93-93: Function definition does not bind loop variable a

(B023)


93-93: Function definition does not bind loop variable b

(B023)


93-93: Function definition does not bind loop variable c

(B023)


100-100: Do not assign a lambda expression, use a def

Rewrite tl_add as a def

(E731)


100-100: Function definition does not bind loop variable jit_kernel

(B023)


100-100: Function definition does not bind loop variable a

(B023)


100-100: Function definition does not bind loop variable b

(B023)


100-100: Function definition does not bind loop variable c

(B023)


103-103: Do not assign a lambda expression, use a def

Rewrite torch_add as a def

(E731)


103-103: Function definition does not bind loop variable a

(B023)


103-103: Function definition does not bind loop variable b

(B023)


103-103: Function definition does not bind loop variable c

(B023)

tilelang/carver/arch/cuda.py

150-159: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/carver/arch/metal.py

17-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

tilelang/jit/adapter/__init__.py

6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tilelang/utils/target.py

87-87: Avoid specifying long messages outside the exception class

(TRY003)

🪛 actionlint (1.7.7)
.github/workflows/metal_ci.yml

69-69: property "os" is not defined in object type {}

(expression)

🪛 GitHub Actions: CI Test on Metal
test_metal_perf.py

[error] 3-3: F401: 'time.sleep' imported but unused


[error] 8-8: F401: 'os' imported but unused


[error] 93-93: E731: Do not assign a lambda expression; use a def (here: torch_add)


[error] 93-93: B023: Function definition does not bind loop variable 'a'


[error] 93-93: B023: Function definition does not bind loop variable 'b'


[error] 93-93: B023: Function definition does not bind loop variable 'c'


[error] 100-100: E731: Do not assign a lambda expression; use a def (here: tl_add)


[error] 100-100: B023: Function definition does not bind loop variable 'jit_kernel'


[error] 100-100: B023: Function definition does not bind loop variable 'a'


[error] 100-100: B023: Function definition does not bind loop variable 'b'


[error] 100-100: B023: Function definition does not bind loop variable 'c'


[error] 103-103: E731: Do not assign a lambda expression; use a def (here: torch_add)


[error] 103-103: B023: Function definition does not bind loop variable 'a'


[error] 103-103: B023: Function definition does not bind loop variable 'b'


[error] 103-103: B023: Function definition does not bind loop variable 'c'

🪛 GitHub Actions: CI Test on AMD
test_metal_perf.py

[error] 3-3: F401: 'time.sleep' imported but unused. Remove unused import.


[error] 8-8: F401: 'os' imported but unused. Remove unused import.


[error] 93-93: E731 Do not assign a lambda expression, use a def


[error] 93-93: B023 Function definition does not bind loop variable a


[error] 93-93: B023 Function definition does not bind loop variable b


[error] 93-93: B023 Function definition does not bind loop variable c


[error] 100-100: E731 Do not assign a lambda expression, use a def


[error] 100-100: B023 Function definition does not bind loop variable a


[error] 100-100: B023 Function definition does not bind loop variable b


[error] 100-100: B023 Function definition does not bind loop variable c


[error] 103-103: E731 Do not assign a lambda expression, use a def


[error] 103-103: B023 Function definition does not bind loop variable a


[error] 103-103: B023 Function definition does not bind loop variable b


[error] 103-103: B023 Function definition does not bind loop variable c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check
🔇 Additional comments (25)
.github/workflows/metal_ci.yml (4)

80-80: Guard optional requirements to avoid failures when files are missing.

Prevents job failures if requirements files are absent.

Apply:

-      run: uv pip install -r requirements-test.txt -r requirements-build.txt
+      run: |
+        set -e
+        [[ -f requirements-test.txt ]] && uv pip install -r requirements-test.txt
+        [[ -f requirements-build.txt ]] && uv pip install -r requirements-build.txt

55-58: Force Metal builds in CI to avoid accidental skips.

Set USE_METAL for consistent enablement on mac runners.

Apply:

     env:
       CMAKE_C_COMPILER_LAUNCHER: ccache
       CMAKE_CXX_COMPILER_LAUNCHER: ccache
+      USE_METAL: "1"

83-86: Don’t source .venv when setup-uv activates the environment.

setup-uv with activate-environment=true already exports the env; .venv may not exist.

Apply:

-        source .venv/bin/activate
-        uv pip install -v --no-build-isolation .
+        uv pip install -v --no-build-isolation .

69-69: Fix invalid cache key reference to matrix.os (actionlint error).

No matrix is defined; use runner context.

Apply:

-        key: ${{ github.job }}-${{ matrix.os }}
+        key: ${{ github.job }}-${{ runner.os }}-${{ runner.arch }}

Based on actionlint hint.

tilelang/carver/arch/metal.py (1)

11-15: Call base initializer and populate device/platform for METAL.

Without super().init(), base TileDevice fields aren’t set; also expose device and platform like other arch backends.

Apply:

-from tvm.target import Target
+import tvm
+from tvm.target import Target
@@
 class METAL(TileDevice):
 
     def __init__(self, target: Target | str):
+        super().__init__()
         if isinstance(target, str):
             target = Target(target)
         self.target = target
+        # Align with CPU/CDNA patterns
+        device = tvm.runtime.metal(0)
+        if not device.exist:
+            raise RuntimeError("Cannot find Metal device 0.")
+        self.device: tvm.runtime.Device = device
+        self.platform: str = "METAL"

(Reiterating earlier feedback)

test_metal_perf.py (3)

21-21: Provide defaults for block sizes in matmul signature
tune() calls matmul(m,n,k,…) without block_* args, causing a TypeError. Add defaults, e.g.:

-def matmul(M, N, K, block_M, block_N, block_K, dtype="float32", accum_dtype="float"):
+def matmul(M, N, K, block_M=16, block_N=16, block_K=16, dtype="float32", accum_dtype="float"):

61-64: Fix MPS Event API usage
Use torch.mps.event.Event (not torch.mps.Event). For PyTorch ≥2.7 you can alternatively use torch.Event(enable_timing=True, device="mps"), but pick one API consistently.

     with torch.mps.profiler.profile(mode="interval,event", wait_until_completed=True):
-        start = torch.mps.Event(enable_timing=True)
-        end   = torch.mps.Event(enable_timing=True)
+        start = torch.mps.event.Event(enable_timing=True)
+        end   = torch.mps.event.Event(enable_timing=True)

93-105: Replace lambdas in loop with defs and bind loop vars (fix E731/B023, avoid late-binding bugs).

Resolves lint failures and correctness hazards.

Apply:

-            torch_add = lambda: torch.matmul(a, b, out=c)
-            torch_add()
-            r.append(benchmark(torch_add, n=100))
+            def torch_add(a=a, b=b, c=c):
+                return torch.matmul(a, b, out=c)
+            torch_add()
+            r.append(benchmark(torch_add, n=100))
@@
-            # warmup
-            tl_add = lambda: jit_kernel(a, b, c)
-            tl_add()
-
-            torch_add = lambda: torch.matmul(a, b, out=c)
-            torch_add()
+            # warmup
+            def tl_add(jit_kernel=jit_kernel, a=a, b=b, c=c):
+                return jit_kernel(a, b, c)
+            tl_add()
 
             r.append(benchmark(tl_add, n=100))

Based on pipeline failures (E731, B023).

setup.py (5)

179-213: PEP 440: avoid multiple '+' local segments when adding commit id.

Current logic can emit invalid versions like 0.1+ubuntu-22.04+abc. Consolidate into a single local part.

Apply:

-def get_tilelang_version(with_cuda=USE_CUDA,
-                         with_system_info=not MAYBE_METAL,
-                         with_commit_id=False) -> str:
+def get_tilelang_version(with_cuda=USE_CUDA,
+                         with_system_info=not MAYBE_METAL,
+                         with_commit_id=False) -> str:
     version = find_version(get_path(".", "VERSION"))
-    local_version_parts = []
+    local_version_parts = []
@@
-    if local_version_parts:
-        version += f"+{'.'.join(local_version_parts)}"
-
-    if with_commit_id:
+    if with_commit_id:
         commit_id = None
         try:
             commit_id = subprocess.check_output(['git', 'rev-parse', 'HEAD'],
                                                 stderr=subprocess.DEVNULL,
                                                 encoding='utf-8').strip()
         except subprocess.SubprocessError as error:
             logger.warning(f"Ignore commit id because failed to get git commit id: {str(error)}")
-        if commit_id:
-            version += f"+{commit_id}"
+        if commit_id:
+            local_version_parts.append(commit_id)
+
+    if local_version_parts:
+        version += f"+{'.'.join(local_version_parts)}"

36-43: Fix boolean env parsing (regression): empty string and more representations mishandled.

The walrus check skips the empty-string case and you don’t accept common yes/no variants. This regresses the previously addressed issue.

Apply:

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

56-58: Replace arch-only Metal detection with feature-based detection; unify defaults.

Defaulting Metal on arm64 only excludes Intel Macs with Metal and enables it where Metal tooling may be absent. Prefer feature detection.

Apply:

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+
+def _is_metal_available() -> bool:
+    # Feature-based detection: macOS + Metal framework present
+    if platform.system() != 'Darwin':
+        return False
+    try:
+        import ctypes.util
+        return bool(ctypes.util.find_library("Metal")) or \
+               os.path.exists("/System/Library/Frameworks/Metal.framework")
+    except Exception:
+        # Best-effort on macOS
+        return True
+
+MAYBE_METAL = _is_metal_available()
@@
-USE_METAL = _read_bool_env("USE_METAL", MAYBE_METAL)
+USE_METAL = _read_bool_env("USE_METAL", MAYBE_METAL)

Follow-up: consider reusing a single helper across the repo (setup-safe, not importing package code). Based on learnings

Also applies to: 64-66


107-116: Gate Linux GPU toolchain checks by requested backends.

This raises on Linux even when both CUDA/ROCm are disabled. Guard by flags as previously suggested.

Apply:

 if USE_CUDA and not CUDA_HOME:
     raise ValueError(
         "CUDA support is enabled by default on linux if `USE_ROCM=False`," \
         " but CUDA_HOME is not set or detected.")
 
-# Ensure one of CUDA or ROCM is available
-if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
+# Ensure one of CUDA or ROCM is available only if requested
+if IS_LINUX and (USE_CUDA or USE_ROCM) and not (CUDA_HOME or ROCM_HOME):
     raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
+        "Failed to detect requested GPU backend. Set CUDA_HOME or ROCM_HOME "
+        "(e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
     )

886-891: ext_modules gate ignored: always builds CythonExtension.

You compute ext_modules above but override it with a hard-coded list in setup(); Metal gating won’t take effect.

Apply:

 ext_modules = [
     CMakeExtension("TileLangCXX", sourcedir="."),
 ]
 if not MAYBE_METAL:
     ext_modules.append(CythonExtension("TileLangCython", sourcedir="."))
@@
-    ext_modules=[
-        CMakeExtension("TileLangCXX", sourcedir="."),
-        CythonExtension("TileLangCython", sourcedir="."),
-    ],
+    ext_modules=ext_modules,

Also applies to: 920-923

tilelang/engine/lower.py (1)

184-186: LGTM: Metal codegen hook added.

Correctly wires device_codegen_without_compile for metal via target.build.metal.

Ensure TVM is built with Metal target so "target.build.metal" is registered at runtime.

tilelang/jit/adapter/cython/adapter.py (1)

455-457: LGTM: map Metal target to torch mps device.

Device mapping aligns with PyTorch MPS.

tilelang/jit/kernel.py (1)

101-108: LGTM: backend validation includes torch.

Validation set matches new branch below.

tilelang/jit/adapter/wrapper.py (3)

6-7: Importing is_metal_target is correct and needed for routing.
No issues.


1069-1089: Minimal Metal wrapper stub is acceptable for now.
This no-op wrapper aligns with the Torch-based Metal path. If you later need host-side dispatch embedding (like CUDA/HIP), extend this class similarly; otherwise this is fine.

Please confirm that all Metal code paths exclusively use the Torch adapter and never rely on lib_code post-processing from this wrapper.


1129-1131: Correctly routes Metal targets to TLMetalSourceWrapper.
This integrates Metal into TLWrapper.wrap as expected.

tilelang/carver/arch/__init__.py (3)

2-5: Replace wildcard imports; they trigger Ruff (F403/F405) and risk namespace collisions.
Prefer explicit imports or module imports with curated re-exports to keep a clear API surface.

Example approach (module imports + re-export selected names):

-from .cuda import *
-from .cpu import *
-from .cdna import *
-from .metal import *
+from . import cuda as _cuda, cpu as _cpu, cdna as _cdna, metal as _metal
+# Re-export public names from submodules
+for _mod in (_cuda, _cpu, _cdna, _metal):
+    for _name in getattr(_mod, "__all__", []):
+        globals()[_name] = getattr(_mod, _name)

This removes F403/F405 while preserving the public API. As per coding guidelines.


21-22: Add Metal branch to get_arch — LGTM.
Handles Target(kind="metal") correctly.


40-55: All all entries are defined in their submodules
Every symbol listed in __all__ has a corresponding def or class in its respective file—no broken exports detected.

tilelang/jit/adapter/torch/metal.py (3)

28-31: Fix kernel naming and kernel cache scope.

  • Using func_or_mod.name breaks for IRModules.
  • _kernel must be instance-scoped to avoid cross-adapter leakage.

Apply:

-        self.kernel_global_source = kernel_global_source
-        self.kernel_name = func_or_mod.__name__ + '_kernel'
+        self.kernel_global_source = kernel_global_source
+        # Derive kernel base name safely
+        if hasattr(func_or_mod, "__name__"):
+            _base = func_or_mod.__name__
+        elif hasattr(func_or_mod, "get_global_vars"):
+            _gvs = list(func_or_mod.get_global_vars())
+            if not _gvs:
+                raise ValueError("Cannot determine kernel name from IRModule")
+            _base = _gvs[0].name_hint
+        else:
+            _base = type(func_or_mod).__name__
+        self.kernel_name = f"{_base}_kernel"
         self.verbose = verbose
+        # Instance-scoped cache
+        self._kernel = None
@@
-    _kernel = None

Also applies to: 50-50


35-46: Search device_mod for the matching kernel; don’t assert on the first function.
Also handle missing thread_extent gracefully and validate device_mod is provided.

Apply:

-        for var, func in device_mod.functions.items():
-            assert var.name_hint == self.kernel_name
-            thread_extent = func.attrs['thread_extent']
-            for tag, extent in thread_extent.items():
-                if "threadIdx" in tag:
-                    self.block_info["xyz".index(tag[-1])] = extent
-                elif "blockIdx" in tag:
-                    self.grid_info["xyz".index(tag[-1])] = extent
-            break
-        else:
-            raise AssertionError(f'no kernel with name {func_or_mod.__name__}')
+        if device_mod is None:
+            raise ValueError("device_mod is required to derive launch dimensions")
+        found = False
+        for var, func in device_mod.functions.items():
+            if var.name_hint != self.kernel_name:
+                continue
+            thread_extent = func.attrs.get("thread_extent", {})
+            for tag, extent in thread_extent.items():
+                if "threadIdx" in tag:
+                    self.block_info["xyz".index(tag[-1])] = extent
+                elif "blockIdx" in tag:
+                    self.grid_info["xyz".index(tag[-1])] = extent
+            found = True
+            break
+        if not found:
+            raise AssertionError(f"No kernel with name {self.kernel_name} found in device module")

52-70: Harden shader compilation and wrapper construction.

  • Validate kernel_global_source.
  • Add error handling around compile_shader and kernel lookup.
  • Avoid wraps() AttributeError for non-Python callables.

Apply:

     def _convert_torch_func(self) -> Callable:
 
-        if self._kernel is None:
-
-            _kernel = getattr(torch.mps.compile_shader(self.kernel_global_source), self.kernel_name)
-            _threads = [x * y for (x, y) in zip(self.block_info, self.grid_info)]
-
-            @wraps(_kernel)
-            def launcher(*args: torch.Tensor):
-
-                return _kernel(
-                    *args,
-                    threads=_threads,
-                    group_size=self.block_info,
-                )
-
-            self._kernel = launcher
-
-        return self._kernel
+        if self._kernel is None:
+            if not self.kernel_global_source:
+                raise RuntimeError("kernel_global_source is required for Metal shader compilation")
+            try:
+                compiled = torch.mps.compile_shader(self.kernel_global_source)
+                if not hasattr(compiled, self.kernel_name):
+                    raise AttributeError(f"Compiled shader does not contain kernel '{self.kernel_name}'")
+                _kernel = getattr(compiled, self.kernel_name)
+            except Exception as e:
+                raise RuntimeError(f"Failed to compile Metal shader for '{self.kernel_name}': {e}") from e
+            _threads = [x * y for (x, y) in zip(self.block_info, self.grid_info)]
+            @wraps(_kernel, assigned=(), updated=())
+            def launcher(*args: torch.Tensor):
+                return _kernel(
+                    *args,
+                    threads=_threads,
+                    group_size=self.block_info,
+                )
+            self._kernel = launcher
+        return self._kernel

Comment on lines 155 to 159
cc = get_cplus_compiler()
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
dynamic_flag = '-Wl,-undefined,dynamic_lookup' if is_mac(
) else '-Wl,--unresolved-symbols=ignore-all'
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing {dynamic_flag} -I{python_include_path} {source_path} -o {temp_path}"
os.system(command)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle missing C++ compiler and fix Darwin link mode; avoid os.system.

get_cplus_compiler() can be None on macOS (per contrib.cc), causing a broken command. Use fallback discovery and Darwin-appropriate flags.

Apply:

-                    python_include_path = sysconfig.get_path("include")
-                    cc = get_cplus_compiler()
-                    dynamic_flag = '-Wl,-undefined,dynamic_lookup' if is_mac(
-                    ) else '-Wl,--unresolved-symbols=ignore-all'
-                    command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing {dynamic_flag} -I{python_include_path} {source_path} -o {temp_path}"
-                    os.system(command)
+                    python_include_path = sysconfig.get_path("include")
+                    cc = get_cplus_compiler()
+                    if cc is None:
+                        import shutil
+                        cc = shutil.which("clang++") or shutil.which("c++") or shutil.which("g++")
+                    if cc is None:
+                        raise RuntimeError("No C++ compiler found for Cython JIT adapter.")
+                    if is_mac():
+                        ldflags = "-dynamiclib -undefined dynamic_lookup"
+                    else:
+                        ldflags = "-shared -pthread -fPIC -Wl,--unresolved-symbols=ignore-all"
+                    command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
+                    import subprocess
+                    subprocess.check_call(command, shell=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cc = get_cplus_compiler()
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
dynamic_flag = '-Wl,-undefined,dynamic_lookup' if is_mac(
) else '-Wl,--unresolved-symbols=ignore-all'
command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing {dynamic_flag} -I{python_include_path} {source_path} -o {temp_path}"
os.system(command)
python_include_path = sysconfig.get_path("include")
cc = get_cplus_compiler()
if cc is None:
import shutil
cc = shutil.which("clang++") or shutil.which("c++") or shutil.which("g++")
if cc is None:
raise RuntimeError("No C++ compiler found for Cython JIT adapter.")
if is_mac():
ldflags = "-dynamiclib -undefined dynamic_lookup"
else:
ldflags = "-shared -pthread -fPIC -Wl,--unresolved-symbols=ignore-all"
command = f"{cc} {ldflags} -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
import subprocess
subprocess.check_call(command, shell=True)
🧰 Tools
🪛 Ruff (0.13.1)

159-159: Starting a process with a shell, possible injection detected

(S605)

🤖 Prompt for AI Agents
In tilelang/jit/adapter/cython/adapter.py around lines 155-159,
get_cplus_compiler() can return None and the Darwin link flag is incorrect, and
using os.system is unsafe; update to detect/fallback to a compiler (e.g.,
shutil.which('clang++') or 'c++'), ensure compiler is non-None (raise or log
clear error if not found), use the proper Darwin link flag '-undefined
dynamic_lookup' (or pass through the linker via '-Wl,' only if needed) and build
the invocation with subprocess.run (prefer a list of args or shlex.quote the
paths) so you can check returncode and capture stderr/stdout for error reporting
instead of os.system.

Comment on lines +283 to +296
elif execution_backend == "torch":
assert is_metal_target(target)
adapter = MetalKernelAdapter(
params=artifact.params,
result_idx=out_idx,
# target=target,
func_or_mod=tilelang_func,
# host_mod=artifact.host_mod,
device_mod=artifact.device_mod,
kernel_global_source=artifact.kernel_source,
verbose=verbose,
# pass_configs=pass_configs,
# compile_flags=compile_flags,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

LGTM: Metal backend wiring via Torch adapter.

Asserts Metal target and passes required fields to MetalKernelAdapter.

Ensure public API and docs also include "torch" as a valid execution_backend.


🏁 Script executed:

#!/bin/bash
# Check that public APIs include "torch" in execution_backend hints
rg -n -C2 -P 'execution_backend:.*Literal\[[^]]*\]' tilelang/jit | sed -n '1,120p'

Length of output: 2373


Add “torch” to execution_backend Literal in public APIs
Update execution_backend type hints in tilelang/jit/kernel.py (line 47) and the corresponding definitions in init.py to include "torch" alongside "dlpack", "ctypes", "cython", and "nvrtc".

🤖 Prompt for AI Agents
In tilelang/jit/kernel.py around lines 283 to 296 and in the package
__init__.py, the public API type hints for execution_backend omit "torch";
update the Literal types (at line ~47 in kernel.py and the corresponding
execution_backend definitions in __init__.py) to include the string "torch"
alongside "dlpack", "ctypes", "cython", and "nvrtc" so the type signature
reflects the supported backend; ensure any exported type alias or function
signatures use the updated Literal so static type checkers and public docs
accept "torch".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
setup.py (2)

753-756: Write MD5 only after successful compile to avoid stale cache.

Currently MD5 is written before compilation; failed builds poison the cache.

-                            with open(md5_path, "w") as f:
-                                f.write(code_hash)
                             # compile the cython_wrapper.pyx file into .cpp
@@
-                            os.system(command)
+                            subprocess.check_call(command, shell=True)
@@
-                            # rename the temp file to the library file
+                            # rename the temp file to the library file
                             temp_path.rename(library_path)
+                            with open(md5_path, "w") as f:
+                                f.write(code_hash)

845-861: Disable CUDA in the METAL branch
Add "set(USE_CUDA OFF)" alongside "set(USE_ROCM OFF)" to ensure only one GPU backend is enabled.

 if USE_METAL:
     content_lines += [
         "set(USE_METAL ON)",
         "set(USE_ROCM OFF)",
+        "set(USE_CUDA OFF)",
     ]
🧹 Nitpick comments (2)
tilelang/utils/target.py (1)

87-91: Tighten auto-detect error and update docstrings to include Metal.

  • Message: say “Metal (MPS)” and keep short (ruff TRY003).
  • Docstrings above still mention only CUDA/HIP; please include Metal.
-            raise ValueError("No CUDA or HIP or MPS available on this system.")
+            raise ValueError("No CUDA, ROCm (HIP), or Metal (MPS) available.")

Docstring edits (Lines 57-69) to add “Metal” in summary, Args, Returns, and Raises.

setup.py (1)

909-913: Add macOS classifier when Metal is enabled.

Improve package metadata for Metal wheels.

-    platforms=[
-        "Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm",
-        "Operating System :: POSIX :: Linux",
-    ],
+    platforms=[
+        ("Environment :: GPU :: NVIDIA CUDA" if not USE_ROCM else "Environment :: GPU :: AMD ROCm"),
+        "Operating System :: POSIX :: Linux",
+        "Operating System :: MacOS :: MacOS X",
+    ],
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63a6bd0 and 7a52fc2.

📒 Files selected for processing (2)
  • setup.py (11 hunks)
  • tilelang/utils/target.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
setup.py (2)
tilelang/env.py (1)
  • get (136-139)
tilelang/engine/lower.py (1)
  • lower (192-244)
🪛 Ruff (0.13.1)
setup.py

110-112: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/utils/target.py

90-90: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-amd
  • GitHub Check: format-check
  • GitHub Check: format-check
🔇 Additional comments (7)
setup.py (5)

760-767: Use correct Darwin link flags and avoid os.system; add -bundle/-undefined dynamic_lookup.

“-shared” is invalid on macOS. Prefer subprocess and platform-specific ldflags.

-                            python_include_path = sysconfig.get_path("include")
-                            cc = get_cplus_compiler()
-                            if MAYBE_METAL:
-                                cc += ' -Wl,-undefined,dynamic_lookup'
-                            command = f"{cc} -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I{python_include_path} {source_path} -o {temp_path}"
-                            logger.info(command)
-                            os.system(command)
+                            python_include_path = sysconfig.get_path("include")
+                            cc = get_cplus_compiler()
+                            if sys.platform == "darwin":
+                                ldflags = ["-bundle", "-undefined", "dynamic_lookup"]
+                            else:
+                                ldflags = ["-shared", "-pthread", "-fPIC"]
+                            command = [cc, *ldflags, "-fwrapv", "-O2", "-Wall",
+                                       "-fno-strict-aliasing",
+                                       "-I", python_include_path,
+                                       str(source_path), "-o", str(temp_path)]
+                            logger.info(" ".join(command))
+                            subprocess.check_call(command)

36-44: Fix boolean env parsing (empty string bug; strip/yes/no support).

Current walrus check skips '' path; lack of strip and yes/no handling.

-def _read_bool_env(name: str, default: bool = False) -> bool:
-    if env := os.environ.get(name):
-        env = env.lower()
-        if env in ['on', '1', 'true']:
-            return True
-        elif env in ['', 'off', '0', 'false']:
-            return False
-    return default
+def _read_bool_env(name: str, default: bool = False) -> bool:
+    val = os.environ.get(name)
+    if val is None:
+        return default
+    s = val.strip().lower()
+    if s in ('1', 'true', 'on', 'yes', 'y'):
+        return True
+    if s in ('0', 'false', 'off', 'no', 'n', ''):
+        return False
+    return default

56-58: Replace arch-only Metal probe with OS/version/framework detection (keep in sync with utils/target).

Avoid drift; don’t rely on mac_ver()[2] only.

-IS_LINUX = platform.system() == 'Linux'
-MAYBE_METAL = platform.mac_ver()[2] == 'arm64'
+IS_LINUX = platform.system() == 'Linux'
+_mac_release, _, _mac_machine = platform.mac_ver()
+_has_metal = os.path.exists("/System/Library/Frameworks/Metal.framework")
+def _meets_mps_version(rel: str) -> bool:
+    try:
+        maj, minr, *_ = (int(x) for x in rel.split("."))
+        return (maj, minr) >= (12, 3)
+    except Exception:
+        return False
+MAYBE_METAL = (platform.system() == "Darwin" and _has_metal
+               and _mac_machine in ("arm64", "aarch64") and _meets_mps_version(_mac_release))

109-116: Don’t fail Linux builds when GPU backends are disabled.

Gate the “CUDA/ROCM required” error by user intent flags.

-# Ensure one of CUDA or ROCM is available
-if IS_LINUX and not (CUDA_HOME or ROCM_HOME):
+# Ensure one of CUDA or ROCM is available only if requested
+if IS_LINUX and (USE_CUDA or USE_ROCM) and not (CUDA_HOME or ROCM_HOME):
     raise ValueError(
-        "Failed to automatically detect CUDA or ROCM installation. Please set the CUDA_HOME or ROCM_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda or export ROCM_HOME=/opt/rocm)."
+        "Requested GPU backend not detected. Set CUDA_HOME or ROCM_HOME."
     )

893-898: Bug: ext_modules gating ignored; setup() still hardcodes both extensions.

Pass the computed ext_modules to setup() so Metal gating is respected.

 ext_modules = [
     CMakeExtension("TileLangCXX", sourcedir="."),
 ]
 if not MAYBE_METAL:
     ext_modules.append(CythonExtension("TileLangCython", sourcedir="."))
@@
-    ext_modules=[
-        CMakeExtension("TileLangCXX", sourcedir="."),
-        CythonExtension("TileLangCython", sourcedir="."),
-    ],
+    ext_modules=ext_modules,

Verification:

#!/bin/bash
rg -nP 'setup\(' -n -C3 setup.py
rg -nP 'ext_modules\s*=' setup.py -n -C2

Also applies to: 927-930

tilelang/utils/target.py (2)

46-52: Replace arch-only Metal detection with feature-based check (OS/version/framework) and reuse across setup.py.

Current logic misses Intel Macs with Metal and doesn’t gate by macOS version; also risks drift with setup.py.

Apply:

-from platform import mac_ver
+import os
+import platform
+from platform import mac_ver
@@
-def check_metal_availability() -> bool:
-    mac_release, _, arch = mac_ver()
-    if not mac_release:
-        return False
-    # todo: check torch version?
-    return arch == 'arm64'
+def check_metal_availability() -> bool:
+    # Require macOS with Metal framework present; prefer Apple Silicon + macOS >= 12.3 (MPS).
+    if platform.system() != "Darwin":
+        return False
+    release, _, machine = mac_ver()
+    if not release:
+        return False
+    try:
+        major, minor, *_ = (int(x) for x in release.split("."))
+    except Exception:
+        major, minor = 0, 0
+    has_framework = os.path.exists("/System/Library/Frameworks/Metal.framework")
+    if not has_framework:
+        return False
+    # Prefer Apple Silicon for torch.mps; still allow feature-only Metal codegen if needed.
+    is_as = machine in ("arm64", "aarch64")
+    meets_version = (major, minor) >= (12, 3)
+    # If torch is available, respect its MPS availability.
+    try:
+        import torch  # type: ignore
+        if getattr(torch.backends, "mps", None) is not None:
+            return bool(torch.backends.mps.is_available())
+    except Exception:
+        pass
+    return is_as and meets_version

16-16: Approve changes; downstream “metal” support verified
Grep confirms is_metal_target and all adapter, kernel, lower, and arch code paths handle "metal".

@LeiWang1999
Copy link
Member

Do we really need ccache? It isn’t pre-installed on Metal devices.

@oraluben
Copy link
Contributor Author

oraluben commented Oct 5, 2025

Do we really need ccache? It isn’t pre-installed on Metal devices.

The ccache setup also handles cache upload and reuse. Without this, building tvm each time (on a 8 cpu GitHub Metal runner) takes a lot of time.

Btw I'm also trying to setup cache on CUDA CI. Since they're not GitHub runner, they seems not that useful: #939

@LeiWang1999
Copy link
Member

LeiWang1999 commented Oct 5, 2025

It's likely that ccache is really only useful when we need to compile tilelang from scratch multiple times, such as on CI machines. but it has now become a build dependency in cmake for metal, which feels a bit redundant for users to install an extra binary that cannot install from pip. can we optimize this part?

@oraluben
Copy link
Contributor Author

oraluben commented Oct 6, 2025

Github hosted runner has a 2000 minutes limit per month, and macos runner even applies a 10x factor of its time.

By default, macos aarch runner only have 3 CPUs so building full tvm and tilelang will instantly run that out.

Unfortunately ccache is not an optional optimization, but necessary to make the test works.

(Actually that's about ~100 runs per month, which does not sounds very sufficient. Should we only run Metal test on main?)

I also uses uv to get a faster venv preparation.

LeiWang1999
LeiWang1999 previously approved these changes Oct 7, 2025
LeiWang1999
LeiWang1999 previously approved these changes Oct 7, 2025
@LeiWang1999 LeiWang1999 merged commit 7fb0677 into tile-ai:main Oct 7, 2025
7 of 8 checks passed
@oraluben oraluben deleted the metal branch October 7, 2025 07:18
@coderabbitai coderabbitai bot mentioned this pull request Oct 9, 2025
21 tasks
@douyixuan
Copy link

Currently metal target only support tl.jit(execution_backend="torch")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants