@oraluben
Contributor

@oraluben oraluben commented Oct 5, 2025

Resolves #833
Closes #756

pip install . -v  # CUDA / Metal build on Linux / Mac
USE_ROCM=ON pip install . -v  # ROCm build
USE_CUDA=OFF pip install . -v  # CPU build

Detailed information can be found in the docs.

This PR migrates the project to a fully CMake-based build system and removes the Python-based build process, which simplifies building and installing.

Users and developers of tilelang can now install directly from source, without setting PYTHONPATH or similar.

  • Use one wheel for different python version via stable abi
    cp38-abi3 wheels for Python >= 3.8 (needs tile-ai/tvm#12, "Workaround limit api too high in tvm")
  • CUDA wheel
    • ROCm wheel
    • Unify CUDA and ROCm wheel? Is that possible?
  • Metal wheel
  • Build cython ext
    • Fix Remove in-tree jit compile
  • git hash and cuda/rocm/metal extension in version. Support build-time toolchain info for CUDA. (e.g. in the artifacts)
    Now tilelang reads version (for cache, etc.) from the package version.
  • Validation
  • Cleanup
    • Remove tox-based build scripts. We currently have three host platforms (linux+{x86,aarch64} and metal), and each platform needs only one wheel, so tox is unnecessary for building against different Python versions and insufficient for building against different platforms. Refactor the scripts to build locally.
  • CI related fixes
    • auditwheel and delocate for linux / darwin wheels
  • Update doc
    • Build wheel
    • Editable install
    • Use custom tvm
    • ...
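
To make the stable-ABI item in the list above concrete, here is a minimal sketch of the compatibility rule behind a cp38-abi3 tag: such a wheel is meant to install on CPython 3.8 and on every later CPython 3.x release. The helper name `abi3_wheel_supports` is hypothetical and is not part of tilelang or pip; real installers rely on the `packaging.tags` machinery instead.

```python
from typing import Tuple


def abi3_wheel_supports(python_tag: str, abi_tag: str, interpreter: Tuple[int, int]) -> bool:
    """Return True if a CPython interpreter (major, minor) can use the wheel.

    Hypothetical helper for illustration only; it handles just the
    'cp3Y' + 'abi3' tag combination discussed above.
    """
    minor_text = python_tag[3:]
    if abi_tag != "abi3" or not python_tag.startswith("cp3") or not minor_text.isdigit():
        return False
    # 'cp38' -> minimum version (3, 8); 'cp312' -> minimum version (3, 12)
    minimum = (3, int(minor_text))
    return interpreter[0] == 3 and interpreter >= minimum
```

Under this rule a single cp38-abi3 wheel covers Python 3.8 through 3.12 and beyond, which is why one wheel per platform is enough.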

Summary by CodeRabbit

  • New Features

    • Added a scheduled cross-platform CI workflow for daily and release wheel builds; introduced dynamic build-time version metadata.
  • Documentation

    • Installation guide updated: raised Python minimum, tightened CUDA requirements, clarified bundled vs external backend and nightly guidance.
  • Refactor

    • Modernized CMake and packaging flow; unified backend selection and simplified native/Python integration and library discovery.
  • Chores

    • Removed legacy install/build scripts, multi-version tox/docker workflows, and replaced monolithic packaging script with modern toolchain.

@github-actions

github-actions bot commented Oct 5, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@coderabbitai
Contributor

coderabbitai bot commented Oct 5, 2025

Walkthrough

Migrates packaging to scikit-build-core, modernizes CMake (≥3.26) and TVM integration, removes legacy setup/tox/install scripts, requires a prebuilt cython_wrapper at import, centralizes third‑party discovery and dynamic version metadata, updates CI/Docker to uv/cibuildwheel flows, and adds a Dist GitHub Actions workflow.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| New CI distribution workflow | .github/workflows/dist.yml | Adds Dist workflow using cibuildwheel matrix (ubuntu/macOS/arm), captures built wheel name and uploads artifact; sets concurrency/cancel, PYTHON_VERSION=3.12, CUDA_VERSION matrix, and NO_VERSION_LABEL behavior. |
| CMake & scikit-build migration | CMakeLists.txt, cmake/load_tvm.cmake, pyproject.toml, version_provider.py | Raises CMake min to 3.26, enables modern defaults and ccache, centralizes TVM loading (cmake/load_tvm.cmake), introduces object/shared targets and cython_wrapper wiring, switches build backend to scikit-build-core, and implements dynamic version metadata provider. |
| Packaging & distribution scripts | maint/scripts/pypi_distribution.sh, maint/scripts/local_distribution.sh, maint/scripts/docker_local_distribute.sh, maint/scripts/docker_pypi_distribute.sh, maint/scripts/pypi.manylinux.Dockerfile | Replace ad-hoc flows with strict set -eux scripts, adopt uv pip/venv handling, generate sdist/wheel via modern tooling, and run wheel repair (auditwheel/delocate) with multi-arch builder logic. |
| Large removals: legacy orchestration | setup.py, tox.ini, install_*.sh (install_cpu.sh,install_cuda.sh,install_rocm.sh,install_metal.sh), maint/scripts/*tox*.sh, maint/scripts/pypi.Dockerfile | Remove monolithic setup.py, legacy installers and tox-based multi-Python builders, and older Dockerfile; replaced by scikit-build/CI-driven workflows and scripts. |
| Env & library discovery refactor | tilelang/env.py, tilelang/libinfo.py, tilelang/__init__.py, tilelang/version.py (removed) | Add SITE_PACKAGES/THIRD_PARTY_ROOT/TL_LIBS and prepend_pythonpath; switch lib lookup to TL_LIBS with new find_lib_path(name: str, py_ext=False); move __version__ sourcing to importlib.metadata and remove tilelang/version.py. |
| Cython/JIT adapter simplification | tilelang/jit/adapter/cython/adapter.py | Remove dynamic runtime Cython compilation/cache logic; require/import prebuilt cython_wrapper, raising on ImportError. |
| Call-site import updates | tilelang/autotuner/tuner.py, tilelang/cache/kernel_cache.py | Update imports to take __version__ from tilelang instead of tilelang.version. |
| Lib loading API change | tilelang/libinfo.py | Replace DLL candidate discovery with TL_LIBS-driven search and platform-aware filename selection; change find_lib_path signature to find_lib_path(name: str, py_ext=False). |
| Requirements, .gitignore & deps cleanup | .gitignore, requirements-build.txt, requirements-dev.txt, requirements.txt | Rework build/dev deps for scikit-build/CMake flow (add scikit-build-core, uv, auditwheel/delocate), adjust runtime deps (add ml_dtypes), broaden *dist/ ignore, and remove many legacy entries. |
| Docs & CI tweaks | docs/get_started/Installation.md, .github/workflows/{cuda-ci.yml,metal-ci.yml,rocm-ci.yml} | Tighten prerequisites (Python/CUDA), update Docker instructions, migrate CI to uv-based Python flows, refine submodule handling and build environment variables. |
| Script hygiene & small tooling changes | maint/scripts/* | Add strict shell options, centralize docker run to call scripts, remove obsolete GPU flags, and simplify build orchestration steps/messages. |
| Third‑party update | 3rdparty/tvm | Bump TVM submodule pointer to a newer commit. |
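
The "Env & library discovery refactor" entry mentions sourcing `__version__` from `importlib.metadata`. A minimal sketch of that pattern follows; the function name `read_package_version` and the fallback string are assumptions made for illustration, and the actual code in `tilelang/__init__.py` may differ.

```python
from importlib.metadata import PackageNotFoundError, version


def read_package_version(dist_name: str, default: str = "0.0.0+unknown") -> str:
    """Look up the installed distribution's version string.

    Falls back to `default` when the distribution is not installed,
    for example when running from an uninstalled source checkout.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return default
```

A package could then expose `__version__ = read_package_version("tilelang")` without keeping a separate version module.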

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant GH as GitHub Actions
  participant Repo as Repository
  participant UV as uv
  participant CIBW as cibuildwheel / scikit-build-core
  participant CMake as CMake (>=3.26)
  participant TVM as TVM (load_tvm.cmake)
  participant Repair as auditwheel/delocate
  participant Store as Artifact Store

  Dev->>GH: push / release
  GH->>Repo: checkout + submodules
  GH->>UV: setup Python env & install build deps via uv
  GH->>CIBW: run cibuildwheel (matrix)
  CIBW->>CMake: configure & build (load_tvm.cmake, cython_wrapper)
  CMake->>TVM: resolve TVM_SOURCE / INCLUDES
  CMake-->>CIBW: produce raw wheel(s)
  CIBW->>Repair: repair wheel(s)
  GH->>Store: upload artifact(s)

sequenceDiagram
  autonumber
  participant User as pip / installer
  participant SB as scikit-build-core
  participant CMake as CMake
  participant TVM as TVM

  User->>SB: pip install -v .
  SB->>CMake: configure & build targets (tilelang_module, cython_wrapper, ...)
  CMake->>TVM: load via cmake/load_tvm.cmake
  TVM-->>CMake: provide headers/targets
  CMake-->>SB: built artifacts
  SB-->>User: package installed / wheel produced

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

I nibble at CMake, hop through each line,
scikit-build hums while wheels align.
UV brews a venv, auditwheel sews seams,
TVM paths tidy, and version adds gleams.
A rabbit ships wheels — hippity-hop, build-time dreams! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title “[Build] Migrate to scikit-build-core” succinctly captures the primary objective of replacing the existing Python-centric build workflow with scikit-build-core integration, matching the PR’s core change without unnecessary detail or ambiguity. |
| Linked Issues Check | ✅ Passed | The PR fully migrates the build system to scikit-build-core by removing setup.py and Python-centric packaging code, introducing CMake-based CMakeLists, updating pyproject.toml with scikit-build-core tooling, and implementing dynamic version metadata as specified in issue #833, and it adds ARM (aarch64) wheel support via the ubuntu-22.04-arm build matrix in CI as required by issue #756. |
| Out of Scope Changes Check | ✅ Passed | All code modifications in this PR are directly related to migrating the build and packaging workflow to scikit-build-core and supporting installation and runtime use from site-packages, with no unrelated or extraneous changes outside the scope of the linked issues. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf0edde and 44e9644.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

20-20: label "macos-16" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check

@oraluben oraluben mentioned this pull request Oct 5, 2025
8 tasks
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a7eaee1 and c174093.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

43-43: property "repair" is not defined in object type {}

(expression)


44-44: property "repair" is not defined in object type {}

(expression)

Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)

15-24: Provide CUDA_VERSION for every matrix leg.

env.CUDA_VERSION reads matrix.CUDA_VERSION, but the macOS row never defines it. On that leg GitHub errors when resolving the expression. Add a matrix default (e.g., CUDA_VERSION: "") or an include entry for macOS so the key always exists; the snippet below sets a default.

Apply this diff:

     strategy:
       matrix:
+        CUDA_VERSION: [""]
         os: [ubuntu-22.04, ubuntu-22.04-arm, macos-14]
         include:
         - os: ubuntu-22.04
           CUDA_VERSION: 12.1
         - os: ubuntu-22.04-arm
           CUDA_VERSION: 12.8
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c174093 and 01c7c5a.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

26-26: could not parse as YAML: yaml: line 26: did not find expected key

(syntax-check)

🪛 YAMLlint (1.37.1)
.github/workflows/dist.yml

[error] 33-33: syntax error: expected , but found '-'

(syntax)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
version_provider.py (1)

59-61: Consider defensive parsing of CUDA_VERSION.

The unpacking assignment major, minor, *_ = cuda_version.split('.') assumes at least two dot-separated components. While CUDA versions typically follow this format, adding validation would make the code more robust against unexpected input.

Apply this diff to add defensive parsing:

             if cuda_version := os.environ.get('CUDA_VERSION'):
-                major, minor, *_ = cuda_version.split('.')
-                backend = f'cu{major}{minor}'
+                parts = cuda_version.split('.')
+                if len(parts) >= 2:
+                    backend = f'cu{parts[0]}{parts[1]}'
+                else:
+                    backend = 'cuda'
             else:
                 backend = 'cuda'
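
The suggested parsing can be exercised in isolation. The sketch below lifts the logic from the diff above into a standalone function; the name `cuda_backend_label` is hypothetical, and the real `version_provider.py` reads `CUDA_VERSION` from the environment inline rather than through a helper like this.

```python
from typing import Optional


def cuda_backend_label(cuda_version: Optional[str]) -> str:
    """Map a CUDA version string to a backend label such as 'cu121'.

    Falls back to the generic 'cuda' label when the version is missing
    or does not contain at least a major and a minor component.
    """
    if not cuda_version:
        return "cuda"
    parts = cuda_version.split(".")
    if len(parts) >= 2:
        return f"cu{parts[0]}{parts[1]}"
    return "cuda"
```

A caller would typically pass `os.environ.get("CUDA_VERSION")`, so an unset variable and a malformed value both end up on the same fallback path.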
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 707d433 and 54b40fa.

📒 Files selected for processing (2)
  • .github/workflows/dist.yml (1 hunks)
  • version_provider.py (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

67-67: property "repair" is not defined in object type {ls-whl: {conclusion: string; outcome: string; outputs: {string => string}}}

(expression)

🪛 Ruff (0.13.3)
version_provider.py

23-23: Starting a process with a partial executable path

(S607)


36-36: Unused function argument: settings

(ARG001)
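
Ruff's S607 fires when a process is started from a bare executable name. One common way to address it is to resolve the executable to an absolute path first. The sketch below assumes the flagged call is a `git` invocation used to obtain the commit hash for the version string; that is a guess, since line 23 of `version_provider.py` is not shown here, and `git_short_hash` is a hypothetical name.

```python
import shutil
import subprocess
from typing import Optional


def git_short_hash(repo_dir: str = ".", length: int = 8) -> Optional[str]:
    """Return the abbreviated HEAD commit hash, or None if unavailable."""
    git = shutil.which("git")  # absolute path, or None when git is not installed
    if git is None:
        return None
    try:
        result = subprocess.run(
            [git, "rev-parse", f"--short={length}", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git repository, or git could not be executed.
        return None
    return result.stdout.strip() or None
```

Returning None rather than raising lets a build from an sdist, where no `.git` directory exists, proceed without a hash.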

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-test-metal
  • GitHub Check: build-test-amd
  • GitHub Check: build-wheels (ubuntu-22.04)
  • GitHub Check: build-wheels (macos-14)
  • GitHub Check: build-wheels (ubuntu-22.04-arm)

LeiWang1999 pushed a commit to tile-ai/tvm that referenced this pull request Oct 12, 2025
Needed-by: tile-ai/tilelang#939

Currently, tvm builds its limited-API Cython library for Python 3.12+, even though tilelang targets 3.8+.
This workaround just relaxes the version.

This issue should be solved by the future tvm-ffi integration.

This part is no longer in upstream tvm, so I submitted it here.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 54b40fa and 5d3415f.

📒 Files selected for processing (3)
  • .github/workflows/dist.yml (1 hunks)
  • 3rdparty/tvm (1 hunks)
  • pyproject.toml (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • 3rdparty/tvm
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

52-52: property "repair" is not defined in object type {ls-whl: {conclusion: string; outcome: string; outputs: {string => string}}}

(expression)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-test-metal
  • GitHub Check: build-test-amd
  • GitHub Check: build-wheels (ubuntu-22.04-arm)
  • GitHub Check: build-wheels (ubuntu-22.04)
  • GitHub Check: build-wheels (macos-14)

@oraluben
Contributor Author

oraluben commented Oct 12, 2025

This should be ready for review.

Main parts that this PR touched:

  1. Build system
    1. The project now builds entirely with CMake, driven by scikit-build-core. pip install . is supposed to cover all use cases.
    2. The build dir is intentionally set to ./build to help IDE indexing, but libs in the build dir are not supposed to be used directly.
    3. All libs are installed into Python's site-packages, even for an editable install.
    4. Added a patch in tile-ai/tvm (tvm#12, "Workaround limit api too high in tvm") so that tvm's cython ext can be used by py>=3.8. But upstream tvm seems to have something new? cc @Hzfengsy
    5. Thanks to @XuehaiPan, we're using a pure manylinux2014 builder for x86_64, which makes it easier to support torch<2.6. (But manylinux2014 only has CUDA up to 12.4; does that mean we need another wheel for CUDA 13?)
    6. Use ccache if possible.
  2. Runtime compile and cache
    1. cython_wrapper.so is compiled at build time and installed directly under site-packages, to support import cython_wrapper. This is not best practice, but I think we plan to migrate to tvm-ffi soon?
    2. I then removed the runtime compilation logic for cython_wrapper.so. Developers who want to modify that file need to use an editable install and re-install with pip to make sure the file is regenerated. This takes ~30 seconds each time, because pip packs and unpacks on every install. Does this look good to you @LeiWang1999, or shall we restore the old logic?
    3. tilelang now uses the package version (e.g. 0.1.6.post1+cu121.gitxxxxxxxx) as the version in its cache folder.
  3. CI and releases
    1. Added a CI job to generate wheels for linux+{x86,aarch64} and mac. I plan to run it as a daily cron job and on new releases, so that nightly users can install the wheels and maintainers can upload them to PyPI. I'm not sure if this aligns with your workflow. @LeiWang1999
    2. Do we need different releases for CUDA 12/13?
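
The version format quoted above (0.1.6.post1+cu121.gitxxxxxxxx) is a base version followed by a local version label. A minimal sketch of how such a string could be assembled is shown here; `compose_version` and its parameters are hypothetical, and the real `version_provider.py` may build the label differently.

```python
from typing import Optional


def compose_version(base: str, backend: Optional[str] = None, git_hash: Optional[str] = None) -> str:
    """Append an optional '+<backend>.git<hash>' local label to a base version."""
    local_parts = []
    if backend:
        local_parts.append(backend)
    if git_hash:
        local_parts.append(f"git{git_hash}")
    if not local_parts:
        # No build-specific information: return the plain base version.
        return base
    return f"{base}+{'.'.join(local_parts)}"
```

Because the backend and commit are part of the version, a cache folder keyed on this string would not be shared between, say, a cu121 build and a cu128 build of the same release.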

oraluben and others added 2 commits October 12, 2025 23:21
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5d3415f and 96030a8.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

53-53: property "repair" is not defined in object type {ls-whl: {conclusion: string; outcome: string; outputs: {string => string}}}

(expression)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 110e27a and cf0edde.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

21-21: label "macos-16" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-metal
  • GitHub Check: build-wheels (ubuntu-22.04-arm)
  • GitHub Check: build-wheels (ubuntu-22.04)

Member

@LeiWang1999 LeiWang1999 left a comment

LGTM

@LeiWang1999
Member

  1. I think it’s currently difficult to have a single wheel that supports both CUDA 12 and CUDA 13, and it seems that there aren’t many users on CUDA 13 yet. We will hold off on providing a CUDA 13 release until there is clear demand from the community.
  2. We do have a plan to migrate to tvm_ffi — see issue Plan for TileLang integration with TVM-FFI #970.
  3. Phasing out the JIT compilation of Cython sounds good to me, as recompiling Cython on the fly is rarely needed during development, and even less so by users.
  4. The daily build CI looks very helpful.
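
With runtime Cython compilation phased out, the adapter only needs to import the prebuilt module and fail clearly when it is missing. The following is a minimal sketch of that import-or-raise pattern; `load_prebuilt_extension` is a hypothetical helper, and the error message is illustrative rather than the one tilelang emits.

```python
import importlib
from types import ModuleType


def load_prebuilt_extension(module_name: str) -> ModuleType:
    """Import an extension module that must have been built at install time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # No fallback to on-the-fly compilation: point at reinstalling instead.
        raise ImportError(
            f"Prebuilt extension {module_name!r} was not found; "
            "reinstall the package (for example with `pip install .`) to rebuild it."
        ) from exc
```

Chaining with `from exc` keeps the original import failure visible in the traceback, which helps when the module exists but fails to load.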

@LeiWang1999 LeiWang1999 merged commit d89ba5b into tile-ai:main Oct 13, 2025
6 of 8 checks passed
@oraluben oraluben deleted the sk-build-core branch October 23, 2025 03:58

Development

Successfully merging this pull request may close these issues.

  • [Feature Request] Migrate to scikit-build-core (#833)
  • ARM wheels (#756)

3 participants