[amdgpu] LLVM 20 updates for AMD MI3xx GPUs #8793

Open: tmm77 wants to merge 55 commits into taichi-dev:master from ROCm:amd-integration

tmm77 commented Apr 15, 2026

Issue: #

Brief Summary

These code changes update LLVM to version 20 for AMD GPU code generation to enable Taichi on MI300X, MI325X, and MI355X.


Note

High Risk
Touches core JIT/codegen for CPU, CUDA, AMDGPU, and DX12 with a major LLVM API migration; incorrect AMDGPU or pass-manager behavior would break kernel compilation and GPU execution.

Overview
This PR modernizes the Taichi/ROCm stack around LLVM 20 so AMDGPU kernels can be built and JIT-compiled for Instinct MI3xx targets, with supporting packaging and docs.

Compiler & runtime: Clang discovery and tested ceiling move to LLVM/Clang 20 (CMakeLists.txt, CI compiler.py). LLVM setup for AMDGPU builds prefers system/ROCm toolchains (LLVM_DIR, ROCM_PATH, /usr/lib/llvm-20) instead of only downloading prebuilt LLVM 15 zips. Across CPU, CUDA, AMDGPU, and DX12 paths, optimization moves from the legacy pass manager to LLVM’s New Pass Manager (PassBuilder), with version-guarded APIs for codegen opt levels and assembly emission. Opaque pointers and related IR changes touch shared LLVM codegen, struct layout, AMDGPU basic-block insertion, CUDA global loads (replacing removed nvvm.ldg intrinsics with invariant loads), and AMDGPU kernel pointer/addrspace handling. AMDGPU JIT (jit_amdgpu.cpp) is updated for the new pass pipeline and object/HSACO emission.

Language & build: erf / erfc are wired through IR, Python ops, and LLVM codegen (including CUDA). Dockerfile.rocm adds a multi-stage image that installs LLVM 20, applies spdlog_fmt.patch, and builds/installs wheels. setup.py strips non-numeric suffixes from patch versions for AMD packaging.
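
The version cleanup mentioned for setup.py can be pictured with a short sketch. This is not the PR's actual code: the helper name `strip_patch_suffix` and the exact rule (keep only the leading digits of the third dot-separated component and drop anything after it) are assumptions made for illustration.

```python
import re


def strip_patch_suffix(version: str) -> str:
    """Return `version` with any non-numeric suffix removed from the patch part.

    Hypothetical helper: the name and the exact rule are assumptions,
    not taken from the PR's setup.py.
    """
    parts = version.split(".")
    if len(parts) < 3:
        return version  # no patch component to clean
    match = re.match(r"\d+", parts[2])
    if match is None:
        raise ValueError(f"patch component has no leading digits: {parts[2]!r}")
    # Keep major, minor, and the numeric prefix of the patch component only.
    return ".".join(parts[:2] + [match.group(0)])
```

Under these assumptions a string such as `1.8.0rc1` would reduce to `1.8.0`, while an already clean `1.8.0` passes through unchanged.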

Tooling & docs: Root README.md is replaced with a deprecation notice (legacy content moved to README-deprecated.md). New ROCm Sphinx docs and readthedocs.yaml describe install/examples. Microbenchmarks default to amdgpu with --arch / --benchmark_plan CLI; Vulkan setup in CI entry is limited to Linux. The PR title presubmit workflow and ci/assets/mitm-ca.crt are removed.

Misc: Debug symbols for AMDGPU (-g in TaichiCore.cmake), ImGui Vulkan init API updates, and a large .wordlist.txt for documentation spelling.

Reviewed by Cursor Bugbot for commit 79f75c6.

tmm77 and others added 30 commits April 29, 2025 17:14
Parameterize microbenchmarks and vulkan sdk update
fix: Patch to avoid the need to fetch source to build Taichi wheel
Taichi Dockerfile
Co-authored-by: Bhavesh Lad <Bhavesh.Lad@amd.com>
Co-authored-by: Tiffany Mintz <tiffany.mintz@amd.com>
…TX handling, and implement new pass manager setup
 from johnnynunez/taichi master branch; some of the changes from these were captured in the previous commit to rocm/taichi
Comment thread taichi/codegen/dx12/dx12_global_optimize_module.cpp
Comment thread taichi/runtime/amdgpu/jit_amdgpu.cpp
if ((u.system, u.machine) not in (("Linux", "arm64"), ("Linux", "aarch64"))) and not (cmake_args.get_effective("TI_WITH_AMDGPU")):
    os.environ["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
    os.environ["CUDA_HOME"] = "/usr/local/cuda"
    os.environ["CPATH"] = "/usr/local/cuda/include"

LLVM_DIR hardcoded to Linux path for all platforms

Medium Severity

The final LLVM_DIR assignment unconditionally sets it to /usr/lib/llvm-20/cmake for all non-ARM-Linux, non-AMDGPU platforms, including macOS and Windows. The original code used str(out) which pointed to the platform-specific downloaded LLVM path. This overwrites the correct out-based paths for Darwin and Windows, breaking LLVM discovery on those platforms. Similarly, CUDA_HOME and CPATH are set to Linux-specific paths.
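
One possible shape for a fix is to apply the Linux paths only on Linux and otherwise keep the downloaded location. The sketch below is an assumption-laden illustration, not the PR's code: `toolchain_env` and its parameters are hypothetical names, and it ignores the ARM and `TI_WITH_AMDGPU` conditions that the real setup code also checks.

```python
def toolchain_env(system: str, downloaded_llvm: str) -> dict:
    """Choose LLVM/CUDA environment values for one platform.

    Hypothetical helper: `system` is a value such as platform.system()
    returns, and `downloaded_llvm` stands in for the str(out) path the
    review comment refers to.
    """
    # Default: keep the platform-specific downloaded LLVM location.
    env = {"LLVM_DIR": downloaded_llvm}
    if system == "Linux":
        # Only Linux gets the system LLVM 20 and CUDA paths.
        env["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
        env["CUDA_HOME"] = "/usr/local/cuda"
        env["CPATH"] = "/usr/local/cuda/include"
    return env
```

A caller could then run `os.environ.update(toolchain_env(platform.system(), str(out)))`, which would leave the macOS and Windows values untouched.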

Reviewed by Cursor Bugbot for commit f47d1b8.

Comment thread docs/conf.py
f.read())
if not match:
    raise ValueError("VERSION not found!")
version_number = match[1]

Docs conf.py searches for nonexistent CMake function

Medium Severity

The docs/conf.py searches for rocm_setup_version(VERSION ...) in CMakeLists.txt, but the project's CMakeLists.txt does not contain this function call. This causes a ValueError("VERSION not found!") to be raised every time the documentation is built, completely breaking the docs build pipeline.
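
A tolerant lookup is one way to keep the docs build alive when the call is absent. The following is only a sketch: the regular expression is a guess at what docs/conf.py matches, and `find_docs_version` with its `fallback` parameter is a hypothetical name.

```python
import re

# Assumed pattern; the exact expression used in docs/conf.py is not shown in the PR page.
ROCM_VERSION_RE = re.compile(r"rocm_setup_version\(VERSION\s+([0-9.]+)")


def find_docs_version(cmake_text: str, fallback: str = "unknown") -> str:
    """Return the version passed to rocm_setup_version, or `fallback` if absent."""
    match = ROCM_VERSION_RE.search(cmake_text)
    # Fall back instead of raising, so a missing call does not abort the build.
    return match[1] if match else fallback
```

Whether a silent fallback or a hard failure is preferable depends on how much the published docs rely on the version string; the sketch only shows the non-raising option.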

Reviewed by Cursor Bugbot for commit f47d1b8.

tmm77 changed the title from "LLVM 20 updates for AMD MI3xx GPUs" to "[amdgpu] LLVM 20 updates for AMD MI3xx GPUs" on Apr 16, 2026
This is to address AMD security concerns
Comment thread taichi/runtime/amdgpu/jit_amdgpu.cpp

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 5 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit b9c05cd.

Comment thread cmake/TaichiCore.cmake

 if (TI_WITH_AMDGPU)
-    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DTI_WITH_AMDGPU")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -DTI_WITH_AMDGPU")

Typo drops AMDGPU runtime sources

High Severity

With TI_WITH_AMDGPU enabled, taichi/runtime/amdgpu/runtime.cpp is appended to TAIHI_CORE_SOURCE instead of TAICHI_CORE_SOURCE. The core object library is built only from TAICHI_CORE_SOURCE, so that runtime translation unit is never linked into taichi_core.
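
Misspelled variable names of this kind could be caught mechanically. The sketch below assumes the sources are added with `list(APPEND ...)`, which the comment does not confirm, and `suspicious_appends` is a hypothetical helper rather than anything in the repository. It flags append targets that closely resemble, but do not equal, a known variable name.

```python
import difflib
import re
from typing import List, Set, Tuple

# Assumed form of the append call; the PR page does not show the exact CMake line.
APPEND_RE = re.compile(r"list\s*\(\s*APPEND\s+(\w+)", re.IGNORECASE)


def suspicious_appends(cmake_text: str, known: Set[str]) -> List[Tuple[str, str]]:
    """Return (used_name, probable_name) pairs for near-miss append targets."""
    findings = []
    for name in APPEND_RE.findall(cmake_text):
        if name in known:
            continue  # exact match, nothing to report
        # A high cutoff keeps unrelated variable names out of the report.
        close = difflib.get_close_matches(name, sorted(known), n=1, cutoff=0.85)
        if close:
            findings.append((name, close[0]))
    return findings
```

Run against a file containing an append to `TAIHI_CORE_SOURCE` with `{"TAICHI_CORE_SOURCE"}` as the known set, the helper would report that pair.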

Reviewed by Cursor Bugbot for commit b9c05cd.

parent_ty = ptr_ty->getPointerElementType();
if (auto ptr_ty = llvm::dyn_cast<llvm::PointerType>(parent_ty)) {
  TI_NOT_IMPLEMENTED;
}

Root SNode lookup aborts

High Severity

For root SNodeLookupStmt, when the parent LLVM value comes from a BitCastInst to a pointer type, codegen hits TI_NOT_IMPLEMENTED instead of emitting a GEP. With opaque pointers in LLVM 20, that path is common and kernel compilation fails.

Reviewed by Cursor Bugbot for commit b9c05cd.

url = "https://github.com/GaleSeLee/assets/releases/download/v0.0.5/taichi-llvm-15.0.0-linux.zip"
# We should use LLVM toolchains shipped with OS.
os.environ["LLVM_DIR"] = os.environ["LLVM_PATH"]+"/lib/cmake"
os.environ["CPATH"] = os.environ["ROCM_PATH"]+"/include"

AMDGPU LLVM setup needs env

Medium Severity

On Linux x86_64 with TI_WITH_AMDGPU, setup_llvm sets LLVM_DIR and CPATH from LLVM_PATH and ROCM_PATH without checking they exist, so a missing variable raises KeyError and aborts the build.
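
A guarded lookup would turn the bare `KeyError` into an actionable message. This is a minimal sketch under stated assumptions: `amdgpu_llvm_env` is a hypothetical helper, and the choice to raise `RuntimeError` (rather than fall back to a default such as a system LLVM path) is illustrative only.

```python
import os


def amdgpu_llvm_env(environ=os.environ) -> dict:
    """Derive LLVM_DIR and CPATH for an AMDGPU build, failing with a clear message."""
    required = ("LLVM_PATH", "ROCM_PATH")
    # Treat unset and empty variables the same way.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "TI_WITH_AMDGPU builds need these environment variables: "
            + ", ".join(missing)
        )
    return {
        "LLVM_DIR": environ["LLVM_PATH"] + "/lib/cmake",
        "CPATH": environ["ROCM_PATH"] + "/include",
    }
```

The returned mapping can be passed to `os.environ.update(...)`, keeping the validation separate from the mutation of the process environment.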

Reviewed by Cursor Bugbot for commit b9c05cd.

6 participants