fix: build vLLM from source for ARM64 CUDA 13 (NVIDIA DGX) #637
doringeman wants to merge 1 commit into docker:main from
Conversation
Summary of Changes

Hello @doringeman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request fixes vLLM deployment on ARM64 systems with CUDA 13, particularly NVIDIA DGX platforms, by resolving an ABI incompatibility between the prebuilt vLLM wheels and PyTorch CUDA 13 nightly builds: vLLM is now conditionally built from source for these environments. The changes also bump vLLM to the latest version, update the AMD64 Docker build path to meet the new CUDA 13 wheel requirements, and make the Docker build accept extra custom build arguments.
Hey - I've left some high level feedback:
- Consider guarding the CUDA 13 PATH/LD_LIBRARY_PATH `ENV` exports so they are only set when `cuda-toolkit-13-0` is actually installed (e.g., on ARM64 + `VLLM_ARM64_BUILD_FROM_SOURCE=true`), to avoid unintentionally overriding CUDA locations on other architectures or configurations.
- In the `apt install` step, you use `uname -m` to detect `aarch64` while elsewhere you rely on `TARGETARCH`; aligning on `TARGETARCH` for architecture checks would make the build logic more consistent and easier to reason about.
Code Review
This pull request introduces a mechanism to build vLLM from source for ARM64 platforms with CUDA 13, addressing an ABI incompatibility issue. The changes primarily involve modifications to the Dockerfile to handle conditional compilation and dependency installation, along with Makefile adjustments to pass extra build arguments. My review identified a critical issue in the Dockerfile that would break the amd64 build path due to incorrect shell syntax. Additionally, I've provided several suggestions to improve the robustness and clarity of the Dockerfile scripting, such as using TARGETARCH consistently and addressing unconditionally set environment variables.
python3 python3-venv python3-dev \
curl ca-certificates build-essential \
git cmake ninja-build \
&& if [ "$(uname -m)" = "aarch64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
You're using uname -m to check the architecture. While this works, it's more idiomatic and robust in Dockerfiles to use the built-in TARGETARCH build argument, which is explicitly provided by the builder for the target platform. This avoids any potential discrepancies with the build environment and improves consistency with other parts of the Dockerfile.
&& if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
ENV PATH=/usr/local/cuda-13.0/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
The PATH and LD_LIBRARY_PATH environment variables for the CUDA toolkit are set unconditionally for this build stage. However, the CUDA toolkit is only installed for arm64 builds when VLLM_ARM64_BUILD_FROM_SOURCE is true. For other build configurations (like amd64), these paths will point to non-existent directories. This pollutes the environment and could potentially lead to subtle build issues. Consider setting these environment variables only when they are actually needed, for example by moving this logic into a script that is executed within the RUN instruction where the build from source happens.
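A minimal sketch of that suggestion, assuming the same CUDA 13.0 install prefix and build-arg wiring as in the diff above (the exact paths and surrounding build steps in the real Dockerfile may differ):

```dockerfile
ARG TARGETARCH
ARG VLLM_ARM64_BUILD_FROM_SOURCE=true
RUN set -eux; \
    if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
        # Export the CUDA 13 locations only for the step that compiles vLLM, so
        # amd64 and non-source builds never carry paths to missing directories.
        # nvcc --version acts as a sanity check that the toolkit is on PATH.
        export PATH=/usr/local/cuda-13.0/bin:$PATH; \
        export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:${LD_LIBRARY_PATH:-}; \
        nvcc --version; \
    fi
```

Alternatively, the `ENV` lines could live in a separate arm64-only build stage so other platforms never see them.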
~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"; \
WHEEL_URL="https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}%2B${VLLM_CUDA_VERSION}-${VLLM_PYTHON_TAG}-manylinux_2_35_x86_64.whl" \
&& ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"; \
elif [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
The condition for building from source only checks VLLM_ARM64_BUILD_FROM_SOURCE. While the variable name implies it's for arm64, and the logic for installing dependencies also checks the architecture, it's safer and clearer to be explicit here as well. Adding a check for TARGETARCH makes the intent unambiguous and prevents this branch from being accidentally taken on other architectures if the build arguments are misconfigured.
elif [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
The prebuilt vLLM ARM64 wheels have ABI incompatibility with PyTorch CUDA 13 nightly builds. For ARM64 with CUDA 13 (e.g., NVIDIA DGX GB300 Blackwell, DGX GB200):
- Install CUDA toolkit 13.0 for compilation
- Use PyTorch nightly with cu130 support
- Build vLLM from source to ensure ABI compatibility

Add VLLM_ARM64_BUILD_FROM_SOURCE build arg (default: true) to allow opting out of source builds for faster build times on non-CUDA 13 systems.

Also:
- Update AMD64 wheel path to manylinux_2_35 (required for cu130)
- Bump vLLM to 0.15.1

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Force-pushed from 492352a to b4c36ea
The prebuilt vLLM ARM64 wheels have ABI incompatibility with PyTorch CUDA 13 nightly builds. For ARM64 with CUDA 13 (e.g., NVIDIA DGX GB300 Blackwell, DGX GB200):
- Install CUDA toolkit 13.0 for compilation
- Use PyTorch nightly with cu130 support
- Build vLLM from source to ensure ABI compatibility
Add VLLM_ARM64_BUILD_FROM_SOURCE build arg (default: true) to allow opting out of source builds for faster build times on non-CUDA 13 systems.
Also:
- Update AMD64 wheel path to manylinux_2_35 (required for cu130)
- Bump vLLM to 0.15.1
See https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#use-the-local-cutlass-for-compilation.
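For reference, a hedged sketch of the ARM64 source-build path described above. The `cuda-toolkit-13-0` package, the cu130 PyTorch nightly index, the `/opt/vllm-env` virtualenv, and the use of `uv` follow the snippets quoted earlier in this PR; the clone-and-build commands and the assumption that the NVIDIA CUDA apt repository is already configured are illustrative, so the real Dockerfile may differ:

```dockerfile
ARG TARGETARCH
ARG VLLM_VERSION=0.15.1
ARG VLLM_ARM64_BUILD_FROM_SOURCE=true

RUN if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
        # CUDA toolkit 13.0 provides nvcc for compiling vLLM's CUDA kernels on aarch64
        # (assumes the NVIDIA CUDA apt repository is configured in the base image).
        apt-get update && apt-get install -y --no-install-recommends cuda-toolkit-13-0 && \
        # PyTorch nightly with cu130 support, so the compiled extensions match its ABI.
        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python \
            --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130 && \
        # Build vLLM from source against the installed torch rather than the prebuilt wheel.
        git clone --depth 1 --branch "v${VLLM_VERSION}" https://github.com/vllm-project/vllm.git /tmp/vllm && \
        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python --no-build-isolation /tmp/vllm; \
    fi
```

On machines that do not need CUDA 13, the source build can be skipped at image build time, e.g. `docker build --build-arg VLLM_ARM64_BUILD_FROM_SOURCE=false ...`, which is presumably what the Makefile's new support for extra build arguments is meant to make convenient.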