
fix: build vLLM from source for ARM64 CUDA 13 (NVIDIA DGX) #637

Open
doringeman wants to merge 1 commit into docker:main from doringeman:linux-arm64-cuda-vllm

Conversation

@doringeman
Contributor

The prebuilt vLLM ARM64 wheels have an ABI incompatibility with PyTorch CUDA 13 nightly builds. For ARM64 with CUDA 13 (e.g., NVIDIA DGX GB300 Blackwell, DGX GB200):

  • Install CUDA toolkit 13.0 for compilation
  • Use PyTorch nightly with cu130 support
  • Build vLLM from source to ensure ABI compatibility

Add VLLM_ARM64_BUILD_FROM_SOURCE build arg (default: true) to allow opting out of source builds for faster build times on non-CUDA 13 systems.

Also:

  • Update AMD64 wheel path to manylinux_2_35 (required for cu130)
  • Bump vLLM to 0.15.1

See https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#use-the-local-cutlass-for-compilation.
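For a sense of what the source-build branch looks like, here is a minimal sketch of the conditional. The clone/build commands follow the "use an existing PyTorch" flow from the vLLM docs linked above and are illustrative, not copied from this diff; the paths /opt/vllm-env and ~/.local/bin/uv match the Dockerfile excerpts quoted later on this page.

ARG TARGETARCH
ARG VLLM_VERSION=0.15.1
ARG VLLM_ARM64_BUILD_FROM_SOURCE=true

# Sketch only: install the cu130 PyTorch nightly, then compile vLLM against that torch ABI.
RUN if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python \
            --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130 \
        && git clone --branch "v${VLLM_VERSION}" --depth 1 https://github.com/vllm-project/vllm.git /tmp/vllm \
        && cd /tmp/vllm \
        && /opt/vllm-env/bin/python use_existing_torch.py \
        && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python -r requirements/build.txt \
        && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python --no-build-isolation . ; \
    fi

The default keeps DGX-class machines working out of the box; everyone else can flip the build arg off and keep the faster prebuilt-wheel path, as shown below.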


# Default: build from source (NVIDIA DGX / CUDA 13)
$ make docker-run-vllm

# Opt-out: use prebuilt wheel (faster build, may not work on CUDA 13)
$ make docker-run-vllm EXTRA_DOCKER_BUILD_ARGS="--build-arg VLLM_ARM64_BUILD_FROM_SOURCE=false"

$ sudo dmidecode -s system-product-name
DGX Station GB300

$ make docker-run-vllm # took ~42 min (built vLLM from source)
$ MODEL_RUNNER_HOST=http://localhost:8080 docker model run smollm2-vllm hi
Hello there! I’m here to help with an interesting task. What can I assist you with?
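
A quick sanity check that an image built this way really links vLLM against the cu130 nightly torch (a sketch: the image tag is a placeholder, and /opt/vllm-env is the venv path shown in the Dockerfile excerpts below):

$ docker run --rm --gpus all <model-runner-vllm-image> \
    /opt/vllm-env/bin/python -c 'import torch, vllm; print(torch.__version__, torch.version.cuda, vllm.__version__)'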

@gemini-code-assist
Contributor

Summary of Changes

Hello @doringeman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request provides a critical fix for deploying vLLM on ARM64 systems with CUDA 13, particularly NVIDIA DGX platforms, by resolving ABI incompatibility issues with prebuilt vLLM wheels and PyTorch CUDA 13 nightly builds. It achieves this by implementing a conditional mechanism to build vLLM directly from source for these specific environments. The changes also include an update to the latest vLLM version and a refinement of the Docker build process for AMD64 to ensure compatibility with new CUDA 13 wheel requirements, alongside making the Docker build process more flexible for custom arguments.

Highlights

  • Conditional vLLM Source Build for ARM64 CUDA 13: Introduced a new build argument, VLLM_ARM64_BUILD_FROM_SOURCE, which defaults to true, to enable building vLLM from source specifically for ARM64 architectures running CUDA 13 (e.g., NVIDIA DGX systems). This addresses ABI incompatibility issues with prebuilt wheels.
  • vLLM Version Bump: Updated the default vLLM version used in the Dockerfile from 0.12.0 to 0.15.1.
  • AMD64 Wheel Path Update: Adjusted the AMD64 wheel path for vLLM to manylinux_2_35_x86_64, which is required for CUDA 13 compatibility.
  • Flexible Docker Build Arguments: Modified the Makefile to allow passing additional build arguments to docker buildx build commands via the EXTRA_DOCKER_BUILD_ARGS variable, enabling easier customization of the build process.
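
On the Makefile side the wiring is roughly the following (a sketch; IMAGE and the recipe details stand in for whatever the real Makefile uses, and recipe lines are tab-indented):

EXTRA_DOCKER_BUILD_ARGS ?=

docker-build:
	docker buildx build \
		$(EXTRA_DOCKER_BUILD_ARGS) \
		-t $(IMAGE) \
		.

This lets callers append arbitrary --build-arg flags, e.g. make docker-run-vllm EXTRA_DOCKER_BUILD_ARGS="--build-arg VLLM_ARM64_BUILD_FROM_SOURCE=false", without editing the Makefile.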


Changelog
  • Dockerfile
    • Updated VLLM_VERSION from 0.12.0 to 0.15.1.
    • Added a new build argument VLLM_ARM64_BUILD_FROM_SOURCE (defaulting to true) to control whether vLLM is built from source on ARM64.
    • Expanded build dependencies to include git, cmake, ninja-build, and conditionally cuda-toolkit-13-0 for ARM64 source builds.
    • Set PATH and LD_LIBRARY_PATH environment variables for CUDA 13.0.
    • Modified vLLM installation logic: AMD64 now uses manylinux_2_35_x86_64 wheels, and ARM64 with VLLM_ARM64_BUILD_FROM_SOURCE=true now installs PyTorch nightly (cu130) and compiles vLLM from its source repository.
  • Makefile
    • Introduced EXTRA_DOCKER_BUILD_ARGS variable to pass additional arguments to Docker build commands.
    • Modified docker-build and docker-build-multiplatform targets to incorporate $(EXTRA_DOCKER_BUILD_ARGS).
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

Contributor

@sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • Consider guarding the CUDA 13 PATH/LD_LIBRARY_PATH ENV exports so they are only set when cuda-toolkit-13-0 is actually installed (e.g., on ARM64 + VLLM_ARM64_BUILD_FROM_SOURCE=true), to avoid unintentionally overriding CUDA locations on other architectures or configurations.
  • In the apt install step, you use uname -m to detect aarch64 while elsewhere you rely on TARGETARCH; aligning on TARGETARCH for architecture checks would make the build logic more consistent and easier to reason about.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a mechanism to build vLLM from source for ARM64 platforms with CUDA 13, addressing an ABI incompatibility issue. The changes primarily involve modifications to the Dockerfile to handle conditional compilation and dependency installation, along with Makefile adjustments to pass extra build arguments. My review identified a critical issue in the Dockerfile that would break the amd64 build path due to incorrect shell syntax. Additionally, I've provided several suggestions to improve the robustness and clarity of the Dockerfile scripting, such as using TARGETARCH consistently and addressing unconditionally set environment variables.

python3 python3-venv python3-dev \
curl ca-certificates build-essential \
git cmake ninja-build \
&& if [ "$(uname -m)" = "aarch64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

medium

You're using uname -m to check the architecture. While this works, it's more idiomatic and robust in Dockerfiles to use the built-in TARGETARCH build argument, which is explicitly provided by the builder for the target platform. This avoids any potential discrepancies with the build environment and improves consistency with other parts of the Dockerfile.

    && if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

Comment on lines +110 to +111
ENV PATH=/usr/local/cuda-13.0/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH

medium

The PATH and LD_LIBRARY_PATH environment variables for the CUDA toolkit are set unconditionally for this build stage. However, the CUDA toolkit is only installed for arm64 builds when VLLM_ARM64_BUILD_FROM_SOURCE is true. For other build configurations (like amd64), these paths will point to non-existent directories. This pollutes the environment and could potentially lead to subtle build issues. Consider setting these environment variables only when they are actually needed, for example by moving this logic into a script that is executed within the RUN instruction where the build from source happens.
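
One way to do what this comment suggests, sketched rather than taken from the PR: drop the stage-wide ENV lines and export the paths inside the RUN that performs the source build, so amd64 and prebuilt-wheel builds never see directories that do not exist.

RUN if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
        export PATH="/usr/local/cuda-13.0/bin:$PATH" \
               LD_LIBRARY_PATH="/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH" \
        && nvcc --version \
        && echo "PyTorch nightly install and vLLM source build run here with CUDA 13 on PATH"; \
    fi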

~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"; \
WHEEL_URL="https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}%2B${VLLM_CUDA_VERSION}-${VLLM_PYTHON_TAG}-manylinux_2_35_x86_64.whl" \
&& ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"; \
elif [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

medium

The condition for building from source only checks VLLM_ARM64_BUILD_FROM_SOURCE. While the variable name implies it's for arm64, and the logic for installing dependencies also checks the architecture, it's safer and clearer to be explicit here as well. Adding a check for TARGETARCH makes the intent unambiguous and prevents this branch from being accidentally taken on other architectures if the build arguments are misconfigured.

    elif [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

The prebuilt vLLM ARM64 wheels have ABI incompatibility with PyTorch CUDA 13 nightly builds. For ARM64 with CUDA 13 (e.g., NVIDIA DGX GB300 Blackwell, DGX GB200):
- Install CUDA toolkit 13.0 for compilation
- Use PyTorch nightly with cu130 support
- Build vLLM from source to ensure ABI compatibility

Add VLLM_ARM64_BUILD_FROM_SOURCE build arg (default: true) to allow opting out of source builds for faster build times on non-CUDA 13 systems.

Also:
- Update AMD64 wheel path to manylinux_2_35 (required for cu130)
- Bump vLLM to 0.15.1

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
@doringeman force-pushed the linux-arm64-cuda-vllm branch from 492352a to b4c36ea on February 5, 2026 at 14:12