[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 #26698
Conversation
Verified okay on Neoverse-N2 server with RTX-4090.
Code Review
This pull request adds support for CUDA architecture 8.9 to the Arm64 release builds, which is necessary for RTX 40-series GPUs. The changes correctly update the torch_cuda_arch_list in the Buildkite pipeline for both wheel building and release image creation. My review feedback focuses on improving the maintainability of this CI configuration by addressing the duplication of the architecture list. I've suggested using a pipeline-level environment variable to define this list once, which will prevent potential inconsistencies and make future updates simpler and less error-prone.
  # NOTE: torch_cuda_arch_list is derived from upstream PyTorch build files here:
  # https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/aarch64_ci_build.sh#L7
- - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list='8.7 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
+ - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
To avoid duplicating the torch_cuda_arch_list, you can define it once as a pipeline-level environment variable. This makes future updates easier and less error-prone.
You can add this at the top of your .buildkite/release-pipeline.yaml file (before the steps: block):
env:
  TORCH_CUDA_ARCH_LIST_ARM64: '8.7 8.9 9.0 10.0+PTX 12.0'

Then you can use this environment variable in this command. The same change should be applied to the docker build command on line 79.
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list=\"$TORCH_CUDA_ARCH_LIST_ARM64\" --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."| commands: | ||
| - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7" | ||
| - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ." | ||
| - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ." |
As mentioned in the comment for line 11, this hardcoded torch_cuda_arch_list should be replaced with the proposed environment variable $TORCH_CUDA_ARCH_LIST_ARM64 to avoid duplication and improve maintainability.
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list=\"$TORCH_CUDA_ARCH_LIST_ARM64\" --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
mgoin left a comment:
Thanks, this makes sense as a PCIe card
Purpose
Support RTX-40xx cards (arch=8.9) in the Arm64 release.
Test Result
Verified manually on Arm Neoverse-N2 server with RTX-4090 card.
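For reference, a minimal sketch of how such a build can be sanity-checked on the target host, assuming only the public torch.cuda API and a machine with an RTX 4090; this is illustrative and not part of the PR:

import torch

# Architectures the installed wheel was compiled for; after this change the
# Arm64 build is expected to include sm_89 alongside the other listed archs.
print(torch.cuda.get_arch_list())

# On an RTX 4090 the reported compute capability should be (8, 9).
if torch.cuda.is_available():
    print(torch.cuda.get_device_capability(0))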