buildkite release pipeline: add torch_cuda_arch_list including 12.0 to the Docker "Build release image" build args in order to enable Blackwell SM120 support #19747
Conversation
…o the Docker "Build release image" build args in order to enable Blackwell SM120 support, see vllm-project#13306 (comment)
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Summary of Changes
Hello @cyril23, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on enhancing the project's Docker image build process to support newer Nvidia GPU architectures, specifically Blackwell SM120. By modifying the Buildkite release pipeline, I've ensured that the Docker images are compiled with the necessary CUDA architecture flags, addressing a critical compatibility gap for users leveraging the project's Docker distributions.
Highlights
- Buildkite Pipeline Update: I've updated the `.buildkite/release-pipeline.yaml` to modify the `docker build` command for the `vllm-openai` target. This change specifically adds the `--build-arg torch_cuda_arch_list` with an extended list of CUDA architectures, including `12.0+PTX` (see the sketch below).
- Nvidia Blackwell SM120 Support: The primary purpose of adding `12.0+PTX` to the `torch_cuda_arch_list` is to enable support for Nvidia Blackwell SM120 GPUs within the Docker images built by this pipeline. This addresses the need for newer hardware compatibility, particularly for users deploying via Docker.
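For illustration only, the effect of the change can be pictured as a plain `docker build` invocation along the following lines; the image tag and Dockerfile path are assumed placeholders, not the pipeline's actual values:

```bash
# Rough shape of the release image build after this change.
# Only the torch_cuda_arch_list build arg and the vllm-openai target come from the PR;
# the tag and Dockerfile path are illustrative placeholders.
docker build \
  --build-arg torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0 10.0 10.1 12.0+PTX' \
  --target vllm-openai \
  --tag vllm-openai:sm120-test \
  --file docker/Dockerfile \
  .
```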
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The pull request adds support for Nvidia Blackwell SM120 by including the 12.0 architecture in the `torch_cuda_arch_list` build argument for the Docker image build process. This change addresses issues related to using vLLM with Docker and provides a workaround for the PyPI wheel size limit. A suggestion was made to also include `RUN_WHEEL_CHECK=false`.
…ecause here we build the Docker image only for AWS, not for pypi.
/gemini review
Code Review
This PR updates the Buildkite release pipeline to include support for Nvidia Blackwell GPUs by modifying the `torch_cuda_arch_list` build argument and disabling the wheel size check. The primary feedback concerns the specific values used in `torch_cuda_arch_list`, suggesting a more standard list for specifying CUDA compute capabilities.
/gemini summary
Summary of Changes

This pull request focuses on enabling NVIDIA Blackwell SM120 GPU support for vLLM when deployed via Docker images. It achieves this by updating the Buildkite release pipeline to pass specific build arguments to the Docker image creation process, including an expanded list of supported CUDA architectures and disabling wheel size checks. This strategy provides Blackwell compatibility for Docker users, circumventing the size constraints faced by PyPI wheel distributions.

Highlights
Changelog
Activity
I've tried running my Docker image but got serious problems. Full output: problem.txt. I'll try building it again on my PC now and see how it works.
I'm now trying to identify which of those recent commits broke SM120 compatibility.
edit: I've opened an issue about it here: #19810
Alternative solutions, proposed in #19810 (comment):
Thank you for aggregating all of these issues and information, @cyril23, it is really great. I think we would like to add SM 12.0 in the base Dockerfile and wheel (in my mentioned PR) to solve most issues now, so we can hopefully close them!
Absolutely right, I'll close this PR here.
Essential Elements of an Effective PR Description Checklist
Purpose
We need Nvidia Blackwell SM120 support. There are a lot of issues regarding it.
There are two alternative solutions for solving number 2 (providing an SM 120 capable Docker image); see #13306 (comment).
This is the PR for alternative 2 since I didn't want to touch the defaults.
Besides adding `--build-arg torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0 10.0 10.1 12.0+PTX'`, I needed to add `--build-arg RUN_WHEEL_CHECK=false` too, of course. This shouldn't be a problem since AWS does not have the same size restrictions as PyPI's wheel size limits.
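For reference, a local build combining both arguments might look roughly like the sketch below; the Dockerfile path is an assumption based on the usual vLLM layout, and the tag mirrors the image used in the test plan, so this is a sketch rather than the exact pipeline command:

```bash
# Sketch of a local build with the extended arch list and the wheel-size check disabled.
# The Dockerfile path is assumed; the tag matches the image referenced below.
DOCKER_BUILDKIT=1 docker build \
  --build-arg torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0 10.0 10.1 12.0+PTX' \
  --build-arg RUN_WHEEL_CHECK=false \
  --target vllm-openai \
  --tag wurstdeploy/vllm:dev \
  --file docker/Dockerfile \
  .
```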
Test Plan

Build the image with the new build args, publish it as `wurstdeploy/vllm:dev`, and test it on Blackwell (SM120) hardware.
Test Result
I built the image on a Standard E96s v6 (96 vcpus, 768 GiB memory) machine. The build took ~1 hour; max system memory usage, including the OS, was 181125 MiB RAM. The resulting image is published as `wurstdeploy/vllm:dev`. I tested it with `TinyLlama/TinyLlama-1.1B-Chat-v1.0` and a test prompt. Screenshots: [1], [2], [3].

edit: but it only works with the Docker environment variable `-e VLLM_USE_FLASHINFER_SAMPLER=0` being set, else I get a kernel error. Investigation follows in the comments.
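For context, the run described above corresponds roughly to the sketch below; the environment variable, image, and model come from this comment, while the port mapping and remaining flags are illustrative assumptions:

```bash
# Illustrative run: the VLLM_USE_FLASHINFER_SAMPLER=0 variable, the image, and the
# model are taken from the test above; other flags are assumptions.
docker run --rm --gpus all -p 8000:8000 \
  -e VLLM_USE_FLASHINFER_SAMPLER=0 \
  wurstdeploy/vllm:dev \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

Without that variable set, the run reportedly hits a kernel error on SM120, as noted above.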