Update PyTorch to 2.7.0 #16859
Conversation
Signed-off-by: Huy Do <huydhn@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This pull request has merge conflicts that must be resolved before it can be merged.
This reverts commit 1be359a. Signed-off-by: Huy Do <huydhn@gmail.com>
Now that torch 2.7 has been released (https://pypi.org/project/torch/2.7.0/), can this be updated?
Yup, it can be updated now. Let me start working on that. On the other hand, I think I will keep the state of this PR as a reference, because we plan to do similar validation for the next PyTorch release. Ideally, the validation should be done with the PyTorch RC before it is published to PyPI.
docker/Dockerfile (outdated)
# TESTING: install xformers from source to test 2.7.0 final RC
RUN --mount=type=cache,target=/root/.cache/uv \
    TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0+PTX' \
For Blackwell support we will want to add sm100 here, although the +PTX should handle this...
Yes, 10.0 as well as 12.0; +PTX is not enough. (The same comment applies in the several places this sequence appears.) cc @kushanam
As this PR is already relatively big and its CI signals are ready, let me add that change in a subsequent PR. It would be clearer that way, I guess.
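A rough sketch of what that follow-up could look like, assuming the reviewers' suggestion of adding the Blackwell archs (10.0 and 12.0); the install line is reconstructed from the truncated snippet above and is not the exact command in this PR:

# Hypothetical follow-up, not part of this PR: the xformers source build with Blackwell archs added.
# --no-build-isolation makes the build compile against the torch 2.7.0 already installed in the image.
RUN --mount=type=cache,target=/root/.cache/uv \
    TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' \
    uv pip install --system --no-build-isolation \
        "git+https://github.com/facebookresearch/xformers.git@v0.0.30"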
docker/Dockerfile (outdated)
    --index-strategy unsafe-best-match

# TESTING: install xformers from source to test 2.7.0 final RC
RUN --mount=type=cache,target=/root/.cache/uv \
If the intention is that we are just going to wait until xformers releases a CUDA 12.8-compatible wheel, though, then ignore the above.
IMO, we can just build the package from source for CI, then switch to the official xformers package once it's ready.
How long does it take to compile xformers? If it's too long, I don't want to slow down our CI time for this.
Let me dig out the number once the build finishes. Without caching, it is significant from what I see locally, but let's see what it takes once the build is cached.
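For what it's worth, a quick way to get that number locally, sketched with assumed values (the arch list, MAX_JOBS, and the xformers tag are placeholders, not what CI uses):

# assumes torch 2.7.0 is already installed, since --no-build-isolation builds against it
time TORCH_CUDA_ARCH_LIST='9.0+PTX' MAX_JOBS=16 \
    pip install --no-build-isolation \
    "git+https://github.com/facebookresearch/xformers.git@v0.0.30"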
When will this be synced to the pip repository?
Does vLLM publish nightlies to some pip channel? Asking so I can try out vLLM with PyTorch 2.7.0.
Couldn't find this nightly instruction in the installation section of the README. Might be good to add it there too!
For me, this again tries to fetch torch 2.6.0 :( despite having 2.7.0 installed:
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Basically, I can't find direct URLs to the nightly wheels :( which might be needed to circumvent pip not wanting to install the nightly for some reason. So far I have managed to find the published commit from:
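One workaround to sketch here, under two assumptions: that the nightly wheel index mentioned in the vLLM docs (https://wheels.vllm.ai/nightly) is the intended channel, and that pinning torch in a constraints file is an acceptable hack rather than an official instruction. The pin keeps pip from dragging torch back to 2.6.0 (it will fail loudly instead of silently downgrading):

# keep the already-installed torch 2.7.0 while pulling a nightly vLLM wheel
echo 'torch==2.7.0' > constraints.txt
pip install --pre -U vllm \
    --extra-index-url https://wheels.vllm.ai/nightly \
    -c constraints.txt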
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
This PR fixes the other issue discovered in vllm-project#16859 when upgrading from PyTorch 2.6 to PyTorch 2.7. I don't know why the code used to work in PyTorch 2.6, but the explanation is:
- when we are running PiecewiseCompileInterpreter, we end up doing FakeTensor propagation
- FakeTensor propagation requires `enable_python_dispatcher` to work. The mechanism is that some of our "C++ implementations" for operations, like matmul, force specialization of dynamic shapes. torch.compile works around this by replacing PyTorch's "C++ implementation" for matmul with a Python-based implementation that does not force specialization.

Test Plan:
- Ran `pytest -v tests/models/test_transformers.py -k test_models[meta-llama/Llama-3.2-1B-Instruct-transformers]` with PyTorch >= 2.7 and vllm-project#17330, verified that the test passes.

Signed-off-by: rzou <zou3519@gmail.com>
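Spelling out that test plan as copy-pasteable shell (the torch install line is an assumption, and the fix from vllm-project#17330 still has to be applied to the checkout); the -k expression is quoted so the brackets are not interpreted by the shell:

pip install 'torch>=2.7.0'
pytest -v tests/models/test_transformers.py \
    -k 'test_models[meta-llama/Llama-3.2-1B-Instruct-transformers]'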
Hi, why upgrade to PyTorch 2.7?
Signed-off-by: minpeter <kali2005611@gmail.com>
I found that after this commit, the TPOT/ITL of the Qwen/Qwen2.5-14B-Instruct model on an H20 dropped from 20 ms to 10 ms. I want to know which part of the code this improvement comes from. Is it PyTorch 2.7?
Notable changes:
- xformers (0.0.30 is ready now), flashinfer, and mamba-ssm do not yet publish builds against 2.7.0, so let's build them from source for now. They can be installed from PyPI once they are built upstream with 2.7.0.
- intel-extension-for-pytorch
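A small, hedged way to keep an eye on when those from-source builds can be switched back to PyPI installs (plain pip commands, not anything this PR adds; you would still need the packages' release notes to confirm they were built against torch 2.7.0):

# list the versions currently published on PyPI for the packages built from source above
pip index versions xformers    # 0.0.30 is noted as ready in the description
pip index versions mamba-ssm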