
Update PyTorch to 2.7.0 #16859


Merged: 39 commits into vllm-project:main on Apr 30, 2025
Conversation

huydhn
Contributor

@huydhn huydhn commented Apr 18, 2025

Notable changes:

  • PyTorch 2.7.0 has dropped CUDA 12.4, so the remaining options are 12.6 and 12.8
  • We need new xformers (0.0.30 is ready now), flashinfer, and mamba-ssm packages, so let's build them from source for now. They can be installed from PyPI once they are built upstream against 2.7.0
  • Leave XPU for later, for the Intel folks to pick up, since it requires a newer version of intel-extension-for-pytorch
  • Update the release pipeline to build with CUDA 12.8 too (see the sketch below)

Signed-off-by: Huy Do <huydhn@gmail.com>
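As a rough, hedged illustration (not the PR's actual diff), pinning PyTorch 2.7.0 against the CUDA 12.8 wheel channel could look like the following; the cu128 index URL is PyTorch's upstream wheel channel, while the uv flags and the unpinned torchvision/torchaudio are assumptions:

# Hypothetical sketch: pin torch 2.7.0 from the CUDA 12.8 wheel index
uv pip install --system \
    torch==2.7.0 torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/cu128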
@huydhn huydhn requested a review from tlrmchlsmth as a code owner April 18, 2025 16:55

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Apr 18, 2025

mergify bot commented Apr 18, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @huydhn.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 18, 2025
Signed-off-by: Huy Do <huydhn@gmail.com>
@mergify mergify bot removed the needs-rebase label Apr 18, 2025
@mgoin mgoin self-requested a review April 18, 2025 17:07
huydhn added 17 commits April 18, 2025 12:43
Signed-off-by: Huy Do <huydhn@gmail.com> (all commits; one reverts commit 1be359a)
@mgoin
Member

mgoin commented Apr 23, 2025

Now that torch 2.7 has been released (https://pypi.org/project/torch/2.7.0/), can this be updated?

@huydhn
Contributor Author

huydhn commented Apr 23, 2025

Now that torch 2.7 has been released (https://pypi.org/project/torch/2.7.0/), can this be updated?

Yup, it can be updated now. Let me start working on that. On the other hand, I will keep the state of this PR as a reference, because we plan to do similar validation for the next PyTorch release. Ideally, the validation should be done against the PyTorch RC before it is published to PyPI.
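For reference, a hedged sketch of what validating against a PyTorch release candidate can look like; the test wheel channel at download.pytorch.org/whl/test is where upstream RCs are staged, but the exact pins used for this PR's validation may have differed:

# Hypothetical example: install a PyTorch RC from the upstream test channel
uv pip install --system torch==2.7.0 \
    --extra-index-url https://download.pytorch.org/whl/test/cu128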


# TESTING: install xformers from source to test 2.7.0 final RC
RUN --mount=type=cache,target=/root/.cache/uv \
TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0+PTX' \
Contributor

For Blackwell support we will want to add sm100 here, although the +PTX should handle this.


Yes, 10.0 as well as 12.0. +PTX is not enough. (The same comment applies in the several places this sequence appears.) cc @kushanam

Contributor Author

As this PR is already relatively big and its signals are ready, let me add this change in a subsequent PR. It would be clearer that way, I think.
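For context, a hedged sketch of what the extended architecture list discussed above might look like; the exact entries were deferred to the follow-up PR, so the 10.0 and 12.0 values here are illustrative assumptions:

# Hypothetical follow-up (not part of this PR): extend the arch list for Blackwell
TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX'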

--index-strategy unsafe-best-match

# TESTING: install xformers from source to test 2.7.0 final RC
RUN --mount=type=cache,target=/root/.cache/uv \
Contributor

If the intention is that we are just going to wait until xformers releases a CUDA 12.8 compatible wheel, then ignore the above.

Contributor Author

IMO, we can just build the package from source for CI, then switch to the official xformers package once it's ready.
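A hedged sketch of what building xformers from source in the Dockerfile could look like; the git tag, arch list, and uv flags here are assumptions rather than the PR's exact diff:

# Hypothetical sketch: build xformers from source against the freshly installed torch 2.7.0
RUN --mount=type=cache,target=/root/.cache/uv \
    TORCH_CUDA_ARCH_LIST='7.5 8.0 8.6 8.9 9.0+PTX' \
    uv pip install --system --no-build-isolation \
    "git+https://github.com/facebookresearch/xformers@v0.0.30"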

Collaborator

How long does it take to compile xformers? If it's too long, I don't want to slow down our CI time for this.

Contributor Author
@huydhn huydhn Apr 26, 2025

Let me dig out the number for this once the build finishes. Without caching, it is significant from what I see locally, but let's see how long the build takes once it is cached.

@simon-mo simon-mo merged commit 2c4f59a into vllm-project:main Apr 30, 2025
87 of 90 checks passed
@DogeFlow

When will this be synced to the pip repository?

@vadimkantorov

Does vLLM publish nightlies to some pip channel? Asking so I can try out vLLM with PyTorch 2.7.0.

@mgoin
Member

mgoin commented May 4, 2025

@vadimkantorov

pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

@vadimkantorov

Couldn't find this nightly instruction in the installation section of the README. Might be good to add it there too!

Collaborator

simon-mo commented May 8, 2025

@vadimkantorov

vadimkantorov commented May 12, 2025

pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

For me, this again tries to fetch 2.6.0 :( despite having 2.7.0 installed:

vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl does not look like a nightly wheel :( it looks like the release wheel. I think pip discovers this wheel from my local cache because it exists, and does not attempt to install the nightly :( A bug in pip? I manually went to https://wheels.vllm.ai/nightly/vllm and found https://wheels.vllm.ai/vllm-0.8.5.dev599%2Bg9fbf2bfbd-cp38-abi3-manylinux1_x86_64.whl, which indeed looks like a nightly, but it gives a 404 :(

$ pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://wheels.vllm.ai/nightly
Collecting vllm
  Downloading vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl (326.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 326.4/326.4 MB 10.2 MB/s eta 0:00:00
...
Collecting torch==2.6.0
  Downloading torch-2.6.0-cp310-cp310-manylinux1_x86_64.whl (766.7 MB)
     ━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 221.3/766.7 MB 252.7 MB/s eta 0:00:03
ERROR: Operation cancelled by user

# second try:
$ pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly/vllm
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://wheels.vllm.ai/nightly/vllm
Collecting vllm
  Using cached vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl (326.4 MB)

@vadimkantorov

vadimkantorov commented May 12, 2025

Basically, I can't find direct URLs to the nightly wheels :( which might be needed to work around pip not wanting to install the nightly for some reason.

So far I managed to find a published nightly version via:

pip index versions vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

# and then

pip install -U vllm==0.8.5.dev600+g7ea6cb28b --pre --extra-index-url https://wheels.vllm.ai/nightly
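As a quick, hedged sanity check after installing (a hypothetical one-liner, not from this thread):

python -c "import torch, vllm; print(torch.__version__, vllm.__version__)"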

zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
zou3519 added a commit to zou3519/vllm that referenced this pull request May 27, 2025
This PR fixes the other issue discovered in vllm-project#16859 when upgrading from
PyTorch 2.6 to PyTorch 2.7. I don't know why the code used to work in
PyTorch 2.6, but the explanation is:
- when we are running PiecewiseCompileInterpreter, we end up doing
  FakeTensor propagation
- FakeTensor propagation requires `enable_python_dispatcher` to work.
  The mechanism is that some of our "C++ implementations" for
  operations, like matmul, force specialization of dynamic shapes.
  torch.compile works around this by replacing PyTorch's "C++
  implementation" for matmul with a python-based implementation for
  matmul that does not force specialization.

Test Plan:
- Ran `pytest -v tests/models/test_transformers.py -k test_models[meta-llama/Llama-3.2-1B-Instruct-transformers]`
  with PyTorch >= 2.7 and vllm-project#17330, verified that the test passes.

Signed-off-by: rzou <zou3519@gmail.com>
@zhanglianjie-163

Hi, why upgrade to PyTorch 2.7?

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: minpeter <kali2005611@gmail.com>
@jessiewiswjc

I found that after this commit, the TPOT/ITL of the Qwen/Qwen2.5-14B-Instruct model on an H20 dropped from 20 ms to 10 ms. I want to know which part of the code this benefit comes from. Is it PyTorch 2.7?

Labels: ci/build, documentation
10 participants