Disable remote caching when calling compile_fx #16611
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Thanks for the fix, let me play around locally.
The problem is as follows:
- vLLM requires its monkeypatched functions to run (e.g. https://github.com/vllm-project/vllm/blob/7b5ecf79bd94aab0d782c70126d0dcc37c16bc60/vllm/compilation/compiler_interface.py#L251).
- These functions may not run if (1) a user has a torch.compile remote cache set up and (2) there is a remote cache hit.
- When the monkeypatched/hijacked functions fail to run, we hit assertions: https://github.com/vllm-project/vllm/blob/7b5ecf79bd94aab0d782c70126d0dcc37c16bc60/vllm/compilation/compiler_interface.py#L299-L302

This PR disables torch.compile remote caching for vLLM compile.

Test Plan:
- tested locally

Signed-off-by: rzou <zou3519@gmail.com>
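To make the failure mode concrete, here is a minimal sketch of the hijack-and-assert pattern: a wrapper is installed around an internal compile-path function, and after compilation the caller asserts the wrapper actually ran. A remote cache hit can skip that code path entirely, so the assertion fires. The module and function names below are placeholders, not vLLM's actual hook points.

```python
import contextlib
from unittest import mock

@contextlib.contextmanager
def hijack(module, name):
    """Temporarily wrap module.<name> and record whether it was called."""
    ran = {"called": False}
    original = getattr(module, name)

    def wrapper(*args, **kwargs):
        ran["called"] = True
        return original(*args, **kwargs)

    with mock.patch.object(module, name, wrapper):
        yield ran

# Illustrative usage (some_inductor_module / some_internal_fn are hypothetical):
#
#     with hijack(some_inductor_module, "some_internal_fn") as ran:
#         compiled_model = torch.compile(model)
#         compiled_model(example_input)
#     # On a remote cache hit the local compile path is skipped, the wrapper
#     # never runs, and this assertion fails:
#     assert ran["called"]
```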
Force-pushed from fda8175 to cbc49dc.
This PR is mostly to fix a meta-internal bug I noticed (where we do have the torch.compile remote caches on), but I think this is generally applicable to vLLM, so we should ship it.
Looks good.
The problem is as follows:
- vLLM requires its monkeypatched functions to run (e.g. vllm/vllm/compilation/compiler_interface.py, line 251 in 7b5ecf7).
- These functions may not run if (1) a user has a torch.compile remote cache set up and (2) there is a remote cache hit.
- When the monkeypatched/hijacked functions fail to run, we hit assertions (vllm/vllm/compilation/compiler_interface.py, lines 299 to 302 in 7b5ecf7).

This PR disables torch.compile remote caching for vLLM compile.
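For reference, a minimal sketch of what disabling remote caching for a single compile_fx call could look like. The inductor config flag names (fx_graph_remote_cache, autotune_remote_cache) and the per-call config_patches approach are assumptions about the general technique, not a copy of this PR's change, and the available flags vary across torch versions.

```python
import torch
from torch._inductor.compile_fx import compile_fx

def compile_fx_no_remote_cache(gm: torch.fx.GraphModule, example_inputs):
    # Force the remote-cache knobs off for this call only; everything else
    # (including the local fx graph cache) keeps the user's global settings.
    config_patches = {
        "fx_graph_remote_cache": False,
        "autotune_remote_cache": False,
    }
    return compile_fx(gm, example_inputs, config_patches=config_patches)
```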
Test Plan:
vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000 --override-generation-config='{"attn_temperature_tuning": true}'