[TPU] Support multi-host inference #7457
In the Ray GPU executor, there are these lines (vllm/vllm/executor/ray_gpu_executor.py, lines 175 to 191 at commit 7025b11) that make sure the worker index aligns with machine boundaries. You might need the same in the TPU executor. Otherwise local ranks can be wrong: for example, ranks 0, 1, 2, and 4 could land on one node and ranks 3, 5, 6, and 7 on another.
@youkaichao Can you explain more?
Say you have 2 nodes with 8 TPUs in total. Ray actors are launched one by one: the first actor might be placed on node 0, and the second on node 1. If you use the worker index as the global rank, the ranks on each node are no longer contiguous, and the local ranks derived from them will be wrong.
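A minimal sketch of the problem and one way to avoid it, in plain Python rather than vLLM's actual executor code. It assumes each worker can report which node it landed on; the `node_ids` list (indexed by launch order) and both helper functions are hypothetical names for illustration. Using the launch index as the global rank reproduces the split described above, while regrouping workers by node before assigning ranks gives each node a contiguous block, so the local rank is simply the position within the node.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def naive_ranks(node_ids: List[str]) -> Dict[str, List[int]]:
    """Use the launch index as the global rank; return the ranks on each node."""
    per_node: Dict[str, List[int]] = defaultdict(list)
    for rank, node in enumerate(node_ids):
        per_node[node].append(rank)
    return dict(per_node)


def node_aligned_ranks(node_ids: List[str]) -> List[Tuple[int, int, int]]:
    """Reassign ranks so that each node holds a contiguous block of ranks.

    Returns (worker_index, global_rank, local_rank) for every worker.
    Nodes are ordered by first appearance, so worker 0 (e.g. the driver)
    keeps global rank 0.
    """
    node_order: List[str] = []
    workers_on_node: Dict[str, List[int]] = defaultdict(list)
    for idx, node in enumerate(node_ids):
        if node not in workers_on_node:  # membership test does not create a key
            node_order.append(node)
        workers_on_node[node].append(idx)

    assignments: List[Tuple[int, int, int]] = []
    global_rank = 0
    for node in node_order:
        for local_rank, idx in enumerate(workers_on_node[node]):
            assignments.append((idx, global_rank, local_rank))
            global_rank += 1
    return assignments


# Hypothetical launch order across two nodes "A" and "B":
launch_nodes = ["A", "A", "A", "B", "A", "B", "B", "B"]
print(naive_ranks(launch_nodes))
# {'A': [0, 1, 2, 4], 'B': [3, 5, 6, 7]}
```

With `node_aligned_ranks`, worker 4 (on node A) becomes global rank 3 and worker 3 (on node B) becomes global rank 4, so node A holds ranks 0 to 3 and node B holds ranks 4 to 7.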
@youkaichao Thanks for the explanation. Let me merge this PR first, as there are users waiting for it and its scope is isolated to the TPU backend. I will address your comment in a follow-up PR.
Hi @WoosukKwon, thanks for the office hours today, and thanks for your hard work on this! I'm eager to start using the multi-host inference support on TPUs. Do you know when this feature will be available for general use? Thanks again!
I think it should work now, though I haven't tried it yet, especially with bigger models like Mistral Large 2. Any docs or a tutorial would help, @WoosukKwon.
Change global rank and world size to local rank and local world size.
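A minimal sketch of the arithmetic behind mapping a global rank and world size to a per-host view. It assumes, for illustration only, that ranks are assigned in contiguous per-host blocks and that every host has the same number of chips; `chips_per_host`, `LocalView`, and `to_local` are hypothetical names, not part of vLLM.

```python
from typing import NamedTuple


class LocalView(NamedTuple):
    host_index: int        # which host this rank lives on
    local_rank: int        # position of this rank within its host
    local_world_size: int  # number of ranks on this host


def to_local(rank: int, world_size: int, chips_per_host: int) -> LocalView:
    """Map a global rank to its per-host view.

    Assumes ranks are assigned in contiguous per-host blocks and that
    every host runs exactly `chips_per_host` workers.
    """
    if world_size % chips_per_host != 0:
        raise ValueError("world_size must be a multiple of chips_per_host")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return LocalView(
        host_index=rank // chips_per_host,
        local_rank=rank % chips_per_host,
        local_world_size=chips_per_host,
    )


# Two hosts with 4 chips each (world_size = 8):
print(to_local(5, world_size=8, chips_per_host=4))
# LocalView(host_index=1, local_rank=1, local_world_size=4)
```

The uniform-host assumption keeps the mapping to a division and a remainder; with uneven hosts, the local view would instead have to come from an explicit per-host grouping of workers.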