[Bugfix] Fix ray instance detect issue #9439
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@youkaichao can you help review this change? Thanks.
Overall LGTM. One question: should we also change the other branch?
vllm/executor/ray_utils.py
num_gpus=parallel_config.world_size)
# Try to connect existing ray instance and create a new one if not found
try:
    ray.init('auto')
Use double quotes for consistency.
Agree. So I unify the
The updated code does not seem equivalent to the original one. Currently, for non-HIP and non-XPU cases, we don't init Ray with all GPUs.
For non-HIP and non-XPU cases, according to the explanation, it will ultimately create a local instance with the detected GPUs if it fails to connect to an existing cluster.
Sounds reasonable to me, but cc @rkooo567 @richardliaw to double check.
@yma11 please resolve the conflict.
@youkaichao Thanks for the reminder. @comaniac I switched the fix back to only change
bc652ab to af18da6
Signed-off-by: yan ma <yan.ma@intel.com>
@DarkLight1337 could you help check whether the error is related to this change, or whether it already occurred on the main branch?
It is a failure from the main branch that has since been fixed. You can force merge this.
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: qishuai <ferdinandzhong@gmail.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Fix Ray instance detection so that it first tries connecting to the latest launched instance and, if none is found, creates a new one with num_gpus=parallel_config.world_size.
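The connect-then-fall-back behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper name connect_or_start_ray, its return values, and the injected init_fn parameter are hypothetical and exist only so the logic can be exercised without a Ray installation. In real use init_fn would be ray.init, which is assumed here to raise ConnectionError when "auto" finds no running instance.

```python
from typing import Any, Callable, Optional


def connect_or_start_ray(
    init_fn: Callable[..., Any],
    world_size: int,
    ray_address: Optional[str] = None,
) -> str:
    """Try an existing Ray instance first; start a new one if none is found.

    `init_fn` stands in for `ray.init` (hypothetical injection point).
    Returns "existing" or "new" so callers can tell which path was taken.
    """
    try:
        # "auto" asks Ray to attach to an already running instance.
        init_fn("auto", ignore_reinit_error=True)
        return "existing"
    except ConnectionError:
        # Assumption: no running instance was detected, so launch one
        # and reserve one GPU per worker in the parallel config.
        init_fn(
            address=ray_address,
            ignore_reinit_error=True,
            num_gpus=world_size,
        )
        return "new"
```

With Ray installed, a call might look like connect_or_start_ray(ray.init, parallel_config.world_size). Catching only ConnectionError keeps other initialization failures visible instead of silently starting a second instance.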