[Core] Add MultiprocessingGPUExecutor #4539
This introduces a python multiprocessing-based executor that can be used as an alternative to Ray for single-node inferencing. With the changes in this PR, Ray will continue to be used for parallel workers if it's installed, otherwise vanilla python multiprocessing is used. It can also be overridden with --no-worker-use-ray. The existing distributed tests have been updated to run with/without Ray. Worker processes are shut down when the LLMEngine is garbage collected. Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
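The fallback rule described above (use Ray when it is installed, otherwise plain multiprocessing, unless the user overrides it) could be sketched roughly as follows. This is only an illustration under assumptions: `choose_executor_backend` is a hypothetical helper name, not a function from this PR, and the real selection logic may differ.

```python
import importlib.util
from typing import Optional


def choose_executor_backend(requested: Optional[str] = None) -> str:
    """Return "ray" or "mp". Hypothetical helper, not the PR's actual code."""
    if requested is not None:
        # An explicit user choice always wins over auto-detection.
        if requested not in ("ray", "mp"):
            raise ValueError(f"unknown backend: {requested!r}")
        return requested
    # No explicit choice: prefer Ray when it is importable, else fall back
    # to Python multiprocessing.
    ray_installed = importlib.util.find_spec("ray") is not None
    return "ray" if ray_installed else "mp"
```

Calling `choose_executor_backend("mp")` forces multiprocessing even when Ray is present, which mirrors the override described in the PR text.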
This is really great work @njhill. Thanks for all the effort!
I think
@youkaichao that sounds very reasonable, but maybe it could be a separate PR? This is not actually a newly introduced arg - there is already a boolean
Yes, although Ray is already optional if you are only using a single GPU. I do have some changes to make it an optional "extra" from a Python package installation point of view, but was thinking of a follow-on PR to avoid making this one bigger.
If it is possible, I suggest adding
@youkaichao any idea why the Ray distributed CI test might be failing now due to a gloo timeout? I think it's something to do with a second engine using TP being created in the same pytest process after the first one is shut down (the test now runs with the mp executor followed by the Ray executor). This wasn't a problem with an earlier version of this PR, but I know you've made changes in this area. I will dig in more but just wanted to check if it's anything obvious to you.
Could you try merging the main branch in? I'm not sure, but the latest commit I merged into main passes the CI test.
@youkaichao FYI the problem is still there after pulling in your latest fix commit, I'll try to narrow it down tomorrow.
My suspicion is improper cleanup. You could have one test for mp and another for Ray, so they don't interfere with each other.
@njhill maybe we should disable the test for this PR until you figure it out locally? Otherwise the CI will be blocked.
…c-gpu-executor # Conflicts: # tests/distributed/test_basic_distributed_correctness.py
…c-gpu-executor # Conflicts: # .buildkite/test-pipeline.yaml
@youkaichao I've updated this now to run in separate tests. Do you think it would be worth opening a separate issue to address the distributed cleanup issue? Currently it seems you can't create an
I've now made this update as requested, can enable with
@youkaichao @rkooo567 hopefully this is now ready to merge? 🙏 The failing tests look unrelated (same failures on main branch). We could discuss in a follow-on whether it makes sense to change the default from
I will finish reviewing it today!
I have a comment on this: #4508 (comment). TL;DR: Python garbage collection is unreliable. If we want to address the distributed cleanup issue, we need some user-interface change, like a context manager to explicitly control the cleanup.
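To make the context-manager suggestion concrete, here is a minimal sketch of explicit cleanup. Everything in it is an assumption for illustration: `DummyEngine`, its `shutdown()` method, and `managed_engine` are hypothetical names, not part of vLLM's API.

```python
from contextlib import contextmanager


class DummyEngine:
    """Hypothetical stand-in for an engine that owns worker processes."""

    def __init__(self) -> None:
        self.workers_alive = True

    def shutdown(self) -> None:
        # A real engine would terminate and join its workers here.
        self.workers_alive = False


@contextmanager
def managed_engine(factory):
    """Create an engine and guarantee shutdown() runs when the block exits."""
    engine = factory()
    try:
        yield engine
    finally:
        # Runs on normal exit and on exceptions, unlike garbage collection,
        # whose timing is not guaranteed.
        engine.shutdown()
```

Usage would look like `with managed_engine(DummyEngine) as engine: ...`; once the block exits, the workers are torn down deterministically rather than whenever the object happens to be collected.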
LGTM, thanks for the efforts!
Thanks again @youkaichao @rkooo567 @zhuohan123! And thanks for your patience @vrdn-23!
Hi all, thanks for the efforts. Just one question: is there any performance difference between Python multiprocessing and Ray? Thanks in advance.
This introduces a Python multiprocessing-based executor that can be used as an alternative to Ray for single-node inferencing.
With the changes in this PR, Ray will continue to be used for parallel workers if it's installed; otherwise vanilla Python multiprocessing is used. It can also be overridden with `--no-worker-use-ray` / `--distributed-executor-backend=mp`.
By default, worker processes are started using `spawn`. This can be changed to `fork` by setting the env var `VLLM_WORKER_MULTIPROC_METHOD=fork`. `fork` mode has the benefit of starting faster.
The existing distributed tests have been updated to run with/without Ray.
Worker processes are shut down when the `LLMEngine` is garbage collected.
This replaces original PRs #3466, #2898 and #4345. It was originally co-authored by @sahilsuneja1.
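As a rough sketch of how the start method could be picked up from the environment: the variable name `VLLM_WORKER_MULTIPROC_METHOD` and the `spawn` default come from the description above, while `get_worker_context` and `start_worker` are hypothetical helper names and not the PR's actual implementation.

```python
import multiprocessing
import os


def get_worker_context():
    """Return a multiprocessing context for the configured start method."""
    # Default to "spawn", as stated in the PR description; "fork" is only
    # available on POSIX platforms and raises ValueError elsewhere.
    method = os.environ.get("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
    return multiprocessing.get_context(method)


def start_worker(target, *args):
    """Start one daemon worker process using the selected context."""
    ctx = get_worker_context()
    proc = ctx.Process(target=target, args=args, daemon=True)
    proc.start()
    return proc
```

With `spawn`, the `target` has to be an importable top-level function and the launching code should sit under an `if __name__ == "__main__":` guard, because each child re-imports the main module. `fork` skips that re-import, which is consistent with the faster startup mentioned above.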