[Core] Multiprocessing executor for single-node multi-GPU [1/2] #4345
Conversation
This introduces the `MultiProcGPUExecutor`, which uses multiprocessing for tensor parallelism as an alternative to Ray. This PR does not actually wire it up for use; that will be done in a follow-on PR.

This PR also includes some refactoring to simplify the executor class hierarchy:
- A `MultiGPUExecutor` abstract superclass shared between the Ray and vanilla multiprocessing implementations
- Add a `shutdown()` method to the `ExecutorBase` abstract class; the executor is shut down when the `LLMEngine` is garbage collected
- Simplification/centralization of GPU `Worker` construction
- Move `ray_utils.py` from the `engine` to the `executor` package (per @zhuohan123's suggestion)
- Move function call tracing setup to a utils function
- Fix various typing issues

This replaces #3466; see background in that issue.
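A minimal sketch of the intended hierarchy, using the class names from the description above (method signatures and the Ray subclass name are illustrative assumptions, not the actual vLLM implementation):

```python
from abc import ABC, abstractmethod


class ExecutorBase(ABC):
    """Abstract executor driving one or more GPU workers.

    The owning LLMEngine is expected to call shutdown() (e.g. when it is
    garbage collected) so worker resources are released deterministically.
    """

    @abstractmethod
    def shutdown(self) -> None:
        """Release all worker resources."""
        ...


class MultiGPUExecutor(ExecutorBase, ABC):
    """Logic shared by all multi-GPU (tensor-parallel) executors."""


class RayGPUExecutor(MultiGPUExecutor):
    """Runs workers as Ray actors."""

    def shutdown(self) -> None:
        ...  # tear down Ray workers


class MultiProcGPUExecutor(MultiGPUExecutor):
    """Runs workers as local subprocesses instead of Ray actors."""

    def shutdown(self) -> None:
        ...  # terminate and join worker processes
```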
Review comment on the following diff:

```python
@lru_cache(maxsize=None)
def get_distributed_init_method() -> str:
    ip = get_ip()
    port = get_open_port()
```
is it safe to cache here? what if we want to init distributed for two different groups, and we expect the function to return different results per call?
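To illustrate the concern, here is a standalone sketch (the helpers are hypothetical stand-ins for `get_ip`/`get_open_port`): because the cached function takes no arguments, every caller gets the first computed address, even when a second, independent process group needs a fresh port.

```python
from functools import lru_cache

# Hypothetical stand-ins for vllm.utils.get_ip / get_open_port.
def get_ip() -> str:
    return "127.0.0.1"

_ports = iter(range(50000, 60000))

def get_open_port() -> int:
    return next(_ports)

@lru_cache(maxsize=None)
def get_distributed_init_method() -> str:
    return f"tcp://{get_ip()}:{get_open_port()}"

# Both "groups" receive the *same* address because the result is cached.
addr_group_a = get_distributed_init_method()
addr_group_b = get_distributed_init_method()
assert addr_group_a == addr_group_b
```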
Review comment on `vllm/utils.py` (outdated):

```python
    return so_file


def enable_trace_function_call_for_process():
```
actually this is per-thread. the naming is not accurate.
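For context on why this is per-thread rather than per-process (a standalone illustration, not the PR's code): `sys.settrace` only installs the trace function for the thread that calls it, so other threads run untraced unless `threading.settrace` is also used for threads created afterwards.

```python
import sys
import threading

calls: list[str] = []

def tracer(frame, event, arg):
    if event == "call":
        calls.append(f"{threading.current_thread().name}:{frame.f_code.co_name}")
    return tracer

def work():
    pass

sys.settrace(tracer)   # affects only the calling (main) thread
work()

t = threading.Thread(target=work, name="worker")
t.start()              # the worker thread runs untraced
t.join()
sys.settrace(None)

print(calls)  # contains MainThread entries only; nothing from "worker"

# threading.settrace(tracer) would additionally trace threads started later.
```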
To be honest, I feel like this PR is still too large to be effectively reviewed. I suggest breaking it down into pieces, each PR having about 100~200 lines of change, so that we can have a quick feedback loop.
@youkaichao here is the first one: #4347
Next one, just introducing the new multi-GPU abstract executor class: #4348
Next one, add shutdown method to ExecutorBase: #4349
Let's keep 1~2 PRs per day to avoid zhuohan and me being overwhelmed?
@youkaichao these are mostly the contents of the original PR that was already reviewed and discussed, just broken up (and mostly very small). Please feel free to review at whatever pace you're comfortable with! Next one is just moving the function tracing setup to a util function: #4352
Closing this in favor of a set of more granular PRs, some already opened (linked above). |