Closed as not planned
Labels: bug (Something isn't working), jira (This triggers jira sync), stale, vllm (vLLM specific issues)
Description
Describe the bug
vLLM defaults to VLLM_WORKER_MULTIPROC_METHOD=fork (see https://docs.vllm.ai/en/v0.6.1/serving/env_vars.html). Forking is incompatible with ROCm and Gaudi.
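A minimal sketch of the failure mode, not InstructLab or vLLM code; it assumes a CUDA- or ROCm-enabled PyTorch build and at least one GPU. Once the parent process has initialized the GPU runtime, a forked worker cannot re-initialize it, while a spawned worker starts from a fresh interpreter and can:

```python
import multiprocessing as mp

import torch


def use_gpu() -> None:
    # Touching the GPU in the child (re-)initializes CUDA there.
    print(torch.zeros(1, device="cuda"))


if __name__ == "__main__":
    torch.zeros(1, device="cuda")  # parent initializes CUDA first

    # With the "fork" start method the child raises:
    # RuntimeError: Cannot re-initialize CUDA in forked subprocess. ...
    p = mp.get_context("fork").Process(target=use_gpu)
    p.start()
    p.join()

    # With "spawn" the same child works, at the cost of re-importing
    # the module in the fresh interpreter.
    p = mp.get_context("spawn").Process(target=use_gpu)
    p.start()
    p.join()
```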
To Reproduce
- Configure InstructLab to use more than one GPU
- Run ilab model serve on a system with more than one AMD GPU (a possible manual workaround is sketched after this list)
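Until the default changes, setting the variable explicitly in the environment should work around the crash; this assumes ilab passes its environment through to the vLLM server process it launches:

```
VLLM_WORKER_MULTIPROC_METHOD=spawn ilab model serve
```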
Expected behavior
vLLM starts and serves the model across the configured GPUs.
Screenshots
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Additional context
I recommend switching to "spawn". Python is moving from fork to spawn as the default start method on all platforms. The fork method has known problems; for example, it can lead to deadlocks when a process mixes threads and fork.
I switched InstructLab itself to spawn a long time ago because fork was causing trouble on Gaudi, see #956. InstructLab should set VLLM_WORKER_MULTIPROC_METHOD=spawn by default.
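A minimal sketch of that default, assuming it runs early in ilab startup before any vLLM workers are created; the helper name is hypothetical, not existing InstructLab code:

```python
import os


def set_vllm_multiproc_default() -> None:
    # Hypothetical helper: default vLLM workers to the spawn start
    # method, while still letting users override it via their own
    # environment (setdefault does not clobber an existing value).
    os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
```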