
vLLM's default multiprocessing method is incompatible with ROCm and Gaudi #2439


Description

@tiran

Describe the bug
vLLM defaults to VLLM_WORKER_MULTIPROC_METHOD=fork (see https://docs.vllm.ai/en/v0.6.1/serving/env_vars.html). The fork start method is incompatible with ROCm and Gaudi.

To Reproduce

  1. Configure InstructLab to use more than one GPU
  2. Run ilab model serve on a system with more than one AMD GPU

Expected behavior
vLLM works

Screenshots

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
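For context, this failure is not specific to vLLM: any process that initializes the GPU runtime and then forks workers hits the same error, while the spawn start method does not. A minimal illustrative sketch (not vLLM or InstructLab code; it assumes a CUDA or ROCm build of PyTorch, which also exposes torch.cuda, and at least one visible GPU):

```python
# Illustrative only: reproduces this class of error outside vLLM.
import multiprocessing as mp
import torch

def worker(rank: int) -> None:
    # In a fork-ed child this raises:
    # RuntimeError: Cannot re-initialize CUDA in forked subprocess. ...
    torch.ones(1, device=f"cuda:{rank}")

if __name__ == "__main__":
    torch.cuda.init()              # parent touches the GPU runtime first
    ctx = mp.get_context("fork")   # with "spawn" the workers start cleanly
    procs = [ctx.Process(target=worker, args=(i,))
             for i in range(torch.cuda.device_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```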

Additional context
I recommend switching to "spawn". Python itself is moving away from fork as the default start method across platforms, and the fork method has known issues; for example, it can lead to deadlocks when a process mixes threads and fork.

I switched InstructLab to spawn a long time ago because fork was causing trouble on Gaudi; see #956. InstructLab should set VLLM_WORKER_MULTIPROC_METHOD=spawn by default.
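A sketch of what that default could look like (hedged: where this lives in InstructLab is illustrative; only the environment variable itself comes from vLLM's documentation linked above):

```python
# Sketch of the proposed default; the placement is an assumption,
# not actual InstructLab code.
import os

# Respect an explicit user override, otherwise default to "spawn".
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

# ... then launch the vLLM server as before, e.g. its OpenAI-compatible
# entrypoint (module path from vLLM; arguments illustrative):
#   python -m vllm.entrypoints.openai.api_server --model <path> --tensor-parallel-size 2
```

Until such a default lands, the same setting works as a manual workaround: export VLLM_WORKER_MULTIPROC_METHOD=spawn in the environment before running ilab model serve.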

Metadata

Labels

bug (Something isn't working), jira (This triggers jira sync), stale, vllm (vLLM specific issues)