
[BUG] vLLM backend hangs with DDP #670

Closed

@lewtun

Description

Describe the bug

When running the latest version of vllm (0.8.3), the backend hangs when evaluating with more than one GPU (i.e. with data_parallel_size > 1).

To Reproduce

First install vllm:

pip install vllm==0.8.3

Then run:

lighteval vllm \
    "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=bfloat16,data_parallel_size=2,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}" \
    "lighteval|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR
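For context, with data_parallel_size > 1 lighteval dispatches one Ray task per model replica (the run_inference_one_model workers visible in the logs below), each of which constructs its own vLLM engine. A minimal standalone sketch of that pattern follows; the function name mirrors the one in the logs, but the body is an illustrative assumption, not lighteval's actual implementation:

# Sketch: one vLLM engine per Ray task, mirroring lighteval's data-parallel path.
# The task name matches run_inference_one_model from the logs below; the body is
# an assumption for illustration only.
import ray
from vllm import LLM, SamplingParams

@ray.remote(num_gpus=1)
def run_inference_one_model(model_id, prompts):
    llm = LLM(model=model_id, dtype="bfloat16", max_model_len=32768)
    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)
    return [out.outputs[0].text for out in llm.generate(prompts, params)]

ray.init()
model = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
futures = [run_inference_one_model.remote(model, ["Hello"]) for _ in range(2)]
print(ray.get(futures))  # expected to stall at placement-group creation if the bug reproduces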

The run then hangs at the Ray placement group step; these are the last log lines:

[2025-04-13 19:07:39,715] [    INFO]: PyTorch version 2.6.0 available. (config.py:54)
INFO 04-13 19:07:48 [__init__.py:239] Automatically detected platform cuda.
[2025-04-13 19:07:51,066] [    INFO]: --- LOADING MODEL --- (pipeline.py:189)
[2025-04-13 19:07:51,529] [    INFO]: --- INIT SEEDS --- (pipeline.py:263)
[2025-04-13 19:07:51,529] [    INFO]: --- LOADING TASKS --- (pipeline.py:216)
[2025-04-13 19:07:51,529] [    INFO]: Found 1 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/ifeval/main.py (registry.py:142)
[2025-04-13 19:07:51,529] [    INFO]: Found 6 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/tiny_benchmarks/main.py (registry.py:142)
[2025-04-13 19:07:51,529] [    INFO]: Found 1 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/mt_bench/main.py (registry.py:142)
[2025-04-13 19:07:51,529] [    INFO]: Found 4 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/mix_eval/main.py (registry.py:142)
[2025-04-13 19:07:51,529] [    INFO]: Found 5 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/olympiade_bench/main.py (registry.py:142)
[2025-04-13 19:07:51,530] [    INFO]: Found 1 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/hle/main.py (registry.py:142)
[2025-04-13 19:07:51,530] [    INFO]: Found 21 custom tasks in /fsx/lewis/git/hf/lighteval/src/lighteval/tasks/extended/lcb/main.py (registry.py:142)
[2025-04-13 19:07:51,534] [    INFO]: HuggingFaceH4/aime_2024 default (lighteval_task.py:187)
[2025-04-13 19:07:51,534] [ WARNING]: Careful, the task lighteval|aime24 is using evaluation data to build the few shot examples. (lighteval_task.py:260)
[2025-04-13 19:07:52,888] [    INFO]: --- RUNNING MODEL --- (pipeline.py:468)
[2025-04-13 19:07:52,888] [    INFO]: Running RequestType.GREEDY_UNTIL requests (pipeline.py:472)
[2025-04-13 19:07:52,906] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:260)
Splits:   0%|          | 0/2 [00:00<?, ?it/s]
[2025-04-13 19:07:52,915] [ WARNING]: context_size + max_new_tokens=33238 which is greater than self.max_length=32768. Truncating context to 0 tokens. (vllm_model.py:274)
[2025-04-13 19:07:55,258] [    INFO]: Started a local Ray instance. (worker.py:1841)
(pid=1276189) INFO 04-13 19:08:03 [__init__.py:239] Automatically detected platform cuda.
(run_inference_one_model pid=1276191) INFO 04-13 19:08:16 [config.py:600] This model supports multiple tasks: {'embed', 'score', 'generate', 'classify', 'reward'}. Defaulting to 'generate'.
(pid=1276191) INFO 04-13 19:08:03 [__init__.py:239] Automatically detected platform cuda.
(run_inference_one_model pid=1276191) INFO 04-13 19:08:16 [config.py:1780] Chunked prefill is enabled with max_num_batched_tokens=32768.
(run_inference_one_model pid=1276189) INFO 04-13 19:08:16 [config.py:600] This model supports multiple tasks: {'reward', 'generate', 'embed', 'classify', 'score'}. Defaulting to 'generate'.
(run_inference_one_model pid=1276191) INFO 04-13 19:08:18 [core.py:61] Initializing a V1 LLM engine (v0.8.3) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=main, override_neuron_config=None, tokenizer_revision=main, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=1234, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
(run_inference_one_model pid=1276191) INFO 04-13 19:08:18 [ray_utils.py:288] Ray is already initialized. Skipping Ray initialization.
(run_inference_one_model pid=1276191) INFO 04-13 19:08:18 [ray_utils.py:335] No current placement group found. Creating a new placement group.
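For anyone triaging this: dumping the stack of one of the stuck workers should confirm whether it is blocked inside placement-group creation. Using py-spy is my suggestion, not something lighteval ships; the PID comes from the worker lines above:

pip install py-spy
py-spy dump --pid 1276191  # stack trace of the stuck run_inference_one_model worker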

Expected behavior

No hanging: the evaluation should proceed past Ray placement-group creation and run to completion, as it does with a single GPU.
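A possible workaround to try (untested assumption on my part; VLLM_USE_V1=0 is a real vLLM 0.8.x environment variable that falls back to the V0 engine, which may not hit this placement-group path):

VLLM_USE_V1=0 lighteval vllm "<same model args as above>" "lighteval|$TASK|0|0" --use-chat-template --output-dir $OUTPUT_DIR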

Version info

Running lighteval@cc95ff274718186f587500556d7001645a273ce8. Additional info:

- `transformers` version: 4.51.2
- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.11
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.5.3
- Accelerate version: 1.6.0
- Accelerate config:    not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: yes (data_parallel_size=2 via Ray)
- Using GPU in script?: yes (2x NVIDIA H100)
- GPU type: NVIDIA H100 80GB HBM3

Labels: bug