### Your current environment
vLLM API server version 0.7.2
### 🐛 Describe the bug
I created a cluster with two instances, each with 1 GPU.
- Head node:

```shell
RAY_num_heartbeats_timeout=600 ray start --head --node-ip-address HEAD-IP \
    --port 6379 \
    --ray-client-server-port 10001 \
    --object-manager-port=8076 \
    --node-manager-port=8077
```
```text
--------------------
Ray runtime started.
--------------------

Next steps
  To add another node to this Ray cluster, run
    ray start --address='HEAD-IP:6379'
```
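Before adding the worker, it can help to confirm the head registered its GPU; a quick check, assuming the `ray` CLI is on PATH:

```shell
# Expect 1 node and 1.0 GPU at this point; 2 nodes / 2.0 GPUs after the worker joins.
ray status
```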
- Worker node:

```shell
ray start --object-manager-port=8076 \
    --address='HEAD-IP:6379' \
    --node-manager-port=8077
```
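Once the worker joins, both GPUs should be visible cluster-wide; a one-liner to verify (`ray.cluster_resources()` also lists a `node:<ip>` affinity resource for each node):

```shell
# Prints totals such as {'GPU': 2.0, ..., 'node:HEAD-IP': 1.0, 'node:WORKER-IP': 1.0}
python -c "import ray; ray.init(address='auto'); print(ray.cluster_resources())"
```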
I then serve the model with two options: first with `--tensor-parallel-size 1 --pipeline-parallel-size 2`, and a second time with `--tensor-parallel-size 2`. With both I get the following error:
Error:

```text
2025-02-19 07:41:17,494 INFO worker.py:1654 -- Connecting to existing Ray cluster at address: HEAD-IP:6379...
2025-02-19 07:41:17,507 INFO worker.py:1832 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
(autoscaler +18s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +18s) Error: No available node types can fulfill resource request {'node:HEAD-IP:6379': 0.001, 'GPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
```
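Note the requested resource key `node:HEAD-IP:6379`: Ray names each node's affinity resource `node:<ip>` with no port suffix, so a request for `node:HEAD-IP:6379` can never match any node. That key mirrors the `VLLM_HOST_IP=HEAD-IP:6379` value in the commands below, which suggests the port suffix may be the mismatch. The keys Ray actually registered can be compared with a short check (a sketch, assuming the cluster is still running):

```shell
# Lists the node affinity resources Ray actually registered; expected to be
# node:HEAD-IP and node:WORKER-IP, with no port suffix.
python -c "import ray; ray.init(address='auto'); print([k for k in ray.cluster_resources() if k.startswith('node:')])"
```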
- Commands:

```shell
VLLM_HOST_IP=HEAD-IP:6379 vllm serve NousResearch/Meta-Llama-3.1-8B-Instruct --max-model-len 8192 --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 1 --pipeline-parallel-size 2 --distributed-executor-backend ray

VLLM_HOST_IP=HEAD-IP:6379 vllm serve NousResearch/Meta-Llama-3.1-8B-Instruct --max-model-len 8192 --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 2 --distributed-executor-backend ray
```
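If the port suffix is indeed the problem, passing the bare IP should let the `node:` request match; this is an untested sketch, not a confirmed fix:

```shell
# Assumption: VLLM_HOST_IP expects a plain IP address, not IP:port.
VLLM_HOST_IP=HEAD-IP vllm serve NousResearch/Meta-Llama-3.1-8B-Instruct --max-model-len 8192 --gpu-memory-utilization 0.8 \
    --tensor-parallel-size 1 --pipeline-parallel-size 2 --distributed-executor-backend ray
```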
### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.