
[Bug]: Multi-Node Tensor-Parallel broken by #11256, which forces TP <= cuda_device_count per node #12132

Closed
@drikster80

Description


Your current environment

Running in a Docker container (Kubernetes) across 4 GH200 nodes, 1 GPU per node.
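
For context, vLLM's multi-node tensor parallelism runs over a Ray cluster. A rough sketch of the equivalent hand-rolled setup (Kubernetes manages this here; the IP is a placeholder):

    # On the head node:
    ray start --head --port=6379
    # On each of the other 3 nodes, join the cluster:
    ray start --address=<head-node-ip>:6379

Once all nodes have joined, the api_server below is launched on the head node with a TP size spanning all 4 GPUs.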

Model Input Dumps

    python3 -m vllm.entrypoints.openai.api_server --model /models/my-model \
        --tensor-parallel-size 4 \
        --gpu-memory-utilization 0.95 \
        --served-model-name my-model \
        --trust-remote-code \
        --api-key "NONE" \
        --rope-scaling '{"rope_type":"dynamic","factor":4.0}' \
        --enable-prefix-caching \
        --max-model-len 131072

🐛 Describe the bug

@youkaichao, it looks like #11256 forces --tensor-parallel-size to be <= the per-node GPU count, which rules out multi-node tensor parallelism:

https://github.com/vllm-project/vllm/blob/main/vllm/platforms/cuda.py#L156

        # Use confusing message for more common TP-only case.
        assert tensor_parallel_size <= cuda_device_count, (
            f"please set tensor_parallel_size ({tensor_parallel_size}) "
            f"to less than max local gpu count ({cuda_device_count})")

Testing current main with 4 nodes (1 GPU per node) produces the traceback below; the same model, code, and invocation work perfectly in v0.6.4.post1:

Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 382, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 371, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 115, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1244, in create_engine_config
    config = VllmConfig(
             ^^^^^^^^^^^
  File "<string>", line 19, in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 3204, in __post_init__
    current_platform.check_and_update_config(self)
  File "/usr/local/lib/python3.12/dist-packages/vllm/platforms/cuda.py", line 156, in check_and_update_config
    assert tensor_parallel_size <= cuda_device_count, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: please set tensor_parallel_size (4) to less than max local gpu count (1)
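
For reference, both sides of the failing assertion can be inspected on any single node; this is just a diagnostic sketch using the cuda_device_count_stateless helper that cuda.py calls, not code from the report:

    # Diagnostic sketch: show what the assertion in cuda.py sees on one node.
    import torch
    from vllm.utils import cuda_device_count_stateless

    tensor_parallel_size = 4  # value passed via --tensor-parallel-size
    print("torch.cuda.device_count():", torch.cuda.device_count())
    print("cuda_device_count_stateless():", cuda_device_count_stateless())
    # On a 1-GPU GH200 node both report 1, so 4 <= 1 fails as shown above.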

