[Bug]: AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch #25494

@yewentao256

Your current environment

The output of python collect_env.py
Your output of `python collect_env.py` here

🐛 Describe the bug

vllm serve --model="deepseek-ai/DeepSeek-R1" --max-num-seqs 512 --data-parallel-size 8 --enable-expert-parallel --gpu-memory-utilization 0.9 --port 9256

vllm bench serve --model deepseek-ai/DeepSeek-R1 --dataset-name random --host 127.0.0.1 --port 9256 --random-input-len 130000 --random-output-len 1 --request-rate inf --num-prompts 1
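
For repeated attempts, the two commands above can be driven from one script. This is a minimal sketch, not part of the original report: the helper names (`build_commands`, `wait_for_port`, `run_repro`), the one-hour startup timeout, and the use of an open TCP port as a rough readiness signal are all assumptions. The command lines themselves are copied verbatim from the report.

```python
import shlex
import socket
import subprocess
import time

# Commands copied verbatim from the report.
SERVE_CMD = (
    'vllm serve --model="deepseek-ai/DeepSeek-R1" --max-num-seqs 512 '
    "--data-parallel-size 8 --enable-expert-parallel "
    "--gpu-memory-utilization 0.9 --port 9256"
)
BENCH_CMD = (
    "vllm bench serve --model deepseek-ai/DeepSeek-R1 --dataset-name random "
    "--host 127.0.0.1 --port 9256 --random-input-len 130000 "
    "--random-output-len 1 --request-rate inf --num-prompts 1"
)


def build_commands():
    """Split both command lines into argv lists suitable for subprocess."""
    return shlex.split(SERVE_CMD), shlex.split(BENCH_CMD)


def wait_for_port(host, port, timeout_s):
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False


def run_repro(startup_timeout_s=3600):
    """Start the server, run the benchmark once, then stop the server.

    ASSUMPTION: an open port is treated as "ready"; the timeout is a guess.
    """
    serve_argv, bench_argv = build_commands()
    server = subprocess.Popen(serve_argv)
    try:
        if not wait_for_port("127.0.0.1", 9256, startup_timeout_s):
            raise RuntimeError("server did not open port 9256 in time")
        return subprocess.run(bench_argv).returncode
    finally:
        server.terminate()
        server.wait()
```

Calling `run_repro()` only makes sense on a machine that can actually host the 8-way data-parallel deployment; nothing is launched at import time.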

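Running the benchmark against this server crashes the engine core (here on data-parallel rank 7) with the following error: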

(EngineCore_DP7 pid=3747941)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP7 pid=3747941)     self.run()
(EngineCore_DP7 pid=3747941)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP7 pid=3747941)     self._target(*self._args, **self._kwargs)
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP7 pid=3747941)     raise e
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP7 pid=3747941)     engine_core.run_busy_loop()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 1056, in run_busy_loop
(EngineCore_DP7 pid=3747941)     self.execute_dummy_batch()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 387, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.model_executor.execute_dummy_batch()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/executor/abstract.py", line 109, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.collective_rpc("execute_dummy_batch")
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP7 pid=3747941)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP7 pid=3747941)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/utils/__init__.py", line 3010, in run_method
(EngineCore_DP7 pid=3747941)     return func(*args, **kwargs)
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_worker.py", line 490, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.model_runner._dummy_run(1, uniform_decode=True)
(EngineCore_DP7 pid=3747941)   File "/home/wentao/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_DP7 pid=3747941)     return func(*args, **kwargs)
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_model_runner.py", line 2918, in _dummy_run
(EngineCore_DP7 pid=3747941)     assert num_reqs <= max_num_reqs, \
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941) AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch
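
The traceback shows the assertion firing inside `_dummy_run(1, uniform_decode=True)`, even though a one-token dummy batch is far below `--max-num-seqs 512`. The report does not identify the cause. One unconfirmed hypothesis is that the idle rank's token count gets padded up to match a busy data-parallel rank (which here may be processing the 130000-token prompt) before the request count is derived. The toy model below only illustrates that arithmetic; `cdiv` and `uniform_batch_num_reqs` are hypothetical helpers, not vLLM code.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def uniform_batch_num_reqs(num_tokens: int, query_len: int, max_num_reqs: int) -> int:
    """Derive the request count for a uniform decode batch and check the limit.

    Hypothetical helper: it reuses only the assertion message from the traceback.
    """
    num_reqs = cdiv(num_tokens, query_len)
    assert num_reqs <= max_num_reqs, (
        "Do not capture num_reqs > max_num_reqs for uniform batch"
    )
    return num_reqs


MAX_NUM_SEQS = 512      # from --max-num-seqs 512
PROMPT_TOKENS = 130000  # from --random-input-len 130000

# Unpadded dummy batch: one token, one request, well within the limit.
unpadded = uniform_batch_num_reqs(1, query_len=1, max_num_reqs=MAX_NUM_SEQS)

# ASSUMPTION: the idle rank's token count is padded up to the busy rank's.
try:
    uniform_batch_num_reqs(PROMPT_TOKENS, query_len=1, max_num_reqs=MAX_NUM_SEQS)
    padded_failed = False
except AssertionError:
    padded_failed = True
```

Under that assumption the padded case trips the same assertion, which would be consistent with the failure appearing only with data parallelism and a very long prompt.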

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels

bug (Something isn't working)
