Closed
Description
Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
vllm serve --model="deepseek-ai/DeepSeek-R1" --max-num-seqs 512 --data-parallel-size 8 --enable-expert-parallel --gpu-memory-utilization 0.9 --port 9256
vllm bench serve --model deepseek-ai/DeepSeek-R1 --dataset-name random --host 127.0.0.1 --port 9256 --random-input-len 130000 --random-output-len 1 --request-rate inf --num-prompts 1
This fails with the following error:
(EngineCore_DP7 pid=3747941)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP7 pid=3747941)     self.run()
(EngineCore_DP7 pid=3747941)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP7 pid=3747941)     self._target(*self._args, **self._kwargs)
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP7 pid=3747941)     raise e
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP7 pid=3747941)     engine_core.run_busy_loop()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 1056, in run_busy_loop
(EngineCore_DP7 pid=3747941)     self.execute_dummy_batch()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 387, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.model_executor.execute_dummy_batch()
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/executor/abstract.py", line 109, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.collective_rpc("execute_dummy_batch")
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP7 pid=3747941)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP7 pid=3747941)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/utils/__init__.py", line 3010, in run_method
(EngineCore_DP7 pid=3747941)     return func(*args, **kwargs)
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_worker.py", line 490, in execute_dummy_batch
(EngineCore_DP7 pid=3747941)     self.model_runner._dummy_run(1, uniform_decode=True)
(EngineCore_DP7 pid=3747941)   File "/home/wentao/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_DP7 pid=3747941)     return func(*args, **kwargs)
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_model_runner.py", line 2918, in _dummy_run
(EngineCore_DP7 pid=3747941)     assert num_reqs <= max_num_reqs, \
(EngineCore_DP7 pid=3747941)            ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP7 pid=3747941) AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
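For readers following the traceback, the failing check can be illustrated in isolation. The sketch below is not vLLM code: `check_uniform_batch`, `tokens_per_req`, and the way `num_reqs` is derived from a token count are all assumptions made for illustration. Only the assertion condition and its message are taken from the traceback, and treating `--max-num-seqs 512` as the request cap is likewise an assumption. The traceback shows `_dummy_run(1, uniform_decode=True)` being called, so it does not reveal why the request count exceeded the cap; the sketch only demonstrates the condition that trips.

```python
import math


def check_uniform_batch(num_tokens: int, tokens_per_req: int, max_num_reqs: int) -> int:
    """Hypothetical stand-in for the check in the traceback; not vLLM code."""
    # Assumption: in a uniform batch every request carries the same token count,
    # so the request count follows from the total token count.
    num_reqs = math.ceil(num_tokens / tokens_per_req)
    # Condition and message copied from the traceback.
    assert num_reqs <= max_num_reqs, \
        "Do not capture num_reqs > max_num_reqs for uniform batch"
    return num_reqs


# Assumption: --max-num-seqs 512 from the serve command acts as the request cap.
MAX_NUM_REQS = 512

# A single-token dummy batch passes the check.
ok = check_uniform_batch(num_tokens=1, tokens_per_req=1, max_num_reqs=MAX_NUM_REQS)

# Any token count that implies more requests than the cap trips the assertion.
try:
    check_uniform_batch(num_tokens=513, tokens_per_req=1, max_num_reqs=MAX_NUM_REQS)
    failed = False
    message = ""
except AssertionError as exc:
    failed = True
    message = str(exc)
```

If this reading is right, the value that reaches the assertion in the failing run must be larger than the `1` passed to `_dummy_run`, which would point at whatever adjusts the dummy batch size before the check. That is a guess from the traceback alone.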
Labels
bug