Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Start the vLLM service using the following command.
VLLM_ATTENTION_BACKEND=FLASHMLA \
VLLM_USE_V1=1 \
OMP_NUM_THREADS=12 \
/opt/vllm-0.7.4-dev/bin/vllm serve DeepSeek-R1 \
  --max-model-len 131072 \
  --max-num-batched-tokens 8192 \
  --enable-reasoning --reasoning-parser deepseek_r1 \
  --api_key ${VLLM_API_KEY} \
  --tensor-parallel-size 8 \
  --trust-remote-code \
  --disable-log-requests \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --gpu_memory_utilization=0.95 \
  -O3
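The three environment variables must be present in the server process's environment. If they are written as standalone shell lines, with no `export` and no trailing `\`, they never reach `vllm serve`. When the server is launched from a Python harness instead of a shell, the same configuration can be passed as an explicit environment plus argument list. This is a minimal sketch, assuming the install path and flags exactly as in the command above and that `VLLM_API_KEY` is read from the caller's environment; `build_launch` and `SERVER_ENV` are hypothetical names, not part of vLLM.

```python
import os
import shlex
import subprocess

# Environment variables from the report, merged into the child's environment.
SERVER_ENV = {
    "VLLM_ATTENTION_BACKEND": "FLASHMLA",
    "VLLM_USE_V1": "1",
    "OMP_NUM_THREADS": "12",
}


def build_launch(api_key: str) -> tuple[dict[str, str], list[str]]:
    """Return (env, argv) equivalent to the shell command above (hypothetical helper)."""
    env = {**os.environ, **SERVER_ENV}
    argv = [
        "/opt/vllm-0.7.4-dev/bin/vllm", "serve", "DeepSeek-R1",
        "--max-model-len", "131072",
        "--max-num-batched-tokens", "8192",
        "--enable-reasoning", "--reasoning-parser", "deepseek_r1",
        "--api_key", api_key,
        "--tensor-parallel-size", "8",
        "--trust-remote-code",
        "--disable-log-requests",
        "--enable-prefix-caching",
        "--enable-chunked-prefill",
        "--gpu_memory_utilization=0.95",
        "-O3",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch(os.environ.get("VLLM_API_KEY", ""))
    # Print the equivalent shell command for comparison with the report.
    print(shlex.join(argv))
    # Uncomment to actually start the server (requires the install path above and 8 GPUs):
    # subprocess.run(argv, env=env, check=True)
```

Passing `env=` explicitly avoids relying on whether the variables were exported in the calling shell.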
Installed Python packages:
Package Version
--------------------------------- ----------------------
aiohappyeyeballs 2.6.1
aiohttp 3.11.13
aiosignal 1.3.2
airportsdata 20250224
annotated-types 0.7.0
anyio 4.8.0
astor 0.8.1
attrs 25.3.0
blake3 1.0.4
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
compressed-tensors 0.9.2
cupy-cuda12x 13.4.0
depyf 0.18.0
dill 0.3.9
diskcache 5.6.3
distro 1.9.0
dnspython 2.7.0
einops 0.8.1
email_validator 2.2.0
fastapi 0.115.11
fastapi-cli 0.0.7
fastrlock 0.8.3
filelock 3.17.0
flashinfer-python 0.2.2+cu124torch2.5
frozenlist 1.5.0
fsspec 2025.3.0
gguf 0.10.0
h11 0.14.0
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.29.3
idna 3.10
importlib_metadata 8.6.1
interegular 0.3.3
Jinja2 3.1.6
jiter 0.9.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
lark 1.2.2
llvmlite 0.43.0
lm-format-enforcer 0.10.11
markdown-it-py 3.0.0
MarkupSafe 3.0.2
mdurl 0.1.2
mistral_common 1.5.3
mpmath 1.3.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.1.0
nest-asyncio 1.6.0
networkx 3.4.2
numba 0.60.0
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
openai 1.66.3
opencv-python-headless 4.11.0.86
outlines 0.1.11
outlines_core 0.1.26
packaging 24.2
partial-json-parser 0.2.1.1.post5
pillow 11.1.0
pip 25.0.1
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.0.2
propcache 0.3.0
protobuf 6.30.1
psutil 7.0.0
py-cpuinfo 9.0.0
pycountry 24.6.1
pydantic 2.11.0b1
pydantic_core 2.31.1
Pygments 2.19.1
python-dotenv 1.0.1
python-json-logger 3.3.0
python-multipart 0.0.20
PyYAML 6.0.2
pyzmq 26.3.0
ray 2.43.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rich-toolkit 0.13.2
rpds-py 0.23.1
safetensors 0.5.3
scipy 1.15.2
sentencepiece 0.2.0
setuptools 75.8.2
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
starlette 0.46.1
sympy 1.13.1
tiktoken 0.9.0
tokenizers 0.21.1
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.67.1
transformers 4.49.0
triton 3.1.0
typer 0.15.2
typing_extensions 4.12.2
typing-inspection 0.4.0
urllib3 2.3.0
uvicorn 0.34.0
uvloop 0.21.0
vllm 0.7.4.dev418+gd47807ba
watchfiles 1.0.4
websockets 15.0.1
wheel 0.45.1
xformers 0.0.28.post3
xgrammar 0.1.15
yarl 1.18.3
zipp 3.21.0
GDB backtrace of a process started through `multiprocessing` spawn (`spawn_main`):
#0 0x00007f11a6f5fc9b in sched_yield () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#1 0x000056525e213103 in os_sched_yield_impl (module=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/posixmodule.c:7198
#2 os_sched_yield (module=<optimized out>, _unused_ignored=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/clinic/posixmodule.c.h:3017
#3 0x000056525e2483fe in cfunction_vectorcall_NOARGS (func=0x7f11a6daeca0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Include/cpython/methodobject.h:52
#4 0x000056525e25d5ec in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7f11a6daeca0, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#5 PyObject_Vectorcall (callable=0x7f11a6daeca0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:299
#6 0x000056525e250d19 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f0ca5ce7e70, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:4769
#7 0x000056525e2dfeb7 in _PyEval_EvalFrame (throwflag=0, frame=0x7f0ca5ce7e70, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#8 gen_send_ex2 (closing=0, exc=0, presult=<synthetic pointer>, arg=0x0, gen=0x7f0ca5ce7e20) at /usr/local/src/conda/python-3.11.11/Objects/genobject.c:219
#9 gen_iternext (gen=0x7f0ca5ce7e20) at /usr/local/src/conda/python-3.11.11/Objects/genobject.c:594
#10 builtin_next (self=<optimized out>, args=0x7f11a6c97568, nargs=1) at /usr/local/src/conda/python-3.11.11/Python/bltinmodule.c:1477
#11 0x000056525e251b37 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97508, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:5050
#12 0x000056525e294602 in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97508, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#13 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7ffdb180bfe0, locals=0x0, func=0x7f11a6b16fc0, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#14 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7ffdb180bfe0, func=0x7f11a6b16fc0) at /usr/local/src/conda/python-3.11.11/Objects/call.c:393
#15 _PyObject_VectorcallTstate (tstate=0x56525e5dd9f8 <_PyRuntime+166328>, callable=0x7f11a6b16fc0, args=0x7ffdb180bfe0, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#16 0x000056525e2551fc in method_vectorcall (kwnames=0x0, nargsf=0, args=0x0, method=0x7f0c95006a80) at /usr/local/src/conda/python-3.11.11/Objects/classobject.c:67
#17 _PyObject_VectorcallTstate (args=0x0, nargsf=0, kwnames=0x0, callable=0x7f0c95006a80, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#18 _PyObject_CallNoArgs (func=0x7f0c95006a80) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:107
#19 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97320, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:4422
#20 0x000056525e27577f in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97320, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#21 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7f11a6c49138, locals=0x0, func=0x7f1075a0b560, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#22 _PyFunction_Vectorcall (func=0x7f1075a0b560, stack=0x7f11a6c49138, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:393
#23 0x000056525e27f574 in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7f1075a0b560, func=0x56525e275600 <_PyFunction_Vectorcall>, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:257
#24 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7f1075a0b560, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:328
#25 PyObject_Call (callable=0x7f1075a0b560, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:355
#26 0x000056525e25500a in do_call_core (use_tracing=<optimized out>, kwdict=0x7f11a6c33b80, callargs=0x56525e5c3658 <_PyRuntime+58904>, func=0x7f1075a0b560, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:7349
#27 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:5376
#28 0x000056525e307c5d in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97020, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#29 _PyEval_Vector (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, func=func@entry=0x7f11a6dcdf80, locals=locals@entry=0x7f11a6df6b40, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#30 0x000056525e30739f in PyEval_EvalCode (co=co@entry=0x7f11a6d80e40, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:1148
#31 0x000056525e32530a in run_eval_code_obj (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, co=co@entry=0x7f11a6d80e40, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1741
#32 0x000056525e320f93 in run_mod (mod=mod@entry=0x565260144508, filename=filename@entry=0x56525e5ba550 <_PyRuntime+21776>, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40, flags=flags@entry=0x7ffdb180c488, arena=arena@entry=0x7f11a6d1b650) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1762
#33 0x000056525e3159a2 in PyRun_StringFlags (str=<optimized out>, start=<optimized out>, globals=0x7f11a6df6b40, locals=0x7f11a6df6b40, flags=0x7ffdb180c488) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1632
#34 0x000056525e31575c in PyRun_SimpleStringFlags (command=0x7f11a6d962d0 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=53, pipe_handle=22)\n", flags=flags@entry=0x7ffdb180c488) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:487
#35 0x000056525e3300ec in pymain_run_command (command=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/main.c:255
#36 pymain_run_python (exitcode=0x7ffdb180c480) at /usr/local/src/conda/python-3.11.11/Modules/main.c:596
#37 Py_RunMain () at /usr/local/src/conda/python-3.11.11/Modules/main.c:684
#38 0x000056525e2f7617 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/main.c:738
#39 0x00007f11a6e80d90 in ?? () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#40 0x00007f11a6e80e40 in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#41 0x000056525e2f74ca in _start ()
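Read bottom-up, the trace shows the spawned process calling a method with no arguments (#16-#18), which advances a generator through the built-in `next()` (#8-#10); the generator body then calls `os.sched_yield()` (#0-#2). This shape is consistent with entering a `@contextlib.contextmanager`, whose `__enter__` calls `next()` on the generator, where the generator spin-waits on some condition and yields the CPU between checks, rather than a thread blocked on a lock. The sketch below reproduces that shape in isolation. It is a hypothetical illustration and is not vLLM's code; `spin_until` and `ready` are illustrative names.

```python
import contextlib
import os
import threading
import time


@contextlib.contextmanager
def spin_until(ready: threading.Event, timeout: float = 5.0):
    """Hypothetical spin-wait: poll `ready`, yielding the CPU between checks.

    Entering this context manager runs the generator body via next(), so a
    native backtrace taken while it waits shows sched_yield below builtin_next.
    """
    deadline = time.monotonic() + timeout
    while not ready.is_set():
        if time.monotonic() > deadline:
            raise TimeoutError("condition was never signalled")
        os.sched_yield()  # give up the CPU but stay runnable (busy-wait)
    yield


def demo() -> bool:
    ready = threading.Event()
    # Signal the condition from another thread after a short delay.
    threading.Timer(0.1, ready.set).start()
    with spin_until(ready):
        return True
```

Unlike `ready.wait()`, which blocks in the kernel, this loop keeps the process runnable, so a process stuck in this state typically shows high CPU usage even though it makes no progress.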