[Bug]: v0.7.4 dev version CPU usage remains at 100% even when no requests are being processed. #14799

Closed
@AndrewTsao

Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

🐛 Describe the bug

Start the vLLM service using the following command.

VLLM_ATTENTION_BACKEND=FLASHMLA \
VLLM_USE_V1=1 \
OMP_NUM_THREADS=12 \
  /opt/vllm-0.7.4-dev/bin/vllm serve DeepSeek-R1 \
      --max-model-len 131072 \
      --max-num-batched-tokens 8192 \
      --enable-reasoning --reasoning-parser deepseek_r1 \
      --api_key ${VLLM_API_KEY} \
      --tensor-parallel-size 8 \
      --trust-remote-code \
      --disable-log-requests \
      --enable-prefix-caching \
      --enable-chunked-prefill \
      --gpu_memory_utilization=0.95 \
      -O3

Installed packages:

Package                           Version
--------------------------------- ----------------------
aiohappyeyeballs                  2.6.1
aiohttp                           3.11.13
aiosignal                         1.3.2
airportsdata                      20250224
annotated-types                   0.7.0
anyio                             4.8.0
astor                             0.8.1
attrs                             25.3.0
blake3                            1.0.4
certifi                           2025.1.31
charset-normalizer                3.4.1
click                             8.1.8
cloudpickle                       3.1.1
compressed-tensors                0.9.2
cupy-cuda12x                      13.4.0
depyf                             0.18.0
dill                              0.3.9
diskcache                         5.6.3
distro                            1.9.0
dnspython                         2.7.0
einops                            0.8.1
email_validator                   2.2.0
fastapi                           0.115.11
fastapi-cli                       0.0.7
fastrlock                         0.8.3
filelock                          3.17.0
flashinfer-python                 0.2.2+cu124torch2.5
frozenlist                        1.5.0
fsspec                            2025.3.0
gguf                              0.10.0
h11                               0.14.0
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.29.3
idna                              3.10
importlib_metadata                8.6.1
interegular                       0.3.3
Jinja2                            3.1.6
jiter                             0.9.0
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
lark                              1.2.2
llvmlite                          0.43.0
lm-format-enforcer                0.10.11
markdown-it-py                    3.0.0
MarkupSafe                        3.0.2
mdurl                             0.1.2
mistral_common                    1.5.3
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.1.0
nest-asyncio                      1.6.0
networkx                          3.4.2
numba                             0.60.0
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
openai                            1.66.3
opencv-python-headless            4.11.0.86
outlines                          0.1.11
outlines_core                     0.1.26
packaging                         24.2
partial-json-parser               0.2.1.1.post5
pillow                            11.1.0
pip                               25.0.1
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.0.2
propcache                         0.3.0
protobuf                          6.30.1
psutil                            7.0.0
py-cpuinfo                        9.0.0
pycountry                         24.6.1
pydantic                          2.11.0b1
pydantic_core                     2.31.1
Pygments                          2.19.1
python-dotenv                     1.0.1
python-json-logger                3.3.0
python-multipart                  0.0.20
PyYAML                            6.0.2
pyzmq                             26.3.0
ray                               2.43.0
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.3
rich                              13.9.4
rich-toolkit                      0.13.2
rpds-py                           0.23.1
safetensors                       0.5.3
scipy                             1.15.2
sentencepiece                     0.2.0
setuptools                        75.8.2
shellingham                       1.5.4
six                               1.17.0
sniffio                           1.3.1
starlette                         0.46.1
sympy                             1.13.1
tiktoken                          0.9.0
tokenizers                        0.21.1
torch                             2.5.1+cu124
torchaudio                        2.5.1+cu124
torchvision                       0.20.1+cu124
tqdm                              4.67.1
transformers                      4.49.0
triton                            3.1.0
typer                             0.15.2
typing_extensions                 4.12.2
typing-inspection                 0.4.0
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
vllm                              0.7.4.dev418+gd47807ba
watchfiles                        1.0.4
websockets                        15.0.1
wheel                             0.45.1
xformers                          0.0.28.post3
xgrammar                          0.1.15
yarl                              1.18.3
zipp                              3.21.0

Backtrace of a spawned process that keeps spinning while idle (captured with gdb):

#0  0x00007f11a6f5fc9b in sched_yield () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x000056525e213103 in os_sched_yield_impl (module=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/posixmodule.c:7198
#2  os_sched_yield (module=<optimized out>, _unused_ignored=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/clinic/posixmodule.c.h:3017
#3  0x000056525e2483fe in cfunction_vectorcall_NOARGS (func=0x7f11a6daeca0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Include/cpython/methodobject.h:52
#4  0x000056525e25d5ec in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7f11a6daeca0, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#5  PyObject_Vectorcall (callable=0x7f11a6daeca0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:299
#6  0x000056525e250d19 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f0ca5ce7e70, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:4769
#7  0x000056525e2dfeb7 in _PyEval_EvalFrame (throwflag=0, frame=0x7f0ca5ce7e70, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#8  gen_send_ex2 (closing=0, exc=0, presult=<synthetic pointer>, arg=0x0, gen=0x7f0ca5ce7e20) at /usr/local/src/conda/python-3.11.11/Objects/genobject.c:219
#9  gen_iternext (gen=0x7f0ca5ce7e20) at /usr/local/src/conda/python-3.11.11/Objects/genobject.c:594
#10 builtin_next (self=<optimized out>, args=0x7f11a6c97568, nargs=1) at /usr/local/src/conda/python-3.11.11/Python/bltinmodule.c:1477
#11 0x000056525e251b37 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97508, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:5050
#12 0x000056525e294602 in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97508, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#13 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7ffdb180bfe0, locals=0x0, func=0x7f11a6b16fc0, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#14 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7ffdb180bfe0, func=0x7f11a6b16fc0) at /usr/local/src/conda/python-3.11.11/Objects/call.c:393
#15 _PyObject_VectorcallTstate (tstate=0x56525e5dd9f8 <_PyRuntime+166328>, callable=0x7f11a6b16fc0, args=0x7ffdb180bfe0, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#16 0x000056525e2551fc in method_vectorcall (kwnames=0x0, nargsf=0, args=0x0, method=0x7f0c95006a80) at /usr/local/src/conda/python-3.11.11/Objects/classobject.c:67
#17 _PyObject_VectorcallTstate (args=0x0, nargsf=0, kwnames=0x0, callable=0x7f0c95006a80, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:92
#18 _PyObject_CallNoArgs (func=0x7f0c95006a80) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_call.h:107
#19 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97320, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:4422
#20 0x000056525e27577f in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97320, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#21 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7f11a6c49138, locals=0x0, func=0x7f1075a0b560, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#22 _PyFunction_Vectorcall (func=0x7f1075a0b560, stack=0x7f11a6c49138, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:393
#23 0x000056525e27f574 in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7f1075a0b560, func=0x56525e275600 <_PyFunction_Vectorcall>, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:257
#24 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7f1075a0b560, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:328
#25 PyObject_Call (callable=0x7f1075a0b560, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.11/Objects/call.c:355
#26 0x000056525e25500a in do_call_core (use_tracing=<optimized out>, kwdict=0x7f11a6c33b80, callargs=0x56525e5c3658 <_PyRuntime+58904>, func=0x7f1075a0b560, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:7349
#27 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7f11a6c97020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:5376
#28 0x000056525e307c5d in _PyEval_EvalFrame (throwflag=0, frame=0x7f11a6c97020, tstate=0x56525e5dd9f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.11/Include/internal/pycore_ceval.h:73
#29 _PyEval_Vector (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, func=func@entry=0x7f11a6dcdf80, locals=locals@entry=0x7f11a6df6b40, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:6434
#30 0x000056525e30739f in PyEval_EvalCode (co=co@entry=0x7f11a6d80e40, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40) at /usr/local/src/conda/python-3.11.11/Python/ceval.c:1148
#31 0x000056525e32530a in run_eval_code_obj (tstate=tstate@entry=0x56525e5dd9f8 <_PyRuntime+166328>, co=co@entry=0x7f11a6d80e40, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1741
#32 0x000056525e320f93 in run_mod (mod=mod@entry=0x565260144508, filename=filename@entry=0x56525e5ba550 <_PyRuntime+21776>, globals=globals@entry=0x7f11a6df6b40, locals=locals@entry=0x7f11a6df6b40, flags=flags@entry=0x7ffdb180c488, arena=arena@entry=0x7f11a6d1b650) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1762
#33 0x000056525e3159a2 in PyRun_StringFlags (str=<optimized out>, start=<optimized out>, globals=0x7f11a6df6b40, locals=0x7f11a6df6b40, flags=0x7ffdb180c488) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:1632
#34 0x000056525e31575c in PyRun_SimpleStringFlags (command=0x7f11a6d962d0 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=53, pipe_handle=22)\n", flags=flags@entry=0x7ffdb180c488) at /usr/local/src/conda/python-3.11.11/Python/pythonrun.c:487
#35 0x000056525e3300ec in pymain_run_command (command=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/main.c:255
#36 pymain_run_python (exitcode=0x7ffdb180c480) at /usr/local/src/conda/python-3.11.11/Modules/main.c:596
#37 Py_RunMain () at /usr/local/src/conda/python-3.11.11/Modules/main.c:684
#38 0x000056525e2f7617 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.11/Modules/main.c:738
#39 0x00007f11a6e80d90 in ?? () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#40 0x00007f11a6e80e40 in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#41 0x000056525e2f74ca in _start ()
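
Frames #0 to #10 show `os.sched_yield()` being called from inside a generator that is advanced with `next()`, which is the usual shape of a polling loop. The following is a minimal sketch, not vLLM's actual code, of why such a loop can keep a core at 100% even when there is nothing to do. The function names `wait_spinning` and `wait_blocking` are hypothetical and exist only for this illustration.

```python
import os
import threading
import time


def wait_spinning(ready: threading.Event, timeout: float) -> float:
    """Poll `ready` in a tight loop; return CPU seconds this thread used."""
    cpu_start = time.thread_time()
    deadline = time.monotonic() + timeout
    while not ready.is_set() and time.monotonic() < deadline:
        # sched_yield() only gives up the core if another thread is runnable.
        # On an idle machine it returns at once, so the loop never sleeps.
        os.sched_yield()
    return time.thread_time() - cpu_start


def wait_blocking(ready: threading.Event, timeout: float) -> float:
    """Sleep in the kernel until `ready` is set; return CPU seconds used."""
    cpu_start = time.thread_time()
    ready.wait(timeout)
    return time.thread_time() - cpu_start


if __name__ == "__main__":
    never_set = threading.Event()  # stands in for "no request has arrived"
    print(f"spinning: {wait_spinning(never_set, 0.5):.3f} CPU s")
    print(f"blocking: {wait_blocking(never_set, 0.5):.3f} CPU s")
```

On an otherwise idle Linux machine the spinning variant would be expected to burn CPU time close to the wall-clock wait, while the blocking variant stays near zero. A blocking wait (or a sleep with backoff inside the loop) trades a little wake-up latency for an idle core.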
