Description
Describe the bug
When running DeepSeek-R1-Distill-Qwen-32B on 4 B60 GPUs with 20K input / 12K output tokens (20480 / 12288), the server hangs, even at concurrency 1.
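For reference, here is a rough per-request token and KV-cache budget for the 20K/12K case. It is a back-of-the-envelope sketch, not a diagnosis. It assumes the public Qwen2.5-32B dimensions that DeepSeek-R1-Distill-Qwen-32B is derived from (64 layers, 8 KV heads, head dim 128) and an unquantized float16 KV cache split evenly across the 4 tensor-parallel ranks. It ignores weights and activations, and `request_budget` is a hypothetical helper.

```python
import math

# Assumed model dimensions (from the Qwen2.5-32B config); not read from the checkpoint.
NUM_LAYERS = 64       # num_hidden_layers
NUM_KV_HEADS = 8      # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128        # hidden_size / num_attention_heads = 5120 / 40
BYTES_PER_ELEM = 2    # float16 KV cache (--dtype float16), assumed not quantized


def kv_bytes_per_token() -> int:
    # One K and one V vector per layer and per KV head.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM


def request_budget(input_len: int, output_len: int, max_model_len: int,
                   block_size: int, tp_size: int) -> dict:
    total = input_len + output_len
    kv_gib = total * kv_bytes_per_token() / 2**30
    return {
        "total_tokens": total,
        "fits_max_model_len": total <= max_model_len,
        "blocks": math.ceil(total / block_size),   # KV blocks of --block-size tokens
        "kv_gib_total": kv_gib,
        "kv_gib_per_gpu": kv_gib / tp_size,        # KV heads sharded across TP ranks
    }


print(request_budget(20480, 12288, 40000, 64, 4))
# {'total_tokens': 32768, 'fits_max_model_len': True, 'blocks': 512, 'kv_gib_total': 8.0, 'kv_gib_per_gpu': 2.0}
```

Under these assumptions, one 32768-token request stays within `--max-model-len 40000` and needs about 8 GiB of KV cache in total, or about 2 GiB per GPU.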
How to reproduce
Steps to reproduce the error:
- Start the server:
MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-40000}
MAX_MODEL_LEN=${MAX_MODEL_LEN:-40000}
export VLLM_USE_V1=1
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $SERVED_MODEL_NAME \
  --port $PORT \
  --model $MODEL_PATH \
  --trust-remote-code \
  --block-size 64 \
  --gpu-memory-utilization 0.95 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit $LOAD_IN_LOW_BIT \
  --max-model-len $MAX_MODEL_LEN \
  --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \
  --max-num-seqs $MAX_NUM_SEQS \
  --tensor-parallel-size $TENSOR_PARALLEL_SIZE \
  --distributed-executor-backend ray
- Run the client:
input_length=20480
output_length=12288
for bsize in 1 2 4 8 10; do
  echo "benchmark serving bs${bsize}"
  python /llm/vllm/benchmarks/benchmark_serving.py \
    --model ${modelname} \
    --served-model-name ${servedname} \
    --dataset-name random \
    --trust_remote_code \
    --ignore-eos \
    --num_prompt $bsize \
    --random-input-len=$input_length \
    --random-output-len=$output_length \
    --port 8000
done
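To check whether a single request hangs independently of benchmark_serving.py, a minimal probe like the one below could be used. It is a sketch that assumes the ipex-llm server exposes vLLM's OpenAI-compatible `POST /v1/completions` endpoint on the configured port and accepts vLLM's extra `ignore_eos` field. `build_body` and `probe` are hypothetical helpers, and the filler prompt only approximates the 20480-token random input.

```python
import json
import urllib.error
import urllib.request


def build_body(model: str, approx_input_tokens: int, max_tokens: int) -> bytes:
    # Filler prompt: one short word per intended token (the token count is approximate).
    prompt = "hello " * approx_input_tokens
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "ignore_eos": True,  # assumption: vLLM extra sampling param, mirrors --ignore-eos
    }
    return json.dumps(payload).encode("utf-8")


def probe(base_url: str, body: bytes, timeout_s: float) -> tuple[str, str]:
    # Send one non-streaming completion request (concurrency 1) and classify the outcome.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return ("ok", str(resp.status))
    except urllib.error.URLError as e:
        # Connect-phase timeouts arrive wrapped in URLError; HTTP errors are URLError too.
        if isinstance(e.reason, TimeoutError):
            return ("timeout", str(e.reason))
        return ("error", str(e.reason))
    except TimeoutError as e:
        # Read-phase timeout: connected, but no response bytes within timeout_s.
        return ("timeout", str(e))
    except OSError as e:
        return ("error", str(e))
```

For example: `probe("http://localhost:8000", build_body("<served-model-name>", 20480, 12288), timeout_s=3600.0)`. The `urlopen` timeout applies to each blocking socket operation, not to the request as a whole. For a non-streaming 12K-token generation it must therefore be set well above the expected generation time. A `("timeout", ...)` result only means that no response arrived within that window.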
Environment information
root@w05:/home/intel/ipex-llm/python/llm/scripts# bash env-check.sh
PYTHON_VERSION=3.11.13
[W617 13:56:29.995243467 OperatorEntry.cpp:154] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:30477
new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:468 (function operator())
transformers=4.52.4
torch=2.6.0+xpu
ipex-llm Version: 2.3.0b20250610
ipex=2.6.10+xpu
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) w7-3565X
BIOS Model name: Intel(R) Xeon(R) w7-3565X CPU @ 2.5GHz
BIOS CPU family: 179
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 8
Total CPU Memory: 247.097 GB
Operating System:
Ubuntu 24.04.1 LTS
Linux w05 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr 6 15:05:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
env-check.sh: line 148: xpu-smi: command not found
env-check.sh: line 154: clinfo: command not found
Driver related package version:
igpu not detected
xpu-smi is not installed. Please install xpu-smi according to README.md
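Because `xpu-smi` and `clinfo` were not found, the GPU section of the report above is empty. Before re-running env-check.sh, you can check which of these tools are on `PATH` with a small helper like the one below. `missing_tools` is hypothetical. The tool names are the two commands the script reported as missing.

```python
import shutil

# The two commands env-check.sh reported as "command not found".
TOOLS = ("xpu-smi", "clinfo")


def missing_tools(tools=TOOLS) -> list[str]:
    # shutil.which returns None when the executable cannot be found on PATH.
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```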