qwen32B hung when running 20K/12K w/ 4 GPU #13224

Open
@aitss2017

Describe the bug
When running DeepSeek-R1-Distill-Qwen-32B on 4 B60 GPUs with 20K input / 12K output tokens, the server hangs, even at concurrency 1.

How to reproduce
Steps to reproduce the error:

  1. Start the server:
    MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-40000}
    MAX_MODEL_LEN=${MAX_MODEL_LEN:-40000}
    export VLLM_USE_V1=1
    python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
      --served-model-name $SERVED_MODEL_NAME \
      --port $PORT \
      --model $MODEL_PATH \
      --trust-remote-code \
      --block-size 64 \
      --gpu-memory-utilization 0.95 \
      --device xpu \
      --dtype float16 \
      --enforce-eager \
      --load-in-low-bit $LOAD_IN_LOW_BIT \
      --max-model-len $MAX_MODEL_LEN \
      --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \
      --max-num-seqs $MAX_NUM_SEQS \
      --tensor-parallel-size $TENSOR_PARALLEL_SIZE \
      --distributed-executor-backend ray

  2. Run the client benchmark:
    input_length=20480
    output_length=12288

    for bsize in 1 2 4 8 10; do
      echo "benchmark serving bs${bsize}"
      python /llm/vllm/benchmarks/benchmark_serving.py \
        --model ${modelname} \
        --served-model-name ${servedname} \
        --dataset-name random \
        --trust_remote_code \
        --ignore-eos \
        --num_prompt $bsize \
        --random-input-len=$input_length \
        --random-output-len=$output_length \
        --port 8000
    done
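
As a sanity check on the settings above, the sketch below confirms that one request of 20480 input plus 12288 output tokens fits within the 40000-token `--max-model-len`, so the hang is probably not a simple context-length rejection. The helper name `fits_context` is hypothetical and is not part of ipex-llm or vLLM. The check is plain arithmetic on the values from the steps and ignores any extra tokens the tokenizer might add.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from ipex-llm or vLLM): prints the total token
# count and succeeds only when prompt + generated tokens fit in the context.
fits_context() {
    local input_len=$1 output_len=$2 max_model_len=$3
    local total=$(( input_len + output_len ))
    echo "$total"
    [ "$total" -le "$max_model_len" ]
}

# Values taken from the reproduction steps.
input_length=20480
output_length=12288
MAX_MODEL_LEN=${MAX_MODEL_LEN:-40000}

if total=$(fits_context "$input_length" "$output_length" "$MAX_MODEL_LEN"); then
    echo "OK: ${total} tokens fit in max-model-len ${MAX_MODEL_LEN}"
else
    echo "Too long: ${total} tokens exceed max-model-len ${MAX_MODEL_LEN}"
fi
```

With the defaults from step 1, the total is 32768 tokens, below the 40000 limit for both `--max-model-len` and `--max-num-batched-tokens`.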

Screenshots

Image

Environment information
root@w05:/home/intel/ipex-llm/python/llm/scripts# bash env-check.sh

PYTHON_VERSION=3.11.13

[W617 13:56:29.995243467 OperatorEntry.cpp:154] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:30477
new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:468 (function operator())
transformers=4.52.4

torch=2.6.0+xpu

ipex-llm Version: 2.3.0b20250610

ipex=2.6.10+xpu

CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) w7-3565X
BIOS Model name: Intel(R) Xeon(R) w7-3565X CPU @ 2.5GHz
BIOS CPU family: 179
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 8

Total CPU Memory: 247.097 GB

Operating System:
Ubuntu 24.04.1 LTS


Linux w05 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr 6 15:05:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

env-check.sh: line 148: xpu-smi: command not found

env-check.sh: line 154: clinfo: command not found

Driver related package version:

igpu not detected

xpu-smi is not installed. Please install xpu-smi according to README.md
