Your current environment
The output of `python collect_env.py`
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.post1.dev20250528
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[conda] Could not collect
vLLM Version: 0.9.1
vLLM Ascend Version: 0.9.0rc3.dev33+gdb2f630 (git sha: db2f630)
HDK: 24.1.0.3
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
Apply this PR: https://github.com/vllm-project/vllm-ascend/pull/1273/files
and run this command:
```shell
python examples/offline_data_parallel.py \
    --model="Qwen3-30B-A3B" \
    --dp-size=2 \
    --tp-size=2 \
    --enforce-eager
```
There is a precision issue with the results on DP rank 0:
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Shin-ji, and I am a 1st grade student. My teacher is a 5'
DP rank 1, Prompt: 'The president of the United States is', Generated text: ' a woman. ( ) A. Correct B. Incorrect C. Cannot be determined D. None'
DP rank 1, Prompt: 'The capital of France is', Generated text: ' the city where the main office of the French government is located. What is the capital of France?\n\n'
DP rank 1, Prompt: 'The future of AI is', Generated text: ' bright, but not without challenges. The evolution of AI will be shaped by the ethical and legal frameworks'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Phoe, and I am a student. It is nice to meet you. (This is a'
Processed prompts: 100%|██████| 200/200 [00:09<00:00, 21.74it/s, est. speed input: 119.58 toks/s, output: 347.87 toks/s]
DP rank 0, Prompt: 'Hello, my name is', Generated text: ', and, and, and, and, and, and, and, and'
DP rank 0, Prompt: 'The president of the United States is', Generated text: ' the the the the the the the the the the the the the the the the'
DP rank 0, Prompt: 'The capital of France is', Generated text: ' the capital of the capital of the capital of the capital of the capital of the'
DP rank 0, Prompt: 'The future of AI is', Generated text: ' now is the the the the the the the the the the the the the the'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ', and, and, and, and, and, and, and, and'
Executing the command `python examples/offline_data_parallel.py --model="Qwen3-30B-A3B" --dp-size=2 --tp-size=2 --enable-expert-parallel --enforce-eager` (i.e., with expert parallelism enabled) leads to similar issues. Graph mode exhibits the same issue.
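For reference, the degenerate outputs on DP rank 0 are easy to detect automatically, which may help when checking a fix. The sketch below is plain Python with no vLLM dependency; the `looks_degenerate` helper and its 0.5 repetition threshold are arbitrary assumptions, not part of the example script:

```python
def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: flag a generation in which a single token accounts for
    more than `threshold` of all whitespace-separated tokens, which is
    how this precision issue manifests (e.g. ' the the the ...')."""
    tokens = text.split()
    if len(tokens) < 4:
        return False
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > threshold

# Sample outputs observed on DP rank 0 vs. DP rank 1 in this report:
bad = " the the the the the the the the the the the the the the the the"
good = " bright, but not without challenges. The evolution of AI will be shaped"
print(looks_degenerate(bad))   # True
print(looks_degenerate(good))  # False
```

Running this check over all DP rank 0 generations in the log above flags every one of them, while every DP rank 1 generation passes.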