[Bug]: Qwen3-30B-A3B Shows Precision Issues in DP2+TP2 Parallel Mode #1289

Open
@wjx-xin

Description

@wjx-xin

Your current environment

The output of `python collect_env.py`
CPU:
Architecture:                    aarch64
CPU op-mode(s):                  64-bit

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.post1.dev20250528
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[conda] Could not collect
vLLM Version: 0.9.1
vLLM Ascend Version: 0.9.0rc3.dev33+gdb2f630 (git sha: db2f630)

HDK: 24.1.0.3

CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux



🐛 Describe the bug

Apply this PR: https://github.com/vllm-project/vllm-ascend/pull/1273/files
and run this command:

python examples/offline_data_parallel.py \
                --model="Qwen3-30B-A3B" \
                --dp-size=2 \
                --tp-size=2 \
                --enforce-eager

There is a precision issue with the results on DP rank 0:

DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Shin-ji, and I am a 1st grade student. My teacher is a 5'
DP rank 1, Prompt: 'The president of the United States is', Generated text: ' a woman.  ( ) A. Correct B. Incorrect C. Cannot be determined D. None'
DP rank 1, Prompt: 'The capital of France is', Generated text: ' the city where the main office of the French government is located. What is the capital of France?\n\n'
DP rank 1, Prompt: 'The future of AI is', Generated text: ' bright, but not without challenges. The evolution of AI will be shaped by the ethical and legal frameworks'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Phoe, and I am a student. It is nice to meet you. (This is a'
Processed prompts: 100%|██████| 200/200 [00:09<00:00, 21.74it/s, est. speed input: 119.58 toks/s, output: 347.87 toks/s]
DP rank 0, Prompt: 'Hello, my name is', Generated text: ', and, and, and, and, and, and, and, and'
DP rank 0, Prompt: 'The president of the United States is', Generated text: ' the the the the the the the the the the the the the the the the'
DP rank 0, Prompt: 'The capital of France is', Generated text: ' the capital of the capital of the capital of the capital of the capital of the'
DP rank 0, Prompt: 'The future of AI is', Generated text: ' now is the the the the the the the the the the the the the the'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ', and, and, and, and, and, and, and, and'

Executing the command python examples/offline_data_parallel.py --model="Qwen3-30B-A3B" --dp-size=2 --tp-size=2 --enable-expert-parallel --enforce-eager leads to similar issues. Graph mode exhibits the same behavior.
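To make the failure easier to spot when bisecting, the degenerate outputs on DP rank 0 can be flagged automatically instead of eyeballed. Below is a minimal, hypothetical helper (not part of the repro script or of vLLM) that measures how much of a generation is a single repeated token; the broken rank 0 outputs score near 1.0 while the healthy rank 1 outputs score much lower:

```python
from collections import Counter

def repetition_ratio(text: str) -> float:
    """Fraction of whitespace tokens taken up by the single most
    frequent token. Values near 1.0 indicate degenerate output such
    as 'the the the ...' seen on DP rank 0."""
    tokens = text.split()
    if not tokens:
        return 0.0
    top_count = Counter(tokens).most_common(1)[0][1]
    return top_count / len(tokens)

# Outputs copied from the logs above:
bad = " the the the the the the the the the the the the the the the the"
good = (" bright, but not without challenges. The evolution of AI "
        "will be shaped by the ethical and legal frameworks")

print(f"rank 0 sample: {repetition_ratio(bad):.2f}")   # 1.00
print(f"rank 1 sample: {repetition_ratio(good):.2f}")
```

A check like `repetition_ratio(out) > 0.5` over each rank's generations gives a cheap pass/fail signal when testing candidate fixes across DP/TP/EP configurations.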

Labels: bug