Compatibility issue between liger-kernel and zero3: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1 #1048

Description

@Snowdar

🐛 Describe the bug

When I was training qwen3-8b using liger-kernel + zero3, an error occurred during backpropagation:
The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1

After debugging, I found that changing the stage3_param_persistence_threshold parameter in the zero3 config from auto to 1e10 solves the problem (see the config sketch below). Alternatively, switching from zero3 to zero2 also works.
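For reference, here is a minimal sketch of the ZeRO-3 config fragment with the workaround applied. Only stage3_param_persistence_threshold is what was changed from auto; the surrounding keys are illustrative placeholders, not a complete training config.

```python
# Sketch of a DeepSpeed ZeRO-3 config with the workaround applied.
# Only "stage3_param_persistence_threshold" differs from the failing setup;
# the other keys below are illustrative placeholders.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Under ZeRO-3, parameters at or above this threshold are partitioned and
        # appear as zero-size placeholders on each rank until gathered; raising the
        # threshold to 1e10 keeps effectively all parameters persistent, which
        # appears to avoid the "size of tensor a (0)" error during backprop.
        "stage3_param_persistence_threshold": 1e10,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```

With HF Transformers, a dict like this can be passed via TrainingArguments(deepspeed=ds_config) or written out to the usual JSON config file.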

Why does this happen? I do not seem to have this problem when training Gemma3. Can anyone explain where the compatibility issue between liger-kernel and DeepSpeed lies when training Qwen3?

Looking forward to your answer, thanks!

Reproduce

No response

Versions

Operating System: Linux-5.15.0-126-generic-x86_64-with-glibc2.39
Python version: 3.12.12
Liger Kernel version: 0.5.10
PyTorch version: 2.7.1+cu126
CUDA version: 12.6
HIP(ROCm) version: Not available
Triton version: 3.3.1
Transformers version: 4.51.3
XPU version: XPU Not Available
