🐛 Describe the bug
When training Qwen3-8B with Liger Kernel + ZeRO-3, an error occurred during backpropagation:

```
The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1
```
After debugging, I found that changing the `stage3_param_persistence_threshold` parameter in the ZeRO-3 config from `auto` to `1e10` resolves the problem. Alternatively, switching from ZeRO-3 to ZeRO-2 also works.
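For reference, a minimal sketch of the config change that worked around the issue (the surrounding fields are illustrative, not my full DeepSpeed config):

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_param_persistence_threshold": 1e10
  }
}
```

Setting the threshold this high effectively keeps all parameters persistent instead of re-partitioning them after each forward pass, which is presumably why the error no longer triggers.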
I would like to ask why this happens. I don't seem to hit this problem when training Gemma3. Can anyone explain where the compatibility issue between Liger Kernel and DeepSpeed lies when training Qwen3?
Looking forward to your answer, thanks!
Reproduce
No response
Versions
Operating System: Linux-5.15.0-126-generic-x86_64-with-glibc2.39
Python version: 3.12.12
Liger Kernel version: 0.5.10
PyTorch version: 2.7.1+cu126
CUDA version: 12.6
HIP(ROCm) version: Not available
Triton version: 3.3.1
Transformers version: 4.51.3
XPU version: XPU Not Available