
Add support for Qwen3.5 MoE #1109

Open
michaelroyzen wants to merge 6 commits into linkedin:main from michaelroyzen:add-qwen3_5_moe


@michaelroyzen

Add Qwen3.5 MoE support to Liger Kernel

Summary

  • Adds Liger Kernel optimizations for the Qwen3.5 MoE model family (qwen3_5_moe / qwen3_5_moe_text), targeting Transformers v5+
  • Qwen3.5 MoE combines Qwen3 Next's hybrid GDN/attention architecture with Sparse MoE (shared + routed experts), so the implementation mirrors Qwen3 Next's Liger integration: Gemma-style RMSNorm (LigerRMSNormForQwen3Next), fused SwiGLU experts (LigerExperts), and fused linear cross-entropy loss
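To make the two elementwise building blocks named above concrete, here is a minimal pure-Python sketch of a Gemma-style RMSNorm and a SwiGLU activation. It assumes that "Gemma-style" means the learned weight is applied as `(1 + weight)` after normalization, as in the Qwen3 Next reference modeling code; the function names are illustrative and are not the Liger kernels themselves, which are fused GPU implementations.

```python
import math


def gemma_style_rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then scale by (1 + weight).

    Assumption: the offset of 1.0 on the weight is what distinguishes the
    Gemma-style variant from a plain RMSNorm that scales by weight alone.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Elementwise SwiGLU: silu(gate) * up, applied before the down projection."""
    return [silu(g) * u for g, u in zip(gate, up)]


x = [3.0, 4.0]
# With a zero weight the output has unit mean square.
normed_zero_weight = gemma_style_rms_norm(x, weight=[0.0, 0.0], eps=0.0)
# With a weight of 1.0 the (1 + weight) factor doubles every element.
normed_unit_weight = gemma_style_rms_norm(x, weight=[1.0, 1.0], eps=0.0)
activated = swiglu(gate=[0.0, 2.0], up=[5.0, 3.0])
```

A zero-initialized weight leaves the normalized activations unscaled, which is why the `(1 + weight)` form is usually paired with zero initialization.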

Changes

New file:

  • src/liger_kernel/transformers/model/qwen3_5_moe.py: lce_forward for Qwen3_5MoeForCausalLM, based on the Qwen3 Next version with the load_balancing_loss_func import updated to point to Qwen3.5 MoE's local definition
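The idea behind a fused linear cross-entropy forward can be sketched without any GPU code: project hidden states to logits one chunk at a time and accumulate the loss, so the full `tokens x vocab` logits matrix is never held at once. The sketch below is a forward-only, pure-Python illustration under that assumption; it is not the code in this PR. The `total_loss` helper mirrors how Hugging Face MoE causal LM heads typically add the load-balancing term scaled by a router coefficient, and all names here are hypothetical.

```python
import math


def nll_from_logits(logits, target):
    """Negative log-likelihood of `target` using a numerically stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def project(weight, h):
    """LM head projection: one logit per vocabulary row of `weight`."""
    return [sum(w * v for w, v in zip(row, h)) for row in weight]


def naive_linear_ce(hidden, weight, targets):
    """Reference path: materialize every logit row, then average the loss."""
    all_logits = [project(weight, h) for h in hidden]
    return sum(nll_from_logits(l, t) for l, t in zip(all_logits, targets)) / len(targets)


def chunked_linear_ce(hidden, weight, targets, chunk_size=2):
    """Chunked path: only `chunk_size` logit rows exist at any time."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        chunk_logits = [project(weight, h) for h in hidden[start:start + chunk_size]]
        chunk_targets = targets[start:start + chunk_size]
        total += sum(nll_from_logits(l, t) for l, t in zip(chunk_logits, chunk_targets))
    return total / len(targets)


def total_loss(lm_loss, aux_loss, router_aux_loss_coef):
    """Hypothetical helper: language-model loss plus the scaled load-balancing loss."""
    return lm_loss + router_aux_loss_coef * aux_loss


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab size 3, hidden size 2
targets = [0, 1, 2, 0, 2]

reference = naive_linear_ce(hidden, weight, targets)
chunked = chunked_linear_ce(hidden, weight, targets, chunk_size=2)
```

Both paths compute the same mean loss; the chunked one trades a Python-level loop for a smaller peak memory footprint, which is the property a fused kernel exploits on the GPU (where gradients are also computed per chunk).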

Modified files:

  • src/liger_kernel/transformers/monkey_patch.py: apply_liger_kernel_to_qwen3_5_moe function (RMSNorm, SwiGLU experts, fused LCE; RoPE disabled) with instance patching for norm layers, shared expert, and routed experts; registered as qwen3_5_moe and qwen3_5_moe_text in MODEL_TYPE_TO_APPLY_LIGER_FN
  • src/liger_kernel/transformers/__init__.py: Export apply_liger_kernel_to_qwen3_5_moe in TYPE_CHECKING, __getattr__, and __all__
  • test/utils.py: revert_liger_kernel_to_qwen3_5_moe for test cleanup
  • test/convergence/fp32/test_mini_models.py: Availability check, imports, and MiniModelConfig entry for mini_qwen3_5_moe
  • test/transformers/test_monkey_patch.py: is_qwen3_5_moe_available helper and test_apply_liger_kernel_to_instance_for_qwen3_5_moe verifying all patches are applied correctly
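The registration described in the list above amounts to mapping both model-type strings to the same apply function. The sketch below shows that dispatch pattern with a stand-in function; only the names `apply_liger_kernel_to_qwen3_5_moe` and `MODEL_TYPE_TO_APPLY_LIGER_FN` come from this PR. The keyword flags, their defaults, and the `apply_for_model_type` helper are assumptions for illustration, not the library's actual signature or dispatcher.

```python
def apply_liger_kernel_to_qwen3_5_moe(
    rope=False,
    rms_norm=True,
    swiglu=True,
    fused_linear_cross_entropy=True,
    model=None,
):
    """Stand-in for the real function: it only reports which patches were requested.

    Assumption: the flag names and defaults follow the PR description
    (RMSNorm, SwiGLU experts, fused LCE on; RoPE off).
    """
    return {
        "rope": rope,
        "rms_norm": rms_norm,
        "swiglu": swiglu,
        "fused_linear_cross_entropy": fused_linear_cross_entropy,
        "patched_instance": model is not None,
    }


# Both the full model type and its text-only variant share one apply function.
MODEL_TYPE_TO_APPLY_LIGER_FN = {
    "qwen3_5_moe": apply_liger_kernel_to_qwen3_5_moe,
    "qwen3_5_moe_text": apply_liger_kernel_to_qwen3_5_moe,
}


def apply_for_model_type(model_type, **kwargs):
    """Hypothetical dispatcher: look up the apply function by config.model_type."""
    try:
        apply_fn = MODEL_TYPE_TO_APPLY_LIGER_FN[model_type]
    except KeyError:
        raise ValueError(f"No Liger patch registered for model type {model_type!r}") from None
    return apply_fn(**kwargs)


text_result = apply_for_model_type("qwen3_5_moe_text")
instance_result = apply_for_model_type("qwen3_5_moe", model=object())
```

Registering the `_text` variant separately matters because the text-only config reports its own `model_type`, so a lookup keyed only on `qwen3_5_moe` would miss it.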

Test plan

  • test_apply_liger_kernel_to_instance_for_qwen3_5_moe passes (monkey patch instance patching)
  • mini_qwen3_5_moe convergence test passes (fp32 mini model)
  • Existing Qwen3 Next and Qwen3 MoE tests still pass (no regressions)
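For readers unfamiliar with what an instance-patching test checks, the following toy sketch rebinds `forward` on the modules of one already-constructed model and then asserts that the patched instance uses the new function while the class, and any other instance, is left alone. Everything here (`ToyNorm`, `ToyModel`, `liger_style_forward`, `patch_instance`, `is_patched`) is hypothetical and uses plain Python objects rather than `torch.nn.Module`; the real test in this PR may verify the patches differently.

```python
import types


class ToyNorm:
    """Stand-in for a norm layer with a reference forward."""

    def forward(self, x):
        return ("reference", x)


def liger_style_forward(self, x):
    """Stand-in for an optimized forward that replaces the reference one."""
    return ("patched", x)


class ToyModel:
    def __init__(self, num_layers=2):
        self.final_norm = ToyNorm()
        self.layer_norms = [ToyNorm() for _ in range(num_layers)]


def patch_instance(model):
    """Bind the replacement forward onto each module of this instance only."""
    for module in [model.final_norm, *model.layer_norms]:
        module.forward = types.MethodType(liger_style_forward, module)


def is_patched(module):
    """True when the bound forward wraps the replacement function."""
    return getattr(module.forward, "__func__", None) is liger_style_forward


patched_model = ToyModel()
untouched_model = ToyModel()
patch_instance(patched_model)

patched_modules = [patched_model.final_norm, *patched_model.layer_norms]
untouched_modules = [untouched_model.final_norm, *untouched_model.layer_norms]
```

Checking an untouched instance alongside the patched one is what separates instance-level patching from class-level monkey patching, where every instance would change at once.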

@michaelroyzen
Author

@shimizust @Tcc0403

@michaelroyzen
Author

michaelroyzen commented Feb 26, 2026

Convergence test passes
[Screenshot 2026-02-26 at 2.48.14 PM]

Collaborator

@Tcc0403 Tcc0403 left a comment

@Mecoli1219 can you take a look?

@michaelroyzen
Author

michaelroyzen commented Feb 27, 2026

[Screenshot 2026-02-27 at 1.46.37 PM] [Screenshot 2026-02-27 at 1.47.22 PM]

Confirming that the Qwen3 Next tests still pass.

@michaelroyzen
Author

Are we ready to merge @Tcc0403 @Mecoli1219?


2 participants