Tcc0403 (Collaborator) reviewed on Feb 27, 2026:

> @Mecoli1219 can you take a look?
Author: … bf16 test matching Qwen3 MoE tolerances

Author: Are we ready to merge @Tcc0403 @Mecoli1219?



Add Qwen3.5 MoE support to Liger Kernel
Summary
Adds Qwen3.5 MoE support to Liger Kernel (model types `qwen3_5_moe` / `qwen3_5_moe_text`), targeting Transformers v5+. The patch covers RMSNorm (`LigerRMSNormForQwen3Next`), fused SwiGLU experts (`LigerExperts`), and fused linear cross-entropy loss.

Changes
New file:

- `src/liger_kernel/transformers/model/qwen3_5_moe.py` — `lce_forward` for `Qwen3_5MoeForCausalLM`, based on the Qwen3 Next version with the `load_balancing_loss_func` import updated to point to Qwen3.5 MoE's local definition

Modified files:

- `src/liger_kernel/transformers/monkey_patch.py` — `apply_liger_kernel_to_qwen3_5_moe` function (RMSNorm, SwiGLU experts, fused LCE; RoPE disabled) with instance patching for norm layers, the shared expert, and routed experts; registered as `qwen3_5_moe` and `qwen3_5_moe_text` in `MODEL_TYPE_TO_APPLY_LIGER_FN`
- `src/liger_kernel/transformers/__init__.py` — export `apply_liger_kernel_to_qwen3_5_moe` in `TYPE_CHECKING`, `__getattr__`, and `__all__`
- `test/utils.py` — `revert_liger_kernel_to_qwen3_5_moe` for test cleanup
- `test/convergence/fp32/test_mini_models.py` — availability check, imports, and `MiniModelConfig` entry for `mini_qwen3_5_moe`
- `test/transformers/test_monkey_patch.py` — `is_qwen3_5_moe_available` helper and `test_apply_liger_kernel_to_instance_for_qwen3_5_moe` verifying all patches are applied correctly

Test plan
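The `MODEL_TYPE_TO_APPLY_LIGER_FN` registration described above is a model-type-to-patch-function dispatch. A minimal dependency-free sketch of that pattern (the function body and keyword names here are illustrative stand-ins, not the actual Liger Kernel implementation):

```python
# Toy sketch of the model_type -> patch-function dispatch used for registration.
# Names and return values are illustrative only.

def apply_liger_kernel_to_qwen3_5_moe(rms_norm=True, swiglu=True,
                                      fused_linear_cross_entropy=True):
    """Stand-in for the real patch function: records which patches were requested."""
    return {
        "rms_norm": rms_norm,
        "swiglu": swiglu,
        "fused_linear_cross_entropy": fused_linear_cross_entropy,
    }

# Both the composite model type and the text-only variant map to the same function.
MODEL_TYPE_TO_APPLY_LIGER_FN = {
    "qwen3_5_moe": apply_liger_kernel_to_qwen3_5_moe,
    "qwen3_5_moe_text": apply_liger_kernel_to_qwen3_5_moe,
}

def apply_liger_kernel(model_type, **kwargs):
    """Dispatch on the model's config.model_type string."""
    if model_type not in MODEL_TYPE_TO_APPLY_LIGER_FN:
        raise ValueError(f"Unsupported model type: {model_type}")
    return MODEL_TYPE_TO_APPLY_LIGER_FN[model_type](**kwargs)
```

Registering both `qwen3_5_moe` and `qwen3_5_moe_text` means the same patch function is found whether the loaded checkpoint exposes the composite or the text-only config.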
- `test_apply_liger_kernel_to_instance_for_qwen3_5_moe` passes (monkey-patch instance patching)
- `mini_qwen3_5_moe` convergence test passes (fp32 mini model)
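Instance patching, which the first test above exercises, swaps submodules on an already-constructed model rather than replacing classes before construction. A dependency-free sketch of the idea, with toy classes standing in for the real Qwen3.5 MoE and Liger modules:

```python
# Toy illustration of instance-level monkey patching: replace submodules on an
# already-instantiated model in place, reusing their existing parameters.
# All class names here are illustrative stand-ins, not real Liger Kernel APIs.

class RMSNorm:
    def __init__(self, weight):
        self.weight = weight

class LigerRMSNorm(RMSNorm):
    """Stand-in for the fused Liger norm module."""
    pass

class Layer:
    def __init__(self):
        self.input_layernorm = RMSNorm(weight=[1.0, 1.0])

class Model:
    def __init__(self, num_layers=2):
        self.layers = [Layer() for _ in range(num_layers)]

def patch_rms_norm_in_place(model):
    """Swap every RMSNorm instance for the Liger version, keeping weights."""
    for layer in model.layers:
        old = layer.input_layernorm
        layer.input_layernorm = LigerRMSNorm(weight=old.weight)

model = Model()
patch_rms_norm_in_place(model)
```

The instance test in `test_monkey_patch.py` presumably asserts the analogous condition on the real model: after patching, each norm, shared-expert, and routed-expert module is an instance of the corresponding Liger class.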