| Routed experts with identity experts |`longcat_flash`| Defuses routed experts into numbered `gate_proj`, `up_proj`, and `down_proj` modules and preserves zero or identity experts. |
77
-
| Fused dense `gate_up_proj` MLPs |`dia`, `glm`, `glm4`, `glm_image`, `glm_ocr`, `phi3`, `phi4_multimodal`, `zamba2`| Splits fused dense `gate_up_proj` layers into `gate_proj` + `up_proj` and updates the block `forward()` to preserve the original MLP math. |
70
+
| Mixed sparse and shared experts |`deepseek_v3`, `deepseek_v4`, `glm_moe_dsa`, `qwen3_5_moe`, `qwen3_5_moe_text`| Runtime expert tensor defusion for routed experts while preserving the model's shared-expert path. |
71
+
| Transposed or packed expert tensors |`gpt_oss`, `phimoe`| Splits transposed fused expert `gate_up_proj` tensors into per-expert `gate_proj` + `up_proj`, preserves expert bias when present, and converts expert tensors into numbered expert `nn.Linear` modules. |
72
+
| Flattened expert layout |`dbrx`| Rebuilds the flattened DBRX expert FFN weights into numbered expert `gate_proj`, `up_proj`, and `down_proj` `nn.Linear` modules. |
73
+
| Batched expert-input execution |`llama4`| Runtime expert tensor defusion plus preservation of the llama4 batched expert-input execution contract. |
74
+
| Non-gated expert MLPs |`nemotron_h`| Converts routed expert tensors into numbered `up_proj` and `down_proj` `nn.Linear` modules for non-gated experts. |
| Routed experts with identity experts |`longcat_flash`| Defuses routed experts into numbered `gate_proj`, `up_proj`, and `down_proj` modules and preserves zero or identity experts. |
77
+
| Fused dense `gate_up_proj` MLPs |`dia`, `glm`, `glm4`, `glm_image`, `glm_ocr`, `phi3`, `phi4_multimodal`, `zamba2`| Splits fused dense `gate_up_proj` layers into `gate_proj` + `up_proj` and updates the block `forward()` to preserve the original MLP math. |
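
As a rough illustration of the "Fused dense `gate_up_proj` MLPs" row, the sketch below shows why splitting a fused weight into `gate_proj` + `up_proj` can preserve the MLP math. It is a framework-free toy, not the project's implementation. It assumes the fused weight stores the gate rows first and the up rows second, and that the activation is SiLU; both are assumptions, and real models may use a different layout or activation. All function names here are hypothetical.

```python
import math


def matvec(weight, x):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def silu(v):
    """SiLU activation, assumed here for the gated MLP."""
    return v / (1.0 + math.exp(-v))


def fused_mlp_hidden(gate_up_weight, x):
    """Fused path: one projection, then split the output into gate and up halves."""
    out = matvec(gate_up_weight, x)
    half = len(out) // 2
    gate, up = out[:half], out[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


def split_gate_up(gate_up_weight):
    """Defuse: slice the fused rows into separate gate and up weights.

    Assumes gate rows come first and up rows second (an assumption).
    """
    half = len(gate_up_weight) // 2
    return gate_up_weight[:half], gate_up_weight[half:]


def defused_mlp_hidden(gate_weight, up_weight, x):
    """Defused path: two separate projections, same gating math."""
    gate = matvec(gate_weight, x)
    up = matvec(up_weight, x)
    return [silu(g) * u for g, u in zip(gate, up)]


# Toy fused weight: 4 output rows (2 gate + 2 up), hidden size 3.
gate_up_weight = [
    [0.5, -1.0, 0.25],
    [1.5, 0.0, -0.5],
    [2.0, 1.0, 0.0],
    [-1.0, 0.5, 1.0],
]
x = [1.0, 2.0, -1.0]

gate_weight, up_weight = split_gate_up(gate_up_weight)
fused = fused_mlp_hidden(gate_up_weight, x)
defused = defused_mlp_hidden(gate_weight, up_weight, x)

# Both paths should agree, since the split only re-partitions the rows.
assert all(math.isclose(a, b) for a, b in zip(fused, defused))
```

The same idea carries over to the per-expert rows in the table, where each expert's slice of a fused tensor would be split in the same way before being wrapped in its own module.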