[Model] Qwen3.5 dense and MoE support (no vision) (ggml-org#19435)
* Unified delta net handling
* Remove old methods.
* Refactor and optimize
* Adapt autoregressive version from @ymcki
* Change to decay mask approach
* Fix bad permute
* Qwen 3.5 support
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Further fixes
* Use inheritance, remove unneeded conts
* Not like this!
* Remove ggml.h explicit import
* Remove transformers, fix the views
* ACTUALLY fix views, make super calls explicit in conversion.
* Fix conversion again
* Remove extra ggml.h imports
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
yield from super().modify_tensors(gate, mapped_gate, bid)
+
yield from super().modify_tensors(up, mapped_up, bid)
return
if name.startswith("mlp") or name.startswith("vision_model") or name.startswith("model.vision_tower") or name.startswith("model.multi_modal_projector") or name.startswith("model.visual"):
@@ -4344,6 +4332,40 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
yield from super().modify_tensors(data_torch, name, bid)
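The delegation pattern visible in these hunks (a generator override that splits a tensor, forwards each part with `yield from super().modify_tensors(...)`, and returns early for skipped prefixes) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the conversion script's actual code: `BaseConverter`, `SplitConverter`, the `base.` prefix, the `gate_up_proj` naming, and the use of plain lists in place of tensors are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


class BaseConverter:
    """Hypothetical parent: maps a source tensor name to an output name."""

    def modify_tensors(
        self, data: List[int], name: str, bid: Optional[int]
    ) -> Iterator[Tuple[str, List[int]]]:
        # Stand-in for the parent's name mapping (assumed, not the real mapping).
        yield (f"base.{name}", data)


class SplitConverter(BaseConverter):
    """Hypothetical child: skips some tensors, splits a fused one, delegates the rest."""

    # Prefixes to skip; str.startswith accepts a tuple of candidates.
    SKIPPED = ("vision_model", "model.visual")

    def modify_tensors(
        self, data: List[int], name: str, bid: Optional[int]
    ) -> Iterator[Tuple[str, List[int]]]:
        if name.startswith(self.SKIPPED):
            return  # a bare return in a generator yields nothing

        if name.endswith("gate_up_proj"):
            # Split the fused tensor in half and forward each half explicitly.
            half = len(data) // 2
            gate, up = data[:half], data[half:]
            yield from super().modify_tensors(
                gate, name.replace("gate_up_proj", "gate_proj"), bid
            )
            yield from super().modify_tensors(
                up, name.replace("gate_up_proj", "up_proj"), bid
            )
            return

        # Default path: let the parent handle the tensor unchanged.
        yield from super().modify_tensors(data, name, bid)
```

Using `yield from` keeps the override a generator and lets the parent decide how many `(name, tensor)` pairs each input produces, so the child never has to build the output tuples itself.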