models : deduplicate delta-net graphs for Qwen family#19597
|
@ggerganov out of curiosity - what's the reason for moving the identity / mask constructions from once-per-graph to once-per-layer? |
Mainly wanted to get rid of the Btw, creating a mask and then multiplying has no advantage compared to just masking the data (with |
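To make the masking point concrete, here is a minimal standalone sketch in plain C++. It is not ggml code, and `mask_by_multiply` and `mask_direct` are hypothetical names chosen for illustration. It compares building an explicit lower-triangular 0/1 mask and multiplying element-wise against zeroing the strict upper triangle of the data directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<float>; // row-major n x n matrix

// Variant 1: build an explicit 0/1 lower-triangular mask, then multiply element-wise.
Mat mask_by_multiply(const Mat & a, size_t n) {
    Mat mask(n * n, 0.0f);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            mask[i * n + j] = 1.0f;
        }
    }
    Mat out(n * n);
    for (size_t k = 0; k < n * n; ++k) {
        out[k] = a[k] * mask[k];
    }
    return out;
}

// Variant 2: zero the strict upper triangle of the data directly (no mask tensor).
Mat mask_direct(const Mat & a, size_t n) {
    Mat out = a;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            out[i * n + j] = 0.0f;
        }
    }
    return out;
}
```

Both variants produce the same matrix for finite inputs; the second avoids allocating the mask and the extra multiply, which is the trade-off the comment above describes.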
|
Is this the new home for the unified delta_net effort? Is my job to modify `build_delta_net_chunking` to use `if (is_kda) {} else {}` for KDA and GDN? |
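As a rough illustration of what an `is_kda` branch could look like, here is a standalone C++ sketch of the state-decay step only. It assumes that the GDN path uses one scalar gate per head while the KDA path uses one gate per key channel; that assumption, the function name `apply_decay`, and the data layout are hypothetical and not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decay a d_k x d_v recurrent state S (row-major) in place.
// Assumption for illustration: gate holds 1 value for GDN, d_k values for KDA.
void apply_decay(std::vector<float> & S, size_t d_k, size_t d_v,
                 const std::vector<float> & gate, bool is_kda) {
    for (size_t i = 0; i < d_k; ++i) {
        // The only place the two variants differ in this sketch:
        // KDA picks a per-channel gate, GDN reuses the single scalar.
        const float g = is_kda ? gate[i] : gate[0];
        for (size_t j = 0; j < d_v; ++j) {
            S[i * d_v + j] *= g;
        }
    }
}
```

Confining the difference to a single gate lookup keeps the rest of the loop shared between the two paths.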
|
Yes, we'll merge this PR soon and then the new |
|
@ggerganov, I am done with adding kimi linear to delta-net-base.cpp. https://github.com/ymcki/llama.cpp/tree/dn The original delta net path is the same except that I added `o = ggml_cont(ctx0, o);` after `ggml_permute` at the end. This is to avoid the assertion error: Should I open another PR, or will you do that yourself? |
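For readers unfamiliar with why a `ggml_cont` after `ggml_permute` can matter, here is a toy C++ model of the idea. It is a sketch, not ggml's implementation: `View2D`, `permute`, `make_cont`, and `is_contiguous` are hypothetical helpers. A permute only swaps shape and strides, leaving a non-contiguous view over the same memory; a "cont" step copies the elements into a fresh contiguous buffer, which is what an operation that requires contiguous input would need.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 2D view: ne[0] is the fastest-varying dimension, nb[] are strides in elements.
struct View2D {
    const float * data;
    std::array<size_t, 2> ne;
    std::array<size_t, 2> nb;
    float at(size_t i0, size_t i1) const { return data[i0 * nb[0] + i1 * nb[1]]; }
};

// Contiguous means: unit stride in dim 0 and rows packed back to back.
bool is_contiguous(const View2D & v) {
    return v.nb[0] == 1 && v.nb[1] == v.ne[0];
}

// Swap the two dimensions without touching the underlying data.
View2D permute(const View2D & v) {
    return { v.data, { v.ne[1], v.ne[0] }, { v.nb[1], v.nb[0] } };
}

// Copy the view's elements into `storage` in contiguous order.
View2D make_cont(const View2D & v, std::vector<float> & storage) {
    storage.resize(v.ne[0] * v.ne[1]);
    for (size_t i1 = 0; i1 < v.ne[1]; ++i1) {
        for (size_t i0 = 0; i0 < v.ne[0]; ++i0) {
            storage[i1 * v.ne[0] + i0] = v.at(i0, i1);
        }
    }
    return { storage.data(), v.ne, { 1, v.ne[0] } };
}
```

The copy costs memory and bandwidth, so it is usually added only where a downstream operation actually requires contiguous input.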
|
Please open a PR. I will attempt to simplify the implementation - hoping we can unify this and avoid branching on the |
* upstream/master: (88 commits)
  ci : bump komac version (ggml-org#19682)
  build : link ws2_32 as PUBLIC on Windows (ggml-org#19666)
  build : cleanup library linking logic (ggml-org#19665)
  convert : add JoyAI-LLM-Flash (ggml-org#19651)
  perplexity: add proper batching (ggml-org#19661)
  common : inline functions (ggml-org#18639)
  ggml : make `ggml_is_view` as API (ggml-org#19539)
  model: Add support for Tiny Aya Models (ggml-org#19611)
  build : rework llama_option_depr to handle LLAMA_CURL (ggml-org#19658)
  Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm veresions (ggml-org#19591)
  models : deduplicate delta-net graphs for Qwen family (ggml-org#19597)
  graph : fix KQ mask, lora, cvec reuse checks (ggml-org#19644)
  ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (ggml-org#19132)
  sync : ggml
  ggml : bump version to 0.9.7 (ggml/1425)
  ggml : bump version to 0.9.6 (ggml/1423)
  cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (ggml-org#19624)
  docs: update s390x build docs (ggml-org#19643)
  build : remove LLAMA_HTTPLIB option (ggml-org#19623)
  cmake : check if KleidiAI API has been fetched (ggml-org#19640)
  ...
* models : add llm_build_delta_net_base
* cont : keep qwen35 and qwen35moe graphs intact
* cont : add comments
cont #19375
* `llm_build_delta_net_base` for common delta net builds. Currently used only by `qwen3next`
* `llm_graph_context_mamba` -> `llm_build_mamba_base`

Next PRs:

* `llm_build_delta_net_base` for `qwen35` and `qwen35moe` (after release)
* `llm_build_delta_net_base` for `kimi_linear` (cc @ymcki)