feat(skainet-cli): swap LLaMA/Qwen branch to DSL path #125
Merged
Phase 5b consumer migration. Mirrors PR #122 (kllama CLI) and #123 (KLlamaJava facade). After this merge, no top-level CLI in this repo constructs `LlamaRuntime` for the GGUF path.

`skainet-cli` previously routed Gemma + Apertus through DSL but kept LLaMA / Qwen / Mistral on the legacy `LlamaRuntime` + `CpuAttentionBackend` + `LlamaWeightMapper` + `MemSegWeightConverter` chain. This PR collapses the else branch onto the DSL path:

- `DecoderGgufWeightLoader(NATIVE_OPTIMIZED, family.architectures + [arch])` → `DecoderGgufMemSegConverter.convert` → per-family network loader → `OptimizedLLMRuntime` DIRECT mode.
- Family dispatch on the DSL side: `ModelFamily.QWEN` → `QwenNetworkLoader.fromWeights` (NEOX RoPE + QK-norm), else → `LlamaNetworkLoader.fromWeights`.

Previously this CLI handled Qwen via the `LlamaRuntime`-with-detected-flags hybrid that the kllama CLI also used pre-#121; the same architectural collapse applies here.

Imports cleaned: removed `CpuAttentionBackend`, `LlamaRuntime`, `LlamaWeightMapper`, `MemSegWeightConverter`.

Added `:llm-inference:qwen` to the build.gradle dependencies (it was missing because the legacy hybrid-Qwen path did not need it).

Numerical equivalence with the legacy path on identical weights is pinned by `QwenDslLegacyParityTest` (#120).

Tests pass: `:llm-apps:skainet-cli:build`, `:llm-runtime:kllama:jvmTest`, `:llm-inference:qwen:jvmTest`, `:llm-inference:llama:jvmTest`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
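The family dispatch described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: the `ModelFamily` members other than `QWEN`, the `Network` type, the weight map type, and the loader bodies are hypothetical stand-ins, and the real `QwenNetworkLoader.fromWeights` / `LlamaNetworkLoader.fromWeights` signatures may differ.

```kotlin
// Hypothetical stand-ins: only the names ModelFamily.QWEN, QwenNetworkLoader,
// and LlamaNetworkLoader come from the PR description.
enum class ModelFamily { LLAMA, QWEN, MISTRAL }

data class Network(val loader: String, val ropeStyle: String, val qkNorm: Boolean)

object QwenNetworkLoader {
    // Per the PR text, the Qwen path uses NEOX RoPE plus QK-norm.
    fun fromWeights(weights: Map<String, FloatArray>): Network =
        Network(loader = "qwen", ropeStyle = "neox", qkNorm = true)
}

object LlamaNetworkLoader {
    // The RoPE style and QK-norm values here are placeholders, not taken from the PR.
    fun fromWeights(weights: Map<String, FloatArray>): Network =
        Network(loader = "llama", ropeStyle = "default", qkNorm = false)
}

// Qwen gets its own loader; every other family on this branch
// falls through to the LLaMA loader.
fun loadNetwork(family: ModelFamily, weights: Map<String, FloatArray>): Network =
    when (family) {
        ModelFamily.QWEN -> QwenNetworkLoader.fromWeights(weights)
        else -> LlamaNetworkLoader.fromWeights(weights)
    }

fun main() {
    val weights = emptyMap<String, FloatArray>()
    for (family in ModelFamily.values()) {
        println("$family -> ${loadNetwork(family, weights).loader}")
    }
}
```

Keeping the dispatch in a single `when` with an `else` arm means a newly added LLaMA-compatible family would take the LLaMA loader by default, which matches the "else" wording in the description.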
This was referenced May 4, 2026
Phase 5b consumer migration. Mirrors #121, #122, #123. After this merge, no top-level CLI in this repo constructs `LlamaRuntime` for the GGUF path (only kllama-cli's BIN fallback still does).

What changes

`skainet-cli` previously routed Gemma + Apertus through DSL but kept LLaMA / Qwen / Mistral on the legacy `LlamaRuntime` + `CpuAttentionBackend` + `LlamaWeightMapper` + `MemSegWeightConverter` chain. This PR collapses the else branch:

- `DecoderGgufWeightLoader(NATIVE_OPTIMIZED, family.architectures + [arch])` → `DecoderGgufMemSegConverter.convert` → per-family network loader → `OptimizedLLMRuntime` DIRECT mode.
- Family dispatch: `ModelFamily.QWEN` → `QwenNetworkLoader.fromWeights` (NEOX RoPE + QK-norm); else → `LlamaNetworkLoader.fromWeights`.
- Qwen was previously handled via the `LlamaRuntime`-with-detected-flags hybrid that the kllama CLI also used before #121 ("feat(kllama-cli): swap Qwen branch to DSL path (Phase 4)"); the same architectural collapse applies here.

Imports + deps cleaned

- Removed imports: `CpuAttentionBackend`, `LlamaRuntime`, `LlamaWeightMapper`, `MemSegWeightConverter`.
- Added the `:llm-inference:qwen` dep (was missing; the legacy hybrid-Qwen path didn't need an explicit dep on the Qwen module).

Test plan

- `:llm-apps:skainet-cli:build`, `:llm-runtime:kllama:jvmTest`, `:llm-inference:qwen:jvmTest`, `:llm-inference:llama:jvmTest`: all pass.
- Run `skainet-cli` with a real Qwen3 / Llama / Mistral GGUF; verify coherent output.

Numerical equivalence with the legacy path on identical weights is pinned by `QwenDslLegacyParityTest` (#120).

🤖 Generated with Claude Code
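A parity check of the kind mentioned above typically runs both paths on the same weights and input, then compares the outputs element-wise. The sketch below illustrates that idea only; it is not the contents of `QwenDslLegacyParityTest`, and `maxAbsDiff`, `assertParity`, the sample logits, and the tolerance are all hypothetical.

```kotlin
import kotlin.math.abs

// Largest element-wise absolute difference between two equally sized logit vectors.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var worst = 0f
    for (i in a.indices) worst = maxOf(worst, abs(a[i] - b[i]))
    return worst
}

// Throws IllegalStateException if the two paths disagree by more than the tolerance.
fun assertParity(legacy: FloatArray, dsl: FloatArray, tolerance: Float = 1e-4f) {
    val diff = maxAbsDiff(legacy, dsl)
    check(diff <= tolerance) { "paths diverge: max |diff| = $diff > $tolerance" }
}

fun main() {
    // Placeholder values standing in for logits from the legacy and DSL runtimes.
    val legacyLogits = floatArrayOf(0.10f, -1.25f, 3.50f)
    val dslLogits = floatArrayOf(0.10f, -1.25f, 3.50f)
    assertParity(legacyLogits, dslLogits)
    println("parity ok")
}
```

A max-absolute-difference bound is one common choice for this kind of comparison; the tolerance that is appropriate depends on the quantization and accumulation order of the two paths.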