feat(skainet-cli): swap LLaMA/Qwen branch to DSL path
Phase 5b consumer migration. Mirrors PR #122 (kllama CLI) and #123
(KLlamaJava facade). After this merge, no top-level CLI in this repo
constructs `LlamaRuntime` for the GGUF path.
`skainet-cli` previously routed Gemma and Apertus through the DSL path but kept
LLaMA / Qwen / Mistral on the legacy `LlamaRuntime` + `CpuAttentionBackend`
+ `LlamaWeightMapper` + `MemSegWeightConverter` chain. This PR collapses
that else branch onto the DSL path:
- `DecoderGgufWeightLoader(NATIVE_OPTIMIZED, family.architectures + [arch])`
→ `DecoderGgufMemSegConverter.convert` → per-family network loader
→ `OptimizedLLMRuntime` DIRECT mode.
- Family dispatch on the DSL side: `ModelFamily.QWEN` →
`QwenNetworkLoader.fromWeights` (NEOX RoPE + QK-norm), else →
`LlamaNetworkLoader.fromWeights`. Previously this CLI handled Qwen
via the `LlamaRuntime`-with-detected-flags hybrid that the kllama
CLI also used before #121; this PR applies the same architectural
collapse here.
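A minimal sketch of the family dispatch described above, assuming simplified stand-ins: the `ModelFamily` enum here is hypothetical and lists only the three families of the former else branch, and the function returns loader names as strings instead of calling the real `fromWeights` loaders, whose signatures are not shown in this message.

```kotlin
// Hypothetical stand-in for the real ModelFamily enum; only the families
// that previously went through the legacy LlamaRuntime branch are listed.
enum class ModelFamily { LLAMA, QWEN, MISTRAL }

// Hypothetical helper: names the network loader the DSL path would pick.
fun loaderFor(family: ModelFamily): String = when (family) {
    ModelFamily.QWEN -> "QwenNetworkLoader"   // NEOX RoPE + QK-norm
    else -> "LlamaNetworkLoader"              // LLaMA, Mistral
}

fun main() {
    for (f in ModelFamily.values()) {
        println("$f -> ${loaderFor(f)}")
    }
}
```

Using a `when` with an `else` arm mirrors the "QWEN, else LLaMA" split: any family not special-cased falls through to the LLaMA loader.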
Imports cleaned: removed `CpuAttentionBackend`, `LlamaRuntime`,
`LlamaWeightMapper`, `MemSegWeightConverter`. Added `:llm-inference:qwen`
to the build.gradle dependencies (it was missing because the legacy
hybrid-Qwen path did not need it).
Numerical equivalence with the legacy path on identical weights is
pinned by `QwenDslLegacyParityTest` (#120).
Tests pass: `:llm-apps:skainet-cli:build`, `:llm-runtime:kllama:jvmTest`,
`:llm-inference:qwen:jvmTest`, `:llm-inference:llama:jvmTest`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>