feat(skainet-cli): swap LLaMA/Qwen branch to DSL path #125

Merged
michalharakal merged 1 commit into develop from feat/skainet-cli-dsl-swap on May 4, 2026

@michalharakal (Contributor)

Phase 5b consumer migration. Mirrors #121, #122, #123. After this merge, no top-level CLI in this repo constructs LlamaRuntime for the GGUF path (only kllama-cli's BIN fallback still does).

What changes

skainet-cli previously routed Gemma + Apertus through DSL but kept LLaMA / Qwen / Mistral on the legacy LlamaRuntime + CpuAttentionBackend + LlamaWeightMapper + MemSegWeightConverter chain. This PR collapses that else branch onto the DSL path:

  • DecoderGgufWeightLoader(NATIVE_OPTIMIZED, family.architectures + [arch]) → DecoderGgufMemSegConverter.convert → per-family network loader → OptimizedLLMRuntime DIRECT mode.
  • DSL-side family dispatch: ModelFamily.QWEN → QwenNetworkLoader.fromWeights (NEOX RoPE + QK-norm); else → LlamaNetworkLoader.fromWeights.
  • This CLI previously handled Qwen via the LlamaRuntime-with-detected-flags hybrid that the kllama CLI also used before #121 (feat(kllama-cli): swap Qwen branch to DSL path, Phase 4); this PR applies the same architectural collapse here.
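
The family dispatch described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from this PR: `Weights` and `Network` are hypothetical placeholder types, the `ModelFamily` entries other than `QWEN` are assumed, and the two loader objects are stubs that only share names with the real `QwenNetworkLoader` / `LlamaNetworkLoader`, whose actual signatures may differ.

```kotlin
// Hypothetical stand-ins: the real ModelFamily and network loaders live in
// the repo and have richer signatures than these stubs.
enum class ModelFamily { LLAMA, QWEN, MISTRAL }

// Placeholder for the converted weights handed to a network loader.
class Weights(val arch: String)

// Placeholder for whatever a network loader returns.
data class Network(val loader: String, val ropeStyle: String, val qkNorm: Boolean)

object QwenNetworkLoader {
    // Qwen: NEOX-style RoPE plus QK-norm, as the PR description states.
    fun fromWeights(w: Weights) = Network("QwenNetworkLoader", "NEOX", qkNorm = true)
}

object LlamaNetworkLoader {
    // The "default" RoPE label is illustrative only.
    fun fromWeights(w: Weights) = Network("LlamaNetworkLoader", "default", qkNorm = false)
}

// QWEN gets its own loader; every other family falls through to the LLaMA loader.
fun loadNetwork(family: ModelFamily, weights: Weights): Network =
    when (family) {
        ModelFamily.QWEN -> QwenNetworkLoader.fromWeights(weights)
        else -> LlamaNetworkLoader.fromWeights(weights)
    }

fun main() {
    println(loadNetwork(ModelFamily.QWEN, Weights("qwen3")).loader)
    println(loadNetwork(ModelFamily.MISTRAL, Weights("mistral")).loader)
}
```

Keeping the `else` arm means a family without its own loader (Mistral in this sketch) is handled by the LLaMA loader, which matches the "else → LlamaNetworkLoader.fromWeights" rule in the description.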

Imports + deps cleaned

  • Removed: CpuAttentionBackend, LlamaRuntime, LlamaWeightMapper, MemSegWeightConverter.
  • Added: :llm-inference:qwen dep (was missing — the legacy hybrid-Qwen path didn't need an explicit dep on the Qwen module).
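
For illustration, the added dependency might look like the fragment below. This assumes a Gradle Kotlin DSL build script and a plain `implementation` configuration; the actual build file in this repo may be Groovy or may declare the dependency inside a multiplatform source set instead.

```kotlin
// Sketch of the skainet-cli build script (configuration name is an assumption).
dependencies {
    // Needed now that the CLI calls QwenNetworkLoader directly.
    implementation(project(":llm-inference:qwen"))
}
```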

Test plan

  • :llm-apps:skainet-cli:build, :llm-runtime:kllama:jvmTest, :llm-inference:qwen:jvmTest, :llm-inference:llama:jvmTest — all pass.
  • CI green on PR.
  • Manual (post-merge): skainet-cli with a real Qwen3 / Llama / Mistral GGUF; verify coherent output.

Numerical equivalence with the legacy path on identical weights is pinned by QwenDslLegacyParityTest (#120).
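
As a rough idea of what such a parity check involves, the sketch below compares two logit vectors element-wise against a tolerance. It is not the code of `QwenDslLegacyParityTest`: the helper names, the `1e-4f` tolerance, and the sample values are all assumptions made for illustration.

```kotlin
import kotlin.math.abs

// Largest element-wise absolute difference between two logit vectors.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "logit vectors differ in length" }
    var worst = 0f
    for (i in a.indices) worst = maxOf(worst, abs(a[i] - b[i]))
    return worst
}

// True when the two paths agree within the (assumed) tolerance.
fun logitsMatch(legacy: FloatArray, dsl: FloatArray, tol: Float = 1e-4f): Boolean =
    maxAbsDiff(legacy, dsl) <= tol

fun main() {
    // Made-up values standing in for logits from the legacy and DSL paths.
    val legacy = floatArrayOf(0.5f, -1.25f, 3.0f)
    val dsl = floatArrayOf(0.5f, -1.25f, 3.0f)
    println(logitsMatch(legacy, dsl))
}
```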

🤖 Generated with Claude Code

Phase 5b consumer migration. Mirrors PR #122 (kllama CLI) and #123
(KLlamaJava facade). After this merge, no top-level CLI in this repo
constructs `LlamaRuntime` for the GGUF path.

`skainet-cli` previously routed Gemma + Apertus through DSL but kept
LLaMA / Qwen / Mistral on the legacy `LlamaRuntime` + `CpuAttentionBackend`
+ `LlamaWeightMapper` + `MemSegWeightConverter` chain. This PR collapses
the else branch onto the DSL path:

- `DecoderGgufWeightLoader(NATIVE_OPTIMIZED, family.architectures + [arch])`
  → `DecoderGgufMemSegConverter.convert` → per-family network loader
  → `OptimizedLLMRuntime` DIRECT mode.
- Family dispatch on the DSL side: `ModelFamily.QWEN` →
  `QwenNetworkLoader.fromWeights` (NEOX RoPE + QK-norm), else →
  `LlamaNetworkLoader.fromWeights`. Previously this CLI handled Qwen
  via the `LlamaRuntime`-with-detected-flags hybrid that the kllama
  CLI also used pre-#121 — same architectural collapse here.

Imports cleaned: removed `CpuAttentionBackend`, `LlamaRuntime`,
`LlamaWeightMapper`, `MemSegWeightConverter`. Added `:llm-inference:qwen`
to the build.gradle dependencies (it was missing because the legacy
hybrid-Qwen path didn't need an explicit dependency on the Qwen module).

Numerical equivalence with the legacy path on identical weights is
pinned by `QwenDslLegacyParityTest` (#120).

Tests pass: `:llm-apps:skainet-cli:build`, `:llm-runtime:kllama:jvmTest`,
`:llm-inference:qwen:jvmTest`, `:llm-inference:llama:jvmTest`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>