feat(kllama-java): swap KLlamaJava facade to DSL path #123

Merged
michalharakal merged 1 commit into develop from feat/kllamajava-dsl-swap
May 4, 2026

@michalharakal
Contributor

Phase 5b consumer migration. Mirrors PR #122 (kllama CLI) for the Java facade.

What changes

  • KLlamaJava.loadGGUF: DecoderGgufWeightLoader(NATIVE_OPTIMIZED, {"llama", "mistral"}) → DecoderGgufMemSegConverter.convert → LlamaNetworkLoader.fromWeights → OptimizedLLMRuntime DIRECT mode. Same path the CLI uses.
  • KLlamaJava.loadSafeTensors: DecoderSafeTensorsLoader<FP32>(ctx, FP32::class, metadata, tiedEmbeddings).loadToMap { … } → LlamaNetworkLoader.fromWeights → OptimizedLLMRuntime.
  • KLlamaSession.runtime type loosened from LlamaRuntime<FP32> (legacy concrete) to InferenceRuntime<FP32> (interface). The session's generate(...) methods only call reset() and the generateUntilStop extension — both available on the interface — so it's non-breaking for internal callers.
  • tokenizer.eosId → tokenizer.eosTokenId at both call sites (interface property; the deprecated GGUFTokenizer-specific alias goes away with the eventual GGUFTokenizer cleanup).
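
For reviewers who want to see why loosening the runtime field to an interface type is non-breaking, here is a minimal sketch. Every type in it is a simplified, hypothetical stand-in (non-generic, operating on toy integer tokens), not the real skainet API; only the names reset, generateUntilStop and eosTokenId are taken from the description above.

```kotlin
// Hypothetical stand-in for the runtime interface; the real one is generic
// (InferenceRuntime<FP32>) and has a different forward signature.
interface InferenceRuntime {
    fun reset()
    fun forward(tokenId: Int): Int // toy model: returns the next token id
}

// Extension declared on the interface, so every implementation gets it.
fun InferenceRuntime.generateUntilStop(prompt: List<Int>, eosTokenId: Int, maxTokens: Int): List<Int> {
    require(prompt.isNotEmpty()) { "prompt must not be empty" }
    var next = eosTokenId
    for (token in prompt) next = forward(token) // feed the prompt
    val generated = mutableListOf<Int>()
    while (generated.size < maxTokens && next != eosTokenId) {
        generated += next
        next = forward(next)
    }
    return generated
}

// Toy implementation: "predicts" tokenId + 1 and counts forward steps.
class ToyRuntime : InferenceRuntime {
    var steps = 0
        private set
    override fun reset() { steps = 0 }
    override fun forward(tokenId: Int): Int { steps++; return tokenId + 1 }
}

// The session depends only on the interface, so swapping the concrete
// runtime behind it does not change any call site in generate().
class Session(internal val runtime: InferenceRuntime, private val eosTokenId: Int) {
    fun generate(prompt: List<Int>, maxTokens: Int): List<Int> {
        runtime.reset()
        return runtime.generateUntilStop(prompt, eosTokenId, maxTokens)
    }
}

fun main() {
    val session = Session(ToyRuntime(), eosTokenId = 5)
    println(session.generate(listOf(1, 2), maxTokens = 10)) // prints [3, 4]
}
```

Because generate touches nothing beyond the interface, any runtime that implements it can be substituted without changing the session's public surface.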

Scope notes

  • KLlamaJava now accepts {"llama", "mistral"} (was {"llama"} via LlamaIngestion's default — minor scope expansion to match common Llama-derivative usage). Qwen-family GGUFs are still rejected here; the kllama CLI is the entry point with Qwen dispatch.
  • LlamaIngestionBlocking.kt (Java-friendly suspend wrapper around LlamaIngestion) is unchanged. Still used by :llm-apps:skainet-cli — next migration target.
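
The architecture gate described in the scope notes can be pictured as a simple allow-list check. The sketch below is illustrative only: requireSupportedArchitecture and ACCEPTED_ARCHITECTURES are hypothetical names, and how the real loader reads the architecture string and reports a rejection is an assumption.

```kotlin
// Allow-list taken from this PR's description; the constant name is hypothetical.
val ACCEPTED_ARCHITECTURES: Set<String> = setOf("llama", "mistral")

// Hypothetical helper: returns the normalized architecture or throws.
fun requireSupportedArchitecture(
    architecture: String?,
    accepted: Set<String> = ACCEPTED_ARCHITECTURES,
): String {
    val arch = architecture?.lowercase()
        ?: throw IllegalArgumentException("GGUF file declares no architecture")
    require(arch in accepted) {
        "Unsupported GGUF architecture '$arch'; accepted: $accepted"
    }
    return arch
}

fun main() {
    println(requireSupportedArchitecture("mistral")) // prints mistral
    // A Qwen-family file is rejected by this facade.
    println(runCatching { requireSupportedArchitecture("qwen2") }.isFailure) // prints true
}
```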

Test plan

  • :llm-runtime:kllama:jvmTest passes.
  • Compile clean.
  • CI green on PR.
  • Manual (post-merge): run a Java consumer (e.g. llm-test-java smoke if present) loading a real Llama GGUF; verify output coherence.

Numerical parity with the legacy path on identical weights is pinned by QwenDslLegacyParityTest (#120) — same LlamaNetworkLoader.fromWeights codepath the CLI swaps use.

🤖 Generated with Claude Code

Phase 5b consumer migration. Mirrors PR #122 (kllama CLI) for the
Java-facing facade.

- `KLlamaJava.loadGGUF` and `loadSafeTensors` both use
  `LlamaNetworkLoader.fromWeights` + `OptimizedLLMRuntime` DIRECT mode
  via `DecoderGgufWeightLoader` + `DecoderGgufMemSegConverter`
  (GGUF) or `DecoderSafeTensorsLoader` (SafeTensors).
- `KLlamaSession.runtime` type loosened from `LlamaRuntime<FP32>`
  (legacy concrete) to `InferenceRuntime<FP32>` (interface). The
  session's generate methods only used `reset()` and
  `generateUntilStop(...)` — both available on the interface — so
  this is non-breaking for `internal` callers and Java consumers
  (the field is `internal val`).
- KLlamaJava now declares its accepted GGUF architectures as
  `setOf("llama", "mistral")` (was `setOf("llama")` via
  `LlamaIngestion`'s default — minor scope expansion to match common
  Llama-derivative usage). Qwen-family GGUFs are still rejected here;
  the kllama CLI is the entry point with Qwen dispatch.

`tokenizer.eosId` (deprecated `GGUFTokenizer`-specific alias) replaced
with `tokenizer.eosTokenId` (interface property) at both call sites.
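
A minimal sketch of the alias pattern this replacement retires, assuming the deprecated property simply forwards to the interface property. Tokenizer and GgufLikeTokenizer are hypothetical stand-ins, not the real classes.

```kotlin
// Hypothetical stand-in for the tokenizer interface.
interface Tokenizer {
    val eosTokenId: Int
}

// Hypothetical stand-in for a GGUF-specific tokenizer that still carries
// the old alias. Assumption: the alias only forwards to eosTokenId.
class GgufLikeTokenizer(override val eosTokenId: Int) : Tokenizer {
    @Deprecated("Use eosTokenId", ReplaceWith("eosTokenId"))
    val eosId: Int
        get() = eosTokenId
}

// Call sites written against the interface keep compiling once the
// alias is deleted from the concrete class.
fun stopToken(tokenizer: Tokenizer): Int = tokenizer.eosTokenId

fun main() {
    println(stopToken(GgufLikeTokenizer(eosTokenId = 2))) // prints 2
}
```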

After this PR, `KLlamaJava` no longer references `LlamaIngestion`,
`LlamaLoadConfig`, `LlamaRuntime`, `MemSegWeightConverter`,
`CpuAttentionBackend`, `LlamaRuntimeWeights`, or
`loadLlamaRuntimeWeightsStreaming`. `LlamaIngestionBlocking.kt`
remains (still used by `:llm-apps:skainet-cli`) and is the next
migration target.

All `:llm-runtime:kllama` tests pass. Numerical equivalence with
the legacy path on identical weights remains pinned by
`QwenDslLegacyParityTest` (#120) — same `LlamaNetworkLoader.fromWeights`
codepath as the CLI swap (#121, #122).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>