Expand README with features section and API documentation #103

Merged
bernardladenthin merged 2 commits into master from claude/review-readme-docs-biT1j
May 7, 2026
Conversation

@bernardladenthin
Owner

Summary

This PR significantly expands the README documentation to provide a more comprehensive overview of the library's capabilities and usage patterns. The changes reorganize the table of contents, add a dedicated Features section, and include new documentation for chat completion, embeddings/reranking, and raw JSON endpoints.

Key Changes

  • Added Features section - Highlights key capabilities including text completion, chat completion, embeddings, reranking, infilling, tokenization, grammar conversion, raw JSON endpoints, model metadata access, and platform/acceleration support
  • Reorganized table of contents - Restructured to include the new Features section as the first item, with updated numbering for all subsequent sections
  • Expanded Documentation - Added three new subsections under Documentation:
    • Chat Completion with code examples showing streaming and blocking modes
    • Embeddings & Reranking with usage examples
    • Raw JSON Endpoints describing the available handler methods and server state management
  • Removed outdated content - Removed the "Download" section with the JAR badge and the Gemma 3/4 support note
  • Enhanced Infilling section - Kept existing content but positioned it after the new Chat Completion section

Notable Details

  • Chat Completion examples demonstrate both streaming (generateChat()) and blocking (chatComplete()) patterns
  • Embeddings section shows how to enable embedding mode and retrieve sentence embeddings
  • Raw JSON Endpoints section documents the full set of handler methods and server management APIs
  • All code examples follow the existing documentation style and patterns

https://claude.ai/code/session_01Phsbbq9JdFU24F9PGwG1wf

claude added 2 commits May 6, 2026 07:58
The README was missing key features added since the chat-integration and
JSON-bridge refactors: chat completion (chatComplete/generateChat),
embeddings/reranking helpers, raw JSON endpoint handlers, model metadata,
and server management. Adds short overview-level sections for each, drops
the broken dist/ download link and the stale Gemma 3/4 banner.

New maintainer fork with breaking changes (groupId/package rename
de.kherud → net.ladenthin, AutoCloseable LlamaIterator, canonical-format
rerank scores, new LlamaOutput.stopReason field) warrants a major bump.
SNAPSHOT marks the in-development line until the first 5.0.0 release.
@bernardladenthin bernardladenthin merged commit 504f9e7 into master May 7, 2026
16 checks passed
@bernardladenthin bernardladenthin deleted the claude/review-readme-docs-biT1j branch May 7, 2026 06:34
bernardladenthin pushed a commit that referenced this pull request May 22, 2026
Fetches verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from
github.com/kherud/java-llama.cpp and appends a Verification plan section
with: (a) a table of new info extracted from each issue body, (b) four
concrete JUnit test sketches that would close out #80, #95, #98, #102,
(c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the
corresponding action (feature, docs, CI matrix), (d) a recommended PR
sequencing.

Notable finding: #98's original repro did not call enableEmbedding()
at all — the binding never forwarded --embedding to the upstream
server-context, so the result_output assertion fired because the
embedding pipeline was never initialised. enableEmbedding() now
exists in ModelParameters (line 1040), so the fix is essentially
code-confirmed; an integration test against nomic-embed-text is
optional confirmation.
bernardladenthin added a commit that referenced this pull request May 22, 2026
)

* Enrich open-issues baseline with current-fork status

Appends a Status in fork subsection to each of the 37 upstream issues with
a verdict, file:line evidence, and next steps; adds a Status overview
table summarising verdicts across all issues.

* Add deep-dive analysis for likely/partially fixed issues

Appends a per-issue Deep-dive analysis block to each of the 9
LIKELY FIXED / PARTIALLY FIXED entries, and adds a top-level Deep-dive
verdict guide categorising which issues are confirmable from code
inspection, which need one targeted JUnit test, and which genuinely
require platform-specific runtime reproduction.

Updates the Status overview table for #121 (FIXED for 64-bit Android)
and #86 (CUDA jar requires libcudart at runtime, not auto-fallback).

* Add verification plan with original-issue research and test sketches

Fetches verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from
github.com/kherud/java-llama.cpp and appends a Verification plan section
with: (a) a table of new info extracted from each issue body, (b) four
concrete JUnit test sketches that would close out #80, #95, #98, #102,
(c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the
corresponding action (feature, docs, CI matrix), (d) a recommended PR
sequencing.

Notable finding: #98's original repro did not call enableEmbedding()
at all — the binding never forwarded --embedding to the upstream
server-context, so the result_output assertion fired because the
embedding pipeline was never initialised. enableEmbedding() now
exists in ModelParameters (line 1040), so the fix is essentially
code-confirmed; an integration test against nomic-embed-text is
optional confirmation.

---------

Co-authored-by: Claude <noreply@anthropic.com>
bernardladenthin pushed a commit that referenced this pull request May 22, 2026
Updates docs/history/49be664_open_issues.md to reflect that the four
JUnit regression tests called for in the verification plan have been
added on this branch:

- Deep-dive verdict guide now lists each test name and self-skip
  behaviour next to its issue bullet
- Per-issue Status blocks for #80, #95, #98, #102 annotated as
  "LIKELY FIXED -> FIXED on CI green" with the covering test
- Status overview table rows for the same four issues updated
- "What the original issues actually contain" feasibility table marks
  all four as DONE with the commit reference
- "Concrete test plan" gains a status callout noting the as-shipped
  implementation matches the sketches
- "Recommended sequencing" step 1 marked DONE and enumerates what
  shipped; remaining steps (#86 docs, #103/#34 typed image API, Android
  emulator CI) carried forward as the next deliverables

No code or behaviour change, documentation only.

https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW
bernardladenthin added a commit that referenced this pull request May 22, 2026
* test: add JUnit regressions for kherud open issues #80, #95, #98, #102

Adds four small JUnit tests proposed in the verification plan section of
docs/history/49be664_open_issues.md to upgrade the corresponding upstream
issues from LIKELY FIXED to FIXED:

- MemoryManagementTest#testOpenCloseLoopDoesNotLeak (#102) - 20-iteration
  open/close loop; on Linux asserts VmRSS delta < 200 MB. Degenerates to
  a no-crash smoke test on non-Linux hosts where /proc/self/status is
  absent.
- MemoryManagementTest#testOpenCloseWithoutGeneration (#80) - 20 open +
  immediate close without any generation, exercises the half-initialised
  worker race closed by the double server.terminate() in jllama.cpp.
- LlamaModelTest#testIteratorTerminatesOnRepetitivePrompt (#95) - asserts
  the iterator terminates within nPredict+1 steps on a deliberately
  repetitive prompt.
- LlamaEmbeddingsTest#testNomicEmbedLoads (#98) - gated on system
  property net.ladenthin.llama.nomic.path; reproduces the reporter's
  batch/ubatch config plus the fix (enableEmbedding()), and asserts a
  768-dim vector for nomic-embed-text-v1.5.
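
The VmRSS check described for #102 can be sketched roughly as follows. This is a minimal illustration rather than the code in MemoryManagementTest: the class name RssProbe and its method names are hypothetical, and only the /proc/self/status source, the 200 MB limit, and the fallback to a smoke test on non-Linux hosts come from the commit message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper; not part of the library or its test suite.
final class RssProbe {

    private RssProbe() {}

    /** Parses the VmRSS value (in kB) from /proc/self/status lines, or returns -1 if absent. */
    public static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical form: "VmRSS:\t  123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    /** Returns the resident set size in kB, or -1 where /proc/self/status cannot be read. */
    public static long currentRssKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1L; // non-Linux host
        }
        try {
            return parseVmRssKb(Files.readAllLines(status, StandardCharsets.UTF_8));
        } catch (IOException | NumberFormatException e) {
            return -1L;
        }
    }

    /** True when RSS growth stays under limitMb, or when no measurement is available. */
    public static boolean withinBudget(long beforeKb, long afterKb, long limitMb) {
        if (beforeKb < 0 || afterKb < 0) {
            return true; // nothing to measure: degrade to a no-crash smoke test
        }
        return (afterKb - beforeKb) < limitMb * 1024L;
    }
}
```

A test along these lines would call currentRssKb() before and after the open/close loop and assert withinBudget(before, after, 200).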

Wires up the optional nomic GGUF download in the linux-x86_64 Java test
job in .github/workflows/publish.yml. Other test jobs cleanly self-skip
via Assume because the system property is unset.
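
The self-skip behaviour can be approximated with a small resolver like the one below. It is a sketch under assumptions: OptionalModel and its methods are hypothetical names, and the actual tests may gate differently. Only the use of a system property such as net.ladenthin.llama.nomic.path and skipping via Assume are taken from the commit message.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper; not part of the library or its test suite.
final class OptionalModel {

    private OptionalModel() {}

    /** Resolves a model path from a system property; empty if unset, blank, or not a regular file. */
    public static Optional<Path> resolve(String propertyName) {
        String value = System.getProperty(propertyName);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            Path path = Paths.get(value);
            return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
        } catch (InvalidPathException e) {
            return Optional.empty();
        }
    }

    /** True only when every named property points at an existing regular file. */
    public static boolean allPresent(String... propertyNames) {
        for (String name : propertyNames) {
            if (!resolve(name).isPresent()) {
                return false;
            }
        }
        return true;
    }
}
```

In a JUnit 4 test, the result would typically feed org.junit.Assume.assumeTrue(...), so a job without the property set reports the test as skipped rather than failed.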

Documents the local native-build workflow in CLAUDE.md - per-host output
paths, mvn-cmake handoff, optional model handling, and the
restricted-network caveat for environments that block huggingface.co.

https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

* docs: record #80/#95/#98/#102 regression tests added in 713d426

Updates docs/history/49be664_open_issues.md to reflect that the four
JUnit regression tests called for in the verification plan have been
added on this branch:

- Deep-dive verdict guide now lists each test name and self-skip
  behaviour next to its issue bullet
- Per-issue Status blocks for #80, #95, #98, #102 annotated as
  "LIKELY FIXED -> FIXED on CI green" with the covering test
- Status overview table rows for the same four issues updated
- "What the original issues actually contain" feasibility table marks
  all four as DONE with the commit reference
- "Concrete test plan" gains a status callout noting the as-shipped
  implementation matches the sketches
- "Recommended sequencing" step 1 marked DONE and enumerates what
  shipped; remaining steps (#86 docs, #103/#34 typed image API, Android
  emulator CI) carried forward as the next deliverables

No code or behaviour change, documentation only.

https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

---------

Co-authored-by: Claude <noreply@anthropic.com>
bernardladenthin added a commit that referenced this pull request May 22, 2026
Adds a forward-looking section at the bottom of the README with three
bullets pointing readers at the docs where the detail already lives:

- The Kotlin Llama Stack client feature inventory
  (docs/feature-investigation-llama-stack-client-kotlin.md), so
  candidate features (multimodal image input, typed chat, async API,
  batch inference, typed usage/timings) are discoverable.
- The goal of shipping a first-class Android-capable Maven artifact —
  tied to the existing opencl-android-aarch64 classifier — that would
  let downstream Android projects drop ogx-ai/llama-stack-client-kotlin.
- The ongoing work of resolving all 37 upstream kherud/java-llama.cpp
  open issues (docs/history/49be664_open_issues.md), with explicit
  cross-references to #103 / #34 (VLM / multimodal image input, both
  PARTIALLY FIXED) — the same image-input work that closes §2.1 of the
  Kotlin inventory.

Also adds a matching TOC entry.

Co-authored-by: Claude <noreply@anthropic.com>
bernardladenthin pushed a commit that referenced this pull request May 23, 2026
Adds vision-capable model + matching mmproj + a CC0/PD test image to all
four Java test jobs (Linux x86_64, macOS arm64 with/without Metal,
Windows x86_64) and a model-gated MultimodalIntegrationTest that proves
the typed ChatMessage(role, List<ContentPart>) surface from PR #189
round-trips through the upstream mtmd pipeline end-to-end.

CI changes (.github/workflows/publish.yml)
- New env vars: VISION_MODEL_URL / VISION_MODEL_NAME pointing at
  ggml-org/SmolVLM-500M-Instruct-Q8_0.gguf (smallest reliable vision
  GGUF on community ggml-org), VISION_MMPROJ_URL / _NAME for the
  matching mmproj, VISION_IMAGE_URL / _NAME for a small PD red-apple
  image from Wikimedia Commons.
- Each of the four Java test jobs gains three download steps and three
  -D system properties on the mvn test invocation:
  -Dnet.ladenthin.llama.vision.model / .mmproj / .image.

Validation scripts
- validate-models.sh refactored into validate_gguf() + validate_image()
  helpers with a 'required' vs 'optional' mode. Required models still
  fail-fast; the new vision GGUFs and PD image are validated only when
  present so jobs that skip them keep passing.
- validate-models.bat extended with a parallel OPTIONAL_MODELS loop.
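
The 'required' vs 'optional' distinction can be illustrated with the Java sketch below. The real validation lives in shell and batch scripts whose checks are not shown here, so everything in this example is an assumption: GgufCheck, Mode, and Result are hypothetical names, and the only check performed is the 4-byte "GGUF" magic at the start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical Java analogue of a required/optional validation mode.
final class GgufCheck {

    public enum Mode { REQUIRED, OPTIONAL }

    public enum Result { OK, SKIPPED, FAILED }

    // GGUF files start with the ASCII bytes "GGUF".
    private static final byte[] MAGIC = {'G', 'G', 'U', 'F'};

    private GgufCheck() {}

    /** A missing file fails in REQUIRED mode and is skipped in OPTIONAL mode; a present file must carry the magic. */
    public static Result validate(Path file, Mode mode) {
        if (!Files.isRegularFile(file)) {
            return mode == Mode.OPTIONAL ? Result.SKIPPED : Result.FAILED;
        }
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file shorter than the magic
                }
                read += n;
            }
            if (read < header.length) {
                return Result.FAILED;
            }
        } catch (IOException e) {
            return Result.FAILED;
        }
        return Arrays.equals(header, MAGIC) ? Result.OK : Result.FAILED;
    }
}
```

Note the design choice: an optional file that is present but corrupt still fails, so a partial or broken download is not silently ignored.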

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Self-skips via Assume when any of the three -D paths is unset or its
  file is missing, so local mvn test stays green without the artifacts.
- multimodalRequestProducesNonEmptyReply: builds a ChatMessage.userMultimodal
  with ContentPart.text(...) + ContentPart.imageFile(Paths.get(image)),
  calls chatCompleteText, asserts non-empty reply. Does NOT assert reply
  semantics - a 500M model can caption inaccurately and CI must
  not flap on model quality.
- multimodalThenTextOnSameModel: sanity check that a multimodal call
  followed by a text-only call on the same model both succeed (catches
  any parts/legacy split poisoning the inference context).

TestConstants gains PROP_VISION_MODEL_PATH / PROP_VISION_MMPROJ_PATH /
PROP_VISION_IMAGE_PATH so the test reads the system properties via the
same naming pattern as PROP_NOMIC_MODEL_PATH.

Docs
- docs/history/49be664_open_issues.md: #103 and #34 PARTIALLY FIXED ->
  FIXED in the per-issue blocks, the verdict guide, the status overview
  table, the deep-dive table, the cannot-be-closed-by-unit-tests-alone
  table, and the recommended-sequencing list. Bottom-line summary
  updated to reflect that 0 of the original LIKELY/PARTIALLY FIXED items
  remain partially fixed.
- (docs/feature-investigation-llama-stack-client-kotlin.md §2.1 was
  already updated in the PR-189 typed-multimodal-surface commit.)

Verified locally
- mvn test-compile: clean.
- mvn test -Dtest=MultimodalIntegrationTest: SKIPPED (no -D properties
  set; expected self-skip path).
- mvn javadoc:jar: BUILD SUCCESS.