Expand README with features section and API documentation by bernardladenthin · Pull Request #103 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-05-06T08:03:10Z

Summary

This PR significantly expands the README documentation to provide a more comprehensive overview of the library's capabilities and usage patterns. The changes reorganize the table of contents, add a dedicated Features section, and include new documentation for chat completion, embeddings/reranking, and raw JSON endpoints.

Key Changes

- Added Features section: highlights key capabilities including text completion, chat completion, embeddings, reranking, infilling, tokenization, grammar conversion, raw JSON endpoints, model metadata access, and platform/acceleration support
- Reorganized table of contents: restructured to include the new Features section as the first item, with updated numbering for all subsequent sections
- Expanded Documentation: added three new subsections under Documentation:
  - Chat Completion with code examples showing streaming and blocking modes
  - Embeddings & Reranking with usage examples
  - Raw JSON Endpoints describing the available handler methods and server state management
- Removed outdated content: removed the "Download" section with the JAR badge and the Gemma 3/4 support note
- Enhanced Infilling section: kept existing content but positioned it after the new Chat Completion section

Notable Details

- Chat Completion examples demonstrate both streaming (generateChat()) and blocking (chatComplete()) patterns
- Embeddings section shows how to enable embedding mode and retrieve sentence embeddings
- Raw JSON Endpoints section documents the full set of handler methods and server management APIs
- All code examples follow the existing documentation style and patterns

https://claude.ai/code/session_01Phsbbq9JdFU24F9PGwG1wf

The README was missing key features added since the chat-integration and JSON-bridge refactors: chat completion (chatComplete/generateChat), embeddings/reranking helpers, raw JSON endpoint handlers, model metadata, and server management. Adds short overview-level sections for each, and drops the broken dist/ download link and the stale Gemma 3/4 banner.

New maintainer fork with breaking changes (groupId/package rename de.kherud → net.ladenthin, AutoCloseable LlamaIterator, canonical-format rerank scores, new LlamaOutput.stopReason field) warrants a major bump. SNAPSHOT marks the in-development line until the first 5.0.0 release.
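As a rough illustration of why an AutoCloseable iterator changes calling code, here is a minimal sketch of the try-with-resources pattern such an iterator permits. `TokenIterator` and `takeFirst` are hypothetical stand-ins written for this example; they are not the library's `LlamaIterator`, and the cleanup performed in `close()` is only assumed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CloseableIteratorSketch {

    /** Hypothetical stand-in for an iterator that owns a resource needing explicit release. */
    static final class TokenIterator implements Iterator<String>, AutoCloseable {
        private final Iterator<String> delegate;
        private boolean closed;

        TokenIterator(List<String> tokens) {
            this.delegate = tokens.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public String next() {
            if (closed) {
                throw new NoSuchElementException("iterator already closed");
            }
            return delegate.next();
        }

        @Override
        public void close() {
            // Assumption: a real binding would cancel or release native work here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Consumes at most maxTokens tokens; try-with-resources closes the iterator on exit. */
    static String takeFirst(TokenIterator iterator, int maxTokens) {
        StringBuilder out = new StringBuilder();
        try (TokenIterator it = iterator) {
            int taken = 0;
            while (it.hasNext() && taken < maxTokens) {
                out.append(it.next());
                taken++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TokenIterator it = new TokenIterator(Arrays.asList("Hello", ", ", "world"));
        System.out.println(takeFirst(it, 2));
        System.out.println("closed: " + it.isClosed());
    }
}
```

The point of the pattern is that abandoning the loop early (as `takeFirst` does) still releases whatever the iterator holds, which a plain `Iterator` cannot guarantee.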


* Enrich open-issues baseline with current-fork status

  Appends a Status in fork subsection to each of the 37 upstream issues with a verdict, file:line evidence, and next steps; adds a Status overview table summarising verdicts across all issues.

* Add deep-dive analysis for likely/partially fixed issues

  Appends a per-issue Deep-dive analysis block to each of the 9 LIKELY FIXED / PARTIALLY FIXED entries, and adds a top-level Deep-dive verdict guide categorising which issues are confirmable from code inspection, which need one targeted JUnit test, and which genuinely require platform-specific runtime reproduction. Updates the Status overview table for #121 (FIXED for 64-bit Android) and #86 (CUDA jar requires libcudart at runtime, not auto-fallback).

* Add verification plan with original-issue research and test sketches

  Fetched verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from github.com/kherud/java-llama.cpp and appended a Verification plan section with:
  (a) a table of new info extracted from each issue body,
  (b) four concrete JUnit test sketches that would close out #80, #95, #98, #102,
  (c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the corresponding action (feature, docs, CI matrix),
  (d) a recommended PR sequencing.

  Notable finding: #98's original repro did not call enableEmbedding() at all. The binding never forwarded --embedding to the upstream server-context, so the result_output assertion fired because the embedding pipeline was never initialised. enableEmbedding() now exists in ModelParameters (line 1040), so the fix is essentially code-confirmed; an integration test against nomic-embed-text is optional confirmation.

---------

Co-authored-by: Claude <noreply@anthropic.com>


* test: add JUnit regressions for kherud open issues #80, #95, #98, #102

  Adds four small JUnit tests proposed in the verification plan section of docs/history/49be664_open_issues.md to upgrade the corresponding upstream issues from LIKELY FIXED to FIXED:

  - MemoryManagementTest#testOpenCloseLoopDoesNotLeak (#102): 20-iteration open/close loop; on Linux asserts VmRSS delta < 200 MB. Degenerates to a no-crash smoke test on non-Linux hosts where /proc/self/status is absent.
  - MemoryManagementTest#testOpenCloseWithoutGeneration (#80): 20 open + immediate close without any generation, exercises the half-initialised worker race closed by the double server.terminate() in jllama.cpp.
  - LlamaModelTest#testIteratorTerminatesOnRepetitivePrompt (#95): asserts the iterator terminates within nPredict+1 steps on a deliberately repetitive prompt.
  - LlamaEmbeddingsTest#testNomicEmbedLoads (#98): gated on system property net.ladenthin.llama.nomic.path; reproduces the reporter's batch/ubatch config plus the fix (enableEmbedding()), and asserts a 768-dim vector for nomic-embed-text-v1.5.

  Wires up the optional nomic GGUF download in the linux-x86_64 Java test job in .github/workflows/publish.yml. Other test jobs cleanly self-skip via Assume because the system property is unset.

  Documents the local native-build workflow in CLAUDE.md: per-host output paths, mvn-cmake handoff, optional model handling, and the restricted-network caveat for environments that block huggingface.co.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

* docs: record #80/#95/#98/#102 regression tests added in 713d426

  Updates docs/history/49be664_open_issues.md to reflect that the four JUnit regression tests called for in the verification plan have been added on this branch:

  - Deep-dive verdict guide now lists each test name and self-skip behaviour next to its issue bullet
  - Per-issue Status blocks for #80, #95, #98, #102 annotated as "LIKELY FIXED -> FIXED on CI green" with the covering test
  - Status overview table rows for the same four issues updated
  - "What the original issues actually contain" feasibility table marks all four as DONE with the commit reference
  - "Concrete test plan" gains a status callout noting the as-shipped implementation matches the sketches
  - "Recommended sequencing" step 1 marked DONE and enumerates what shipped; remaining steps (#86 docs, #103/#34 typed image API, Android emulator CI) carried forward as the next deliverables

  No code or behaviour change, documentation only.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

---------

Co-authored-by: Claude <noreply@anthropic.com>
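The VmRSS-based leak check described for testOpenCloseLoopDoesNotLeak could look roughly like the following. This is a minimal sketch, not the test as shipped: the class and method names (`RssProbeSketch`, `parseVmRssKb`, `rssGrowthKb`) are invented for illustration, and the model open/close step is left as an empty placeholder. Only the /proc/self/status source, the 20 iterations, and the 200 MB threshold come from the commit message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

public class RssProbeSketch {

    /** Extracts the VmRSS value (in kB) from the lines of /proc/self/status, if present. */
    static OptionalLong parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical form: "VmRSS:<whitespace>123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                try {
                    return OptionalLong.of(Long.parseLong(parts[0]));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Reads the current resident set size, or empty on hosts without /proc/self/status. */
    static OptionalLong currentRssKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return OptionalLong.empty();
        }
        try {
            return parseVmRssKb(Files.readAllLines(status, StandardCharsets.UTF_8));
        } catch (IOException e) {
            return OptionalLong.empty();
        }
    }

    /** Runs the action repeatedly; returns RSS growth in kB, or empty if it cannot be measured. */
    static OptionalLong rssGrowthKb(Runnable action, int iterations) {
        OptionalLong before = currentRssKb();
        for (int i = 0; i < iterations; i++) {
            action.run();
        }
        OptionalLong after = currentRssKb();
        if (!before.isPresent() || !after.isPresent()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(after.getAsLong() - before.getAsLong());
    }

    public static void main(String[] args) {
        // Placeholder action: a real test would open and close a model here.
        OptionalLong growth = rssGrowthKb(() -> { }, 20);
        if (growth.isPresent()) {
            long limitKb = 200L * 1024L; // 200 MB expressed in kB
            System.out.println("RSS growth within limit: " + (growth.getAsLong() < limitKb));
        } else {
            System.out.println("VmRSS unavailable; loop ran as a no-crash smoke test");
        }
    }
}
```

Returning an empty OptionalLong when the file is missing keeps the loop itself running on every platform, so the same method body serves as both the leak assertion on Linux and the smoke test elsewhere.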

Adds a forward-looking section at the bottom of the README with three bullets pointing readers at the docs where the detail already lives:

- The Kotlin Llama Stack client feature inventory (docs/feature-investigation-llama-stack-client-kotlin.md), so candidate features (multimodal image input, typed chat, async API, batch inference, typed usage/timings) are discoverable.
- The goal of shipping a first-class Android-capable Maven artifact (tied to the existing opencl-android-aarch64 classifier) that would let downstream Android projects drop ogx-ai/llama-stack-client-kotlin.
- The ongoing work of resolving all 37 upstream kherud/java-llama.cpp open issues (docs/history/49be664_open_issues.md), with explicit cross-references to #103 / #34 (VLM / multimodal image input, both PARTIALLY FIXED), the same image-input work that closes §2.1 of the Kotlin inventory.

Also adds a matching TOC entry.

Co-authored-by: Claude <noreply@anthropic.com>

Adds vision-capable model + matching mmproj + a CC0/PD test image to all four Java test jobs (Linux x86_64, macOS arm64 with/without Metal, Windows x86_64) and a model-gated MultimodalIntegrationTest that proves the typed ChatMessage(role, List<ContentPart>) surface from PR #189 round-trips through the upstream mtmd pipeline end-to-end.

CI changes (.github/workflows/publish.yml)
- New env vars: VISION_MODEL_URL / VISION_MODEL_NAME pointing at ggml-org/SmolVLM-500M-Instruct-Q8_0.gguf (smallest reliable vision GGUF on community ggml-org), VISION_MMPROJ_URL / _NAME for the matching mmproj, VISION_IMAGE_URL / _NAME for a small PD red-apple image from Wikimedia Commons.
- Each of the four Java test jobs gains three download steps and three -D system properties on the mvn test invocation: -Dnet.ladenthin.llama.vision.model / .mmproj / .image.

Validation scripts
- validate-models.sh refactored into validate_gguf() + validate_image() helpers with a 'required' vs 'optional' mode. Required models still fail-fast; the new vision GGUFs and PD image are validated only when present so jobs that skip them keep passing.
- validate-models.bat extended with a parallel OPTIONAL_MODELS loop.

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Self-skips via Assume when any of the three -D paths is unset or its file is missing, so local mvn test stays green without the artifacts.
- multimodalRequestProducesNonEmptyReply: builds a ChatMessage.userMultimodal with ContentPart.text(...) + ContentPart.imageFile(Paths.get(image)), calls chatCompleteText, asserts non-empty reply. Does NOT assert reply semantics: a 500M model can caption inaccurately and CI must not flap on model quality.
- multimodalThenTextOnSameModel: sanity check that a multimodal call followed by a text-only call on the same model both succeed (catches any parts/legacy split poisoning the inference context).
- TestConstants gains PROP_VISION_MODEL_PATH / PROP_VISION_MMPROJ_PATH / PROP_VISION_IMAGE_PATH so the test reads the system properties via the same naming pattern as PROP_NOMIC_MODEL_PATH.

Docs
- docs/history/49be664_open_issues.md: #103 and #34 PARTIALLY FIXED -> FIXED in the per-issue blocks, the verdict guide, the status overview table, the deep-dive table, the cannot-be-closed-by-unit-tests-alone table, and the recommended-sequencing list. Bottom-line summary updated to reflect that 0 of the original LIKELY/PARTIALLY FIXED items remain partially fixed.
- (docs/feature-investigation-llama-stack-client-kotlin.md §2.1 was already updated in the PR-189 typed-multimodal-surface commit.)

Verified locally
- mvn test-compile: clean.
- mvn test -Dtest=MultimodalIntegrationTest: SKIPPED (no -D properties set; expected self-skip path).
- mvn javadoc:jar: BUILD SUCCESS.
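The self-skip rule described above (skip when any of the three -D paths is unset or its file is missing) can be sketched as a small gate. This is an illustration under assumptions, not the code in MultimodalIntegrationTest: the class `OptionalArtifactGate` and the method `resolveAll` are hypothetical, and the three property names are read off the -D flags quoted in the commit message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

public class OptionalArtifactGate {

    // Property names as quoted in the commit message's -D flags.
    static final String PROP_VISION_MODEL = "net.ladenthin.llama.vision.model";
    static final String PROP_VISION_MMPROJ = "net.ladenthin.llama.vision.mmproj";
    static final String PROP_VISION_IMAGE = "net.ladenthin.llama.vision.image";

    /**
     * Resolves every key to an existing regular file.
     * Returns empty if any property is unset, blank, or points at a missing file.
     */
    static Optional<List<Path>> resolveAll(Properties props, String... keys) {
        List<Path> paths = new ArrayList<>();
        for (String key : keys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                return Optional.empty();
            }
            Path path = Paths.get(value);
            if (!Files.isRegularFile(path)) {
                return Optional.empty();
            }
            paths.add(path);
        }
        return Optional.of(paths);
    }

    public static void main(String[] args) {
        Optional<List<Path>> artifacts = resolveAll(
                System.getProperties(), PROP_VISION_MODEL, PROP_VISION_MMPROJ, PROP_VISION_IMAGE);
        if (artifacts.isPresent()) {
            System.out.println("vision artifacts found: " + artifacts.get());
        } else {
            System.out.println("vision artifacts not configured; test would be skipped");
        }
    }
}
```

In a JUnit 4 test, the result would typically feed `Assume.assumeTrue(artifacts.isPresent())` at the top of each test method, so a missing artifact is reported as skipped rather than failed.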

claude added 2 commits May 6, 2026 07:58

bernardladenthin merged commit 504f9e7 into master May 7, 2026
16 checks passed

bernardladenthin deleted the claude/review-readme-docs-biT1j branch May 7, 2026 06:34

bernardladenthin mentioned this pull request May 22, 2026

docs: add deep-dive analysis and verification plan for open issues #184

Merged

6 tasks

bernardladenthin merged 2 commits into master from claude/review-readme-docs-biT1j
