Expand README with features section and API documentation #103
Merged
The README was missing key features added since the chat-integration and JSON-bridge refactors: chat completion (chatComplete/generateChat), embeddings/reranking helpers, raw JSON endpoint handlers, model metadata, and server management. This PR adds a short overview-level section for each, and drops the broken dist/ download link and the stale Gemma 3/4 banner.
New maintainer fork with breaking changes (groupId/package rename de.kherud → net.ladenthin, AutoCloseable LlamaIterator, canonical-format rerank scores, new LlamaOutput.stopReason field) warrants a major bump. SNAPSHOT marks the in-development line until the first 5.0.0 release.
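Of the breaking changes listed above, the AutoCloseable iterator is the one most likely to change calling code. The following is a minimal sketch of the pattern only: `ClosableTokenIterator` and `ClosableIteratorDemo` are hypothetical stand-ins backed by an in-memory list, not the library's `LlamaIterator`, and what the real `close()` releases is an assumption here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a closable token iterator; NOT the library's LlamaIterator. */
final class ClosableTokenIterator implements Iterator<String>, AutoCloseable {
    private final Iterator<String> tokens;
    private boolean closed;

    ClosableTokenIterator(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    @Override
    public boolean hasNext() {
        // Once closed, the iterator reports no further elements.
        return !closed && tokens.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return tokens.next();
    }

    @Override
    public void close() {
        // Assumption: a real binding would cancel or release native work here.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class ClosableIteratorDemo {
    /** Reads at most {@code limit} tokens; leaving the try block closes the iterator. */
    static ClosableTokenIterator readAtMost(List<String> source, int limit) {
        ClosableTokenIterator it = new ClosableTokenIterator(source);
        try (ClosableTokenIterator stream = it) {
            int read = 0;
            while (read < limit && stream.hasNext()) {
                stream.next();
                read++;
            }
        }
        return it;
    }
}
```

The point of the sketch is that a caller who stops reading early still triggers `close()` through try-with-resources, which a plain `Iterator` cannot guarantee.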
bernardladenthin pushed a commit that referenced this pull request May 22, 2026
Fetches the verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from github.com/kherud/java-llama.cpp and appends a Verification plan section with:
(a) a table of new info extracted from each issue body,
(b) four concrete JUnit test sketches that would close out #80, #95, #98, #102,
(c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the corresponding action (feature, docs, CI matrix),
(d) a recommended PR sequencing.

Notable finding: #98's original repro did not call enableEmbedding() at all. The binding never forwarded --embedding to the upstream server-context, so the result_output assertion fired because the embedding pipeline was never initialised. enableEmbedding() now exists in ModelParameters (line 1040), so the fix is essentially code-confirmed; an integration test against nomic-embed-text is optional confirmation.
bernardladenthin added a commit that referenced this pull request May 22, 2026
* Enrich open-issues baseline with current-fork status

  Appends a Status in fork subsection to each of the 37 upstream issues with a verdict, file:line evidence, and next steps; adds a Status overview table summarising verdicts across all issues.

* Add deep-dive analysis for likely/partially fixed issues

  Appends a per-issue Deep-dive analysis block to each of the 9 LIKELY FIXED / PARTIALLY FIXED entries, and adds a top-level Deep-dive verdict guide categorising which issues are confirmable from code inspection, which need one targeted JUnit test, and which genuinely require platform-specific runtime reproduction. Updates the Status overview table for #121 (FIXED for 64-bit Android) and #86 (CUDA jar requires libcudart at runtime, not auto-fallback).

* Add verification plan with original-issue research and test sketches

  Fetches the verbatim text of the LIKELY FIXED / PARTIALLY FIXED issues from github.com/kherud/java-llama.cpp and appends a Verification plan section with:
  (a) a table of new info extracted from each issue body,
  (b) four concrete JUnit test sketches that would close out #80, #95, #98, #102,
  (c) a non-unit-testable bucket for #34, #50, #86, #103, #121 with the corresponding action (feature, docs, CI matrix),
  (d) a recommended PR sequencing.

  Notable finding: #98's original repro did not call enableEmbedding() at all. The binding never forwarded --embedding to the upstream server-context, so the result_output assertion fired because the embedding pipeline was never initialised. enableEmbedding() now exists in ModelParameters (line 1040), so the fix is essentially code-confirmed; an integration test against nomic-embed-text is optional confirmation.

---------

Co-authored-by: Claude <noreply@anthropic.com>
bernardladenthin pushed a commit that referenced this pull request May 22, 2026
Updates docs/history/49be664_open_issues.md to reflect that the four JUnit regression tests called for in the verification plan have been added on this branch:

- Deep-dive verdict guide now lists each test name and self-skip behaviour next to its issue bullet
- Per-issue Status blocks for #80, #95, #98, #102 annotated as "LIKELY FIXED -> FIXED on CI green" with the covering test
- Status overview table rows for the same four issues updated
- "What the original issues actually contain" feasibility table marks all four as DONE with the commit reference
- "Concrete test plan" gains a status callout noting the as-shipped implementation matches the sketches
- "Recommended sequencing" step 1 marked DONE and enumerates what shipped; remaining steps (#86 docs, #103/#34 typed image API, Android emulator CI) carried forward as the next deliverables

No code or behaviour change, documentation only.

https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW
bernardladenthin added a commit that referenced this pull request May 22, 2026
* test: add JUnit regressions for kherud open issues #80, #95, #98, #102

  Adds four small JUnit tests proposed in the verification plan section of docs/history/49be664_open_issues.md to upgrade the corresponding upstream issues from LIKELY FIXED to FIXED:

  - MemoryManagementTest#testOpenCloseLoopDoesNotLeak (#102) - 20-iteration open/close loop; on Linux asserts VmRSS delta < 200 MB. Degenerates to a no-crash smoke test on non-Linux hosts where /proc/self/status is absent.
  - MemoryManagementTest#testOpenCloseWithoutGeneration (#80) - 20 open + immediate close without any generation, exercises the half-initialised worker race closed by the double server.terminate() in jllama.cpp.
  - LlamaModelTest#testIteratorTerminatesOnRepetitivePrompt (#95) - asserts the iterator terminates within nPredict+1 steps on a deliberately repetitive prompt.
  - LlamaEmbeddingsTest#testNomicEmbedLoads (#98) - gated on system property net.ladenthin.llama.nomic.path; reproduces the reporter's batch/ubatch config plus the fix (enableEmbedding()), and asserts a 768-dim vector for nomic-embed-text-v1.5.

  Wires up the optional nomic GGUF download in the linux-x86_64 Java test job in .github/workflows/publish.yml. Other test jobs cleanly self-skip via Assume because the system property is unset.

  Documents the local native-build workflow in CLAUDE.md - per-host output paths, mvn-cmake handoff, optional model handling, and the restricted-network caveat for environments that block huggingface.co.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

* docs: record #80/#95/#98/#102 regression tests added in 713d426

  Updates docs/history/49be664_open_issues.md to reflect that the four JUnit regression tests called for in the verification plan have been added on this branch:

  - Deep-dive verdict guide now lists each test name and self-skip behaviour next to its issue bullet
  - Per-issue Status blocks for #80, #95, #98, #102 annotated as "LIKELY FIXED -> FIXED on CI green" with the covering test
  - Status overview table rows for the same four issues updated
  - "What the original issues actually contain" feasibility table marks all four as DONE with the commit reference
  - "Concrete test plan" gains a status callout noting the as-shipped implementation matches the sketches
  - "Recommended sequencing" step 1 marked DONE and enumerates what shipped; remaining steps (#86 docs, #103/#34 typed image API, Android emulator CI) carried forward as the next deliverables

  No code or behaviour change, documentation only.

  https://claude.ai/code/session_01LR7Gw1pyKS7wvxXfZjnxNW

---------

Co-authored-by: Claude <noreply@anthropic.com>
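The VmRSS check described for the #102 test can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in MemoryManagementTest: the class `RssProbe` and its method names are hypothetical, and only the /proc/self/status source, the 200 MB budget, and the fallback to a smoke test on hosts without that file come from the commit message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the repository. */
final class RssProbe {
    /** Extracts the kB value from a "VmRSS:   123456 kB" line, or -1 if no such line exists. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    /** Resident set size of this JVM in kB, or -1 where /proc/self/status is unavailable. */
    static long currentRssKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1L; // non-Linux host: nothing to measure
        }
        try {
            return parseVmRssKb(Files.readAllLines(status, StandardCharsets.UTF_8));
        } catch (IOException | NumberFormatException e) {
            return -1L;
        }
    }

    /** True when growth stays strictly under the budget, or when no measurement was possible. */
    static boolean withinBudget(long beforeKb, long afterKb, long budgetMb) {
        if (beforeKb < 0 || afterKb < 0) {
            return true; // degrade to a no-crash smoke test
        }
        return (afterKb - beforeKb) < budgetMb * 1024L;
    }
}
```

A test would call `currentRssKb()` before and after the open/close loop and assert `withinBudget(before, after, 200)`. Treating a missing measurement as a pass keeps the same test usable on macOS and Windows runners.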
bernardladenthin added a commit that referenced this pull request May 22, 2026
Adds a forward-looking section at the bottom of the README with three bullets pointing readers at the docs where the detail already lives:

- The Kotlin Llama Stack client feature inventory (docs/feature-investigation-llama-stack-client-kotlin.md), so candidate features (multimodal image input, typed chat, async API, batch inference, typed usage/timings) are discoverable.
- The goal of shipping a first-class Android-capable Maven artifact, tied to the existing opencl-android-aarch64 classifier, that would let downstream Android projects drop ogx-ai/llama-stack-client-kotlin.
- The ongoing work of resolving all 37 upstream kherud/java-llama.cpp open issues (docs/history/49be664_open_issues.md), with explicit cross-references to #103 / #34 (VLM / multimodal image input, both PARTIALLY FIXED), the same image-input work that closes §2.1 of the Kotlin inventory.

Also adds a matching TOC entry.

Co-authored-by: Claude <noreply@anthropic.com>
bernardladenthin pushed a commit that referenced this pull request May 23, 2026
Adds a vision-capable model + matching mmproj + a CC0/PD test image to all four Java test jobs (Linux x86_64, macOS arm64 with/without Metal, Windows x86_64) and a model-gated MultimodalIntegrationTest that proves the typed ChatMessage(role, List<ContentPart>) surface from PR #189 round-trips through the upstream mtmd pipeline end-to-end.

CI changes (.github/workflows/publish.yml)
- New env vars: VISION_MODEL_URL / VISION_MODEL_NAME pointing at ggml-org/SmolVLM-500M-Instruct-Q8_0.gguf (smallest reliable vision GGUF on community ggml-org), VISION_MMPROJ_URL / _NAME for the matching mmproj, VISION_IMAGE_URL / _NAME for a small PD red-apple image from Wikimedia Commons.
- Each of the four Java test jobs gains three download steps and three -D system properties on the mvn test invocation: -Dnet.ladenthin.llama.vision.model / .mmproj / .image.

Validation scripts
- validate-models.sh refactored into validate_gguf() + validate_image() helpers with a 'required' vs 'optional' mode. Required models still fail-fast; the new vision GGUFs and PD image are validated only when present so jobs that skip them keep passing.
- validate-models.bat extended with a parallel OPTIONAL_MODELS loop.

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Self-skips via Assume when any of the three -D paths is unset or its file is missing, so local mvn test stays green without the artifacts.
- multimodalRequestProducesNonEmptyReply: builds a ChatMessage.userMultimodal with ContentPart.text(...) + ContentPart.imageFile(Paths.get(image)), calls chatCompleteText, asserts non-empty reply. Does NOT assert reply semantics: a 500M model can caption inaccurately and CI must not flap on model quality.
- multimodalThenTextOnSameModel: sanity check that a multimodal call followed by a text-only call on the same model both succeed (catches any parts/legacy split poisoning the inference context).
- TestConstants gains PROP_VISION_MODEL_PATH / PROP_VISION_MMPROJ_PATH / PROP_VISION_IMAGE_PATH so the test reads the system properties via the same naming pattern as PROP_NOMIC_MODEL_PATH.

Docs
- docs/history/49be664_open_issues.md: #103 and #34 PARTIALLY FIXED -> FIXED in the per-issue blocks, the verdict guide, the status overview table, the deep-dive table, the cannot-be-closed-by-unit-tests-alone table, and the recommended-sequencing list. Bottom-line summary updated to reflect that 0 of the original LIKELY/PARTIALLY FIXED items remain partially fixed.
- (docs/feature-investigation-llama-stack-client-kotlin.md §2.1 was already updated in the PR-189 typed-multimodal-surface commit.)

Verified locally
- mvn test-compile: clean.
- mvn test -Dtest=MultimodalIntegrationTest: SKIPPED (no -D properties set; expected self-skip path).
- mvn javadoc:jar: BUILD SUCCESS.
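The self-skip rule described above (skip when any of the three -D paths is unset or its file is missing) could be expressed along these lines. This is a sketch only: the class `OptionalArtifacts` and its methods are hypothetical and are not the repository's TestConstants or MultimodalIntegrationTest; the three property names are the ones given in the commit message.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical helper for model-gated tests; not code from the repository. */
final class OptionalArtifacts {
    // Property names as listed in the commit message.
    static final String VISION_MODEL = "net.ladenthin.llama.vision.model";
    static final String VISION_MMPROJ = "net.ladenthin.llama.vision.mmproj";
    static final String VISION_IMAGE = "net.ladenthin.llama.vision.image";

    /** True only if the property is set, non-blank, and names an existing regular file. */
    static boolean isAvailable(String propertyName) {
        String value = System.getProperty(propertyName);
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        try {
            return Files.isRegularFile(Paths.get(value));
        } catch (InvalidPathException e) {
            return false; // a malformed path counts as "not available"
        }
    }

    /** True only if every listed property resolves to an existing file. */
    static boolean allAvailable(String... propertyNames) {
        for (String name : propertyNames) {
            if (!isAvailable(name)) {
                return false;
            }
        }
        return true;
    }
}
```

In a JUnit 4 test, the result would typically feed `Assume.assumeTrue(...)` at the top of each test method, so a missing artifact reports the test as skipped rather than failed.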
Summary
This PR significantly expands the README documentation to provide a more comprehensive overview of the library's capabilities and usage patterns. The changes reorganize the table of contents, add a dedicated Features section, and include new documentation for chat completion, embeddings/reranking, and raw JSON endpoints.
Key Changes
Notable Details
generateChat()) and blocking (chatComplete()) patterns
https://claude.ai/code/session_01Phsbbq9JdFU24F9PGwG1wf