Upgrade llama.cpp from b9284 to b9297 by bernardladenthin · Pull Request #191 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-05-24T10:41:52Z

Summary

Upgrades the pinned llama.cpp version from b9284 to b9297
Updates CMakeLists.txt, README.md, and CLAUDE.md to reflect the new version
Adds comprehensive changelog entries documenting all upstream changes between versions

Changes

This upgrade includes several upstream improvements across multiple backends and components:

Java-relevant changes:

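Chat continuation handling: streamed responses to non-continuation requests now correctly emit the first delta instead of suppressing it (previously only continuation requests behaved correctly). The fix is upstream; InferenceParameters.setContinueFinalMessage() already forwards continue_final_message in the request JSON, so Java picks up the change without code changes.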
Bug fix in ggml_backend_tensor_get_2d_async: corrects a typo in the fast-path condition that was preventing proper fallback for multi-copy gets
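The first-delta behaviour can be illustrated with a simplified Java model. It assumes the server computes each streamed delta as the part of the accumulated assistant message the client has not yet received, diffed against a baseline.

- For a continuation request, the baseline is the prefilled message the client already has.
- For a fresh request, the baseline must be empty. Otherwise the first chunk diffs to nothing and is dropped.

The class and method names are hypothetical. This is neither the upstream server-task.cpp code nor part of the java-llama.cpp API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of diff-based streaming; not upstream code.
final class ContinuationDeltaModel {

    /**
     * Returns the deltas a client would receive.
     *
     * @param prefill        assistant text already in the request (used only when continuing)
     * @param isContinuation whether the request continues the final assistant message
     * @param snapshots      cumulative assistant text after each generated chunk
     */
    static List<String> deltas(String prefill, boolean isContinuation, List<String> snapshots) {
        // Seed the baseline with the prefill only for continuation requests. A fresh
        // request starts from an empty message, so its first chunk becomes a delta.
        String previous = isContinuation ? prefill : "";
        List<String> out = new ArrayList<>();
        for (String current : snapshots) {
            // Send only the text not yet seen; if the prefix changed, resend everything.
            String delta = current.startsWith(previous)
                    ? current.substring(previous.length())
                    : current;
            if (!delta.isEmpty()) {
                out.add(delta);
            }
            previous = current;
        }
        return out;
    }

    public static void main(String[] args) {
        // Continuation: "Hello" is already on the client, so only new text streams.
        System.out.println(deltas("Hello", true, List.of("Hello, wor", "Hello, world")));
        // Fresh request: the first chunk is itself the first delta.
        System.out.println(deltas("", false, List.of("Hi", "Hi there")));
    }
}
```

If the fresh request's baseline were wrongly seeded with "Hi", the first call would return nothing for the first chunk. That is the class of symptom the upstream is_continuation gate addresses.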

Backend improvements (no Java changes required):

NVFP4 quantization extended to MTP (Multi-Token Prediction) tensors in Qwen3.5 models
Adreno MoE pipeline bug fix for boundary-check race conditions in OpenCL kernels
SYCL backend: BF16 support added, Level Zero auto-detection improved, MoE dispatch optimized with counting sort
Vulkan: SPIRV-Headers detection improved for CMake-config-only installations
ZenDNN: bumped to 2026-WW19, with Q8_0 weight support
Perplexity tool: fixed 32-bit overflow on large context sizes
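Counting sort suits MoE dispatch because the sort keys (expert ids) are small integers bounded by the expert count. Grouping n token assignments across E experts therefore takes O(n + E) time with no comparisons.

What follows is a minimal Java sketch of the general technique, assuming top-1 routing (one expert per token). It is not the SYCL kernel, whose data layout this PR does not describe, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of counting-sort MoE dispatch; not the SYCL implementation.
final class MoeCountingSort {

    static final class Dispatch {
        final int[] order;   // token indices sorted by expert id (stable)
        final int[] offsets; // expert e owns order[offsets[e] .. offsets[e + 1])

        Dispatch(int[] order, int[] offsets) {
            this.order = order;
            this.offsets = offsets;
        }
    }

    static Dispatch groupByExpert(int[] expertOfToken, int numExperts) {
        // 1. Histogram: how many tokens each expert receives.
        int[] counts = new int[numExperts];
        for (int e : expertOfToken) {
            counts[e]++;
        }
        // 2. Exclusive prefix sum: where each expert's segment starts.
        int[] offsets = new int[numExperts + 1];
        for (int e = 0; e < numExperts; e++) {
            offsets[e + 1] = offsets[e] + counts[e];
        }
        // 3. Scatter: place each token into its expert's segment, preserving token order.
        int[] cursor = Arrays.copyOf(offsets, numExperts);
        int[] order = new int[expertOfToken.length];
        for (int t = 0; t < expertOfToken.length; t++) {
            order[cursor[expertOfToken[t]]++] = t;
        }
        return new Dispatch(order, offsets);
    }

    public static void main(String[] args) {
        Dispatch d = groupByExpert(new int[] {2, 0, 1, 0, 2}, 3);
        System.out.println(Arrays.toString(d.order));   // [1, 3, 2, 0, 4]
        System.out.println(Arrays.toString(d.offsets)); // [0, 2, 3, 5]
    }
}
```

With top-k routing, the same three passes would run over the flattened (token, slot) assignments instead of one expert id per token.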
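Here is a worked example of the kind of 32-bit overflow the perplexity fix addresses. The product of a large context size and a vocabulary size easily exceeds Integer.MAX_VALUE (2,147,483,647).

The sizes below are illustrative. It is an assumption that the perplexity tool's overflow involved this particular product. The usual remedy, shown here, is to widen to 64 bits before multiplying.

```java
// Illustrative sizes only; not taken from the perplexity tool.
final class OverflowDemo {

    // 32-bit multiply: silently wraps around once the result exceeds Integer.MAX_VALUE.
    static int elementsInt(int nCtx, int nVocab) {
        return nCtx * nVocab;
    }

    // Widening one operand to long first makes the whole multiply 64-bit.
    static long elementsLong(int nCtx, int nVocab) {
        return (long) nCtx * nVocab;
    }

    public static void main(String[] args) {
        int nCtx = 131_072;   // example large context size
        int nVocab = 151_936; // example vocabulary size
        System.out.println(elementsInt(nCtx, nVocab));  // -1560281088 (wrapped)
        System.out.println(elementsLong(nCtx, nVocab)); // 19914555392
        // Math.multiplyExact(nCtx, nVocab) would throw ArithmeticException instead of wrapping.
    }
}
```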

Build system:

LLAMA_BUILD_APP: upstream reverted the default to ${LLAMA_STANDALONE} (OFF when llama.cpp is pulled in via FetchContent); this project's defensive pin remains in place

Test plan

CI is green on this branch
Docs updated (CLAUDE.md changelog, README.md badge)

Related issues / PRs

Follows the llama.cpp upgrade procedure documented in CLAUDE.md.

Checklist

I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
My commits follow Conventional Commits
No security-sensitive changes

https://claude.ai/code/session_016Wh3BGBeMygcFUgQvcnuG6

The patch is mostly internal: ggml backend fixes, NVFP4 MTP scale tensors, the SYCL MoE counting-sort speedup, and the Adreno MoE boundary-check fix. The one user-visible behavioural change is in server-task.cpp. There, the new common_chat_parser_params::is_continuation flag now gates the empty-prefill chat_msg initialization, so non-continuation requests correctly emit the first delta instead of suppressing it. Java already passes continue_final_message through in the request JSON, so the new behaviour is picked up automatically. The build succeeds and all 435 C++ tests pass.
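For readers unfamiliar with NVFP4, the sketch below shows how its scale tensors are applied. It follows NVIDIA's published description of the format:

- Each element is a 4-bit E2M1 code (1 sign, 2 exponent, 1 mantissa bit), with representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6.
- Each 16-element block carries an FP8 (E4M3) scale.
- The tensor as a whole carries an FP32 scale.

For brevity, the block scale is passed already decoded as a float. ggml's in-memory block layout may differ, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical NVFP4 dequantization sketch; not ggml's implementation or layout.
final class Nvfp4Sketch {

    // Decodes a 4-bit E2M1 code: bit 3 = sign, bits 2..1 = exponent, bit 0 = mantissa.
    static float decodeE2M1(int code) {
        int sign = (code >> 3) & 0x1;
        int exp = (code >> 1) & 0x3;
        int mant = code & 0x1;
        float magnitude = (exp == 0)
                ? 0.5f * mant                               // subnormal: 0 or 0.5
                : (1.0f + 0.5f * mant) * (1 << (exp - 1));  // 1, 1.5, 2, 3, 4, 6
        return sign == 1 ? -magnitude : magnitude;
    }

    // value = e2m1(code) * blockScale * tensorScale for every element in the block.
    static float[] dequantizeBlock(int[] codes, float blockScale, float tensorScale) {
        float[] out = new float[codes.length];
        for (int i = 0; i < codes.length; i++) {
            out[i] = decodeE2M1(codes[i]) * blockScale * tensorScale;
        }
        return out;
    }

    public static void main(String[] args) {
        // A real NVFP4 block has 16 codes; 4 are used here for brevity.
        int[] codes = {0b0111, 0b1010, 0b0001, 0b0101}; // 6, -1, 0.5, 3
        System.out.println(Arrays.toString(dequantizeBlock(codes, 0.25f, 2.0f))); // [3.0, -0.5, 0.25, 1.5]
    }
}
```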

sonarqubecloud · 2026-05-24T10:42:26Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

bernardladenthin had a problem deploying to startgate May 24, 2026 10:41 with GitHub Actions (Error)

bernardladenthin merged commit c07ab4a into main May 24, 2026
6 of 9 checks passed

bernardladenthin deleted the claude/quirky-tesla-fE3B9 branch May 24, 2026 10:42

bernardladenthin merged 1 commit into main from claude/quirky-tesla-fE3B9
