
Upgrade llama.cpp from b9284 to b9297 #191

Merged: bernardladenthin merged 1 commit into main from claude/quirky-tesla-fE3B9 on May 24, 2026

@bernardladenthin (Owner)

Summary

  • Upgrades the pinned llama.cpp version from b9284 to b9297
  • Updates CMakeLists.txt, README.md, and CLAUDE.md to reflect the new version
  • Adds comprehensive changelog entries documenting all upstream changes between versions

Changes

This upgrade includes several upstream improvements across multiple backends and components:

Java-relevant changes:

  • Chat continuation handling: upstream now emits the first delta for non-continuation requests instead of suppressing it. Previously only continuation requests, i.e. those enabled through InferenceParameters.setContinueFinalMessage(), behaved correctly
  • Bug fix in ggml_backend_tensor_get_2d_async: corrects a typo in the fast-path condition that was preventing proper fallback for multi-copy gets
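
For reviewers unfamiliar with the continuation flag, here is a minimal sketch of how a setter of this kind can end up as a `continue_final_message` key in the request JSON. The class `ContinuationRequestSketch` and its internals are hypothetical and written only for illustration; the project's real `InferenceParameters` class may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a request builder; NOT the project's real InferenceParameters.
public class ContinuationRequestSketch {
    // Field values are stored as ready-to-emit JSON literals, in insertion order.
    private final Map<String, String> fields = new LinkedHashMap<>();

    // Records the flag under the JSON key named in this PR's commit message.
    public ContinuationRequestSketch setContinueFinalMessage(boolean value) {
        fields.put("continue_final_message", Boolean.toString(value));
        return this;
    }

    // Serializes the collected fields as a flat JSON object.
    public String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Continuation request: the flag is present in the JSON.
        System.out.println(new ContinuationRequestSketch().setContinueFinalMessage(true).toJson());
        // Non-continuation request: the flag is simply absent.
        System.out.println(new ContinuationRequestSketch().toJson());
    }
}
```

In this sketch a non-continuation request omits the key entirely, which corresponds to the request path whose first delta was previously being suppressed.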

Backend improvements (no Java changes required):

  • NVFP4 quantization extended to MTP (Multi-Token Prediction) tensors in Qwen3.5 models
  • Adreno MoE pipeline bug fix for boundary-check race conditions in OpenCL kernels
  • SYCL backend: BF16 support added, Level Zero auto-detection improved, MoE dispatch optimized with counting sort
  • Vulkan: SPIRV-Headers detection improved for CMake-config-only installations
  • ZenDNN: Bumped to 2026-WW19 with Q8_0 weight support
  • Perplexity tool: Fixed 32-bit overflow on large context sizes
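
The perplexity fix belongs to a common class of bug: a product of two 32-bit counts that no longer fits in 32 bits. The upstream fix is in C++ and is not shown in this PR, so the following is only a generic Java illustration of that bug class; the method names and the values are hypothetical.

```java
public class TokenCountOverflowSketch {
    // 32-bit product: silently wraps around once the true result exceeds Integer.MAX_VALUE.
    static int totalAsInt(int contextSize, int chunkCount) {
        return contextSize * chunkCount;
    }

    // Widening one operand first makes the multiplication happen in 64 bits.
    static long totalAsLong(int contextSize, int chunkCount) {
        return (long) contextSize * chunkCount;
    }

    // Math.multiplyExact throws ArithmeticException instead of wrapping,
    // which makes it usable as an overflow check.
    static boolean overflowsInt(int contextSize, int chunkCount) {
        try {
            Math.multiplyExact(contextSize, chunkCount);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int contextSize = 1 << 20; // hypothetical large context: 1,048,576 tokens
        int chunkCount = 1 << 11;  // hypothetical chunk count: 2,048

        System.out.println(totalAsInt(contextSize, chunkCount));   // -2147483648 (wrapped)
        System.out.println(totalAsLong(contextSize, chunkCount));  // 2147483648
        System.out.println(overflowsInt(contextSize, chunkCount)); // true
    }
}
```

Nothing on the Java side needs this change; the sketch only shows why large context sizes can trigger the problem while small ones do not.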

Build system:

  • LLAMA_BUILD_APP default reverted to ${LLAMA_STANDALONE} (OFF for FetchContent); project's defensive pin remains in place

Test plan

  • CI is green on this branch
  • Docs updated (CLAUDE.md changelog, README.md badge)

Related issues / PRs

Follows the llama.cpp upgrade procedure documented in CLAUDE.md.

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_016Wh3BGBeMygcFUgQvcnuG6

Patch is mostly internal (ggml backend fixes, NVFP4 MTP scale tensors,
SYCL MoE counting-sort speedup, Adreno MoE boundary-check fix). The one
user-visible behavioural change is in server-task.cpp:
common_chat_parser_params::is_continuation (new) now gates the
empty-prefill chat_msg init — non-continuation requests correctly emit
the first delta instead of suppressing it. Java already wires
continue_final_message through to the request JSON, so behaviour is
picked up automatically.

Build + all 435 C++ tests pass.
@bernardladenthin merged commit c07ab4a into main on May 24, 2026
6 of 9 checks passed
@bernardladenthin deleted the claude/quirky-tesla-fE3B9 branch on May 24, 2026 at 10:42