Upgrade llama.cpp from b9151 to b9172 #143

Merged
bernardladenthin merged 1 commit into main from claude/update-b9172-compatibility-I3YnI on May 15, 2026

@bernardladenthin
Owner

Summary

Upgrades the pinned llama.cpp version from b9151 to b9172 and adds the build configuration needed to handle upstream changes in WebUI asset provisioning.

Changes Made

  • CMakeLists.txt: Updated GIT_TAG from b9151 to b9172 and added set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE) to disable WebUI asset building (upstream now provisions assets at build time via HF bucket by default)
  • README.md: Updated llama.cpp version badge and link from b9151 to b9172
  • CLAUDE.md:
    • Updated pinned version reference from b9151 to b9172
    • Added changelog entries documenting upstream changes between b9151–b9172:
      • clip_has_whisper_encoder() removed from public API (no project impact)
      • WebUI assets now provisioned at build time; project disables asset download via LLAMA_BUILD_WEBUI=OFF
      • common_params::webui default made conditional on LLAMA_WEBUI_DEFAULT_ENABLED macro
      • Bug fix in common_reasoning_budget_clone (no API change)
      • AMD RDNA3 WMMA flash attention support in CUDA backend
      • Non-function Responses API tools now silently skipped in server-chat

Implementation Details

The LLAMA_BUILD_WEBUI=OFF cache variable is set before FetchContent_MakeAvailable() to prevent upstream from downloading WebUI assets during the build. This aligns with the project's build strategy and avoids unnecessary network I/O.
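The ordering described above can be sketched as follows. This is a minimal illustration, not the project's actual CMakeLists.txt: the project name, the FetchContent content name `llama.cpp`, and the repository URL are assumptions, and whether this option alone disables every upstream WebUI provisioning path should be checked against the upstream CMakeLists.txt for the pinned tag.

```cmake
cmake_minimum_required(VERSION 3.14)  # FetchContent_MakeAvailable needs 3.14+
project(webui_pin_sketch LANGUAGES C CXX)  # hypothetical project name

include(FetchContent)

# Content name and repository URL are assumptions for illustration.
FetchContent_Declare(
  llama.cpp
  GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
  GIT_TAG        b9172
)

# The cache entry must exist before upstream's CMakeLists.txt is processed,
# so that upstream's own option() call keeps this value instead of its default.
set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)

# Upstream's CMakeLists.txt runs here and sees LLAMA_BUILD_WEBUI=OFF.
FetchContent_MakeAvailable(llama.cpp)
```

Setting the variable after `FetchContent_MakeAvailable()` would be too late for the first configure run, because upstream would already have acted on its default.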

All upstream changes are either internal (CUDA backend, server implementation) or non-breaking for the Java bindings. No JNI code modifications are required.

https://claude.ai/code/session_01DVizDEtXBVDaXciEoo9a8v

Also adds LLAMA_BUILD_WEBUI=OFF before FetchContent to prevent the new
build-time WebUI asset download introduced in b9172 from running during
CI/local builds. No JNI-level API changes were required.

@bernardladenthin bernardladenthin merged commit 98174fe into main May 15, 2026
4 of 16 checks passed
@bernardladenthin bernardladenthin deleted the claude/update-b9172-compatibility-I3YnI branch May 15, 2026 23:06
@claude

claude Bot commented May 15, 2026

Review Summary

✅ Strengths

  1. Consistent version updates: All references to llama.cpp version are correctly updated from b9151 to b9172 across CMakeLists.txt, README.md, and CLAUDE.md
  2. Proper CMake configuration order: The LLAMA_BUILD_WEBUI=OFF cache variable is correctly placed before FetchContent_MakeAvailable(), ensuring upstream respects the setting
  3. Well-documented changelog: The CLAUDE.md additions clearly document the upstream changes and their project impact
  4. Minimal, focused PR: The changes are scoped appropriately for a version upgrade with no unnecessary modifications

⚠️ Question for Clarification

WebUI asset provisioning: The CLAUDE.md documentation indicates two upstream mechanisms for WebUI provisioning:

  • LLAMA_USE_PREBUILT_WEBUI=ON (default) — downloads prebuilt assets from HF bucket
  • LLAMA_BUILD_WEBUI — builds assets from source

The PR sets LLAMA_BUILD_WEBUI=OFF (disabling source build), but since LLAMA_USE_PREBUILT_WEBUI=ON is the default, will the build still attempt to download prebuilt WebUI assets from the HF bucket?

Clarification needed: Should the configuration also include set(LLAMA_USE_PREBUILT_WEBUI OFF CACHE BOOL "" FORCE) to fully prevent WebUI asset downloads and achieve the stated goal of avoiding unnecessary network I/O?

If the current setting is confirmed to work as intended (i.e., setting only LLAMA_BUILD_WEBUI=OFF prevents all WebUI provisioning), please verify this is tested in the CI pipeline.
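One way such a CI check could look is a small CMake script that inspects the generated cache after configuration. This is only a sketch: the script name `check_webui_cache.cmake` and the `CACHE_FILE` parameter are hypothetical, and it assumes that upstream's WebUI switches are cache entries whose names start with `LLAMA_` and contain `WEBUI`, which should be confirmed against upstream.

```cmake
# check_webui_cache.cmake (hypothetical helper script)
# Usage: cmake -DCACHE_FILE=build/CMakeCache.txt -P check_webui_cache.cmake

if(NOT DEFINED CACHE_FILE OR NOT EXISTS "${CACHE_FILE}")
  message(FATAL_ERROR "Pass -DCACHE_FILE=<path to CMakeCache.txt>")
endif()

# Cache lines have the form NAME:TYPE=VALUE; collect the WebUI-related ones.
file(STRINGS "${CACHE_FILE}" webui_entries
     REGEX "^LLAMA_[A-Z_]*WEBUI[A-Z_]*:")

foreach(entry IN LISTS webui_entries)
  message(STATUS "Found cache entry: ${entry}")
  # Fail the CI step if any WebUI-related switch is still enabled.
  if(entry MATCHES "=(ON|TRUE|1)$")
    message(FATAL_ERROR "WebUI option still enabled: ${entry}")
  endif()
endforeach()
```

A check like this only shows which options are enabled. It does not prove that no download occurred, so inspecting the configure log for network activity would still be useful.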

Summary

The PR is technically sound and well-documented. The clarification above is the only item that should be addressed before merging.

Comment thread CMakeLists.txt
set(GGML_FMA ON CACHE BOOL "" FORCE)
set(GGML_F16C ON CACHE BOOL "" FORCE)
set(GGML_AVX512 OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_WEBUI OFF CACHE BOOL "" FORCE)

Consider whether LLAMA_USE_PREBUILT_WEBUI also needs to be disabled. According to CLAUDE.md, upstream defaults to LLAMA_USE_PREBUILT_WEBUI=ON, which downloads prebuilt WebUI assets. Setting LLAMA_BUILD_WEBUI=OFF prevents building from source, but the prebuilt download might still occur. If the goal is to fully avoid network I/O, you may also need:

set(LLAMA_USE_PREBUILT_WEBUI OFF CACHE BOOL "" FORCE)

Or confirm that LLAMA_BUILD_WEBUI=OFF alone is sufficient to prevent all WebUI asset provisioning.
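To help confirm which of the two settings actually took effect, a fragment like the following could be placed in CMakeLists.txt after `FetchContent_MakeAvailable()`. It is a sketch that assumes both names are cache entries defined by upstream; `LLAMA_USE_PREBUILT_WEBUI` is taken from CLAUDE.md as quoted in this review and has not been verified against upstream here.

```cmake
# Report the effective values of the WebUI switches at configure time.
foreach(opt LLAMA_BUILD_WEBUI LLAMA_USE_PREBUILT_WEBUI)
  if(DEFINED CACHE{${opt}})
    # Read the value stored in the cache, ignoring any shadowing normal variable.
    get_property(opt_value CACHE ${opt} PROPERTY VALUE)
    message(STATUS "${opt} = ${opt_value}")
  else()
    message(STATUS "${opt} is not defined in the cache")
  endif()
endforeach()
```

If the output shows `LLAMA_USE_PREBUILT_WEBUI` as `ON`, that suggests the second `set(...)` line is still needed.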
