CI updates #1390
Merged
martindevans merged 9 commits on May 24, 2026
… syntax
llama.cpp now builds an embedded Web UI (npm install + build) by default,
which combined with unlimited parallel compilation exhausts the ~7GB RAM
on GitHub ubuntu-22.04 runners. Disable it with -DLLAMA_BUILD_UI=OFF
since LLamaSharp only needs the shared libraries.
Also fix -j ${env:NUMBER_OF_PROCESSORS} (PowerShell syntax) to -j $(nproc)
in bash steps — the old syntax silently expanded to empty, causing cmake
to use unlimited parallelism.
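The silent expansion described above can be reproduced outside CI. The following is a minimal sketch, assuming bash is available on the PATH. Bash reads `${env:NUMBER_OF_PROCESSORS}` as a substring expansion of a variable named `env`, which is normally unset, so the result is an empty string. The build directory name `build` is an assumption, and the command is only printed, not run.

```shell
# Run the expansion under bash explicitly, since the CI steps use bash.
# Bash parses ${env:NUMBER_OF_PROCESSORS} as ${parameter:offset}:
# parameter "env", offset "NUMBER_OF_PROCESSORS". Both are unset here,
# so the offset evaluates to 0 and the expansion is empty.
expanded="$(bash -c 'unset env NUMBER_OF_PROCESSORS; printf "%s" "${env:NUMBER_OF_PROCESSORS}"')"

# Dry run: build the command string instead of invoking cmake.
# The build directory name "build" is an assumption.
broken_cmd="cmake --build build -j ${expanded}"

printf 'expanded value: [%s]\n' "$expanded"
printf 'resulting command: [%s]\n' "$broken_cmd"
```

The printed command ends in a bare `-j`, which is the condition the commit message identifies as the cause of unlimited parallelism.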
llama.cpp introduced a unified binary (llama-app) that links against llama-server-impl and llama-cli-impl. When LLAMA_BUILD_SERVER=OFF (as set for Android), these libraries aren't built, causing a linker error. Disable llama-app globally since LLamaSharp only needs the shared libraries, not the CLI tools.
The macOS step was using PowerShell syntax ${env:NUMBER_OF_PROCESSORS}
which silently expands to empty in bash, resulting in unlimited
parallelism. Use $(sysctl -n hw.logicalcpu) which is the correct
macOS equivalent of nproc.
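For illustration, the two platform-specific commands can be wrapped in one helper. This is a sketch only: the function name `job_count` and the `getconf` fallback are assumptions and are not part of the workflow in this PR, which uses the platform command directly in each step.

```shell
# Hypothetical helper: print a CPU count usable as a -j value
# on both Linux and macOS runners.
job_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                              # Linux (GNU coreutils)
  elif sysctl -n hw.logicalcpu >/dev/null 2>&1; then
    sysctl -n hw.logicalcpu            # macOS
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN          # fallback (assumption)
  else
    echo 1                             # last resort: single job
  fi
}

JOBS="$(job_count)"

# Dry run: print the build command instead of invoking cmake.
echo "cmake --build build --config Release -j ${JOBS}"
```

Checking that the value is non-empty before passing it to `-j` guards against the failure mode described in the commit message.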
The cublas build steps had no -j flag, defaulting to single-threaded compilation. Add -j with the correct platform syntax to parallelize CUDA kernel compilation.
The nvcc compiler emits thousands of warnings from upstream llama.cpp CUDA code (e.g. float overflow #221-D, unused variables #177-D), repeated for each of the 8 target architectures. On Windows this produces 381k+ lines of log output, truncating the actual build output. Suppress with -DCMAKE_CUDA_FLAGS=-w since we don't maintain this code.
LLamaSharp only needs the shared libraries (ggml, llama, mtmd), not the CLI tools, server, or example binaries. Disable examples and server globally via COMMON_DEFINE, and remove the now-redundant per-platform LLAMA_BUILD_SERVER=OFF from Android defines.
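As a rough sketch of what the shared defines might look like once collected in one place: the flag names below are the ones mentioned in this PR, but the exact contents of the workflow's `COMMON_DEFINE` variable and the `-S . -B build` directories are assumptions. The command is printed rather than executed.

```shell
# Hypothetical stand-in for the workflow's COMMON_DEFINE value;
# the real variable may carry additional flags.
COMMON_DEFINE="-DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_APP=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_SERVER=OFF"

# Dry run: build the configure command string instead of invoking cmake.
# The source and build directories ("." and "build") are assumptions.
configure_cmd="cmake -S . -B build ${COMMON_DEFINE}"

echo "$configure_cmd"
```

Keeping these flags in one variable means a per-platform define list, such as the Android one, no longer needs its own `LLAMA_BUILD_SERVER=OFF`.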
Add SUPPRESS_WARNINGS_MSVC (/w) and SUPPRESS_WARNINGS_GNU (-w) env vars and apply them to all cmake configure steps. These are upstream llama.cpp warnings we don't maintain — particularly noisy on Windows where MSVC template instantiation warnings produce hundreds of thousands of log lines.
This reverts commit 177d0e4.
Instead of building all llama.cpp targets (CLI tools, benchmarks, server, examples), use cmake --target to build only the shared libraries that LLamaSharp actually uses: ggml, ggml-base, ggml-cpu/cuda/vulkan, llama, and mtmd. This skips ~40 unnecessary targets and their dependencies.
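A minimal sketch of a targeted build for a CPU-only job, assuming a build directory named `build` and a CMake version that accepts several names after `--target` (3.15 or later). The target names are taken from the commit message above. The command is printed rather than executed.

```shell
# Targets for a CPU-only build, as named in the commit message.
# A CUDA or Vulkan job would add ggml-cuda or ggml-vulkan respectively.
TARGETS="ggml ggml-base ggml-cpu llama mtmd"

# Count the targets by splitting the list on whitespace.
set -- $TARGETS
target_count=$#

# Dry run: build the command string instead of invoking cmake.
# The build directory "build" and "--config Release" are assumptions.
build_cmd="cmake --build build --config Release --target ${TARGETS}"

echo "building ${target_count} targets"
echo "$build_cmd"
```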
Huge speedups! Thanks for this, it'll make future binary updates a lot less painful. I've tested this locally with the binaries from your test run and it worked perfectly.
This PR does a couple of things:
- Disables the embedded Web UI build (-DLLAMA_BUILD_UI=OFF, which fired an npm install/build) that was causing OOM
- Disables the unified llama-app binary (-DLLAMA_BUILD_APP=OFF, which links against llama-server-impl and llama-cli-impl; we weren't building those on Android, which was causing a linker error)
- Disables the examples and server (-DLLAMA_BUILD_EXAMPLES=OFF, -DLLAMA_BUILD_SERVER=OFF, which we weren't using)
- Fixes the parallel build flags for Windows (env:NUMBER_OF_PROCESSORS), Linux (nproc) and macOS (sysctl -n hw.logicalcpu), resulting in faster build times (previously, this resulted in an empty string for macOS, and we weren't even doing this for cublas!)

This results in the complete binary update workflow going from 2h 55m -> 1h 23m
Completed build run:
https://github.com/m0nsky/LLamaSharp/actions/runs/26358923261