CI updates by m0nsky · Pull Request #1390 · SciSharp/LLamaSharp

m0nsky · 2026-05-24T15:46:47Z

This PR does a couple of things:

- Disable the web UI build (-DLLAMA_BUILD_UI=OFF, which triggered an npm install/build) that was causing OOM
- Disable the app build (-DLLAMA_BUILD_APP=OFF). llama-app links against llama-server-impl and llama-cli-impl, which we weren't building on Android, so it caused a linker error
- Disable the examples/server builds (-DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_SERVER=OFF), which we weren't using
- Properly set the parallel job count (-j) for Windows (${env:NUMBER_OF_PROCESSORS}), Linux ($(nproc)) and macOS ($(sysctl -n hw.logicalcpu)), resulting in faster build times (previously, this expanded to an empty string in the Linux and macOS bash steps, and the cublas builds didn't pass -j at all!)
- Suppress upstream CUDA warnings (which were producing 381k-line log files on the Windows CUDA runner, making it hard to debug)
- Set CMake targets to build only the shared libraries that LLamaSharp uses (ggml, ggml-base, ggml-cpu/cuda/vulkan, llama and mtmd)
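
The configure and build changes listed above can be sketched as one bash helper. This is a hypothetical dry-run function, not the workflow's actual step: the function name, the directory arguments and -DBUILD_SHARED_LIBS=ON are assumptions, and it only prints the two cmake commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the configure + build flags described above.
# Assumptions: llama.cpp checkout in $1, build dir in $2, one backend
# name in $3 (cpu, cuda or vulkan) and a parallel job count in $4.
# Prints the commands by default; set DRY_RUN=0 to execute them.

build_llama_libs() {
  local src="$1" build="$2" backend="$3" jobs="$4"

  # Skip the web UI (npm), the unified llama-app binary, examples and server.
  local configure=(cmake -S "$src" -B "$build"
    -DBUILD_SHARED_LIBS=ON
    -DLLAMA_BUILD_UI=OFF
    -DLLAMA_BUILD_APP=OFF
    -DLLAMA_BUILD_EXAMPLES=OFF
    -DLLAMA_BUILD_SERVER=OFF)

  # Build only the shared libraries (multiple --target values need CMake >= 3.15).
  local build_cmd=(cmake --build "$build" --config Release -j "$jobs"
    --target ggml ggml-base "ggml-$backend" llama mtmd)

  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "${configure[*]}"
    echo "${build_cmd[*]}"
  else
    "${configure[@]}" && "${build_cmd[@]}"
  fi
}

build_llama_libs llama.cpp build cpu 4
```

The CUDA and Vulkan jobs would additionally need their backend enabled at configure time, which this sketch leaves out.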

This brings the complete binary update workflow down from 2h 55m to 1h 23m.

Job	Before	After	Saved	Change
Linux (noavx)	4m 32s	2m 12s	2m 20s	-51%
Linux (avx)	4m 42s	2m 21s	2m 21s	-50%
Linux (avx2)	4m 49s	2m 12s	2m 37s	-54%
Linux (avx512)	4m 44s	1m 53s	2m 51s	-60%
Linux (aarch64)	4m 41s	1m 39s	3m 2s	-65%
musl (noavx)	7m 34s	3m 18s	4m 16s	-56%
musl (avx)	7m 14s	3m 12s	4m 2s	-56%
musl (avx2)	7m 33s	3m 27s	4m 6s	-54%
musl (avx512)	7m 26s	3m 3s	4m 23s	-59%
Windows (noavx)	7m 13s	3m 26s	3m 47s	-52%
Windows (avx)	6m 37s	3m 14s	3m 23s	-51%
Windows (avx2)	6m 14s	3m 13s	3m 1s	-48%
Windows (avx512)	6m 11s	3m 22s	2m 49s	-46%
Windows ARM64	4m 22s	2m 40s	1m 42s	-39%
Vulkan (Linux)	8m 30s	5m 53s	2m 37s	-31%
Vulkan (Windows)	11m 26s	8m 36s	2m 50s	-25%
cublas (Linux)	2h 7m 24s	59m 33s	1h 7m 51s	-53%
cublas (Windows)	2h 53m 46s	1h 22m 25s	1h 31m 21s	-53%
macOS (arm64)	25m 38s	2m 28s	23m 10s	-90%
macOS (x64)	31m 55s	2m 47s	29m 8s	-91%
macOS (x64-rosetta2)	22m 56s	2m 14s	20m 42s	-90%
Android (arm64-v8a)	4m 34s	2m 32s	2m 2s	-45%
Android (x86_64)	4m 40s	2m 43s	1m 57s	-42%
Gather Binaries	1m 20s	1m 22s	—	—

Total	2h 55m 13s	1h 23m 56s	1h 31m 17s	-52%
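
Each Saved value is Before minus After, and Change is Saved divided by Before, rounded to the nearest whole percent (e.g. Android arm64-v8a: 122 s / 274 s ≈ 44.5%, shown as -45%). The Total row is the duration of the whole workflow, so it is not the sum of the rows above. A small bash sketch with hypothetical helper names recomputes any row:

```shell
#!/usr/bin/env bash
# Hypothetical helpers to recompute the Saved/Change columns of the table.

# Convert a duration such as "2h 7m 24s" to seconds.
to_seconds() {
  local total=0 part
  for part in $1; do              # word-split on spaces: "2h" "7m" "24s"
    case "$part" in
      *h) total=$(( total + ${part%h} * 3600 )) ;;
      *m) total=$(( total + ${part%m} * 60 )) ;;
      *s) total=$(( total + ${part%s} )) ;;
    esac
  done
  echo "$total"
}

# Print seconds saved and the percentage reduction, rounded to nearest.
compare() {
  local before after saved
  before=$(to_seconds "$1")
  after=$(to_seconds "$2")
  saved=$(( before - after ))
  # Integer rounding: add half the divisor before dividing.
  printf '%ds saved, -%d%%\n' "$saved" $(( (saved * 100 + before / 2) / before ))
}

compare "2h 55m 13s" "1h 23m 56s"   # prints "5477s saved, -52%"
```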

Completed build run:
https://github.com/m0nsky/LLamaSharp/actions/runs/26358923261

… syntax

llama.cpp now builds an embedded web UI (npm install + build) by default, which, combined with unlimited parallel compilation, exhausts the ~7 GB of RAM on GitHub ubuntu-22.04 runners. Disable it with -DLLAMA_BUILD_UI=OFF, since LLamaSharp only needs the shared libraries. Also fix -j ${env:NUMBER_OF_PROCESSORS} (PowerShell syntax) to -j $(nproc) in bash steps: the old syntax silently expanded to an empty string, causing cmake to use unlimited parallelism.
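
The empty expansion is easy to reproduce. In PowerShell, $env:NUMBER_OF_PROCESSORS reads an environment variable, but bash parses ${env:NUMBER_OF_PROCESSORS} as the substring expansion ${parameter:offset} of a shell variable named env. No such variable exists, so the result is empty and -j is left without a value. A minimal sketch, assuming a Linux runner where nproc is available:

```shell
#!/usr/bin/env bash
# Reproduce the bug: PowerShell-style syntax inside a bash step.
unset env                                 # make sure no shell variable "env" exists
broken="-j ${env:NUMBER_OF_PROCESSORS}"   # ${parameter:offset} on an unset variable
printf '[%s]\n' "$broken"                 # prints "[-j ]"

# The bash equivalent used on Linux runners.
fixed="-j $(nproc)"
printf '[%s]\n' "$fixed"
```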

llama.cpp introduced a unified binary (llama-app) that links against llama-server-impl and llama-cli-impl. When LLAMA_BUILD_SERVER=OFF (as set for Android), these libraries aren't built, causing a linker error. Disable llama-app globally since LLamaSharp only needs the shared libraries, not the CLI tools.

The macOS step was using PowerShell syntax, ${env:NUMBER_OF_PROCESSORS}, which silently expands to an empty string in bash, resulting in unlimited parallelism. Use $(sysctl -n hw.logicalcpu), the macOS equivalent of nproc.
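
The per-platform job counts can be wrapped in a single hypothetical bash helper. The uname -s matching and the fallback to the NUMBER_OF_PROCESSORS environment variable under Git Bash/MSYS are assumptions for illustration; the PR itself uses PowerShell's ${env:NUMBER_OF_PROCESSORS} in the Windows steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: logical CPU count to pass to `cmake --build -j`.
cpu_count() {
  case "$(uname -s)" in
    Linux)  nproc ;;                          # GNU coreutils
    Darwin) sysctl -n hw.logicalcpu ;;        # macOS equivalent of nproc
    MINGW*|MSYS*|CYGWIN*)                     # bash on Windows (assumption)
            echo "${NUMBER_OF_PROCESSORS:-1}" ;;
    *)      echo 1 ;;                         # unknown platform: build serially
  esac
}

echo "cmake --build build -j $(cpu_count)"
```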

The cublas build steps had no -j flag, defaulting to single-threaded compilation. Add -j with the correct platform syntax to parallelize CUDA kernel compilation.

The nvcc compiler emits thousands of warnings from upstream llama.cpp CUDA code (e.g. float overflow #221-D, unused variables #177-D), repeated for each of the 8 target architectures. On Windows this produces 381k+ lines of log output, truncating the actual build output. Suppress them with -DCMAKE_CUDA_FLAGS=-w, since we don't maintain this code.
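
For the cublas jobs, the suppression is a single extra cache entry on the configure step; nvcc's -w flag disables all warnings. A hypothetical sketch of the CUDA configure arguments follows. The function name and directories are illustrative, and enabling the backend with llama.cpp's GGML_CUDA option is assumed rather than copied from the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: CUDA configure arguments with nvcc warnings silenced.
cuda_configure_args() {
  local src="$1" build="$2"
  local args=(-S "$src" -B "$build"
    -DGGML_CUDA=ON            # enable llama.cpp's CUDA backend (assumed here)
    -DCMAKE_CUDA_FLAGS=-w)    # nvcc -w: suppress all warnings from upstream kernels
  printf '%s\n' "${args[@]}"  # one argument per line
}

# Usage, e.g.:
#   mapfile -t args < <(cuda_configure_args llama.cpp build) && cmake "${args[@]}"
cuda_configure_args llama.cpp build
```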

LLamaSharp only needs the shared libraries (ggml, llama, mtmd), not the CLI tools, server, or example binaries. Disable examples and server globally via COMMON_DEFINE, and remove the now-redundant per-platform LLAMA_BUILD_SERVER=OFF from Android defines.

Add SUPPRESS_WARNINGS_MSVC (/w) and SUPPRESS_WARNINGS_GNU (-w) env vars and apply them to all cmake configure steps. These are upstream llama.cpp warnings we don't maintain — particularly noisy on Windows where MSVC template instantiation warnings produce hundreds of thousands of log lines.

This reverts commit 177d0e4.

Instead of building all llama.cpp targets (CLI tools, benchmarks, server, examples), use cmake --target to build only the shared libraries that LLamaSharp actually uses: ggml, ggml-base, ggml-cpu/cuda/vulkan, llama, and mtmd. This skips ~40 unnecessary targets and their dependencies.

martindevans · 2026-05-24T18:03:56Z

Huge speedups! Thanks for this, it'll make future binary updates a lot less painful. I've tested this locally with the binaries from your test run and it worked perfectly.

m0nsky added 9 commits May 24, 2026 09:26

Fix -j syntax for macOS build to use sysctl

b688b2e

Enable parallel compilation for cublas builds

c7a9738

Revert "Suppress all upstream compiler warnings across platforms"

c36e19c

martindevans merged commit 5c5b706 into SciSharp:master May 24, 2026
8 checks passed

m0nsky deleted the fix/ci-update-binaries-failures branch May 24, 2026 19:29