Description
Building ggml-org/llama.cpp master / tag b9219 with BUILD_SHARED_LIBS=ON + GGML_CUDA=ON fails at the final llama-server / llama-cli link with undefined references to CUDA Driver VMM symbols. The shared libggml-cuda.so itself builds fine (default --allow-shlib-undefined), but anything downstream linking against it can't resolve the cu* symbols transitively.
[100%] Linking CXX executable ../../bin/llama-server
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemCreate'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemAddressReserve'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemUnmap'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemSetAccess'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuDeviceGet'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemAddressFree'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuGetErrorString'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuDeviceGetAttribute'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemMap'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemRelease'
/usr/bin/ld: ../../bin/libggml-cuda.so.0.12.0: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
Root cause
ggml/src/ggml-cuda/CMakeLists.txt line 181 already does:
target_link_libraries(ggml-cuda PRIVATE CUDA::cuda_driver)
…but the dependency is PRIVATE, so it satisfies the link of libggml-cuda.so itself (with --allow-shlib-undefined) but does not propagate to downstream targets (ggml, llama, llama-server, llama-cli) that link against ggml-cuda shared.
Result: libggml-cuda.so ships with unresolved cu* driver symbols, then ld refuses to link llama-server against it because the closure isn't satisfied.
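To check this on a local build, a minimal sketch like the following can help. The library path `build/bin/libggml-cuda.so` and the helper name `filter_cu_symbols` are assumptions for illustration, not part of the llama.cpp tree. It lists the CUDA Driver API symbols the shared object imports and prints its recorded direct dependencies:

```shell
# Assumed location of the built library; adjust to your build tree.
LIB="${LIB:-build/bin/libggml-cuda.so}"

# Hypothetical helper: reduce `nm -D --undefined-only` output to driver-API
# symbols (cuXxx), skipping runtime-API (cudaXxx) and cuBLAS (cublasXxx) names.
filter_cu_symbols() {
    awk '$1 == "U" && $2 ~ /^cu[A-Z]/ { print $2 }'
}

if [ -f "$LIB" ]; then
    # Driver symbols that must be supplied by some other library at link/load time.
    nm -D --undefined-only "$LIB" | filter_cu_symbols
    # Direct dependencies recorded in the shared object.
    readelf -d "$LIB" | grep NEEDED
fi
```

Undefined entries are normal for any imported symbol, so the first listing only shows which symbols are at stake. The `NEEDED` output is the more useful part: it shows whether libcuda is recorded as a dependency that the linker can follow when it links llama-server.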
Reproducer
FROM nvidia/cuda:13.0.0-devel-ubuntu24.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential cmake git curl ca-certificates \
libcurl4-openssl-dev libgomp1 python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /build
RUN git clone --depth 1 --branch b9219 https://github.com/ggml-org/llama.cpp.git .
RUN cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DGGML_CUDA=ON \
-DGGML_CUDA_FA=ON \
-DGGML_CUDA_FA_ALL_QUANTS=ON \
-DLLAMA_CURL=ON \
-DCMAKE_CUDA_ARCHITECTURES="120"
RUN cmake --build build --config Release -j$(nproc) --target llama-server llama-cli
Build via docker buildx build --platform linux/amd64 . on macOS / qemu. Same failure on tag b9219 (2026-05-18) and b9180 (2026-05-16 MTP merge), so this is not a recent regression — it's a long-standing scope bug only triggered by the BUILD_SHARED_LIBS=ON + non-GGML_BACKEND_DL build path.
Working fix
Anbeeld hit the same bug on a downstream fork and landed the proper CMake change as commit 2b9aa77aa6, described in Anbeeld/beellama.cpp#18. The minimal patch — in ggml/src/ggml-cuda/CMakeLists.txt near the existing CUDA::cuda_driver link line:
target_link_libraries(ggml-cuda PRIVATE CUDA::cuda_driver)
if (NOT GGML_BACKEND_DL)
target_link_libraries(ggml-cuda INTERFACE $<LINK_ONLY:CUDA::cuda_driver>)
endif()
This:
- Leaves the dynamic-backend (GGML_BACKEND_DL=ON) path unchanged
- Adds an INTERFACE LINK_ONLY propagation for the shared BUILD_SHARED_LIBS=ON path, so executables linking against ggml-cuda automatically get libcuda.so on their link line
- Mirrors what the installed package config does in ggml-cuda-config.cmake for VMM
I verified this on the same Ubuntu 24.04 / CUDA 13.0 / CMAKE_CUDA_ARCHITECTURES=120 (Blackwell consumer sm_120, RTX 5090M) reproducer: with Anbeeld's fix applied, llama-server and llama-cli link cleanly without any linker workaround.
Workaround (no source change)
For users hitting this today, the build can be unblocked by pushing the driver lib onto the executable/shared link lines via CMake:
cmake -B build \
-DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON \
-DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/cuda/lib64/stubs -lcuda" \
-DCMAKE_SHARED_LINKER_FLAGS="-L/usr/local/cuda/lib64/stubs -lcuda" \
...
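The hard-coded stub path above matches the reproducer image, but it may differ on other installs. As a hedged sketch, the flags can be derived from a `CUDA_HOME` variable and only passed to CMake if the stub actually exists. The helper name `cuda_stub_flags` and the `CUDA_HOME` default are assumptions for illustration:

```shell
# Assumed default toolkit location; override CUDA_HOME if yours differs.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Hypothetical helper: print linker flags for the driver stub under the given
# toolkit root, or fail if no libcuda.so stub is present there.
cuda_stub_flags() {
    stub_dir="$1/lib64/stubs"
    if [ -f "$stub_dir/libcuda.so" ]; then
        printf '%s\n' "-L$stub_dir -lcuda"
    else
        echo "no libcuda.so stub under $stub_dir" >&2
        return 1
    fi
}

# Configure only when run from a source checkout and the stub was found.
# Add the remaining flags from the reproducer as needed.
if [ -f CMakeLists.txt ] && FLAGS="$(cuda_stub_flags "$CUDA_HOME")"; then
    cmake -B build \
      -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON \
      -DCMAKE_EXE_LINKER_FLAGS="$FLAGS" \
      -DCMAKE_SHARED_LINKER_FLAGS="$FLAGS"
fi
```

The stub only satisfies the linker at build time. At run time the real libcuda.so.1 from the NVIDIA driver still has to be present on the host.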
Context
Hit while building a custom image to test the new MTP support (#22673) on Olares One (RTX 5090M / sm_120 Blackwell consumer mobile). The official ghcr.io/ggml-org/llama.cpp:server-cuda13 rolling tag is still at b9191 (pre-MTP merge → post-MTP merge transition published 2026-05-17) and hasn't moved past it, so building locally is currently the only way to get b9180+ for Blackwell mobile.
Happy to send a PR with the target_link_libraries(... INTERFACE ...) line if useful.