ggml : build backends as libraries #10256

Merged
merged 26 commits into master from sl/dl-backend on Nov 14, 2024

Conversation

@slaren (Collaborator) commented Nov 11, 2024

Moves each backend to a separate directory with its own build script. The ggml library is split into two targets: ggml-base, which contains only the core ggml elements, and ggml, which bundles ggml-base and all the backends included in the build.

To completely separate the build of the CPU backend, ggml-quants.c and ggml-aarch64.c have been split so that the reference quantization and dequantization functions are in ggml-base, while the optimized quantization and dot-product functions are in ggml-cpu.

The build is organized as follows:

```mermaid
graph TD;
    application      --> libllama;
    application      --> libggml;
    libllama         --> libggml;
    libggml          --> libggml-base;
    libggml          --> libggml-cpu;
    libggml          --> libggml-backend1;
    libggml          --> libggml-backend2;
    libggml-cpu      --> libggml-base;
    libggml-backend1 --> libggml-base;
    libggml-backend2 --> libggml-base;
```
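As a rough illustration of this layout, here is a minimal CMake sketch (target names follow the graph above, but the source file lists are illustrative and this is not the actual ggml/src/CMakeLists.txt; the real build scripts live in each backend's own directory):

```cmake
cmake_minimum_required(VERSION 3.14)
project(ggml-layout-sketch LANGUAGES C CXX)

# Core library: tensor/graph code and the reference (de)quantization functions.
add_library(ggml-base ggml.c ggml-alloc.c ggml-quants.c)

# CPU backend: the optimized quantization and dot-product kernels now live here.
add_library(ggml-cpu ggml-cpu/ggml-cpu.c ggml-cpu/ggml-cpu-quants.c)
target_link_libraries(ggml-cpu PRIVATE ggml-base)

# Umbrella library that applications and libllama link against: it re-exports
# ggml-base and pulls in every backend enabled for this build.
add_library(ggml ggml-backend-reg.cpp)
target_link_libraries(ggml PUBLIC ggml-base PRIVATE ggml-cpu)
```

Each additional backend (CUDA, Metal, Vulkan, ...) follows the same pattern: its own target in its own directory, linked only against ggml-base and added to the umbrella ggml target when enabled.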

Currently, ggml must be linked against the backend libraries, but the ultimate goal is to load backends dynamically at runtime, so that we can distribute a single llama.cpp package that includes all the backends, as well as multiple versions of the CPU backend compiled for different instruction sets.

Breaking changes

Applications that use ggml and llama.cpp should not require any changes; they only need to link to the ggml and llama targets as usual. However, when building with BUILD_SHARED_LIBS, additional shared libraries are produced that need to be bundled with the application: in addition to llama and ggml, the ggml-base and ggml-cpu libraries and any other backends included in the build should be added to the application package (a sketch of the application side follows the list below).

  • The flag to build the HIP backend with CMake has been changed from GGML_HIPBLAS to GGML_HIP, in line with a previous change to the CUDA backend.
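For a downstream application, a hedged sketch of what this looks like in practice (the project name my_app is hypothetical, and the DLL-copy step assumes CMake 3.21+ on a DLL platform such as Windows):

```cmake
cmake_minimum_required(VERSION 3.21)
project(my_app LANGUAGES C CXX)      # hypothetical downstream application

add_subdirectory(llama.cpp)          # or however llama.cpp is brought into the build

add_executable(my_app main.cpp)
# Linking llama and ggml as usual is still enough; ggml-base and the backend
# libraries are pulled in transitively.
target_link_libraries(my_app PRIVATE llama ggml)

if (BUILD_SHARED_LIBS AND WIN32)
    # With shared builds, ggml-base, ggml-cpu and any other enabled backends
    # must be bundled with the application. This copies every runtime DLL the
    # executable depends on next to it after each build.
    add_custom_command(TARGET my_app POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
                $<TARGET_RUNTIME_DLLS:my_app> $<TARGET_FILE_DIR:my_app>
        COMMAND_EXPAND_LISTS)
endif()
```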

@github-actions bot added the build, Nvidia GPU, ggml, and SYCL labels on Nov 11, 2024
@github-actions bot added the testing and examples labels on Nov 11, 2024
@slaren force-pushed the sl/dl-backend branch 4 times, most recently from 28b3b76 to 0cdecd3, on November 12, 2024 at 01:03
@github-actions bot added the Apple Metal and Kompute labels on Nov 12, 2024
@github-actions bot added the devops label on Nov 12, 2024
@github-actions bot added the documentation and nix labels on Nov 12, 2024
@slaren force-pushed the sl/dl-backend branch 2 times, most recently from db2cb04 to 45f7dc4, on November 12, 2024 at 19:46
@slaren merged commit ae8de6d into master on Nov 14, 2024
55 checks passed
@slaren deleted the sl/dl-backend branch on November 14, 2024 at 17:04
@arch-btw (Contributor) commented:

Is this caused by this commit by any chance?

make: *** No rule to make target 'ggml/src/ggml-vulkan.cpp', needed by 'ggml/src/ggml-vulkan.o'. Stop.
make: *** Waiting for unfinished jobs....

@fairydreaming (Collaborator) commented:

@slaren I see that you removed #include "ggml-cpu-impl.h" from ggml.c. This breaks compilation for builds with AVX512, as that header contains the definition of the m512i() macro used in ggml.c when AVX512 is enabled. I simply copied the macro definition into ggml.c and it then compiled successfully.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

build passed
@LostRuins (Collaborator) commented Nov 16, 2024

Hi @slaren, I don't see the rules required to create ggml-cpu-aarch64.o, ggml-cpu-quants.o, and ggml-backend-reg.o in the Makefile. Are they defined somewhere else?

@ggerganov (Owner) commented:

I'll add them now. But it's better to start using the CMake build since the Makefile will be removed at some point.
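For anyone switching over, the CMake equivalent of a plain make build is roughly the following (default options assumed; backend flags such as -DGGML_CUDA=ON are added at the configure step):

```
cmake -B build
cmake --build build --config Release -j
```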

@LostRuins (Collaborator) commented:

Thanks, understandable. I hope that solutions can be found for building in more esoteric environments like Termux, w64devkit, and old Linux/macOS systems that do not have CMake readily available.

@LostRuins (Collaborator) commented Nov 16, 2024

Hi,

#include "../ggml-common.h"

The relative path to ggml-common.h is now broken after this PR.

Edit: Perhaps I need to use the GGML_METAL_EMBED_LIBRARY branch instead.

@ggerganov (Owner) commented:

Yes, GGML_METAL_EMBED_LIBRARY should be good. If we remove the relative path, the SPM package stops working. There may be a better way to build the Metal code.
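For reference, with the CMake build the embedded Metal library can be requested explicitly along these lines (a sketch; on Apple platforms the Metal backend is typically enabled by default):

```
cmake -B build -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON
cmake --build build --config Release
```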

brittlewis12 added a commit to brittlewis12/llama-cpp-rs that referenced this pull request Nov 16, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 17, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

test passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
@gorpo-69 commented:

I can no longer build for CUDA with VS 2022 (admin dev prompt), and I think it is due to this change. I have all the permissions, and the directory and its subdirectories grant full-control permissions to all users. It was compiling fine up until right after the DaisyUI server revamp ~2 weeks ago. The DLL export fails:

(...)
ggml-threading.cpp
Auto build dll exports
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: The command "setlocal [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: cd C:\build\llamacpp\msvc_latest\ggml\src [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: C: [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: "C:\Program Files\CMake\bin\cmake.exe" -E __create_def C:/build/llamacpp/msvc_latest/ggml/src/ggml-base.dir/Release/exports.def C:/build/llamacpp/msvc_latest/ggml/src/ggml-base.dir/Release//objects.txt [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: endlocal & call :cmErrorLevel %errorlevel% & goto :cmDone [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmErrorLevel [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: exit /b %1 [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmDone [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :VCEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :VCEnd" exited with code -1073741819. [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]

CMake configuration output (this was working with no problem until ~2 weeks ago):

PS C:\build\llamacpp\msvc_latest> cmake -B . -S "C:\msys64\home\admin\llama.cpp" -DGGML_NATIVE=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_CUDA_F16=ON -DGGML_LTO=ON -DLLAMA_CURL=ON -DLLAMA_SERVER_SSL=ON
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.26100.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.42.34433.0
-- The CXX compiler identification is MSVC 19.42.34433.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.46.2.windows.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- OpenMP found
-- Using llamafile
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0.  Turning off GGML_AMX.


-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/include (found version "12.6.77")
-- CUDA Toolkit found
-- Using CUDA architectures: 89
-- The CUDA compiler identification is NVIDIA 12.6.77 with host compiler MSVC 19.42.34433.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Including CUDA backend
-- Found CURL: C:/Program Files/vcpkg/installed/x64-windows/share/curl/CURLConfig.cmake (found version "8.10.1-DEV")
-- Found OpenSSL: C:/Program Files/vcpkg/installed/x64-windows/lib/libcrypto.lib (found version "3.3.2")
-- Configuring done (16.1s)
-- Generating done (0.4s)
-- Build files have been written to: C:/build/llamacpp/msvc_latest

@slaren (Collaborator, Author) commented Nov 22, 2024

I have no issues building for CUDA with VS 2022.

@gorpo-69 commented:

Hi @slaren, I've taken another look at this and the problem was -DGGML_LTO=ON (I can build the fresh version with all of those flags except LTO). Before this change, I could re-run the cmake build command after a failure to export the DLLs and it would regenerate the mock(?) PDBs all over again (maybe because they were kind of in the same place?). That version also created _CMakeLTOTest-C and _CMakeLTOTest-CXX directories inside \ggml\src\CMakeFiles, but after this commit it doesn't do that, so I guess the access-violation error must somehow be related to not having the temporary/mock libraries to do LTO with, due to the order of commands or something (the error persists irrespective of the generator, MSVC or Ninja).

I'm sorry if any of this sounds kind of generic or imprecise; I'm not a professional engineer.

Kudos

@slaren (Collaborator, Author) commented Dec 1, 2024

I don't think LTO makes much difference for ggml; everything that should be inlined is already defined in the same translation unit. I will take a look at this when I have the chance, but you should not lose anything by simply disabling LTO.
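Concretely, that amounts to re-running the earlier configure command with LTO turned off and everything else unchanged, roughly:

```
cmake -B . -S "C:\msys64\home\admin\llama.cpp" -DGGML_NATIVE=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_CUDA_F16=ON -DGGML_LTO=OFF -DLLAMA_CURL=ON -DLLAMA_SERVER_SSL=ON
```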

kou added a commit to groonga/groonga that referenced this pull request Dec 30, 2024
GitHub: fix pgroonga/pgroonga#642

It can build backends as libraries:
ggerganov/llama.cpp#10256

The currently bundled llama.cpp uses some AVX operations in static variables, so we can't load libgroonga.so on a CPU without AVX.

With the backends-as-libraries feature, we can make the AVX operations truly lazy.

Reported by Yuki Shira. Thanks!!!