ggml : build backends as libraries #10256

Merged
merged 26 commits into master from sl/dl-backend on Nov 14, 2024

Conversation

@slaren (Collaborator) commented Nov 11, 2024

Moves each backend to a separate directory with its own build script. The ggml library is split into two targets: ggml-base, which contains only the core ggml elements, and ggml, which bundles ggml-base and all the backends included in the build.

To completely separate the build of the CPU backend, ggml-quants.c and ggml-aarch64.c have been split so that the reference quantization and dequantization functions are in ggml-base, while the optimized quantization and dot-product functions are in ggml-cpu.

The build is organized as follows:

```mermaid
graph TD;
    application      --> libllama;
    application      --> libggml;
    libllama         --> libggml;
    libggml          --> libggml-base;
    libggml          --> libggml-cpu;
    libggml          --> libggml-backend1;
    libggml          --> libggml-backend2;
    libggml-cpu      --> libggml-base;
    libggml-backend1 --> libggml-base;
    libggml-backend2 --> libggml-base;
```
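As a rough illustration of this layout, here is a minimal CMake sketch (target names follow the graph above, but the source file lists are illustrative and this is not the actual ggml/src/CMakeLists.txt; the real build scripts live in each backend's own directory):

```cmake
cmake_minimum_required(VERSION 3.14)
project(ggml-layout-sketch LANGUAGES C CXX)

# Core library: tensor/graph code and the reference (de)quantization functions.
add_library(ggml-base ggml.c ggml-alloc.c ggml-quants.c)

# CPU backend: the optimized quantization and dot-product kernels now live here.
add_library(ggml-cpu ggml-cpu/ggml-cpu.c ggml-cpu/ggml-cpu-quants.c)
target_link_libraries(ggml-cpu PRIVATE ggml-base)

# Umbrella library that applications and libllama link against: it re-exports
# ggml-base and pulls in every backend enabled for this build.
add_library(ggml ggml-backend-reg.cpp)
target_link_libraries(ggml PUBLIC ggml-base PRIVATE ggml-cpu)
```

Each additional backend (CUDA, Metal, Vulkan, ...) follows the same pattern: its own target in its own directory, linked only against ggml-base and added to the umbrella ggml target when enabled.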

Currently, ggml must be linked against the backend libraries, but the ultimate goal is to load backends dynamically at runtime, so that we can distribute a single llama.cpp package that includes all the backends, as well as multiple versions of the CPU backend compiled for different instruction sets.

Breaking changes

Applications that use ggml and llama.cpp should not require any changes; they only need to link to the ggml and llama targets as usual. However, when building with BUILD_SHARED_LIBS, additional shared libraries are produced that need to be bundled with the application: in addition to llama and ggml, the ggml-base and ggml-cpu libraries and any other backends included in the build should be added to the application package (a sketch of the application side follows the list below).

  • The flag to build the HIP backend with CMake has been changed from GGML_HIPBLAS to GGML_HIP, in line with a previous change to the CUDA backend.
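For a downstream application, a hedged sketch of what this looks like in practice (the project name my_app is hypothetical, and the DLL-copy step assumes CMake 3.21+ on a DLL platform such as Windows):

```cmake
cmake_minimum_required(VERSION 3.21)
project(my_app LANGUAGES C CXX)      # hypothetical downstream application

add_subdirectory(llama.cpp)          # or however llama.cpp is brought into the build

add_executable(my_app main.cpp)
# Linking llama and ggml as usual is still enough; ggml-base and the backend
# libraries are pulled in transitively.
target_link_libraries(my_app PRIVATE llama ggml)

if (BUILD_SHARED_LIBS AND WIN32)
    # With shared builds, ggml-base, ggml-cpu and any other enabled backends
    # must be bundled with the application. This copies every runtime DLL the
    # executable depends on next to it after each build.
    add_custom_command(TARGET my_app POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
                $<TARGET_RUNTIME_DLLS:my_app> $<TARGET_FILE_DIR:my_app>
        COMMAND_EXPAND_LISTS)
endif()
```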

@github-actions bot added the build, Nvidia GPU, ggml, and SYCL labels on Nov 11, 2024
@github-actions bot added the testing and examples labels on Nov 11, 2024
@slaren force-pushed the sl/dl-backend branch 4 times, most recently from 28b3b76 to 0cdecd3, on November 12, 2024 at 01:03
@github-actions bot added the Apple Metal and Kompute labels on Nov 12, 2024
@github-actions bot added the devops label on Nov 12, 2024
@github-actions bot added the documentation and nix labels on Nov 12, 2024
@slaren force-pushed the sl/dl-backend branch 2 times, most recently from db2cb04 to 45f7dc4, on November 12, 2024 at 19:46
@slaren merged commit ae8de6d into master on Nov 14, 2024
55 checks passed
@slaren deleted the sl/dl-backend branch on November 14, 2024 at 17:04
@arch-btw (Contributor) commented:

Is this caused by this commit by any chance?

make: *** No rule to make target 'ggml/src/ggml-vulkan.cpp', needed by 'ggml/src/ggml-vulkan.o'. Stop.
make: *** Waiting for unfinished jobs....

@fairydreaming (Collaborator) commented:

@slaren I see that you removed #include "ggml-cpu-impl.h" from ggml.c. This breaks compilation for builds with AVX512, as that header contains the definition of the m512i() macro used in ggml.c when AVX512 is enabled. I simply copied the macro definition into ggml.c and it then compiled successfully.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

build passed
@LostRuins (Collaborator) commented Nov 16, 2024

Hi @slaren, I don't see the rules required to create ggml-cpu-aarch64.o, ggml-cpu-quants.o, and ggml-backend-reg.o in the Makefile. Are they defined somewhere else?

@ggerganov (Owner) commented:

I'll add them now. But it's better to start using the CMake build since the Makefile will be removed at some point.
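For anyone switching over, the CMake equivalent of a plain make build is roughly the following (default options assumed; backend flags such as -DGGML_CUDA=ON are added at the configure step):

```
cmake -B build
cmake --build build --config Release -j
```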

@LostRuins (Collaborator) commented:

Thanks, understandable. I hope that solutions can be found for building in more esoteric environments like Termux, w64devkit, and old Linux/macOS systems that do not have CMake readily available.

@LostRuins (Collaborator) commented Nov 16, 2024

Hi,

#include "../ggml-common.h"

The relative path to ggml-common.h is now broken after this PR.

Edit: Perhaps I need to use the GGML_METAL_EMBED_LIBRARY branch instead.

@ggerganov (Owner) commented:

Yes, GGML_METAL_EMBED_LIBRARY should be good. If we remove the relative path, the SPM package stops working. There may be a better way to build the Metal code.
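For reference, with the CMake build the embedded Metal library can be requested explicitly along these lines (a sketch; on Apple platforms the Metal backend is typically enabled by default):

```
cmake -B build -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON
cmake --build build --config Release
```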

brittlewis12 added a commit to brittlewis12/llama-cpp-rs that referenced this pull request Nov 16, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 17, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

test passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
@gorpo-69 commented:

I can no longer build for CUDA with VS 2022 (admin dev prompt), and I think it is due to this change. I have all the permissions, and the directory and its subdirectories grant full-control permissions to all users. It was compiling fine up until right after the DaisyUI server revamp ~2 weeks ago. The DLL export fails:

(...)
ggml-threading.cpp
Auto build dll exports
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: The command "setlocal [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: cd C:\build\llamacpp\msvc_latest\ggml\src [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: C: [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: "C:\Program Files\CMake\bin\cmake.exe" -E __create_def C:/build/llamacpp/msvc_latest/ggml/src/ggml-base.dir/Release/exports.def C:/build/llamacpp/msvc_latest/ggml/src/ggml-base.dir/Release//objects.txt [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: endlocal & call :cmErrorLevel %errorlevel% & goto :cmDone [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmErrorLevel [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: exit /b %1 [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :cmDone [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: if %errorlevel% neq 0 goto :VCEnd [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(161,5): error MSB3073: :VCEnd" exited with code -1073741819. [C:\build\llamacpp\msvc_latest\ggml\src\ggml-base.vcxproj]

CMake configuration output (this was working with no problem until ~2 weeks ago):

PS C:\build\llamacpp\msvc_latest> cmake -B . -S "C:\msys64\home\admin\llama.cpp" -DGGML_NATIVE=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_CUDA_F16=ON -DGGML_LTO=ON -DLLAMA_CURL=ON -DLLAMA_SERVER_SSL=ON
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.26100.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.42.34433.0
-- The CXX compiler identification is MSVC 19.42.34433.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.42.34433/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.46.2.windows.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- OpenMP found
-- Using llamafile
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0.  Turning off GGML_AMX.


-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/include (found version "12.6.77")
-- CUDA Toolkit found
-- Using CUDA architectures: 89
-- The CUDA compiler identification is NVIDIA 12.6.77 with host compiler MSVC 19.42.34433.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Including CUDA backend
-- Found CURL: C:/Program Files/vcpkg/installed/x64-windows/share/curl/CURLConfig.cmake (found version "8.10.1-DEV")
-- Found OpenSSL: C:/Program Files/vcpkg/installed/x64-windows/lib/libcrypto.lib (found version "3.3.2")
-- Configuring done (16.1s)
-- Generating done (0.4s)
-- Build files have been written to: C:/build/llamacpp/msvc_latest

@slaren (Collaborator, Author) commented Nov 22, 2024

I have no issues building for CUDA with VS 2022.

@gorpo-69 commented:

Hi @slaren, I've taken another look at this and the problem was -DGGML_LTO=ON (I can build the fresh version with all of those flags except LTO). Before this change, I could re-run the cmake build command after a failure to export the DLLs and it would regenerate the mock(?) PDBs all over again (maybe because they were kind of in the same place?). That version also created _CMakeLTOTest-C and _CMakeLTOTest-CXX directories inside \ggml\src\CMakeFiles, but after this commit it doesn't do that, so I guess the access-violation error must somehow be related to not having the temporary/mock libraries to do LTO with, due to the order of commands or something (the error persists irrespective of the generator, MSVC or Ninja).

I'm sorry if any of this sounds kind of generic or imprecise; I'm not a professional engineer.

Kudos

@slaren (Collaborator, Author) commented Dec 1, 2024

I don't think LTO makes much difference for ggml; everything that should be inlined is already defined in the same translation unit. I will take a look at this when I have the chance, but you should not lose anything by simply disabling LTO.
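Concretely, that amounts to re-running the earlier configure command with LTO turned off and everything else unchanged, roughly:

```
cmake -B . -S "C:\msys64\home\admin\llama.cpp" -DGGML_NATIVE=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_CUDA_F16=ON -DGGML_LTO=OFF -DLLAMA_CURL=ON -DLLAMA_SERVER_SSL=ON
```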

kou added a commit to groonga/groonga that referenced this pull request Dec 30, 2024
GitHub: fix pgroonga/pgroonga#642

It can build backends as libraries:
ggerganov/llama.cpp#10256

The currently bundled llama.cpp uses some AVX operations in static variables, so we can't load libgroonga.so on a CPU without AVX.

With the backends-as-libraries feature, we can make the AVX operations truly lazy.

Reported by Yuki Shira. Thanks!!!