sync : llama.cpp #1016

Merged
merged 10 commits from sync into master on Nov 15, 2024

Conversation

ggerganov (Member)

No description provided.

ggerganov and others added 10 commits November 15, 2024 21:33
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
…a/9921)

* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
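
For context, these SYCL commits amount to a thin forwarding wrapper: dpct::dp4a delegates to syclcompat::dp4a, which lets the compiler lower the 4-way byte dot product to a native instruction where the backend has one. Below is a minimal sketch of that wrapper, reconstructed from the build log quoted later in this thread; the header path and exact template signature are assumptions, not the literal contents of ggml/src/ggml-sycl/dpct/helper.hpp:

```cpp
#include <syclcompat/syclcompat.hpp>  // assumed header providing syclcompat::dp4a

namespace dpct {
// 4-way byte dot product with 32-bit accumulate: sum_i(a[i] * b[i]) + c.
template <typename T1, typename T2, typename T3>
inline auto dp4a(T1 a, T2 b, T3 c) {
    // Forwarding to syclcompat allows the compiler to emit a native
    // dp4a instruction on hardware that provides one.
    return syclcompat::dp4a(a, b, c);
}
} // namespace dpct
```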
* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256-bit version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
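
Taken together, the BF16 commits above describe the standard trick of widening bf16 to f32 in registers (a bf16 value is the top half of an f32 bit pattern) and accumulating with FMA. Here is a minimal sketch of that approach under the constraints the messages mention (128-bit loads, two independent accumulators); it is not the merged kernel, the function names are illustrative, and it requires AVX2 and FMA (compile with -mavx2 -mfma):

```cpp
#include <immintrin.h>
#include <stdint.h>

// Widen 8 bf16 values to 8 f32 lanes: zero-extend each 16-bit value to
// 32 bits, then shift the bf16 bit pattern into the high half of the lane.
// The source is read with a 128-bit load, per the commit notes.
static inline __m256 bf16x8_to_f32(const uint16_t * p) {
    __m128i v = _mm_loadu_si128((const __m128i *) p);
    return _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(v), 16));
}

// Dot product over n bf16 values (n a multiple of 16), using two
// accumulators ("double accumulator") to hide FMA latency.
static float bf16_dot(const uint16_t * x, const uint16_t * y, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 16) {
        acc0 = _mm256_fmadd_ps(bf16x8_to_f32(x + i),     bf16x8_to_f32(y + i),     acc0);
        acc1 = _mm256_fmadd_ps(bf16x8_to_f32(x + i + 8), bf16x8_to_f32(y + i + 8), acc1);
    }
    // Horizontal sum of the combined accumulator.
    __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```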
ggerganov merged commit 2dbdaaf into master Nov 15, 2024
10 checks passed
ggerganov deleted the sync branch November 15, 2024 20:51
slaren (Member) commented Nov 15, 2024

The moved files weren't removed; they are duplicated now.

mudler (Contributor) commented Nov 19, 2024

It seems part of this PR (5384878) has broken the build here with SYCL; not sure if I'm missing something?

2024-11-19T09:13:54.2414646Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1240:55: warning: cast from 'const void *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
2024-11-19T09:13:54.2417134Z #56 97.42  1240 |                 auto it = m_map.upper_bound((byte_t *)ptr);
2024-11-19T09:13:54.2418252Z #56 97.42       |                                                       ^
2024-11-19T09:13:54.2420526Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1837:16: error: no member named 'dp4a' in namespace 'syclcompat'; did you mean simply 'dp4a'?
2024-11-19T09:13:54.2422486Z #56 97.42  1837 |         return syclcompat::dp4a(a, b, c);
2024-11-19T09:13:54.2423252Z #56 97.42       |                ^~~~~~~~~~~~~~~~
2024-11-19T09:13:54.2424107Z #56 97.42       |                dp4a
2024-11-19T09:13:54.2425546Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1835:17: note: 'dp4a' declared here
2024-11-19T09:13:54.2427095Z #56 97.42  1835 |     inline auto dp4a(T1 a, T2 b, T3 c)

Any help would be appreciated, thank you! 🙏

slaren (Member) commented Nov 19, 2024

You probably need to update the oneAPI/SYCL libraries. See ggml-org/llama.cpp#10267 for more details.
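
For anyone hitting the same error, the mismatch can also be surfaced at compile time instead of as a confusing name-lookup failure. A minimal sketch, assuming icpx's __INTEL_LLVM_COMPILER macro encodes the oneAPI release (e.g. 20250000 for 2025.0); verify the exact value against your toolchain:

```cpp
// Fail fast if the toolchain predates syclcompat::dp4a. The version
// encoding below is an assumption; check your compiler's documentation.
#if defined(__INTEL_LLVM_COMPILER) && __INTEL_LLVM_COMPILER < 20250000
#error "syclcompat::dp4a needs oneAPI 2025.0 or newer; please update the toolkit"
#endif
```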

mudler (Contributor) commented Nov 19, 2024

> You probably need to update the oneAPI/SYCL libraries. See ggerganov/llama.cpp#10267 for more details.

Gonna check in a bit. Thank you @slaren!

mudler (Contributor) commented Nov 20, 2024

That was it indeed! Just for reference, if anyone else is facing this: I had to update the Docker base image from intel/oneapi-basekit:2024.2.0-devel-ubuntu22.04 to intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04.
