sync : llama.cpp #1016

Merged
merged 10 commits from sync into master on Nov 15, 2024

Conversation

ggerganov (Member)

No description provided.

ggerganov and others added 10 commits November 15, 2024 21:33
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
…a/9921)

* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
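
For context, these SYCL commits amount to a thin forwarding wrapper: dpct::dp4a delegates to syclcompat::dp4a, which lets the compiler lower the 4-way byte dot product to a native instruction where the backend has one. Below is a minimal sketch of that wrapper, reconstructed from the build log quoted later in this thread; the header path and exact template signature are assumptions, not the literal contents of ggml/src/ggml-sycl/dpct/helper.hpp:

```cpp
#include <syclcompat/syclcompat.hpp>  // assumed header providing syclcompat::dp4a

namespace dpct {
// 4-way byte dot product with 32-bit accumulate: sum_i(a[i] * b[i]) + c.
template <typename T1, typename T2, typename T3>
inline auto dp4a(T1 a, T2 b, T3 c) {
    // Forwarding to syclcompat allows the compiler to emit a native
    // dp4a instruction on hardware that provides one.
    return syclcompat::dp4a(a, b, c);
}
} // namespace dpct
```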
* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256-bit version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
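
Taken together, the BF16 commits above describe the standard trick of widening bf16 to f32 in registers (a bf16 value is the top half of an f32 bit pattern) and accumulating with FMA. Here is a minimal sketch of that approach under the constraints the messages mention (128-bit loads, two independent accumulators); it is not the merged kernel, the function names are illustrative, and it requires AVX2 and FMA (compile with -mavx2 -mfma):

```cpp
#include <immintrin.h>
#include <stdint.h>

// Widen 8 bf16 values to 8 f32 lanes: zero-extend each 16-bit value to
// 32 bits, then shift the bf16 bit pattern into the high half of the lane.
// The source is read with a 128-bit load, per the commit notes.
static inline __m256 bf16x8_to_f32(const uint16_t * p) {
    __m128i v = _mm_loadu_si128((const __m128i *) p);
    return _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(v), 16));
}

// Dot product over n bf16 values (n a multiple of 16), using two
// accumulators ("double accumulator") to hide FMA latency.
static float bf16_dot(const uint16_t * x, const uint16_t * y, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 16) {
        acc0 = _mm256_fmadd_ps(bf16x8_to_f32(x + i),     bf16x8_to_f32(y + i),     acc0);
        acc1 = _mm256_fmadd_ps(bf16x8_to_f32(x + i + 8), bf16x8_to_f32(y + i + 8), acc1);
    }
    // Horizontal sum of the combined accumulator.
    __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```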
ggerganov merged commit 2dbdaaf into master Nov 15, 2024
10 checks passed
ggerganov deleted the sync branch November 15, 2024 20:51
slaren (Member) commented Nov 15, 2024

The moved files weren't removed; they are duplicated now.

mudler (Contributor) commented Nov 19, 2024

It seems part of this PR (5384878) has broken the build here with SYCL; not sure if I'm missing something?

2024-11-19T09:13:54.2414646Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1240:55: warning: cast from 'const void *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
2024-11-19T09:13:54.2417134Z #56 97.42  1240 |                 auto it = m_map.upper_bound((byte_t *)ptr);
2024-11-19T09:13:54.2418252Z #56 97.42       |                                                       ^
2024-11-19T09:13:54.2420526Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1837:16: error: no member named 'dp4a' in namespace 'syclcompat'; did you mean simply 'dp4a'?
2024-11-19T09:13:54.2422486Z #56 97.42  1837 |         return syclcompat::dp4a(a, b, c);
2024-11-19T09:13:54.2423252Z #56 97.42       |                ^~~~~~~~~~~~~~~~
2024-11-19T09:13:54.2424107Z #56 97.42       |                dp4a
2024-11-19T09:13:54.2425546Z #56 97.42 /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp:1835:17: note: 'dp4a' declared here
2024-11-19T09:13:54.2427095Z #56 97.42  1835 |     inline auto dp4a(T1 a, T2 b, T3 c)

Any help would be appreciated, thank you! 🙏

slaren (Member) commented Nov 19, 2024

You probably need to update the oneAPI/SYCL libraries. See ggml-org/llama.cpp#10267 for more details.
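
For anyone hitting the same error, the mismatch can also be surfaced at compile time instead of as a confusing name-lookup failure. A minimal sketch, assuming icpx's __INTEL_LLVM_COMPILER macro encodes the oneAPI release (e.g. 20250000 for 2025.0); verify the exact value against your toolchain:

```cpp
// Fail fast if the toolchain predates syclcompat::dp4a. The version
// encoding below is an assumption; check your compiler's documentation.
#if defined(__INTEL_LLVM_COMPILER) && __INTEL_LLVM_COMPILER < 20250000
#error "syclcompat::dp4a needs oneAPI 2025.0 or newer; please update the toolkit"
#endif
```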

mudler (Contributor) commented Nov 19, 2024

> You probably need to update the oneAPI/SYCL libraries. See ggerganov/llama.cpp#10267 for more details.

Gonna check in a bit. Thank you @slaren!

mudler (Contributor) commented Nov 20, 2024

That was it indeed! Just for reference, if anyone else is facing this: I had to update the Docker base image from intel/oneapi-basekit:2024.2.0-devel-ubuntu22.04 to intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04.
