[pull] master from ggml-org:master #201

pull · 2025-06-13T10:12:04Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

Update oneMath commit to merged PR uxlfoundation/oneMath#669 which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
        Value: 700
        Name: CUDA_ERROR_ILLEGAL_ADDRESS
        Description: an illegal memory access was encountered
        Function: operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values

Currently, when a model generates output that looks like a tool call but is invalid, an exception is thrown and not handled, causing the CLI or llama-server to bail. Instead, handle the chat parser exception and simply return the generated text in such cases.

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
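A minimal sketch of the fallback described above, written in Python for brevity rather than in the project's C++. Every name here (`ToolCallParseError`, `parse_tool_call`, `parse_response`) is hypothetical and not taken from the llama.cpp code base, and the JSON shape of a "tool call" is an assumption made only for illustration. The point is the control flow: catch the parser error and return the generated text instead of letting the exception escape.

```python
import json


class ToolCallParseError(Exception):
    """Hypothetical error: text looked like a tool call but was invalid."""


def parse_tool_call(text: str) -> dict:
    # Hypothetical parser. Assumption: a valid tool call is a JSON object
    # that carries both a "name" and an "arguments" key.
    try:
        call = json.loads(text)
    except json.JSONDecodeError as err:
        raise ToolCallParseError(str(err)) from err
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        raise ToolCallParseError("missing 'name' or 'arguments'")
    return call


def parse_response(text: str) -> dict:
    # Try to interpret the output as a tool call. If parsing fails, fall
    # back to returning the generated text unchanged rather than
    # propagating the exception to the caller.
    try:
        return {"tool_call": parse_tool_call(text), "content": None}
    except ToolCallParseError:
        return {"tool_call": None, "content": text}
```

With this shape, a truncated or malformed tool call comes back as ordinary content, so the caller keeps running.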

* batch : verify multi-sequence input batches

  ggml-ci

* cont : auto-gen positions + verify multi-seq input

  ggml-ci

* cont : first print debug info, then perform validation

  ggml-ci

* cont : fix position auto-gen + add comments

  ggml-ci

Adds:

* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.

---

The model architecture is called "dots.llm1" (I decided to shorten it to dots1 or DOTS1 in the code generally). The only models that exist as of the writing of this commit that follow this architecture are "dots.llm1.inst" and "dots.llm1.base" from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and Deepseek parts, as seen here:

https://github.com/huggingface/transformers/blob/ffe12627b4e84489d2ab91dd0ec00614855edc79/src/transformers/models/dots1/modular_dots1.py

* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp

  Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers suggestions

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggerganov and others added 8 commits June 13, 2025 08:03

vocab : prevent heap overflow when vocab is too small (#14145)

c33fe8b

ggml-ci

cmake : Improve build-info.cpp generation (#14156)

09cf2c7

* cmake: Simplify build-info.cpp generation

  The rebuild of build-info.cpp still gets triggered when .git/index changes.

* cmake: generate build-info.cpp in build dir

sycl: Adding additional cpy dbg print output (#14034)

0889eba

server : fix SWA condition for full context reprocess (#14163)

ffad043

ggml-ci

pooling : make cls_b and cls_out_b optional (#14165)

d714dad

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

cc8d081

* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*

readme : remove survey link (#14168)

b7cc774

pull bot added the ⤵️ pull label Jun 13, 2025

github-actions bot added examples server ggml SYCL build labels Jun 13, 2025

ggerganov and others added 2 commits June 13, 2025 13:47

batch : rework llama_batch_allocr (#14153)

60c6663

* batch : rework llama_batch_allocr

  ggml-ci

* cont : move validation inside class

  ggml-ci

* cont : move output counting to class

  ggml-ci

* cont : minor

  ggml-ci

* batch : add TODOs

  ggml-ci

docs : Update multimodal.md (#14122)

26ff368

* Update multimodal.md
* Update multimodal.md

github-actions bot added the documentation Improvements or additions to documentation label Jun 13, 2025

ggerganov and others added 3 commits June 13, 2025 18:35

batch : add LLAMA_BATCH_DEBUG environment variable (#14172)

80709b7

* batch : add LLAMA_BATCH_DEBUG environment variable

  ggml-ci

* cont : improve seq_id display

Merge commit from fork

3cfbbdb

* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

sycl: fix docker image (#14144)

40643ed

github-actions bot added the devops label Jun 13, 2025

ggerganov and others added 2 commits June 13, 2025 20:03

vocab : fix build (#14175)

fb85a28

ggml-ci

github-actions bot added python script labels Jun 14, 2025

p1-0tr and others added 5 commits June 14, 2025 17:25

docs : remove WIP since PR has been merged (#13912)

00ba772

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)

c311ac6

ggml-ci

ggerganov and others added 2 commits June 15, 2025 10:52

kv-cache : fix use-after-move of defrag info (#14189)

5fce5f9

ggml-ci

HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183)

2c2caa4

github-actions bot added the Nvidia GPU label Jun 15, 2025

IMbackK and others added 6 commits June 15, 2025 17:30

CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)

e54b394

quantize : change int to unsigned int for KV overrides (#14197)

30e5b01

server : When listening on a unix domain socket don't print http:// and port (#14180)

cd355ed

Instead show something like this:

main: server is listening on file.sock - starting the main loop

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
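The message selection described in this commit can be sketched as follows, in Python for brevity. The function name `listening_message` is hypothetical, and both the `.sock` suffix check and the `http://host:port` form of the TCP message are assumptions made for illustration. Only the unix-socket message text is taken from the commit description.

```python
def listening_message(hostname: str, port: int) -> str:
    # Assumption for this sketch: a hostname ending in ".sock" denotes a
    # unix domain socket path; anything else is treated as a TCP host.
    if hostname.endswith(".sock"):
        # No scheme and no port: neither applies to a socket file.
        address = hostname
    else:
        address = f"http://{hostname}:{port}"
    return f"main: server is listening on {address} - starting the main loop"
```

Calling `listening_message("file.sock", 8080)` yields the line quoted in the commit message, while a TCP host keeps the URL form.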

ggml-cpu : rework weak alias on apple targets (#14146)

3555b30

* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin

vulkan: mutex around vkQueueSubmit (#14127)

c89c2d1

This fixes the remaining crash in test-thread-safety on my system.
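As a language-neutral illustration of the idea in the commit title, the sketch below serializes submissions to a shared queue with a mutex. It is written in Python and does not use Vulkan. `SharedQueue` and `run_workers` are hypothetical names, and the list append merely stands in for a submission call that must not run concurrently from several threads.

```python
import threading


class SharedQueue:
    """Hypothetical stand-in for a device queue that is not thread-safe."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.submitted = []

    def submit(self, work: int) -> None:
        # Hold the mutex for the whole submission so that only one thread
        # touches the queue at a time.
        with self._lock:
            self.submitted.append(work)


def run_workers(queue: SharedQueue, n_threads: int, n_items: int) -> None:
    def worker(tid: int) -> None:
        # Each thread submits its own disjoint range of work items.
        for i in range(n_items):
            queue.submit(tid * n_items + i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The lock lives next to the queue it protects, so every caller goes through the same critical section without needing to know about it.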

github-actions bot added the Vulkan label Jun 16, 2025

huydt84 and others added 8 commits June 16, 2025 09:20

gguf-py : allow key override when adding value to GGUFWriter (#14194)

4ad2436

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)

0bf49eb

ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)

3ba0d84

llama : rework embeddings logic (#14208)

d3e64b9

* llama : rework embeddings logic

  ggml-ci

* cont : fix rerank

  ggml-ci

* cont : engrish [no ci]
* cont : fix rerank

  ggml-ci

* server : support both embeddings and completions with single model

  ggml-ci

* cont : avoid embeddings_org

  ggml-ci

HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)

7d6d91b

cmake: clean up external project logic for vulkan-shaders-gen (#14179)

0dbcabd

* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time

llama : add thread safety test (#14035)

6adc3c3

* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic

  when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

github-actions bot added the testing label Jun 16, 2025

ggerganov and others added 2 commits June 16, 2025 22:33

server : fix incorrect usage of llama_get_embeddings() (#14225)

89fea80

* server : fix incorrect usage of llama_get_embeddings()

  ggml-ci

* cont : fix the fix

  ggml-ci

common : suggest --jinja when autodetection fails (#14222)

e434e69
