Tags: Buzzoola/llama.cpp

b5190

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Force FP32 compute in GLM4 FFN Down (ggml-org#13101)

* Force FP32 compute in cuBLAS GEMM

* Revert "Force FP32 compute in cuBLAS GEMM"

This reverts commit 6efd872.

* Force F32 compute in GLM4 ffn down

* Edit comment to clarify issue

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
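
The fix above pins the GLM4 FFN down projection to F32 compute. As a minimal, self-contained sketch (not llama.cpp code) of why reduced-precision compute in a matmul can corrupt results: rounding to half precision after every accumulation step makes the sum silently stop growing once it reaches 2048, while a full-precision accumulator stays exact.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Sum 4096 ones, as a GEMM inner loop would: one accumulator rounded to
# fp16 after every add, one kept in full precision.
acc16 = 0.0
acc64 = 0.0
for _ in range(4096):
    acc16 = to_fp16(acc16 + 1.0)  # fp16 spacing above 2048 is 2, so +1 is lost
    acc64 += 1.0

print(acc16, acc64)  # 2048.0 4096.0
```

This is only an illustration of the general failure mode; the actual GLM4 activations and the precision path chosen by the backend differ.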

b5189

clip : fix pixtral on some GPU backends (ggml-org#13097)

* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO

b5188

change the reorder tensor from init to execute OP (ggml-org#13003)

b5187

rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (ggml-org#12943)

RPC_CMD_SET_TENSOR always returns an empty response, and it is sent 4
times per token. Text-generation (TG) speed improves if we do not wait
for this empty response.

The performance impact of this change depends on the network latency.
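
The idea can be sketched with a toy socket protocol (hypothetical, not the llama.cpp RPC implementation): a one-byte command whose response is always empty, sent either with a blocking wait per command (old behaviour, one round trip each) or fire-and-forget (new behaviour). The command id below is invented for illustration.

```python
import socket
import threading

RPC_CMD_SET_TENSOR = 6  # hypothetical command id, for illustration only

def serve(srv: socket.socket, reply: bool) -> None:
    """Accept one client; optionally answer each command with an empty ack."""
    conn, _ = srv.accept()
    with conn:
        while conn.recv(1):            # one byte per command in this toy protocol
            if reply:
                conn.sendall(b"\x00")  # the always-empty response

def send_tensors(n: int, reply: bool) -> int:
    """Send n SET_TENSOR commands; return how many blocking waits occurred."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=serve, args=(srv, reply), daemon=True).start()
    waits = 0
    with socket.create_connection(srv.getsockname()) as cli:
        for _ in range(n):
            cli.sendall(bytes([RPC_CMD_SET_TENSOR]))
            if reply:
                cli.recv(1)   # old behaviour: one network round trip per command
                waits += 1
    srv.close()
    return waits

# 4 commands per token: old behaviour blocks 4 times, new behaviour never.
print(send_tensors(4, reply=True), send_tensors(4, reply=False))  # 4 0
```

With real network latency, each avoided wait saves roughly one round-trip time per command, which is why the benefit scales with latency.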

b5186

clip : remove boi/eoi embeddings for GLM-edge model (ggml-org#13081)

b5185

embeddings : fix batch sizes (ggml-org#13076)

ggml-ci

b5184

ggml : fix trailing whitespaces (#0)

b5181

CUDA: use switch statements in constexpr functions (ggml-org#13095)

b5180

cmake : do not include ./src as public for libllama (ggml-org#13062)

* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci

b5178

arg : add --no-mmproj-offload (ggml-org#13093)

* arg : add --no-mmproj-offload

* Update common/arg.cpp