Force FP32 compute in GLM4 FFN Down (ggml-org#13101)
* Force FP32 compute in cuBLAS GEMM
* Revert "Force FP32 compute in cuBLAS GEMM" (reverts commit 6efd872)
* Force F32 compute in GLM4 ffn down
* Edit comment to clarify issue

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
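The fix pins a single matmul to full-precision accumulation instead of changing cuBLAS GEMM precision globally (the reverted first attempt). A minimal sketch of the pattern using ggml's per-op precision API, `ggml_mul_mat_set_prec`; the helper name and the weight/activation variables are illustrative, not the actual patch:

```cpp
#include "ggml.h"

// Sketch: force F32 accumulation on one matmul only. `build_ffn_down` and
// its arguments are illustrative names, not the actual llama.cpp code.
static struct ggml_tensor * build_ffn_down(
        struct ggml_context * ctx0,
        struct ggml_tensor  * ffn_down_w,  // down-projection weight
        struct ggml_tensor  * cur) {       // activations after gate/up
    cur = ggml_mul_mat(ctx0, ffn_down_w, cur);
    // GLM4 activations can exceed the F16 range on some backends, so
    // request F32 compute for this op alone rather than for every GEMM
    ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
    return cur;
}
```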
clip : fix pixtral on some GPU backends (ggml-org#13097)
* clip : fix pixtral on some GPU backends
* refactor inp_raw set
* rm outdated comment
* fix dynamic size
* add TODO
change the reorder tensor from init to execute OP (ggml-org#13003)
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (ggml-org#12943)
RPC_CMD_SET_TENSOR always returns an empty response, and we send it 4 times per token. We can improve TG speed by not waiting for this empty response; the performance impact of this change depends on the network latency.
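The change turns a synchronous request/reply into fire-and-forget for this one command. A sketch of the pattern under assumed names: `send_msg` is a hypothetical stand-in for the RPC backend's socket helper, and the command's numeric value is illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum rpc_cmd : uint8_t { RPC_CMD_SET_TENSOR = 6 }; // value illustrative

// hypothetical stand-in for the RPC backend's socket write helper;
// a real implementation writes the command header and payload to sockfd
static bool send_msg(int /*sockfd*/, rpc_cmd /*cmd*/, const void * /*data*/, size_t /*size*/) {
    return true;
}

static bool rpc_set_tensor(int sockfd, const std::vector<uint8_t> & payload) {
    // send the command and payload as before ...
    if (!send_msg(sockfd, RPC_CMD_SET_TENSOR, payload.data(), payload.size())) {
        return false;
    }
    // ... but skip the blocking read: the response is always empty, and this
    // command is sent 4 times per token, so not waiting for the reply saves
    // one network round-trip per call
    return true;
}
```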
clip : remove boi/eoi embeddings for GLM-edge model (ggml-org#13081)
CUDA: use switch statements in constexpr functions (ggml-org#13095)
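Since C++14, `constexpr` functions may contain `switch` statements, which reads better than chained ternaries or if/else when mapping types to compile-time kernel parameters. A small illustration of the pattern; the enum and tile sizes are made up, not the actual CUDA kernel constants:

```cpp
// Illustration only: the enum and tile sizes are made up, not the actual
// CUDA kernel constants.
enum class mmq_type { F16, Q4_0, Q8_0 };

constexpr int mmq_tile_size(mmq_type t) {
    switch (t) { // switch in a constexpr function is legal since C++14
        case mmq_type::F16:  return 64;
        case mmq_type::Q4_0: return 32;
        case mmq_type::Q8_0: return 32;
    }
    return 0; // unreachable; silences missing-return warnings
}

// evaluated at compile time, so each template instantiation gets its tile
// size baked in with no runtime branch
static_assert(mmq_tile_size(mmq_type::F16) == 64, "compile-time dispatch");
```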
cmake : do not include ./src as public for libllama (ggml-org#13062)
* cmake : do not include ./src as public for libllama
* cmake : rework tests
* llguidance : remove unicode include
* cmake : make c++17 private
arg : add --no-mmproj-offload (ggml-org#13093)
* arg : add --no-mmproj-offload
* Update common/arg.cpp
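The flag keeps the multimodal projector (mmproj) on the CPU even when the model layers are offloaded. A minimal sketch of how such a boolean flag could be parsed and applied; the struct and field names are assumptions, not llama.cpp's actual `common_params` layout:

```cpp
#include <cstring>

// illustrative stand-in for llama.cpp's common_params; the field name
// `mmproj_use_gpu` is an assumption
struct mtmd_params {
    bool mmproj_use_gpu = true; // default: offload the projector too
};

static void parse_args(int argc, char ** argv, mtmd_params & params) {
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "--no-mmproj-offload") == 0) {
            // keep the multimodal projector on the CPU even when the
            // language-model layers are offloaded to the GPU
            params.mmproj_use_gpu = false;
        }
    }
}
```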