Releases: ggerganov/llama.cpp

b4103

16 Nov 23:32
4e54be0
llama/ex: remove --logdir argument (#10339)

b4102

16 Nov 21:21
llamafile : fix include path (#0)

ggml-ci

b4100

16 Nov 14:23
bcdb7a2
server: (web UI) Add samplers sequence customization (#10255)

* Samplers sequence: simplified and input field.

* Removed unused function

* Modify and use `settings-modal-short-input`

* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

b4098

16 Nov 07:23
772703c
vulkan: Optimize some mat-vec mul quant shaders (#10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
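The "two result elements per workgroup" idea can be illustrated outside of GLSL. The sketch below is plain C++ (not the actual Vulkan shader from the PR): two output rows are computed per iteration so each loaded element of the vector B is reused across both rows, and a trailing single-row path supplies the bounds check for an odd row count. Function and variable names here are illustrative, not taken from the shader source.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the mat-vec optimization: process two rows of A per "workgroup"
// so each load of B[c] is shared between both dot products.
std::vector<float> matvec_two_rows(const std::vector<float>& A, // rows x cols, row-major
                                   const std::vector<float>& B, // cols
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> out(rows, 0.0f);
    std::size_t r = 0;
    for (; r + 1 < rows; r += 2) {            // two result elements per iteration
        float acc0 = 0.0f, acc1 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            float b = B[c];                    // one load of B, reused for both rows
            acc0 += A[r * cols + c] * b;
            acc1 += A[(r + 1) * cols + c] * b;
        }
        out[r]     = acc0;
        out[r + 1] = acc1;
    }
    if (r < rows) {                            // bounds check: leftover odd row
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c)
            acc += A[r * cols + c] * B[c];
        out[r] = acc;
    }
    return out;
}
```

The shared addressing calculations mentioned in the note fall out naturally here: `r * cols + c` and `(r + 1) * cols + c` differ only by a fixed row stride.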

b4095

16 Nov 01:58
89e4caa
llama : save number of parameters and the size in llama_model (#10286)

fixes #10285

b4094

15 Nov 22:50
74d73dc
Make updates to fix issues with clang-cl builds while using AVX512 fl…

b4092

15 Nov 22:45
ggml : fix some build issues

b4091

15 Nov 21:11
cmake : fix ppc64 check (whisper/0)

ggml-ci

b4088

15 Nov 21:06
1842922
AVX BF16 and single scale quant optimizations (#10212)

* use 128-bit loads (I've tried 256->128 to death and it's slower)


* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. I tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
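The "double accumulator" bullet above refers to a standard dot-product trick. As a hedged sketch (plain scalar C++, not the actual AVX intrinsics from the PR): splitting the sum across two independent accumulators breaks the serial add-dependency chain, which lets the CPU's FMA units overlap consecutive iterations. The function name is illustrative.

```cpp
#include <cstddef>

// Sketch of the double-accumulator pattern: acc0 and acc1 carry independent
// dependency chains, so their fused multiply-adds can execute in parallel.
float dot_two_acc(const float* a, const float* b, std::size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                    // handle an odd trailing element
        acc0 += a[i] * b[i];
    return acc0 + acc1;
}
```

In the vectorized version each accumulator would be a whole SIMD register of partial sums, but the latency-hiding idea is the same.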

b4087

15 Nov 21:00
f0204a0
ci: build test musa with cmake (#10298)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>