Releases · ggerganov/llama.cpp
b4103
llama/ex: remove --logdir argument (#10339)
b4102
llamafile : fix include path (#0) ggml-ci
b4100
server: (web UI) Add samplers sequence customization (#10255)
* Samplers sequence: simplified, with an input field.
* Removed an unused function.
* Modified and used `settings-modal-short-input`.
* Renamed "name" --> "label".
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b4098
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. It required manually, partially unrolling the loop, since the compiler is less willing to unroll outer loops.
Add bounds-checking on the last iteration of the loop; I think this was at least partly broken before.
Also optimize the Q4_K shader to vectorize most loads and reduce the number of bit-twiddling instructions.
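The actual change lives in a Vulkan GLSL compute shader; as a rough illustration of the idea only, here is a plain C++ sketch (not the shader itself) of producing two result elements per "workgroup" so each load of the shared vector B is reused across two rows, with the main loop manually unrolled and the final partial block bounds-checked. All names (`matvec_two_rows`, `BLOCK`, `A`, `B`, `y`) are illustrative assumptions, not identifiers from the commit.

```cpp
#include <cstddef>

// Sketch: one "workgroup" computes two output rows so each load of B[c]
// feeds both accumulators; only the tail loop needs bounds checks.
void matvec_two_rows(const float * A, const float * B, float * y,
                     size_t rows, size_t cols) {
    const size_t BLOCK = 4;                      // illustrative unroll factor
    for (size_t r = 0; r + 1 < rows; r += 2) {
        const float * a0 = A + (r + 0) * cols;
        const float * a1 = A + (r + 1) * cols;
        float acc0 = 0.0f, acc1 = 0.0f;
        size_t c = 0;
        // main loop: manually unrolled, no bounds checks needed
        for (; c + BLOCK <= cols; c += BLOCK) {
            for (size_t k = 0; k < BLOCK; ++k) {
                const float b = B[c + k];        // one load of B feeds both rows
                acc0 += a0[c + k] * b;
                acc1 += a1[c + k] * b;
            }
        }
        // last iteration: bounds-checked tail
        for (; c < cols; ++c) {
            const float b = B[c];
            acc0 += a0[c] * b;
            acc1 += a1[c] * b;
        }
        y[r + 0] = acc0;
        y[r + 1] = acc1;
    }
    if (rows % 2) {                              // odd trailing row
        const float * a = A + (rows - 1) * cols;
        float acc = 0.0f;
        for (size_t c = 0; c < cols; ++c) acc += a[c] * B[c];
        y[rows - 1] = acc;
    }
}
```

The trade-off being sketched: halving the number of B loads and address computations per output element, at the cost of more registers per workgroup and a hand-unrolled loop.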
b4095
llama : save the number of parameters and the model size in llama_model (#10286)
Fixes #10285.
b4094
Make updates to fix issues with clang-cl builds while using AVX512 fl…
b4092
ggml : fix some build issues
b4091
cmake : fix ppc64 check (whisper/0) ggml-ci
b4088
AVX BF16 and single-scale quant optimizations (#10212)
* Use 128-bit loads (I've tried 256->128 to death and it's slower).
* Double accumulator.
* AVX BF16 vec dot.
* +3% Q4_0 inference.
* +7% tg, +5% pp compared to master.
* Slower F16C version, kept for reference.
* 256-bit version, also slow (I tried).
* Revert F16.
* Faster with madd.
* Split into functions.
* Q8_0 and IQ4_NL: 5-7% faster.
* Fix potential overflow (performance reduced).
* 16-bit add for Q4_0 only.
* Merge.
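To make the "128-bit loads" and "double accumulator" bullets concrete, here is a hedged C++ intrinsics sketch of a BF16 vec dot; it is not the committed ggml kernel, `dot_bf16` and `bf16_to_f32` are illustrative names, and it assumes AVX2+FMA and a length divisible by 16.

```cpp
#include <immintrin.h>
#include <stdint.h>

// Widen 8 bf16 values to fp32: shift into the high 16 bits of each 32-bit lane,
// which is exactly the fp32 bit pattern of the corresponding bf16 value.
static inline __m256 bf16_to_f32(__m128i x) {
    return _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(x), 16));
}

// Sketch only: bf16 dot product with 128-bit loads and two accumulators.
// Build with -mavx2 -mfma; assumes n % 16 == 0.
float dot_bf16(const uint16_t * a, const uint16_t * b, int n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 16) {
        // two 128-bit loads per operand (16 bf16 values)
        __m256 a0 = bf16_to_f32(_mm_loadu_si128((const __m128i *)(a + i)));
        __m256 a1 = bf16_to_f32(_mm_loadu_si128((const __m128i *)(a + i + 8)));
        __m256 b0 = bf16_to_f32(_mm_loadu_si128((const __m128i *)(b + i)));
        __m256 b1 = bf16_to_f32(_mm_loadu_si128((const __m128i *)(b + i + 8)));
        acc0 = _mm256_fmadd_ps(a0, b0, acc0);   // two independent accumulators
        acc1 = _mm256_fmadd_ps(a1, b1, acc1);   // help hide FMA latency
    }
    __m256 acc = _mm256_add_ps(acc0, acc1);
    // horizontal sum of the 8 lanes
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```

The commit's quantized-path tricks (madd, the 16-bit add for Q4_0) are not shown here; the sketch only covers the general shape of a BF16 dot with 128-bit loads and a second accumulator.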
b4087
ci: build test musa with cmake (#10298)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>