Releases: ngxson/llama.cpp

b4073

13 Nov 12:43
1ee9eea
docs : update bindings list (#10261)

Signed-off-by: tianzixuan <tianzixuan335@hellobike.com>

b4068

13 Nov 08:31
80dd7ff
vulkan: Optimize contiguous copies (#10254)

* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations, and do four elements per invocation
to hide some other overhead.

Apply similar changes to the scale shader, since scale is always contiguous.

Add a "progress bar" for shader compiles.
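The contiguous fast path described above can be sketched in plain C (an illustration only; the actual change is a Vulkan GLSL compute shader, and the function names here are hypothetical):

```c
#include <stddef.h>

/* Generic strided copy: per-element index arithmetic, mirroring the
 * addressing work the general copy shader must do for every element. */
void copy_strided(const float *src, float *dst,
                  size_t ne0, size_t ne1,
                  size_t src_stride1, size_t dst_stride1) {
    for (size_t i1 = 0; i1 < ne1; ++i1) {
        for (size_t i0 = 0; i0 < ne0; ++i0) {
            dst[i1 * dst_stride1 + i0] = src[i1 * src_stride1 + i0];
        }
    }
}

/* Contiguous fast path: no per-element index math; each "invocation"
 * handles four consecutive elements, analogous to the shader variant. */
void copy_contiguous(const float *src, float *dst, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* one invocation's worth of work */
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; ++i) {           /* tail elements */
        dst[i] = src[i];
    }
}
```

Since the scale op always operates on contiguous tensors, the same four-elements-per-invocation layout applies there unconditionally.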

b4067

11 Nov 18:42
54ef9cf
vulkan: Throttle the number of shader compiles during the build step.…

b4066

11 Nov 08:41
b0cefea
metal : more precise Q*K in FA vec kernel (#10247)

b4062

10 Nov 13:40
160687b
vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10…

b4061

09 Nov 11:25
6423c65
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b4058

09 Nov 10:25
f018acb
llama : fix Qwen model type strings

b4056

09 Nov 09:21
5b359bb
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…
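The bug class behind this fix can be sketched in plain C (hypothetical names; the actual fix lives in ggml's CUDA backend, whose internals are not shown in this commit subject):

```c
#include <stdint.h>

/* Computing a per-block element count as ne / nblocks can yield zero
 * for small tensors; using that value as a divisor or loop step then
 * faults. Clamping to at least 1 avoids the zero division. */
static inline int64_t elems_per_block(int64_t ne, int64_t nblocks) {
    int64_t dne = ne / nblocks;      /* can be 0 when ne < nblocks */
    return dne > 0 ? dne : 1;        /* guard against zero division */
}
```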

b4055

09 Nov 08:55
e892134
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
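The POWER10 MMA builtins themselves are not portable, but the tiling idea such kernels rely on can be sketched in plain C (a simplified illustration, not the upstreamed llamafile kernel): small blocks of C are accumulated in a local tile, playing the role the MMA accumulator registers play in the real code.

```c
#include <stddef.h>

#define TILE 4

/* Tiled FP32 matmul, row-major: C = A (MxK) * B (KxN).
 * The TILE x TILE accumulator stays register-resident across the K
 * loop, which is what the hardware MMA accumulators provide. */
void sgemm_tiled(const float *A, const float *B, float *C,
                 size_t M, size_t N, size_t K) {
    for (size_t m0 = 0; m0 < M; m0 += TILE) {
        for (size_t n0 = 0; n0 < N; n0 += TILE) {
            float acc[TILE][TILE] = {{0}};   /* accumulator tile */
            size_t mt = (m0 + TILE <= M) ? TILE : M - m0;
            size_t nt = (n0 + TILE <= N) ? TILE : N - n0;
            for (size_t k = 0; k < K; ++k) {
                for (size_t m = 0; m < mt; ++m) {
                    for (size_t n = 0; n < nt; ++n) {
                        acc[m][n] += A[(m0 + m) * K + k] * B[k * N + (n0 + n)];
                    }
                }
            }
            for (size_t m = 0; m < mt; ++m) {
                for (size_t n = 0; n < nt; ++n) {
                    C[(m0 + m) * N + (n0 + n)] = acc[m][n];
                }
            }
        }
    }
}
```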

b4053

08 Nov 21:48
ec450d3
metal : opt-in compile flag for BF16 (#10218)

* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci
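Since the flag is opt-in, enabling Metal BF16 support would look something like the following at configure time (the exact option name is an assumption; check the CMake options in your checkout):

```shell
# Opt in to BF16 support in the Metal backend (flag name assumed):
cmake -B build -DGGML_METAL_USE_BF16=ON
cmake --build build
```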