Releases · ngxson/llama.cpp
b4073
b4068
vulkan: Optimize contiguous copies (#10254)
* tests: Fix memory bandwidth calculation for perf tests. Add a flops calculation for flash attention. Add one GGML_OP_CPY perf test.
* vulkan: Optimize contiguous copies. Add a variant of the copy shader for when the tensors are contiguous. Avoid the complex addressing calculations, and do four elements per invocation to hide some other overhead. Apply similar changes to the scale shader, since scale is always contiguous. Add a "progress bar" for shader compiles.
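To make the fast path concrete, here is a minimal plain-C++ sketch of the idea; it is not the actual GLSL shader, and the function names, parameter names, and element-unit strides are all illustrative. The generic path recomputes a strided offset per element, while the contiguous path uses flat indexing and handles four elements per step, mirroring "four elements per invocation" in the note.

```cpp
#include <cstddef>

// generic path: per-element multi-dimensional addressing (strides in elements)
void copy_strided(const float *src, float *dst, const size_t ne[2],
                  const size_t nb_src[2], const size_t nb_dst[2]) {
    for (size_t i1 = 0; i1 < ne[1]; ++i1)
        for (size_t i0 = 0; i0 < ne[0]; ++i0)
            dst[i1*nb_dst[1] + i0*nb_dst[0]] = src[i1*nb_src[1] + i0*nb_src[0]];
}

// contiguous fast path: flat addressing, four elements per step to
// amortize the per-iteration overhead
void copy_contig(const float *src, float *dst, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i+0] = src[i+0];
        dst[i+1] = src[i+1];
        dst[i+2] = src[i+2];
        dst[i+3] = src[i+3];
    }
    for (; i < n; ++i) dst[i] = src[i]; // remainder tail
}
```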
b4067
vulkan: Throttle the number of shader compiles during the build step.…
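A hedged sketch of one way to throttle concurrent compiles, using a C++20 counting semaphore; the compile_shader stub and the limit of 16 in-flight jobs are illustrative assumptions, not what the build step actually uses.

```cpp
#include <semaphore>
#include <thread>
#include <vector>

std::counting_semaphore<16> slots(16); // at most 16 compiles in flight

void compile_shader(int id) {
    // placeholder: invoke the shader compiler (e.g. glslc) for shader `id`
    (void)id;
}

void compile_all(int n) {
    std::vector<std::jthread> workers;
    workers.reserve(n);
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i] {
            slots.acquire();   // wait for a free slot
            compile_shader(i); // do the expensive compile
            slots.release();   // hand the slot to the next job
        });
    }
    // std::jthread joins automatically when `workers` goes out of scope
}
```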
b4066
metal : more precise Q*K in FA vec kernel (#10247)
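As a hedged illustration of the general idea behind this change (accumulating half-precision products in a wider float accumulator), the sketch below uses the _Float16 GCC/Clang extension as a stand-in for Metal's half type; it is not the Metal kernel itself, and the function name is illustrative.

```cpp
#include <cstddef>

// Dot product of half-precision Q and K vectors with a float accumulator:
// each product is widened to float before summing, so rounding error does
// not compound in half precision.
float dot_qk(const _Float16 *q, const _Float16 *k, size_t d) {
    float acc = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        acc += (float)q[i] * (float)k[i];
    }
    return acc;
}
```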
b4062
vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10…
b4061
metal : reorder write loop in mul mat kernel + style (#10231) * metal : reorder write loop * metal : int -> short, style ggml-ci
b4058
llama : fix Qwen model type strings
b4056
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…
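The fix itself is not visible in this truncated note. As a hedged illustration of the general pattern such fixes follow, the sketch below clamps a work-partition size so integer division cannot later produce a zero divisor; the names ne and nparts are hypothetical, not the kernel's actual variables.

```cpp
#include <algorithm>
#include <cstdint>

// When splitting ne elements across nparts workers, ne / nparts is 0 for
// ne < nparts; clamping to 1 keeps any later division or stride nonzero.
int64_t elems_per_part(int64_t ne, int64_t nparts) {
    return std::max<int64_t>(1, (ne + nparts - 1) / nparts);
}
```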
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 datatype. It results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
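The MMA builtins referenced here are the GCC/Clang POWER10 built-ins (__builtin_mma_xvf32gerpp and friends). The micro-kernel below is a minimal sketch of an FP32 rank-1-update tile built on them, not the llamafile kernel that was upstreamed; the function name and the data layout (A supplying a 4-float column and B a 4-float row per step) are illustrative. It needs something like gcc -mcpu=power10 to compile.

```cpp
#include <altivec.h>

typedef vector unsigned char vec_t;

// C (4x4, row-major) += sum over k of outer(A column k, B row k)
void gemm_4x4_f32(const float *A, const float *B, float *C, int K) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);            // zero the 4x4 accumulator
    for (int k = 0; k < K; ++k) {
        vec_t a = (vec_t) vec_xl(0, A + 4*k); // next 4-float column of A
        vec_t b = (vec_t) vec_xl(0, B + 4*k); // next 4-float row of B
        __builtin_mma_xvf32gerpp(&acc, a, b); // acc += outer(a, b)
    }
    vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc); // unpack accumulator
                                               // (row order is endian-dependent)
    for (int i = 0; i < 4; ++i) {
        vec_xst(rows[i], 0, C + 4*i);
    }
}
```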
b4053
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16 ggml-ci
* ci : use BF16 ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float ggml-ci
* metal : fix BF16 check in MSL ggml-ci
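A minimal sketch of what an opt-in compile gate looks like in practice; the macro name GGML_METAL_USE_BF16 is an assumption based on this release note, and the actual gating in the Metal backend may differ.

```cpp
#include <cstdio>

// BF16 kernel variants are compiled in only when the flag is defined,
// e.g.: c++ -DGGML_METAL_USE_BF16 gate.cpp   (assumed flag name)
#ifdef GGML_METAL_USE_BF16
static const bool use_bf16 = true;   // BF16 paths available
#else
static const bool use_bf16 = false;  // default build: BF16 paths excluded
#endif

int main() {
    std::printf("BF16 support: %s\n", use_bf16 ? "enabled" : "disabled");
}
```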