Releases · ngxson/llama.cpp
b4073
b4068
vulkan: Optimize contiguous copies (#10254)
* tests: Fix memory bandwidth calculation for perf tests. Add a flops calculation for flash attention. Add one GGML_OP_CPY perf test.
* vulkan: Optimize contiguous copies. Add a variant of the copy shader for when the tensors are contiguous. Avoid the complex addressing calculations, and do four elements per invocation to hide some other overhead. Apply similar changes to the scale shader, since scale is always contiguous. Add a "progress bar" for shader compiles.
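To make the fast path concrete, here is a minimal plain-C++ sketch of the idea; it is not the actual GLSL shader, and the function names, parameter names, and element-unit strides are all illustrative. The generic path recomputes a strided offset per element, while the contiguous path uses flat indexing and handles four elements per step, mirroring "four elements per invocation" in the note.

```cpp
#include <cstddef>

// generic path: per-element multi-dimensional addressing (strides in elements)
void copy_strided(const float *src, float *dst, const size_t ne[2],
                  const size_t nb_src[2], const size_t nb_dst[2]) {
    for (size_t i1 = 0; i1 < ne[1]; ++i1)
        for (size_t i0 = 0; i0 < ne[0]; ++i0)
            dst[i1*nb_dst[1] + i0*nb_dst[0]] = src[i1*nb_src[1] + i0*nb_src[0]];
}

// contiguous fast path: flat addressing, four elements per step to
// amortize the per-iteration overhead
void copy_contig(const float *src, float *dst, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i+0] = src[i+0];
        dst[i+1] = src[i+1];
        dst[i+2] = src[i+2];
        dst[i+3] = src[i+3];
    }
    for (; i < n; ++i) dst[i] = src[i]; // remainder tail
}
```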
b4067
vulkan: Throttle the number of shader compiles during the build step.…
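A hedged sketch of one way to throttle concurrent compiles, using a C++20 counting semaphore; the compile_shader stub and the limit of 16 in-flight jobs are illustrative assumptions, not what the build step actually uses.

```cpp
#include <semaphore>
#include <thread>
#include <vector>

std::counting_semaphore<16> slots(16); // at most 16 compiles in flight

void compile_shader(int id) {
    // placeholder: invoke the shader compiler (e.g. glslc) for shader `id`
    (void)id;
}

void compile_all(int n) {
    std::vector<std::jthread> workers;
    workers.reserve(n);
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i] {
            slots.acquire();   // wait for a free slot
            compile_shader(i); // do the expensive compile
            slots.release();   // hand the slot to the next job
        });
    }
    // std::jthread joins automatically when `workers` goes out of scope
}
```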
b4066
metal : more precise Q*K in FA vec kernel (#10247)
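As a hedged illustration of the general idea behind this change (accumulating half-precision products in a wider float accumulator), the sketch below uses the _Float16 GCC/Clang extension as a stand-in for Metal's half type; it is not the Metal kernel itself, and the function name is illustrative.

```cpp
#include <cstddef>

// Dot product of half-precision Q and K vectors with a float accumulator:
// each product is widened to float before summing, so rounding error does
// not compound in half precision.
float dot_qk(const _Float16 *q, const _Float16 *k, size_t d) {
    float acc = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        acc += (float)q[i] * (float)k[i];
    }
    return acc;
}
```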
b4062
vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10…
b4061
metal : reorder write loop in mul mat kernel + style (#10231) * metal : reorder write loop * metal : int -> short, style ggml-ci
b4058
llama : fix Qwen model type strings
b4056
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…
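The fix itself is not visible in this truncated note. As a hedged illustration of the general pattern such fixes follow, the sketch below clamps a work-partition size so integer division cannot later produce a zero divisor; the names ne and nparts are hypothetical, not the kernel's actual variables.

```cpp
#include <algorithm>
#include <cstdint>

// When splitting ne elements across nparts workers, ne / nparts is 0 for
// ne < nparts; clamping to 1 keeps any later division or stride nonzero.
int64_t elems_per_part(int64_t ne, int64_t nparts) {
    return std::max<int64_t>(1, (ne + nparts - 1) / nparts);
}
```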
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 datatype. It results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes. The patch was tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
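The MMA builtins referenced here are the GCC/Clang POWER10 built-ins (__builtin_mma_xvf32gerpp and friends). The micro-kernel below is a minimal sketch of an FP32 rank-1-update tile built on them, not the llamafile kernel that was upstreamed; the function name and the data layout (A supplying a 4-float column and B a 4-float row per step) are illustrative. It needs something like gcc -mcpu=power10 to compile.

```cpp
#include <altivec.h>

typedef vector unsigned char vec_t;

// C (4x4, row-major) += sum over k of outer(A column k, B row k)
void gemm_4x4_f32(const float *A, const float *B, float *C, int K) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);            // zero the 4x4 accumulator
    for (int k = 0; k < K; ++k) {
        vec_t a = (vec_t) vec_xl(0, A + 4*k); // next 4-float column of A
        vec_t b = (vec_t) vec_xl(0, B + 4*k); // next 4-float row of B
        __builtin_mma_xvf32gerpp(&acc, a, b); // acc += outer(a, b)
    }
    vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc); // unpack accumulator
                                               // (row order is endian-dependent)
    for (int i = 0; i < 4; ++i) {
        vec_xst(rows[i], 0, C + 4*i);
    }
}
```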
b4053
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16 ggml-ci
* ci : use BF16 ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float ggml-ci
* metal : fix BF16 check in MSL ggml-ci
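A minimal sketch of what an opt-in compile gate looks like in practice; the macro name GGML_METAL_USE_BF16 is an assumption based on this release note, and the actual gating in the Metal backend may differ.

```cpp
#include <cstdio>

// BF16 kernel variants are compiled in only when the flag is defined,
// e.g.: c++ -DGGML_METAL_USE_BF16 gate.cpp   (assumed flag name)
#ifdef GGML_METAL_USE_BF16
static const bool use_bf16 = true;   // BF16 paths available
#else
static const bool use_bf16 = false;  // default build: BF16 paths excluded
#endif

int main() {
    std::printf("BF16 support: %s\n", use_bf16 ? "enabled" : "disabled");
}
```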