Releases · ngxson/llama.cpp
b4027
cuda : clear error after changing peer access (#10153)
b4024
CANN: adjust to the backend registry refactor (#10158). Remove buffer->iface.get_name, which was used in CANN, as it was removed in the backend registry refactor PR.
b4023
sync : ggml
b4020
ggml : move CPU backend to a separate file (#10144)
b4019
metal : minor fixup in FA kernel (#10143)
* metal : use the unrolled loop variable
* metal : remove unused var
b4016
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix
b4014
llama : adjust default context size + print warnings (#10136)
* ggml-ci : add missing gpu-layers + adjust context sizes
b4013
simple-chat : only add bos on first prompt (#10129)
b4011
llama : add simple-chat example (#10124)
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
b4009
vulkan : improve ggml_vk_create_buffer error handling (#9898)