Releases: teleprint-me/llama.cpp
b4033
b4020
ggml : move CPU backend to a separate file (#10144)
b3995
kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer backend/device registry interfaces.
Signed-off-by: Sergio Lopez <slp@redhat.com>
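A minimal sketch of what the registry/device interfaces make possible, assuming the ggml-backend device API (`ggml_backend_dev_count`, `ggml_backend_dev_get`, `ggml_backend_dev_name`, `ggml_backend_dev_description`) as it existed around these builds; verify the names against `ggml-backend.h` in your checkout:

```cpp
// Sketch: enumerate all devices registered through the ggml backend registry,
// which the Kompute backend now participates in alongside the other backends.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    const size_t n_dev = ggml_backend_dev_count();
    for (size_t i = 0; i < n_dev; ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```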
b3987
llama : Add IBM granite template (#10013)
* Add granite template to llama.cpp
* Add granite template to test-chat-template.cpp
* Update src/llama.cpp
* Update tests/test-chat-template.cpp
* Added proper template and expected output
* Small change to \n
* Add code space
* Fix spacing
* Apply suggestions from code review
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
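For context, chat templates such as the Granite one are applied through `llama_chat_apply_template`. The sketch below assumes the signature from `llama.h` around these builds (model pointer plus an optional template override); check it against your revision before relying on it:

```cpp
// Sketch: render a conversation with the chat template embedded in the GGUF
// (e.g. the Granite template added in #10013). Passing nullptr as the template
// override lets llama.cpp pick the model's own template.
#include <string>
#include <vector>
#include "llama.h"

std::string render_chat(const llama_model * model,
                        const std::vector<llama_chat_message> & msgs) {
    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(model, /*tmpl=*/nullptr,
                                          msgs.data(), msgs.size(),
                                          /*add_ass=*/true,
                                          buf.data(), buf.size());
    if (n < 0) {
        return ""; // template not recognized
    }
    if ((size_t) n > buf.size()) {
        // buffer was too small: retry with the required size
        buf.resize(n);
        n = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                      true, buf.data(), buf.size());
    }
    return std::string(buf.data(), n);
}
```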
b3982
sync : ggml
b3974
server : refactor slot input data, move tokenizer to HTTP thread (#10…
b3970
server : samplers accept the prompt correctly (#10019)
b3943
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one
* adapt all examples
* fix simple.cpp
* fix llama_bench
* fix context shifting
* free batch before return
* use common_batch_add, reuse llama_batch in loop
* null terminated seq_id list
* fix save-load-state example
* fix perplexity
* correct token pos in llama_batch_allocr
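With the `all_pos_0`, `all_pos_1`, and `all_seq_id` shortcuts gone, positions and sequence ids are set per token. A minimal sketch, assuming the `common_batch_clear` / `common_batch_add` helpers from `common/common.h` at this build:

```cpp
// Sketch: build and decode a prompt batch with explicit per-token positions
// and sequence ids, reusing one llama_batch as in PR #9745.
#include <vector>
#include "common.h"
#include "llama.h"

void decode_prompt(llama_context * ctx, const std::vector<llama_token> & prompt) {
    llama_batch batch = llama_batch_init(/*n_tokens=*/512, /*embd=*/0, /*n_seq_max=*/1);

    common_batch_clear(batch);
    for (size_t i = 0; i < prompt.size(); ++i) {
        // explicit position and sequence id for every token
        common_batch_add(batch, prompt[i], (llama_pos) i, { 0 }, /*logits=*/false);
    }
    batch.logits[batch.n_tokens - 1] = true; // only request logits for the last token

    if (llama_decode(ctx, batch) != 0) {
        // handle decode failure
    }

    llama_batch_free(batch); // free the batch before returning, as the PR notes
}
```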
b3934
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <tim.wang@ing.com>
b3922
llama : add infill sampler (#9896) ggml-ci
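A short sketch of how the new infill sampler slots into a sampler chain, assuming `llama_sampler_init_infill` takes the model (so it can inspect the vocabulary) as in `llama.h` around this build; treat the exact parameters as an assumption to verify:

```cpp
// Sketch: attach the infill sampler from #9896 to a sampler chain,
// followed by a greedy sampler to pick the final token.
#include "llama.h"

llama_sampler * make_infill_chain(const llama_model * model) {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_infill(model));
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());
    return chain; // release later with llama_sampler_free
}
```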