Releases · ggerganov/llama.cpp
b2463
common : disable repeat penalties by default (#6127)
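This default change works because a repeat penalty of 1.0 is a mathematical no-op. A minimal standalone sketch of the classic repetition penalty (names and structure hypothetical, not the actual llama.cpp code) illustrates why disabling it is equivalent to setting it to 1.0:

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

// Illustrative repetition penalty: logits of recently seen tokens are
// divided (if positive) or multiplied (if negative) by the penalty.
// With penalty == 1.0f the logits are unchanged, so the pass can be skipped.
static void apply_repeat_penalty(std::vector<float> & logits,
                                 const std::unordered_set<int> & recent,
                                 float penalty) {
    if (penalty == 1.0f) {
        return; // no-op: the new default behavior
    }
    for (int tok : recent) {
        float & l = logits[tok];
        l = l > 0.0f ? l / penalty : l * penalty;
    }
}

int main() {
    std::vector<float> logits = { 2.0f, -1.0f, 0.5f };
    apply_repeat_penalty(logits, { 0, 1 }, 1.1f);
    std::printf("%.3f %.3f %.3f\n", logits[0], logits[1], logits[2]);
    // prints: 1.818 -1.100 0.500  (token 2 was not recent, so it is untouched)
}
```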
b2462
ci : exempt some labels from being tagged as stale (#6140)
b2461
common : print usage on '-h' and '--help' (#6145)
b2459
mpt : implement backwards compatibility with duped output tensor (#6139)
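A rough sketch of this kind of compatibility rule, with hypothetical tensor names and types (the real logic lives in the llama.cpp model loader): older MPT conversions stored a duplicate output tensor, newer ones omit it, so the loader can fall back to the tied token embedding weights when it is absent.

```cpp
#include <cstdio>
#include <map>
#include <string>

struct tensor { const char * data; }; // stand-in for the real tensor type

static tensor * find_tensor(std::map<std::string, tensor> & m, const std::string & name) {
    auto it = m.find(name);
    return it == m.end() ? nullptr : &it->second;
}

int main() {
    // A "new-style" model: no separate output tensor, only the embedding.
    std::map<std::string, tensor> model = { { "token_embd.weight", { "embd" } } };

    tensor * output = find_tensor(model, "output.weight"); // old conversions have this
    if (output == nullptr) {
        output = find_tensor(model, "token_embd.weight");  // fall back to tied weights
    }
    std::printf("output tensor: %s\n", output ? output->data : "missing");
}
```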
b2458
clip : fix memory leak (#6138)
b2457
backend : set max split inputs to GGML_MAX_SRC (#6137)
b2456
ci : disable stale issue messages (#6126)
b2455
ci : temporary disable sanitizer builds (#6128)
b2454
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
* reduce max inputs per split
* more cleanup
* update backends
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
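The scheduling rule described here (together with the GGML_MAX_SRC cap from b2457) amounts to: when the current split already holds the maximum number of inputs, open a new split rather than overflowing it. A self-contained sketch with hypothetical node/split types, where GGML_MAX_SRC stands in for the real ggml constant:

```cpp
#include <cstdio>
#include <vector>

constexpr int GGML_MAX_SRC = 10; // stand-in for the ggml compile-time constant

struct split {
    std::vector<int> inputs; // ids of the nodes feeding this split
};

// Greedily assign nodes to splits, capping each split's input count.
static std::vector<split> partition(const std::vector<int> & nodes) {
    std::vector<split> splits(1);
    for (int id : nodes) {
        if ((int) splits.back().inputs.size() >= GGML_MAX_SRC) {
            splits.emplace_back(); // current split is full: start a new one
        }
        splits.back().inputs.push_back(id);
    }
    return splits;
}

int main() {
    std::vector<int> nodes(25);
    for (int i = 0; i < 25; ++i) nodes[i] = i;
    std::printf("%zu splits\n", partition(nodes).size()); // 25 inputs -> 3 splits
}
```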
b2453
common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
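Together with b2461's usage printing on '-h'/'--help', the shape of a tidied argument loop looks roughly like the following. This is an illustrative sketch only; the flag names, defaults, and helper are hypothetical, not the actual common/common.cpp parser.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

static void print_usage(const char * prog) {
    std::printf("usage: %s [-h] [--n-predict N]\n", prog);
}

int main(int argc, char ** argv) {
    int n_predict = 128; // hypothetical default

    for (int i = 1; i < argc; ++i) {
        // print usage and exit cleanly on -h / --help
        if (!std::strcmp(argv[i], "-h") || !std::strcmp(argv[i], "--help")) {
            print_usage(argv[0]);
            return 0;
        }
        // flag with a required value
        if (!std::strcmp(argv[i], "--n-predict") && i + 1 < argc) {
            n_predict = std::atoi(argv[++i]);
            continue;
        }
        // unknown flags are an error, not silently ignored
        std::fprintf(stderr, "error: unknown argument: %s\n", argv[i]);
        print_usage(argv[0]);
        return 1;
    }

    std::printf("n_predict = %d\n", n_predict);
}
```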