koboldcpp-1.57.1
- Added a benchmarking feature with `--benchmark`, which automatically runs a benchmark with your provided settings, outputting run parameters, timing and speed information as well as testing for coherence, and exiting on completion. You can provide a filename, e.g. `--benchmark result.csv`, and it will append CSV formatted data to that file (see the examples after the changelog).
- Added temperature Quad-Sampling (set via the API with the parameter `smoothing_factor`, see the examples after the changelog). PR from @AAbushady (credits @kalomaze).
- Improved timing displays: the seed used is now shown, and llama.cpp styled timings are shown when run in `--debugmode`. These timings will appear faster as they do not include overheads, measuring only the specific eval functions.
- Improved abort generation behavior (allows a second user to abort while in the queue).
- Vulkan enhancements from @0cc4m merged: APU memory handling and multigpu support. To use multigpu, you can now specify additional device IDs, for example `--usevulkan 0 2 3`, which will use the GPUs with IDs 0, 2, and 3. Allocation is determined by `--tensor_split` (see the examples after the changelog). Multigpu for Vulkan is currently configurable via the command line only; the GUI launcher does not allow selecting multiple devices for Vulkan.
- Various improvements and bugfixes merged from upstream.
- Updated Kobold Lite with many improvements and new features:
- NEW: The Aesthetic UI is now available for Story and Adventure modes as well!
- Added "AI Impersonate" feature for Instruct mode.
- Smoothing factor added; it can be configured in the dynamic temperature panel.
- Added a toggle to enable printable view (unlocks vertical scrolling).
- Added a toggle to inject timestamps, allowing the AI to be aware of time passing.
- Persists API info for A1111 and XTTS, allows specifying custom negative prompts for image generation, and allows specifying custom Horde keys in KCPP mode.
- Fixes for XTTS to handle devices with over 100 voices, and adds an option to narrate dialogue only.
- Added a toggle to request that the A1111 backend save generated images to disk.
- Fix for chub.ai card fetching.
Hotfix 1.57.1: Fixed some crashes and fixed multigpu for Vulkan.
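As a quick illustration of the new benchmark flag, here is a minimal sketch of an invocation; the model filename is a placeholder, not something shipped with this release:

```
# Hypothetical invocation: benchmark with your usual settings and
# append CSV-formatted results to result.csv
koboldcpp.exe --model yourmodel.gguf --benchmark result.csv
```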
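A minimal sketch of setting `smoothing_factor` over the API, assuming a server running on the default port and the standard KoboldAI `/api/v1/generate` endpoint; the prompt, `max_length`, and the value 0.3 are placeholder choices:

```
curl http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time,", "max_length": 64, "smoothing_factor": 0.3}'
```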
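And a sketch of a Vulkan multigpu launch; the device IDs match the example above, while the model filename and the `--tensor_split` ratios are placeholders:

```
# Hypothetical launch: spread the model across Vulkan devices 0, 2 and 3,
# giving device 0 twice the share of the other two
koboldcpp.exe --model yourmodel.gguf --usevulkan 0 2 3 --tensor_split 2 1 1
```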
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
For more information, run the program from the command line with the `--help` flag.
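For example, a typical launch, assuming a local GGUF model with a placeholder filename:

```
koboldcpp.exe --model yourmodel.gguf --port 5001
```

Once the terminal reports that the server is listening, open http://localhost:5001 in your browser.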