Releases · simonw/llm-llama-cpp
0.3
New mechanism for running GGUF files directly, using llm -m gguf. Example:

    llm -m gguf \
      -o path una-cybertron-7b-v2-bf16.Q8_0.gguf \
      'Instruction: Five reasons to get a pet walrus
    Response:'
This makes it much easier to try out new GGUF files, for example those released by TheBloke on Hugging Face. #26
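A sketch of that workflow: download a GGUF file from Hugging Face, then point -o path at it. The repository and filename below are illustrative examples, not taken from these notes:

    # Illustrative: fetch a GGUF build from one of TheBloke's Hugging Face repos
    curl -L -o llama-2-7b-chat.Q4_K_M.gguf \
      'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf'
    # Run a prompt against the downloaded file
    llm -m gguf -o path llama-2-7b-chat.Q4_K_M.gguf 'Five reasons to get a pet walrus'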
0.2b1
- max_tokens now defaults to 4000. Thanks, Alexis Métaireau. #18
- New -o max_tokens 100 option for changing the max tokens setting. #20
- New -o n_gpu_layers 10 option for increasing the number of GPU layers. Thanks, LoopControl. #19
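The two options can be combined in a single call. A minimal sketch, assuming a model alias such as llama-2-7b-chat has already been registered with the plugin:

    # Cap the response at 100 tokens and offload 10 layers to the GPU
    llm -m llama-2-7b-chat -o max_tokens 100 -o n_gpu_layers 10 \
      'Summarize the plot of Hamlet in two sentences'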
0.2b0
- Support for new GGUF format model files. Thanks, Andrew Mshar. #16
- Output from this model now streams. Thanks, Michael Hamann. #11
- Support for compiling with METAL GPU acceleration on Apple Silicon. Thanks, vividfog. #14
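For the Metal build, a minimal sketch based on llama-cpp-python's documented build flags (exact flags may vary between versions):

    # Reinstall llama-cpp-python with Metal GPU acceleration enabled on Apple Silicon
    CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
      llm install llama-cpp-python --force-reinstall --no-cache-dir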