Description
I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples, and some of the commands can become very cumbersome. For example, here is what I use for the llama.vim FIM server:
llama-server \
-m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
--log-file ./service-vim.log \
--host 0.0.0.0 --port 8012 \
--ctx-size 0 \
--cache-reuse 256 \
-ub 1024 -b 1024 -ngl 99 -fa -dt 0.1
It would be much cleaner if I could just run, for example:
llama-server --cfg-fim-7b
Or if I could turn this embedding server command into something simpler:
# llama-server \
# --hf-repo ggml-org/bert-base-uncased \
# --hf-file bert-base-uncased-Q8_0.gguf \
# --port 8033 -c 512 --embeddings --pooling mean
llama-server --cfg-embd-bert --port 8033
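The two commands above can be approximated today with a small wrapper. Below is a minimal sketch, assuming bash. `expand_preset` and `llama_preset` are hypothetical helper names, not part of llama.cpp, and the proposed `--cfg-fim-7b` and `--cfg-embd-bert` names are used only as lookup keys. The flags are copied verbatim from the commands above. The comments explain what each FIM option does.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, NOT part of llama.cpp: expands a proposed preset name
# into the full llama-server flags from the issue, then runs the server.

# expand_preset NAME: print the flags for NAME, one per line.
expand_preset() {
  local -a flags
  case "$1" in
    --cfg-fim-7b)
      flags=(
        -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf
        --log-file ./service-vim.log
        --host 0.0.0.0 --port 8012
        --ctx-size 0       # 0 = take the context size from the model
        --cache-reuse 256  # min chunk size to try reusing from the KV cache
        -ub 1024 -b 1024   # physical / logical batch sizes
        -ngl 99            # offload (effectively) all layers to the GPU
        -fa                # enable flash attention
        -dt 0.1            # KV-cache defragmentation threshold
      )
      ;;
    --cfg-embd-bert)
      flags=(
        --hf-repo ggml-org/bert-base-uncased
        --hf-file bert-base-uncased-Q8_0.gguf
        -c 512 --embeddings --pooling mean
      )
      ;;
    *)
      echo "unknown preset: $1" >&2
      return 1
      ;;
  esac
  printf '%s\n' "${flags[@]}"
}

# llama_preset NAME [EXTRA_FLAGS...]: run llama-server with the preset's flags
# followed by any extra flags, e.g. `llama_preset --cfg-embd-bert --port 8033`.
llama_preset() {
  local name=$1
  shift
  local out line
  local -a flags=()
  out=$(expand_preset "$name") || return 1
  while IFS= read -r line; do
    flags+=("$line")
  done <<< "$out"
  llama-server "${flags[@]}" "$@"
}
```

The preset flags come first and any extra flags are appended. This lets `llama_preset --cfg-embd-bert --port 8033` mirror the proposed `llama-server --cfg-embd-bert --port 8033`. A built-in preset would avoid the wrapper entirely and work the same way on every platform.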
Implementation
There is already an initial example of how we can create such configuration presets:
llama-tts --tts-oute-default -p "This is a TTS preset"
# equivalent to
#
# llama-tts \
# --hf-repo OuteAI/OuteTTS-0.2-500M-GGUF \
# --hf-file OuteTTS-0.2-500M-Q8_0.gguf \
# --hf-repo-v ggml-org/WavTokenizer \
# --hf-file-v WavTokenizer-Large-75-F16.gguf -p "This is a TTS preset"
This preset configures the model URLs so that the models are automatically downloaded from HF when the example runs, which simplifies the command significantly. A preset can also set various default values, such as context size, batch size, and pooling type.
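If a preset supplies defaults, explicit user flags still need to take effect. The sketch below shows one way this could work. It assumes the preset's flags are placed first and that the argument parser keeps the last value given for an option. That "last wins" rule is an assumption made for this sketch. `effective_value` is a hypothetical helper that reports which value would win. The defaults are taken from the embedding command above.

```shell
#!/usr/bin/env bash
# Sketch: combining preset defaults with explicit user flags, ASSUMING the
# preset's flags come first and the last occurrence of an option wins.

# effective_value OPT ARGS...: print the value after the last OPT in ARGS;
# return 1 if OPT never appears with a value.
effective_value() {
  local opt=$1
  shift
  local value="" found=0
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "$opt" ] && [ "$#" -ge 2 ]; then
      value=$2
      found=1
      shift 2
    else
      shift
    fi
  done
  [ "$found" -eq 1 ] || return 1
  printf '%s\n' "$value"
}

# Defaults a hypothetical embedding preset might set, followed by user flags.
preset_defaults=(-c 512 --embeddings --pooling mean)
user_flags=(-c 2048 --port 8033)
args=("${preset_defaults[@]}" "${user_flags[@]}")

effective_value -c "${args[@]}"         # prints 2048: user flag overrides preset
effective_value --pooling "${args[@]}"  # prints mean: preset default is kept
```

Putting the preset first keeps every default overridable without special-case logic. The trade-off is that the override behavior depends on the parser's ordering rule.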
Goal
The goal of this issue is to create such presets for various common tasks:
- Run a basic TTS generation (see above)
- Start a chat server with a commonly used model
- Start a speculative-decoding-enabled chat server with a commonly used model
- Start a FIM server for plugins such as llama.vim
- Start an embedding server with a commonly used embedding model
- Start a reranking server with a commonly used reranking model
- And many more...
The list of configuration presets would require curation and proper documentation.
I think this is a great task for new contributors to help with and a good way to get involved in the project.