Skip to content

examples : add configuration presets #10932

Open
@ggerganov

Description

Description

I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples and some of the commands can become very cumbersome. For example, here is what I use for the llama.vim FIM server:

llama-server \
    -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
    --log-file ./service-vim.log \
    --host 0.0.0.0 --port 8012 \
    --ctx-size 0 \
    --cache-reuse 256 \
    -ub 1024 -b 1024 -ngl 99 -fa -dt 0.1

It would be much cleaner if I could just run, for example:

llama-server --cfg-fim-7b

Or if I could turn this embedding server command into something simpler:

# llama-server \
#     --hf-repo ggml-org/bert-base-uncased \
#     --hf-file          bert-base-uncased-Q8_0.gguf \
#     --port 8033 -c 512 --embeddings --pooling mean

llama-server --cfg-embd-bert --port 8033

Implementation

There is already an initial example of how we can create such configuration presets:

llama-tts --tts-oute-default -p "This is a TTS preset"

# equivalent to
# 
# llama-tts \
#    --hf-repo   OuteAI/OuteTTS-0.2-500M-GGUF \
#    --hf-file          OuteTTS-0.2-500M-Q8_0.gguf \
#    --hf-repo-v ggml-org/WavTokenizer \
#    --hf-file-v          WavTokenizer-Large-75-F16.gguf -p "This is a TTS preset"

https://github.com/ggerganov/llama.cpp/blob/5cd85b5e008de2ec398d6596e240187d627561e3/common/arg.cpp#L2208-L2220

This preset configures the model urls so that they would be automatically downloaded from HF when the example runs and thus simplifies the command significantly. It can additionally set various default values, such as context size, batch size, pooling type, etc.

Goal

The goal of this issue is to create such presets for various common tasks:

  • Run a basic TTS generation (see above)
  • Start a chat server with a commonly used model
  • Start a speculative-decoding-enabled chat server with a commonly used model
  • Start a FIM server for plugins such as llama.vim
  • Start an embedding server with a commonly used embedding model
  • Start a reranking server with a commonly used reranking model
  • And many more ..

The list of configuration presets would require curation and proper documentation.

I think this is a great task for new contributors to help and to get involved in the project.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions