examples : add configuration presets #10932

## Description

While looking for ways to demonstrate the functionality of the `llama.cpp` examples, I noticed that some of the commands can become very cumbersome. For example, here is what I use for the `llama.vim` FIM server:

```bash
llama-server \
    -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
    --log-file ./service-vim.log \
    --host 0.0.0.0 --port 8012 \
    --ctx-size 0 \
    --cache-reuse 256 \
    -ub 1024 -b 1024 -ngl 99 -fa -dt 0.1
```

It would be much cleaner if I could just run, for example:

```bash
llama-server --cfg-fim-7b
```

Or if I could turn this embedding server command into something simpler:

```bash
# llama-server \
#     --hf-repo ggml-org/bert-base-uncased \
#     --hf-file          bert-base-uncased-Q8_0.gguf \
#     --port 8033 -c 512 --embeddings --pooling mean

llama-server --cfg-embd-bert --port 8033
```
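Until such presets exist, the idea can be approximated with a small shell wrapper. The following is only a sketch: the function names `preset_args` and `llama_preset` and the preset names `fim-7b` and `embd-bert` are hypothetical, chosen to mirror the proposed `--cfg-*` flags. The flags themselves are copied from the commands above. The wrapper only prints the expanded command so that it can be inspected before running it.

```bash
# Hypothetical helper: expand a preset name into the flags listed above.
# The preset names are illustrative and are not llama-server options.
preset_args() {
    case "$1" in
        fim-7b)
            set -- -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
                   --log-file ./service-vim.log \
                   --host 0.0.0.0 --port 8012 \
                   --ctx-size 0 \
                   --cache-reuse 256 \
                   -ub 1024 -b 1024 -ngl 99 -fa -dt 0.1
            ;;
        embd-bert)
            set -- --hf-repo ggml-org/bert-base-uncased \
                   --hf-file bert-base-uncased-Q8_0.gguf \
                   -c 512 --embeddings --pooling mean
            ;;
        *)
            echo "unknown preset: $1" >&2
            return 1
            ;;
    esac
    # Join the flags with single spaces.
    printf '%s\n' "$*"
}

# Dry run: print the full command, with any extra arguments appended
# after the preset's own flags.
llama_preset() {
    local name preset
    name="$1"
    shift
    preset="$(preset_args "$name")" || return 1
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "llama-server $preset $*"
    else
        printf '%s\n' "llama-server $preset"
    fi
}

# Usage:
#   llama_preset embd-bert --port 8033
```

A wrapper like this has to be copied around by every user, which is the main argument for building the presets into the examples themselves.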

## Implementation

There is already an initial example of how we can create such configuration presets:

```bash
llama-tts --tts-oute-default -p "This is a TTS preset"

# equivalent to
# 
# llama-tts \
#    --hf-repo   OuteAI/OuteTTS-0.2-500M-GGUF \
#    --hf-file          OuteTTS-0.2-500M-Q8_0.gguf \
#    --hf-repo-v ggml-org/WavTokenizer \
#    --hf-file-v          WavTokenizer-Large-75-F16.gguf -p "This is a TTS preset"
```

<details>

https://github.com/ggerganov/llama.cpp/blob/5cd85b5e008de2ec398d6596e240187d627561e3/common/arg.cpp#L2208-L2220

</details>

This preset configures the model URLs so that the models are downloaded automatically from Hugging Face (HF) when the example runs, which simplifies the command significantly. A preset can additionally set default values such as context size, batch size, and pooling type.
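To make the repo and file pair more concrete, here is a small sketch of how such a pair maps to a download URL. It assumes the standard Hugging Face `resolve/main` URL layout and that the file lives on the repo's `main` branch. The helper name `hf_url` is hypothetical, and this is not necessarily how `llama.cpp` builds the URL internally.

```bash
# Sketch: build the download URL for a file in a Hugging Face repo.
# Assumes the file is on the repo's "main" branch.
hf_url() {
    printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

# The two models referenced by the TTS preset above.
hf_url OuteAI/OuteTTS-0.2-500M-GGUF OuteTTS-0.2-500M-Q8_0.gguf
hf_url ggml-org/WavTokenizer WavTokenizer-Large-75-F16.gguf
```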

## Goal

The goal of this issue is to create such presets for various common tasks:

- [x] Run a basic TTS generation (see above)
- [ ] Start a chat server with a commonly used model
- [ ] Start a speculative-decoding-enabled chat server with a commonly used model
- [ ] Start a FIM server for plugins such as `llama.vim`
- [x] Start an embedding server with a commonly used embedding model
- [ ] Start a reranking server with a commonly used reranking model
- And many more...

The list of configuration presets would require curation and proper documentation.

I think this is a great task for new contributors who want to help and get involved in the project.
