Description
Motivation
While we already have support for known chat templates, it is sometimes not enough for users who want to:
- Use their own fine-tuned model
- Or, use a model that does not have a Jinja template
The problem is that other implementations of chat templates out there are also quite messy, for example:
- Jinja template: as discussed in server : improvements and maintenance #4216, it's too complicated to add such a parser into the code base of llama.cpp
- The format of ollama requires a parser, and it's not very flexible for future usage
- The LM Studio format does not require a parser, but it lacks support for multiple roles (we currently have `system`, `user`, `assistant`, but technically it's possible to have custom roles like `database`, `function`, `search-engine`, ...)
Possible implementation
My idea is to have a simple JSON format that takes into account all roles:
```json
{
  "system": {
    "prefix": "<|system|>\n",
    "postfix": "<|end|>\n"
  },
  "user": {
    "prefix": "<|user|>\n",
    "postfix": "<|end|>\n"
  },
  "assistant": {
    "prefix": "<|assistant|>\n",
    "postfix": "<|end|>\n"
  },
  "_stop": ["<|end|>"],
  "_generation": "<|assistant|>\n"
}
```
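For example, assuming this template and a hypothetical conversation with one system message ("You are a helpful assistant.") and one user message ("Hello!"), the rendered prompt would be:

```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
Hello!<|end|>
<|assistant|>
```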
Users can specify the custom template via `--chat-template-file ./my_template.json`
The C++ code will be as simple as:
```cpp
#include <sstream>
#include <string>
#include "json.hpp" // nlohmann::json, already vendored in llama.cpp

using json = nlohmann::json;

// Render the conversation into a single prompt string using the custom template
std::string apply_custom_template(const json & messages, const json & tmpl) {
    std::stringstream ss;
    for (const auto & msg : messages) {
        const json & t = tmpl[msg["role"].get<std::string>()];
        // get<std::string>() so values are not streamed as quoted/escaped JSON
        ss << t["prefix"].get<std::string>() << msg["content"].get<std::string>() << t["postfix"].get<std::string>();
    }
    ss << tmpl["_generation"].get<std::string>(); // add generation prompt
    return ss.str();
}
```
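For illustration, here is a minimal sketch of how a caller could load the template file and render a conversation, assuming the snippet above (the file path and message contents are just placeholders, and the `main` entry point is hypothetical):

```cpp
#include <fstream>
#include <iostream>

int main() {
    // load the user-provided template, e.g. passed via --chat-template-file
    std::ifstream f("./my_template.json");
    json tmpl = json::parse(f);

    json messages = json::array({
        { {"role", "system"}, {"content", "You are a helpful assistant."} },
        { {"role", "user"},   {"content", "Hello!"} },
    });

    // prints the formatted prompt, ending with the generation prompt "<|assistant|>\n"
    std::cout << apply_custom_template(messages, tmpl);
    return 0;
}
```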
NOTE: This function does not take into account models that do not support a system prompt for now, but that can be added in the future, maybe toggled via an attribute inside the JSON: `"system_inside_user_message": true`
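As a rough sketch of that idea (purely hypothetical handling; the exact merging rule would need to match each model's expectations), the system message could be folded into the first user message before applying the template:

```cpp
// Hypothetical pre-processing step: if the template declares
// "system_inside_user_message": true, merge the system message into
// the first user message instead of emitting it with its own prefix
json preprocess_messages(const json & messages, const json & tmpl) {
    if (!tmpl.value("system_inside_user_message", false)) {
        return messages;
    }
    json out = json::array();
    std::string pending_system;
    for (const auto & msg : messages) {
        const std::string role = msg["role"].get<std::string>();
        if (role == "system") {
            pending_system = msg["content"].get<std::string>();
        } else if (role == "user" && !pending_system.empty()) {
            json merged = msg;
            merged["content"] = pending_system + "\n" + msg["content"].get<std::string>();
            out.push_back(merged);
            pending_system.clear();
        } else {
            out.push_back(msg);
        }
    }
    return out;
}
```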