Description
Motivation
While we already have support for known chat templates, it is sometimes not enough for users who want to:
- Use their own fine-tuned model
- Or, use a model that does not have a Jinja template
The problem is that other implementations of chat templates out there are also quite messy, for example:
- Jinja template: as discussed in server : improvements and maintenance #4216, it's too complicated to add such a parser into the code base of llama.cpp
- The format of ollama requires a parser, and it's not very flexible for future usage
- The LM Studio format does not require a parser, but it lacks support for multiple roles (we currently have `system`, `user`, `assistant`, but technically it's possible to have custom roles like `database`, `function`, `search-engine`, ...)
Possible implementation
My idea is to have a simple JSON format that takes into account all roles:
```json
{
  "system": {
    "prefix": "<|system|>\n",
    "postfix": "<|end|>\n"
  },
  "user": {
    "prefix": "<|user|>\n",
    "postfix": "<|end|>\n"
  },
  "assistant": {
    "prefix": "<|assistant|>\n",
    "postfix": "<|end|>\n"
  },
  "_stop": ["<|end|>"],
  "_generation": "<|assistant|>\n"
}
```
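For example, assuming this template and a hypothetical conversation with one system message ("You are a helpful assistant.") and one user message ("Hello!"), the rendered prompt would be:

```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
Hello!<|end|>
<|assistant|>
```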
Users can specify the custom template via `--chat-template-file ./my_template.json`
The C++ code will be as simple as:
```cpp
#include <sstream>
#include <string>
#include "json.hpp" // nlohmann::json, already vendored in llama.cpp

using json = nlohmann::json;

// Render the conversation into a single prompt string using the custom template
std::string apply_custom_template(const json & messages, const json & tmpl) {
    std::stringstream ss;
    for (const auto & msg : messages) {
        const json & t = tmpl[msg["role"].get<std::string>()];
        // get<std::string>() so values are not streamed as quoted/escaped JSON
        ss << t["prefix"].get<std::string>() << msg["content"].get<std::string>() << t["postfix"].get<std::string>();
    }
    ss << tmpl["_generation"].get<std::string>(); // add generation prompt
    return ss.str();
}
```
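For illustration, here is a minimal sketch of how a caller could load the template file and render a conversation, assuming the snippet above (the file path and message contents are just placeholders, and the `main` entry point is hypothetical):

```cpp
#include <fstream>
#include <iostream>

int main() {
    // load the user-provided template, e.g. passed via --chat-template-file
    std::ifstream f("./my_template.json");
    json tmpl = json::parse(f);

    json messages = json::array({
        { {"role", "system"}, {"content", "You are a helpful assistant."} },
        { {"role", "user"},   {"content", "Hello!"} },
    });

    // prints the formatted prompt, ending with the generation prompt "<|assistant|>\n"
    std::cout << apply_custom_template(messages, tmpl);
    return 0;
}
```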
NOTE: This function does not take into account models that do not support a system prompt for now, but that can be added in the future, maybe toggled via an attribute inside the JSON: `"system_inside_user_message": true`
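As a rough sketch of that idea (purely hypothetical handling; the exact merging rule would need to match each model's expectations), the system message could be folded into the first user message before applying the template:

```cpp
// Hypothetical pre-processing step: if the template declares
// "system_inside_user_message": true, merge the system message into
// the first user message instead of emitting it with its own prefix
json preprocess_messages(const json & messages, const json & tmpl) {
    if (!tmpl.value("system_inside_user_message", false)) {
        return messages;
    }
    json out = json::array();
    std::string pending_system;
    for (const auto & msg : messages) {
        const std::string role = msg["role"].get<std::string>();
        if (role == "system") {
            pending_system = msg["content"].get<std::string>();
        } else if (role == "user" && !pending_system.empty()) {
            json merged = msg;
            merged["content"] = pending_system + "\n" + msg["content"].get<std::string>();
            out.push_back(merged);
            pending_system.clear();
        } else {
            out.push_back(msg);
        }
    }
    return out;
}
```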