Description
Motivation
As described in #5447, we can add an equivalent of huggingface's apply_chat_template() that uses simple heuristic checks to format the chat into a string. In other words, no Jinja parser is used in our implementation.
Docs for hf's apply_chat_template: https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.apply_chat_template
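To illustrate the idea, a heuristic check can be as simple as matching marker substrings in the model's template string to pick a known format. The sketch below is only an illustration of that approach; the enum, function name and marker strings are assumptions, not the actual implementation:

#include <string.h>

// Illustrative sketch only: pick a known chat format by matching marker
// substrings in the model's template string, instead of parsing Jinja.
enum chat_template_kind {
    TMPL_UNKNOWN,
    TMPL_CHATML,
    TMPL_LLAMA2,
};

static enum chat_template_kind detect_chat_template(const char * tmpl) {
    if (strstr(tmpl, "<|im_start|>") != NULL) {
        return TMPL_CHATML; // chatml-style templates use <|im_start|> markers
    }
    if (strstr(tmpl, "[INST]") != NULL) {
        return TMPL_LLAMA2; // llama2-style templates use [INST] ... [/INST]
    }
    return TMPL_UNKNOWN;
}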
Supported templates
This section is moved to wiki: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
Initial proposal for llama_chat_apply_template (outdated)
// used in chat template
typedef struct llama_chat_message {
    const char * role;    // NOTE: chatml actually allows roles other than system, user and assistant; therefore, no enum here
    const char * content;
} llama_chat_message;

/// @details Apply chat template and maybe tokenize it. Inspired by hf apply_chat_template() in Python.
/// @param conversation A list of multiple llama_chat_message
/// @param message_count The number of messages in the conversation
/// @param tmpl A Jinja template to use for this conversion. If this is nullptr, the model's default chat template will be used instead.
/// @param tokenize Whether to tokenize the output. If false, the output will be a string.
/// @param add_generation_prompt Whether to end the prompt with the token(s) that indicate the start of an assistant message.
/// @return If "tokenize" is set to false, "buf" must be a string (returned value will be the string length).
///         Otherwise, "buf" must be a list of tokens (returned value will be the number of tokens).
LLAMA_API int32_t llama_apply_chat_template(
    const struct llama_model * model,
    llama_chat_message * conversation,
    size_t message_count,
    const char * tmpl,
    bool tokenize,
    bool add_generation_prompt,
    char * buf,
    int32_t length);
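For illustration, here is a minimal usage sketch against the proposed (now outdated) signature above. It assumes the declarations from the proposal are available and that "model" was loaded elsewhere; the messages and buffer size are made up:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

// Minimal usage sketch of the proposed API.
// Assumes "model" is a llama_model loaded elsewhere; buffer size is arbitrary.
static void example_apply_template(const struct llama_model * model) {
    llama_chat_message conversation[] = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!"                       },
    };

    char buf[4096];

    // tokenize = false: request the formatted prompt as a plain string
    int32_t n = llama_apply_chat_template(
        model,
        conversation,
        2,        // message_count
        NULL,     // tmpl: fall back to the model's default chat template
        false,    // tokenize
        true,     // add_generation_prompt
        buf,
        (int32_t) sizeof(buf));

    // n is the length of the formatted string written to buf
    (void) n;
}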