-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor chat template API #6822
base: master
Are you sure you want to change the base?
Conversation
@teleprint-me @hanishkvc I finally made this to work. The code is not super clean, but I pay more attention to the API design as it's what we need to bring chat template support to |
Before moving further, @ggerganov could you please take a look on the API design to see if that's OK for you? Thanks. |
The API seems OK. If you think this is the right way, let's do it |
Given the context and circumstances, I think it's a start. I can absolutely see this getting out of control, though, as I've previously stated, if not done with caution and forethought. This is going to be challenging to manage in the long term simply because there is no way to know or predict what templates will arise, or become preferred, over time. Overall, I think it's okay as well. It obviously needs work, as that's how all things start. Hopefully we can identify a pattern and then determine how to smooth things out over time. It's better than nothing for now. |
I agree this can get easily over-engineered. I don't have capacity atm to think deeply into this, so we should try to take into account feedback from people using chat templates and at the same time don't try to support all sorts of edge cases that one can think of. Just aim for the stuff that is used most of the time and makes sense. And try to keep the API and implementation separated from the rest of the functionality as much as possible so that it can be easily adapted / replaced in the future if necessary |
@ngxson do have a look at the new PR which I have uploaded, it uses a simple json to load the expected/supported handshake-template as well as flag to control whether any BoS is prefixed when a user message immidiately follows the system message. Inturn the chat-template-apply which I have added in common/chaton.hpp, handles the same to try provide the required flow in a simple generic way. Also the json which should work wrt some of the models is in examples/chaton_meta.json NOTE: Among these, the 1 or 2 model which requires avoiding special tags between system message and 1st user message seems to treat BoS + RoleTagPrefix as a single bunch and expect both to be treated the same way. However some other models may require BoS to be handled specially while RoleTagPrefix to be handled the same always, for that in my logic I will have to add a seperate Begin/BoS entry other than the Prefix entry. and inturn do that selective inserting for Begin. |
Have updated my PR, with support for seperate Begin(BoS) and Prefix (RoleIdTag) wrt User role. And in the json one can individually control whether either of them get prepended to 1st user message following the system message. Monarch seems to need it from your server related chat-apply-template, and the same is supported now. Looking at the entries wrt Llama2, Monarch and Llama3, one can see how to configure the entries in the json file, to achive the 3 different possibilities wrt these 3 models. |
@teleprint-me @ggerganov Thanks for your feedback. I understand that this part can get complicated easily in the future, so these things are being considered when I made this proposal:
The only downside is that now the code is no longer linear. That means adding new template now requires a bit of "brain gym" to convert from jinja to prefix/postfix. Still, it is better than tricking Edit: Please also note that though multiple issue on the subject of chat templates, I've seen many proposals related to having postfix/prefix based on role. This PR will be the first one to bring that idea into the core API. |
@ggerganov @phymbert I got a weird issue on the CI workflow where the |
I'm changing this PR to "demo" since I'm still not very confident to make the chat template system become more complicated. Maybe we will re-visit this in the future. This PR is mostly useful for adding chat templates to |
Based on the discussion from #6391 (comment)
The main idea is to get prefix/postfix of each message based on the role. For example: Chatml uses
"<|im_start|>" + role + "\n"
as prefix (role
is dynamic based on current message);<|im_end|>\n
is the postfix.These things are being considered when I made this proposal:
llama_chat_get_typed_template
. No arbitrary templates are allowed (user must write their own logic if they want)Introducing an
enum llama_chat_template
for templates and a family of functions: