-
Notifications
You must be signed in to change notification settings - Fork 12k
main: allow preloading conversation with -p and add -st / --single-turn #12145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ngxson Ready for review. |
IMO this feature can be quite confused and at the same time, introduce too much complication to the already complex The problems are:
Basically, the code of |
And it still does.
The user surely expects the chat template to be used if they enabled --jinja or --chat-template.
I find this extremely useful as you can now do a one-shot prompt without having to paste a completely formatted template in your prompt (and if you really need to do that, it still works like that if you haven't explicitly enabled a chat template). In fact I found it extremely confusing that it didn't work this way when |
Versus what you said on the description:
Could you elaborate?
In the case of one-shot prompt, user also expect to use the built-in template inside the model. For example, a normal user will expect:
and
To be equivalent. There is no where in the document or in the Simple solution is as follow:
So go back the my example above:
and with
|
I see, so your objection is the Would the better solution be to warn users about the new behavior when automatically using chat template, and maybe add a raw prompt option? |
If I understand you correctly this would obsolete |
No, nothing is obsolete. -no-cnv == don't use chat template at all The -no-cnv mode never has support for chat template |
Yeah, I know, but what's the reason for keeping it like that? Conversation mode in itself never meant it was using a chat template, this is fairly recent. Even defaulting to conversation mode is new. To me it makes sense to (soft) deprecate -no/-cnv if you introduce |
For the technical reason: The -no-cnv and -cnv currently shares the same system under the hood. This is because historically, there was no -cnv mode, it was added with initial support for chatml template only. The support for other templates is added after that. So to me, the template system is tightly coupled to the -cnv mode, and it should always be the case. One big problem we're currently facing with chat templates is that after the introduction of jinja, sometimes it does not work correctly because turns are tracked at character level, not token level. I tried to address this by completely separate the -cnv into its own class: #11203 - but that's the idea, the PR is not merge-able, but at some point in the future we will have completely different code base for the -cnv mode. For the UX: So from your suggestion, I assume that these 2 commands are the same:
But then, why these 2 command below are NOT the same?
And
Then there is nowhere in the document or in the help message say that I must add |
Again, nothing is deprecated (at all). Many users are still using -no-cnv with their custom formatted prompts and/or custom template via suffix/prefix flags. In short, this is the equivalent of |
I feel like we are talking past each other here. :) My aim with this PR is to make one-shot and full conversation work the same (ie, use chat template), but in order to not break passing a raw (ie, fully preformatted) prompt I added a check. Now, since you suggested the The ultimate goal is to allow |
This seems to be a breaking change for me. AFAIU from what you're saying, if I already using this command:
Then after this change, I have to use
A such breaking change like this is not expected for many users. Also, could you add some concrete examples? I have very hard time understand what are proposing. |
Yes, it would be breaking if we go for this, but no more breaking than making conversation mode default. This PR as-is right now is non-breaking.
Ok, first let me outline how the PR as-is works right now... Works exactly like before:
Uses built in chat template:
Uses externally provided chat template:
Now, we can either go for this and continue to accept that -no-cnv behaves differently unless explicitly told to use chat template vs -cnv automatically selecting chat template, or we can accept status quo where --jinja and --chat-template params have no effect on -no-cnv, or we can do as follows... Adding --single-turn and --raw-prompt basically blends the concept of completion/preformatted/conversation into one, allowing you to do them all with consistent params, like so... One-shot sentence completion:
One-shot instruction with built in chat template:
One-shot preformatted instruction with no chat template:
Pre-started conversation mode with built in chat template:
All of the above need no warning and can be considered non-breaking. The following ones can be changed to work the new way by automatically moving prompt to raw_prompt and output a warning about the new params since we can check that -no-cnv was used... One-shot sentence completion:
One-shot preformatted instruction with no chat template:
|
At least, it's a breaking change that most people want to have - most users out there simply use Re. your first 3 commands:
It technically looks ok for me, but the problem is that without adequate documentations, no one will even know that this is possible. I think if we ever add this (aka merge the current PR), it will mostly to be used by you. Re. In my idea, your 4 commands will become: One-shot sentence completion:
One-shot instruction with built in chat template:
One-shot preformatted instruction with no chat template:
Pre-started conversation mode with built in chat template:
No breaking change, and yet still support what you need. Exactly what you proposed in the end. |
I agree. :)
My point exactly, once you introduce --single-turn it's no longer about conversation vs no-conversation, it's just a question of single-turn vs multi-turn and formatting.
You forgot -no-cnv here I guess?
Yep, except since -cnv is essentially a no-op and -no-cnv just means single turn raw prompt, my suggestion is to soft deprecate them for clarity of what they actually do, but this is all semantics, I'll change this PR to what you suggest and we can discuss this further afterwards. |
@ngxson Ready for re-review (test failure looks like server outage again). |
Re. -cnv and -no-cnv, I think it is important for us to step out of the implementation for a moment and consider that: the user may enable or disable conversation mode. At the moment the implementation dictates that the default is "-cnv" read from model, but the user should always be able to enable/disable that in the future. There may be a strange scenario where users need to force conversation mode on a base model or something! |
Yes I'm thinking about making
I'm thinking about throwing an error in this case, maybe smth like:
|
Sure, but that would either just be interactive mode or an external chat template. |
Yeah this all really evolved from the Alpaca/Llama2 days when there was no conversation mode at all and we had to specify the delimiters EOG, etc. directly! And then conversation mode was added without the Jinja parser. As far as I understand, the idea of keeping the "-cnv" flag was to allow the legacy template system to be forced and override the Jinja templates.
But this doesn't make much sense either as In my opinion we get sane defaults from the model, and allow user to override. It's fine to force conversation mode on a base prompt if the user wants to—just warn them. 🙂 |
@ochafik Heads up I'm going to merge this in 5 minutes unless you interject! Don't worry, I will handle the fallout if there's a problem. 😉 |
…rn (ggml-org#12145) * Add chat template formatting to -no-cnv * only enable prompt formatting if explicitly enabled * add -st / --single-turn * add --single-turn and -p in conversation mode * fix -sys + -p * reword warning * small readability change and fix (long) outdated example usage * only activate single turn in conversation mode
…rn (ggml-org#12145) * Add chat template formatting to -no-cnv * only enable prompt formatting if explicitly enabled * add -st / --single-turn * add --single-turn and -p in conversation mode * fix -sys + -p * reword warning * small readability change and fix (long) outdated example usage * only activate single turn in conversation mode
Implements preloading conversation mode with
-p
and adds-st
/--single-turn
that limit the conversation to a single turn for one-shot instructions.