main: allow preloading conversation with -p and add -st / --single-turn #12145


Merged
merged 8 commits on Mar 4, 2025

Conversation

CISC
Collaborator

@CISC CISC commented Mar 2, 2025

Implements preloading conversation mode with -p and adds -st/--single-turn, which limits the conversation to a single turn for one-shot instructions.

@CISC
Collaborator Author

CISC commented Mar 2, 2025

@ngxson Ready for review.

@ngxson
Collaborator

ngxson commented Mar 2, 2025

IMO this feature can be quite confusing and, at the same time, introduces too much complication to the already complex main.cpp source code.

The problems are:

  1. Traditionally, -no-cnv means there is no chat template at all. Adding this breaks that implication.
  2. How can the user know that this is a thing? We don't document it anywhere, no examples, etc.
  3. I'm not sure anyone even needs this. The most basic use case is someone who wants to test the model with a single-turn chat. The problem is that the user now needs to know in advance which chat template to use. This is not good UX.

Basically, the code of main.cpp has grown so much that I think we should stop adding implicit behaviors like this. Things should be straightforward, and the simplest way is to have dedicated flags/args to do specific things, rather than combining some of them.

@CISC
Collaborator Author

CISC commented Mar 2, 2025

1. Traditionally, `-no-cnv` means there is no chat template at all. Adding this breaks that implication.

And it still does.

2. How can the user know that this is a thing? We don't document it anywhere, no examples, etc.

The user surely expects the chat template to be used if they enabled --jinja or --chat-template.

3. I'm not sure anyone even needs this. The most basic use case is someone who wants to test the model with a single-turn chat. The problem is that the user now needs to know in advance which chat template to use. This is not good UX.

I find this extremely useful, as you can now do a one-shot prompt without having to paste a completely formatted template into your prompt (and if you really need to do that, it still works that way if you haven't explicitly enabled a chat template).

In fact I found it extremely confusing that it didn't work this way when --jinja was introduced and would go so far as to say the status quo is bad UX. :)

@ngxson
Collaborator

ngxson commented Mar 2, 2025

1. Traditionally, `-no-cnv` means there is no chat template at all. Adding this breaks that implication.

And it still does.

Versus what you said in the description:

Adds chat template formatting to -no-cnv

Could you elaborate?

The user surely expects the chat template to be used if they enabled --jinja or --chat-template.

In the case of a one-shot prompt, the user also expects the model's built-in template to be used.

For example, a normal user will expect:

llama-cli -sys "You are Bob"
> who are you?

and

llama-cli -sys "You are Bob" -p "who are you?"

to be equivalent. Nowhere in the documentation or in the --help message does it indicate that I must add --jinja or --chat-template, so I wouldn't even know! That's what makes it confusing.

A simple solution is as follows:

  • If -p is provided in conversation mode, we use the provided prompt to pre-fill the first user message in the conversation. This is roughly equivalent to the ?q=... parameter on ChatGPT, which pre-fills the provided param into the input field and sends it right away.
  • We add another flag called -st, --single-turn. If it is set, we just exit after the first message.

So, going back to my example above:

llama-cli -sys "You are Bob" -p "who are you?"
> who are you?
I am Bob
> ... (I can still ask more questions here)

and with --single-turn

llama-cli -sys "You are Bob" -p "who are you?" --single-turn
> who are you?
I am Bob
(exit with code 0)
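The proposed flow can be sketched in Python (a rough illustration of the control flow only, not the actual main.cpp implementation; `run_chat`, `read_input`, and `generate_reply` are hypothetical stand-ins):

```python
def run_chat(prompt=None, single_turn=False, read_input=input,
             generate_reply=lambda history: "..."):
    """Sketch of the proposed loop: -p pre-fills the first user
    message; --single-turn exits after the first assistant reply."""
    history = []
    while True:
        if prompt is not None:
            # consume the preloaded -p prompt as the first user turn
            user_msg, prompt = prompt, None
        else:
            user_msg = read_input("> ")
        history.append({"role": "user", "content": user_msg})
        history.append({"role": "assistant",
                        "content": generate_reply(history)})
        if single_turn:
            # --single-turn: stop after one exchange (exit code 0)
            return history
```

With `single_turn=False` the loop keeps reading input, matching the "I can still ask more questions here" behavior above.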

@CISC
Collaborator Author

CISC commented Mar 2, 2025

I see, so your objection is to the explicit_chat_template check I added? I too would prefer it to just be enabled by default, but since that would break old behavior (and remove the ability to send a fully formatted prompt), I added the check.

Would the better solution be to warn users about the new behavior when automatically using a chat template, and maybe add a raw prompt option?

@CISC
Collaborator Author

CISC commented Mar 2, 2025

* We add another flag called `-st, --single-turn`. If it is set, we just exit after the first message.

If I understand you correctly, this would obsolete the -no/-cnv params: -st would behave like -no-cnv if -p was set, and like a single turn in conversation mode if not. Another PR? :)

@ngxson
Collaborator

ngxson commented Mar 2, 2025

No, nothing is obsolete.

-no-cnv == don't use chat template at all
--single-turn == conversation mode (-cnv) with single turn

The -no-cnv mode has never had support for chat templates

@CISC
Collaborator Author

CISC commented Mar 2, 2025

The -no-cnv mode has never had support for chat templates

Yeah, I know, but what's the reason for keeping it like that? Conversation mode in itself never meant a chat template was being used; that is fairly recent. Even defaulting to conversation mode is new. To me it makes sense to (soft) deprecate -no/-cnv if you introduce --single-turn.

@ngxson
Collaborator

ngxson commented Mar 3, 2025

Yeah, I know, but what's the reason for keeping it like that?

For the technical reason:

The -no-cnv and -cnv modes currently share the same system under the hood. This is because historically there was no -cnv mode; it was added with initial support for the chatml template only. Support for other templates was added after that.

So to me, the template system is tightly coupled to the -cnv mode, and it should always be the case.

One big problem we're currently facing with chat templates is that since the introduction of Jinja, they sometimes do not work correctly, because turns are tracked at the character level, not the token level. I tried to address this by completely separating -cnv into its own class: #11203. That PR is not merge-able as-is, but it shows the idea; at some point in the future we will have a completely different code base for the -cnv mode.
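To illustrate what character-level turn tracking means here, a toy Python sketch (the template format and function names are invented for illustration, not taken from llama.cpp): the whole conversation is rendered with the template and only the new string suffix is kept. Tokenizing that suffix in isolation can produce different tokens than tokenizing the full string, which is the failure mode described above.

```python
def format_chat(history):
    # toy stand-in for applying a (Jinja) chat template
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>"
                   for m in history)

def new_chars(history, prev_len):
    """Character-level tracking: render the full conversation and keep
    only the characters beyond what was already sent to the model."""
    full = format_chat(history)
    return full[prev_len:], len(full)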

For the UX:

So from your suggestion, I assume that these two commands are the same:

llama-cli -m ... --jinja
> who are you?

llama-cli -m ... -p "who are you?" -no-cnv --jinja

But then, why are these two commands below NOT the same?

llama-cli -m ...
> who are you?

And

llama-cli -m ... -p "who are you?" -no-cnv

And nowhere in the documentation or in the help message does it say that I must add --chat-template or --jinja to make this work. Even if it's mentioned somewhere in the documentation, it's not obvious to the end user.

@ngxson
Collaborator

ngxson commented Mar 3, 2025

To me it makes sense to (soft) deprecate -no/-cnv if you introduce --single-turn.

Again, nothing is deprecated (at all).

Many users are still using -no-cnv with their custom formatted prompts and/or custom templates via the suffix/prefix flags. In short, it is the equivalent of the /completions endpoint on the server, while -cnv mode is /chat/completions.
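The analogy can be made concrete with a toy Python sketch (the ChatML-like template and function names here are invented for illustration, not the actual implementation): -no-cnv passes the prompt through verbatim, like /completions, while -cnv wraps messages in the chat template, like /chat/completions.

```python
def completion_prompt(raw_prompt):
    # -no-cnv / the /completions endpoint:
    # the model sees exactly what you typed
    return raw_prompt

def chat_prompt(messages):
    # -cnv / the /chat/completions endpoint:
    # messages are wrapped in a chat template
    # (toy ChatML-like format for illustration only)
    out = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
                  for m in messages)
    return out + "<|im_start|>assistant\n"
```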

@CISC
Collaborator Author

CISC commented Mar 3, 2025

I feel like we are talking past each other here. :)

My aim with this PR is to make one-shot and full conversation work the same way (i.e., use the chat template), but in order not to break passing a raw (i.e., fully preformatted) prompt, I added a check.

Now, since you suggested the --single-turn param, my suggestion is that we can do away with -no/-cnv completely, as --single-turn -p "..." would then equal -no-cnv -p "...". Adding e.g. --raw-prompt would solve the need for passing a preformatted prompt.

The ultimate goal is to allow -p in conversation mode as well, but that's a little more complicated and I didn't want to clutter this PR with it. Once that is done, everything will be nice and consistent, with no weird undocumented effects depending on the combination of parameters.

@ngxson
Collaborator

ngxson commented Mar 3, 2025

Now, since you suggested the --single-turn param, my suggestion is that we can do away with -no/-cnv completely, as --single-turn -p "..." would then equal -no-cnv -p "...". Adding e.g. --raw-prompt would solve the need for passing a preformatted prompt.

This seems like a breaking change to me. AFAIU from what you're saying, if I'm already using this command:

llama-cli -m ... -p "The meaning of life is" -no-cnv

Then after this change, I have to use --raw-prompt

llama-cli -m ... -p "The meaning of life is" -no-cnv --raw-prompt

Such a breaking change is not expected by many users.

Also, could you add some concrete examples? I'm having a very hard time understanding what you are proposing.

@CISC
Collaborator Author

CISC commented Mar 3, 2025

This seems like a breaking change to me.

Yes, it would be breaking if we went for this, but no more breaking than making conversation mode the default.

This PR as-is right now is non-breaking.

AFAIU from what you're saying, if I'm already using this command:
[...]
Such a breaking change is not expected by many users.

Certainly, but in order to fix all the inconsistencies, something has to break.
Strike that; I suggest just automatically adjusting params instead, see the last few examples.

Also, could you add some concrete examples? I'm having a very hard time understanding what you are proposing.

Ok, first let me outline how the PR as-is works right now...

Works exactly like before:

llama-cli -m ... -p "The meaning of life is" -no-cnv

Uses the built-in chat template:

llama-cli -m ... -p "What is the meaning of life?" -no-cnv --jinja

Uses an externally provided chat template:

llama-cli -m ... -p "What is the meaning of life?" -no-cnv --chat-template-file ...

Now, we can either go for this and continue to accept that -no-cnv behaves differently (using a chat template only when explicitly told to, vs. -cnv automatically selecting one), or we can accept the status quo where the --jinja and --chat-template params have no effect on -no-cnv, or we can do as follows...

Adding --single-turn and --raw-prompt basically blends the concepts of completion/preformatted/conversation into one, allowing you to do them all with consistent params, like so...

One-shot sentence completion:

llama-cli -m ... --raw-prompt "The meaning of life is" --single-turn

One-shot instruction with the built-in chat template:

llama-cli -m ... -p "What is the meaning of life?" --single-turn

One-shot preformatted instruction with no chat template:

llama-cli -m ... --raw-prompt "<|im_start|>user\nWhat is the meaning of life?<|im_end|>\n<|im_start|>assistant\n" --single-turn

Pre-started conversation mode with the built-in chat template:

llama-cli -m ... -p "What is the meaning of life?"

All of the above need no warning and can be considered non-breaking. The following ones can be changed to work the new way by automatically moving the prompt to raw_prompt and outputting a warning about the new params, since we can check that -no-cnv was used...

One-shot sentence completion:

llama-cli -m ... -p "The meaning of life is" -no-cnv

One-shot preformatted instruction with no chat template:

llama-cli -m ... -p "<|im_start|>user\nWhat is the meaning of life?<|im_end|>\n<|im_start|>assistant\n" -no-cnv
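The automatic adjustment described above could look roughly like this in Python (a hedged sketch only; the names `no_cnv`, `prompt`, `raw_prompt`, and `single_turn` mirror the flags under discussion and are not the real main.cpp fields):

```python
import warnings

def adjust_params(params):
    """If the old-style `-no-cnv -p ...` combination is used, move the
    prompt to raw_prompt, treat it as a single turn, and warn about the
    new flags, so old command lines keep their old meaning."""
    if params.get("no_cnv") and params.get("prompt"):
        warnings.warn(
            "-no-cnv with -p now maps to --raw-prompt with --single-turn")
        params["raw_prompt"] = params.pop("prompt")
        params["single_turn"] = True
    return params
```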

@ngxson
Collaborator

ngxson commented Mar 3, 2025

no more breaking than making conversation mode default.

At least, it's a breaking change that most people want: most users out there simply use llama-cli for conversation, so they expect it to behave like the ollama run command (at least, I've heard many users asking for this change).


Re. your first 3 commands:

llama-cli -m ... -p "What is the meaning of life?" -no-cnv [--jinja] [--chat-template]

It technically looks OK to me, but the problem is that without adequate documentation, no one will even know this is possible. I think if we ever add this (i.e., merge the current PR), it will mostly be used by you.


Re. --raw-prompt, it's now even more confusing to me; I can't see how it differs from -no-cnv. My suggestion for --single-turn is restricted to -cnv only. It cannot be used in -no-cnv mode. Something like -no-cnv --single-turn is invalid and will be ignored.

In my idea, your 4 commands will become:

One-shot sentence completion:

llama-cli -m ... -p "The meaning of life is" -no-cnv

One-shot instruction with the built-in chat template:

llama-cli -m ... -p "What is the meaning of life?" [-cnv] --single-turn

One-shot preformatted instruction with no chat template:

llama-cli -m ... -p "<|im_start|>user\nWhat is the meaning of life?<|im_end|>\n<|im_start|>assistant\n"

Pre-started conversation mode with the built-in chat template:

llama-cli -m ... -p "What is the meaning of life?" [-cnv]

No breaking change, and yet it still supports what you need. Exactly what you proposed in the end.

@CISC
Collaborator Author

CISC commented Mar 3, 2025

It technically looks OK to me, but the problem is that without adequate documentation, no one will even know this is possible. I think if we ever add this (i.e., merge the current PR), it will mostly be used by you.

I agree. :)

My suggestion for --single-turn is restricted to -cnv only. It cannot be used in -no-cnv mode. Something like -no-cnv --single-turn is invalid and will be ignored.

My point exactly: once you introduce --single-turn it's no longer about conversation vs. no-conversation, it's just a question of single-turn vs. multi-turn and formatting.

One-shot preformatted instruction with no chat template:

llama-cli -m ... -p "<|im_start|>user\nWhat is the meaning of life?<|im_end|>\n<|im_start|>assistant\n"

You forgot -no-cnv here, I guess?

No breaking change, and yet it still supports what you need. Exactly what you proposed in the end.

Yep, except that since -cnv is essentially a no-op and -no-cnv just means a single-turn raw prompt, my suggestion is to soft-deprecate them for clarity about what they actually do. But this is all semantics; I'll change this PR to what you suggest and we can discuss further afterwards.

@CISC CISC changed the title main: add chat template formatting to -no-cnv main: allow preloading conversation with -p and add -st / --single-turn Mar 3, 2025
@CISC
Collaborator Author

CISC commented Mar 3, 2025

@ngxson Ready for re-review (test failure looks like server outage again).

@CISC CISC requested a review from ochafik March 4, 2025 08:26
@bandoti
Collaborator

bandoti commented Mar 4, 2025

Re. -cnv and -no-cnv, I think it is important for us to step out of the implementation for a moment and consider that the user may enable or disable conversation mode.

At the moment the implementation dictates that the default is -cnv read from the model, but the user should always be able to enable/disable that in the future.

There may be a strange scenario where users need to force conversation mode on a base model or something!

@ngxson
Collaborator

ngxson commented Mar 4, 2025

At the moment the implementation dictates that the default is -cnv read from the model, but the user should always be able to enable/disable that in the future.

Yes, I'm thinking about making -cnv the default (no longer depending on the model), since it is pretty much expected nowadays that llama-cli runs as a chat by default.

-no-cnv can always be used to go back to completion mode.

There may be a strange scenario where users need to force conversation mode on a base model or something!

I'm thinking about throwing an error in this case, maybe something like:

Error: This model does not have a chat template, it may not support chat. Try manually specifying a chat template via --chat-template

@CISC
Collaborator Author

CISC commented Mar 4, 2025

There may be a strange scenario where users need to force conversation mode on a base model or something!

Sure, but that would either just be interactive mode or an external chat template.

@bandoti
Collaborator

bandoti commented Mar 4, 2025

Yeah, this all really evolved from the Alpaca/Llama2 days, when there was no conversation mode at all and we had to specify the delimiters, EOG, etc. directly!

And then conversation mode was added without the Jinja parser.

As far as I understand, the idea of keeping the -cnv flag was to allow the legacy template system to be forced, overriding the Jinja templates.

E.g. -cnv => --no-jinja

But this doesn't make much sense either, as -cnv is separate from the templates. I need ☕️

In my opinion we get sane defaults from the model and allow the user to override. It's fine to force conversation mode on a base model if the user wants to; just warn them. 🙂

@bandoti
Collaborator

bandoti commented Mar 4, 2025

@ochafik Heads up: I'm going to merge this in 5 minutes unless you interject! Don't worry, I will handle the fallout if there's a problem. 😉

@bandoti bandoti merged commit 56d7a9f into ggml-org:master Mar 4, 2025
47 checks passed
@CISC CISC deleted the no-cnv-chat-template branch March 5, 2025 07:37
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
…rn (ggml-org#12145)

* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025