Commit d751670

vinkal-chudgar authored and CISC committed
llama-cli: prevent spurious assistant token (ggml-org#16202)
* tools/main: llama-cli: prevent spurious assistant token (ggml-org#13402)

  During prompt ingestion, prompt tokens are accepted into the sampler
  history (for repetition penalties). The conversation-mode path then
  appended `common_sampler_last(smpl)` to `assistant_ss` before any new
  token was sampled. At that point, "last" was a prompt-side token (e.g.,
  an input prefix), so the assistant chat message began with an extra
  piece.

  Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token.

  This affects only chat message assembly (`assistant_ss` / `chat_msgs` /
  `common_chat_format_single`); terminal stdout is unchanged. Sampling
  order/logits are unchanged.

  Fixes ggml-org#13402.

  Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

* Update tools/main/main.cpp

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* tools/main: remove outdated comment

  Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

---------

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
1 parent e688dc3 commit d751670

File tree

1 file changed: +4 −4 lines


tools/main/main.cpp

Lines changed: 4 additions & 4 deletions
@@ -914,6 +914,10 @@ int main(int argc, char ** argv) {
             // Print cache statistics after each token generation
             token_count++;

+            if (params.conversation_mode && !waiting_for_first_input && !llama_vocab_is_eog(vocab, id)) {
+                assistant_ss << common_token_to_piece(ctx, id, false);
+            }
+
             // echo this to console
             input_echo = true;

@@ -1031,11 +1035,7 @@ int main(int argc, char ** argv) {
             }
         }

-        // if current token is not EOG, we add it to current assistant message
         if (params.conversation_mode && !waiting_for_first_input) {
-            const auto id = common_sampler_last(smpl);
-            assistant_ss << common_token_to_piece(ctx, id, false);
-
             if (!prompt.empty()) {
                 prompt.clear();
                 is_interacting = false;
