Closed
Labels
bug (Something isn't working)
Description
Name and Version
version: 5327 (27ebfca)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
nvidia
Models
all
Problem description & steps to reproduce
After the user prompt is provided, the code enters this branch:
tools/main/main.cpp line 716 at commit 0cf6725:

```cpp
LOG_DBG("embd_inp.size(): %d, n_consumed: %d\n", (int) embd_inp.size(), n_consumed);
```
In this branch no new token is sampled.
However, the following code assumes a new token exists and appends it to the assistant response:
tools/main/main.cpp line 824 at commit 0cf6725:

```cpp
assistant_ss << common_token_to_piece(ctx, id, false);
```
First Bad Commit
No response
Relevant log output
The easiest way to reproduce is to set a breakpoint here and wait for the assistant message:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L270