-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixes infill incorrect tokenization #3508
Conversation
…ing space token from suffix when params.escape
examples/infill/infill.cpp
Outdated
std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, false); | ||
const int space_token = 29871; | ||
if (params.escape && inp_sfx.size() > 1 && inp_sfx[0] == space_token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we always want to remove the leading space, regardless if the params.escape
is set or not:
if (params.escape && inp_sfx.size() > 1 && inp_sfx[0] == space_token) { | |
if (inp_sfx.size() > 1 && inp_sfx[0] == space_token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the params.escape check as only then we add an extra space to the suffix.
…ing space token from suffix when params.escape
…or rm added space token
bool suff_rm_leading_spc = params.escape; | ||
if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) { | ||
params.input_suffix.erase(0, 1); | ||
suff_rm_leading_spc = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I could be missing something, but this whole section should not be needed.
If you keep everything else and just replace this block with:
bool suff_rm_leading_spc = params.escape; | |
if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) { | |
params.input_suffix.erase(0, 1); | |
suff_rm_leading_spc = false; | |
} | |
const bool suff_rm_leading_spc = true; |
Can you find a case that does not match the Python version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I can find cases where it misbehaves:
with params.escape, if the suffix starts with a space, the tokenizer adds a space and THEN tokenizes the double space to 259
so we can't remove the surplus space anymore but would have to "translate" the 259
to 29871
to remove the space, same with more than one leading spaces...
without params.escape if the suffix starts with a space, we remove it so it does not reflect the original suffix
not sure what the python version does as I can't test it, but the above behaviour seems wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense. Let's double check, just in case:
@kherud Could you please run the Python codellama with suffix with 2 leading spaces and see what tokens are produced for infill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'<PRE> <SUF>': 1, 32007, 259, 32008, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 29871, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 259, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 1678, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 268, 32009
…or rm added space token
…ed back or rm added space token" This reverts commit 63ba0b6.
There seems to be another issue too, in interactive mode the -e seems to only act on the initial input as the processing happens when the params are loaded in common.cpp:617 |
This is probably an oversight - we should escape the user input and |
Should be an easy fix as I am doing that already in the infill script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway you prefer.
If the results match the Python version now, I think we can merge.
I can do some testing later, or we can wait for some feedback from other people.
Would keep it separate as the issues are related but not identical?
I think it matches now, your call to merge or wait. |
Ok, will merge this first thing tomorrow unless somebody notices any problems |
…example * 'master' of github.com:ggerganov/llama.cpp: (34 commits) examples: support LLaVA v1.5 (multimodal model) (ggerganov#3436) docs : fix typo GOMP_CPU_AFFINITY (ggerganov#3597) cmake : fix add_compile_options on macOS typo : it is `--n-gpu-layers` not `--gpu-layers` (ggerganov#3592) ci : check if there is enough VRAM (ggerganov#3596) server : add completion mode (no chat) (ggerganov#3582) prompts : add mnemonics.txt server : fix kv cache management (ggerganov#3588) main : fix session loading bug (ggerganov#3400) server : add parameter -tb N, --threads-batch N (ggerganov#3584) common : fix mirostat state when using multiple sequences (ggerganov#3543) batched : add bench tool (ggerganov#3545) examples : add batched.swift + improve CI for swift (ggerganov#3562) Add MPT model to supported models in README.md (ggerganov#3574) Minor improvements in GPT2 tokenizer (ggerganov#3567) readme : add bloom (ggerganov#3570) llm : add bloom models (ggerganov#3553) swift : improvements and fixes (ggerganov#3564) llm : add MPT support (ggerganov#3417) infill. : fix tokenization (ggerganov#3508) ...
fixes #3503