add retrieval example #6193
I'm not quite sure about this line. Will it drop tokens if `inp.size() > n_batch`? If that's the case, that means we drop information from the embedding input, which will cause the output embedding to become inaccurate.
I guess we'll have to restrict the chunk size to fit the `n_batch` parameter somehow. That raises the same issue as setting a configurable maximum chunk length: how should we handle chunks where no separator appears within the maximum length?
Or maybe the reverse: we can require the user to provide a sufficient `n_batch` (and `n_ubatch`) before running the app. AFAIK that's needed because with embeddings we use non-causal models, which require the prompt to be processed in a single batch. My plan: after tokenizing all chunks, check whether any chunk has `tokens.size() > n_batch`, and if so raise an error and exit the program.
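Something like this minimal sketch, assuming the tokenized chunks end up in a `std::vector<std::vector<llama_token>>` (the container and function names are illustrative, not the example's actual code):

```cpp
#include <cstdio>
#include <vector>

#include "common.h"
#include "llama.h"

// Sketch only: verify that every tokenized chunk fits into a single batch.
// `chunk_tokens` is a hypothetical container; the example may store chunks differently.
static bool chunks_fit_in_batch(const std::vector<std::vector<llama_token>> & chunk_tokens,
                                const gpt_params & params) {
    for (size_t i = 0; i < chunk_tokens.size(); i++) {
        if (chunk_tokens[i].size() > (size_t) params.n_batch) {
            fprintf(stderr, "chunk %zu has %zu tokens, which exceeds n_batch (%d); "
                            "increase --batch-size or use smaller chunks\n",
                    i, chunk_tokens[i].size(), (int) params.n_batch);
            return false;
        }
    }
    return true;
}
```

The example would call this right after tokenization and return a non-zero exit code if it fails.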
This seems like the best option for now 👍
Doesn't it make more sense to determine `params.n_batch` based on the largest chunk after tokenization? It should not be a user-provided parameter in this example - just set it to the largest chunk size.
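Under the same illustrative assumptions as the sketch above, that would just be a max over the tokenized chunks, applied to `params` before the embedding context is created:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

#include "llama.h"

// Sketch only: derive the batch size from the data instead of taking it from the user.
static uint32_t n_batch_for_chunks(const std::vector<std::vector<llama_token>> & chunk_tokens) {
    size_t n_max = 0;
    for (const auto & toks : chunk_tokens) {
        n_max = std::max(n_max, toks.size());
    }
    return (uint32_t) n_max;
}

// usage (sketch): set both values, since non-causal models need the whole chunk in one ubatch
//   params.n_batch  = n_batch_for_chunks(chunk_tokens);
//   params.n_ubatch = params.n_batch;
```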
Yeah, that could be a solution. I'm just not sure whether that requires re-creating a new `llama_context`, because the prior `llama_init_from_gpt_params` call has already used `params.n_batch` and `params.n_ubatch`.
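For reference, a rough sketch of one way to avoid re-creating the context: use the lower-level calls that `llama_init_from_gpt_params` wraps, so the context is only created after the chunks have been tokenized against the model. Names and flow here are assumptions, not the example's actual code:

```cpp
#include <cstdint>

#include "llama.h"

// Sketch only: once the model is loaded, the chunks can be tokenized before any
// context exists, and the context is then created once with the derived batch size.
static llama_context * make_embedding_ctx(llama_model * model, size_t n_batch_needed) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = (uint32_t) n_batch_needed; // largest tokenized chunk
    cparams.n_ubatch = (uint32_t) n_batch_needed; // non-causal models need the full prompt in one ubatch
    // the embeddings flag would also have to be enabled here, as
    // llama_init_from_gpt_params normally does from params.embedding
    return llama_new_context_with_model(model, cparams);
}

// intended flow (sketch):
//   llama_model * model = llama_load_model_from_file(path, llama_model_default_params());
//   ... tokenize every chunk against `model`, track the largest token count ...
//   llama_context * ctx = make_embedding_ctx(model, largest_chunk_tokens);
```

Whether dropping the `llama_init_from_gpt_params` helper is worth it for this example is exactly the open question above.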