Feature: Added api for getting/setting the kv_cache #685
Merged
The API provides access methods for retrieving the current memory buffer for the kv_cache and its token count. It also contains a method for setting the kv_cache from a memory buffer. This makes it possible to load/save history. Maybe support a --cache-prompt parameter as well?
prusnak
requested changes
Apr 1, 2023
use const where appropriate
Add review comments Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
Added review comments Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
Added review comments Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
Review Comments Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
Thanks for the comments, @prusnak - appreciate it.
ggerganov
approved these changes
Apr 2, 2023
Very good start for solving #64
Use 4 spaces indentation
prusnak
approved these changes
Apr 2, 2023
This was referenced Apr 2, 2023
This PR contains a simple extension to the C API for getting/setting the kv_cache. An app can save the state of the kv_cache after evaluating a prompt and load it the next time the app starts, which avoids having to evaluate the prompt again on startup.
The api provides access methods for retrieving the current memory buffer for the kv_cache and its token number. It also contains a method for setting the kv_cache from a memory buffer and a token count.
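To make the save/restore flow concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use the library itself: `toy_ctx`, the `toy_*` accessors, and `toy_snapshot` are hypothetical stand-ins that only mirror the shape of the API (get buffer, get size, get token count, set from buffer plus token count). The size check in the setter is an assumption of this sketch, not something the PR states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library context (not the real struct). */
typedef struct {
    uint8_t *kv_buf;   /* raw kv_cache memory */
    size_t   kv_size;  /* size of kv_buf in bytes */
    int      kv_ntok;  /* number of tokens currently held in the cache */
} toy_ctx;

/* Read-only accessors, mirroring the "get" half of the API. */
const uint8_t *toy_get_kv_cache(const toy_ctx *ctx)             { assert(ctx); return ctx->kv_buf; }
size_t         toy_get_kv_cache_size(const toy_ctx *ctx)        { assert(ctx); return ctx->kv_size; }
int            toy_get_kv_cache_token_count(const toy_ctx *ctx) { assert(ctx); return ctx->kv_ntok; }

/* "Set" half: copy a saved buffer back and restore the token count.
 * Returns 0 on success, -1 if the buffer size does not match (sketch-only check). */
int toy_set_kv_cache(toy_ctx *ctx, const uint8_t *src, size_t n_size, int n_token_count) {
    assert(ctx && src);
    if (n_size != ctx->kv_size) return -1;
    memcpy(ctx->kv_buf, src, n_size);
    ctx->kv_ntok = n_token_count;
    return 0;
}

/* An owned copy of the cache that an app could write to disk. */
typedef struct {
    uint8_t *data;
    size_t   size;
    int      n_tokens;
} toy_snapshot;

/* Copy the current cache out of the context. Returns 0 on success, -1 on allocation failure. */
int toy_snapshot_take(const toy_ctx *ctx, toy_snapshot *out) {
    size_t size = toy_get_kv_cache_size(ctx);
    out->data = malloc(size);
    if (out->data == NULL) return -1;
    memcpy(out->data, toy_get_kv_cache(ctx), size);
    out->size = size;
    out->n_tokens = toy_get_kv_cache_token_count(ctx);
    return 0;
}

/* Load a previously taken snapshot back into the context. */
int toy_snapshot_restore(toy_ctx *ctx, const toy_snapshot *snap) {
    return toy_set_kv_cache(ctx, snap->data, snap->size, snap->n_tokens);
}

void toy_snapshot_free(toy_snapshot *snap) {
    free(snap->data);
    snap->data = NULL;
    snap->size = 0;
    snap->n_tokens = 0;
}
```

An app would take a snapshot after evaluating its prompt, persist `data`, `size`, and `n_tokens`, and call the restore step on the next start instead of evaluating the prompt again.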
I did not start implementing the --cache-prompt argument since it is a bit more involved: we need to save some more state, like last_n_tokens and the n_past parameter. We'd also need to hash the prompt, check whether a prompt file exists, etc.
Implements foundation for #64
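As a rough illustration of the "hash the prompt, check if a prompt file exists" step mentioned above, the sketch below derives a cache file name from the prompt text and tests whether that file is present. Nothing here is part of the PR: the function names, the `prompt-<hash>.kvcache` naming scheme, and the choice of 64-bit FNV-1a as the hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated string (hash choice is an assumption). */
uint64_t prompt_hash_fnv1a(const char *prompt) {
    assert(prompt != NULL);
    uint64_t h = 14695981039346656037ULL;          /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)prompt; *p != 0; ++p) {
        h ^= (uint64_t)*p;
        h *= 1099511628211ULL;                     /* FNV prime */
    }
    return h;
}

/* Build a hypothetical cache file name such as "prompt-<16 hex digits>.kvcache".
 * Returns 0 on success, -1 if the output buffer is too small. */
int prompt_cache_path(const char *prompt, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "prompt-%016llx.kvcache",
                     (unsigned long long)prompt_hash_fnv1a(prompt));
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Return 1 if the file can be opened for reading, 0 otherwise. */
int prompt_cache_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 0;
    fclose(f);
    return 1;
}
```

With helpers along these lines, a --cache-prompt implementation could load the saved state when the file exists and fall back to evaluating the prompt otherwise. The additional state mentioned above (last_n_tokens, n_past) would still need to be stored alongside the kv_cache.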