Saving the state of the model when exiting the program? #1240
Comments
Every user on the same server ends up invalidating the cache, right? It gets recalculated from scratch for each request because the context differs from person to person, and keeping a cache for every user in memory would use a lot of memory. Still, on a machine with little memory I understand the desire to save the cache; I can only imagine how long it takes with this model.
Yes, the cache will be rewound back to the first diverging token, but normally prompt processing is fast enough that you won't notice it. Jojorne is right that saving multiple caches is not really feasible for a server application.
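For readers unfamiliar with the rewinding behaviour, here is a minimal sketch (a hypothetical helper, not code from this repository) of how a server can find the first diverging token, i.e. how much of the existing KV cache can be reused for a new request:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: compare the tokens already represented in the KV cache
// with the tokens of the incoming request and return the length of their
// shared prefix. The cache only needs to be rewound to this position; only
// tokens after the first diverging token have to be re-evaluated.
static size_t common_prefix_len(const std::vector<int32_t> &cached,
                                const std::vector<int32_t> &incoming) {
    const size_t n = std::min(cached.size(), incoming.size());
    size_t i = 0;
    while (i < n && cached[i] == incoming[i]) {
        ++i;
    }
    return i;  // index of the first diverging token
}
```

If the shared prefix covers most of the context (for example a long, fixed system prompt), very little has to be recomputed, which is why prompt processing usually stays fast in practice.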
The KV cache for 123B will be many gigabytes too, though.
I understand that, and I accept it.
Just making it known here: I too would like to see this implemented, if for no other reason than to avoid long startup times like the one pictured above.
Would it be possible to add this feature to the program: when exiting (or pressing CTRL+C, as I usually do), if a -savestate flag was passed at startup, save the state of the model so that the next startup does not recalculate the KV cache but simply loads it from disk?
I have 4 Tesla P40s, and I'm using the 123B_4KM model with a 24k context. I really need this feature!
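Assuming this project wraps the upstream llama.cpp C API, here is a rough sketch of what the requested -savestate behaviour could look like. `llama_state_save_file` / `llama_state_load_file` are upstream llama.cpp entry points; everything else (the globals, file name, and handler) is hypothetical and not code from this repository:

```cpp
#include <csignal>
#include <cstdlib>
#include <vector>
#include "llama.h"

// Sketch only: assumes the upstream llama.cpp C API; the equivalent entry
// points in this project may differ. All names below are hypothetical.
static llama_context           *g_ctx        = nullptr;        // active context
static std::vector<llama_token> g_tokens;                      // tokens currently in the KV cache
static const char              *g_state_path = "model.state";  // where -savestate would write

// CTRL+C handler: dump the full context state (including the KV cache) to disk.
// In a real program you would only set a flag here and do the file I/O in the
// main loop, since writing files inside a signal handler is not async-signal-safe.
static void on_sigint(int) {
    if (g_ctx) {
        llama_state_save_file(g_ctx, g_state_path, g_tokens.data(), g_tokens.size());
    }
    std::_Exit(0);
}

// At startup: try to restore the saved state instead of re-evaluating the prompt.
static bool try_restore_state(llama_context *ctx, size_t n_ctx) {
    std::vector<llama_token> tokens(n_ctx);
    size_t n_loaded = 0;
    if (!llama_state_load_file(ctx, g_state_path, tokens.data(), tokens.size(), &n_loaded)) {
        return false;  // no usable saved state; fall back to normal prompt processing
    }
    tokens.resize(n_loaded);
    g_tokens = std::move(tokens);
    return true;
}

// in main(), after creating the context:
//   g_ctx = ctx;
//   std::signal(SIGINT, on_sigint);
//   bool restored = try_restore_state(ctx, llama_n_ctx(ctx));
```

As noted above about the cache size, for a 123B model with a 24k context the dump itself will be many gigabytes, so reloading it is only a win if disk I/O is fast relative to re-processing the prompt.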