[llama-server] Save & Load Slots from Disk Cache Automatically #16979
Interpause
started this conversation in
Ideas
Replies: 1 comment
It's possible to add such functionality, and it's probably not difficult to get quite far with minimal effort. But I think there are various edge cases that make this unlikely to get implemented:
No need for a vector database: the similarity is based on the actual tokens (i.e. text) that correspond to that state. It's relatively simple; usually longest prefix matching is enough.
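To make the longest-prefix idea concrete, here is a minimal sketch in Python. It is not how llama-server does it internally; the function names, the `min_ratio` threshold, and the in-memory index of saved token lists are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_best_cache(
    prompt: Sequence[int],
    cached: Dict[str, Sequence[int]],
    min_ratio: float = 0.5,  # hypothetical threshold, not a llama.cpp setting
) -> Tuple[Optional[str], int]:
    """Return (file name, shared prefix length) of the best saved state.

    `cached` maps a saved-state file name to the tokens it was built from.
    Returns (None, 0) when no entry covers at least `min_ratio` of the prompt.
    """
    best_name, best_len = None, 0
    for name, tokens in cached.items():
        n = common_prefix_len(prompt, tokens)
        if n > best_len:
            best_name, best_len = name, n
    if best_name is None or best_len < min_ratio * len(prompt):
        return None, 0
    return best_name, best_len
```

With something like this, only the tokens after the shared prefix would still need prefill; the tokens for each saved file would have to be stored alongside it (or in a small index) so the comparison does not require loading the full state.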
I think a natural continuation of #16117 (#16391) would be to automatically save and load slots from disk (which, afaik, currently has to be done manually: https://github.com/ggml-org/llama.cpp/tree/master/tools/server#post-slotsid_slotactionsave-save-the-prompt-cache-of-the-specified-slot-to-a-file). This would reduce prefill times across server restarts in local usage, and would be extremely beneficial for RAM-limited systems.
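For reference, the manual route looks roughly like the sketch below. It only builds the HTTP request for the `POST /slots/{id_slot}?action=save` endpoint linked above (and its `restore` counterpart); the helper name, the base URL, and the file name are assumptions, and as far as I understand the server must be started with `--slot-save-path` for these actions to be enabled.

```python
import json
import urllib.request


def slot_action_request(
    base_url: str, id_slot: int, action: str, filename: str
) -> urllib.request.Request:
    """Build (but do not send) a slot save/restore request.

    `slot_action_request` is a hypothetical helper, not part of llama.cpp.
    """
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url}/slots/{id_slot}?action={action}"
    # The file name is resolved relative to the server's slot save directory.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (assumes a server listening on localhost:8080):
# req = slot_action_request("http://localhost:8080", 0, "save", "slot0.bin")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```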
I don't really know how this would be implemented. At its most naive, it would hash the context to create a multi-level hash-based file cache (the cache/a3/b8/cd/1a4fe5e24... kind). However, I know the slots use some sort of similarity metric to improve reuse, and I am not sure how that would be implemented for a disk cache. Maybe a vector database?
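To illustrate the naive variant I have in mind, here is a rough Python sketch of the path scheme. The hash function (SHA-256), the 4-byte token encoding, and the three two-character directory levels are arbitrary choices for the example, not anything llama.cpp does.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def cache_path(
    root: str, tokens: Sequence[int], levels: int = 3, width: int = 2
) -> Path:
    """Map a token sequence to a sharded path like root/a3/b8/cd/<rest>."""
    # Encode each token id as 4 little-endian bytes so the hash is stable.
    data = b"".join(t.to_bytes(4, "little") for t in tokens)
    digest = hashlib.sha256(data).hexdigest()
    # Peel off `levels` directory names of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, digest[levels * width:])
```

The obvious limitation is that an exact hash only hits when the whole context is identical; a context that differs by even one trailing token maps to an unrelated path, which is why I am unsure how partial reuse would work on disk.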