[llama-server] Save & Load Slots from Disk Cache Automatically #16979

Interpause · 2025-11-03T20:44:24Z

I think a natural continuation of #16117 (#16391) would be to automatically save & load slots to and from disk, which afaik currently has to be done manually (https://github.com/ggml-org/llama.cpp/tree/master/tools/server#post-slotsid_slotactionsave-save-the-prompt-cache-of-the-specified-slot-to-a-file). This would reduce prefill times across server restarts in local usage, and would be extremely beneficial for RAM-limited systems.
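For reference, the manual flow mentioned above can be sketched as a small request builder. This is a minimal sketch, assuming the server was started with `--slot-save-path` and listens on `http://localhost:8080`; the helper name `build_slot_request` and the file name are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_slot_request(base_url: str, slot_id: int, action: str,
                       filename: str) -> urllib.request.Request:
    """Build a POST /slots/{id_slot}?action=save|restore request.

    Hypothetical helper: the endpoint shape follows the server README
    linked above; the host, port and filename are assumptions.
    """
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a save request for slot 0.
req = build_slot_request("http://localhost:8080", 0, "save", "chat.bin")
# To actually send it: urllib.request.urlopen(req)
```

Automating this would mean issuing the save before shutdown (or on eviction from RAM) and the restore when a matching prompt arrives.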

I don't really know how this would be implemented. The most naive approach would be to hash the context and use the hash as the key into a multi-level file cache (the cache/a3/b8/cd/1a4fe5e24... kind). However, I know the slots use some sort of similarity metrics to improve reuse, and I am not sure how that would be implemented for a disk cache, maybe a vector database?
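To make the naive idea concrete, here is a rough sketch of a hash-keyed, multi-level cache path. Everything in it is an assumption for illustration: the function name `cache_path`, the choice of SHA-256, the 4-byte token encoding, and the number of directory levels.

```python
import hashlib
from pathlib import Path


def cache_path(root: str, tokens: list[int], levels: int = 3) -> Path:
    """Map a token sequence to root/aa/bb/cc/<rest of digest>.

    Assumes token ids are non-negative and fit in 32 bits.
    """
    data = b"".join(t.to_bytes(4, "little") for t in tokens)
    digest = hashlib.sha256(data).hexdigest()
    # The first `levels` byte pairs of the digest become nested directories.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return Path(root).joinpath(*parts, digest[2 * levels:])


path = cache_path("cache", [1, 2, 3])
```

A limitation of this scheme is that it only finds exact matches: a prompt that merely extends a cached one hashes to a completely different path, so on its own it cannot reuse a partial prefix.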

ggerganov · 2025-11-04T16:09:41Z

Maintainer

It's possible to add such functionality and probably not difficult to get quite far with minimal effort. But I think there are various edge cases that make this unlikely to get implemented:

Organizing caches according to combinations of "model + parameters", i.e. we can't use the state of Qwen3 with gpt-oss, for example.
Managing available disk space, freeing it when appropriate, etc.
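The two concerns above could be approached roughly as follows. This is only a sketch under stated assumptions: `cache_namespace` and `evict_oldest` are hypothetical helpers, which parameters actually affect state compatibility is not specified here, and oldest-first eviction by modification time is just one possible policy.

```python
import hashlib
import json
from pathlib import Path


def cache_namespace(model_id: str, params: dict) -> str:
    """Derive a directory name from a "model + parameters" combination.

    Which parameters must be included is an assumption left to the caller.
    """
    payload = json.dumps({"model": model_id, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def evict_oldest(root: Path, max_bytes: int) -> list[Path]:
    """Delete least recently modified files until the total size fits."""
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(p.stat().st_size for p in files)
    removed = []
    for p in files:
        if total <= max_bytes:
            break
        total -= p.stat().st_size
        p.unlink()
        removed.append(p)
    return removed
```

Keeping each namespace in its own directory means a state saved for one model or configuration is never offered to another.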

> However, I know the slots use some sort of similarity metrics to improve reuse, and I am not sure how that would be implemented for a disk cache, maybe a vector database?

No need for a vector database. The similarity is based on the actual tokens (i.e. text) that correspond to that state. It's relatively simple: usually longest prefix matching is enough.
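As an illustration of longest prefix matching over tokens, a minimal sketch might look like the following. It is not the server's actual code; the names `common_prefix_len` and `best_cached_state` and the dict-based index are assumptions.

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens shared by two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def best_cached_state(prompt: list[int], cached: dict[str, list[int]]):
    """Return (key, shared_len) of the cached entry sharing the longest
    prefix with `prompt`, or (None, 0) if nothing overlaps."""
    best_key, best_len = None, 0
    for key, tokens in cached.items():
        n = common_prefix_len(prompt, tokens)
        if n > best_len:
            best_key, best_len = key, n
    return best_key, best_len


cached = {"a": [1, 2, 3, 4], "b": [1, 2, 9], "c": [7, 8]}
key, shared = best_cached_state([1, 2, 3, 5], cached)
```

Only the tokens after the shared prefix would then need to be prefilled. For a disk cache, this implies storing the token sequence (or an index of it) alongside each saved state so candidates can be compared without loading the full state.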
