Closed
Feature Description
Would it be possible to create functions that looked something like this:
llama_kv_save_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * dst);
llama_kv_load_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * src);
Motivation
In llama.cpp it is possible to save and load the entire context state in one operation with llama_copy_state_data and llama_set_state_data. For example, this could be used to evaluate a large system prompt once, save it to disk, and then load the state every time a new conversation is started.
However, with batch decoding this isn't really possible. If many sequences are being evaluated at once, you can only save and load all of them together, not one sequence at a time.
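To make the request more concrete, here is a minimal sketch of per-sequence save and load on a toy cache. It is not llama.cpp's actual KV cache: the toy_* names, the cell layout and the buffer format (a cell count followed by the raw cells) are all assumptions made up for this example. It only shows the behaviour I have in mind, which is that one sequence can be copied out and restored without touching the others.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_CELLS 8

/* Hypothetical toy cell: one cached value tagged with its sequence. */
typedef struct {
    int32_t seq_id; /* -1 marks an empty cell; real ids are assumed >= 0 */
    int32_t pos;
    float   value;
} toy_cell;

typedef struct {
    toy_cell cells[TOY_MAX_CELLS];
} toy_cache;

/* Mark every cell as empty. */
void toy_init(toy_cache *c) {
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        c->cells[i].seq_id = -1;
        c->cells[i].pos    = 0;
        c->cells[i].value  = 0.0f;
    }
}

/* Store one value in the first empty cell; returns 0 on success, -1 if full. */
int toy_add(toy_cache *c, int32_t seq_id, int32_t pos, float value) {
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        if (c->cells[i].seq_id < 0) {
            c->cells[i].seq_id = seq_id;
            c->cells[i].pos    = pos;
            c->cells[i].value  = value;
            return 0;
        }
    }
    return -1;
}

/* Count the cells that belong to seq_id. */
size_t toy_count_seq(const toy_cache *c, int32_t seq_id) {
    size_t n = 0;
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        if (c->cells[i].seq_id == seq_id) n++;
    }
    return n;
}

/* Copy only the cells of seq_id into dst and return the bytes written.
 * Passing dst == NULL just returns the number of bytes required. */
size_t toy_save_seq(const toy_cache *c, int32_t seq_id, uint8_t *dst) {
    uint32_t n   = 0;
    size_t   off = sizeof n;
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        if (c->cells[i].seq_id != seq_id) continue;
        if (dst) memcpy(dst + off, &c->cells[i], sizeof(toy_cell));
        off += sizeof(toy_cell);
        n++;
    }
    if (dst) memcpy(dst, &n, sizeof n);
    return off;
}

/* Replace the cells of seq_id with the ones stored in src. Other sequences
 * are left untouched. Returns the bytes read, or 0 (cache unchanged) if
 * there is not enough room. */
size_t toy_load_seq(toy_cache *c, int32_t seq_id, const uint8_t *src) {
    uint32_t n;
    memcpy(&n, src, sizeof n);

    /* Cells that are empty or already owned by seq_id can be reused. */
    uint32_t usable = 0;
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        if (c->cells[i].seq_id < 0 || c->cells[i].seq_id == seq_id) usable++;
    }
    if (usable < n) return 0;

    /* Drop the old contents of this sequence only. */
    for (int i = 0; i < TOY_MAX_CELLS; i++) {
        if (c->cells[i].seq_id == seq_id) c->cells[i].seq_id = -1;
    }

    size_t off  = sizeof n;
    int    slot = 0;
    for (uint32_t k = 0; k < n; k++) {
        while (c->cells[slot].seq_id >= 0) slot++; /* next empty cell */
        memcpy(&c->cells[slot], src + off, sizeof(toy_cell));
        c->cells[slot].seq_id = seq_id; /* retag under the target id */
        off += sizeof(toy_cell);
    }
    return off;
}
```

With something like this, a system prompt evaluated once as sequence 0 could be saved with toy_save_seq and later loaded into any free sequence id of a context that is already busy with other conversations. The loaded cells are retagged with the target id, so the saved buffer does not have to go back into the sequence it came from.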