Open
Description
As of writing this issue, we have three kvblock.Index implementations. An index holds KV-block locality information and is updated as KV-Events from the vLLM servers are digested. Each event carries a set of block hashes, an event type (admission or removal), and an identifier of the sender.
The kvblock.Index is the heart of the kv-cache-manager and its performance is essential for low-latency and accurate scheduling.
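To make the update flow above concrete, here is a minimal sketch of an index digesting admission and removal events. It is not the actual kvblock.Index interface: the names `Event`, `EventType`, `ToyIndex`, `Digest`, and `Lookup` are hypothetical and chosen only for illustration, and block hashes are assumed to be `uint64`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// EventType is a hypothetical stand-in for the admission/removal type of a KV-Event.
type EventType int

const (
	BlockAdmitted EventType = iota
	BlockRemoved
)

// Event is a hypothetical, simplified KV-Event: a sender plus the block
// hashes it admitted or removed.
type Event struct {
	Sender      string
	Type        EventType
	BlockHashes []uint64
}

// ToyIndex maps each block hash to the set of senders that hold the block.
// It is an illustration only, not one of the real implementations.
type ToyIndex struct {
	mu     sync.RWMutex
	blocks map[uint64]map[string]struct{}
}

func NewToyIndex() *ToyIndex {
	return &ToyIndex{blocks: make(map[uint64]map[string]struct{})}
}

// Digest applies one event to the index.
func (ix *ToyIndex) Digest(ev Event) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, h := range ev.BlockHashes {
		switch ev.Type {
		case BlockAdmitted:
			if ix.blocks[h] == nil {
				ix.blocks[h] = make(map[string]struct{})
			}
			ix.blocks[h][ev.Sender] = struct{}{}
		case BlockRemoved:
			delete(ix.blocks[h], ev.Sender) // no-op if the hash is unknown
			if len(ix.blocks[h]) == 0 {
				delete(ix.blocks, h)
			}
		}
	}
}

// Lookup returns the senders known to hold the block, sorted for stable output.
func (ix *ToyIndex) Lookup(h uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	out := make([]string, 0, len(ix.blocks[h]))
	for s := range ix.blocks[h] {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := NewToyIndex()
	ix.Digest(Event{Sender: "pod-a", Type: BlockAdmitted, BlockHashes: []uint64{1, 2}})
	ix.Digest(Event{Sender: "pod-b", Type: BlockAdmitted, BlockHashes: []uint64{2}})
	ix.Digest(Event{Sender: "pod-a", Type: BlockRemoved, BlockHashes: []uint64{1}})
	fmt.Println(ix.Lookup(1), ix.Lookup(2)) // prints: [] [pod-a pod-b]
}
```

The single read-write mutex keeps the sketch short. The real implementations differ precisely in how they handle locking, eviction, and memory bounds, which is what profiling would compare.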
The implementations are:
InMemoryIndex (default) - an in-memory index backed by a hashicorp LRU cache implementation, with relaxed locking
CostAwareMemoryIndex - contributed by @yankay, a hypermodeinc/ristretto memory-bound, high-performance in-memory cache
Redis - an index backed by a Redis server
Profiling the three options under realistic, high-scale workloads would help in drawing configuration recommendations and would improve general transparency. Promoting the second option to the default would also be desirable if the profiling data supports it.
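As a starting point for discussion, a profiling harness could look roughly like the sketch below. It assumes a hypothetical minimal `Index` interface (`Add` and `Lookup`), not the real kvblock.Index API, and uses a plain map-backed stand-in so the example runs on its own. The worker count, the 1024-hash key space, and the 1:9 write/read mix are arbitrary placeholders rather than a realistic workload.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Index is a hypothetical minimal interface used only by this sketch.
// Each real implementation would need a small adapter to satisfy it.
type Index interface {
	Add(hash uint64, pod string)
	Lookup(hash uint64) []string
}

// mapIndex is a stand-in implementation so the harness is runnable.
type mapIndex struct {
	mu sync.RWMutex
	m  map[uint64]map[string]struct{}
}

func newMapIndex() *mapIndex {
	return &mapIndex{m: make(map[uint64]map[string]struct{})}
}

func (ix *mapIndex) Add(hash uint64, pod string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ix.m[hash] == nil {
		ix.m[hash] = make(map[string]struct{})
	}
	ix.m[hash][pod] = struct{}{}
}

func (ix *mapIndex) Lookup(hash uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	out := make([]string, 0, len(ix.m[hash]))
	for p := range ix.m[hash] {
		out = append(out, p)
	}
	return out
}

// Profile runs `workers` goroutines, each issuing opsPerWorker operations
// (one Add for every nine Lookups), and returns the total operation count
// and the wall-clock time taken.
func Profile(ix Index, workers, opsPerWorker int) (int, time.Duration) {
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			pod := fmt.Sprintf("pod-%d", w)
			for i := 0; i < opsPerWorker; i++ {
				h := uint64(i % 1024) // placeholder key space
				if i%10 == 0 {
					ix.Add(h, pod)
				} else {
					ix.Lookup(h)
				}
			}
		}(w)
	}
	wg.Wait()
	return workers * opsPerWorker, time.Since(start)
}

func main() {
	ops, elapsed := Profile(newMapIndex(), 8, 100000)
	fmt.Printf("%d ops in %v (%.0f ops/s)\n", ops, elapsed, float64(ops)/elapsed.Seconds())
}
```

Running the same `Profile` call against adapters for each of the three implementations would give a first, rough throughput comparison. Latency percentiles, memory usage, and replayed KV-Event traces would be needed before drawing configuration recommendations.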
yankay