
Profile the different kvblock.Index implementations #108

@vMaroon

As of writing this issue, we have three kvblock.Index implementations. An index holds KV-block locality information and is updated as KV-Events from the vLLM servers are digested. Each event carries a set of block hashes, an event type (admission or removal), and an identifier of the sender.
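To make the update path concrete, here is a minimal sketch of how such events could be folded into a block-hash to sender mapping. The names `KVEvent`, `EventType`, `ToyIndex`, `Digest`, and `Lookup` are hypothetical and chosen for illustration; they are not the actual kvblock.Index API, and the real implementations may structure this differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// EventType is a hypothetical stand-in for the admission/removal type of a KV-Event.
type EventType int

const (
	BlockStored  EventType = iota // blocks admitted into the sender's cache
	BlockRemoved                  // blocks removed from the sender's cache
)

// KVEvent is a hypothetical, simplified event: block hashes, a type, and the sender.
type KVEvent struct {
	Type        EventType
	BlockHashes []uint64
	Sender      string
}

// ToyIndex maps each block hash to the set of senders currently holding it.
type ToyIndex struct {
	mu     sync.RWMutex
	blocks map[uint64]map[string]struct{}
}

func NewToyIndex() *ToyIndex {
	return &ToyIndex{blocks: make(map[uint64]map[string]struct{})}
}

// Digest applies one event to the index.
func (ix *ToyIndex) Digest(ev KVEvent) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, h := range ev.BlockHashes {
		switch ev.Type {
		case BlockStored:
			if ix.blocks[h] == nil {
				ix.blocks[h] = make(map[string]struct{})
			}
			ix.blocks[h][ev.Sender] = struct{}{}
		case BlockRemoved:
			delete(ix.blocks[h], ev.Sender) // no-op if the hash is unknown
			if len(ix.blocks[h]) == 0 {
				delete(ix.blocks, h) // drop empty entries
			}
		}
	}
}

// Lookup returns the senders holding a block, sorted for stable output.
func (ix *ToyIndex) Lookup(h uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	out := make([]string, 0, len(ix.blocks[h]))
	for s := range ix.blocks[h] {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := NewToyIndex()
	ix.Digest(KVEvent{Type: BlockStored, BlockHashes: []uint64{1, 2}, Sender: "pod-a"})
	ix.Digest(KVEvent{Type: BlockStored, BlockHashes: []uint64{2}, Sender: "pod-b"})
	ix.Digest(KVEvent{Type: BlockRemoved, BlockHashes: []uint64{1}, Sender: "pod-a"})
	fmt.Println(ix.Lookup(1), ix.Lookup(2)) // prints "[] [pod-a pod-b]"
}
```

The single `sync.RWMutex` here is the simplest correct choice for a sketch; it is not meant to reflect the locking strategy of any of the three implementations.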

The kvblock.Index is the heart of the kv-cache-manager and its performance is essential for low-latency and accurate scheduling.

The implementations are:

  1. InMemoryIndex (default) - an in-memory index backed by a hashicorp LRU cache implementation, with relaxed locking
  2. CostAwareMemoryIndex - contributed by @yankay, a memory-bounded, high-performance in-memory cache backed by hypermodeinc/ristretto
  3. Redis - an index backed by a Redis server

Profiling the three options under realistic, high-scale workloads would help in drawing configuration recommendations and would improve general transparency. Given good profiling data, promoting the 2nd option to default would also be desirable.
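As a starting point for discussion, a profiling harness could look roughly like the sketch below: it drives one backend with concurrent writes followed by concurrent lookups and reports wall-clock time per phase. The `Index` interface, `mutexIndex`, `Result`, and `profile` are hypothetical stand-ins, not the real kvblock.Index API; a real run would plug in the three implementations and use recorded or synthetic KV-Event traces instead of sequential hashes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Index is a hypothetical, reduced interface; the real kvblock.Index may differ.
type Index interface {
	Add(blockHash uint64, sender string)
	Lookup(blockHash uint64) []string
}

// mutexIndex is a toy backend that only gives the harness something to measure.
type mutexIndex struct {
	mu sync.RWMutex
	m  map[uint64][]string
}

func (ix *mutexIndex) Add(h uint64, sender string) {
	ix.mu.Lock()
	ix.m[h] = append(ix.m[h], sender)
	ix.mu.Unlock()
}

func (ix *mutexIndex) Lookup(h uint64) []string {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return ix.m[h]
}

// Result holds the wall-clock time of each phase and the number of lookup hits.
type Result struct {
	Writes, Reads time.Duration
	Hits          int
}

// profile runs `workers` goroutines; each adds, then looks up, `perWorker` distinct hashes.
func profile(ix Index, workers, perWorker int) Result {
	// Precompute sender names so string formatting is not part of the measurement.
	senders := make([]string, workers)
	for w := range senders {
		senders[w] = fmt.Sprintf("pod-%d", w)
	}
	hash := func(w, i int) uint64 { return uint64(w*perWorker + i) }

	// run executes op from all workers concurrently and returns the elapsed time.
	run := func(op func(w, i int)) time.Duration {
		var wg sync.WaitGroup
		start := time.Now()
		for w := 0; w < workers; w++ {
			wg.Add(1)
			go func(w int) {
				defer wg.Done()
				for i := 0; i < perWorker; i++ {
					op(w, i)
				}
			}(w)
		}
		wg.Wait()
		return time.Since(start)
	}

	var hits int64
	var res Result
	res.Writes = run(func(w, i int) { ix.Add(hash(w, i), senders[w]) })
	res.Reads = run(func(w, i int) {
		if len(ix.Lookup(hash(w, i))) > 0 {
			atomic.AddInt64(&hits, 1)
		}
	})
	res.Hits = int(hits)
	return res
}

func main() {
	res := profile(&mutexIndex{m: make(map[uint64][]string)}, 8, 10000)
	fmt.Printf("writes=%v reads=%v hits=%d\n", res.Writes, res.Reads, res.Hits)
}
```

Separating the write and read phases keeps the numbers easy to interpret, but a realistic workload would interleave event digestion with lookups, which is likely where the locking strategies of the implementations diverge most.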
