Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve
Updated Feb 9, 2026 - Python