[ACL 2026] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization
Updated Apr 21, 2026 - Python
ClearML - Model-Serving Orchestration and Repository Solution
MoDM is a cache-aware, hybrid serving system that accelerates image generation by dynamically combining small and large diffusion models for efficient, high-quality output.
An async ML service built with FastAPI, Celery, RabbitMQ, and Redis for efficient, scalable ML model serving
Dhruva is a full-fledged DPG platform for serving AI models at scale.