KubeAI consists of two primary components:
1. A model proxy: the KubeAI proxy provides an OpenAI-compatible API layer. Behind this API, the proxy implements a prefix-aware load balancing strategy that optimizes KV cache utilization across the backend serving engines (e.g. vLLM); a routing sketch follows this list. The proxy also implements request queueing (while the system scales up from zero replicas) and request retries (to seamlessly handle bad backends).
2. A model operator: the KubeAI model operator manages backend server instances (Pods) directly. It automates common operations such as downloading models, mounting volumes, and loading dynamic LoRA adapters.
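To make the prefix-aware idea concrete, here is a minimal sketch of how such a router could work, assuming a fixed-length prompt prefix and a static backend list (the function and parameter names are illustrative, not KubeAI's actual code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend hashes the first prefixLen bytes of the prompt and maps the
// hash onto the backend list. Prompts that share that prefix land on the
// same backend, so its KV cache entries for the prefix can be reused.
func pickBackend(prompt string, backends []string, prefixLen int) string {
	if len(prompt) < prefixLen {
		prefixLen = len(prompt)
	}
	h := fnv.New32a()
	h.Write([]byte(prompt[:prefixLen]))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"vllm-0:8000", "vllm-1:8000", "vllm-2:8000"}
	// Both prompts share a system-prompt prefix, so both requests are
	// routed to the same vLLM replica.
	fmt.Println(pickBackend("You are a helpful assistant. Summarize: ...", backends, 32))
	fmt.Println(pickBackend("You are a helpful assistant. Translate: ...", backends, 32))
}
```

A production router would likely use consistent hashing (so that adding or removing a backend only remaps a small fraction of prefixes) and combine prefix affinity with load-aware tie-breaking; plain modulo hashing is used here only to keep the sketch short.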
Both of these components are colocated in the same deployment to keep things simple. They integrate with each other to provide functionality like scale-from-zero and dynamic LoRA routing. Because this integration happens through the Kubernetes API rather than through direct calls, it would be possible to deploy one without the other.
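As one illustration of that loose coupling, the proxy only needs the API server to discover the backends the operator created. A minimal sketch, assuming (hypothetically) that the operator labels backend Pods with `model=<name>`:

```go
package proxy

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyBackends returns the Pod IPs of Ready backends serving the given
// model, discovered through the Kubernetes API rather than by talking to
// the operator directly.
func readyBackends(ctx context.Context, cs kubernetes.Interface, ns, model string) ([]string, error) {
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: "model=" + model, // hypothetical label set by the operator
	})
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, p := range pods.Items {
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
				addrs = append(addrs, p.Status.PodIP)
			}
		}
	}
	return addrs, nil
}
```

In practice the proxy would watch Pods (or a higher-level resource) rather than list on every request, but the contract stays the same: the Kubernetes API, not a direct dependency between the two binaries.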
This issue is here to gather feedback on whether the proxy and the operator should be independently deployable.