Explore inference-aware routing on the control plane gateway

### What problem are you facing?

The control plane's InferenceGateway uses basic path-based HTTPRoute routing across model placements. The [Gateway API Inference Extension](https://gateway-api.sigs.k8s.io/geps/gep-3567/) defines `InferenceModel` and `InferencePool` CRDs, supported by Envoy Gateway, that add features such as criticality-based priority and weighted traffic splitting between backends.
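For comparison, the current path-based approach amounts to composing one `HTTPRoute` per deployment. The sketch below is a minimal illustration, not the actual ModelDeployment function: the Gateway name `inference-gateway`, the placement Service names, and port `8000` are hypothetical placeholders, while the field layout follows the Gateway API `HTTPRoute` v1 schema.

```python
def path_route(namespace: str, deployment: str, placements: list[str]) -> dict:
    """Build an HTTPRoute that sends /<namespace>/<deployment> traffic to every placement."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{namespace}-{deployment}"},
        "spec": {
            # Hypothetical Gateway name; the real one depends on the control plane setup.
            "parentRefs": [{"name": "inference-gateway"}],
            "rules": [
                {
                    "matches": [
                        {"path": {"type": "PathPrefix", "value": f"/{namespace}/{deployment}"}}
                    ],
                    # One backend per placement. With no explicit weights, Gateway API
                    # defaults each weight to 1, so traffic is split evenly.
                    "backendRefs": [{"name": svc, "port": 8000} for svc in placements],
                }
            ],
        },
    }


route = path_route("team-a", "chat", ["chat-us-east", "chat-eu-west"])
```

This gives tenant isolation through the path but no notion of request priority, which is the gap the Inference Extension CRDs are meant to fill.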

### How could Modelplane help solve your problem?

The ModelDeployment function could compose `InferenceModel` and `InferencePool` resources on the control plane instead of (or alongside) plain HTTPRoutes. This would give us:

- **Criticality-based priority**: Critical requests are served ahead of Sheddable requests under load, which lets platform teams favor production traffic over experimentation.
- **Weighted traffic splitting**: Route a percentage of traffic to specific environments, enabling canary deployments across placements.
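A rough sketch of what the ModelDeployment function might compose for these two features follows. It assumes the alpha `inference.networking.x-k8s.io/v1alpha2` schema (`modelName`, `criticality` with values `Critical`, `Standard`, and `Sheddable`, `poolRef`, and `targetModels` with integer weights); the API is still evolving, so field names should be checked against the installed CRDs. The helper names, resource names, and pool name are hypothetical. Note that `targetModels` weights split traffic between model names served by a single pool, so mapping them onto placements is still an open design question.

```python
# Criticality values as assumed from the v1alpha2 InferenceModel schema.
VALID_CRITICALITY = {"Critical", "Standard", "Sheddable"}


def inference_model(name: str, model_name: str, pool: str,
                    criticality: str, targets: dict[str, int]) -> dict:
    """Build a hypothetical InferenceModel manifest with a criticality and weighted targets."""
    if criticality not in VALID_CRITICALITY:
        raise ValueError(f"unknown criticality: {criticality}")
    if any(w < 0 for w in targets.values()):
        raise ValueError("weights must be non-negative")
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
        "kind": "InferenceModel",
        "metadata": {"name": name},
        "spec": {
            "modelName": model_name,
            "criticality": criticality,
            "poolRef": {"name": pool},
            # Each target receives weight / sum(weights) of the requests for modelName.
            "targetModels": [{"name": t, "weight": w} for t, w in targets.items()],
        },
    }


def traffic_shares(targets: dict[str, int]) -> dict[str, float]:
    """Fraction of traffic each target would receive under proportional weighting."""
    total = sum(targets.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {t: w / total for t, w in targets.items()}


canary = {"llama-stable": 90, "llama-canary": 10}
model = inference_model("chat", "chat", "chat-pool", "Critical", canary)
```

Here a 90/10 weighting sends roughly one request in ten to the canary target, while the `Critical` label marks the whole model as high priority when the pool is saturated.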

One design constraint is that model-name routing (single endpoint, model name in the request body selects the backend) doesn't fit cleanly in a multi-tenant platform — two teams deploying the same model would have the same model name but need separate endpoints. The current path-based approach (`/<namespace>/<deployment>/v1/chat/completions`) handles multi-tenancy correctly.
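The collision can be made concrete by keying the same deployments both ways. In this sketch the deployment records and the model name `llama-3-8b` are hypothetical; only the path shape comes from the current approach.

```python
from collections import Counter

# Fixed suffix of the OpenAI-style endpoint used by the path-based scheme.
SUFFIX = ["v1", "chat", "completions"]


def model_name_key(dep: dict) -> str:
    # Model-name routing: the backend is selected by the "model" field in the request body.
    return dep["model"]


def path_key(dep: dict) -> str:
    # Path-based routing: /<namespace>/<deployment>/v1/chat/completions
    return "/" + "/".join([dep["namespace"], dep["name"], *SUFFIX])


def collisions(deployments: list[dict], key) -> dict[str, int]:
    """Routing keys shared by more than one deployment, with how many share each."""
    counts = Counter(key(d) for d in deployments)
    return {k: n for k, n in counts.items() if n > 1}


def parse_path(path: str) -> tuple[str, str]:
    """Recover (namespace, deployment) from a path-based request URL."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[2:] != SUFFIX:
        raise ValueError(f"unexpected path: {path}")
    return parts[0], parts[1]


deps = [
    {"namespace": "team-a", "name": "chat", "model": "llama-3-8b"},
    {"namespace": "team-b", "name": "chat", "model": "llama-3-8b"},
]
```

Both teams map to the single key `llama-3-8b` under model-name routing, while their paths stay distinct, so `parse_path` can always recover the owning namespace and deployment.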
