Withhold a ModelEndpoint until its ModelReplica is Ready by negz · Pull Request #163 · modelplaneai/modelplane

negz · 2026-06-16T05:52:51Z

Description of your changes

Fixes #102.

A ModelDeployment fans out into a ModelReplica and a ModelEndpoint per scheduled placement. compose-model-deployment composed the ModelEndpoint as soon as the placement was scheduled, from the cluster's gateway address alone, with no regard for whether the replica's model was actually serving. Once the endpoint's Service and EndpointSlice existed it advertised a backendName, ModelService picked it up, and the HTTPRoute routed traffic to it — while the destination pods were still pulling the engine image and loading weights. The workload cluster gateway returned 503s: on every deployment from scratch, and on scale-up for the share of traffic hitting each new replica until it warmed up.

This withholds the ModelEndpoint until its ModelReplica reports Ready=True. The replica's Ready tracks both the engine workloads serving and the remote Service and HTTPRoute that front them — the whole traffic path the endpoint advertises — so gating on it ensures routing only ever points at a backend that can serve. The endpoint is composed on the reconcile that first observes the replica Ready, and withdrawn again if the replica later goes not-Ready, pulling a dead backend out of rotation. This mirrors the existing handling of placements on clusters with no gateway address, which already get no endpoint.
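The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in fn.py: the helper names `is_ready` and `desired_endpoint`, the plain-dict resource shape, and the endpoint's `spec.address` field are all assumptions made for the example. The only thing taken from the description is the rule itself: no gateway address or no `Ready=True` on the observed replica means no endpoint.

```python
from typing import Any, Optional


def is_ready(resource: Optional[dict[str, Any]]) -> bool:
    """Return True only if the resource carries a Ready condition with status "True"."""
    if not resource:
        # A replica that has not been observed yet counts as not ready.
        return False
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def desired_endpoint(
    placement: str,
    gateway_address: Optional[str],
    observed_replica: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return the endpoint to compose, or None to withhold (or withdraw) it."""
    if not gateway_address:
        # Existing behaviour: clusters without a gateway address get no endpoint.
        return None
    if not is_ready(observed_replica):
        # New behaviour: no endpoint until the replica reports Ready=True.
        return None
    # Hypothetical endpoint shape, for illustration only.
    return {
        "kind": "ModelEndpoint",
        "metadata": {"name": f"{placement}-endpoint"},
        "spec": {"address": gateway_address},
    }
```

Because the decision is recomputed on every reconcile from the observed replica, the same check covers both cases: the endpoint first appears once the replica is Ready, and it is left out again if the replica later goes not-Ready.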

I have:

Read and followed Modelplane's contribution process.
Run nix flake check (or ./nix.sh flake check) and made sure it passes.
Added or updated tests covering any composition function changes.
Signed off every commit with git commit -s.

A ModelDeployment fans out into a ModelReplica and a ModelEndpoint per scheduled placement. compose-model-deployment composed the ModelEndpoint as soon as the placement was scheduled, from the cluster's gateway address alone, without regard for whether the replica's model was actually serving. As soon as the endpoint's Service and EndpointSlice existed it advertised a backendName, ModelService picked it up, and the HTTPRoute routed traffic to it. The destination pods were still warming up - pulling the engine image and loading model weights - so the workload cluster gateway returned 503s. This happened on every deployment from scratch, and on scale-up a share of traffic 503'd for the duration of each new replica's warm-up.

This change withholds the ModelEndpoint until its ModelReplica reports Ready=True. The replica's Ready tracks both the engine workloads serving and the remote Service and HTTPRoute that front them - the whole traffic path the endpoint advertises - so gating on it ensures routing only ever points at a backend that can serve. The endpoint is composed on the reconcile that first observes the replica Ready, and withdrawn again if the replica later goes not-Ready, pulling a dead backend out of rotation. This mirrors the existing behaviour for placements on clusters with no gateway address, which already get no endpoint.

Fixes #102.

Signed-off-by: Nic Cope <nicc@rk0n.org>

Copilot

Pull request overview

This PR prevents ModelService/HTTPRoute from routing traffic to a newly scheduled replica before it can actually serve by withholding (and, if necessary, withdrawing) the corresponding ModelEndpoint until the ModelReplica reports Ready=True. This addresses the warm-up window 503s described in #102 by ensuring only ready backends are advertised for routing.

Changes:

Gate ModelEndpoint composition on the observed ModelReplica Ready condition, and omit the endpoint from desired state when the replica is not-ready (triggering deletion).
Update composition tests to cover “withhold until ready” and “withdraw when not-ready” behaviors, and adjust expectations across existing scenarios.
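The two behaviours the tests are said to cover could be exercised along these lines. This is a hedged sketch only: `compose`, `replica_with_ready`, the resource keys `replica` and `endpoint`, and the dict-based observed/desired state are stand-ins invented for the example, not the fixtures or helpers in test_fn.py. It assumes that a resource left out of desired state is deleted, as the summary above describes.

```python
from typing import Any

Resource = dict[str, Any]


def compose(observed: dict[str, Resource]) -> dict[str, Resource]:
    """Build desired state: always the replica, the endpoint only if the replica is Ready."""
    desired: dict[str, Resource] = {"replica": {"kind": "ModelReplica"}}
    conditions = observed.get("replica", {}).get("status", {}).get("conditions", [])
    if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
        desired["endpoint"] = {"kind": "ModelEndpoint"}
    return desired


def replica_with_ready(status: str) -> Resource:
    """Hypothetical fixture: an observed replica with the given Ready status."""
    return {
        "kind": "ModelReplica",
        "status": {"conditions": [{"type": "Ready", "status": status}]},
    }


def test_withholds_endpoint_until_replica_ready() -> None:
    # First reconcile: nothing observed yet, so no endpoint is desired.
    assert "endpoint" not in compose({})
    # Replica exists but is still warming up.
    assert "endpoint" not in compose({"replica": replica_with_ready("False")})
    # Replica reports Ready=True: the endpoint is composed.
    assert "endpoint" in compose({"replica": replica_with_ready("True")})


def test_withdraws_endpoint_when_replica_goes_not_ready() -> None:
    # The endpoint already exists, but the replica has dropped out of Ready.
    observed = {
        "replica": replica_with_ready("False"),
        "endpoint": {"kind": "ModelEndpoint"},
    }
    # Omitting it from desired state is what triggers its deletion.
    assert "endpoint" not in compose(observed)
```

Both functions follow the pytest naming convention, so a runner such as pytest would collect them as written; they can also be called directly.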

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
functions/compose-model-deployment/function/fn.py	Withholds endpoint composition until the per-placement replica is observed `Ready=True`, preventing premature routing to warming backends.
functions/compose-model-deployment/tests/test_fn.py	Adds/updates cases to validate endpoint withholding/withdrawal based on replica readiness and updates expected conditions/status.


Copilot AI review requested due to automatic review settings June 16, 2026 05:52

Copilot started reviewing on behalf of negz June 16, 2026 05:53

Copilot AI reviewed Jun 16, 2026

dennis-upbound approved these changes Jun 16, 2026


dennis-upbound merged commit 9328afb into main Jun 16, 2026
4 checks passed

negz mentioned this pull request Jun 16, 2026

EKS has no autoscaler installed #166

Closed

negz deleted the cold-shoulder branch June 16, 2026 16:55

dennis-upbound merged 1 commit into main from cold-shoulder
