Gate serving-stack Gateway readiness on its LoadBalancer address by negz · Pull Request #162 · modelplaneai/modelplane

negz · 2026-06-16T05:39:24Z

Description of your changes

Fixes #121

On a fresh InferenceCluster a ModelDeployment never schedules — it sits at ReplicasScheduled=False / InsufficientCapacity because the cluster's status.gateway.address is never populated, even though the live Envoy Gateway on the workload cluster has had its address the whole time.

compose-serving-stack wraps the Gateway in a provider-kubernetes Object with the default readiness.policy: SuccessfulCreate, so it's Ready the instant it's applied. provider-kubernetes only re-observes an Object's manifest on its fast (~30s) poll while the Object is not Ready; a Ready Object re-observes only on the slow (~10m) drift poll. The Gateway's address is assigned asynchronously after the first observe, so the observed manifest stays frozen at a pre-address snapshot for up to ~10m.
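To make the timing argument concrete, here is a deliberately simplified model of the two poll cadences. It assumes polls fire on a fixed grid starting from the first observe, which is an illustrative assumption and not how provider-kubernetes actually schedules reconciles. The interval values are the approximate figures quoted above, and `next_observation` is a hypothetical helper name.

```python
import math

# Approximate intervals quoted in the description above.
FAST_POLL_SECONDS = 30    # used while the Object is not Ready
DRIFT_POLL_SECONDS = 600  # used once the Object is Ready


def next_observation(assigned_at: float, poll_interval: float) -> float:
    """Return the time of the first poll at or after `assigned_at`.

    Simplified model: the first observe happens at t=0 (before the address
    exists) and later polls fire at t = interval, 2 * interval, ...
    """
    polls_elapsed = max(1, math.ceil(assigned_at / poll_interval))
    return polls_elapsed * poll_interval


# Suppose the LoadBalancer address is assigned 45s after the first observe.
ready_immediately = next_observation(45, DRIFT_POLL_SECONDS)  # SuccessfulCreate
gated_on_address = next_observation(45, FAST_POLL_SECONDS)    # DeriveFromCelQuery
```

Under this model the gated Object picks the address up on the second fast poll, while the immediately Ready Object waits for the first drift poll.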

This change gives the Gateway Object a DeriveFromCelQuery readiness policy gating on the observed status.addresses. While the address is absent the Object is not Ready, so provider-kubernetes keeps re-observing on its ~30s poll and the address propagates promptly. This mirrors the pattern compose-model-replica already uses.
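A minimal sketch of what such a gated Object might look like when built in Python. The diff itself is not shown on this page, so the CEL expression, the `apiVersion`, the helper name `gateway_object`, and the example Gateway manifest are all assumptions for illustration. The `readiness.policy` and `readiness.celQuery` fields, and the `object` variable inside the query, follow provider-kubernetes' documented Object schema as I understand it.

```python
# Assumed CEL expression: Ready only once the observed Gateway reports at
# least one address. The actual query in the PR may differ.
GATEWAY_READY_CEL = (
    "has(object.status) && has(object.status.addresses) "
    "&& size(object.status.addresses) > 0"
)


def gateway_object(name: str, gateway_manifest: dict) -> dict:
    """Wrap a Gateway manifest in a provider-kubernetes Object (sketch)."""
    return {
        # Assumed API version; adjust to the provider version in use.
        "apiVersion": "kubernetes.crossplane.io/v1alpha2",
        "kind": "Object",
        "metadata": {"name": name},
        "spec": {
            "readiness": {
                # Replaces the default SuccessfulCreate policy.
                "policy": "DeriveFromCelQuery",
                "celQuery": GATEWAY_READY_CEL,
            },
            "forProvider": {"manifest": gateway_manifest},
        },
    }


# Hypothetical Gateway manifest, for illustration only.
example = gateway_object(
    "serving-gateway",
    {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": "serving", "namespace": "default"},
        "spec": {"gatewayClassName": "example-class", "listeners": []},
    },
)
```

Because the Object stays not Ready until the query evaluates to true, it remains on the fast poll for exactly the window in which the address is still pending.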

I have:

Read and followed Modelplane's contribution process.
Run nix flake check (or ./nix.sh flake check) and made sure it passes.
Added or updated tests covering any composition function changes.
Signed off every commit with git commit -s.

Copilot

Pull request overview

This PR fixes a scheduling deadlock on fresh InferenceClusters by ensuring the composed Envoy Gateway (wrapped as a provider-kubernetes Object) is not considered ready until its LoadBalancer address is actually observed, allowing the address to propagate quickly into status.gateway.address for downstream scheduling.

Changes:

Add a DeriveFromCelQuery readiness policy to the composed Gateway Object, gated on status.addresses being present/non-empty.
Extend the serving-stack unit tests to validate readiness gating and status propagation behavior.
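The gating and propagation behavior those tests cover can be sketched roughly as follows. This is not the code from `fn.py` or `test_fn.py`; the helper names are hypothetical. It assumes the observed Object exposes the live Gateway under `status.atProvider.manifest` (as the commit message describes) and that each Gateway API address entry carries a `value` field.

```python
from typing import Optional


def observed_gateway_address(observed_object: dict) -> Optional[str]:
    """Return the first non-empty Gateway address from an observed Object."""
    status = observed_object.get("status") or {}
    manifest = (status.get("atProvider") or {}).get("manifest") or {}
    addresses = (manifest.get("status") or {}).get("addresses") or []
    for address in addresses:
        value = address.get("value")
        if value:
            return value
    return None


def gateway_is_ready(observed_object: dict) -> bool:
    """Python mirror of the readiness gate: Ready only with an address."""
    return observed_gateway_address(observed_object) is not None


# Snapshot taken before the LoadBalancer address is assigned.
pre_address = {"status": {"atProvider": {"manifest": {"status": {}}}}}

# Snapshot taken after assignment (example value, not from the PR).
post_address = {
    "status": {
        "atProvider": {
            "manifest": {
                "status": {
                    "addresses": [{"type": "Hostname", "value": "lb.example.com"}]
                }
            }
        }
    }
}
```

A test in this style would assert that the pre-address snapshot is not Ready and yields no address, and that the post-address snapshot is Ready and surfaces the value destined for `status.gateway.address`.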

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| functions/compose-serving-stack/function/fn.py | Adds a CEL readiness query and wires it into the composed Gateway `Object` to keep provider-kubernetes re-observing until the address appears. |
| functions/compose-serving-stack/tests/test_fn.py | Adds/updates tests to cover the Gateway readiness gating and address surfacing behavior. |

On a fresh InferenceCluster a ModelDeployment never schedules: it stays at ReplicasScheduled=False / InsufficientCapacity because the cluster's status.gateway.address is never populated, even though the live Envoy Gateway on the workload cluster has an address. The scheduler filters out any cluster without a gateway address.

compose-serving-stack wraps the Envoy Gateway in a provider-kubernetes Object with the default readiness.policy: SuccessfulCreate, so the Object is Ready the instant it's applied. provider-kubernetes only re-observes an Object's status.atProvider.manifest on its fast (~30s) poll while the Object is not Ready; a Ready Object re-observes only on the slow (~10m) drift poll. The Gateway's LoadBalancer address is assigned asynchronously after the first observe, so the observed manifest stays frozen at a pre-address snapshot, and the address fails to propagate up the chain, for up to ~10m.

This change gives the Gateway Object a DeriveFromCelQuery readiness policy that gates on the observed manifest's status.addresses. While the address is absent the Object is not Ready, so provider-kubernetes keeps re-observing on its ~30s poll and the address propagates promptly instead of after the full drift interval. This mirrors the DeriveFromCelQuery pattern compose-model-replica already uses for workload readiness, and needs no alpha watch feature gate.

Fixes #121.

Signed-off-by: Nic Cope <nicc@rk0n.org>

Copilot AI review requested due to automatic review settings June 16, 2026 05:39

Copilot started reviewing on behalf of negz June 16, 2026 05:39

Copilot AI reviewed Jun 16, 2026

Comment thread functions/compose-serving-stack/function/fn.py

Comment thread functions/compose-serving-stack/function/fn.py

Comment thread functions/compose-serving-stack/tests/test_fn.py

negz force-pushed the mind-the-gate branch from c7cbd40 to 919cc67 June 16, 2026 06:10

dennis-upbound approved these changes Jun 16, 2026

dennis-upbound merged commit 95e3b1c into main Jun 16, 2026
3 checks passed

negz mentioned this pull request Jun 16, 2026

EKS has no autoscaler installed #166

Closed

negz deleted the mind-the-gate branch June 16, 2026 16:55
