Gate serving-stack Gateway readiness on its LoadBalancer address #162
Merged
Pull request overview
This PR fixes a scheduling deadlock on fresh InferenceClusters by ensuring the composed Envoy Gateway (wrapped as a provider-kubernetes Object) is not considered ready until its LoadBalancer address is actually observed, allowing the address to propagate quickly into status.gateway.address for downstream scheduling.
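The chain the overview describes (observed Gateway address → `status.gateway.address` → scheduler filter) can be sketched with two small helpers. This is a hypothetical illustration, not the project's code: the field paths `status.atProvider.manifest` and `status.gateway.address` come from the PR text, the Gateway API `addresses[].value` shape is standard, and the helper names and sample data are made up.

```python
"""Sketch: how an observed Gateway address could surface for scheduling.

Hypothetical helpers; only the field paths follow the PR description.
"""
from typing import Any, Optional


def observed_gateway_address(gateway_object: dict[str, Any]) -> Optional[str]:
    """Read the first address from the Object's observed Gateway manifest."""
    manifest = gateway_object.get("status", {}).get("atProvider", {}).get("manifest", {})
    addresses = manifest.get("status", {}).get("addresses") or []
    return addresses[0].get("value") if addresses else None


def schedulable(clusters: list[dict[str, Any]]) -> list[str]:
    """Drop any cluster whose status.gateway.address is unset."""
    return [
        c["name"]
        for c in clusters
        if c.get("status", {}).get("gateway", {}).get("address")
    ]


# A pre-address snapshot versus a snapshot taken after the LB assigned an IP.
stale = {"status": {"atProvider": {"manifest": {"status": {}}}}}
fresh = {"status": {"atProvider": {"manifest": {"status": {
    "addresses": [{"type": "IPAddress", "value": "203.0.113.10"}]}}}}}

clusters = [
    {"name": "a", "status": {"gateway": {"address": observed_gateway_address(stale)}}},
    {"name": "b", "status": {"gateway": {"address": observed_gateway_address(fresh)}}},
]
print(schedulable(clusters))  # ['b']
```

A cluster whose Object is still holding a stale snapshot looks identical to one with no Gateway at all, which is why the freshness of the observed manifest matters.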
Changes:
- Add a `DeriveFromCelQuery` readiness policy to the composed Gateway `Object`, gated on `status.addresses` being present and non-empty.
- Extend the serving-stack unit tests to validate readiness gating and status propagation behavior.
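A minimal sketch of what the composed Object might look like as a Python dict, assuming the `kubernetes.crossplane.io/v1alpha2` Object schema with `spec.forProvider.manifest`, a `spec.readiness.celQuery` field, and a CEL environment that exposes the observed manifest as `object`. The function name, resource names, and listener are hypothetical; the real fn.py may build this differently.

```python
"""Sketch: a provider-kubernetes Object wrapping a Gateway, gated on its address.

Assumptions (not taken from fn.py): the v1alpha2 Object schema, the
readiness.celQuery field name, and all resource names below.
"""

# CEL evaluated against the observed manifest (assumed to be bound to
# `object`). Ready only once status.addresses exists and is non-empty.
GATEWAY_READY_CEL = "has(object.status.addresses) && size(object.status.addresses) > 0"


def gateway_object(name: str, namespace: str, gateway_class: str) -> dict:
    """Return a hypothetical composed Object for an Envoy Gateway."""
    gateway = {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "gatewayClassName": gateway_class,
            "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
        },
    }
    return {
        "apiVersion": "kubernetes.crossplane.io/v1alpha2",
        "kind": "Object",
        "spec": {
            "forProvider": {"manifest": gateway},
            # Replaces the default SuccessfulCreate policy, under which the
            # Object would be Ready as soon as the Gateway was applied.
            "readiness": {
                "policy": "DeriveFromCelQuery",
                "celQuery": GATEWAY_READY_CEL,
            },
        },
    }


obj = gateway_object("inference-gateway", "envoy-gateway-system", "envoy")
print(obj["spec"]["readiness"]["policy"])
```

The `has()` guard matters: before the controller writes any addresses, the field is absent, and `size()` on a missing field would fail rather than return false.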
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| functions/compose-serving-stack/function/fn.py | Adds a CEL readiness query and wires it into the composed Gateway Object to keep provider-kubernetes re-observing until the address appears. |
| functions/compose-serving-stack/tests/test_fn.py | Adds/updates tests to cover the Gateway readiness gating and address surfacing behavior. |
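The cases a readiness-gating test would need to cover can be enumerated with a plain-Python stand-in for the CEL query. This is a hypothetical sketch only: provider-kubernetes evaluates the real CEL, and the actual test_fn.py likely asserts on the composed resource instead.

```python
"""Sketch: the Gateway readiness gate mirrored in Python, with test cases.

gateway_ready is a hypothetical stand-in for the CEL query, used only to
list the observed-manifest states that should and should not count as Ready.
"""
from typing import Any


def gateway_ready(observed_manifest: dict[str, Any]) -> bool:
    """True once the observed Gateway reports at least one address."""
    addresses = observed_manifest.get("status", {}).get("addresses")
    return bool(addresses)


def test_not_ready_before_status() -> None:
    assert not gateway_ready({"kind": "Gateway"})


def test_not_ready_with_empty_addresses() -> None:
    assert not gateway_ready({"status": {"addresses": []}})


def test_ready_once_address_assigned() -> None:
    manifest = {"status": {"addresses": [{"type": "IPAddress", "value": "203.0.113.10"}]}}
    assert gateway_ready(manifest)


for case in (
    test_not_ready_before_status,
    test_not_ready_with_empty_addresses,
    test_ready_once_address_assigned,
):
    case()
```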
On a fresh InferenceCluster a ModelDeployment never schedules: it stays at ReplicasScheduled=False / InsufficientCapacity because the cluster's status.gateway.address is never populated, even though the live Envoy Gateway on the workload cluster has an address. The scheduler filters out any cluster without a gateway address.

compose-serving-stack wraps the Envoy Gateway in a provider-kubernetes Object with the default readiness.policy: SuccessfulCreate, so the Object is Ready the instant it's applied. provider-kubernetes only re-observes an Object's status.atProvider.manifest on its fast (~30s) poll while the Object is not Ready; a Ready Object re-observes only on the slow (~10m) drift poll. The Gateway's LoadBalancer address is assigned asynchronously after the first observe, so the observed manifest stays frozen at a pre-address snapshot and the address fails to propagate up the chain for up to ~10m.

This change gives the Gateway Object a DeriveFromCelQuery readiness policy that gates on the observed manifest's status.addresses. While the address is absent the Object is not Ready, so provider-kubernetes keeps re-observing on its ~30s poll and the address propagates promptly instead of after the full drift interval. This mirrors the DeriveFromCelQuery pattern compose-model-replica already uses for workload readiness, and needs no alpha watch feature gate.

Fixes #121.

Signed-off-by: Nic Cope <nicc@rk0n.org>
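The timing argument above can be checked with a toy model. It assumes fixed poll intervals taken from the description (~30s while not Ready, ~10m drift poll while Ready) and a first observe at t=0 that happens before the LoadBalancer address exists; it ignores jitter, backoff, and any delay further up the composition chain, and the function name is hypothetical.

```python
"""Sketch: when the address is first observed under each readiness policy.

Toy model with fixed poll intervals from the PR description; not a model of
provider-kubernetes' actual scheduling of reconciles.
"""

FAST_POLL_S = 30     # re-observe interval while the Object is not Ready
DRIFT_POLL_S = 600   # re-observe interval once the Object is Ready


def first_observe_with_address(address_at_s: float, gate_on_address: bool) -> int:
    """Return the time (s) of the first observe that can see the address."""
    t = 0
    while t < address_at_s:
        # SuccessfulCreate: Ready right after the apply, so only the drift
        # poll runs. DeriveFromCelQuery on status.addresses: not Ready until
        # an observe sees the address, so the fast poll keeps running.
        ready = not gate_on_address
        t += DRIFT_POLL_S if ready else FAST_POLL_S
    return t


for gated in (False, True):
    policy = "DeriveFromCelQuery" if gated else "SuccessfulCreate"
    print(policy, first_observe_with_address(5, gated))
# SuccessfulCreate 600
# DeriveFromCelQuery 30
```

With an address assigned a few seconds after creation, the gated policy bounds the staleness to roughly one fast poll instead of a full drift interval.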
dennis-upbound
approved these changes
Jun 16, 2026
Description of your changes
Fixes #121
On a fresh `InferenceCluster` a `ModelDeployment` never schedules: it sits at `ReplicasScheduled=False / InsufficientCapacity` because the cluster's `status.gateway.address` is never populated, even though the live Envoy `Gateway` on the workload cluster has had its address the whole time.

`compose-serving-stack` wraps the Gateway in a provider-kubernetes `Object` with the default `readiness.policy: SuccessfulCreate`, so it's `Ready` the instant it's applied. provider-kubernetes only re-observes an `Object`'s manifest on its fast (~30s) poll while the Object is not `Ready`; a `Ready` Object re-observes only on the slow (~10m) drift poll. The Gateway's address is assigned asynchronously after the first observe, so the observed manifest stays frozen at a pre-address snapshot for up to ~10m.

This change gives the Gateway `Object` a `DeriveFromCelQuery` readiness policy gating on the observed `status.addresses`. While the address is absent the `Object` is not `Ready`, so provider-kubernetes keeps re-observing on its ~30s poll and the address propagates promptly. This mirrors the pattern `compose-model-replica` already uses.

I have:
- Run `nix flake check` (or `./nix.sh flake check`) and made sure it passes.
- Signed off my commits with `git commit -s`.