What happened?
On a fresh EKS InferenceCluster, a ModelDeployment never schedules. It stays at ReplicasScheduled=False with:
0 of 1 replicas scheduled (checked 1 clusters)
Reason: InsufficientCapacity
even though the cluster is Ready=True, reports a matching GPU pool, and has free nodes. The deployment never recovers on its own — no model is ever served, with no manual intervention.
The root cause is upstream of the scheduler: the cluster's status.gateway.address is never populated, so the scheduler treats the cluster as unschedulable.
The ModelDeployment scheduler filters out any cluster without a gateway address (_cluster_ready in compose-model-deployment):
def _cluster_ready(cluster):
if not cluster.status.gateway or not cluster.status.gateway.address:
return False
return any(c.type == "Ready" and c.status == "True" for c in cluster.status.conditions or [])
On the affected cluster the field is empty:
$ kubectl get ic eks-us-west -o jsonpath='{.status.gateway}'
# (empty)
$ kubectl get ic eks-us-west -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
{"reason":"Available","status":"True","type":"Ready"}
The address propagation chain is: Envoy Gateway (on the workload cluster) gets a LoadBalancer address → compose-serving-stack reads it from the gateway provider-kubernetes Object's observed manifest and writes ServingStack.status.gateway.address → compose-inference-cluster copies it to InferenceCluster.status.gateway.address → scheduler reads it.
The chain breaks at the first hop. The live Gateway on the workload cluster has its address:
$ kubectl --kubeconfig eks get gateway inference-gateway -n modelplane-system \
-o jsonpath='{.status.addresses}'
[{"type":"Hostname","value":"...elb.amazonaws.com"}]
but the gateway Object's observed manifest is frozen at a pre-address snapshot, indefinitely:
$ kubectl get object.kubernetes.m.crossplane.io <gateway-obj> -n modelplane-system \
-o jsonpath='{.status.atProvider.manifest.status}'
{"conditions":[{"lastTransitionTime":"1970-01-01T00:00:00Z","message":"Waiting for controller","reason":"Pending","status":"Unknown","type":"Accepted"},
{"lastTransitionTime":"1970-01-01T00:00:00Z","message":"Waiting for controller","reason":"Pending","status":"Unknown","type":"Programmed"}]}
This snapshot persisted for >8 minutes while the live Gateway had its address the whole time.
Root cause
compose-serving-stack composes the Envoy Gateway as a provider-kubernetes Object with readiness.policy: SuccessfulCreate and the default watch: false (the _k8s_object helper sets neither field; the Gateway Object is built at functions/compose-serving-stack/function/fn.py compose_gateway).
provider-kubernetes (v1.2.5) refreshes an Object's status.atProvider.manifest only when its reconciler runs. Two settings in the namespaced object controller (internal/controller/namespaced/object/object.go) mean a reconcile almost never runs after the initial apply:
WithEventFilter(resource.DesiredStateChanged()) — a reconcile is enqueued only when the Object's own desired state (spec/annotations) changes. The Gateway acquiring its LoadBalancer address is an external change and does not enqueue a reconcile.
WithPollInterval(o.PollInterval) with WithPollIntervalHook — the periodic drift poll defaults to 10m. The hook shortens it to 30s only while the Object is not Ready=True. Because the Gateway Object uses readiness: SuccessfulCreate, it is Ready=True the moment it is created, so it polls on the full 10m interval.
Timeline on the affected cluster:
06:43:45 — Gateway Object applied; first (and only timely) observe captures the manifest before Envoy programs it ("Waiting for controller").
06:43:45 — readiness: SuccessfulCreate ⇒ Ready=True immediately ⇒ poll interval is the full 10m.
06:44:36 — Envoy assigns the ELB address (live Gateway Programmed=True). This external change does not trigger a reconcile.
- next re-observe is ~10 minutes later. The cluster has no gateway address, and the scheduler reports
InsufficientCapacity, for that whole window.
Why this Object and not the others: the model-serving Object (a Deployment) captures live .status on the very first observe, because the Deployment controller populates .status within milliseconds of apply. The Gateway is the only serving-stack Object whose consumed status field is populated asynchronously by a controller after the apply round-trip, and whose value a downstream consumer (the scheduler) blocks on. So the staleness is invisible everywhere except here.
How can we reproduce it?
- Install Modelplane on a kind control plane.
- Create an
InferenceGateway, an EKS InferenceClass, and an EKS InferenceCluster (examples/platform/inference-class-eks-l4.yaml, examples/platform/inference-cluster-eks.yaml). Wait for the cluster to reach Ready=True.
- Apply
examples/deployment/model-deployment.yaml + model-service.yaml.
- Observe the
ModelDeployment stuck at ReplicasScheduled=False / InsufficientCapacity, and kubectl get ic <name> -o jsonpath='{.status.gateway}' empty, despite the live Envoy Gateway on the workload cluster having an address.
Workaround (manual): force the gateway Object to reconcile, which makes provider-kubernetes re-observe the live manifest and propagate the address:
kubectl annotate object.kubernetes.m.crossplane.io <gateway-obj> -n modelplane-system \
poke="$(date +%s)" --overwrite
Within a reconcile the address propagates up the chain, the cluster becomes schedulable, and the deployment serves. (Verified end to end: the model served an OpenAI chat completion after the poke.)
Possible fixes
- Set
spec.watch: true on the Gateway Object so provider-kubernetes reacts to the live Gateway's status changes. This is the purpose-built mechanism, but watch is an alpha field that is only honored when the provider's EnableAlphaWatches feature gate is on — which would also require a DeploymentRuntimeConfig enabling --enable-watches (and the matching RBAC) on provider-kubernetes. The provider currently runs with only EnableBetaManagementPolicies and EnableBetaServerSideApply.
- Or change the Gateway
Object's readiness.policy so it is not considered Ready until the address is observed (e.g. a DeriveFromObject-style policy that gates on status.addresses). While not Ready, the poll-interval hook drops to 30s, so the address would propagate within ~30s instead of ~10m. This keeps the deployment's time-to-serve bounded without enabling an alpha gate.
What environment did it happen in?
Modelplane version: v0.1.0-dev.1781072131.gb5e0089 (PR #101 branch)
Crossplane: v2.3.2
provider-kubernetes: v1.2.5 (default args; EnableAlphaWatches off)
Cluster source: EKS (provisioned), us-west-2, 1x NVIDIA L4 (g6.xlarge)
Kubernetes (control plane): kind v1.34.0
What happened?
On a fresh EKS
InferenceCluster, aModelDeploymentnever schedules. It stays atReplicasScheduled=Falsewith:even though the cluster is
Ready=True, reports a matching GPU pool, and has free nodes. The deployment never recovers on its own — no model is ever served, with no manual intervention.The root cause is upstream of the scheduler: the cluster's
status.gateway.addressis never populated, so the scheduler treats the cluster as unschedulable.The
ModelDeploymentscheduler filters out any cluster without a gateway address (_cluster_readyincompose-model-deployment):On the affected cluster the field is empty:
The address propagation chain is: Envoy
Gateway(on the workload cluster) gets a LoadBalancer address →compose-serving-stackreads it from thegatewayprovider-kubernetesObject's observed manifest and writesServingStack.status.gateway.address→compose-inference-clustercopies it toInferenceCluster.status.gateway.address→ scheduler reads it.The chain breaks at the first hop. The live Gateway on the workload cluster has its address:
but the
gatewayObject's observed manifest is frozen at a pre-address snapshot, indefinitely:This snapshot persisted for >8 minutes while the live Gateway had its address the whole time.
Root cause
compose-serving-stackcomposes the EnvoyGatewayas a provider-kubernetesObjectwithreadiness.policy: SuccessfulCreateand the defaultwatch: false(the_k8s_objecthelper sets neither field; the GatewayObjectis built atfunctions/compose-serving-stack/function/fn.pycompose_gateway).provider-kubernetes (v1.2.5) refreshes an
Object'sstatus.atProvider.manifestonly when its reconciler runs. Two settings in the namespaced object controller (internal/controller/namespaced/object/object.go) mean a reconcile almost never runs after the initial apply:WithEventFilter(resource.DesiredStateChanged())— a reconcile is enqueued only when theObject's own desired state (spec/annotations) changes. The Gateway acquiring its LoadBalancer address is an external change and does not enqueue a reconcile.WithPollInterval(o.PollInterval)withWithPollIntervalHook— the periodic drift poll defaults to10m. The hook shortens it to30sonly while theObjectis notReady=True. Because the GatewayObjectusesreadiness: SuccessfulCreate, it isReady=Truethe moment it is created, so it polls on the full10minterval.Timeline on the affected cluster:
06:43:45— GatewayObjectapplied; first (and only timely) observe captures the manifest before Envoy programs it ("Waiting for controller").06:43:45—readiness: SuccessfulCreate⇒Ready=Trueimmediately ⇒ poll interval is the full10m.06:44:36— Envoy assigns the ELB address (live GatewayProgrammed=True). This external change does not trigger a reconcile.InsufficientCapacity, for that whole window.Why this
Objectand not the others: themodel-servingObject(a Deployment) captures live.statuson the very first observe, because the Deployment controller populates.statuswithin milliseconds of apply. The Gateway is the only serving-stackObjectwhose consumed status field is populated asynchronously by a controller after the apply round-trip, and whose value a downstream consumer (the scheduler) blocks on. So the staleness is invisible everywhere except here.How can we reproduce it?
InferenceGateway, an EKSInferenceClass, and an EKSInferenceCluster(examples/platform/inference-class-eks-l4.yaml,examples/platform/inference-cluster-eks.yaml). Wait for the cluster to reachReady=True.examples/deployment/model-deployment.yaml+model-service.yaml.ModelDeploymentstuck atReplicasScheduled=False / InsufficientCapacity, andkubectl get ic <name> -o jsonpath='{.status.gateway}'empty, despite the live Envoy Gateway on the workload cluster having an address.Workaround (manual): force the gateway
Objectto reconcile, which makes provider-kubernetes re-observe the live manifest and propagate the address:Within a reconcile the address propagates up the chain, the cluster becomes schedulable, and the deployment serves. (Verified end to end: the model served an OpenAI chat completion after the poke.)
Possible fixes
spec.watch: trueon the GatewayObjectso provider-kubernetes reacts to the live Gateway's status changes. This is the purpose-built mechanism, butwatchis an alpha field that is only honored when the provider'sEnableAlphaWatchesfeature gate is on — which would also require aDeploymentRuntimeConfigenabling--enable-watches(and the matching RBAC) on provider-kubernetes. The provider currently runs with onlyEnableBetaManagementPoliciesandEnableBetaServerSideApply.Object'sreadiness.policyso it is not consideredReadyuntil the address is observed (e.g. aDeriveFromObject-style policy that gates onstatus.addresses). While notReady, the poll-interval hook drops to30s, so the address would propagate within ~30s instead of ~10m. This keeps the deployment's time-to-serve bounded without enabling an alpha gate.What environment did it happen in?
Modelplane version:
v0.1.0-dev.1781072131.gb5e0089(PR #101 branch)Crossplane: v2.3.2
provider-kubernetes: v1.2.5 (default args;
EnableAlphaWatchesoff)Cluster source: EKS (provisioned), us-west-2, 1x NVIDIA L4 (g6.xlarge)
Kubernetes (control plane): kind v1.34.0