Autoscale EKS GPU node pools with the cluster autoscaler #183

Merged. negz merged 2 commits into main from elastic-band on Jun 18, 2026.

negz (Collaborator) commented Jun 17, 2026

Description of your changes

Fixes #166.

Closes #173.

The fleet scheduler treats InferenceCluster.status.gpuPools[].nodes as the node headroom it may place ModelReplicas against, and for every cluster source that value is the pool's maxNodeCount. On GKE that holds: the managed control plane autoscales node pools up to maxNodeCount on demand. On EKS it didn't: we compose the managed node group with a scalingConfig but install nothing to scale within it, so only the realized nodeCount ever materializes. The scheduler, trusting maxNodeCount, places gangs onto nodes that never appear, and the pods hang Pending forever.
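
To make the failure mode concrete, here is a minimal sketch of the headroom arithmetic. The `GpuPool` class and `pending_nodes` function are hypothetical names for illustration only; they are not the scheduler's real types or logic.

```python
from dataclasses import dataclass


@dataclass
class GpuPool:
    """Hypothetical view of one GPU pool; not the real status schema."""

    node_count: int      # nodes that actually exist right now
    max_node_count: int  # ceiling the scheduler is told it may place against


def pending_nodes(pool: GpuPool, requested_nodes: int, autoscaled: bool) -> int:
    """Return how many admitted nodes would never materialize."""
    # The scheduler admits work against max_node_count regardless of source.
    admitted = min(requested_nodes, pool.max_node_count)
    # Without an autoscaler only node_count nodes ever exist; with one,
    # the pool can grow to its ceiling.
    reachable = pool.max_node_count if autoscaled else pool.node_count
    return max(0, admitted - reachable)


pool = GpuPool(node_count=2, max_node_count=8)
print(pending_nodes(pool, requested_nodes=6, autoscaled=False))  # 4
print(pending_nodes(pool, requested_nodes=6, autoscaled=True))   # 0
```

In the first call, four of the six admitted nodes never appear, which corresponds to the Pending pods described above.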

DRA rules out the obvious alternatives: it's incompatible with both Karpenter and EKS Auto Mode, so neither can back our GPU pools. That leaves the Kubernetes cluster autoscaler on managed node groups.

This composes the autoscaler in compose-eks-cluster, alongside the EFS CSI driver it mirrors: a custom IAM policy and role bound to the cluster-autoscaler ServiceAccount through EKS Pod Identity (reusing the eks-pod-identity-agent addon), and the cluster-autoscaler Helm chart on the cluster's own helm ProviderConfig. The autoscaler discovers node groups by the tags EKS puts on their ASGs, so the EKS cluster name is pinned to the XR name to keep autoDiscovery.clusterName in sync. The Helm release is gated on the cluster being observed, and the EKSCluster pipeline gains a compose-usages step so the ProviderConfig outlives the release on teardown.
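
As a rough sketch of the gating and naming described above (not the code in fn.py), the Helm release could be composed along these lines. The function names, the `kube-system` namespace, the ProviderConfig being named after the XR, and the `helm.crossplane.io/v1beta1` API version are all assumptions; the values keys (`cloudProvider`, `awsRegion`, `autoDiscovery.clusterName`, `rbac.serviceAccount`) are those of the upstream cluster-autoscaler chart.

```python
from typing import Any, Optional

# Upstream chart repository for the Kubernetes cluster autoscaler.
CHART_REPO = "https://kubernetes.github.io/autoscaler"


def autoscaler_values(cluster_name: str, region: str) -> dict[str, Any]:
    """Build Helm values for the cluster-autoscaler chart."""
    return {
        "cloudProvider": "aws",
        "awsRegion": region,
        # Must match the EKS cluster name, because ASG tag discovery keys on it.
        "autoDiscovery": {"clusterName": cluster_name},
        "rbac": {"serviceAccount": {"create": True, "name": "cluster-autoscaler"}},
    }


def compose_autoscaler_release(
    xr_name: str, region: str, cluster_observed: bool
) -> Optional[dict[str, Any]]:
    """Return a Release manifest, or None until the cluster is observed."""
    if not cluster_observed:
        # Gate: do not compose the release before the cluster exists.
        return None
    return {
        "apiVersion": "helm.crossplane.io/v1beta1",  # assumed API version
        "kind": "Release",
        "spec": {
            # Assumption: the cluster's own helm ProviderConfig shares the XR name.
            "providerConfigRef": {"name": xr_name},
            "forProvider": {
                "namespace": "kube-system",  # assumed namespace
                "chart": {"name": "cluster-autoscaler", "repository": CHART_REPO},
                # The EKS cluster name is pinned to the XR name, so reuse it here.
                "values": autoscaler_values(xr_name, region),
            },
        },
    }
```

Because the EKS cluster name and autoDiscovery.clusterName both derive from the same XR name, they cannot drift apart.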

With a working autoscaler on EKS, maxNodeCount is reachable headroom on both sources — so this supersedes #173 (the per-source autoscaled flag): the node count gpu_pools already publishes is now honest for EKS, with no per-source distinction needed.

I have:

  • Read and followed Modelplane's contribution process.
  • Run nix flake check (or ./nix.sh flake check) and made sure it passes.
  • Added or updated tests covering any composition function changes.
  • Signed off every commit with git commit -s.

Copilot AI left a comment

Pull request overview

This PR adds Kubernetes Cluster Autoscaler support to EKSCluster compositions so GPU node pools can actually scale up to maxNodeCount, aligning EKS behavior with GKE and preventing the scheduler from committing to capacity that will never materialize.

Changes:

  • Compose cluster-autoscaler on EKS clusters (IAM policy/role + Pod Identity association + Helm Release gated on cluster observation).
  • Pin the EKS cluster name to a compose-time-known value used by autoscaler autodiscovery.
  • Extend the EKSCluster composition pipeline with a compose-usages step and update unit tests / schema lock.
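
For orientation, the IAM side of the first bullet might look roughly like the sketch below. This is an assumption-laden illustration, not the policy composed in this PR: the action list is a subset modeled on the permissions the upstream cluster-autoscaler AWS documentation recommends, `Resource: "*"` is a simplification, and the function names are hypothetical. The trust policy uses the `pods.eks.amazonaws.com` service principal that EKS Pod Identity requires.

```python
import json
from typing import Any


def pod_identity_trust_policy() -> dict[str, Any]:
    """Trust policy letting EKS Pod Identity assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def autoscaler_policy() -> dict[str, Any]:
    """Permissions policy sketch; verify the action list against upstream docs."""
    read_only = {
        "Effect": "Allow",
        "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeScalingActivities",
            "ec2:DescribeInstanceTypes",
            "ec2:DescribeLaunchTemplateVersions",
            "eks:DescribeNodegroup",
        ],
        "Resource": "*",
    }
    scaling = {
        "Effect": "Allow",
        "Action": [
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
        ],
        # Simplification: a real policy would likely scope this down.
        "Resource": "*",
    }
    return {"Version": "2012-10-17", "Statement": [read_only, scaling]}


# IAM expects the documents as JSON strings.
print(json.dumps(autoscaler_policy(), indent=2))
```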

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| schemas/.lock.json | Updates schema lock to reflect generated model changes used by the new resources. |
| functions/compose-eks-cluster/function/fn.py | Composes autoscaler IAM + Pod Identity + Helm Release and pins cluster naming for autodiscovery. |
| functions/compose-eks-cluster/tests/test_fn.py | Adds expected resources and gating behavior coverage for autoscaler composition. |
| apis/eksclusters/composition.yaml | Adds compose-usages pipeline step to keep ProviderConfig dependencies alive for teardown ordering. |
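
The teardown ordering in the last row can be pictured as a Crossplane Usage that marks the ProviderConfig as in use by the Helm release, so deleting the ProviderConfig is blocked until the release is gone. The sketch below is only an illustration of that idea: what the compose-usages step actually emits is not shown in this PR page, and the API versions, the resource names, and the `provider_config_usage` helper are assumptions.

```python
from typing import Any


def provider_config_usage(provider_config: str, release: str) -> dict[str, Any]:
    """Build a Usage: `release` uses `provider_config` (hypothetical helper)."""
    return {
        "apiVersion": "apiextensions.crossplane.io/v1beta1",  # assumed version
        "kind": "Usage",
        "metadata": {"name": f"{release}-uses-{provider_config}"},
        "spec": {
            # Retry the blocked deletion once the using resource is gone.
            "replayDeletion": True,
            "of": {
                "apiVersion": "helm.crossplane.io/v1beta1",
                "kind": "ProviderConfig",
                "resourceRef": {"name": provider_config},
            },
            "by": {
                "apiVersion": "helm.crossplane.io/v1beta1",
                "kind": "Release",
                "resourceRef": {"name": release},
            },
        },
    }


usage = provider_config_usage("demo", "demo-cluster-autoscaler")
print(usage["metadata"]["name"])  # demo-cluster-autoscaler-uses-demo
```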

Comment thread functions/compose-eks-cluster/function/fn.py
Comment thread functions/compose-eks-cluster/function/fn.py Outdated
Comment thread functions/compose-eks-cluster/function/fn.py
Comment thread functions/compose-eks-cluster/tests/test_fn.py
negz added 2 commits June 17, 2026 19:10
The fleet scheduler treats InferenceCluster.status.gpuPools[].nodes as
the node headroom it may place ModelReplicas against, and for every
cluster source that is a pool's maxNodeCount. On GKE that holds: the
managed control plane autoscales node pools up to maxNodeCount on
demand. On EKS it didn't. We compose the managed node group with a
scalingConfig but install nothing to scale within it, so only the
realized nodeCount ever materializes. The scheduler, trusting
maxNodeCount, places gangs onto nodes that never appear and the pods
hang Pending forever (#166).

DRA rules out the obvious alternatives: it's incompatible with both
Karpenter and EKS Auto Mode, so neither can back our GPU pools. That
leaves the Kubernetes cluster autoscaler on managed node groups.

This change composes the autoscaler in compose-eks-cluster, alongside
the EFS CSI driver it mirrors: a custom IAM policy and role bound to the
cluster-autoscaler ServiceAccount through EKS Pod Identity (reusing the
eks-pod-identity-agent addon), and the cluster-autoscaler Helm chart on
the cluster's own helm ProviderConfig. The autoscaler discovers node
groups by the tags EKS puts on their ASGs, so the EKS cluster name is
pinned to the XR name to keep autoDiscovery.clusterName in sync. The
Helm release is gated on the cluster being observed, and the EKSCluster
pipeline gains a compose-usages step so the ProviderConfig outlives the
release on teardown.

Fixes #166.

Signed-off-by: Nic Cope <nicc@rk0n.org>

The node groups set scalingConfig.desiredSize in forProvider, which
Crossplane continuously reconciles. Once the cluster autoscaler scales a
group's ASG, Crossplane reverts its DesiredCapacity back to the composed
nodeCount on the next reconcile, fighting the autoscaler — the classic
autoscaler-versus-IaC conflict. An end-to-end test saw the two coexist
during a scale-up, but on a longer horizon Crossplane would periodically
scale the group back down.

This moves desiredSize into initProvider, which seeds it only at
creation and is then ignored, and sets managementPolicies to exclude
LateInitialize so the observed desiredSize is not late-initialized back
into forProvider. Crossplane now owns min/max; the autoscaler owns
desired. This is the canonical initProvider use case in the Crossplane
docs.

Signed-off-by: Nic Cope <nicc@rk0n.org>
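
A minimal sketch of the split described in the second commit, assuming the node group's scalingConfig is a single-element list (as in some provider API versions) and using a hypothetical `node_group_spec` helper rather than the real fn.py code:

```python
from typing import Any


def node_group_spec(node_count: int, min_nodes: int, max_nodes: int) -> dict[str, Any]:
    """Build the relevant part of a managed node group spec (illustrative)."""
    return {
        # Everything except LateInitialize, so the observed desiredSize is
        # never copied into forProvider and reconciled from there.
        "managementPolicies": ["Observe", "Create", "Update", "Delete"],
        # Applied at creation only; the autoscaler owns it afterwards.
        "initProvider": {"scalingConfig": [{"desiredSize": node_count}]},
        # Continuously reconciled by Crossplane.
        "forProvider": {
            "scalingConfig": [{"minSize": min_nodes, "maxSize": max_nodes}]
        },
    }


spec = node_group_spec(node_count=2, min_nodes=0, max_nodes=8)
print(spec["initProvider"]["scalingConfig"][0]["desiredSize"])  # 2
```

Keeping desiredSize out of forProvider is what stops Crossplane from resetting the ASG's DesiredCapacity after the autoscaler changes it.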
@negz negz marked this pull request as ready for review June 18, 2026 03:11
@negz negz merged commit 1101164 into main Jun 18, 2026
4 checks passed
@negz negz deleted the elastic-band branch June 18, 2026 03:18

Development

Successfully merging this pull request may close these issues.

EKS has no autoscaler installed