fix(agent): scale down using agent shutdown hook #22
Motivation
A validation added in Kubernetes 1.26+ disallows multiple HPAs from pointing at the same target. That restriction breaks our autoscaling setup.
Solution
Instead of using two HPAs (one for scaling up and one for scaling down), we now use a single HPA, for scaling up the agent deployment. The scale-down logic is handled by a shutdown hook, which runs whenever an agent has been idle for a specified period of time. The idle timeout can be configured through the `agent.autoscaling.idleTimeoutForScaleDown` configuration value; by default, it is 30 minutes.
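To illustrate, here is a minimal sketch of how an idle timeout like this could drive a shutdown hook. The channel, function names, and overall wiring are assumptions for illustration, not the agent's actual implementation:

```go
package agent

import "time"

// watchIdle is a hypothetical sketch: it restarts an idle timer whenever the
// agent picks up a job, and runs the shutdown hook once the agent has been
// idle for the configured timeout (agent.autoscaling.idleTimeoutForScaleDown).
func watchIdle(idleTimeout time.Duration, jobStarted <-chan struct{}, shutdownHook func()) {
	timer := time.NewTimer(idleTimeout)
	defer timer.Stop()
	for {
		select {
		case <-jobStarted:
			// The agent is busy again: stop and drain the timer,
			// then restart the idle countdown from zero.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(idleTimeout)
		case <-timer.C:
			// Idle for the full timeout: run the scale-down hook.
			shutdownHook()
			return
		}
	}
}
```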
Shutdown hook logic
Since we can't choose which pods to delete when scaling a deployment, when an agent becomes idle and shuts down, we do one of two things (sketched below):

- Decrease the deployment's replica count, as long as we are above `agent.autoscaling.min`.
- If we are already at the minimum number of replicas, delete the pod without decreasing the replica count, to avoid potentially getting into a `CrashLoopBackOff`.
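The hook's decision could look roughly like the following client-go sketch; the function name, parameters, and wiring are illustrative assumptions, not the code from this PR:

```go
package agent

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDownOrDelete is a hypothetical sketch of the shutdown hook's decision:
// shrink the deployment when above the minimum, otherwise delete only this
// pod so the deployment replaces it instead of going below the minimum.
func scaleDownOrDelete(ctx context.Context, client kubernetes.Interface, namespace, deployment, podName string, minReplicas int32) error {
	d, err := client.AppsV1().Deployments(namespace).Get(ctx, deployment, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if d.Spec.Replicas != nil && *d.Spec.Replicas > minReplicas {
		// Above agent.autoscaling.min: shrink the deployment by one.
		*d.Spec.Replicas--
		_, err = client.AppsV1().Deployments(namespace).Update(ctx, d, metav1.UpdateOptions{})
		return err
	}
	// Already at the minimum: delete only this pod, keeping the replica count.
	return client.CoreV1().Pods(namespace).Delete(ctx, podName, metav1.DeleteOptions{})
}
```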
Since multiple shutdown hooks can be executing at the same time, we use optimistic locking on the deployment, using a `semaphoreci.com/handle` annotation.
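As a minimal sketch of that pattern, assuming client-go's `retry.RetryOnConflict` helper (the `semaphoreci.com/handle` name comes from this PR, but how its value is used here is an assumption):

```go
package agent

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// claimScaleDown is a hypothetical sketch of the optimistic lock: the update
// carries the deployment's resourceVersion, so if another shutdown hook
// modified the deployment first, the API server returns a Conflict error and
// we re-read and retry against fresh state.
func claimScaleDown(ctx context.Context, client kubernetes.Interface, namespace, deployment, podName string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		d, err := client.AppsV1().Deployments(namespace).Get(ctx, deployment, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if d.Annotations == nil {
			d.Annotations = map[string]string{}
		}
		// Mark which pod is handling this scale-down. The exact value and
		// semantics of the annotation are assumptions for illustration.
		d.Annotations["semaphoreci.com/handle"] = podName
		_, err = client.AppsV1().Deployments(namespace).Update(ctx, d, metav1.UpdateOptions{})
		return err
	})
}
```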