CAPI cluster autoscaler deletes nodes from clusters it doesn't have access to #5510
in general, we do not recommend running a single autoscaler to cover multiple clusters. but, with that said, i would expect it to be ok for the autoscaler to manage a single cluster even if the workload hosts multiple clusters. (if the capi resources are isolated)
this is by design, but also we do allow specifying multiple kubeconfigs, but only for a single "cluster" deployment, e.g. one config for management and one config for workload.
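for illustration, a minimal sketch of that split as i understand the clusterapi provider flags (`--kubeconfig` for the workload cluster, `--cloud-config` for the management cluster); the mount paths and image tag below are placeholders, not a recommendation:

```yaml
# sketch: container args for a cluster-autoscaler deployment that points at exactly
# one workload cluster plus the management cluster that holds its CAPI resources.
# the kubeconfig mount paths are placeholders.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.1
    command:
      - /cluster-autoscaler
    args:
      - --cloud-provider=clusterapi
      - --kubeconfig=/mnt/workload/kubeconfig       # workload cluster API server
      - --cloud-config=/mnt/management/kubeconfig   # management cluster (CAPI objects)
```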
agreed about documenting this. i'm not sure we want to disable autodiscovery as users have come to expect that the autoscaler works that way, but i do wonder if we couldn't be doing more to identify the cluster owner. i'm not sure that is desirable, but something to think about.
thank you for taking a look, @elmiko.
Ok, perfect, I could already start working at least on this one. I guess the docs should alert that if the management cluster manages multiple clusters we recommend multiple cluster autoscaler (CA) deployments, and each instance of CA must filter its own cluster with
I think I'm missing some context here, but I don't see any benefit of autodiscovery across multiple clusters, since CA won't be able to manage more than one cluster. I agree with you that autodiscovery for nodepools in the same cluster is something I would expect. I'm thinking of two possible solutions here, since IMHO documentation alone is not a valid solution, because it can lead to serious incidents deleting nodes that still carry workload.
+1, starting with a docs patch would be great imo. instructing users to filter on their clusterName as part of the autodiscovery also seems like a win.
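a minimal sketch of that filter, assuming the clusterapi auto-discovery spec accepts `clusterName` and `namespace` keys (the names below are placeholders); each autoscaler deployment would carry its own filter:

```yaml
# sketch: scope this autoscaler instance to a single workload cluster via the
# auto-discovery filter instead of discovering every scalable CAPI resource it can see.
args:
  - --cloud-provider=clusterapi
  - --node-group-auto-discovery=clusterapi:clusterName=cluster-one,namespace=cluster-one
```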
yeah, i wasn't suggesting that we should make it more tolerant to multi-cluster setups. i was saying what you suggested about adding the
i like option two, but then i wonder how that would work in a scale from zero scenario?
I don't think scaling from zero would be a problem, because at least one control plane node has to be registered, right? I can't think of a Kubernetes cluster that can scale all of its nodes to 0; I can imagine scaling just some nodepools to 0. Do you think I might be missing something?
i was thinking that perhaps we would need a way to signal the cluster-name from the scalable resource so that when a node group is scaling from zero replicas the autoscaler could ensure that it is focused on the proper node group. but, i think you are correct about grabbing the cluster name from a control plane node or similar, that should be relatively safe. one of my main concerns here is also to ensure that we don't make this too impenetrable for others to understand; ideally we will solve this in a manner that makes it clear for users how the cluster name factors into the decisions.
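to make the scale-from-zero angle concrete, a rough sketch (the capacity annotations reflect my understanding of the provider's opt-in scale-from-zero support; the cluster-name annotation is purely hypothetical, just to show where such a signal could live on the scalable resource's infrastructure template):

```yaml
# sketch: an infrastructure template carrying scale-from-zero capacity hints.
# the `example.cluster.x-k8s.io/cluster-name` key is hypothetical, not a real API.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: workload-md-0              # placeholder name
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16Gi"
    example.cluster.x-k8s.io/cluster-name: workload   # hypothetical ownership signal
```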
That's a very good concern! I would like very much to have it clear for users too, but I can't think of any other way to make it clearer than documentation and good logs that say something like "nodepool foo-bar is not considered for scaling up/down because it refers to a cluster different from the one identified in the workload cluster". I believe it is still not 100% clear, but the term workload cluster is defined in the docs, the same docs where we will provide further details on how we identify the cluster name (via the label on any registered node).
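For reference, a trimmed sketch of where that cluster name already shows up on both sides (the `spec.clusterName` field is standard CAPI; treat the exact node annotation key as an assumption on my part):

```yaml
# sketch: the owning cluster as recorded on the CAPI side vs. on a registered node.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: foo-bar-md-0               # placeholder
spec:
  clusterName: foo-bar             # owning workload cluster
---
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal  # placeholder
  annotations:
    cluster.x-k8s.io/cluster-name: foo-bar   # assumed key set by CAPI on registered nodes
```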
/remove-lifecycle rotten
/remove-lifecycle stale
i think this is a good issue to keep open, we just need to find someone who might want to propose the docs and logging change.
still would like to see this fixed
/remove-lifecycle stale
Which component are you using?:
cluster-autoscaler for Cluster API (CAPI).
What version of the component are you using?:
Component version:
v1.26.1
What k8s version are you using (`kubectl version`)?:
v1.26.1
What environment is this in?:
CAPI clusters using Cluster API Provider AWS (CAPA).
There are 2 clusters (at least) involved in this setup. One is the management cluster which is also managed by CAPI (self-managed). A second cluster is the workload cluster, managed by the management cluster.
In this setup, cluster-autoscaler was not configured with access to the workload cluster; it only has access to the management cluster's Kubernetes API via its service account. Through that API it can see the CAPI objects of all clusters and tries to manage them via its auto-discovery feature.
What did you expect to happen?:
I would expect the cluster autoscaler to not consider clusters whose Kubernetes API it doesn't have access to.
What happened instead?:
Cluster autoscaler, via its autodiscovery feature, identifies `MachineDeployments` for the workload cluster and, since it can't access that cluster's Kubernetes API, assumes the nodes are unregistered and deletes them after 15 minutes (by default, configured by `--max-node-provision-time`).

How to reproduce it (as minimally and precisely as possible):
1. Deploy cluster-autoscaler with the `--cloud-provider=clusterapi` flag. No further config required.
2. The `MachineDeployments` are annotated with `cluster.x-k8s.io/cluster-api-autoscaler-node-group-[min|max]-size` (see the sketch below).
3. The `MachineDeployment` of the second cluster is scaled to its minimum even if the nodes are being utilized.
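For illustration, a trimmed sketch of a MachineDeployment opted in via the annotations named above (names and sizes are placeholders, and other required fields such as the selector/template are omitted):

```yaml
# sketch: MachineDeployment annotated for autoscaler auto-discovery.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-md-0
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  clusterName: workload
```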
Anything else we need to know?:
It seems like cluster-autoscaler for CAPI currently does not plan to manage more than one cluster, since all the documented options allow providing just one kubeconfig. We should document that and consider making the autodiscovery feature disabled by default.
That can lead to outages, since the deleted nodes might still be in use.
I would like to work on this issue.