Retry logic around describe-cluster doesn't handle rate-limiting #999

Open
cartermckinnon opened this issue Aug 18, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@cartermckinnon
Member

(relayed from an internal ticket)

What happened:

aws eks wait cluster-active may get rate-limited (TooManyRequestsException) and cause the bootstrap script to terminate, instead of falling back to the retry logic around aws eks describe-cluster.

What you expected to happen:

The describe-cluster call should be retried the desired number of times, despite rate-limiting errors.
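A minimal sketch of the kind of retry wrapper described above, assuming bash. The function name `retry_with_backoff` and its arguments are hypothetical and are not taken from the bootstrap script. The point it illustrates is that running the command inside an `if` keeps a non-zero exit (for example from a rate-limited call) from terminating a script that uses `set -e`, so the retry loop actually gets a chance to run.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of bootstrap.sh): retry a command with
# exponential backoff.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  while true; do
    # Running the command as an `if` condition means a failure here does
    # not trip `set -e`; we decide ourselves whether to retry or give up.
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))  # double the wait between attempts
  done
}
```

With placeholder variables, a call might look like `retry_with_backoff 5 1 aws eks describe-cluster --region "$AWS_DEFAULT_REGION" --name "$CLUSTER_NAME"`. This sketch retries on any non-zero exit rather than inspecting the error for TooManyRequestsException specifically.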

@orirawlings

@cartermckinnon Whatever happened with #1004?

We're facing a similar issue where aws eks wait cluster-active fails due to a transient timeout with the AWS API and then a node gets stuck without joining the cluster (which has other knock-on effects, wedging cluster-autoscaler).

2024-03-20T15:00:46+0000 [eks-bootstrap] INFO: --b64-cluster-ca or --apiserver-endpoint is not defined, describing cluster...

Connect timeout on endpoint URL: "https://eks.us-west-2.amazonaws.com/clusters/eks-prod-us-west-2"
Exited with error on line 358

It seems like the patch in #1004 would fix our problem, but it appears it was closed after sitting for a long time.

@cartermckinnon
Member Author

The best thing to do here is to pass --apiserver-endpoint and --b64-cluster-ca and avoid the DescribeCluster call entirely. This fallback mechanism has been removed in our AL2023 AMIs.

I'll see if we can reboot the PR, in any case.
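
As a rough illustration of the workaround suggested above, the sketch below passes both values to the bootstrap script so that no DescribeCluster call is needed at boot. The wrapper name `bootstrap_without_describe` and its positional arguments are hypothetical; only the `--apiserver-endpoint` and `--b64-cluster-ca` flags come from the thread. The endpoint and CA data are assumed to have been obtained ahead of time (for example, when the launch template is generated) rather than on the node.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: invoke the bootstrap script with the API server
# endpoint and base64-encoded cluster CA supplied explicitly, so the
# script does not need to describe the cluster itself.
# Usage: bootstrap_without_describe <bootstrap_script> <cluster_name> <endpoint> <ca_b64>
bootstrap_without_describe() {
  local bootstrap_script=$1
  local cluster_name=$2
  local endpoint=$3
  local ca_b64=$4
  "$bootstrap_script" "$cluster_name" \
    --apiserver-endpoint "$endpoint" \
    --b64-cluster-ca "$ca_b64"
}
```

In instance user data this might be called as `bootstrap_without_describe /etc/eks/bootstrap.sh "$CLUSTER_NAME" "$API_SERVER_ENDPOINT" "$B64_CLUSTER_CA"`, where the three variables are placeholders and the script path is an assumption about the AMI layout.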
