
Fix JujuMachine getting stuck after cluster not ready #8


Merged: 2 commits merged into main on Mar 17, 2023

Conversation

@Cynerva (Contributor) commented Mar 16, 2023

Quick fix for a case I hit that prevented my kubernetes-worker machine from coming up.

This will hit the exponential backoff as is. @stonepreston did you get a chance to look into changing the exponential backoff limit to something more reasonable? Should I change this to a requeueAfter?

@stonepreston (Contributor) commented Mar 16, 2023

Yeah, RequeueAfter is probably a better bet, but it won't make too much of a difference, since there are other Requeue: true calls that happen (and the reason we hit the max backoff is the amount of time it takes for the charms to come up). Another solution would be to implement that cluster watch so that a reconcile is triggered whenever the cluster status is updated; then the requeue wouldn't be needed at all (but again, this wouldn't amount to much, since we're going to be waiting for the charms to come up anyway).

If you find/replace Requeue: true with RequeueAfter: requeueTime in the machine controller, it would cut down on the waiting game. Currently we end up waiting around 30 minutes for a machine to be marked ready, because it takes quite a bit of time for the charms to come up active and we hit 1 or 2 of the 16-minute requeues.

I did find that it's possible to configure a custom backoff as part of controller creation: see here and this blog post for some details. I plan on implementing this in my upcoming status PR for the control plane provider, but for now I have simply converted all my requeues to RequeueAfters.

Co-authored-by: Stone Preston <stonepreston@users.noreply.github.com>
@Cynerva (Contributor, Author) commented Mar 17, 2023

Cool, thanks for the info and the suggestion. I've committed the change so this should be good to go.

@stonepreston stonepreston merged commit 07a8c93 into main Mar 17, 2023
@stonepreston stonepreston deleted the gkk/fix-machine-stuck-after-cluster-not-ready branch March 17, 2023 13:45