use internal exponential backoff to avoid flapping on DB startup #8797
Labels
area/workflow-archive
solution/suggested
type/feature
Walking through the quickstart, I noticed that both the argo-server and the workflow controller "flap" while waiting for Postgres to become available. On my machine, at least, both components restart 3 times on average before coming up clean. This is by no means out of the ordinary for k8s apps. However, if either component gets too far into a crash loop backoff, the system as a whole can take longer than it should to come up clean.
I'd like to propose implementing an exponential backoff internally, with a low maximum delay between retries, so that components don't "flap" like this while waiting for their network-bound dependencies to become available. This is fairly easy to implement. Speaking from experience, this strategy can allow a system such as this one to start faster and more smoothly as a whole.
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.