use internal exponential backoff to avoid flapping on DB startup #8797

krancour · 2022-05-18T23:06:52Z

Walking through the quickstart, I noticed that both the argo-server and the workflow controller "flap" while waiting for Postgres to become available. On average (on my machine, at least), both components restart 3 times before coming up clean. This is by no means out of the ordinary for k8s apps. However, if either of those components gets too far into a crash loop backoff, the system as a whole can take longer than it should to come up clean.

I'd like to propose implementing an exponential backoff (with a low maximum delay between retries) internally, so that components don't "flap" like this while waiting for their own network-bound dependencies to become available. This is fairly easy to do. Speaking from experience, this strategy can allow a system such as this one to start faster and more smoothly as a whole.

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

agilgur5 · 2024-10-18T07:19:13Z

I think this can be done within CreateDBSession, using the existing Backoff function and potentially the existing transient error detection (although the errors may be fairly different for DB connections and may be DB-dependent as well)?

argo-workflows/workflow/controller/config.go, line 39 in c9b1477:

    session, err := sqldb.CreateDBSession(wfc.kubeclientset, wfc.namespace, persistence)

krancour added the type/feature label May 18, 2022

agilgur5 changed the title ~~Proposal: use internal exponential backoff to avoid flapping on startup~~ Proposal: use internal exponential backoff to avoid flapping on DB startup Oct 16, 2024

agilgur5 changed the title ~~Proposal: use internal exponential backoff to avoid flapping on DB startup~~ use internal exponential backoff to avoid flapping on DB startup Oct 16, 2024

agilgur5 added the area/controller and area/server labels Oct 18, 2024

agilgur5 added the solution/suggested and area/workflow-archive labels and removed the area/controller and area/server labels Oct 18, 2024
