Support retryStrategy for running daemoned steps #5833

simster7 · 2021-05-05T17:09:45Z

Summary

What change needs making?

Currently, if a daemoned step starts correctly, subsequent steps begin running, and the daemoned step then fails, it is not retried even if it has a retryStrategy. We should support retrying in this case, suspending new steps until the daemoned step has been successfully restarted.
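For illustration, a minimal sketch of the kind of workflow this affects. The template names, images, and probe settings are placeholders chosen for the example, not taken from a real workflow; only `daemon` and `retryStrategy` are the fields under discussion.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: daemon-retry-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: server              # daemoned step; the next step starts once it is ready
        template: server
    - - name: client
        template: client
        arguments:
          parameters:
          - name: server-ip
            value: "{{steps.server.ip}}"
  - name: server
    daemon: true                  # keeps running in the background while later steps run
    retryStrategy:
      limit: 3                    # per this issue, not honored if the daemon fails after it has started
    container:
      image: nginx:1.25           # placeholder image
      readinessProbe:
        httpGet:
          path: /
          port: 80
  - name: client
    inputs:
      parameters:
      - name: server-ip
    container:
      image: busybox:1.36         # placeholder image
      command: [sh, -c]
      args: ["sleep 60; wget -qO- http://{{inputs.parameters.server-ip}}"]
```

If the `server` pod dies during the client's `sleep`, the behaviour described above means the daemon is not restarted, so the client's request fails.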

Use Cases

When would you use this?

To create resilient/HA daemoned steps.

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec · 2022-02-01T16:23:10Z

Is there another way to do this? Kubernetes comes with a number of ways to run reliable resources. Rather than running a pod as a daemon step, could you run a deployment using a resource template?
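A rough sketch of what that alternative could look like, assuming the workflow's service account is allowed to create Deployments. The names, labels, and image are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: deployment-daemon-
spec:
  entrypoint: start-daemon
  templates:
  - name: start-daemon
    resource:
      action: create
      setOwnerReference: true     # tie the Deployment's lifetime to the workflow
      manifest: |
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          generateName: my-daemon-
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: my-daemon
          template:
            metadata:
              labels:
                app: my-daemon
            spec:
              containers:
              - name: daemon
                image: example/my-daemon:latest   # hypothetical image
```

Here the Deployment controller, rather than the workflow's retryStrategy, replaces the pod if it crashes or is deleted. The trade-off is that the workflow no longer tracks the daemon as a step, so readiness and the daemon's address have to be handled separately, for example through a Service.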

Guillermogsjc · 2022-06-08T10:48:05Z

A simple use case:

We have a recovery daemon step that watches a main step, which waits for a Kubernetes job that uses a broker.

This is a stateless daemon (no concerns about HA or status) that just consults the broker and an auxiliary Redis to figure out whether any batch has been lost in the Kubernetes job, and republishes it to the broker to be consumed again with subsampling.

Sometimes this recovery daemon crashes for whatever reason, and the main process then loses the ability to recover the lost batches.

It would be really great if, on pod deletion or a crash of the daemon step, the same retryStrategy were available as for steps without daemon: true. More complex logic, such as being HA or blocking child steps, would be nice but is not the most important behaviour to get from retryStrategy.

Kind regards.

simster7 added the type/feature Feature request label May 5, 2021

alexec added area/spec Changes to the workflow specification. area/daemon-steps labels Feb 7, 2022

sarabala1979 assigned dpadhiar Apr 6, 2022

agilgur5 added the area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries label Oct 6, 2023

agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024

agilgur5 removed the area/spec Changes to the workflow specification. label Oct 18, 2024

agilgur5 mentioned this issue Oct 18, 2024

daemon containers don't support retry strategy #13705

Closed
