Support retryStrategy for running daemoned steps #5833

Open
simster7 opened this issue May 5, 2021 · 2 comments
Labels
area/daemon-steps area/retryStrategy Template-level retryStrategy type/feature Feature request

Comments

@simster7
Member

simster7 commented May 5, 2021

Summary

What change needs making?

Currently, if a daemoned step starts correctly, subsequent steps begin running, and the daemoned step then suddenly fails, it is not retried even if it has a retryStrategy. We should support retrying in this case, suspending new steps until the daemoned step has been successfully restarted.
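
For illustration, here is a minimal sketch of the kind of workflow this affects. The template names, image, and probe are assumptions chosen for the example, not taken from a real workflow. The `broker-daemon` template sets both `daemon: true` and a `retryStrategy`; as described above, the retryStrategy is not applied if the daemon pod fails after it has started and later steps are running.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: daemon-retry-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        # Step group 1: start the daemon; the workflow moves on once it is ready.
        - - name: broker
            template: broker-daemon
        # Step group 2: runs while the daemon is expected to stay up.
        - - name: consumer
            template: consume
            arguments:
              parameters:
                - name: broker-ip
                  value: "{{steps.broker.ip}}"

    - name: broker-daemon          # hypothetical name
      daemon: true
      # Requested behaviour: honour this if the daemon dies after starting.
      retryStrategy:
        limit: "3"
        retryPolicy: Always
      container:
        image: redis:7             # example image only
        readinessProbe:
          tcpSocket:
            port: 6379

    - name: consume                # hypothetical name
      inputs:
        parameters:
          - name: broker-ip
      container:
        image: redis:7
        command: ["redis-cli", "-h", "{{inputs.parameters.broker-ip}}", "ping"]
```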

Use Cases

When would you use this?

To create resilient/HA daemoned steps.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@simster7 simster7 added the type/feature Feature request label May 5, 2021
@alexec
Contributor

alexec commented Feb 1, 2022

Is there another way to do this? Kubernetes comes with a number of ways to run reliable resources. Rather than running a pod as a daemon step, could you run a Deployment using a resource template?
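
A rough sketch of what that could look like, as a template entry under `spec.templates`. The template name, labels, and image are hypothetical. Note that, unlike a daemon step, a Deployment created this way is presumably not torn down when the enclosing steps finish, so a cleanup step (for example an exit handler with `action: delete`) would likely be needed as well.

```yaml
- name: recovery-deployment        # hypothetical template name
  resource:
    action: create
    # Let Kubernetes garbage-collect the Deployment when the Workflow is deleted.
    setOwnerReference: true
    # Treat the step as successful once at least one replica is available.
    successCondition: status.availableReplicas > 0
    manifest: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        generateName: recovery-
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: recovery
        template:
          metadata:
            labels:
              app: recovery
          spec:
            containers:
              - name: recovery
                image: example/recovery:latest   # hypothetical image
```

The Deployment controller then restarts the pod whenever it crashes or is deleted, which covers the resilience part of the request without any change to daemon steps.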

@alexec alexec added area/spec Changes to the workflow specification. area/daemon-steps labels Feb 7, 2022
@Guillermogsjc

A simple use case:

We have a recovery daemon step that watches a main step, which waits for a Kubernetes job that uses a broker.

This is a stateless daemon (no need to worry about HA or status) that just consults the broker and an auxiliary Redis to figure out whether any batch has been lost in the Kubernetes job, and republishes it into the broker to be consumed again with subsampling.

Sometimes this recovery daemon crashes for whatever reason, and the main process then loses the ability to recover the lost batches.

It would be really great if, on pod deletion or a crash of the daemon step, the same retryStrategy were available as for steps without daemon: true. More complex logic around HA or blocking child steps would be nice to have, but it is not the most important behaviour gained from retryStrategy.

Kind regards.

@agilgur5 agilgur5 added the area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries label Oct 6, 2023
@agilgur5 agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024
@agilgur5 agilgur5 removed the area/spec Changes to the workflow specification. label Oct 18, 2024

5 participants