You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, if a daemoned steps starts correctly, subsequent steps begin running, and the daemoned step suddenly fails, it will not be retried even if it has a retryStrategy. We should support this retrying behavior, suspending new steps until the daemoned step is successfully restarted.
Use Cases
When would you use this?
To create resilient/HA daemoned steps.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
The text was updated successfully, but these errors were encountered:
Is there another way for this? Kubernetes comes with a number of ways to run reliable resources. Rather than running a pod as a daemon step, run a deployment using a resource template?
We have a recovery daemon step, watching a main step consisting in a wait for a kubernetes job that uses a broker.
This is a stateless (no worry about HA or status) daemon that just consults the broker and and auxiliar redis to figure out if any batch is lost in the kubernetes job, and republishes it into the broker to be consumed again with subsampling.
Sometimes, this recovery daemon crashes for any reason, and then the main process losses the possibility of recover the lost batches.
It would be really great that on pod deletion or crash of the daemon step, it would have the same retryStrategy available than the not daemon: true steps, despite more complex logics about being HA or blocking child steps, that are nice but no are the most important behaviour obtained on the retryStrategy.
Summary
What change needs making?
Currently, if a daemoned steps starts correctly, subsequent steps begin running, and the daemoned step suddenly fails, it will not be retried even if it has a retryStrategy. We should support this retrying behavior, suspending new steps until the daemoned step is successfully restarted.
Use Cases
When would you use this?
To create resilient/HA daemoned steps.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
The text was updated successfully, but these errors were encountered: