Implement durable execution with automatic recovery from checkpoints

## Summary

Add checkpoint-based execution so workflows can be resumed from the last checkpoint if a pod dies mid-execution.

## Problem

Currently there is no automatic replay from checkpoint. If a pod dies mid-workflow, the entire step must be retried. Users must rely on external state stores and step-level retries.

## Proposed change

- Checkpoint API for steps to persist intermediate state
- Controller-driven recovery that resumes from the last checkpoint
- Integration with existing storage offloading infrastructure

## References

- Roadmap: https://bubustack.io/docs/community/roadmap (Known gaps, Near-term vision)
- Durable semantics: https://bubustack.io/docs/overview/durable-semantics

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement durable execution with automatic recovery from checkpoints #42

Summary

Problem

Proposed change

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement durable execution with automatic recovery from checkpoints #42

Description

Summary

Problem

Proposed change

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions