The agent framework where every run is durable, replayable, and resumable by default.
Updated Jul 5, 2026 - Rust
A GPipe implementation in PyTorch
An I/O benchmark for deep learning applications
Sayiir: simple, embeddable durable workflow engine in Rust with Node.js & Python bindings. Checkpoint-based recovery, no deterministic replay. A simplified alternative to Temporal, Restate, and Airflow.
Very-Low-Overhead Checkpointing System
Extending DOLFINx with checkpointing functionality
Keras wrapper that autosaves what ModelCheckpoint cannot.
Execution runtime for intelligent agents with event-sourced, recoverable task orchestration.
Git-like branching, checkpointing, and comparison for AI agent execution paths. pip install agentgit
Weavegraph: Rust graph/agent/node
An operating system for autonomous research — from literature to manuscript inside a governed, checkpointed loop.
[WIP] Debug TypeScript/JavaScript via TUI. Checkpoint functions, edit state, skip execution. Written in Rust 🦀
Zero-cost, crash-proof LLM pipeline orchestrator. Features disk-based checkpointing, free-tier routing, and structured output. (LangGraph / CrewAI alternative)
A Python package for checkpointing, saving, and loading objects.
Checkpoint and rewind Claude Code runs with repo and context recovery.
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Recoverable long-horizon AI agents — a framework-agnostic reference harness + recovery-faithful live benchmark. Thesis: "Checkpoints Are Compactions" via Re-grounding Recovery. 0.x: v1.0 held until a powered live-LLM study confirms the claims.
Minimal PyTorch training framework — implement three methods, get a full training loop with checkpointing, early stopping, metrics, and a live web dashboard.
Perkunas AI Training Platform is a memory-aware model training and serving system for serious language model experimentation under tight hardware limits. It combines streaming training, rich telemetry, guarded recovery, checkpoint export, and OpenAI-compatible serving.