Agents that run until done
Define what "done" means. Go to sleep. Wake up to verified results.
Proofloop is an AI agent orchestrator that solves the main problem with coding agents on real-world tasks: long tasks require constant user-in-the-loop.
When a task takes more than an hour (often several hours), a regular agent becomes a project you need to babysit:
- "continue", "try differently", "now run tests", "fix the regression"
- Manual verification after every step
- Lost context between sessions and iterations
- Subjective "seems done" instead of proven results
Proofloop changes the paradigm: describe what you want (detailed or brief — your choice). During planning, the agent proposes assumptions, risks, a step-by-step plan, and verifiable conditions for "done". You review and approve — then the agent works autonomously until verified completion.
|
Agents alone You: "Migrate our monolith to microservices"
Agent works... extracts user service...
- [usage limit]
- "Progress: user-service extracted.
- TODO: orders, payments, gateway..."
*Next day*
You: "Continue migration..."
Agent works...
- [usage limit]
- "orders-service done.
- TODO: payments, gateway, tests..."
*This goes on for a week*
*Then integration bugs everywhere* |
Proofloop You: "Migrate our monolith to microservices"
Conditions:
- All services pass health checks
- Integration tests green
- Zero downtime deployment works
- "Data consistency verified across services"
+ *You go to sleep*
+ Agent works... extracts services
+ ✗ integration tests fail → retry
+ Agent works... fixes contracts
+ ✗ deployment failing → retry
+ Agent works... 47 iterations later
+ ✓ All conditions pass
+ "Done. 8 hours." |
1. Install
curl -LsSf https://raw.githubusercontent.com/exiw-ai/proofloop/main/install.sh | sh2. Setup provider (choose one)
Proofloop orchestrates existing AI agents — install whichever you prefer:
OpenCode
npm i -g opencode-ai@latest
opencode # Interactive setupCodex (ChatGPT Plus/Pro)
npm i -g @openai/codex
codex # OAuth loginClaude Code
# Install: https://claude.com/download or npm i -g @anthropic-ai/claude-code
claude login3. Run
proofloop run "Implement OAuth2 with Google, GitHub, and email/password auth" \
--path ./my-project \
--provider <provider>Where <provider> is one of: opencode, codex, claude
╭──────────────────────────────────────────────────────────────────────────────╮
│ │
│ proofloop - agents that run until done │
│ │
│ Global Options: │
│ -v, --verbose Enable verbose output │
│ -V, --version Show version and exit │
│ --help Show this help message │
│ │
│ proofloop run <description> -p <path> --provider <provider> │
│ Run a coding task autonomously. │
│ │
│ Required: │
│ -p, --path PATH Workspace path │
│ --provider NAME Agent: claude, codex, opencode │
│ Options: │
│ -y, --auto-approve Skip interactive approvals │
│ -t, --timeout HOURS Timeout (default: 4) │
│ │
│ proofloop task list │
│ List all tasks. │
│ │
│ proofloop task status <task_id> │
│ Show task status. Accepts full UUID or 4+ char prefix. │
│ │
│ proofloop task resume <task_id> │
│ Resume a stopped task. │
│ │
│ Examples: │
│ proofloop run "Migrate to microservices" -p ./backend --provider claude │
│ proofloop run "Add multi-tenancy" -p . --provider codex │
│ proofloop task resume a1b2 --provider opencode │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
- Completion conditions — Automated (
pytest,mypy,make build) or text-based ("API returns 200 for all endpoints") - No limits — Runs for hours, handles 50+ iterations, retries failures automatically
- Fire and forget — Start before bed, wake up to verified results
- Independent verification — Conditions checked by running actual commands, not agent self-assessment
- Smart supervisor — Detects loops, stagnation, regressions; decides retry vs rollback vs stop
- Multi-provider — Uses OpenCode, Codex, or Claude Code under the hood
- Multi-repo — Coordinates changes across multiple repositories
proofloop run "Implement real-time notifications system with WebSocket server, \
React hooks, PostgreSQL pub/sub, and comprehensive test coverage" \
--path ./myapp \
--provider <provider>proofloop run "Migrate from MongoDB to PostgreSQL: schema design, \
data migration scripts, update all repositories and services, \
ensure zero data loss" \
--path ./backend -t 8 \
--provider <provider># ~/company/
# ├── api/ (Go backend)
# ├── web/ (React frontend)
# └── mobile/ (React Native)
proofloop run "Add end-to-end encryption for messages: \
implement in API, update web and mobile clients, \
add key rotation, write integration tests" \
--path ~/company -t 6 \
--provider <provider>proofloop run "Convert jQuery frontend to React: \
component architecture, state management with Zustand, \
preserve all existing functionality, add TypeScript" \
--path ./legacy-app -t 10 \
--provider <provider>proofloop run "..." -p . --provider opencode # OpenCode
proofloop run "..." -p . --provider codex # Codex (ChatGPT)
proofloop run "..." -p . --provider claude # Claude Codeproofloop task list # List all tasks
proofloop task status 550e # Check status (short ID)
proofloop task resume 550e --provider claude # Resume stopped taskflowchart TB
subgraph You
A[Describe task]
F[Review & approve]
K[Get results]
end
subgraph Proofloop
B[Intake: analyze project]
C[Inventory: discover checks]
D[Plan: create steps]
E[Conditions: define success]
G[Delivery: execute plan]
H[Verify: run all checks]
I{All pass?}
J[Supervisor: analyze failure]
end
A --> B --> C --> D --> E --> F
F --> G --> H --> I
I -->|No| J --> G
I -->|Yes| K
| Phase | What happens |
|---|---|
| Intake | Scans project structure, detects stack |
| Inventory | Discovers tests, linters, type checkers |
| Plan | Breaks task into implementation steps |
| Conditions | Defines success criteria (automated + text-based) |
| You approve | Review plan, adjust conditions, then approve |
| Delivery | Agent executes all steps |
| Verify | Runs every condition, collects evidence |
| Supervisor | On failure: analyzes, decides retry/rollback/stop |
| Loop | Repeats until all conditions pass or budget exhausted |
Automated — linked to commands:
pytest tests/passesmake buildsucceedsmypy --strictclean
Text-based — verified by agent each iteration:
- "API handles 1000 req/s under load test"
- "All UI components render without console errors"
- "Database queries use indexes, no full table scans"
- Getting Started — Installation and first task
- User Guide — Workflows and features
- Reference — CLI commands and options
git clone https://github.com/exiw-ai/proofloop.git
cd proofloop
make dev # Install dev dependencies
make check # Run all checksSee CONTRIBUTING.md for guidelines.
Apache 2.0 — see LICENSE.