AI orchestrator that runs until done. Define what "done" means, go to sleep, wake up to verified results. Supports OpenCode, Codex, Claude Code,

exiw-ai/proofloop

Proofloop

Agents that run until done
Define what "done" means. Go to sleep. Wake up to verified results.

Python 3.12+


Why Proofloop

Proofloop is an AI agent orchestrator that solves the main problem with coding agents on real-world tasks: long tasks need a human in the loop the whole time.

When a task takes more than an hour (often several hours), a regular agent becomes a project you need to babysit:

  • "continue", "try differently", "now run tests", "fix the regression"
  • Manual verification after every step
  • Lost context between sessions and iterations
  • Subjective "seems done" instead of proven results

Proofloop changes the paradigm: describe what you want (detailed or brief — your choice). During planning, the agent proposes assumptions, risks, a step-by-step plan, and verifiable conditions for "done". You review and approve — then the agent works autonomously until verified completion.


The Problem

Agents alone

You: "Migrate our monolith to microservices"

  Agent works... extracts user service...
- [usage limit]

- "Progress: user-service extracted.
-  TODO: orders, payments, gateway..."

  *Next day*

  You: "Continue migration..."
  Agent works...
- [usage limit]

- "orders-service done.
-  TODO: payments, gateway, tests..."

  *This goes on for a week*
  *Then integration bugs everywhere*

Proofloop

You: "Migrate our monolith to microservices"

Conditions:
  - All services pass health checks
  - Integration tests green
  - Zero downtime deployment works
  - "Data consistency verified across services"

+ *You go to sleep*

+ Agent works... extracts services
+ ✗ integration tests fail → retry

+ Agent works... fixes contracts
+ ✗ deployment failing → retry

+ Agent works... 47 iterations later
+ ✓ All conditions pass

+ "Done. 8 hours."

Quickstart

1. Install

curl -LsSf https://raw.githubusercontent.com/exiw-ai/proofloop/main/install.sh | sh

2. Setup provider (choose one)

Proofloop orchestrates existing AI agents — install whichever you prefer:

OpenCode

npm i -g opencode-ai@latest
opencode  # Interactive setup

Codex (ChatGPT Plus/Pro)

npm i -g @openai/codex
codex  # OAuth login

Claude Code

# Install: https://claude.com/download or npm i -g @anthropic-ai/claude-code
claude login

3. Run

proofloop run "Implement OAuth2 with Google, GitHub, and email/password auth" \
  --path ./my-project \
  --provider <provider>

Where <provider> is one of: opencode, codex, claude


CLI

╭──────────────────────────────────────────────────────────────────────────────╮
│                                                                              │
│  proofloop - agents that run until done                                      │
│                                                                              │
│  Global Options:                                                             │
│    -v, --verbose    Enable verbose output                                    │
│    -V, --version    Show version and exit                                    │
│    --help           Show this help message                                   │
│                                                                              │
│  proofloop run <description> -p <path> --provider <provider>                 │
│    Run a coding task autonomously.                                           │
│                                                                              │
│    Required:                                                                 │
│      -p, --path PATH           Workspace path                                │
│      --provider NAME           Agent: claude, codex, opencode                │
│    Options:                                                                  │
│      -y, --auto-approve        Skip interactive approvals                    │
│      -t, --timeout HOURS       Timeout (default: 4)                          │
│                                                                              │
│  proofloop task list                                                         │
│    List all tasks.                                                           │
│                                                                              │
│  proofloop task status <task_id>                                             │
│    Show task status. Accepts full UUID or 4+ char prefix.                    │
│                                                                              │
│  proofloop task resume <task_id>                                             │
│    Resume a stopped task.                                                    │
│                                                                              │
│  Examples:                                                                   │
│    proofloop run "Migrate to microservices" -p ./backend --provider claude   │
│    proofloop run "Add multi-tenancy" -p . --provider codex                   │
│    proofloop task resume a1b2 --provider opencode                            │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯

Features

  • Completion conditions — Automated (pytest, mypy, make build) or text-based ("API returns 200 for all endpoints")
  • No limits — Runs for hours, handles 50+ iterations, retries failures automatically
  • Fire and forget — Start before bed, wake up to verified results
  • Independent verification — Conditions checked by running actual commands, not agent self-assessment
  • Smart supervisor — Detects loops, stagnation, regressions; decides retry vs rollback vs stop
  • Multi-provider — Uses OpenCode, Codex, or Claude Code under the hood
  • Multi-repo — Coordinates changes across multiple repositories
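
To make the "Smart supervisor" bullet more concrete, here is a minimal sketch of how a retry / rollback / stop decision could be derived from the number of failing conditions after each iteration. The names `Decision` and `decide`, the `patience` threshold, and the heuristic itself are assumptions for illustration; the README does not describe how Proofloop's supervisor actually decides.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry"
    ROLLBACK = "rollback"
    STOP = "stop"


def decide(failures: list[int], patience: int = 3) -> Decision:
    """Pick an action from the count of failing conditions per iteration.

    Hypothetical heuristic: roll back on a regression, stop on stagnation,
    otherwise retry.
    """
    if len(failures) >= 2 and failures[-1] > failures[-2]:
        # Regression: more conditions fail now than after the previous iteration.
        return Decision.ROLLBACK
    recent = failures[-patience:]
    if len(recent) == patience and len(set(recent)) == 1:
        # Stagnation: the count has not changed over the last `patience` iterations.
        return Decision.STOP
    return Decision.RETRY
```

For example, `decide([5, 3, 4])` returns `Decision.ROLLBACK` because the last iteration made things worse, while `decide([5, 3, 2])` returns `Decision.RETRY`.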

Usage Examples

Full-stack feature with tests

proofloop run "Implement real-time notifications system with WebSocket server, \
  React hooks, PostgreSQL pub/sub, and comprehensive test coverage" \
  --path ./myapp \
  --provider <provider>

Database migration

proofloop run "Migrate from MongoDB to PostgreSQL: schema design, \
  data migration scripts, update all repositories and services, \
  ensure zero data loss" \
  --path ./backend -t 8 \
  --provider <provider>

Multi-repo refactoring

# ~/company/
# ├── api/        (Go backend)
# ├── web/        (React frontend)
# └── mobile/     (React Native)

proofloop run "Add end-to-end encryption for messages: \
  implement in API, update web and mobile clients, \
  add key rotation, write integration tests" \
  --path ~/company -t 6 \
  --provider <provider>

Legacy modernization

proofloop run "Convert jQuery frontend to React: \
  component architecture, state management with Zustand, \
  preserve all existing functionality, add TypeScript" \
  --path ./legacy-app -t 10 \
  --provider <provider>

Available providers

proofloop run "..." -p . --provider opencode  # OpenCode
proofloop run "..." -p . --provider codex     # Codex (ChatGPT)
proofloop run "..." -p . --provider claude    # Claude Code

Task management

proofloop task list                            # List all tasks
proofloop task status 550e                     # Check status (short ID)
proofloop task resume 550e --provider claude   # Resume stopped task
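
The short IDs above rely on the rule from the CLI help: a task ID may be a full UUID or a prefix of 4 or more characters. A minimal sketch of that kind of lookup follows. The function name `resolve_task_id` and the treatment of missing or ambiguous prefixes are assumptions, not Proofloop's documented behavior.

```python
def resolve_task_id(prefix: str, known_ids: list[str]) -> str:
    """Return the single known task ID that starts with `prefix`."""
    if len(prefix) < 4:
        raise ValueError("prefix must be at least 4 characters")
    matches = [tid for tid in known_ids if tid.startswith(prefix.lower())]
    if not matches:
        raise LookupError(f"no task matches {prefix!r}")
    if len(matches) > 1:
        # Assumed policy: refuse to guess when several tasks share the prefix.
        raise LookupError(f"prefix {prefix!r} is ambiguous: {len(matches)} matches")
    return matches[0]
```

With a task whose ID is `550e8400-e29b-41d4-a716-446655440000` (an example value), `resolve_task_id("550e", ids)` returns the full ID, and a longer prefix can be used if two tasks happen to share the first four characters.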

How It Works

flowchart TB
    subgraph You
        A[Describe task]
        F[Review & approve]
        K[Get results]
    end

    subgraph Proofloop
        B[Intake: analyze project]
        C[Inventory: discover checks]
        D[Plan: create steps]
        E[Conditions: define success]

        G[Delivery: execute plan]
        H[Verify: run all checks]
        I{All pass?}
        J[Supervisor: analyze failure]
    end

    A --> B --> C --> D --> E --> F
    F --> G --> H --> I
    I -->|No| J --> G
    I -->|Yes| K
| Phase | What happens |
|---|---|
| Intake | Scans project structure, detects stack |
| Inventory | Discovers tests, linters, type checkers |
| Plan | Breaks task into implementation steps |
| Conditions | Defines success criteria (automated + text-based) |
| You approve | Review plan, adjust conditions, then approve |
| Delivery | Agent executes all steps |
| Verify | Runs every condition, collects evidence |
| Supervisor | On failure: analyzes, decides retry/rollback/stop |
| Loop | Repeats until all conditions pass or budget exhausted |
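
The Delivery, Verify and Loop phases can be pictured as a small control loop. The sketch below is an illustration under stated assumptions, not Proofloop's code: `run_until_done`, `deliver` and `verify` are hypothetical names, the budget is modeled as a wall-clock timeout in hours (mirroring the `-t` option and its default of 4), and the supervisor step is left out for brevity.

```python
import time
from collections.abc import Callable


def run_until_done(
    deliver: Callable[[], None],
    verify: Callable[[], list[str]],
    timeout_hours: float = 4.0,
) -> bool:
    """Repeat deliver -> verify until no condition fails or time runs out.

    `deliver` runs one round of agent work; `verify` returns the names of
    the conditions that still fail. Both are hypothetical callables.
    """
    deadline = time.monotonic() + timeout_hours * 3600
    while time.monotonic() < deadline:
        deliver()
        failing = verify()
        if not failing:
            return True  # every condition passed
        print(f"still failing: {', '.join(failing)}")
    return False  # budget exhausted before all conditions passed
```

The return value separates the two exits named in the Loop phase: `True` when all conditions pass, `False` when the budget is exhausted.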

Conditions

Automated — linked to commands:

  • pytest tests/ passes
  • make build succeeds
  • mypy --strict clean
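
A command-linked condition like the ones above can be checked by running the command and reading its exit code. The following is a minimal sketch assuming that exit code 0 means the condition passed and that the captured output is kept as evidence; `Evidence` and `check_command` are hypothetical names, not part of Proofloop's API.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    passed: bool
    output: str


def check_command(command: list[str], cwd: str = ".") -> Evidence:
    """Run one condition command; exit code 0 counts as a pass."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return Evidence(command, result.returncode == 0, result.stdout + result.stderr)


if __name__ == "__main__":
    # Stand-in for a real check such as ["pytest", "tests/"].
    evidence = check_command([sys.executable, "-c", "print('ok')"])
    print(evidence.passed)
```

In a real project the call would look like `check_command(["pytest", "tests/"], cwd="./my-project")`. Judging by the exit code rather than by the agent's own report is what makes the check independent.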

Text-based — verified by agent each iteration:

  • "API handles 1000 req/s under load test"
  • "All UI components render without console errors"
  • "Database queries use indexes, no full table scans"


Development

git clone https://github.com/exiw-ai/proofloop.git
cd proofloop
make dev      # Install dev dependencies
make check    # Run all checks

See CONTRIBUTING.md for guidelines.


License

Apache 2.0 — see LICENSE.
