=============================
Run multiple LLM CLIs (Claude Code / Codex CLI / Gemini CLI) in parallel, and compare their I/O, diffs, and lightweight scores in a local UI. A demo mode produces visually rich outputs so you can showcase quickly. Spec Kit is currently excluded from the workflow.
cp .env.example .env(defaults toDEMO_MODE=1)docker compose -f docker-compose.dev.yml up --build- Open UI:
http://localhost:5173(API:http://localhost:8000)
- New run: enter a prompt → select tools (2–3) → Run
- Run detail: React Flow graph (state/time/score, hover to preview logs) → per‑node I/O → Diff (line/word, ignore space/case, Markdown) → Best tool
- PR packaging: “Create PR package (adopt left)” to preview
summary.md/review.md/patch.txt - One‑click demo: “デモ実行(3ツール一括)” runs a showcase with a preset prompt
- Backend:
cd backend && pip install -r requirements.txt && uvicorn app.api.server:app --reload - UI:
cd ui && npm install && npm run dev
- Set
DEMO_MODE=0in.env - Provide API keys in your shell (recommended) or
.env:ANTHROPIC_API_KEY,OPENAI_API_KEY,GOOGLE_API_KEYorGEMINI_API_KEY
- Start as usual. Command/arg preferences will move to profiles (
tools.yml) over time;.envshould mainly hold secrets.
- Create run → show ID:
set -l RUN_ID (curl -s -X POST http://localhost:8000/runs -H 'Content-Type: application/json' -d '{\"prompt\":\"Five ideas for dev tools\",\"tools\":[\"claude_code\",\"codex_cli\",\"gemini_cli\"]}' | jq -r .id); echo $RUN_ID
- Stream progress (SSE):
curl -N http://localhost:8000/stream/runs/$RUN_ID - Fetch detail:
curl -s http://localhost:8000/runs/$RUN_ID | jq . - PR package:
curl -s -X POST http://localhost:8000/runs/$RUN_ID/package_pr -H 'Content-Type: application/json' -d '{\"tool\":\"claude_code\",\"title\":\"Demo PR\"}' | jq .
- Empty output: ensure
DEMO_MODE=1(demo) or real keys are exported, thendocker compose -f docker-compose.dev.yml up -d --build backend - Port conflict: ensure 8000/5173 are free
- Real CLI in Docker: use native backend or a custom image that bundles your CLIs
- Parallel runs + per‑node persistence, SSE:
status/node/log/done - Lightweight evaluation →
best_tool/best_score - Demo mode with rich outputs and simulated progress logs
- PR packaging:
POST /runs/{id}/package_pr→summary.md/review.md/patch.txt - React Flow graph with live log preview on hover
docs/70_demo_playbook.md,docs/75_demo_run.md,docs/80_progress_next.md,docs/85_pr_coderabbit.md,docs/90_supervisor.md
- Minimal (demo):
DEMO_MODE=1,DB_PATH,DATA_DIR,VITE_API_BASE - Real CLIs: mainly provide API keys; the supervisor (
tools.yml) will guide tool choiceANTHROPIC_API_KEY,OPENAI_API_KEY,GOOGLE_API_KEY/GEMINI_API_KEY
- Command overrides (temporary):
TOOL_*_CMD/ARGS/USE_STDINexist but will move to profiles
Note: the dev Docker image does not include vendor CLIs. For real runs, use native backend or build a custom image.