Autonomous multi-agent development with self-verification. PRD in, tested code out.
Current Version: v5.52.4
Loki Mode is a multi-agent system that transforms a Product Requirements Document into a built and tested product. It orchestrates 41 specialized agent types across 8 swarms -- engineering, operations, business, data, product, growth, review, and orchestration -- working in parallel with continuous self-verification.
Every iteration follows the RARV cycle: Reason (read state, identify next task) -> Act (execute, commit) -> Reflect (update continuity, learn) -> Verify (run tests, check spec). If verification fails, the system captures the error as a learning and retries from Reason. This is the core differentiator: code is not "done" until it passes automated verification. See Core Workflow.
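The control flow of the cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not Loki Mode's actual implementation: `RarvState`, `run_rarv`, the `act` and `verify` callbacks, and the `max_attempts` limit are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RarvState:
    """Hypothetical continuity state carried between iterations."""
    pending: List[str]
    done: List[str] = field(default_factory=list)
    learnings: List[str] = field(default_factory=list)
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_rarv(
    state: RarvState,
    act: Callable[[str, List[str]], str],
    verify: Callable[[str, str], Optional[str]],
    max_attempts: int = 3,  # assumption: the source does not state a retry limit
) -> RarvState:
    attempts = 0
    while state.pending:
        task = state.pending[0]                     # Reason: read state, pick next task
        result = act(task, list(state.learnings))   # Act: execute with past learnings
        state.history.append((task, result))        # Reflect: update continuity
        error = verify(task, result)                # Verify: None means checks passed
        if error is None:
            state.done.append(state.pending.pop(0))
            attempts = 0
            continue
        # Verification failed: capture the error as a learning, retry from Reason.
        state.learnings.append(f"{task}: {error}")
        attempts += 1
        if attempts >= max_attempts:
            raise RuntimeError(f"{task!r} failed verification {attempts} times")
    return state


# Toy callbacks: the first attempt fails, the retry succeeds using the learning.
def toy_act(task: str, learnings: List[str]) -> str:
    return "fixed" if learnings else "buggy"


def toy_verify(task: str, result: str) -> Optional[str]:
    return None if result == "fixed" else "tests failed"


final = run_rarv(RarvState(pending=["build login"]), toy_act, toy_verify)
print(final.done)       # ['build login']
print(final.learnings)  # ['build login: tests failed']
```

The point of the sketch is that a task only moves to `done` after `verify` returns no error; a failed check feeds the next attempt instead of being ignored.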
What "autonomous" actually means: The system runs RARV cycles without prompting. It does NOT have access to your cloud accounts, payment systems, or external services unless you provide credentials. Human oversight is expected for deployment credentials, domain setup, API keys, and critical decisions. The system can make mistakes, especially on novel or complex problems.
| Project Type | Examples | Typical Duration | Experience |
|---|---|---|---|
| Simple | Landing page, todo app, single API | 5-30 min | Completes independently. Human reviews output. |
| Standard | CRUD app with auth, REST API + React frontend | 30-90 min | Completes most features. May need guidance on complex parts. |
| Complex | Microservices, real-time systems, ML pipelines | 2+ hours | Use as accelerator. Human reviews between phases. |
| Area | What Works | What Doesn't (Yet) |
|---|---|---|
| Code Generation | Full-stack apps from PRDs | Complex domain logic may need human review |
| Deployment | Generates configs, Dockerfiles, CI/CD workflows | Does not deploy -- human provides cloud credentials and runs deploy |
| Testing | 9 automated quality gates, blind review | Test quality depends on AI-generated assertions |
| Multi-Provider | Claude (full), Codex/Gemini (sequential only) | Codex and Gemini lack parallel agents and Task tool |
| Enterprise | TLS, OIDC, RBAC, audit trail | Self-signed certs only; some features require env var activation |
| Dashboard | Real-time status, task queue, agents | Single-machine only; no multi-node clustering |
Requirements: Node.js 18+, Python 3.8+, macOS/Linux/WSL2, and at least one AI CLI (Claude Code, Codex, or Gemini).
npm install -g loki-mode
loki doctor # verify environment
loki start ./prd.md # uses Claude Code by default
Alternatively, launch it from inside Claude Code:
claude --dangerously-skip-permissions
# Then type: "Loki Mode" or "Loki Mode with PRD at ./my-prd.md"
This is the easiest way to try it if you already have Claude Code installed. No separate loki CLI installation needed.
The system classifies your PRD complexity, assembles an agent team, and runs RARV cycles with 9 quality gates. Output is committed to a Git repo with source code, tests, deployment configs, and audit logs. The dashboard auto-starts at http://localhost:57374 for real-time monitoring, or use loki status from the terminal.
Other install methods: Homebrew (brew tap asklokesh/tap && brew install loki-mode), Docker, Git clone, VS Code Extension. See Installation Guide.
Cost: Loki Mode uses your AI provider's API, so you pay for the tokens it consumes. Simple projects typically use a modest number of tokens; complex projects with parallel agents use more. Monitor token usage with loki memory economics. See Token Economics for details.
9 slides: Problem, Solution, 41 Agents, RARV Cycle, Benchmarks, Multi-Provider, Full Lifecycle | Download PPTX
Fallback: PRD -> Classifier -> Agent Team (41 types, 8 swarms) -> RARV Cycle <-> Memory System -> Quality Gates (pass/fail loop) -> Output
See full architecture documentation for the detailed view.
Key components:
- RARV Cycle -- Reason-Act-Reflect-Verify with self-correction on failure. Core Workflow
- 41 Agent Types -- 8 swarms auto-composed by PRD complexity. Agent Types
- 9 Quality Gates -- Blind review, anti-sycophancy, severity blocking, mock/mutation detection. Quality Gates
- Memory System -- Episodic, semantic, procedural tiers with progressive disclosure. Memory Architecture
- Dashboard -- Real-time monitoring, API v2, WebSocket at port 57374. Dashboard Guide
- Enterprise Layer -- OTEL, policy engine, audit trails, RBAC, SSO (requires env var activation). Enterprise Guide
| Category | Highlights | Docs |
|---|---|---|
| Agents | 41 types across 8 swarms, auto-composed by PRD complexity | Agent Types |
| Quality | 9 gates: blind review, anti-sycophancy, mock/mutation detection | Quality Gates |
| Dashboard | Real-time monitoring, API v2, WebSocket, auto-starts with loki start | Dashboard Guide |
| Memory | 3-tier (episodic/semantic/procedural), knowledge graph, vector search | Memory System |
| Providers | Claude (full), Codex (sequential), Gemini (sequential) | Provider Guide |
| Enterprise | TLS, OIDC/SSO, RBAC, OTEL, policy engine, audit trails | Enterprise Guide |
| Integrations | Jira, Slack, Teams, GitHub Actions (Linear: partial) | Integration Cookbook |
| Deployment | Helm, Docker Compose, Terraform configs (AWS/Azure/GCP) | Deployment Guide |
| SDKs | Python (loki-mode-sdk), TypeScript (loki-mode-sdk) | SDK Guide |
| Provider | Install | Autonomous Flag | Parallel Agents |
|---|---|---|---|
| Claude Code | npm i -g @anthropic-ai/claude-code | --dangerously-skip-permissions | Yes (10+) |
| Codex CLI | npm i -g @openai/codex | --full-auto | No (sequential) |
| Gemini CLI | npm i -g @google/gemini-cli | --approval-mode=yolo | No (sequential) |
Claude gets full features (subagents, parallelization, MCP, Task tool). Codex and Gemini run in sequential mode -- one agent at a time, no Task tool. See Provider Guide for the full comparison.
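As a rough illustration, the provider table can be expressed as a lookup that decides between parallel and sequential execution. The install commands, flags, and parallel support come from the table above; the `Provider` class, the dictionary keys, and `execution_mode` are hypothetical names used only for this sketch, not part of Loki Mode's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Provider:
    install: str
    autonomous_flag: str
    parallel_agents: bool


# Values copied from the provider table; the short keys are illustrative labels.
PROVIDERS: Dict[str, Provider] = {
    "claude": Provider(
        "npm i -g @anthropic-ai/claude-code", "--dangerously-skip-permissions", True
    ),
    "codex": Provider("npm i -g @openai/codex", "--full-auto", False),
    "gemini": Provider("npm i -g @google/gemini-cli", "--approval-mode=yolo", False),
}


def execution_mode(name: str) -> str:
    """Return 'parallel' when the provider supports parallel agents, else 'sequential'."""
    return "parallel" if PROVIDERS[name].parallel_agents else "sequential"


for name, provider in PROVIDERS.items():
    print(f"{name}: {execution_mode(name)} ({provider.autonomous_flag})")
```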
| Command | Description |
|---|---|
| loki start [PRD] | Start with optional PRD file |
| loki stop | Stop execution |
| loki pause / resume | Pause/resume after current session |
| loki status | Show current status |
| loki dashboard | Open web dashboard |
| loki doctor | Check environment and dependencies |
| loki import | Import GitHub issues as tasks |
| loki memory <cmd> | Memory system CLI (index, timeline, search, consolidate) |
| loki enterprise | Enterprise feature management (tokens, OIDC) |
| loki version | Show version |
Run loki --help for all commands. Full reference: CLI Reference | Configuration: config.example.yaml
Enterprise features are included but require env var activation. Self-audit results: 35/45 capabilities working, 0 broken, 1,314 tests passing (683 npm + 631 pytest). 2 items partial, 3 scaffolding (OTEL/policy active only when configured). See Audit Results.
export LOKI_TLS_ENABLED=true
export LOKI_OIDC_PROVIDER=google
export LOKI_AUDIT_ENABLED=true
export LOKI_METRICS_ENABLED=true
loki enterprise status # check what's enabled
loki start ./prd.md # enterprise features activate via env vars
Enterprise Architecture | Security | Authentication | Authorization | Metrics | Audit Logging | SIEM
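If you want to double-check which of these variables are set before starting a run, a small script along these lines can help. It is only a sketch: the variable names are the ones exported above, but the `enterprise_settings` helper and the rule that only the string "true" (case-insensitive) counts as enabled are assumptions, and this is not how loki enterprise status is implemented.

```python
import os
from typing import Dict, Mapping, Optional

# Boolean-style switches taken from the export lines above.
BOOLEAN_FLAGS = ("LOKI_TLS_ENABLED", "LOKI_AUDIT_ENABLED", "LOKI_METRICS_ENABLED")


def enterprise_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Summarize enterprise-related variables found in the given environment."""
    env = os.environ if env is None else env
    # Assumption: a flag is "on" only when its value is the string "true".
    settings: Dict[str, object] = {
        name: env.get(name, "").strip().lower() == "true" for name in BOOLEAN_FLAGS
    }
    # The OIDC provider is a name (for example "google"), not a boolean.
    settings["LOKI_OIDC_PROVIDER"] = env.get("LOKI_OIDC_PROVIDER") or None
    return settings


# Inspect an explicit mapping; call enterprise_settings() to read the real environment.
example = {"LOKI_TLS_ENABLED": "true", "LOKI_OIDC_PROVIDER": "google"}
for key, value in enterprise_settings(example).items():
    print(f"{key} = {value}")
```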
These results come from the included test harness. They are self-reported and have not been independently verified. Verification scripts are included so you can reproduce them. See benchmarks/ for methodology.
| Benchmark | Result | Notes |
|---|---|---|
| HumanEval | 162/164 (98.78%) | Max 3 retries per problem, RARV self-verification |
| SWE-bench | 299/300 patches generated | Patch generation only -- SWE-bench evaluator not yet run to confirm resolution |
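The percentages follow directly from the raw counts in the table. The `rate` helper below is a hypothetical name used only to show the arithmetic; note that the SWE-bench figure is a patch-generation rate, not a resolution rate.

```python
def rate(count: int, total: int) -> float:
    """Percentage of `count` out of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


print(rate(162, 164))  # HumanEval pass rate: 98.78
print(rate(299, 300))  # SWE-bench patches generated: 99.67
```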
| Source | What We Use From It |
|---|---|
| Anthropic: Building Effective Agents | Evaluator-optimizer pattern, parallelization strategy |
| Anthropic: Constitutional AI | Self-critique against quality principles |
| DeepMind: Scalable Oversight via Debate | Debate-based verification in council review |
| DeepMind: SIMA 2 | Self-improvement loop design |
| OpenAI: Agents SDK | Guardrails, tripwires, tracing patterns |
| NVIDIA ToolOrchestra | Efficiency metrics, reward signal tracking |
| CONSENSAGENT (ACL 2025) | Anti-sycophancy checks in blind review |
| GoalAct | Hierarchical planning for complex PRDs |
Practitioner insights: Boris Cherny -- self-verification loop patterns | Simon Willison -- sub-agents for context isolation | HN Community -- production patterns from real deployments
Full Acknowledgements -- 50+ research papers, articles, and resources
git clone https://github.com/asklokesh/loki-mode.git && cd loki-mode
npm install && npm test # 683 tests, ~10 sec
python3 -m pytest # 631 tests, ~3 sec
bash tests/run-all-tests.sh # shell tests, ~2 min
See CONTRIBUTING.md for guidelines.
MIT -- see LICENSE.
Autonomi | Documentation | Changelog | Installation | Comparisons
