How we use 3 AI agents as a full engineering team.
Claude Code as the boss. Codex + Gemini as the team. Real patterns from production.
We run an animal sanctuary in Japan. 28 cats and dogs. No engineering team.
We built an API marketplace with 30+ integrations, token economy, multi-language support, Docker deployment — using 3 AI coding agents working as a team.
This document is exactly how we orchestrate them.
┌─────────────────────────────────────────────────┐
│              Claude Code (Boss)                 │
│  "The Old Man" — thinks, plans, decides, codes  │
│            Model: Claude Sonnet/Opus            │
└──────────┬──────────────────┬───────────────────┘
           │                  │
     ┌─────▼──────┐     ┌─────▼──────┐
     │   Codex    │     │   Gemini   │
     │  "Muscle"  │     │   "Eyes"   │
     │ Bulk work  │     │  Vision +  │
     │ Background │     │  Research  │
     └────────────┘     └────────────┘
| Agent | Codename | Best at | Weak at |
|---|---|---|---|
| Claude Code | The Boss | Planning, architecture, complex logic, multi-file edits | Can't see images, limited web search |
| Codex CLI | Muscle | Bulk refactoring 10+ files, long background jobs, parallel tests | No vision, can't browse web |
| Gemini CLI | Eyes | Reading images/PDFs/videos, web search, analyzing 1M+ token codebases | High hallucination rate, overclaims solutions |
When a task comes in, the Boss decides who handles it:
| Task | Who | Why |
|---|---|---|
| Read screenshot / PDF / video | Gemini | Only agent with vision |
| "What's trending in AI this week?" | Gemini | Real-time web search |
| Analyze a 500K-line codebase | Gemini | Handles massive context windows |
| Refactor 10+ files | Codex --full-auto | Bulk parallel work, won't get distracted |
| Run test suite for 30 minutes | Codex (background) | Long-running, no interaction needed |
| Batch rename across project | Codex --full-auto | Mechanical, parallelizable |
| Design new feature architecture | Boss (Claude) | Needs judgment and planning |
| Debug a subtle logic bug | Boss (Claude) | Needs deep reasoning |
| Write API endpoint + tests | Boss (Claude) | Multi-step, needs context |
| Scheduled monitoring | Nebula (cron) | Runs on its own schedule |
| Cross-platform integration | Nebula | Bridges external services |
| Everything else | Boss (Claude) | Default — handles 80% of work |
The human says what they want. The Boss decides who does it. The human never needs to think about which agent to use.
Human: "Fix the login bug and update the README"
Boss: (thinks) Login bug = complex reasoning → I'll do it
(thinks) README update = mechanical → dispatch to Codex
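In practice, a dispatch is just a shell command issued from the Boss's session. A minimal sketch of that second step, using the `codex exec` syntax from the setup section below (the task text is illustrative):

```bash
# Boss keeps the login bug; the mechanical README task goes to Codex.
codex exec "Update README.md to reflect the current API routes" \
  --full-auto --skip-git-repo-check
```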
| Risk | Action | Example |
|---|---|---|
| 🟢 Low | Use directly | Code formatting, README updates |
| 🟡 Medium | Spot-check | Codex refactoring, Gemini research |
| 🔴 High | Full verification | Anything touching money, database, or deployment |
Gemini hallucination rule: The more confident Gemini sounds, the more you should verify. Gemini will say "I've fixed everything!" when it hasn't even found the right file.
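A cheap way to ground-truth such claims: inspect the working tree yourself before accepting the summary. Plain git is enough:

```bash
# When Gemini says "I've fixed everything!", check what actually changed.
git diff --stat   # empty output means no files were touched at all
git diff          # read the real edits, not the agent's summary of them
```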
The Boss follows this decision tree for every action:
1. Can I decide this myself? → Just do it (rename a variable, fix a typo)
2. Is there risk? → Give options (2-3 approaches, let the human pick)
3. Does it touch money/DB/deploy? → Must ask (never proceed without confirmation)
When the Boss dispatches work, the output is structured so the Boss can immediately use it:
Dispatch to Codex:
"Refactor all API routes to use the new error handler.
Files: src/routes/*.ts
Pattern: replace try/catch with handleError wrapper
Expected: ~15 files changed, no logic changes"
Codex returns:
"Done. 14 files changed. 1 file skipped (already used handleError).
Files: [list]
Tests: all passing"
Boss verifies → merges
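Written out as the actual dispatch call (same `codex exec` syntax as the setup section below; the final "report back" line is our addition to force the structured reply):

```bash
codex exec "Refactor all API routes to use the new error handler.
Files: src/routes/*.ts
Pattern: replace try/catch with handleError wrapper
Expected: ~15 files changed, no logic changes.
Report back: files changed, files skipped, test status." \
  --full-auto --skip-git-repo-check
```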
The most powerful pattern: running 3 Claude Code terminals simultaneously.
Terminal 1 (Boss): Feature development
Terminal 2 (Worktree): Bug fixes on separate branch
Terminal 3 (Review): Code review + testing
# Terminal 1: Main development
claude code # Working on feature branch
# Terminal 2: Isolated bug fix (separate git worktree)
# Claude Code creates a worktree automatically
# Changes don't conflict with Terminal 1
# Terminal 3: QA and testing
claude code # Run tests, review code from other terminals
The human acts as the message router:
Terminal 1: "I've finished the API endpoint. Tell the reviewer."
Human: (copy-pastes summary to Terminal 3)
Terminal 3: "Reviewing... Found 2 issues. Tell the developer."
Human: (copy-pastes issues to Terminal 1)
Tip: Each terminal maintains its own context. Keep messages between them structured and concise — don't paste entire conversations.
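Terminal 2's isolation comes from a git worktree: a second checkout of the same repository on its own branch, so edits can't collide with Terminal 1. If your setup doesn't create one automatically, a minimal manual version (paths and branch names are illustrative):

```bash
# Second working directory, same repo, separate branch
git worktree add ../myproject-bugfix -b fix/login-bug
cd ../myproject-bugfix && claude code   # this becomes Terminal 2

# Clean up once the fix is merged
git worktree remove ../myproject-bugfix
```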
Here's how a typical task flows through the system:
Human: "Add a new /api/translate endpoint that supports 5 providers"
Boss (Claude Code):
1. Plans architecture (provider interface, fallback logic, pricing)
2. Writes the main endpoint + provider interface
3. Dispatches to Codex: "Implement 5 provider adapters using this interface"
4. Dispatches to Gemini: "Research current pricing for DeepL, Google, AWS Translate"
5. Integrates Codex's adapters + Gemini's pricing data
6. Writes tests
7. Commits
Time: 25 minutes (vs 2-3 hours solo coding)
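Steps 3 and 4 are fire-and-forget shell calls. A sketch using the dispatch syntax from the setup section (prompts abbreviated; the `TranslateProvider` interface name is hypothetical):

```bash
# Step 3: bulk adapter work runs in the background on Codex
codex exec "Implement 5 provider adapters against the TranslateProvider interface" \
  --full-auto --skip-git-repo-check &

# Step 4: live pricing research goes to Gemini (web search)
gemini -m gemini-3-flash -p "Current pricing for DeepL, Google Translate, AWS Translate" \
  > pricing-notes.md

wait   # Boss integrates adapters + pricing once both finish
```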
# Claude Code (the boss)
# Install: https://docs.anthropic.com/en/docs/claude-code
# Codex CLI (optional — for bulk work)
npm install -g @openai/codex
# Gemini CLI (optional — for vision + research)
npm install -g @google/gemini-cli
Add to your CLAUDE.md (project-level instructions):
## Agent Dispatch Rules
| Task | Agent |
|------|-------|
| Images/PDFs/videos, web search, large codebase analysis | Gemini |
| Bulk refactoring 10+ files, background tests, mechanical tasks | Codex --full-auto |
| Scheduled monitoring, cross-platform integration | Nebula |
| Everything else | Claude Code (self) |
## Dispatch Commands
- Codex: `codex exec "..." --full-auto --skip-git-repo-check`
- Gemini: `gemini -m gemini-3-flash -p "..."`
## Verification by Risk
- 🟢 Low → use directly
- 🟡 Medium → spot-check
- 🔴 High (money/DB/deploy) → full verification
After 6 months of multi-agent development, here's what worked:
- Parallel terminals — 3x throughput on independent tasks
- Codex for bulk — Renaming 20 files in 2 minutes instead of 20
- Gemini for research — Reading a 50-page PDF and summarizing in 30 seconds
- Clear dispatch rules — No time wasted deciding "which agent should do this?"
And what didn't:
- Gemini for code — It hallucinates function signatures and claims success when it has failed
- Codex for architecture — It follows instructions literally but can't make judgment calls
- Too many agents — 3 is the sweet spot. More than that = more coordination overhead than value
- Trusting Gemini's confidence — "I've fixed everything!" → nothing was fixed
In practice, Claude Code handles 80% of all work. Codex handles 15% (bulk tasks). Gemini handles 5% (vision + research). Don't over-engineer the orchestration — most tasks are best handled by one smart agent, not a committee.
| Project | Description |
|---|---|
| 112 Claude Code Skills | Everything we learned, extracted as reusable skills |
| AI API Benchmark | Monthly tests of 30+ AI APIs from Tokyo |
| AI Prompt Mastery | One prompt to make any AI respond like an expert |
Built at Washin Village — an animal sanctuary in Japan where 28 cats & dogs and 3 AI agents work together.