calionauta/stelow

stelow · opinionated agentic product workflow

I'm trying to make AI agents behave less like coding assistants and more like cross-functional product teams.

This package brings Shape Up methodology to AI coding agents. Instead of open-ended feature lists, you shape proposals with clear scope boundaries, validate them through adversarial critique, and generate typed technical scopes ready for autonomous execution.

Built by a former product manager and developer, for AI agents and humans. I've led product teams, taught product leadership, advised product strategy, and written code across the full stack. stelow is that experience, systematized — no conference-room theory, no abstract architecture. Lessons from live products, shipped features, real teams, and real codebases. More about my background.


🎯 "Measure thrice, cut once" - applies to product decisions, not just code.

Key differentiators:

  • Shape Up methodology for AI agents - IN/OUT scope boundaries, appetite-driven sizing, risk analysis, focused scoping. Every proposal is a shaped bet, not a wishlist.
  • Appetite × Review Mode stage control - Two orthogonal dimensions control the full workflow: how deep to prepare (Appetite: Lean / Core / Complete) and which gates run (Review Mode: Auto / Product Spec Gate / Product Spec + Interface Gates / Product Spec + Interface + Scopes / Product Spec + Interface + Tech Review / Product Spec + Interface + Tech Review + Code Diff). The cascade propagates automatically through critique depth, supervisor use, verification rigor, and gate requirements - no manual stage skipping needed.
  • Adversarial plan critique - Plans are reviewed for gaps, risks, and assumptions by parallel (fresh context) reviewers, not just approved in chat.
  • Visual review gate - Plannotator opens the full plan for point-by-point comments before implementation, not a rubber-stamp approval.
  • Appetite-scaled interface exploration - 1, 3, or 5 ASCII archetypes plus hybrid depending on scope depth - no coded mockups wasted.
  • Product domain libraries - 8 domains auto-detected from your language (Pricing, Trust, Ads, Promotions, Open Source, Health, Marketplace, Business Models).
  • Typed technical scopes - feature, spike, optimize, test-* with dependency mapping and sequencing for autonomous execution.
  • Acceptance-based scope execution - each scope is delegated with a contract (criteria, verify commands, stop rules). On acceptance-native harnesses (e.g. pi-subagents), the child self-corrects in the same context before returning. On other harnesses, the parent re-delegates with feedback until criteria pass or max iterations exhaust.
  • Audit gap-to-scope loop — post-execution audit classifies gaps (FIXED / DOCUMENTED / ESCALATED). ESCALATED gaps become new scopes in the tracking file. /sw-next enforces the loop: when pending scopes exist at the Audit phase, it blocks completion and resets to Execution. The cycle repeats until no scopes remain pending.
  • Bidirectional product ↔ tech flow — tech constraints and opportunities inform product decisions before execution. Tech Preview uses cymbal for appetite-gated codebase recon; Alignment Check catches product-vs-tech misalignment with mode-dependent resolution (auto or user-flagged).
  • Stack-matched skills + fresh docs — during execution setup, the workflow discovers skills (via npx skills) optimized for the chosen tech stack and fetches current library docs (via ctx7). Both skip if already installed or unavailable. Skills install in project scope only, after user confirmation.
  • Real-time TUI tracking - see workflow state as it progresses through all stages.
  • Pulse — autonomous inbox processing — background cron-driven system periodically checks your inbox and auto-creates workflows with review_mode=Auto (no gates, no questions). Items needing human review skip Pulse and land in the interactive inbox for manual triage, preventing silent loops on ambiguous requests.

Why stelow

"Let's go slow to go fast: invest time in thorough planning to gain speed and deliver value in execution."

Traditional AI development: "Here's what I want. Start coding."

With stelow: The user just says:

/sw-start "Here's what I want to build"

And the workflow begins asking questions, exploring scope, shaping the proposal, reviewing for gaps, getting visual approval, and only then generating typed technical scopes for execution.

Critique → Gate → Scope sequencing. Execution (stage 13) only runs after all three pass. Lighter review modes (Auto/Product Spec Gate) skip some gates; the full path is there when you need it.

The Problem

Building products with AI agents often leads to:

  • Scope creep and unclear boundaries - defining what not to build is harder than what to build
  • Plans without adversarial review - no one questions assumptions before coding begins
  • Technical work before business validation - shipping features that shouldn't exist
  • No systematic testing for AI-generated code - AI writes fast, but also writes wrong
  • Generic workflows missing product-specific insights - pricing, trust, ads, and launch strategy are product decisions, not code decisions

What stelow does

A structured workflow that makes AI think like a product manager:

  • Measure thrice, cut once - shapes proposals with IN/OUT boundaries BEFORE coding
  • Strategic exploration - Job To Be Done, Opportunity Mapping, Evolutionary Principles, Market Analysis, and Product Discovery knowledge integrated
  • Adversarial critique - reviews every plan for gaps, risks, and assumptions
  • Visual review gate - Plannotator opens the full plan for point-by-point comments (not just chat)
  • Interface exploration in ASCII art - visualize up to 5 different approaches in seconds without writing code; the LLM then creates a hybrid version that combines the strongest points for the context
  • Domain libraries - auto-detects 8 product domains (Pricing, Trust, Ads, etc.) from your language
  • Technical scope mapping - breaks down into typed scopes, maps dependencies, sequences execution
  • AI-aware testing strategy - for software products, with coverage targets, CI gates, and contextual evaluation of mutation testing for critical paths
  • Greenfield & Brownfield - works for new products and existing product evolution

Key Features

  • 24 sub-skills organized into 4 layers - orchestrator + strategies + workflow stages + tactics
  • Part of a broader ecosystem of 25 skills within the project (plus additional skills from other packages in the user's agent environment)
  • Real-time TUI tracking with visual status overlay (/sw-status)
  • Gate approval via Plannotator - review, comment, approve or reject before implementation
  • Typed scopes for autonomous execution (feature, spike, test-*, optimize)

🎚️ Appetite & Review Mode

The workflow is controlled by two orthogonal dimensions, both declared by the human: Appetite and Review Mode. Appetite controls scope and exploration depth. Review Mode controls which gates, questions, and approvals are active.

Appetite (Constraint, Not Estimate)

Appetite is the scope and exploration budget - how much product depth the human wants prepared before execution.

Appetite is a constraint, not an estimate. Unlike traditional estimation (which asks "how long will this take?"), appetite asks "how much is this worth?" before the work is defined. This forces scope cuts to fit the budget - the budget never expands.

| Appetite | What it means | Scope depth | Interface exploration | Supervisor | Testing | Best for |
| --- | --- | --- | --- | --- | --- | --- |
| Lean | Validate an idea fast. Minimal scope ceremony. | 1 minimal feature, 1-2 scopes | 1 suggested interface; no alternative exploration | Low sensitivity | Smoke tests + critical-path unit tests; a11y lint/static if UI exists | Idea validation, spike, throwaway prototype |
| Core (default) | Standard product feature. Enough depth to catch obvious gaps. | Main JTBD, 3-5 scopes | 3 interface archetypes explored + 1 hybrid recommendation | Medium sensitivity | Unit tests + integration tests for external seams; a11y codebase/browserless audit if UI exists | Most features, bug fixes, small improvements |
| Complete | Multi-feature or high-risk product work. | 8-15 scopes, full edge mapping | 5 interface archetypes explored + 1 hybrid recommendation | High sensitivity | Unit + integration + behavior/e2e + security tests; live a11y audit if UI exists | Critical features, high-risk changes, production releases |

Cut policy implied by appetite:

| Appetite | What to cut first |
| --- | --- |
| Lean | Edge cases, secondary flows, alternative strategies, non-critical integrations. Keep only the happy path. |
| Core | Low-value variants. Keep the main JTBD, obvious edge cases, and one alternative only if it changes the core flow. |
| Complete | Cut nothing unless impossible. Keep full edge case mapping, multiple implementation strategies, and domain context. |

The Shape Up stage runs a mechanical check (scope count, spec size) and writes a preliminary appetite_fit in the spec frontmatter. The Plan Critique stage validates it via its fresh-context feasibility reviewer (see cali-product-plan-critique checklists — Scope Fit dimension). This uses the existing 5-reviewer infrastructure instead of adding a dedicated subagent.

| appetite_fit | Meaning |
| --- | --- |
| fits | Proposal fits within appetite - proceed as shaped |
| cuts_needed | Proposal almost fits but needs targeted cuts (LLM suggests what; human decides) |
| reshape | Proposal fundamentally exceeds appetite - must be reshaped before continuing |

This is not an estimate. The LLM does not estimate effort - it checks whether the shaped design fits the human's declared budget. If it doesn't fit, the LLM proposes cuts or reshaping, never an appetite extension. The final decision is always human.
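
The preliminary mechanical check can be pictured as a comparison between the shaped scope count and the appetite's scope range. The sketch below is illustrative only: the function name, the two-scope tolerance for `cuts_needed`, and the omission of spec size are assumptions, not stelow's actual rule. The scope ranges come from the appetite table.

```typescript
type Appetite = "Lean" | "Core" | "Complete";
type AppetiteFit = "fits" | "cuts_needed" | "reshape";

// Upper scope bounds from the appetite table: Lean 1-2, Core 3-5, Complete 8-15.
const MAX_SCOPES: Record<Appetite, number> = { Lean: 2, Core: 5, Complete: 15 };

// Hypothetical tolerance: how many scopes over budget still counts as "almost fits".
const CUT_TOLERANCE = 2;

// Preliminary, mechanical check only; it never proposes a larger appetite.
function preliminaryAppetiteFit(appetite: Appetite, scopeCount: number): AppetiteFit {
  const max = MAX_SCOPES[appetite];
  if (scopeCount <= max) return "fits";
  if (scopeCount <= max + CUT_TOLERANCE) return "cuts_needed";
  return "reshape";
}
```

Note that the return type has no "extend appetite" outcome: the only possible results are fit, cut, or reshape, which mirrors the constraint described above.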

All three appetites benefit from appetite_fit validation by the Plan Critique's fresh-context feasibility reviewer — this uses the existing 5-reviewer infrastructure, no dedicated subagent needed. The Shape Up stage provides only a preliminary mechanical check (scope count, spec size). This aligns appetite_fit with the workflow's convention: all critical evaluations use fresh context via the Plan Critique stage.

Critique and Gate are Review Mode controls, not Appetite controls. Product Critique and Plannotator Gate are governed by Review Mode: Auto skips gates; all other modes run the configured gates. Appetite changes the depth of the shaped proposal, interface exploration, supervisor sensitivity, and test scope breadth — not whether quality gates exist.

Appetite-specific execution budget:

| Area | Lean | Core | Complete |
| --- | --- | --- | --- |
| Spec + scopes | ~1 page; 1-2 scopes; one direct implementation path | ~3 pages; 3-5 scopes; 1-2 implementation alternatives with brief rationale | ~8+ pages; 8-15 scopes; 3-5 alternatives with trade-offs |
| Cut policy | Cut edge cases, secondary flows, alternative strategies, non-critical integrations. Keep the happy path. | Cut low-value variants. Keep main JTBD, obvious edge cases, and one alternative only if it changes the core flow. | Cut nothing unless impossible. Keep full edge mapping, multiple strategies, and domain context. |
| Interface exploration | 1 suggested interface only | 3 archetypes explored + 1 hybrid recommendation | 5 archetypes explored + 1 hybrid recommendation |
| Supervisor | Low sensitivity | Medium sensitivity | High sensitivity |
| Testing | Smoke tests + critical-path unit tests | Unit tests + integration tests for external seams | Unit + integration + behavior/e2e + security tests |
| Quality baseline | Build/test/lint/typecheck always; a11y lint/static if UI exists | Build/test/lint/typecheck always; a11y codebase/browserless audit if UI exists | Build/test/lint/typecheck always; live a11y audit if UI exists |
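
One way to read the budget table is as a typed lookup keyed by appetite. The following is a minimal sketch of that idea; the `AppetiteBudget` shape, the `BUDGETS` constant, and `interfaceCount` are hypothetical names used for illustration, while the values are copied from the table.

```typescript
type Appetite = "Lean" | "Core" | "Complete";

interface AppetiteBudget {
  scopes: [number, number]; // [min, max] scopes
  archetypes: number;       // interface archetypes explored
  hybrid: boolean;          // whether a hybrid recommendation is added
  supervisor: "low" | "medium" | "high";
  tests: string[];
}

const BUDGETS: Record<Appetite, AppetiteBudget> = {
  Lean: { scopes: [1, 2], archetypes: 1, hybrid: false, supervisor: "low",
          tests: ["smoke", "critical-path unit"] },
  Core: { scopes: [3, 5], archetypes: 3, hybrid: true, supervisor: "medium",
          tests: ["unit", "integration (external seams)"] },
  Complete: { scopes: [8, 15], archetypes: 5, hybrid: true, supervisor: "high",
              tests: ["unit", "integration", "behavior/e2e", "security"] },
};

// Total interface candidates shown: archetypes plus the hybrid, when one is produced.
function interfaceCount(appetite: Appetite): number {
  const budget = BUDGETS[appetite];
  return budget.archetypes + (budget.hybrid ? 1 : 0);
}
```

Keeping every appetite-dependent value in one record makes the cascade explicit: changing the appetite changes all downstream depths at once.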

Review Mode

Review Mode controls the breadth of human review — which gates, questions, and approvals are active. Unlike Appetite (depth of scope), Review Mode determines the level of human oversight during the workflow.

Review Mode is set explicitly during the setup phase via ask_user_question. It is NOT auto-detected.

| Review Mode | Plannotator Gates | Interface | IN/OUT Confirmation | Tech Approval | Best for |
| --- | --- | --- | --- | --- | --- |
| Auto | None | LLM decides | LLM decides | Auto | Throwaway prototype, quick validation, spike |
| Product Spec Gate | 1 pre-tech | LLM decides | LLM decides | Auto | Standard feature, bug fix, small improvement |
| Product Spec + Interface Gates | 1 pre-tech + Int.Gate | User chooses | LLM decides | Auto | Feature where interface matters |
| Product Spec + Interface + Scopes | Gate + Int.Gate | User chooses | User confirms | Auto | Critical feature, product with domain context |
| Product Spec + Interface + Tech Review | Gate + Int.Gate + Plan.Gate | User chooses | User confirms | Gate + tech Qs | Full pipeline, high-risk changes, production |
| Product Spec + Interface + Tech Review + Code Diff | Gate + Int.Gate + Plan.Gate + Diff.Gate | User chooses | User confirms | Gate + tech Qs + code diff | Maximum oversight, critical infrastructure |

Key rules:

  • Auto: No gates, no Plannotator, no questions. LLM decides everything. Quickest path.
  • Product Spec Gate: One Plannotator gate (spec-product visual approval before tech planning). AI resolves all gaps. Interface auto-generated, no choice. No IN/OUT confirmation.
  • Product Spec + Interface Gates: Product spec gate + interface gate. User chooses between generated interface alternatives. AI resolves trivial gaps, asks about moderate/critical.
  • Product Spec + Interface + Scopes: All product gates active (pre-tech + scope IN/OUT + int-gate). User confirms boundaries. Tech approval uses Auto.
  • Product Spec + Interface + Tech Review: Everything in product review + tech plan goes through Plannotator gate + user answers technical questions.
  • Product Spec + Interface + Tech Review + Code Diff: All the above + Plannotator code diff review on the working tree after verification. Maximum human oversight for critical changes.
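
The rules above amount to a lookup from review mode to active gates and human decisions. Here is a small sketch of that mapping, assuming hypothetical short keys for the six modes (`auto`, `spec`, and so on) and a hypothetical `ReviewPolicy` shape; the gate names follow the Review Mode table.

```typescript
type Gate = "Gate" | "Int.Gate" | "Plan.Gate" | "Diff.Gate";

// Hypothetical short keys for the six review modes, in order of increasing oversight.
type ReviewMode =
  | "auto" | "spec" | "spec+interface" | "spec+interface+scopes"
  | "spec+interface+tech" | "spec+interface+tech+diff";

interface ReviewPolicy {
  gates: Gate[];
  userChoosesInterface: boolean;
  userConfirmsInOut: boolean;
}

const POLICIES: Record<ReviewMode, ReviewPolicy> = {
  "auto": { gates: [], userChoosesInterface: false, userConfirmsInOut: false },
  "spec": { gates: ["Gate"], userChoosesInterface: false, userConfirmsInOut: false },
  "spec+interface": { gates: ["Gate", "Int.Gate"], userChoosesInterface: true, userConfirmsInOut: false },
  "spec+interface+scopes": { gates: ["Gate", "Int.Gate"], userChoosesInterface: true, userConfirmsInOut: true },
  "spec+interface+tech": { gates: ["Gate", "Int.Gate", "Plan.Gate"], userChoosesInterface: true, userConfirmsInOut: true },
  "spec+interface+tech+diff": { gates: ["Gate", "Int.Gate", "Plan.Gate", "Diff.Gate"], userChoosesInterface: true, userConfirmsInOut: true },
};

// True when the given Plannotator gate runs under the given review mode.
function isGateActive(mode: ReviewMode, gate: Gate): boolean {
  return POLICIES[mode].gates.includes(gate);
}
```

For example, `isGateActive("spec+interface+tech", "Diff.Gate")` is false, because the code diff gate only appears in the highest-oversight mode.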

How Appetite & Review Mode Interact

Review Mode controls WHAT runs (breadth)      →  Which gates are active
Appetite controls HOW DEEP it runs             →  Scope depth per gate
| Review Mode | Lean | Core | Complete |
| --- | --- | --- | --- |
| Auto | No gates. Fastest path: smaller spec, minimal verify. | No gates. Standard planning depth, standard verify. | No gates. Deep planning, full verify. |
| Product Spec + Interface + Scopes | 2 gates (Gate + Int.Gate). User confirms IN/OUT. | 2 gates + IN/OUT confirmation. Full workflow. | 2 gates + all questions. No shortcuts. |
| Product Spec + Interface + Tech Review + Code Diff | 4 gates (including Plan.Gate and Diff.Gate). Full review. | 4 gates + all questions. Max oversight. | 4 gates + all questions + code diff review. No shortcuts. |

Examples:

  • Lean + Auto → Fastest path: no gates, no questions, no Plannotator. LLM decides scope. Interface runs automatically with 1 suggested interface. (~6 stages)
  • Core + Product Spec Gate → Standard feature: 1 Plannotator gate (pre-tech), interface runs automatically with 3 interfaces + hybrid. (~10 stages)
  • Core + Product Spec + Interface Gates → Feature where interface matters: 1 Plannotator gate + user chooses among 3 interfaces + hybrid. (~8 stages)
  • Complete + Product Spec + Interface + Tech Review → Critical feature: 3 Plannotator gates + all questions. Interface explores all 5 archetypes + hybrid. No shortcuts. (~17 stages)
  • Complete + Product Spec + Interface + Tech Review + Code Diff → Maximum oversight: 4 Plannotator gates + code diff review. All questions. All archetypes. (~17 stages)

Motivation

Product ideas vary widely in scope and risk. A throwaway prototype should not require the same planning depth as a critical production feature. The Appetite × Review Mode cascade system ensures:

  • Lean appetite limits scope and exploration - smaller spec, fewer scopes, one interface suggestion, and critical-path tests only.
  • Complete appetite expands exploration and verification - full edge mapping, all 5 interface archetypes + hybrid, behavior/e2e tests, security tests, and live a11y audit when UI exists.
  • Auto review mode skips Plannotator - for lightweight validations where visual review is overkill
  • Product Spec + Interface + Scopes review mode enforces strategy - JTBD, Opportunity Mapping, etc. run before shaping if product context exists

This is an appetite-first design: the human's declaration of review budget propagates automatically through all stages - no estimation step required.


🔄 Process

The workflow has 3 conceptual phases (17 stages total), from idea triage to post-execution audit. See the Stage Index in the orchestrator skill for the complete stage map with auto-chain rules and flow diagram.

1. 🎨 Shaping

Stages 0-11 - From raw idea through shaped proposal, adversarial critique, visual gate approval, and interface exploration to a typed technical plan. Stage 12 - Tech plan gate (conditional). Stages 13+ - Execution onward.

Bidirectional Product ↔ Tech Flow

Traditional planning is linear: product spec → tech spec. stelow adds three feedback loops that let tech constraints and opportunities inform product decisions before execution:

  • Tech Preview — Before shaping the product spec, a lightweight codebase analysis runs (via cymbal, when available) to surface existing architecture, entry points, hotspots, and constraints. This prevents shaping features that conflict with the codebase reality. Depth is appetite-gated. Additionally searches existing features by workflow name/topic to avoid duplicating or conflicting with what already exists.

  • Codebase Feature Recon — Before tech planning generates typed scopes, a deeper cymbal investigation runs: searches for related modules, maps references (who connects to what), and analyzes impact (what breaks if changed). Depth varies by appetite — see table below.

  • Alignment Check — After tech planning generates typed scopes, a bidirectional check compares the tech plan against the product spec. If tech reveals constraints that change the product scope, the LLM classifies alignment and acts per Review Mode: Auto/Product Spec Gate auto-updates the product spec; Product Spec + Interface Gates and above ask the user. This catches "tech discovered too late" before any code is written.

| Appetite | Tech Preview (shaping) | Codebase Feature Recon (planning) | Alignment Check |
| --- | --- | --- | --- |
| Lean | cymbal search --text by workflow name | cymbal search --text (verify existence) | Quick feasibility |
| Core | Structure overview (entry points, hotspots) + feature search | search + cymbal refs (find connections) | Standard IN/OUT vs feasibility |
| Complete | Structure + impact analysis (blast radius) + feature search | search + refs + cymbal impact (blast radius) | Deep: each scope's ACs vs codebase |

Greenfield skips all codebase analysis (no code to inspect). If cymbal is not installed, falls back to find + git log — no cross-references or impact data.

| Review Mode | Alignment Check behavior |
| --- | --- |
| Auto/Product Spec Gate | Auto-resolve. Updates spec-product if needed. No questions. |
| Product Spec + Interface Gates | Auto-resolve if aligned; flags user if misaligned. |
| Product Spec + Interface + Tech Review / +Code Diff | Always shows diff, asks user to choose update/ignore/reshape. |

These loops are appetite- and mode-respecting by design — they inherit the same two-axis control as the rest of the workflow. No new mechanism needed.
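
The mode-dependent resolution can be sketched as a single function over review mode and alignment result. This is an illustration under assumptions, not the orchestrator's implementation: the mode keys and action names are hypothetical, and the `spec+interface+scopes` mode, which the table does not list, is assumed here to behave like Product Spec + Interface Gates.

```typescript
// Hypothetical short keys for the six review modes.
type ReviewMode =
  | "auto" | "spec" | "spec+interface" | "spec+interface+scopes"
  | "spec+interface+tech" | "spec+interface+tech+diff";

type AlignmentAction = "none" | "auto-update-spec" | "flag-user" | "show-diff-and-ask";

// Decide what the Alignment Check does once the LLM has classified alignment.
function alignmentAction(mode: ReviewMode, aligned: boolean): AlignmentAction {
  switch (mode) {
    case "auto":
    case "spec":
      // Auto-resolve: update spec-product only when needed, never ask.
      return aligned ? "none" : "auto-update-spec";
    case "spec+interface":
    case "spec+interface+scopes": // assumption: not listed in the table
      return aligned ? "none" : "flag-user";
    case "spec+interface+tech":
    case "spec+interface+tech+diff":
      // Always show the diff; the user picks update/ignore/reshape.
      return "show-diff-and-ask";
  }
}
```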

2. ⚡ Execution

Stage 13 - Autonomous scope execution via acceptance contracts: each scope is delegated with criteria, verify commands, and stop rules. Self-correction is harness-dependent - native acceptance loops (pi-subagents) let the child fix gaps in the same context; other harnesses use parent-controlled re-delegation. Optimization scopes use benchmark-driven iteration. Scope completion is gated - /sw-next blocks advance to Verification if any scopes remain incomplete.
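
The parent-controlled re-delegation path can be sketched as a bounded loop that feeds unmet criteria back to the child. The types and the `runScope` function below are hypothetical, the stop rule is reduced to a `maxIterations` cap, and the `Delegate` callback stands in for whatever subagent call the harness provides.

```typescript
interface AcceptanceContract {
  criteria: string[];
  verifyCommands: string[];
  maxIterations: number; // simplified stop rule (assumption)
}

interface Attempt {
  unmet: string[]; // criteria still failing after this attempt
}

// Stand-in for the harness's subagent call; receives feedback from the previous attempt.
type Delegate = (contract: AcceptanceContract, feedback: string[]) => Attempt;

interface ScopeResult {
  accepted: boolean;
  iterations: number;
  unmet: string[];
}

// Re-delegate with feedback until every criterion passes or iterations are exhausted.
function runScope(contract: AcceptanceContract, delegate: Delegate): ScopeResult {
  let feedback: string[] = [];
  for (let i = 1; i <= contract.maxIterations; i++) {
    const attempt = delegate(contract, feedback);
    if (attempt.unmet.length === 0) {
      return { accepted: true, iterations: i, unmet: [] };
    }
    feedback = attempt.unmet;
  }
  return { accepted: false, iterations: contract.maxIterations, unmet: feedback };
}
```

On an acceptance-native harness the same loop would run inside the child's context instead, so the parent would only see the final result.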

3. ✅ Verification & Audit

Stage 14 — Verification (tests, code review, UI audit). Stage 15 — Code diff review gate (conditional). Stage 16 — Execution critique (scope fidelity, NFR coverage, edge cases, docs, test quality). The audit classifies gaps as FIXED / DOCUMENTED / ESCALATED. ESCALATED gaps become new scopes. /sw-next detects pending scopes at the Audit phase and loops back to Execution.
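
The gap-to-scope loop can be illustrated with two small functions: one that turns ESCALATED gaps into pending scopes, and one that decides whether the workflow may complete. This is a sketch under assumptions; the `Gap` and `Scope` shapes, the `gap-` id prefix, and the function names are hypothetical and do not reflect the tracking file's real format.

```typescript
type GapStatus = "FIXED" | "DOCUMENTED" | "ESCALATED";

interface Gap {
  id: string;
  summary: string;
  status: GapStatus;
}

interface Scope {
  id: string;
  title: string;
  state: "pending" | "done";
}

// Only ESCALATED gaps become new scopes; FIXED and DOCUMENTED gaps are closed.
function scopesFromEscalatedGaps(gaps: Gap[]): Scope[] {
  return gaps
    .filter((gap) => gap.status === "ESCALATED")
    .map((gap) => ({ id: `gap-${gap.id}`, title: gap.summary, state: "pending" as const }));
}

// At the Audit phase, any pending scope sends the workflow back to Execution.
function nextAfterAudit(scopes: Scope[]): "Execution" | "Complete" {
  return scopes.some((scope) => scope.state === "pending") ? "Execution" : "Complete";
}
```

Running the audit, appending the new scopes, and calling `nextAfterAudit` repeatedly gives the cycle described above: it ends only when no scope is pending.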


📋 Skills

All 25 skills sit flat in the skills/ directory, ready for ~/.agents/skills/. They are organized into 4 layers plus 1 complementary skill.

Each skill is fully self-contained - the installer copies the complete directory tree including its own references/cli-tools/, references/, and stages/ files. This means:

  • Skills work standalone - invoke any sub-skill (e.g., cali-product-shape-up, cali-product-plan-critique) independently of the orchestrator
  • Portable across CLIs - Pi, Claude Code, Codex, OpenCode all reference skills by name (~/.agents/skills/)
  • References resolve locally - every references/cli-tools/*.md path is relative to the skill's own directory
  • Not in ~/.agents/skills/? Use ./install.sh or npx skills add calionauta/stelow -g

🎛️ Orchestrator (1)

| Skill | Purpose |
| --- | --- |
| stelow | Coordinates the multi-stage workflow (Setup → Context → Shape → Critique → Gate → Scope → Interface → Int.Gate → Selection → Planning → Plan.Gate → Execution → Verification → Diff.Gate → Audit) |

🧠 Product Strategies (5)

| Skill | Purpose |
| --- | --- |
| cali-product-job-to-be-done | Job To Be Done - understand what job users hire the product to do |
| cali-product-discovery | Product discovery and validation |
| cali-product-opportunity-mapping | Map opportunities to see where to focus |
| cali-product-multi-method-market-analysis | Multi-method market analysis |
| cali-product-evolutionary-principles | Evolutionary principles for sustainable development |

⚙️ Workflow Stages (10)

| Skill | Purpose |
| --- | --- |
| cali-product-shape-up | Shape Up planning + Tech Preview (appetite-gated codebase recon via cymbal); surfaces codebase reality before product decisions |
| cali-product-interface-alternatives | Interface alternatives exploration (1/3/5 archetypes by appetite) |
| cali-product-plan-critique | Product plan gap analysis (flows, states, affordances, data, system, compositional quality, feasibility); mode-dependent resolution |
| cali-product-codebase-critique | Codebase structural critique (architecture, performance, AI slop) |
| cali-product-ux-critique | Full UX/UI audit (accessibility, Nielsen heuristics, personas, AI slop) |
| cali-product-tech-planning | Technical scope generation + Alignment Check (mode-gated bidirectional product↔tech feedback loop) |
| cali-product-testing-ai-code | AI-aware testing strategy with contextual mutation testing evaluation |
| cali-product-testing-execution | Post-implementation testing protocol |
| cali-product-scope-executor | Autonomous scope execution via acceptance contracts - child self-corrects (harness-dependent), parent evaluates final result |
| cali-product-execution-critique | Post-execution audit - classifies gaps as FIXED/DOCUMENTED/ESCALATED; ESCALATED gaps become new scopes |

📘 Product Tactics (8)

| Skill | Purpose |
| --- | --- |
| cali-product-ads | Advertising and growth channels |
| cali-product-business-models | Business model canvas and options |
| cali-product-health | Product health metrics |
| cali-product-marketplace-playbook | Marketplace dynamics |
| cali-product-open-source | Open source strategy |
| cali-product-pricing | Pricing strategy and tactics |
| cali-product-promotions | Promotions and campaigns |
| cali-product-trust-building | Trust-building mechanisms |

📐 Complementary (1)

| Skill | Purpose |
| --- | --- |
| cali-product-coding-standards | Self-contained coding standards - KISS, DRY, LoB, SoC, Fail Fast, YAGNI, file/function size limits |

🚀 Quick Start

This package works across multiple coding agents - not just pi.dev. See the compatibility table in Installation to pick your path.

| Your situation | Recommended command | What you get |
| --- | --- | --- |
| New to CLIs (no Node, no agent) | curl -fsSL https://raw.githubusercontent.com/.../setup.sh \| sh | Node.js + pi.dev + all extensions + 25 skills |
| Already use pi.dev | git clone ... && ./install.sh | 25 skills + TUI overlay + slash commands |
| Use OpenCode / Claude Code / Codex | git clone ... && ./install.sh | 25 skills + command files (no TUI) |
| Any CLI (skills only) | npx skills add calionauta/stelow -g | 25 skills + cross-CLI support |

Intent-Aware Start

/sw-start auto-detects what kind of request you're making:

/sw-start "reduce complexity of the codebase"
# → Detected as: Refactor
# → Pipeline: Planning → Execution → Verification → Audit
# → Skips Shape Up, Interface, all Gates

/sw-start "fix login crash when email is empty"
# → Detected as: Bugfix
# → Pipeline: Planning → Execution → Verification → Audit

/sw-start "create a new invoicing platform"
# → Detected as: New Product
# → Full pipeline: Setup → Shape → ... → Execution → Audit

If detection is ambiguous or incorrect, you can change the category before the workflow starts. This prevents token waste from running the full Shape Up pipeline on a simple bugfix.
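
The routing idea can be sketched as a classifier plus a pipeline lookup. Everything about the detection here is an assumption made for illustration: the keyword patterns and the `detectIntent` function are hypothetical, and the real detection may work quite differently. The short pipeline is taken from the examples above, and the full pipeline reuses the stage order listed for the orchestrator skill.

```typescript
type Intent = "Refactor" | "Bugfix" | "New Product";

const SHORT_PIPELINE = ["Planning", "Execution", "Verification", "Audit"];

const PIPELINES: Record<Intent, string[]> = {
  Refactor: SHORT_PIPELINE,
  Bugfix: SHORT_PIPELINE,
  "New Product": [
    "Setup", "Context", "Shape", "Critique", "Gate", "Scope", "Interface", "Int.Gate",
    "Selection", "Planning", "Plan.Gate", "Execution", "Verification", "Diff.Gate", "Audit",
  ],
};

// Hypothetical keyword heuristics; a real detector would likely be model-driven.
const KEYWORDS: [Intent, RegExp][] = [
  ["Bugfix", /\b(fix|bug|crash|error)\b/i],
  ["Refactor", /\b(refactor|reduce complexity|clean ?up|simplify)\b/i],
  ["New Product", /\b(create|build|new|launch)\b/i],
];

// Return a single intent, or "Ambiguous" so the user can pick the category.
function detectIntent(request: string): Intent | "Ambiguous" {
  const hits = KEYWORDS.filter(([, pattern]) => pattern.test(request)).map(([intent]) => intent);
  const [only, ...rest] = hits;
  return only !== undefined && rest.length === 0 ? only : "Ambiguous";
}
```

Returning "Ambiguous" instead of guessing matches the behavior described above: the user can correct the category before any stage runs.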

Drift-Aware Resume

/sw-resume checks for git changes before resuming a paused workflow. If files changed while paused, it warns you and asks for confirmation before proceeding.

See docs/INSTALLATION.md for detailed options. Per-agent configuration files (commands, install scripts) are in cli-agents/.


📦 Installation

CLI Compatibility

Not every feature works on every CLI. Here's what to expect:

| Feature | pi.dev | OpenCode | Claude Code | Codex |
| --- | --- | --- | --- | --- |
| Skills (all 25) | ✅ | ✅ | ✅ | ✅ |
| /sw-start command | ✅ Slash commands | ✅ Via sw-*.md files | ✅ Via command files | ✅ Via command files |
| TUI overlay (real-time status) | ✅ Native extension | ❌ | ❌ | ❌ |
| Plannotator visual gate | ✅ Extension | ⚠️ Manual | ⚠️ Manual | ⚠️ Manual |
| Deep hooks (events, gates) | ✅ Extension | ❌ | ❌ | ❌ |

Bottom line: The 25 skills work identically on every CLI - they run the full Shape Up workflow, generate plans, critique, scopes, everything. The deep integration features (real-time TUI overlay, slash commands, lifecycle hooks, Plannotator gate) are native to pi.dev, which has the extension system to support them. Two CLI-agnostic surfaces also read workflow state from .stelow/ files on disk: the Muxy.app webview panel (macOS terminal multiplexer) and the Herdr split-pane TUI plugin (terminal multiplexer). Both work with any CLI. All CLIs can still complete the workflow; on non-Pi CLIs it happens via chat and command files rather than extensions.


External Dependencies

stelow is designed to be self-contained — the 25 skills + installer cover the full workflow. Some features optionally integrate with external tools for enhanced capability. Every external dependency has a documented fallback.

| Dependency | Required? | Used by | Install method | Fallback if absent |
| --- | --- | --- | --- | --- |
| cymbal | Optional | Tech Preview, Codebase Feature Recon, Alignment Check | brew install 1broseidon/tap/cymbal (macOS), or go install / binary release | Basic find + git log (no cross-references or impact data) |
| npx skills | Optional | Stack-matched skill discovery during execution setup | Part of Node.js ecosystem (npx bundled with npm) | Skip: workflow runs without stack-matched skills |
| ctx7 | Optional | Current library doc fetching during execution setup | npx @vedanth/context7 (auto-install via npx) | Skip: docs not fetched (less informed execution) |
| sem | Optional | Entity-level diff in Execution Critique (functions, types, methods instead of raw lines); enhanced changelog + bump detection in releases | brew install sem (macOS), or go install github.com/bcongdon/sem@latest | git diff (raw line-level only, no structural awareness) |
| plannotator | Optional | Visual review gate annotation | Pi: @plannotator/pi-extension · OpenCode: @plannotator/opencode · Claude Code: @backnotprop/plannotator · Codex: built-in hook | Manual review with approval receipt file (no structured annotation) |
| safe-change (pi-agent-codebase-workflows) | Optional | Pre-execution code safety checks | npx skills add Prinova/pi-agent-codebase-workflows -g (works on Pi, OpenCode, Claude Code, Codex) | Skip: pre-execution check omitted |
| Subagents (built-in to all CLIs) | Optional | Parallel reviewer orchestration during Plan Critique | subagent({ agent, task }), built-in on Pi/OpenCode/Claude Code/Codex | Sequential execution: slower, same outcome (single-context review) |
| pi-subagents | Optional (Pi only) | Advanced subagent features (fork/fresh context semantics, parent-child contracts) | npm:pi-subagents | Use the CLI's built-in subagent() instead: same outcome, fewer features |
| pi-intercom | Optional (Pi only) | Session-to-session coordination | npm:pi-intercom | Skip: no intercom capability |
| pi-supervisor | Optional (Pi only) | Conversation supervision during execution | npm:pi-supervisor | Skip: no supervision; rely on stages-guard for invariant enforcement |
| Muxy.app + stelow Muxy extension | Optional (macOS) | Webview panel showing workflow state with phase progress and quick actions | Install Muxy.app, then load extension from integrations/muxy/stelow/ | No webview: read .stelow/ files directly or use Herdr split-pane TUI |
| herdr + stelow plugin | Optional | Split-pane TUI showing workflow state with click-to-drill | herdr plugin install calionauta/stelow | No TUI: read .stelow/ files directly or use Muxy webview panel |

Design principle: stelow is harness-agnostic. Zero external tools are required to run the full product workflow. Each optional integration enhances a specific phase but never blocks progress. The installer (./install.sh) auto-installs Pi npm packages when Pi is detected — other tools (cymbal, ctx7) remain user-managed.

For every external tool above, the workflow teaches the agent the specific fallback strategy in skills/stelow-product-orchestrator/references/cli-tools/<tool>.md. When a tool is unavailable, the orchestrator instructs the agent to use harness-native capabilities (built-in subagent(), git grep, terminal-based review with approval receipts) rather than skipping the workflow step entirely. Degraded capability is the trade-off — see the Fallback column above for what you lose without each tool.

🚀 Path A: From Zero (pi.dev + Everything)

One command, everything included. Pick this if you don't have pi.dev yet.

curl -fsSL https://raw.githubusercontent.com/calionauta/stelow/main/setup.sh | sh

What gets installed (in order):

| Step | Component | Details | Works on |
| --- | --- | --- | --- |
| 1 | Node.js | v20+ via Homebrew (macOS) or nvm (Linux/Windows) | - |
| 2 | pi.dev | @earendil-works/pi-coding-agent via npm | pi.dev |
| 3 | Pi extensions | 12 npm packages: pi-subagents, pi-skillful, pi-intercom, pi-supervisor, @plannotator/pi-extension, pi-rewind, pi-powerline-footer, plus 5 harness tooling packages | pi.dev only |
| 4 | Skills (25) | stelow orchestrator + 24 subskills, copied to ~/.agents/skills/ | All CLIs |
| 5 | Settings | theme, model defaults, skill shortcuts in ~/.pi/agent/settings.json | pi.dev |
| 6 | cymbal | codebase navigation via brew install 1broseidon/tap/cymbal (macOS) or go install (Linux). Skipped gracefully if brew/Go absent | macOS, Linux |
| 7 | ctx7 | library docs fetcher via npx @vedanth/context7 (interactive OAuth, prompts the user) | All CLIs |
| 8 | safe-change | pre-planning regression check via npx skills add PrinNova/pi-agent-codebase-workflows -g | All CLIs |
| 9 | Herdr plugin | stelow split-pane TUI installed via herdr plugin install calionauta/stelow, only if the herdr CLI is on PATH | All CLIs (via Herdr) |
| 10 | Muxy detection | detects /Applications/Muxy.app or muxy binary; prints install link if absent (cannot auto-install: Muxy is macOS-only, distributed via GitHub releases) | macOS |
| 11 | Pulse (optional) | copies Pulse scripts to project's .stelow/pulse/ and creates inbox. Or run standalone: ./scripts/setup-pulse.sh (no pi required; works in CI/CD or before pi is installed) | All CLIs (cron/launchd/systemd/Task Scheduler) |

Not using pi.dev? Skills land in ~/.agents/skills/ and work on OpenCode, Claude Code, and Codex too. You just won't get the Pi-only extensions or TUI overlay. The workflow itself runs fine.

Muxy.app can't be auto-installed. It's a macOS-only app (SwiftUI + libghostty), open-source under MIT license, distributed via GitHub releases. Path A detects whether it's present and tells you how to install if not. Once installed, load the stelow extension from integrations/muxy/stelow/.

📋 Path B: Existing pi.dev User

git clone https://github.com/calionauta/stelow.git
cd stelow
./install.sh

The installer auto-detects your CLIs and installs skills + extensions + slash commands. Skills go to ~/.agents/skills/.

📋 Path C: Other CLI (OpenCode / Claude Code / Codex)

The skills are the core of this project - they work on any agent (Pi, OpenCode, Claude Code, Codex).

git clone https://github.com/calionauta/stelow.git
cd stelow
./install.sh

The installer detects your CLI and installs skills + command files. No extensions, no TUI - just the 25 skills that run the workflow.

Or, with npx (no clone needed):

npx skills add calionauta/stelow -g

This installs all 25 skills to ~/.agents/skills/ - works on any CLI.

For CLI-specific setup (OpenCode config, Claude Code plugin, Codex plugin), see docs/INSTALLATION.md.

Manual setup & dependencies

For per-CLI commands, required npm packages, third-party skills, and updates, see docs/INSTALLATION.md.

For toolchain dependencies (TypeScript, Vitest), see package.json.

This project distributes exclusively via GitHub (no npm) — see docs/SECURITY.md for rationale.


🎮 Commands

Primary Commands

| Command | Description |
|---|---|
| `/sw-start [idea]` | Start new workflow. Auto-detects intent type and routes to appropriate stage pipeline. If called without arguments, reads from inbox (.stelow/inbox/items.md). If the input contains multiple items, auto-runs triage (group) + select (pick one). |
| `/sw-status` | Show active workflow phase list, stage progress, and scopes. |
| `/sw-next` | Advance to next stage. Auto-completes workflow on last phase. |
| `/sw-pause` | Pause active workflow (keeps state for resume). |
| `/sw-resume [name=]` | Resume paused/in-progress workflow. Checks git drift before resuming. |
| `/sw-abort [name=]` | Abort and archive active workflow. |
| `/sw-archive [name=]` | Archive completed or inactive workflow. |
| `/sw-unarchive name=` | Restore archived workflow to paused state. |
| `/sw-ls [all\|archived]` | List workflows in current project (or all projects). |
| `/sw-setphase phase=N` | Jump to specific phase by index. |
| `/sw-info [name=]` | Print workflow path, current stage, and copy-pasteable cd + /sw-resume commands. |
| `/sw-rename <name>` | Rename active workflow. |
| `/sw-complete` | Force-complete active workflow. |
| `/sw-inbox [add\|remove\|clear\|history]` | View or manage deferred inbox items. |
| `/sw-pulse` | Manage autonomous inbox processing (see Pulse section below). |
| `/sw-doctor [--fix]` | Diagnose workflow health. Detects zombie workflows, index mismatches, orphaned entries. |
| `/sw-unlock` | Disable stage guard for current session (debug only). |

All 17 commands work in Pi natively (via pi.registerCommand()). In OpenCode, Claude Code, and Codex, 15 commands work via Skill delegation: the .md files in cli-agents/<cli>/commands/sw-*.md invoke /skill:stelow-product-orchestrator <command> and route through the orchestrator. The exceptions are /sw-inbox and /sw-pulse, which are marked piOnly because they operate on filesystem state with native TUI notifications; in non-Pi CLIs, the agent falls back to reading files directly.
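The routing rule in the paragraph above can be sketched as a small lookup. This is an illustration only: the `Cli` and `Route` types and the `routeCommand` function are hypothetical names, not part of stelow's code, and the only facts taken from the README are the two piOnly commands and the three routing outcomes.

```typescript
// Hypothetical sketch of the routing rule described above; not stelow's actual code.
type Cli = "pi" | "opencode" | "claude-code" | "codex";
type Route = "native" | "skill-delegation" | "read-files-fallback";

// The two commands the README marks as piOnly.
const PI_ONLY: string[] = ["/sw-inbox", "/sw-pulse"];

function routeCommand(cli: Cli, command: string): Route {
  // In Pi, every command is registered natively (pi.registerCommand()).
  if (cli === "pi") return "native";
  // piOnly commands have no delegation file; the agent reads .stelow/ files directly.
  if (PI_ONLY.indexOf(command) !== -1) return "read-files-fallback";
  // Everything else goes through cli-agents/<cli>/commands/sw-*.md to the orchestrator skill.
  return "skill-delegation";
}

const inboxOnCodex = routeCommand("codex", "/sw-inbox");
const nextOnOpenCode = routeCommand("opencode", "/sw-next");
```

Here `inboxOnCodex` is `"read-files-fallback"` and `nextOnOpenCode` is `"skill-delegation"`, matching the 15 + 2 split described above.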


📡 Pulse — Autonomous Inbox Processing

Pulse is a background system that periodically checks your inbox and creates workflows automatically — no interactive session needed. It runs on a timer (cron, launchd, systemd, Task Scheduler) and processes items with review_mode=Auto (no gates, no questions, no Plannotator).

cron/launchd/systemd (every 30m)
  → pulse.sh / pulse.ps1
    → checks inbox (.stelow/inbox/items.md)
    → runs `pi --print` with triage prompt
    → creates workflow(s) with review_mode=Auto
    → logs provenance to .stelow/inbox/history.jsonl

| Command | Purpose |
|---|---|
| `/sw-pulse status` | Show pulse state (paused, inbox count, last run) |
| `/sw-pulse pause` | Pause automatic processing |
| `/sw-pulse resume` | Resume automatic processing |
| `/sw-pulse process` | Force immediate processing |
| `/sw-pulse log [n]` | Show last N log entries |

| Flag / Env | Default | Description |
|---|---|---|
| `--max-items N` | 10 | Items per cycle. 0 = uncapped (all). 1 = one at a time |
| `--force` | | Skip pause + user-activity checks |
| `--dry-run` | | Preview without executing |
| `PULSE_MODEL` | | Optional. Override harness's configured model for pi --print. If unset, uses whatever the user's harness is configured with (no hardcoded default). |
| `PULSE_TIMEOUT` | 120 | Max seconds for pi --print |
| `PULSE_USER_ACTIVITY_MINUTES` | 15 | Skip if user modified stelow.json recently |

Marking items for human review: Prefix an inbox item with [human-in-the-loop] (or [hitl]) — Pulse skips it entirely. Use for items that need human judgement (pricing, partnership, strategy). Items without the marker are processed automatically.
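As a rough sketch of how the marker and the `--max-items` cap could combine, assuming each inbox item is a single string (the real items.md format may differ), that marker matching is case-insensitive, and that the cap is applied after skipped items are removed. All function names here are hypothetical.

```typescript
// Sketch only: assumes one inbox item per string; the real items.md format may differ.
// Case-insensitive matching is an assumption, not documented behavior.
const HITL_MARKER = /^\s*\[(human-in-the-loop|hitl)\]/i;

function isHumanOnly(item: string): boolean {
  return HITL_MARKER.test(item);
}

// maxItems mirrors --max-items: 0 means uncapped, N > 0 caps the batch at N.
// Assumption: the cap applies after human-only items are filtered out.
function selectForPulse(items: string[], maxItems: number = 10): string[] {
  const eligible = items.filter((item) => !isHumanOnly(item));
  return maxItems === 0 ? eligible : eligible.slice(0, maxItems);
}

const inbox = [
  "[hitl] Decide enterprise pricing tiers",
  "Add CSV export to reports",
  "[human-in-the-loop] Evaluate partnership proposal",
  "Fix broken pagination on search",
];

// With --max-items 1, only the first unmarked item is picked up.
const batch = selectForPulse(inbox, 1); // ["Add CSV export to reports"]
```

Passing `0` returns both unmarked items; the two marked items are never returned regardless of the cap.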

Conflict prevention: Pulse detects active user sessions (modified stelow.json mtime + interactive pi process) and skips the cycle automatically. A lock file prevents concurrent runs.
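The skip conditions can be read as one decision function. The sketch below is hypothetical: the field names are illustrative, either activity signal alone is assumed to be enough to skip, and the lock is assumed to hold even under `--force` (the options table only says `--force` skips the pause and user-activity checks).

```typescript
// Hypothetical decision function; field names are illustrative, not stelow's API.
interface PulseState {
  paused: boolean;                    // set by /sw-pulse pause
  lockHeld: boolean;                  // another Pulse run owns the lock file
  minutesSinceStelowJsonEdit: number; // derived from stelow.json mtime
  interactivePiRunning: boolean;      // an interactive pi process was found
}

interface PulseOptions {
  force: boolean;                     // --force
  userActivityMinutes: number;        // PULSE_USER_ACTIVITY_MINUTES (default 15)
}

interface Decision {
  run: boolean;
  reason: string;
}

function decidePulseRun(state: PulseState, opts: PulseOptions): Decision {
  // Assumption: the lock is honored even with --force.
  if (state.lockHeld) return { run: false, reason: "locked" };
  if (opts.force) return { run: true, reason: "forced" };
  if (state.paused) return { run: false, reason: "paused" };
  // Assumption: either signal alone counts as an active user session.
  const recentlyEdited = state.minutesSinceStelowJsonEdit < opts.userActivityMinutes;
  if (recentlyEdited || state.interactivePiRunning) {
    return { run: false, reason: "user-active" };
  }
  return { run: true, reason: "idle" };
}

const editedFiveMinutesAgo = decidePulseRun(
  { paused: false, lockHeld: false, minutesSinceStelowJsonEdit: 5, interactivePiRunning: false },
  { force: false, userActivityMinutes: 15 },
);
```

In this example `editedFiveMinutesAgo.run` is `false` with reason `"user-active"`, because the edit falls inside the 15-minute window.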

Setup guides: See .stelow/pulse/SETUP.md for macOS (launchd), Linux (systemd/cron), and Windows (Task Scheduler + PowerShell).

Getting the scripts: The stelow extension auto-copies Pulse scripts to .stelow/pulse/ on the first /sw-pulse invocation. To pre-stage (no pi required, useful for CI/CD or before the extension is installed): ./scripts/setup-pulse.sh [--project-dir DIR] [--dry-run].


Setup per CLI

When working on software projects, trigger the product workflow:

  1. Trigger: Use /skill stelow
  2. Execute: Only after visual review gate (Plannotator approval)

| CLI | File |
|---|---|
| Pi | ~/.pi/agent/AGENTS.md |
| OpenCode | ~/.config/opencode/AGENTS.md or project AGENTS.md |
| Claude Code | ~/.claude/CLAUDE.md or project CLAUDE.md |
| Codex | ~/.codex/AGENTS.md or project AGENTS.md |

🖥️ Visual & TUI Integrations

Two CLI-agnostic surfaces read workflow state from .stelow/ files on disk and present it alongside your terminal. Pick one or both — they share no code and don't require each other.

| Surface | Host | UI model |
|---|---|---|
| Muxy webview panel | Muxy.app (macOS terminal multiplexer) | WKWebView docked/floating panel with HTML/CSS/JS |
| Herdr split-pane TUI | Herdr (terminal multiplexer) | Rust+ratatui TUI in split pane (placement = "split") |

Both integrations share the same workflow state (.stelow/), both work with any CLI (Pi, OpenCode, Claude Code, Codex), neither requires pi.dev.

Muxy Webview Panel

Requires Muxy.app + the stelow extension loaded. Muxy is a macOS terminal multiplexer — think tmux with a native Mac UI: project-based terminal workspaces, tabs, splits, and custom panels. The webview panel is a Muxy plugin surface, not a web app or a Pi feature.

The panel shows:

  • Current phase and progress
  • Phase artifacts and outputs
  • Upcoming tasks
  • Quick actions

Install: Muxy.app (macOS-only) + the stelow Muxy extension at integrations/muxy/stelow/. To load the extension in Muxy: open Extensions modal → Create, pick the folder, and Muxy auto-detects the built dist/. See Muxy's Get started guide for the official workflow.

Herdr Split-Pane TUI

Requires Herdr + the stelow plugin installed. Herdr is a terminal multiplexer — tmux-style persistence, mouse-native panes, agent state tracking, CLI + socket API. The TUI plugin runs as a Rust binary in a split pane, the same model as herdr-file-viewer.

The TUI shows:

  • Current stage (Discovery → Shape Up → Tech Planning → ...)
  • Per-stage status (✓ done, ▶ active, · pending, ! blocked)
  • Drill-down: stage → project → scope → task
  • Quick action invocation via herdr plugin action invoke

Keybinds: prefix+w toggle · Tab/j/k next/prev workflow · r refresh · ? help · q/Esc quit. Detail card shows prompt + current stage + scope; click workflow rows to select.

Install: herdr plugin install calionauta/stelow (or herdr plugin link integrations/herdr/stelow/ for local dev after cargo build --release). Source under integrations/herdr/stelow/. See herdr's plugin docs for the official install/link workflow.


📁 Artifact Directory

All workflow artifacts are stored in:

<project>/.stelow/

| Subdirectory | Contents |
|---|---|
| input/ | User's original idea |
| stages/ | Stage outputs (JTBD, Shape Up, Interface, etc.) |
| plans/ | Technical plans and specs |
| reviews/ | Plannotator feedback |
| scopes/ | Typed execution scopes |
| logs/ | Workflow execution logs |
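For tooling that reads these artifacts, a typed path helper keeps the subdirectory names in one place. This is a minimal sketch under stated assumptions: `artifactPath` is a hypothetical helper, POSIX-style separators are assumed, and placing `spec-tech.md` under `plans/` is a guess for illustration only.

```typescript
// The six subdirectories listed in the table above.
type ArtifactDir = "input" | "stages" | "plans" | "reviews" | "scopes" | "logs";

// Hypothetical helper: builds <project>/.stelow/<dir>[/<file>] with POSIX separators.
function artifactPath(projectRoot: string, dir: ArtifactDir, file?: string): string {
  const root = projectRoot.replace(/\/+$/, ""); // drop trailing slashes
  const base = `${root}/.stelow/${dir}`;
  return file ? `${base}/${file}` : base;
}

const scopesDir = artifactPath("/work/app/", "scopes"); // "/work/app/.stelow/scopes"
// The file location below is an assumption made for the example.
const techSpec = artifactPath("/work/app", "plans", "spec-tech.md");
```

The union type means a typo such as `"scope"` fails at compile time rather than producing a path that silently does not exist.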

📖 Evidence & Limitations

✅ Evidence-Based Design

This workflow is grounded in empirical evidence from the 2025-2026 AI agent research boom. Every architectural decision - from parallel subagent orchestration to cross-session learning - is backed by peer-reviewed papers, open-source tools, and industry benchmarks.

| Practice | Source | Evidence | Where We Implement |
|---|---|---|---|
| Parallel orchestration | CAID (Geng & Neubig, CMU, 2026) | +26.7% accuracy using git-worktree isolation + dependency DAG | 5 parallel reviewers + consolidator during plan critique |
| Cross-session learning | Cat (Liu et al., Beihang, 2025); Memory Transfer (Kim et al., KAIST, 2026) | Context as callable tool; +3.7% via abstract memory pools | Session knowledge from past cycles read during workflow setup |
| Output validation guards | Stage-Gate Agentic (PDMA, 2026); Phaselock (2026) | AI agents with gates reduce execution failures; 80 enforceable rules | Shape Up output guard + Tech Planning validation guard |
| Context isolation | Clean Context Pattern (Agent Factory, 2026); GAM (Zhejiang U., 2026) | Fresh context per agent outperforms shared pipelines; write isolation prevents contamination | subagents.md - context:"fresh" per subagent; disk-based artifacts |
| Visual review gate | Plannotator (backnotprop, 2025); Placement Theory (Tian Pan, 2026) | Browser-based plan annotation with structured feedback loop | Plannotator gate active when Review Mode > Auto; skipped in Auto |
| Intra-step recovery | Try-Heal-Retry (Nweke, 2026); PALADIN (Chaudhary et al., 2025) | 89.68% recovery rate via annotated failure trajectories | subagents.md - Retry 1× + skip with logged error per subagent |
| Parallel review isolation | CooperBench (Khatua et al., 2026) | 2-agent cooperation → 25% success vs 50% solo; monotonic decline from 68% (2 agents) to 30% (4 agents) | Plan Critique uses fresh-context subagents with zero inter-agent communication and independent file outputs |
| Communication topology limits | clawRxiv 2604.00736 (2026) | Overhead grows quadratically: C(n)=0.023n²+0.04n; 50% at n=7; agents inflate 34% when aware of peers | Max 4-5 parallel subagents (n≤5 optimal zone); no message passing between agents - each writes independent file |
| Research vs code parallelism | Co-Coder (Yang et al., 2026) | Parallel speedup requires cohesion-aware partitioning (+14% pass rate, 2.10× speedup); naive file-parallel = worse than sequential | Research/review tasks are naturally cohesion-free; code execution defaults to sequential; parallel scope execution is opt-in with file-overlap guard |
| Metric-driven optimization | ReflexGrad (Kadu et al., 2025); ReliabilityBench (Gupta et al., 2026) | +40pp lift via dual-process routing; standardized reliability measurement | optimization scopes routed to optimization goals (subagent + acceptance) |
| Acceptance-based execution | Pattern inspired by Try-Heal-Retry (Nweke, 2026) and PALADIN (Chaudhary et al., 2025) | Self-correction in same context outperforms fresh re-delegation | Scope executor delegates with acceptance contract - child self-corrects (harness-dependent) before parent evaluates |
| Audit gap-to-scope loop | Pattern inspired by Agentic Debugging (Zhang et al., 2025) | Multi-agent feedback loops improve fix rate | Audit classifies gaps → ESCALATED become new scopes → /sw-next enforces loop back to Execution |
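The "Retry 1× + skip with logged error" policy from the intra-step recovery row can be sketched as follows. This is a simplified, synchronous illustration; a real subagent call would be asynchronous, and `runWithOneRetry` and `SubagentResult` are hypothetical names rather than anything defined in subagents.md.

```typescript
// Hypothetical result shape for one subagent task.
interface SubagentResult<T> {
  ok: boolean;
  value?: T;
  error?: string;
  attempts: number;
}

// Runs a task, retries once on failure, then gives up and reports the error
// so the caller can skip this subagent and continue with the others.
function runWithOneRetry<T>(task: () => T, log: (msg: string) => void): SubagentResult<T> {
  const maxAttempts = 2; // first try + one retry
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: task(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      log(`attempt ${attempt} failed: ${lastError}`);
    }
  }
  return { ok: false, error: lastError, attempts: maxAttempts };
}
```

A task that throws twice yields `{ ok: false, attempts: 2 }` with two log entries, so the failure is recorded instead of aborting the whole review batch.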

Research parallelism, not code parallelism. All subagent parallelism in stelow is research and review: Plan Critique (4-5 parallel reviewers), Strategic Context (N skill executors), Interface exploration (5 proposals). Every reviewer receives fresh context, writes to an independent file, and never communicates with other agents. No message passing, no shared mutable state, no concurrent code edits.

This avoids the "curse of coordination" deliberately: CooperBench (2026) shows 2-agent cooperation achieves only 25% success vs 50% solo, with monotonic decline as agents increase (68% → 46% → 30% from 2→3→4 agents). Communication overhead scales quadratically (clawRxiv 2604.00736: C(n)=0.023n²+0.04n, 50% of tokens lost to coordination at n=7 agents).

For code execution, stelow defaults to sequential scope execution: each scope runs in a single agent turn, with no parallel file edits. Parallel scope execution is opt-in, guarded by the execution stage's explicit file-overlap check: if scopes modify the same files, stelow warns and recommends sequential execution or git worktree isolation with merge instructions. The rationale is documented in the execution stage: "If conflicts are extensive, consider sequential execution instead." Research parallelism shows consistent gains (+26.7% accuracy, CAID 2026); code parallelism on shared files degrades quality (CooperBench 2026). Stelow uses each pattern where evidence supports it.
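A path-based overlap check of the kind described above might look like the sketch below. It is an assumption-laden illustration, not the execution stage's real implementation: the `Scope` shape and both function names are hypothetical, and paths are compared as plain strings with no normalization (so `./src/a.ts` and `src/a.ts` would not match).

```typescript
// Hypothetical scope shape: an id plus the files the scope plans to modify.
interface Scope {
  id: string;
  files: string[];
}

interface Overlap {
  file: string;
  scopes: string[];
}

// Returns every file targeted by more than one scope, sorted by path.
function findFileOverlaps(scopes: Scope[]): Overlap[] {
  // Null-prototype object so file names like "constructor" are safe keys.
  const owners: { [file: string]: string[] } = Object.create(null);
  for (const scope of scopes) {
    for (const file of scope.files) {
      const list = owners[file] || (owners[file] = []);
      if (list.indexOf(scope.id) === -1) list.push(scope.id);
    }
  }
  return Object.keys(owners)
    .filter((file) => owners[file].length > 1)
    .sort()
    .map((file) => ({ file, scopes: owners[file] }));
}

// Any overlap means: warn, then run sequentially or isolate with git worktrees.
function recommendExecution(scopes: Scope[]): "parallel-ok" | "sequential-or-worktrees" {
  return findFileOverlaps(scopes).length === 0 ? "parallel-ok" : "sequential-or-worktrees";
}

const plannedScopes: Scope[] = [
  { id: "A", files: ["src/a.ts", "src/shared.ts"] },
  { id: "B", files: ["src/b.ts", "src/shared.ts"] },
  { id: "C", files: ["src/c.ts"] },
];
const overlaps = findFileOverlaps(plannedScopes);
```

Here `overlaps` contains a single entry for `src/shared.ts` shared by scopes A and B. A check like this sees only paths, so two scopes that touch different files with a semantic dependency between them would still pass.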

⚠️ Known Limitations & Radical Transparency

Even with these guardrails, the AI agent still exhibits predictable failure modes. This workflow is a tool for amplifying human judgment, not a substitute for it.

How to read this table: Each row is honest about what the workflow can and cannot do. Every mitigation has a corresponding "not solved" assessment. Read both before deciding whether this workflow helps your context.

| # | Limitation | Impact | What the workflow tries to do | Why it's not solved |
|---|---|---|---|---|
| 1 | Context rot - compliance with own rules drops from ~73% (turn 5) to ~33% (turn 16) in long sessions | Gamage 2026, 4,416 trials, 12 models/8 providers. Replicated by Liu et al. 2023 "Lost in the Middle". | Subagents use context: "fresh". Ordered-execution-goal creates isolated scope execution. Execution stage has explicit "Context Rot Check" re-reading plan from disk. | Reduced but not solved. The orchestrator itself can forget its own rules in long sessions spanning multiple stages. The core transformer limitation (U-shaped attention curve) remains intrinsic. |
| 2 | Confabulated research references - Agents cite nonexistent papers or books (~11-57% hallucination rate across models) | arXiv 2604.03173 - 10 models/3 databases/69K citation instances | Claim verification via Lessons Learned cross-referencing during setup. | Caught by structure, not guaranteed. Multi-model consensus (≥3 LLMs citing same work) yields 95.6% accuracy, but the workflow doesn't enforce this. |
| 3 | Silent wrong answers - Cross-task state leakage produces plausible but incorrect outputs | UCC (arXiv 2604.01350), 2026 | Write isolation per subagent; clean context pattern | Mitigated by isolation, not by detection. No mechanism to detect when contamination happens despite isolation. |
| 4 | Overconfidence in estimates - AI systematically underestimates implementation complexity | Agentic Overconfidence (ICLR 2026) - all tested agents exhibit agentic overconfidence | Appetite is declared by human as a constraint, not estimated by the LLM. The LLM only checks appetite_fit (fits/cuts_needed/reshape). No estimation step. | Addressed by design - appetite is a constraint, not an estimate. The human sets the budget before shaping. The LLM checks fit, not effort. But the human still needs to set appetite honestly. |
| 5 | Approval gate fatigue - Users can desensitize to visual gates and approve without scrutiny | Tian Pan Apr 2026 - HITL queues have dynamics | Plannotator requires active annotations (deletions, comments, labels). Auto/Product Spec Gate review modes skip gates entirely when appropriate. | Delayed, not prevented. Review Mode selection helps reduce unnecessary gates, but if the human always picks Complete+Product Spec + Interface + Scopes, fatigue still sets in. |
| 6 | 80% Problem - AI ships the happy path (CRUD, main flow) but omits error handling, observability, security, retry, rollback, edge cases | Osmani Jan 2026 (coined the term); GitClear 2025 | Tech Planning requires NFRs per scope. Acceptance contracts can include NFR criteria (if the plan specifies them). Audit classifies omissions as gaps - ESCALATED ones become new scopes. | Partially mitigated, not solved. NFRs must be in the plan to appear in the contract. Audit classification depends on the LLM - misclassification means gaps slip through. Same model evaluates both stages. |
| 7 | Model dependency - Claude Opus, Gemini Flash, GPT-4o produce significantly different quality | Veracode 2025 - 45% of AI-generated code contains flaws across 100+ models; Anthropic Jan 2026 - RCT: AI-assisted devs score 17% lower on comprehension tests | Every artifact tracks generated_by: {model_name} in frontmatter. Gate stage shows provenance before Plannotator review. | Transparency, not mitigation. Knowing the model helps calibrate expectations, but it doesn't fix the quality gap. The comprehension penalty (Anthropic 2026) affects users regardless. |
| 8 | Constraint decay - AI progressively violates its own self-imposed rules over time | arXiv 2026 (Constraint Decay) - structural constraints drift in backend code generation; HORIZON - agents break on long-horizon tasks | Context rot rules explicitly warn about this. "No patching in degraded context" rule blocks the most common decay pattern. | Same root cause as context rot. The warning helps, but stopping a session mid-flow is disruptive and users rarely do it. |
| 9 | Code hallucination - AI invents APIs, functions, or contracts that don't exist (~20% of failures) | CloudAPIBench - 20.41% of failures are hallucinated APIs; Code LLM failures | Verification stage runs the test suite, which catches some hallucinated APIs. | Caught by tests, not by the workflow. If tests don't exist (or are also hallucinated), neither Verification nor Critique detects it. |
| 10 | Shallow review trap - same LLM that wrote the code also reviews it | Ox Security 2025 - 300+ repos, 10 anti-patterns, AI code in production with critical flaws | Verification uses context: "fresh" subagent reviewers - same model but fresh session context. | Automatic via context: "fresh" - fresh context restores full rule awareness lost to context rot (~33% rule adherence at turn 16 vs ~73% at turn 5). True cross-model independence offers marginal additional benefit. |
| 11 | Expertise cliff - AI fails in mature codebases with implicit conventions, undocumented architecture | Tian Pan May 2026; METR 2025 RCT - experienced devs 19% slower with AI | Domain libraries and structured specs help surface some conventions. Execution Critique checks for broken refs and anti-patterns. | Not addressed. This workflow was designed for greenfield or well-documented features. If your codebase has 10 years of undocumented architecture decisions, the AI will violate them. |
| 12 | Plan staleness - plans generated against one snapshot; by execution time, target has changed | Superpowers Issue #989 - parallel sessions cause spec/plan staleness | Git diff check before scope execution detects if target files changed since plan creation. | Staleness detected but not auto-resolved. Only detects file-level changes, not semantic staleness. LLM decides whether staleness matters - no forced re-plan. |
| 13 | Pipeline memory loss - no cross-session memory of own failure patterns | Flamehaven 2026 - cross-session memory, MICA governance schema | Execution Critique saves lessons from each cycle. Setup stage automatically reads past lessons with forced reflection. | Captured and injected, but not verified. Same model that made mistakes reads the lessons. Context rot can still cause mid-session forgetting. Cannot auto-verify lesson adherence. |
| 14 | Code complexity growth - AI-generated code increases complexity over time | Cursor Study (MSR 2026) - static analysis warnings +30%, code complexity +41% after month 2 | Execution Critique includes anti-pattern detection (god functions >100 lines, global mutable state). Optional Code Quality Gate with static analysis. | Caught too late. Complexity analysis happens after code is written. No mechanism to prevent complexity during generation - only flag it after. |
| 15 | Activity ≠ productivity - more PRs, more commits does not mean more value delivered | METR 2025 RCT - 19% slower for experienced devs; Faros AI 2025 - 9% more tasks, 0% DORA improvement | Appetite system anchors scope size to human attention budget. OUT/IN scoping keeps proposals focused. Execution Critique includes "close without follow-up" as valid outcome. | Honest assessment: Appetite system mitigates scope bloat, but requires human to set appetite honestly. appetite_fit is validated by the Plan Critique stage's fresh-context feasibility reviewer (reusing existing 5-reviewer infrastructure). The appetite system is new - its real-world effectiveness is not yet measured. |
| 16 | Coordination overhead - adding agents to shared-state coding tasks degrades quality | CooperBench 2026 - 2-agent cooperation: 25% success vs 50% solo; clawRxiv 2604.00736 - overhead hits 50% of tokens at n=7 agents | Parallelism limited to research/review tasks with fresh context, zero inter-agent communication, and independent file outputs. Code execution defaults to sequential. Parallel scope execution is opt-in: the execution stage checks for overlapping file targets and warns + recommends sequential or git worktree isolation. | Addressed by design - research-only parallelism and file-overlap guard. If users force parallel scope execution on overlapping files despite the warning, these failure modes re-emerge. The file-overlap check is a heuristic (file paths) rather than semantic dependency analysis. |

What this means for you

  • Every artifact is a draft. Treat spec-product.md, spec-tech.md, critique reports, and interface proposals as first drafts that need human eyes.
  • Results vary by model and codebase. A small model generating a plan for a mature codebase is a recipe for failure - regardless of how structured the workflow is.
  • Human review is required. The workflow catches structural gaps (missing scopes, contradictory requirements, some untested edge cases). It does NOT catch logic errors in individual lines, security flaws in business logic, or nuanced architectural trade-offs - those need you.

We don't claim to solve product planning. We claim to structure the thinking so you catch more before you code. The rest is still up to you.

Research sourced May 2026. All references are hyperlinked for verification.


About the Author

Cali (Renato Caliari)

This workflow wasn't designed in a vacuum. It comes from years inside real teams — as a developer, product manager, consultant, and leader across different organizations. The skills, patterns, and disciplines here were tested, broken, and rebuilt in live product environments and real codebases, not conference rooms.

📚 Published Work

  • 🇧🇷 [e-book, Brazilian Portuguese] Inovação baseada em Jobs To Be Done (Innovation based on Jobs To Be Done)
  • 🇧🇷 [e-book, Brazilian Portuguese] A Arte da Experimentação: Da Ideia ao Produto (The Art of Experimentation: From Idea to Product - Innovate with a simplified process and AI assistance)

💼 Experience

  • Former Developer — built products across the full stack before moving into product
  • Former Product Manager at tech companies
  • Product Consultant helping leaders with strategy and teams with processes
  • Creator of Triple Track Agile - adds an opportunity mapping track to product cycles
  • Developed Contornos - a social technology for decentralized decisions

🌐 Resources

| Site | Description |
|---|---|
| timeproduto.com.br | Product process divided into stages, with AI tools and prompts for each stage |
| espacocalionauta.substack.com | Blog exploring AI, organizational culture, daily philosophy, narrative practices, and product thinking - with published prompts and free e-books |

License

MIT


📞 Support
