moedash · moedash · Dec 30, 2025 · Dec 30, 2025 · Dec 30, 2025 · Dec 30, 2025
diff --git a/examples/EXPERIMENT.md b/examples/EXPERIMENT.md
@@ -0,0 +1,231 @@
+# Agent Debugging Experiment
+
+This document describes experiments to validate that the `temporal workflow` CLI commands enable AI agents to debug Temporal workflow failures using structured output instead of logs.
+
+## Hypothesis
+
+AI agents can query failures, trace nested workflow chains across namespaces, and get compact timelines and state — without scraping logs or manually traversing workflows.
+
+## Experiment 1: Basic Agent Commands
+
+### Environment
+
+| Setting | Value |
+|---------|-------|
+| Temporal Environment | Staging (`us-west-2.aws.api.tmprl-test.cloud:7233`) |
+| Namespace | `moedash.temporal-dev` |
+| CLI Version | Built from source |
+| AI Agent | Claude Code (Cursor) |
+
+### Failure Scenarios
+
+| Scenario | Command | Expected Failure |
+|----------|---------|------------------|
+| Success | `go run ./starter -scenario success` | No failure (control) |
+| Payment Fail | `go run ./starter -scenario payment-fail` | Activity fails with "payment gateway connection timeout" |
+| Shipping Fail | `go run ./starter -scenario shipping-fail` | Activity fails with "warehouse inventory depleted" |
+| Nested Fail | `go run ./starter -scenario nested-fail` | 3-level deep child workflow chain, leaf fails with "database connection refused" |
+| Timeout | `go run ./starter -scenario timeout` | Activity times out (5s activity with 2s timeout) |
+| Retry Exhaustion | `go run ./starter -scenario retry-exhaustion` | Activity fails 5 times then exhausts retries |
+| Multi-Child | `go run ./starter -scenario multi-child` | 3 parallel children, only "validation" child fails |
+
+### Results: 2025-12-29
+
+| Test | Tool Used | Root Cause Found | Score | Notes |
+|------|-----------|------------------|-------|-------|
+| Test 1 | `workflow failures` | 6/6 failure types identified | 95/100 | All failures found with clear root causes |
+| Test 2 | `workflow diagnose` | "database connection refused" at depth 3 | 100/100 | Perfect chain traversal |
+| Test 3 | `workflow show --compact` | ValidationWorkflow failed with invalid SKU | 100/100 | Clear child workflow timeline |
+| Test 4 | `workflow diagnose` | "activity StartToClose timeout" | 100/100 | Correctly identified timeout vs app error |
+| Test 5 | `workflow failures --error-contains` | Found 2 timeout-related failures | 100/100 | Filter worked correctly |
+
+**Overall Score:** 99/100
+
+---
+
+## Experiment 2: Multi-Namespace Nexus Traversal
+
+### Environment
+
+| Setting | Value |
+|---------|-------|
+| Temporal Environment | Staging (`us-west-2.aws.api.tmprl-test.cloud:7233`) |
+| Namespaces | `moedash-commerce-ns.temporal-dev`, `moedash-finance-ns.temporal-dev`, `moedash-logistics-ns.temporal-dev` |
+| Example | `examples/ecommerce-nexus/` |
+
+### Scenarios Tested
+
+| Scenario | Chain | Expected Failure |
+|----------|-------|------------------|
+| Nexus Payment Fail | commerce → finance (Nexus) | Fraud detection fails |
+| Child Shipping Fail | commerce → logistics (child workflow) | Shipping carrier error |
+| Deep Chain | commerce → finance → fraud-check | 3-level Nexus + child chain |
+
+### Results: 2025-12-30
+
+| Metric | Target | Result | Status |
+|--------|--------|--------|--------|
+| Time to first failure found | < 30 seconds | 3.1 seconds | ✅ PASS |
+| Root cause accuracy | 100% | 100% (all failures correctly identified) | ✅ PASS |
+| Chain depth accuracy | 100% | 100% (depth 2 for Nexus chains) | ✅ PASS |
+| Cross-NS traversal success | 100% | 100% (commerce-ns → finance-ns) | ✅ PASS |
+| Token efficiency | < 1000 bytes per failure | 685 bytes/failure | ✅ PASS |
+
+### Key Findings
+
+- Cross-namespace Nexus traversal correctly followed fraud workflows from commerce-ns to finance-ns
+- `--compact-errors` effectively stripped verbose wrapper messages
+- `--leaf-only` reduced results by 69%, eliminating duplicate parent/child entries
+- Namespace-specific API keys worked seamlessly via `TEMPORAL_API_KEY_<NAMESPACE>` pattern
+
+---
+
+## Experiment 3: Blind AI Diagnosis (TOCTOU Race Condition)
+
+### Environment
+
+| Setting | Value |
+|---------|-------|
+| Temporal Environment | Local dev server |
+| Namespace | `default` |
+| Example | `examples/debug-loop-fresh/` (hint-free version) |
+| AI Agent | Claude (separate LLM session) |
+
+### The Challenge
+
+The `debug-loop-fresh` example contains a TOCTOU race condition with all hints removed. The LLM was given only:
+
+> "I've created a sample example under `examples/debug-loop-fresh`, and I want you to find and fix its issue with the use of temporal workflow CLI"
+
+### LLM's Diagnosis Process
+
+1. **Ran the scenario** - Started worker and triggered race condition
+2. **Used `temporal workflow describe --trace-root-cause`** - Found `ReserveInventory` failed for KEYBOARD-03
+3. **Used `temporal workflow show --compact`** - Analyzed timestamps of both workflows
+4. **Built a race timeline** - Correlated events across both orders:
+
+| Time | Main Order | Competing Order |
+|------|------------|-----------------|
+| 03:37:04.708 | CheckInventory (all 3) ✓ | |
+| 03:37:04.711 | | CheckInventory ✓ |
+| 03:37:05.723 | | **ReserveInventory ✓** (takes keyboard) |
+| 03:37:05.730 | Reserve KEYBOARD **FAILED** | Completed ✓ |
+
+5. **Proposed the fix** - Atomic `CheckAndReserveInventory` activity
+6. **Verified the fix** - Both orders now behave deterministically
+
+### Results
+
+| Metric | Result |
+|--------|--------|
+| Root cause identified | ✅ TOCTOU race condition |
+| Timeline analysis used | ✅ Cross-workflow timing correlation |
+| Fix proposed | ✅ Atomic check-and-reserve |
+| Fix verified | ✅ Deterministic behavior |
+| Human intervention needed | ❌ None |
+
+**This validates the core thesis:** An LLM can autonomously diagnose complex timing bugs using only `temporal workflow` CLI output.
+
+---
+
+## Features Implemented
+
+Based on experiment findings, the following improvements were made:
+
+### Phase 1: Core Commands
+
+| Feature | Status | Command |
+|---------|--------|---------|
+| Find recent failures | ✅ Done | `temporal workflow list --failed` |
+| Trace workflow chain | ✅ Done | `temporal workflow describe --trace-root-cause` |
+| Workflow timeline | ✅ Done | `temporal workflow show --compact` |
+
+### Phase 2: Filtering & Compaction
+
+| Feature | Status | Flag/Command |
+|---------|--------|--------------|
+| Error message filter | ✅ Done | `--error-contains` |
+| Multiple status values | ✅ Done | `--status Failed,TimedOut` |
+| Leaf-only failures | ✅ Done | `--leaf-only` |
+| Compact error messages | ✅ Done | `--compact-errors` |
+| Follow child workflows | ✅ Done | `--follow-children` |
+
+### Phase 3: State & Aggregation
+
+| Feature | Status | Flag/Command |
+|---------|--------|--------------|
+| Workflow state | ✅ Done | `temporal workflow describe --pending` |
+| Pending activities | ✅ Done | Included in state output |
+| Pending Nexus operations | ✅ Done | Included in state output |
+| Group failures by type | ✅ Done | `--group-by type\|namespace\|status\|error` |
+
+### Phase 4: Cross-Namespace
+
+| Feature | Status | Notes |
+|---------|--------|-------|
+| Nexus chain traversal | ✅ Done | Follows Nexus operations across namespaces |
+| Namespace-specific API keys | ✅ Done | `TEMPORAL_API_KEY_<NAMESPACE>` env vars |
+| Cross-NS documentation | ✅ Done | Added to README and examples |
+
+### Phase 5: AI Tool Specs
+
+| Feature | Status | Format |
+|---------|--------|--------|
+| OpenAI function spec | ✅ Done | `temporal tool-spec --format openai` |
+| LangChain tool spec | ✅ Done | `temporal tool-spec --format langchain` |
+| Claude tool spec | ✅ Done | `temporal tool-spec --format claude` |
+
+### Phase 6: Visualization
+
+| Feature | Status | Flag/Command |
+|---------|--------|--------------|
+| Trace flowchart | ✅ Done | `temporal workflow describe --trace-root-cause --output mermaid` |
+| Timeline sequence diagram | ✅ Done | `temporal workflow show --compact --output mermaid` |
+| State diagram | ✅ Done | `temporal workflow describe --pending --output mermaid` |
+| Failures pie chart | ✅ Done | `temporal workflow list --failed --group-by error --output mermaid` |
+| Failures flowchart | ✅ Done | `temporal workflow list --failed --output mermaid` |
+
+---
+
+## Comparison: Agent Commands vs Log-Based Debugging
+
+| Aspect | Agent Commands | Log-Based (LogQL/grep) |
+|--------|----------------|------------------------|
+| Time to root cause | ~3-5 seconds | 5-30 minutes |
+| Token consumption | ~500 tokens per query | ~5000+ tokens |
+| Accuracy | 100% (structured data) | Variable |
+| Domain knowledge required | Minimal | High |
+| Manual steps | 1 command | 5+ steps |
+| Cross-namespace correlation | Automatic | Manual |
+| Race condition diagnosis | Timeline timestamps | Nearly impossible |
+
+---
+
+## Success Criteria Validation
+
+| Criterion | Status | Evidence |
+|-----------|--------|----------|
+| AI finds failures without LogQL | ✅ | All experiments used `temporal workflow` only |
+| Root cause accuracy | ✅ | 100% in all tests |
+| Low token cost | ✅ | ~10x reduction vs logs |
+| Cross-namespace traversal | ✅ | Nexus chains fully traced |
+| Timing bug diagnosis | ✅ | Race condition identified from timeline |
+| Autonomous fix proposal | ✅ | LLM proposed correct atomic operation fix |
+
+---
+
+## Conclusion
+
+The `temporal workflow` CLI commands successfully achieve the goals:
+
+1. **Agent-native feedback loop**: AI agents effectively debug Temporal workflow failures using structured output
+2. **No logs required**: All debugging done via `temporal workflow` commands
+3. **Automatic chain traversal**: Traces follow child workflows and Nexus operations across namespaces
+4. **Root cause extraction**: Leaf failures clearly identified with `--leaf-only`
+5. **Error compaction**: `--compact-errors` strips wrapper context for cleaner output
+6. **Timing analysis**: Timeline timestamps enable race condition diagnosis
+7. **Low token cost**: Structured JSON is ~10x more efficient than raw logs
+8. **Autonomous debugging**: LLM successfully diagnosed and fixed a TOCTOU bug without hints
+9. **Mermaid visualization**: `--output mermaid` generates visual diagrams for human-in-the-loop debugging
+
+**Temporal's execution history + agent-optimized CLI = effective AI debugging feedback loop.**
diff --git a/examples/agent-demo/README.md b/examples/agent-demo/README.md
@@ -0,0 +1,166 @@
+# Temporal Agent Demo
+
+This demo project demonstrates the `temporal workflow` commands for AI-assisted debugging.
+
+## Overview
+
+The demo includes several workflow scenarios:
+
+1. **SimpleSuccessWorkflow** - A basic successful workflow with one activity
+2. **OrderWorkflow** - An order processing workflow with child workflows (PaymentWorkflow, ShippingWorkflow)
+3. **NestedFailureWorkflow** - A deeply nested workflow chain that fails at the leaf level
+
+## Setup
+
+### Prerequisites
+
+- Go 1.23+
+- **Temporal Go SDK v1.37.0+** (required for API key authentication)
+
+### Environment Variables
+
+**For Temporal Cloud (Production):**
+```bash
+export TEMPORAL_ADDRESS="us-east-1.aws.api.temporal.io:7233"
+export TEMPORAL_NAMESPACE="moedash-prod.a2dd6"
+export TEMPORAL_API_KEY="$(cat ../../prod-temporal-api-key.txt)"
+export TEMPORAL_TASK_QUEUE="agent-demo"
+```
+
+**For Temporal Cloud (Staging):**
+```bash
+export TEMPORAL_ADDRESS="us-west-2.aws.api.tmprl-test.cloud:7233"
+export TEMPORAL_NAMESPACE="moedash.temporal-dev"
+export TEMPORAL_API_KEY="$(cat ../../staging-temporal-api-key.txt)"
+export TEMPORAL_TASK_QUEUE="agent-demo"
+```
+> **Note:** Staging uses a self-signed certificate. The worker/starter auto-detect staging URLs and skip TLS verification. For CLI commands, add `--tls-disable-host-verification`.
+
+**For Local Dev Server:**
+```bash
+export TEMPORAL_ADDRESS="localhost:7233"
+export TEMPORAL_NAMESPACE="default"
+export TEMPORAL_TASK_QUEUE="agent-demo"
+```
+
+### Install Dependencies
+
+```bash
+go mod tidy
+```
+
+### SDK Version Note
+
+This demo requires **Temporal Go SDK v1.37.0+** for proper API key authentication. Earlier SDK versions may fail with "Request unauthorized" errors even with valid credentials. The demo uses `go.temporal.io/sdk/contrib/envconfig` for client configuration, matching the CLI's approach.
+
+## Running the Demo
+
+### 1. Start the Worker
+
+In one terminal:
+
+```bash
+go run ./worker
+```
+
+### 2. Start Workflows
+
+In another terminal:
+
+```bash
+# Run all scenarios
+go run ./starter -scenario all
+
+# Or run individual scenarios:
+go run ./starter -scenario success
+go run ./starter -scenario payment-fail
+go run ./starter -scenario shipping-fail
+go run ./starter -scenario nested-fail
+```
+
+## Using Temporal Workflow Commands
+
+After workflows have run, use the agent commands to analyze them.
+
+> **For staging:** Add `--tls-disable-host-verification` to all commands.
+
+### List Recent Failures
+
+```bash
+temporal workflow list --failed \
+    --address $TEMPORAL_ADDRESS \
+    --namespace $TEMPORAL_NAMESPACE \
+    --api-key $TEMPORAL_API_KEY \
+    --tls \
+    --since 1h \
+    --follow-children \
+    --output json | jq
+```
+
+### Trace a Workflow Chain
+
+```bash
+# Find the deepest failure in an order workflow
+temporal workflow describe --trace-root-cause \
+    --address $TEMPORAL_ADDRESS \
+    --namespace $TEMPORAL_NAMESPACE \
+    --api-key $TEMPORAL_API_KEY \
+    --tls \
+    -w order-payment-fail-XXXXXX \
+    --output json | jq
+
+# Trace the nested failure workflow (3 levels deep)
+temporal workflow describe --trace-root-cause \
+    --address $TEMPORAL_ADDRESS \
+    --namespace $TEMPORAL_NAMESPACE \
+    --api-key $TEMPORAL_API_KEY \
+    --tls \
+    -w nested-failure-XXXXXX \
+    --output json | jq
+```
+
+### Get Workflow Timeline
+
+```bash
+temporal workflow show --compact \
+    --address $TEMPORAL_ADDRESS \
+    --namespace $TEMPORAL_NAMESPACE \
+    --api-key $TEMPORAL_API_KEY \
+    --tls \
+    -w order-success-XXXXXX \
+    --compact \
+    --output json | jq
+```
+
+## Workflow Scenarios
+
+### Payment Failure Chain
+
+```
+OrderWorkflow (ORD-XXX-X)
+  └── PaymentWorkflow (payment-ORD-XXX-X)
+        └── ProcessPaymentActivity → FAILS: "payment gateway connection timeout"
+```
+
+### Shipping Failure Chain
+
+```
+OrderWorkflow (ORD-XXX-Y)
+  └── PaymentWorkflow (payment-ORD-XXX-Y) → SUCCESS
+  └── ShippingWorkflow (shipping-ORD-XXX-Y)
+        └── ShipOrderActivity → FAILS: "warehouse inventory depleted"
+```
+
+### Nested Failure Chain
+
+```
+NestedFailureWorkflow (depth=0)
+  └── NestedFailureWorkflow (depth=1)
+        └── NestedFailureWorkflow (depth=2)
+              └── NestedFailureWorkflow (depth=3)
+                    └── FailingActivity → FAILS: "database connection refused"
+```
+
+The `temporal workflow describe --trace-root-cause` command will automatically traverse this entire chain
+and identify the leaf failure with its root cause.
+