Skip to content

fix(workflows): remove ambiguous with-argument from help task chaining#1739

Closed
alexeyv wants to merge 13 commits intobmad-code-org:mainfrom
alexeyv:exp/quick-flow-redesign
Closed

fix(workflows): remove ambiguous with-argument from help task chaining#1739
alexeyv wants to merge 13 commits intobmad-code-org:mainfrom
alexeyv:exp/quick-flow-redesign

Conversation

@alexeyv
Copy link
Copy Markdown
Collaborator

@alexeyv alexeyv commented Feb 23, 2026

Summary

  • Remove with argument clause from 7 workflow completion steps that chain to help.md
  • The clause caused LLMs to interpret Read fully and follow: help.md with argument X as a skill/function invocation instead of a file-read instruction, triggering "unknown skill" errors
  • help.md already infers the completed workflow from the preceding "[Workflow] complete." text in conversation context — no changes to help.md needed

Also fixes:

Both are moot now that arguments are removed entirely.

Closes #1637

Test plan

  • Grep src/ for with argument — should return zero matches
  • Run any workflow to completion (e.g., create-architecture) and verify help.md loads as a file read, not a skill invocation
  • Verify help.md correctly infers the completed workflow from conversation context

Unified 5-step workflow with full BMM sharded workflow plumbing
and minimal step prompts. Steps: clarify/route, plan, implement,
review, present. 29 lines of instructions, 145 lines of plumbing.
Add QD2 trigger to agent definition and module-help.csv so
the installer picks it up across all supported platforms.
Reference specs from the design sessions: full redesign plan,
session state with all design decisions, and implementation roadmap.
18 tasks: skeleton test/eval, then per-step tighten/test/eval
cycles, plus end-to-end eval. Each task file is a self-contained
intent expression that can be fed to QD2.
The COMPOUND_TRIGGER_PATTERN and description bracket regex only
accepted [A-Z]{1,3}, rejecting shortcuts like QD2 that include
digits. Update both regexes to accept [A-Z][A-Z0-9]{0,2} so
alphanumeric shortcuts work while still requiring a leading letter.

Add test fixture covering the alphanumeric shortcut case.
Second QD2 skeleton test run (add plan-review step to task-01) with
raw JSONL log and analysis. Add CAPTURE-RUN.md as a repeatable prompt
for post-run log capture and summarization.
The "with argument" clause in 7 workflow completion steps caused LLMs
to interpret "Read fully and follow: help.md with argument X" as a
skill/function invocation rather than a file-read instruction. Drop the
clause entirely — help.md already infers the completed workflow from
the preceding "[Workflow] complete." text in conversation context.

Closes bmad-code-org#1637
@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Feb 23, 2026

🤖 Augment PR Summary

Summary: This PR reduces ambiguity in workflow completion guidance by removing the “with argument …” clause when chaining into the shared help.md task.

Changes:

  • Updated 7 workflow “complete” steps to call {project-root}/_bmad/core/tasks/help.md without an argument, preventing LLMs from misreading it as a skill/function invocation.
  • Added a new experimental quick-dev2 unified quick-flow workflow (clarify → plan → implement → review → present) plus its step files and a tech-spec template.
  • Added a new agent menu trigger for QD2 pointing at the experimental workflow.
  • Expanded agent-schema validation to allow compound trigger shortcuts with trailing digits (e.g., QD2) and added a fixture covering the new pattern.
  • Ignored _experiment/ content in markdownlint, prettier, and eslint, and added experimental planning/run artifacts under _experiment/.

Technical Notes: The “help” task already infers the completed workflow from conversation/artifact context, so removing the argument avoids tool-invocation confusion without changing help.md behavior.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.


### CHECKPOINT 1

Present summary. `[A] Approve [E] Edit [F] Full BMM`. HALT.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using [A] for Approve is likely to collide with the established [a] Advanced Elicitation shortcut used elsewhere in BMAD workflows, which could cause the agent to take the wrong branch when the user replies “A”. Consider whether the checkpoint menu letters should avoid reserved shortcuts.

Severity: medium

Other Locations
  • src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-05-present.md:19

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


## INSTRUCTIONS

1. Diff from `{baseline_commit}`. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 4 assumes {baseline_commit} is available, but Step 3 doesn’t explicitly state that it must be stored in that variable (or persisted for resume). If it’s missing on a later re-entry, the review diff could be generated from the wrong baseline.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


Present classified findings. `[A] Approve [E] Edit [R] Reject`. HALT.

- **A**: Commit patches. Print push command. Wait. Create PR.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 4 already says it will “Auto-fix patches. Commit.”, so “Commit patches” again here may produce an extra/no-op commit or create ambiguity about what remains to approve at checkpoint 2.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

Introduces a new unified quick-dev2 workflow with clarify, plan, implement, review, and present steps. Adds extensive experiment planning documentation including redesign specifications and task roadmaps. Updates seven existing workflow completion steps to remove "with argument" clauses from help task chaining. Tightens trigger pattern validation and updates configuration ignores.

Changes

Cohort / File(s) Summary
Configuration & Ignores
.markdownlint-cli2.yaml, .prettierignore
Added _experiment/** ignore patterns to markdown linting and prettier configuration.
Experiment Planning Documentation
_experiment/ideal-spec-grok.md, _experiment/planning/redesign-plan.md, _experiment/planning/session-state.md
Added comprehensive spec-driven engineering blueprint, quick-flow redesign plan with step architecture and review framework, and consolidated session state documenting design decisions and task tracking.
Experiment Task Roadmap
_experiment/planning/roadmap/task-*.md (18 files)
Added 18 sequential task definitions covering testing and evaluation of each quick-dev2 step, from skeleton testing through end-to-end validation with prerequisites, metrics, and expected outputs.
Experiment Run Logs
_experiment/runs/*.md (3 files)
Added run documentation capturing QD2 execution attempts, findings on plan-review integration errors, and guidelines for capturing and documenting experimental runs.
Quick-Dev2 Workflow Implementation
src/bmm/workflows/bmad-quick-flow/quick-dev2/workflow.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/tech-spec-template.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-01-clarify-and-route.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-02-plan.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-03-implement.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-04-review.md, src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-05-present.md
Introduced new unified quick-flow workflow with 5-step architecture (clarify & route, plan, implement, review with spec-loop cap, present). Defines sequential execution rules, checkpoint interactions, context-free review subagents, and tech-spec templating.
Quick-Dev2 Agent Trigger
src/bmm/agents/quick-flow-solo-dev.agent.yaml
Added QD2 workflow trigger menu entry for the new unified quick-flow workflow.
Workflow Completion Step Updates
src/bmm/workflows/1-analysis/create-product-brief/steps/step-06-complete.md, src/bmm/workflows/2-plan-workflows/create-prd/steps-c/step-12-complete.md, src/bmm/workflows/2-plan-workflows/create-prd/steps-v/step-v-13-report-complete.md, src/bmm/workflows/2-plan-workflows/create-ux-design/steps/step-14-complete.md, src/bmm/workflows/3-solutioning/create-architecture/steps/step-08-complete.md, src/bmm/workflows/3-solutioning/create-epics-and-stories/steps/step-04-final-validation.md, src/bmm/workflows/3-solutioning/check-implementation-readiness/steps/step-06-final-assessment.md
Removed with argument [X] clauses from help task chaining instructions, simplifying the file-read directive and preventing LLM misinterpretation as function calls (addresses #1637).
Schema Validation
tools/schema/agent.js
Tightened compound trigger regex to enforce 1–3 character shortcuts beginning with uppercase letter, enforcing consistency between trigger shortcut and description bracket format.

Sequence Diagram

sequenceDiagram
    participant User
    participant Step01 as Step 1:<br/>Clarify & Route
    participant Step02 as Step 2:<br/>Plan
    participant Step03 as Step 3:<br/>Implement
    participant Step04 as Step 4:<br/>Review
    participant Step05 as Step 5:<br/>Present

    User->>Step01: Provide task intent
    Step01->>Step01: Clarify intent,<br/>backfill conventions,<br/>determine route
    alt One-shot
        Step01-->>Step03: execution_mode=one-shot
        Step03->>Step03: Execute & implement
        Step03->>Step04: Auto-advance
    else Plan-code-review
        Step01-->>Step02: Default routing
        Step02->>Step02: Investigate, generate<br/>tech-spec, self-review
        Step02-->>User: [Checkpoint 1]<br/>Approve/Edit/Full-BMM
        User-->>Step03: Continue
        Step03->>Step03: Baseline, shard,<br/>execute tasks seq.
        Step03-->>User: Halt if needed
    end
    Step03->>Step04: Auto-advance
    Step04->>Step04: Context-free<br/>adversarial review
    Step04->>Step04: Classify findings,<br/>amend spec (≤5 loops)
    Step04->>Step04: Auto-fix patches,<br/>commit
    Step04->>Step05: Auto-advance
    Step05->>Step05: Prioritize findings,<br/>generate PR
    Step05-->>User: [Checkpoint 2]<br/>Approve/Edit/Reject
    User-->>Step05: Confirm
    Step05-->>User: Present results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested reviewers

  • bmadcode
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes extensive experimental documentation and workflow files (quick-dev2 redesign, planning docs, run logs) beyond the scope of issue #1637. Additionally, changes to tools/schema/agent.js appear unrelated to the help task chaining fix. These are out-of-scope for the stated objective. Remove experimental files (_experiment/*, quick-dev2 workflow files, CAPTURE-RUN.md) and tools/schema/agent.js changes that are unrelated to fixing issue #1637. Keep only the 7 targeted workflow completion step files with the 'with argument' removal.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the primary change: removing the ambiguous 'with-argument' clause from help task chaining in workflow completion steps.
Description check ✅ Passed The description clearly explains the problem (LLM interpretation error), the solution (remove 'with argument' clause), and documents two additional bugs fixed. It includes a test plan.
Linked Issues check ✅ Passed The PR successfully addresses all 7 files specified in issue #1637 by removing the 'with argument' clause, fixing both identified bugs (wrong argument in file #1 and casing mismatch in file #7), and maintains context for help.md inference. The change fully resolves the stated objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/bmm/workflows/3-solutioning/check-implementation-readiness/steps/step-06-final-assessment.md (1)

39-43: ⚠️ Potential issue | 🟡 Minor

Pre-existing semantic error: 🚫 emoji misused for a required action.

The EXECUTION PROTOCOLS block at line 43 reads:

🚫 Complete and present final report

Throughout these workflow files, 🚫 signals a forbidden action (e.g., 🚫 FORBIDDEN to approve incomplete coverage). Using it on a step that must be executed is a contradiction that could cause an LLM to skip this action entirely. Since the PR is already touching this file, this should be corrected now.

🔧 Proposed fix
-🚫 Complete and present final report
+🎯 Complete and present final report
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/bmm/workflows/3-solutioning/check-implementation-readiness/steps/step-06-final-assessment.md`
around lines 39 - 43, Summary: The EXECUTION PROTOCOLS line currently uses the
forbidden emoji "🚫" for a required step ("🚫 Complete and present final
report"), causing semantic conflict; replace the emoji with a positive/required
marker. Fix: edit the EXECUTION PROTOCOLS entry that reads "🚫 Complete and
present final report" and change the emoji to a required-action symbol used
elsewhere (e.g., "✅" or "🎯") or remove the emoji so the line reads "Complete
and present final report"; ensure the text remains identical otherwise so
downstream LLMs see it as a required step.
src/bmm/workflows/2-plan-workflows/create-ux-design/steps/step-14-complete.md (1)

83-89: ⚠️ Potential issue | 🟡 Minor

Section 5 (Final Completion Confirmation) is unreachable — the help.md redirect in Section 3 transfers control before Section 5 executes.

The WORKFLOW COMPLETION SEQUENCE instructs the agent to read the complete file first, then execute sections in order. When it reaches Section 3 (line 83–85) and follows help.md, it exits this step's execution context. Section 5 "Final Completion Confirmation" (line 87–89) — the congratulations message — is never delivered to the user.

Every other file in this PR places the help.md redirect as the very last instruction in the final section. In this file alone, it sits in an intermediate section with content following it.

Additionally:

  • Section 4 is absent from the WORKFLOW COMPLETION SEQUENCE (the numbering jumps 3 → 5). This was presumably Section 4 at some point and its removal/renaming was not cleaned up.
  • Line 89 contains garbled wording: "Congratulate the user on the completion you both completed together of the UX." — even if this section were reachable, the phrasing should be corrected.
🔧 Suggested fix

Move the congratulations into Section 3 (before the redirect) and renumber:

 ### 3. Suggest Next Steps
 
+Congratulate the user on completing the UX Design.
+
 UX Design complete. Read fully and follow: `{project-root}/_bmad/core/tasks/help.md`
 
-### 5. Final Completion Confirmation
-
-Congratulate the user on the completion you both completed together of the UX.
+### 4. Final Completion Confirmation
+
+(Handled by help.md guidance above.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/bmm/workflows/2-plan-workflows/create-ux-design/steps/step-14-complete.md`
around lines 83 - 89, Section 5 ("Final Completion Confirmation") is unreachable
because the help.md redirect in Section 3 transfers control early and the
numbering skips Section 4; move the congratulations text so it appears before
the `{project-root}/_bmad/core/tasks/help.md` redirect (i.e., place the final
confirmation inside Section 3 or renumber sections so the redirect is last), fix
the garbled sentence to something like "Congratulate the user on completion of
the UX design you completed together," and renumber headings sequentially to
eliminate the missing Section 4 and ensure the help.md redirect is always the
final instruction in the step.
src/bmm/workflows/2-plan-workflows/create-prd/steps-v/step-v-13-report-complete.md (1)

196-202: ⚠️ Potential issue | 🟡 Minor

Users selecting non-Exit paths (R, E, F) bypass help.md guidance, and the Edit workflow's completion option is missing it entirely.

The help.md reference at line 200 is nested inside the IF X (Exit) conditional block, which means users who select R (Review), E (Edit Workflow), or F (Fix Simpler Items) will never receive the next-step guidance from help.md. More critically, the Edit workflow completion file (step-e-04-complete.md) offers users the ability to run full validation but contains zero reference to help.md — creating a gap where Edit → Validation completions lack guidance continuity.

Separately, "PRD Validation complete." already follows the correct naming pattern (artifact-type + validation status), consistent with other workflow completions like "Architecture complete." or "UX Design complete."

To fix: Move the help.md line to a top-level step that all menu branches (R, E, F, X) converge on, or add explicit help.md routing to each branch endpoint. Additionally, audit step-e-04-complete.md to ensure it includes help.md guidance when the Edit workflow concludes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/bmm/workflows/2-plan-workflows/create-prd/steps-v/step-v-13-report-complete.md`
around lines 196 - 202, Move the help.md reference out of the IF X (Exit) block
in step-v-13-report-complete.md so it is shown for all menu outcomes (R, E, F,
X) by placing the line "PRD Validation complete. Read fully and follow:
`{project-root}/_bmad/core/tasks/help.md`" at a top-level/common completion
section that all branches converge on, or alternatively add the same help.md
line explicitly under each branch (R, E, F, X); then open step-e-04-complete.md
and add the same help.md guidance to its completion text so the Edit workflow
also directs users to help.md. Ensure the wording preserves the existing "PRD
Validation complete." naming pattern and that the inserted help.md path matches
`{project-root}/_bmad/core/tasks/help.md`.
tools/schema/agent.js (1)

93-100: ⚠️ Potential issue | 🟠 Major

Early return in superRefine exits the entire callback — subsequent menu items are never validated

When parseCompoundTrigger returns invalid on menu item N, the return statement exits the whole superRefine function, not just the loop iteration. Menu items N+1, N+2, … are silently skipped. The same problem applies to the description-shortcut mismatch check (lines 104–111 and 115–121). Callers get at most one error per schema validation run, forcing tedious fix-and-re-validate cycles.

Replace return with continue-equivalent logic (collect errors without short-circuiting the loop).

🐛 Proposed fix: accumulate errors without bailing out
 if (!result.valid) {
   ctx.addIssue({
     code: 'custom',
     path: ['agent', 'menu', index, 'trigger'],
     message: `agent.menu[].trigger compound format error: ${result.error}`,
   });
-  return;
+  index += 1;
+  continue;
 }

 const descriptionMatch = item.description?.match(/^\[([A-Z][A-Z0-9]{0,2})\]/);
 if (!descriptionMatch) {
   ctx.addIssue({
     code: 'custom',
     path: ['agent', 'menu', index, 'description'],
     message: `agent.menu[].description must start with [SHORTCUT] where SHORTCUT matches the trigger shortcut "${result.shortcut}"`,
   });
-  return;
+  index += 1;
+  continue;
 }

Note: The loop uses a manual index variable rather than entries(). A cleaner refactor would switch to for (const [index, item] of value.agent.menu.entries()) so continue works directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/schema/agent.js` around lines 93 - 100, In superRefine for the agent
schema, the current use of return inside the loop (after parseCompoundTrigger
invalid results and in the description-shortcut mismatch checks) aborts the
entire superRefine callback so subsequent agent.menu items are never validated;
change the logic to collect issues and continue iterating instead of returning —
either convert the loop to for (const [index, item] of
value.agent.menu.entries()) and use continue after calling ctx.addIssue (for the
parseCompoundTrigger failure and the description/shortcut mismatch checks), or
remove the early return and let the loop proceed while still calling
ctx.addIssue so all menu entries are validated (references: superRefine,
parseCompoundTrigger, ctx.addIssue, agent.menu).
_experiment/planning/roadmap/task-03-tighten-step-01-routing.md (1)

1-21: ⚠️ Potential issue | 🟡 Minor

Missing Output section — deliverable artifact undefined.

Every peer task in this roadmap (task-04, task-05, task-13, task-14, task-15, task-17, task-18) has an ## Output section naming the concrete deliverable. This "tighten" task has none. Readers cannot tell what artifact constitutes completion — presumably the updated step-01-clarify-and-route.md, but that is never stated. Without an explicit output, there is no way to verify the task is done.

📋 Suggested addition
 ## Constraint

 Keep prompts thin. Add the minimum words needed to close the gap. If the LLM was already doing it right from training, don't add instructions for it.
+
+## Output
+
+Updated `src/bmm/workflows/bmad-quick-flow/quick-dev2/step-01-clarify-and-route.md`. Change noted in session log.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@_experiment/planning/roadmap/task-03-tighten-step-01-routing.md` around lines
1 - 21, The roadmap task file is missing an explicit "## Output" section, so add
a brief Output block to
_experiment/planning/roadmap/task-03-tighten-step-01-routing.md that names the
concrete deliverable (e.g., "Updated step-01-clarify-and-route.md with tightened
routing criteria and exit criteria enforcement") and any acceptance criteria
needed to verify completion (e.g., specific checks for routing precision,
WIP/artifact detection, and project context backfill). Reference the target
artifact by filename step-01-clarify-and-route.md and include 1–2 bullet
acceptance checks to make completion verifiable.

Comment on lines +11 to +17
```bash
# Find the most recent conversation log
ls -lt ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign/*.jsonl | head -1

# Copy it with the naming convention: YYYY-MM-DD-<short-slug>.jsonl
cp <path-from-above> _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Hardcoded personal machine path blocks all other users.

Line 13 hard-codes /Users/alex/src/bmad-quick-flow-redesign/ as the Claude project directory. This is not portable — it will silently produce no output on any machine other than the author's. A guide in a shared repository must not contain a personal filesystem path.

🛠 Proposed fix — use `fd` or `find` to locate the most recent JSONL
-# Find the most recent conversation log
-ls -lt ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign/*.jsonl | head -1
-
-# Copy it with the naming convention: YYYY-MM-DD-<short-slug>.jsonl
-cp <path-from-above> _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl
+# Find the most recent conversation log across all Claude projects
+LATEST=$(find ~/.claude/projects -name '*.jsonl' -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2-)
+echo "Latest log: $LATEST"
+
+# Copy it with the naming convention: YYYY-MM-DD-<short-slug>.jsonl
+cp "$LATEST" _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@_experiment/runs/CAPTURE-RUN.md` around lines 11 - 17, The script in
_experiment/runs/CAPTURE-RUN.md uses a hardcoded personal path
(/Users/alex/src/bmad-quick-flow-redesign/) which prevents others from finding
recent Claude project JSONL files; update the instructions to use a portable
discovery command (e.g., search under the shared projects directory instead of a
user-specific path) — replace the hardcoded path in the ls invocation with a
glob or a find/fd-based search that targets ~/.claude/projects/*.jsonl (or an
equivalent command that locates the newest .jsonl file) and instruct the user to
copy that discovered path to _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl so
the steps work across machines.

Strip task sharding, rigid approval menu, and VC backfill. Add
intent_gap-first classification cascade to review. Use slug-based
spec naming in plan step.
Cross-references 3 QD2 test runs against the redesign plan.
Classifies 24 observations as Works/Gap/Execution Gap/Plan Gap/Plumbing.
Key findings: one-shot path bypasses step architecture, terse step files
cause execution gaps, plan-code-review path works end-to-end.
Metrics from JSONL log mining: 56% fewer human turns (18→8),
32% fewer API turns (72→49), token cost at parity (1.14x).
One-shot path identified as broken (P0 fix priority).
@alexeyv alexeyv marked this pull request as draft February 23, 2026 04:40
Captures lessons from the skeleton eval: workflow boundary scoping,
human turn filtering, token deduplication, context normalization,
and five common pitfalls to avoid.
@alexeyv
Copy link
Copy Markdown
Collaborator Author

alexeyv commented Feb 23, 2026

Closing: this PR accidentally included 12 unrelated commits from the experiment branch. Reopening as a clean single-commit PR from fix/remove-with-argument.

@alexeyv alexeyv closed this Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: 'with argument' in help task chaining causes unknown skill errors

1 participant