Merged
23 changes: 0 additions & 23 deletions src/bmm/workflows/4-implementation/bmad-code-review/checklist.md

This file was deleted.

src/bmm/workflows/4-implementation/bmad-code-review/steps/step-01-gather-context.md
@@ -0,0 +1,63 @@
---
name: Gather Context
description: 'Determine what to review, construct the diff, and load any spec/context documents.'
diff_output: '' # set at runtime
spec_file: '' # set at runtime (path or empty)

spec_file is described as a “path or empty”, but later instructions treat {spec_file} as loaded content (parse its frontmatter context, and pass “{spec_file} content” to the Acceptance Auditor). Consider clarifying whether this variable is meant to store the path or the file contents to avoid the auditor/context-loading steps running with the wrong input.

Severity: medium

Other Locations
  • src/bmm/workflows/4-implementation/bmad-code-review/steps/step-02-review.md:23


review_mode: '' # set at runtime: "full" or "no-spec"
---

# Step 1: Gather Context

## RULES

- YOU MUST ALWAYS deliver output in your Agent communication style, using the configured `{communication_language}`
- The prompt that triggered this workflow IS the intent — not a hint.
- Do not modify any files. This step is read-only.

## INSTRUCTIONS

1. **Detect review intent from invocation text.** Check the triggering prompt for phrases that map to a review mode:
- "staged" / "staged changes" → Staged changes only
- "uncommitted" / "working tree" / "all changes" → Uncommitted changes (staged + unstaged)
- "branch diff" / "vs main" / "against main" / "compared to {branch}" → Branch diff (extract base branch if mentioned)
- "commit range" / "last N commits" / "{sha}..{sha}" → Specific commit range
- "this diff" / "provided diff" / "paste" → User-provided diff (do not match bare "diff" — it appears in other modes)
- When multiple phrases match, prefer the most specific match (e.g., "branch diff" over bare "diff").
- **If a clear match is found:** Announce the detected mode (e.g., "Detected intent: review staged changes only") and proceed directly to constructing `{diff_output}` using the corresponding sub-case from instruction 3. Skip to instruction 4 (spec question).
- **If no match from invocation text, check sprint tracking.** Look for a sprint status file (`*sprint-status*`) in `{implementation_artifacts}` or `{planning_artifacts}`. If found, scan for any story with status `review`. Handle as follows:
- **Exactly one `review` story:** Suggest it: "I found story {story-id} in `review` status. Would you like to review its changes? [Y] Yes / [N] No, let me choose". If confirmed, use the story context to determine the diff source (branch name derived from story slug, or uncommitted changes) and proceed to instruction 3. If declined, fall through to instruction 2.
- **Multiple `review` stories:** Present them as numbered options alongside a manual choice option. Wait for user selection. Then use the selected story's context to determine the diff source as in the single-story case above, and proceed to instruction 3.
- **If no match and no sprint tracking:** Fall through to instruction 2.
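The phrase-to-mode mapping in instruction 1 can be sketched as an ordered pattern table, where most-specific-first ordering implements the "prefer the most specific match" rule. This is an illustrative sketch: the pattern strings and mode names are assumptions, not part of the workflow spec.

```python
import re

# Ordered most-specific first, so "branch diff" wins over a bare "diff".
# Patterns and mode names are illustrative assumptions, not spec.
INTENT_PATTERNS = [
    (r"branch diff|vs main|against main|compared to \S+", "branch-diff"),
    (r"commit range|last \d+ commits|[0-9a-f]{7,40}\.\.[0-9a-f]{7,40}", "commit-range"),
    (r"this diff|provided diff|paste", "provided-diff"),
    (r"staged", "staged"),
    (r"uncommitted|working tree|all changes", "uncommitted"),
]

def detect_intent(prompt: str):
    """Return the first (most specific) matching review mode, or None
    to fall through to the explicit question in instruction 2."""
    text = prompt.lower()
    for pattern, mode in INTENT_PATTERNS:
        if re.search(pattern, text):
            return mode
    return None
```

Returning `None` rather than guessing keeps the fall-through to the explicit question deterministic.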

2. Ask the user: **What do you want to review?** Present these options:
- **Uncommitted changes** (staged + unstaged)
- **Staged changes only**
- **Branch diff** vs a base branch (ask which base branch)
- **Specific commit range** (ask for the range)
- **Provided diff or file list** (user pastes or provides a path)

“Provided diff or file list (user pastes or provides a path)” is a bit ambiguous: if the user provides a path to a diff file, step 2 only describes validating pasted diff content. Consider clarifying whether a path here means “read that file and treat its contents as the unified diff”, vs only supporting pasted diffs and separate path-based file lists.

Severity: medium



3. Construct `{diff_output}` from the chosen source.
- For **branch diff**: verify the base branch exists before running `git diff`. If it does not exist, HALT and ask the user for a valid branch.
- For **commit range**: verify the range resolves. If it does not, HALT and ask the user for a valid range.
- For **provided diff**: validate the content is non-empty and parseable as a unified diff. If it is not parseable, HALT and ask the user to provide a valid diff.
- For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`. If the diff is empty (files have no uncommitted changes), ask the user whether to review the full file contents or to specify a different baseline.

⚠️ Potential issue | 🟠 Major

File-list fallback is underspecified and cannot be executed deterministically.

Line 29 says to “review the full file contents” when the diff is empty, but does not define how to construct {diff_output} in that branch. This creates a dead-end path in Step 1.

Proposed fix

```diff
-  - For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`. If the diff is empty (files have no uncommitted changes), ask the user whether to review the full file contents or to specify a different baseline.
+  - For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`.
+  - If that diff is empty, ask user to choose one fallback:
+    1) **Full file review mode**: construct synthetic review input containing file headers + current file contents for each path.
+    2) **Alternate baseline**: ask for `<baseline_ref>` and run `git diff <baseline_ref> -- <path1> <path2> ...`.
+  - HALT if neither fallback yields non-empty review input.
```

- After constructing `{diff_output}`, verify it is non-empty regardless of source type. If empty, HALT and tell the user there is nothing to review.
Comment on lines +29 to +30

⚠️ Potential issue | 🟠 Major

File-list diff baseline is hardcoded to HEAD, which can silently miss intended review scope.

Using only git diff HEAD -- <paths> ignores other valid review baselines (e.g., target branch, explicit commit range), so the selected source can produce misleadingly empty output.

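Instruction 3's validate-then-construct flow can be sketched in Python, assuming the workflow runs from the repository root. The function name and error messages are illustrative; the HALT cases surface as exceptions.

```python
import subprocess

def git(*args: str) -> subprocess.CompletedProcess:
    """Run a git command, capturing output instead of raising."""
    return subprocess.run(["git", *args], capture_output=True, text=True)

def construct_diff(mode: str, arg: str = "") -> str:
    """Build {diff_output} per instruction 3; raise ValueError on HALT cases."""
    if mode == "branch":
        # HALT if the base branch does not exist.
        if git("rev-parse", "--verify", "--quiet", f"refs/heads/{arg}").returncode:
            raise ValueError(f"base branch {arg!r} not found; ask for a valid branch")
        diff = git("diff", f"{arg}...HEAD").stdout
    elif mode == "range":
        # HALT if the commit range does not resolve.
        if git("rev-list", arg).returncode:
            raise ValueError(f"range {arg!r} does not resolve; ask for a valid range")
        diff = git("diff", arg).stdout
    elif mode == "staged":
        diff = git("diff", "--cached").stdout
    elif mode == "uncommitted":
        diff = git("diff", "HEAD").stdout
    else:
        raise ValueError(f"unsupported mode {mode!r}")
    # Regardless of source, an empty diff means there is nothing to review.
    if not diff.strip():
        raise ValueError("nothing to review: constructed diff is empty")
    return diff
```

The three-dot `{arg}...HEAD` form diffs from the merge base, which is the usual intent of a branch review.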


4. Ask the user: **Is there a spec or story file that provides context for these changes?**
- If yes: set `{spec_file}` to the path provided, verify the file exists and is readable, then set `{review_mode}` = `"full"`.
- If no: set `{review_mode}` = `"no-spec"`.

5. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document. Warn the user about any docs that cannot be found.

Comment on lines +33 to +37

⚠️ Potential issue | 🟠 Major

Path intake lacks scope guard and can exfiltrate unrelated local files.

Line 33 and Line 36 accept arbitrary paths for {spec_file} and context docs without constraining them to repository/workspace boundaries. That opens accidental sensitive-file inclusion.

Proposed fix

```diff
-  - If yes: set `{spec_file}` to the path provided, verify the file exists and is readable, then set `{review_mode}` = `"full"`.
+  - If yes: set `{spec_file}` to the path provided, verify the file exists, is readable, and resolves inside the current workspace/repository root; otherwise HALT and request a valid in-repo path. Then set `{review_mode}` = `"full"`.

-4. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document. Warn the user about any docs that cannot be found.
+4. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document only if it resolves inside the workspace/repository root. Warn on missing or out-of-scope paths.
```

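The scope guard this finding calls for reduces to a single path-resolution check. A hedged sketch, assuming a POSIX filesystem: `resolve_in_workspace` is a hypothetical helper name, and `Path.resolve()` handles both `..` segments and symlinks.

```python
from pathlib import Path

def resolve_in_workspace(candidate: str, workspace_root: str) -> Path:
    """Resolve a user-supplied path and reject anything that escapes the
    workspace root, including absolute paths, ".." traversal, and symlinks.
    (Hypothetical helper illustrating the proposed guard.)"""
    root = Path(workspace_root).resolve()
    # An absolute candidate replaces `root /` entirely, so it is still checked.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"{candidate!r} resolves outside the workspace; "
                         "request a valid in-repo path")
    return resolved
```

`Path.is_relative_to` requires Python 3.9+; on older interpreters, comparing against `resolved.parents` achieves the same check.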

6. Sanity check: if `{diff_output}` exceeds approximately 3000 lines, warn the user and offer to chunk the review by file group.
- If the user opts to chunk: agree on the first group, narrow `{diff_output}` accordingly, and list the remaining groups for the user to note for follow-up runs.
- If the user declines: proceed as-is with the full diff.
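For the chunked-review option, a unified diff can be split into per-file groups on `diff --git` headers. This is a sketch: real-world diffs with renames or quoted paths need more careful header parsing.

```python
def split_diff_by_file(diff_output: str) -> dict:
    """Group a unified diff into per-file chunks keyed by the b/ path."""
    chunks, current = {}, None
    for line in diff_output.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # "diff --git a/<path> b/<path>" -- take the b/ side, minus "b/".
            current = line.split()[-1][2:]
            chunks[current] = ""
        if current is not None:
            chunks[current] += line
    return chunks
```

The resulting keys give the user the file-group list to note for follow-up runs.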

### CHECKPOINT

Present a summary before proceeding: diff stats (files changed, lines added/removed), `{review_mode}`, and loaded spec/context docs (if any). HALT and wait for user confirmation to proceed.


## NEXT

Read fully and follow `./step-02-review.md`
src/bmm/workflows/4-implementation/bmad-code-review/steps/step-02-review.md
@@ -0,0 +1,40 @@
---
name: Review
description: 'Launch parallel adversarial review layers and collect findings.'
---

# Step 2: Review

## RULES

- YOU MUST ALWAYS deliver output in your Agent communication style, using the configured `{communication_language}`
- The Blind Hunter subagent receives NO project context — diff only.
- The Edge Case Hunter subagent receives diff and project read access.
- The Acceptance Auditor subagent receives diff, spec, and context docs.

## INSTRUCTIONS

1. Launch parallel subagents. Each subagent gets NO conversation history from this session:

- **Blind Hunter** -- Invoke the `bmad-review-adversarial-general` skill in a subagent. Pass `content` = `{diff_output}` only. No spec, no project access.

- **Edge Case Hunter** -- Invoke the `bmad-review-edge-case-hunter` skill in a subagent. Pass `content` = `{diff_output}`. This subagent has read access to the project.

- **Acceptance Auditor** (only if `{review_mode}` = `"full"`) -- A subagent that receives `{diff_output}`, the content of the file at `{spec_file}`, and any loaded context docs. Its prompt:
> You are an Acceptance Auditor. Review this diff against the spec and context docs. Check for: violations of acceptance criteria, deviations from spec intent, missing implementation of specified behavior, contradictions between spec constraints and actual code. Output findings as a markdown list. Each finding: one-line title, which AC/constraint it violates, and evidence from the diff.
Comment on lines +17 to +24

⚠️ Potential issue | 🟠 Major

Parallel execution has no timeout budget or cancellation rule.

This step can hang indefinitely waiting for a slow subagent, blocking the pipeline without a defined recovery path.




2. **Subagent failure handling**: If any subagent fails, times out, or returns empty results, note the failed layer and proceed with findings from the remaining layers. Report the failure to the user in the next step.

⚠️ Potential issue | 🟠 Major

Failure handling incorrectly treats empty findings as an error.

A subagent can legitimately return zero findings. Mark only timeout/error/unparseable output as failure.

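The parallel launch and failure handling can be sketched with a per-layer timeout, treating an empty findings list as success and only a timeout or raised error as failure. A sketch under assumptions: `per_subagent_timeout` is an illustrative parameter, and each layer is modeled as a zero-arg callable.

```python
from concurrent.futures import ThreadPoolExecutor

def run_layers(layers, per_subagent_timeout=120.0):
    """Run review layers in parallel.

    `layers` maps layer name -> zero-arg callable returning a findings list.
    Returns (results, failed_layers). An empty findings list is a legitimate
    success; only a timeout or raised error marks the layer as failed.
    """
    results, failed = {}, []
    pool = ThreadPoolExecutor(max_workers=max(1, len(layers)))
    futures = {name: pool.submit(fn) for name, fn in layers.items()}
    for name, fut in futures.items():
        try:
            results[name] = fut.result(timeout=per_subagent_timeout)
        except Exception:  # timed out, or the subagent raised
            failed.append(name)
    pool.shutdown(wait=False)  # do not block on a hung layer
    return results, failed
```

The caller then proceeds with `results` and reports `failed` to the user in the next step.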


3. **Fallback** (if subagents are not available): Generate prompt files in `{implementation_artifacts}` -- one per active reviewer:
Collaborator

should these just be 3 separate skills if this is the fallback?

what i would suggest when we later rework the skill is to have arguments and paths internally to run the different review types - or we can create a bunch of different review skills. can design this better later - for now though, hmm not sure if this is good UX vs just accepting that it's a crap tool and try to just run them 1 at a time...

- `review-blind-hunter.md` (always)
- `review-edge-case-hunter.md` (always)
- `review-acceptance-auditor.md` (only if `{review_mode}` = `"full"`)

HALT. Tell the user to run each prompt in a separate session and paste back findings. When findings are pasted, resume from this point and proceed to step 3.

4. Collect all findings from the completed layers.

⚠️ Potential issue | 🟠 Major

Raw layer outputs are not explicitly preserved before aggregation.

Line 35 jumps to “collect all findings” but does not require preserving raw per-layer payloads. That removes forensic traceability when triage reclassifies or drops findings later.




## NEXT

Read fully and follow `./step-03-triage.md`
src/bmm/workflows/4-implementation/bmad-code-review/steps/step-03-triage.md
@@ -0,0 +1,52 @@
---
name: Triage
description: 'Normalize, deduplicate, and classify all review findings into actionable categories.'
---

# Step 3: Triage

## RULES

- YOU MUST ALWAYS deliver output in your Agent communication style, using the configured `{communication_language}`
- Be precise. When uncertain between categories, prefer the more conservative classification.

## INSTRUCTIONS

1. **Normalize** findings into a common format. Expected input formats:
- Adversarial (Blind Hunter): markdown list of descriptions
- Edge Case Hunter: JSON array with `location`, `trigger_condition`, `guard_snippet`, `potential_consequence` fields
- Acceptance Auditor: markdown list with title, AC/constraint reference, and evidence

If a layer's output does not match its expected format, attempt best-effort parsing. Note any parsing issues for the user.

Convert all to a unified list where each finding has:
- `id` -- sequential integer
- `source` -- `blind`, `edge`, `auditor`, or merged sources (e.g., `blind+edge`)
- `title` -- one-line summary
- `detail` -- full description
- `location` -- file and line reference (if available)
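The unified shape, and the mapping of Edge Case Hunter JSON onto it, can be sketched as follows. Field names follow the expected formats listed above; the `Finding` class and `normalize_edge` helper names are illustrative assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    id: int
    source: str                     # "blind", "edge", "auditor", or "blind+edge"
    title: str
    detail: str
    location: Optional[str] = None  # file/line reference when available

def normalize_edge(raw: str, next_id: int) -> list:
    """Map Edge Case Hunter JSON entries onto the unified finding shape."""
    findings = []
    for offset, item in enumerate(json.loads(raw)):
        findings.append(Finding(
            id=next_id + offset,
            source="edge",
            title=item["trigger_condition"],
            detail=(f'{item["trigger_condition"]} -> '
                    f'{item["potential_consequence"]} '
                    f'(suggested guard: {item["guard_snippet"]})'),
            location=item.get("location"),
        ))
    return findings
```

The other two layers emit markdown lists, so their normalizers would be prose parsers rather than JSON mappers.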

2. **Deduplicate.** If two or more findings describe the same issue, merge them into one:
- Use the most specific finding as the base (prefer edge-case JSON with location over adversarial prose).
- Append any unique detail, reasoning, or location references from the other finding(s) into the surviving `detail` field.
- Set `source` to the merged sources (e.g., `blind+edge`).
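The merge rule above can be sketched over plain dict findings, with location-presence standing in for "most specific" (a sketch; real triage may weigh other signals):

```python
def merge_findings(a: dict, b: dict) -> dict:
    """Merge two findings that describe the same issue, per instruction 2."""
    # Prefer the finding with a location as the base (more specific).
    base, other = (a, b) if a.get("location") else (b, a)
    merged = dict(base)
    # Append unique detail from the other finding.
    if other["detail"] not in merged["detail"]:
        merged["detail"] += "\n" + other["detail"]
    # Merged, deduplicated source tag, e.g. "blind+edge".
    sources = set(a["source"].split("+")) | set(b["source"].split("+"))
    merged["source"] = "+".join(sorted(sources))
    return merged
```

Splitting on `+` before merging keeps the tag stable when an already-merged finding is merged again.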

3. **Classify** each finding into exactly one bucket:
- **intent_gap** -- The spec/intent is incomplete; cannot resolve from existing information. Only possible if `{review_mode}` = `"full"`.
- **bad_spec** -- The spec should have prevented this; spec is wrong or ambiguous. Only possible if `{review_mode}` = `"full"`.
- **patch** -- Code issue that is trivially fixable without human input. Just needs a code change.
- **defer** -- Pre-existing issue not caused by the current change. Real but not actionable now.
- **reject** -- Noise, false positive, or handled elsewhere.

If `{review_mode}` = `"no-spec"` and a finding would otherwise be `intent_gap` or `bad_spec`, reclassify it as `patch` (if code-fixable) or `defer` (if not).
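The no-spec reclassification rule reduces to a small pure function (a sketch; bucket names match the list above, and `code_fixable` is an assumed input from the triage judgment):

```python
SPEC_BUCKETS = {"intent_gap", "bad_spec"}  # only meaningful in "full" mode

def final_bucket(bucket: str, review_mode: str, code_fixable: bool) -> str:
    """Apply instruction 3's reclassification for no-spec reviews."""
    if review_mode == "no-spec" and bucket in SPEC_BUCKETS:
        return "patch" if code_fixable else "defer"
    return bucket
```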

4. **Drop** all `reject` findings. Record the reject count for the summary.

5. If zero findings remain after dropping rejects, note clean review.

6. If any review layer failed or returned empty (noted in step 2), report this to the user now.

⚠️ Potential issue | 🟡 Minor

Step reference is incorrect and will mislead execution.

Line 44 says failures were “noted in step 2”, but failure/format tracking is introduced in normalization flow (step 1). This creates operator confusion during handoff.


Comment on lines +45 to +47

⚠️ Potential issue | 🟠 Major

“Clean review” can be reported even when coverage was partial.

Line 45 allows a clean result after rejects are dropped, but Line 47 may also report failed/empty layers. That combination is contradictory and can be misread as a full-pass review.




## NEXT

Read fully and follow `./step-04-present.md`
src/bmm/workflows/4-implementation/bmad-code-review/steps/step-04-present.md
@@ -0,0 +1,40 @@
---
name: Present
description: 'Present triaged findings grouped by category with actionable recommendations.'
---

# Step 4: Present

## RULES

- YOU MUST ALWAYS deliver output in your Agent communication style, using the configured `{communication_language}`
- Do NOT auto-fix anything. Present findings and let the user decide next steps.

## INSTRUCTIONS

1. Group remaining findings by category.

2. Present to the user in this order (include a section only if findings exist in that category):

- **Intent Gaps**: "These findings suggest the captured intent is incomplete. Consider clarifying intent before proceeding."
- List each with title + detail.

- **Bad Spec**: "These findings suggest the spec should be amended. Consider regenerating or amending the spec with this context:"
- List each with title + detail + suggested spec amendment.

- **Patch**: "These are fixable code issues:"
- List each with title + detail + location (if available).

Comment on lines +25 to +27

⚠️ Potential issue | 🟠 Major

Patch findings are allowed without a location, which makes them non-actionable.

For patch, location should be mandatory (or explicitly require a concrete reproduction pointer when line-level location is unavailable). Optional location will produce “fix me somewhere” output.

Proposed tightening

```diff
-   - **Patch**: "These are fixable code issues:"
-     - List each with title + detail + location (if available).
+   - **Patch**: "These are fixable code issues:"
+     - List each with title + detail + location.
+     - If exact line is unavailable, require at least file + symbol/function/block reference.
```

- **Defer**: "Pre-existing issues surfaced by this review (not caused by current changes):"
- List each with title + detail.

3. Summary line: **X** intent_gap, **Y** bad_spec, **Z** patch, **W** defer findings. **R** findings rejected as noise.
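The summary counts follow mechanically from the classified findings (a sketch; the dict-based finding shape with a `category` key is an assumption):

```python
from collections import Counter

def summary_line(findings, rejected_count: int) -> str:
    """Render the step-3 counts in the order used above."""
    counts = Counter(f["category"] for f in findings)
    return (f"{counts['intent_gap']} intent_gap, {counts['bad_spec']} bad_spec, "
            f"{counts['patch']} patch, {counts['defer']} defer findings. "
            f"{rejected_count} findings rejected as noise.")
```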

4. If clean review (zero findings across all layers after triage): state that N findings were raised but all were classified as noise, or that no findings were raised at all (as applicable).

5. Offer the user next steps (recommendations, not automated actions):
- If `patch` findings exist: "You can ask me to apply these patches, or address them manually."
- If `intent_gap` or `bad_spec` findings exist: "Consider running the planning workflow to clarify intent or amend the spec before continuing."
- If only `defer` findings remain: "No action needed for this change. Deferred items are noted for future attention."

Workflow complete.