-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
feat(skills): rewrite code-review with sharded step-file architecture #2007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5de8a78
17cd19f
f12e38e
6a91eb6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| --- | ||
| name: Gather Context | ||
| description: 'Determine what to review, construct the diff, and load any spec/context documents.' | ||
| diff_output: '' # set at runtime | ||
| spec_file: '' # set at runtime (path or empty) | ||
| review_mode: '' # set at runtime: "full" or "no-spec" | ||
| --- | ||
|
|
||
| # Step 1: Gather Context | ||
|
|
||
| ## RULES | ||
|
|
||
| - YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}` | ||
| - The prompt that triggered this workflow IS the intent — not a hint. | ||
| - Do not modify any files. This step is read-only. | ||
|
|
||
| ## INSTRUCTIONS | ||
|
|
||
| 1. **Detect review intent from invocation text.** Check the triggering prompt for phrases that map to a review mode: | ||
| - "staged" / "staged changes" → Staged changes only | ||
| - "uncommitted" / "working tree" / "all changes" → Uncommitted changes (staged + unstaged) | ||
| - "branch diff" / "vs main" / "against main" / "compared to {branch}" → Branch diff (extract base branch if mentioned) | ||
| - "commit range" / "last N commits" / "{sha}..{sha}" → Specific commit range | ||
| - "this diff" / "provided diff" / "paste" → User-provided diff (do not match bare "diff" — it appears in other modes) | ||
| - When multiple phrases match, prefer the most specific match (e.g., "branch diff" over bare "diff"). | ||
| - **If a clear match is found:** Announce the detected mode (e.g., "Detected intent: review staged changes only") and proceed directly to constructing `{diff_output}` using the corresponding sub-case from instruction 3. Skip to instruction 4 (spec question). | ||
| - **If no match from invocation text, check sprint tracking.** Look for a sprint status file (`*sprint-status*`) in `{implementation_artifacts}` or `{planning_artifacts}`. If found, scan for any story with status `review`. Handle as follows: | ||
| - **Exactly one `review` story:** Suggest it: "I found story {story-id} in `review` status. Would you like to review its changes? [Y] Yes / [N] No, let me choose". If confirmed, use the story context to determine the diff source (branch name derived from story slug, or uncommitted changes). If declined, fall through to instruction 2. | ||
| - **Multiple `review` stories:** Present them as numbered options alongside a manual choice option. Wait for user selection. Then use the selected story's context to determine the diff source as in the single-story case above, and proceed to instruction 3. | ||
| - **If no match and no sprint tracking:** Fall through to instruction 2. | ||
|
|
||
| 2. Ask the user: **What do you want to review?** Present these options: | ||
| - **Uncommitted changes** (staged + unstaged) | ||
| - **Staged changes only** | ||
| - **Branch diff** vs a base branch (ask which base branch) | ||
| - **Specific commit range** (ask for the range) | ||
| - **Provided diff or file list** (user pastes or provides a path) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. “Provided diff or file list (user pastes or provides a path)” is a bit ambiguous: if the user provides a path to a diff file, step 2 only describes validating pasted diff content. Consider clarifying whether a path here means “read that file and treat its contents as the unified diff”, vs only supporting pasted diffs and separate path-based file lists. Severity: medium 🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage. |
||
|
|
||
| 3. Construct `{diff_output}` from the chosen source. | ||
| - For **branch diff**: verify the base branch exists before running `git diff`. If it does not exist, HALT and ask the user for a valid branch. | ||
| - For **commit range**: verify the range resolves. If it does not, HALT and ask the user for a valid range. | ||
| - For **provided diff**: validate the content is non-empty and parseable as a unified diff. If it is not parseable, HALT and ask the user to provide a valid diff. | ||
| - For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`. If the diff is empty (files have no uncommitted changes), ask the user whether to review the full file contents or to specify a different baseline. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. File-list fallback is underspecified and cannot be executed deterministically. Line 29 says to “review the full file contents” when the diff is empty, but does not define how to construct Proposed fix- - For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`. If the diff is empty (files have no uncommitted changes), ask the user whether to review the full file contents or to specify a different baseline.
+ - For **file list**: validate each path exists in the working tree. Construct `{diff_output}` by running `git diff HEAD -- <path1> <path2> ...`.
+ - If that diff is empty, ask user to choose one fallback:
+ 1) **Full file review mode**: construct synthetic review input containing file headers + current file contents for each path.
+ 2) **Alternate baseline**: ask for `<baseline_ref>` and run `git diff <baseline_ref> -- <path1> <path2> ...`.
+ - HALT if neither fallback yields non-empty review input.🤖 Prompt for AI Agents |
||
| - After constructing `{diff_output}`, verify it is non-empty regardless of source type. If empty, HALT and tell the user there is nothing to review. | ||
|
Comment on lines
+29
to
+30
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. File-list diff baseline is hardcoded to Using only 🤖 Prompt for AI Agents |
||
|
|
||
| 4. Ask the user: **Is there a spec or story file that provides context for these changes?** | ||
| - If yes: set `{spec_file}` to the path provided, verify the file exists and is readable, then set `{review_mode}` = `"full"`. | ||
| - If no: set `{review_mode}` = `"no-spec"`. | ||
|
|
||
| 5. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document. Warn the user about any docs that cannot be found. | ||
|
|
||
|
Comment on lines
+33
to
+37
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Path intake lacks scope guard and can exfiltrate unrelated local files. Line 33 and Line 36 accept arbitrary paths for Proposed fix- - If yes: set `{spec_file}` to the path provided, verify the file exists and is readable, then set `{review_mode}` = `"full"`.
+ - If yes: set `{spec_file}` to the path provided, verify the file exists, is readable, and resolves inside the current workspace/repository root; otherwise HALT and request a valid in-repo path. Then set `{review_mode}` = `"full"`.
-4. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document. Warn the user about any docs that cannot be found.
+4. If `{review_mode}` = `"full"` and the file at `{spec_file}` has a `context` field in its frontmatter listing additional docs, load each referenced document only if it resolves inside the workspace/repository root. Warn on missing or out-of-scope paths.🧰 Tools🪛 LanguageTool[style] ~36-~36: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE) 🤖 Prompt for AI Agents |
||
| 6. Sanity check: if `{diff_output}` exceeds approximately 3000 lines, warn the user and offer to chunk the review by file group. | ||
| - If the user opts to chunk: agree on the first group, narrow `{diff_output}` accordingly, and list the remaining groups for the user to note for follow-up runs. | ||
| - If the user declines: proceed as-is with the full diff. | ||
|
|
||
| ### CHECKPOINT | ||
|
|
||
| Present a summary before proceeding: diff stats (files changed, lines added/removed), `{review_mode}`, and loaded spec/context docs (if any). HALT and wait for user confirmation to proceed. | ||
|
|
||
|
|
||
| ## NEXT | ||
|
|
||
| Read fully and follow `./step-02-review.md` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| --- | ||
| name: Review | ||
| description: 'Launch parallel adversarial review layers and collect findings.' | ||
| --- | ||
|
|
||
| # Step 2: Review | ||
|
|
||
| ## RULES | ||
|
|
||
| - YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}` | ||
| - The Blind Hunter subagent receives NO project context — diff only. | ||
| - The Edge Case Hunter subagent receives diff and project read access. | ||
| - The Acceptance Auditor subagent receives diff, spec, and context docs. | ||
|
|
||
| ## INSTRUCTIONS | ||
|
|
||
| 1. Launch parallel subagents. Each subagent gets NO conversation history from this session: | ||
|
|
||
| - **Blind Hunter** -- Invoke the `bmad-review-adversarial-general` skill in a subagent. Pass `content` = `{diff_output}` only. No spec, no project access. | ||
|
|
||
| - **Edge Case Hunter** -- Invoke the `bmad-review-edge-case-hunter` skill in a subagent. Pass `content` = `{diff_output}`. This subagent has read access to the project. | ||
|
|
||
| - **Acceptance Auditor** (only if `{review_mode}` = `"full"`) -- A subagent that receives `{diff_output}`, the content of the file at `{spec_file}`, and any loaded context docs. Its prompt: | ||
| > You are an Acceptance Auditor. Review this diff against the spec and context docs. Check for: violations of acceptance criteria, deviations from spec intent, missing implementation of specified behavior, contradictions between spec constraints and actual code. Output findings as a markdown list. Each finding: one-line title, which AC/constraint it violates, and evidence from the diff. | ||
|
Comment on lines
+17
to
+24
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Parallel execution has no timeout budget or cancellation rule. This step can hang indefinitely waiting for a slow subagent, blocking the pipeline without a defined recovery path. 🧰 Tools🪛 LanguageTool[uncategorized] ~24-~24: Did you mean the formatting language “Markdown” (= proper noun)? (MARKDOWN_NNP) 🤖 Prompt for AI Agents |
||
|
|
||
| 2. **Subagent failure handling**: If any subagent fails, times out, or returns empty results, note the failed layer and proceed with findings from the remaining layers. Report the failure to the user in the next step. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Failure handling incorrectly treats empty findings as an error. A subagent can legitimately return zero findings. Mark only timeout/error/unparseable output as failure. 🤖 Prompt for AI Agents |
||
|
|
||
| 3. **Fallback** (if subagents are not available): Generate prompt files in `{implementation_artifacts}` -- one per active reviewer: | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should these just be 3 separate skills if this is the fallback? what i would suggest when we later rework the skill is have arguments and paths internally to run the different review types - or we can create a bunch of different review skills. can design this better later - for now though, hmm not sure if this is good UX vs just accepting that its a crap tool and try to just run them 1 at a time... |
||
| - `review-blind-hunter.md` (always) | ||
| - `review-edge-case-hunter.md` (always) | ||
| - `review-acceptance-auditor.md` (only if `{review_mode}` = `"full"`) | ||
|
|
||
| HALT. Tell the user to run each prompt in a separate session and paste back findings. When findings are pasted, resume from this point and proceed to step 3. | ||
|
|
||
| 4. Collect all findings from the completed layers. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Raw layer outputs are not explicitly preserved before aggregation. Line 35 jumps to “collect all findings” but does not require preserving raw per-layer payloads. That removes forensic traceability when triage reclassifies or drops findings later. 🤖 Prompt for AI Agents |
||
|
|
||
|
|
||
| ## NEXT | ||
|
|
||
| Read fully and follow `./step-03-triage.md` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| --- | ||
| name: Triage | ||
| description: 'Normalize, deduplicate, and classify all review findings into actionable categories.' | ||
| --- | ||
|
|
||
| # Step 3: Triage | ||
|
|
||
| ## RULES | ||
|
|
||
| - YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}` | ||
| - Be precise. When uncertain between categories, prefer the more conservative classification. | ||
|
|
||
| ## INSTRUCTIONS | ||
|
|
||
| 1. **Normalize** findings into a common format. Expected input formats: | ||
| - Adversarial (Blind Hunter): markdown list of descriptions | ||
| - Edge Case Hunter: JSON array with `location`, `trigger_condition`, `guard_snippet`, `potential_consequence` fields | ||
| - Acceptance Auditor: markdown list with title, AC/constraint reference, and evidence | ||
|
|
||
| If a layer's output does not match its expected format, attempt best-effort parsing. Note any parsing issues for the user. | ||
|
|
||
| Convert all to a unified list where each finding has: | ||
| - `id` -- sequential integer | ||
| - `source` -- `blind`, `edge`, `auditor`, or merged sources (e.g., `blind+edge`) | ||
| - `title` -- one-line summary | ||
| - `detail` -- full description | ||
| - `location` -- file and line reference (if available) | ||
|
|
||
| 2. **Deduplicate.** If two or more findings describe the same issue, merge them into one: | ||
| - Use the most specific finding as the base (prefer edge-case JSON with location over adversarial prose). | ||
| - Append any unique detail, reasoning, or location references from the other finding(s) into the surviving `detail` field. | ||
| - Set `source` to the merged sources (e.g., `blind+edge`). | ||
|
|
||
| 3. **Classify** each finding into exactly one bucket: | ||
| - **intent_gap** -- The spec/intent is incomplete; cannot resolve from existing information. Only possible if `{review_mode}` = `"full"`. | ||
| - **bad_spec** -- The spec should have prevented this; spec is wrong or ambiguous. Only possible if `{review_mode}` = `"full"`. | ||
| - **patch** -- Code issue that is trivially fixable without human input. Just needs a code change. | ||
| - **defer** -- Pre-existing issue not caused by the current change. Real but not actionable now. | ||
| - **reject** -- Noise, false positive, or handled elsewhere. | ||
|
|
||
| If `{review_mode}` = `"no-spec"` and a finding would otherwise be `intent_gap` or `bad_spec`, reclassify it as `patch` (if code-fixable) or `defer` (if not). | ||
|
|
||
| 4. **Drop** all `reject` findings. Record the reject count for the summary. | ||
|
|
||
| 5. If zero findings remain after dropping rejects, note clean review. | ||
|
|
||
| 6. If any review layer failed or returned empty (noted in step 2), report this to the user now. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Step reference is incorrect and will mislead execution. Line 44 says failures were “noted in step 2”, but failure/format tracking is introduced in normalization flow (step 1). This creates operator confusion during handoff. 🤖 Prompt for AI Agents
Comment on lines
+45
to
+47
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. “Clean review” can be reported even when coverage was partial. Line 45 allows a clean result after rejects are dropped, but Line 47 may also report failed/empty layers. That combination is contradictory and can be misread as a full-pass review. 🤖 Prompt for AI Agents |
||
|
|
||
|
|
||
| ## NEXT | ||
|
|
||
| Read fully and follow `./step-04-present.md` | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,40 @@ | ||||||||||||||
| --- | ||||||||||||||
| name: Present | ||||||||||||||
| description: 'Present triaged findings grouped by category with actionable recommendations.' | ||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| # Step 4: Present | ||||||||||||||
|
|
||||||||||||||
| ## RULES | ||||||||||||||
|
|
||||||||||||||
| - YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}` | ||||||||||||||
| - Do NOT auto-fix anything. Present findings and let the user decide next steps. | ||||||||||||||
|
|
||||||||||||||
| ## INSTRUCTIONS | ||||||||||||||
|
|
||||||||||||||
| 1. Group remaining findings by category. | ||||||||||||||
|
|
||||||||||||||
| 2. Present to the user in this order (include a section only if findings exist in that category): | ||||||||||||||
|
|
||||||||||||||
| - **Intent Gaps**: "These findings suggest the captured intent is incomplete. Consider clarifying intent before proceeding." | ||||||||||||||
| - List each with title + detail. | ||||||||||||||
|
|
||||||||||||||
| - **Bad Spec**: "These findings suggest the spec should be amended. Consider regenerating or amending the spec with this context:" | ||||||||||||||
| - List each with title + detail + suggested spec amendment. | ||||||||||||||
|
|
||||||||||||||
| - **Patch**: "These are fixable code issues:" | ||||||||||||||
| - List each with title + detail + location (if available). | ||||||||||||||
|
|
||||||||||||||
|
Comment on lines
+25
to
+27
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Patch findings are allowed without a location, which makes them non-actionable. For Proposed tightening- - **Patch**: "These are fixable code issues:"
- - List each with title + detail + location (if available).
+ - **Patch**: "These are fixable code issues:"
+ - List each with title + detail + location.
+ - If exact line is unavailable, require at least file + symbol/function/block reference.📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||
| - **Defer**: "Pre-existing issues surfaced by this review (not caused by current changes):" | ||||||||||||||
| - List each with title + detail. | ||||||||||||||
|
|
||||||||||||||
| 3. Summary line: **X** intent_gap, **Y** bad_spec, **Z** patch, **W** defer findings. **R** findings rejected as noise. | ||||||||||||||
|
|
||||||||||||||
| 4. If clean review (zero findings across all layers after triage): state that N findings were raised but all were classified as noise, or that no findings were raised at all (as applicable). | ||||||||||||||
|
|
||||||||||||||
| 5. Offer the user next steps (recommendations, not automated actions): | ||||||||||||||
| - If `patch` findings exist: "You can ask me to apply these patches, or address them manually." | ||||||||||||||
| - If `intent_gap` or `bad_spec` findings exist: "Consider running the planning workflow to clarify intent or amend the spec before continuing." | ||||||||||||||
| - If only `defer` findings remain: "No action needed for this change. Deferred items are noted for future attention." | ||||||||||||||
|
|
||||||||||||||
| Workflow complete. | ||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec_fileis described as a “path or empty”, but later instructions treat{spec_file}as loaded content (parse its frontmattercontext, and pass “{spec_file}content” to the Acceptance Auditor). Consider clarifying whether this variable is meant to store the path or the file contents to avoid the auditor/context-loading steps running with the wrong input.Severity: medium
Other Locations
src/bmm/workflows/4-implementation/bmad-code-review/steps/step-02-review.md:23🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.