feat: add optional also_consider input to adversarial review task #1371
Merged: bmadcode merged 2 commits into bmad-code-org:main from alexeyv:feat/adversarial-review-also-consider on Jan 23, 2026
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| # Adversarial Review Test Suite | ||
| | ||
| Tests for the `also_consider` optional input in `review-adversarial-general.xml`. | ||
| | ||
| ## Purpose | ||
| | ||
| Evaluate whether the `also_consider` input gently nudges the reviewer toward specific areas without overriding normal adversarial analysis. | ||
| | ||
| ## Test Content | ||
| | ||
| All tests use `sample-content.md` - a deliberately imperfect User Authentication API doc with: | ||
| | ||
| - Vague error handling section | ||
| - Missing rate limit details | ||
| - No token expiration info | ||
| - Password in plain text example | ||
| - Missing authentication headers | ||
| - No error response examples | ||
| | ||
| ## Running Tests | ||
| | ||
| For each test case in `test-cases.yaml`, invoke the adversarial review task. | ||
| | ||
| ### Manual Test Invocation | ||
| | ||
| ``` | ||
| Review this content using the adversarial review task: | ||
| | ||
| <content> | ||
| [paste sample-content.md] | ||
| </content> | ||
| | ||
| <also_consider> | ||
| [paste items from test case, or omit for TC01] | ||
| </also_consider> | ||
| ``` | ||
| | ||
| ## Evaluation Criteria | ||
| | ||
| For each test, note: | ||
| | ||
| 1. **Total findings** - Still hitting ~10 issues? | ||
| 2. **Distribution** - Are findings spread across concerns or clustered? | ||
| 3. **Relevance** - Do findings relate to `also_consider` items when provided? | ||
| 4. **Balance** - Are `also_consider` findings elevated over others, or naturally mixed? | ||
| 5. **Quality** - Are findings actionable regardless of source? | ||
| | ||
| ## Expected Outcomes | ||
| | ||
| - **TC01 (baseline)**: Generic spread of findings | ||
| - **TC02-TC05 (domain-focused)**: Some findings align with domain, others still organic | ||
| - **TC06 (single item)**: Light influence, not dominant | ||
| - **TC07 (vague items)**: Minimal change from baseline | ||
| - **TC08 (specific items)**: Direct answers if gaps exist | ||
| - **TC09 (mixed)**: Balanced across domains | ||
| - **TC10 (contradictory)**: Graceful handling | ||
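The manual invocation described in the file above could be assembled programmatically. The following is a minimal sketch, not part of this PR: `build_review_prompt` is a hypothetical helper, and rendering each `also_consider` item as a `- ` bullet is an assumption, since the README only says to paste the items. The block is omitted entirely when no items are given, which corresponds to the TC01 baseline.

```python
from typing import Optional, Sequence


def build_review_prompt(content: str, also_consider: Optional[Sequence[str]] = None) -> str:
    """Assemble the manual invocation text (hypothetical helper, not from the PR).

    The <also_consider> block is left out when no items are supplied (TC01).
    """
    parts = [
        "Review this content using the adversarial review task:",
        "",
        "<content>",
        content.strip(),
        "</content>",
    ]
    if also_consider:
        parts += ["", "<also_consider>"]
        # Assumption: one bullet per item; the README does not fix a format.
        parts += [f"- {item}" for item in also_consider]
        parts.append("</also_consider>")
    return "\n".join(parts)
```

Calling it with `also_consider=None` gives the baseline prompt; passing the list from any other test case appends the optional block after the content.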
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # User Authentication API | ||
| | ||
| ## Overview | ||
| | ||
| This API provides endpoints for user authentication and session management. | ||
| | ||
| ## Endpoints | ||
| | ||
| ### POST /api/auth/login | ||
| | ||
| Authenticates a user and returns a token. | ||
| | ||
| **Request Body:** | ||
| ```json | ||
| { | ||
| "email": "user@example.com", | ||
| "password": "password123" | ||
| } | ||
| ``` | ||
| | ||
| **Response:** | ||
| ```json | ||
| { | ||
| "token": "eyJhbGciOiJIUzI1NiIs...", | ||
| "user": { | ||
| "id": 1, | ||
| "email": "user@example.com" | ||
| } | ||
| } | ||
| ``` | ||
| | ||
| ### POST /api/auth/logout | ||
| | ||
| Logs out the current user. | ||
| | ||
| ### GET /api/auth/me | ||
| | ||
| Returns the current user's profile. | ||
| | ||
| ## Error Handling | ||
| | ||
| Errors return appropriate HTTP status codes. | ||
| | ||
| ## Rate Limiting | ||
| | ||
| Rate limiting is applied to prevent abuse. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| # Test Cases for review-adversarial-general.xml with also_consider input | ||
| # | ||
| # Purpose: Evaluate how the optional also_consider input influences review findings | ||
| # Content: All tests use sample-content.md (User Authentication API docs) | ||
| # | ||
| # To run: Manually invoke the task with each configuration and compare outputs | ||
| | ||
| test_cases: | ||
| # BASELINE - No also_consider | ||
| - id: TC01 | ||
| name: "Baseline - no also_consider" | ||
| description: "Control test with no also_consider input" | ||
| also_consider: null | ||
| expected_behavior: "Generic adversarial findings across all aspects" | ||
| | ||
| # DOCUMENTATION-FOCUSED | ||
| - id: TC02 | ||
| name: "Documentation - reader confusion" | ||
| description: "Nudge toward documentation UX issues" | ||
| also_consider: | ||
| - What would confuse a first-time reader? | ||
| - What questions are left unanswered? | ||
| - What could be interpreted multiple ways? | ||
| - What jargon is unexplained? | ||
| expected_behavior: "More findings about clarity, completeness, reader experience" | ||
| | ||
| - id: TC03 | ||
| name: "Documentation - examples and usage" | ||
| description: "Nudge toward practical usage gaps" | ||
| also_consider: | ||
| - Missing code examples | ||
| - Unclear usage patterns | ||
| - Edge cases not documented | ||
| expected_behavior: "More findings about practical application gaps" | ||
| | ||
| # SECURITY-FOCUSED | ||
| - id: TC04 | ||
| name: "Security review" | ||
| description: "Nudge toward security concerns" | ||
| also_consider: | ||
| - Authentication vulnerabilities | ||
| - Token handling issues | ||
| - Input validation gaps | ||
| - Information disclosure risks | ||
| expected_behavior: "More security-related findings" | ||
| | ||
| # API DESIGN-FOCUSED | ||
| - id: TC05 | ||
| name: "API design" | ||
| description: "Nudge toward API design best practices" | ||
| also_consider: | ||
| - REST conventions not followed | ||
| - Inconsistent response formats | ||
| - Missing pagination or filtering | ||
| - Versioning concerns | ||
| expected_behavior: "More API design pattern findings" | ||
| | ||
| # SINGLE ITEM | ||
| - id: TC06 | ||
| name: "Single item - error handling" | ||
| description: "Test with just one also_consider item" | ||
| also_consider: | ||
| - Error handling completeness | ||
| expected_behavior: "Some emphasis on error handling while still covering other areas" | ||
| | ||
| # BROAD/VAGUE | ||
| - id: TC07 | ||
| name: "Broad items" | ||
| description: "Test with vague also_consider items" | ||
| also_consider: | ||
| - Quality issues | ||
| - Things that seem off | ||
| expected_behavior: "Minimal change from baseline - items too vague to steer" | ||
| | ||
| # VERY SPECIFIC | ||
| - id: TC08 | ||
| name: "Very specific items" | ||
| description: "Test with highly specific also_consider items" | ||
| also_consider: | ||
| - Is the JWT token expiration documented? | ||
| - Are refresh token mechanics explained? | ||
| - What happens on concurrent sessions? | ||
| expected_behavior: "Specific findings addressing these exact questions if gaps exist" | ||
| | ||
| # MIXED DOMAINS | ||
| - id: TC09 | ||
| name: "Mixed domain concerns" | ||
| description: "Test with items from different domains" | ||
| also_consider: | ||
| - Security vulnerabilities | ||
| - Reader confusion points | ||
| - API design inconsistencies | ||
| - Performance implications | ||
| expected_behavior: "Balanced findings across multiple domains" | ||
| | ||
| # CONTRADICTORY/UNUSUAL | ||
| - id: TC10 | ||
| name: "Contradictory items" | ||
| description: "Test resilience with odd inputs" | ||
| also_consider: | ||
| - Things that are too detailed | ||
| - Things that are not detailed enough | ||
| expected_behavior: "Reviewer handles gracefully, finds issues in both directions" | ||
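The test cases above are compared by hand, so a small tally can make the "Total findings", "Relevance", and "Balance" checks easier to record. This is a rough sketch under stated assumptions: each finding is hand-labelled as relating to an `also_consider` item or not, `summarize` is a hypothetical helper, and "front-loaded" (every nudged finding listed before every other finding) is just one possible reading of "elevated over others".

```python
from typing import Dict, List, Tuple

# (finding text, nudged) where `nudged` is a manual label: does the finding
# relate to one of the also_consider items? Both fields are assumptions of this sketch.
Finding = Tuple[str, bool]


def summarize(findings: List[Finding]) -> Dict[str, object]:
    """Tally one review run so it can be compared against the TC01 baseline."""
    total = len(findings)
    nudged_positions = [i for i, (_, nudged) in enumerate(findings) if nudged]
    nudged = len(nudged_positions)
    # Front-loaded: all nudged findings come before all organic ones.
    front_loaded = 0 < nudged < total and nudged_positions[-1] == nudged - 1
    return {
        "total": total,
        "nudged": nudged,
        "nudged_share": nudged / total if total else 0.0,
        "front_loaded": front_loaded,
    }
```

A run where `front_loaded` is true and `nudged_share` is high would suggest the input is dominating rather than nudging; the thresholds for that judgment are left to the reviewer.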
Replace the JWT-like token to avoid secret-scanner hits.
Gitleaks flagged the token string as a generic API key. Even as sample content, this can fail CI or encourage unsafe copy‑paste. Replace it with an obvious placeholder or redacted pattern.
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 24-24: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
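One way to apply the suggestion is to swap any JWT-like string for an obvious placeholder before committing sample docs. The sketch below is illustrative only: the `JWT_LIKE` pattern and the `<JWT_TOKEN_PLACEHOLDER>` text are assumptions, and it does not check whether a particular scanner rule accepts the result. It relies on the fact that a base64url-encoded JSON header beginning with `{"` starts with `eyJ`.

```python
import re

# Matches strings that start like a base64url-encoded JWT header ("eyJ"),
# with optional dot-separated segments and an optional trailing "..." as in
# truncated samples. The pattern is an assumption of this sketch.
JWT_LIKE = re.compile(r"eyJ[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*(?:\.\.\.)?")

PLACEHOLDER = "<JWT_TOKEN_PLACEHOLDER>"  # hypothetical placeholder text


def redact_tokens(text: str) -> str:
    """Replace every JWT-like substring with the placeholder."""
    return JWT_LIKE.sub(PLACEHOLDER, text)
```

Running `redact_tokens` over the sample response would leave the JSON structure intact while removing the string that tripped the scanner.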