feat: add test-quality-sentinel agentic workflow#25353
Conversation
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Create test-quality-sentinel workflow that analyzes test quality beyond code coverage percentages on every PR. The workflow: - Triggers on pull_request (opened, synchronize, ready_for_review) - Detects implementation-detail tests, happy-path-only tests, test inflation, and duplication via AST + regex analysis - AI agent answers 3 quality questions per test (design invariant, value if deleted, contract vs. implementation) - Posts Test Quality Score (0-100) comment with per-test feedback - Distinguishes design tests vs. implementation tests - Fails check (REQUEST_CHANGES review) if >30% of new tests are low-value - Supports Python (pytest) and Rust (#[test]) as primary languages - Add .gitignore exception since workflow name starts with 'test-' Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Clarify Step 1: mark Python/Rust as 'analyzed', Go/JS as 'detected but not scored' - Remove Go-only test extraction command from Step 2 - Replace Go mock libraries (mockery/testify/mock) with Rust equivalents (mockall) - Clarify inflation penalty formula as binary (deduct all 10 pts if ANY file exceeds 2:1) - Remove Go-specific t.Errorf reference from calibration guidelines - Add explicit sampling instructions (Step 2 + PR comment note) for the 50-test limit - Scope test inflation mapping to Python and Rust only Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot this codebase operates on Go and JavaScript |
The codebase operates on Go and JavaScript, not Python and Rust.
- Replace Python (pytest) and Rust (#[test]) with Go (*_test.go) and
JavaScript/TypeScript (*.test.*, *.spec.*) as primary analysis targets
- Remove python3/rustfmt bash tool allowances; add node
- Rewrite Step 3 with Go awk analysis (testify/gomock signals, t.Run
table-driven test credit) and JavaScript awk analysis (jest/sinon)
- Update Step 2 diff commands to target _test.go and *.test.{js,ts,cjs}
- Update Red Flags section: reference jest.mock/jest.spyOn and gomock
- Update Step 5 inflation mapping: foo_test.go→foo.go, foo.test.ts→foo.ts
- Update PR comment language section: Go/JS counts
- Update Guidelines: language support, calibration (assert.Error, toThrow,
t.Run table-driven test credit), Analysis Scope
- Fix illustrative example rows in comment template to use Go paths
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/180da1af-6c69-40ea-bcf1-4a3424cfeba7
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Updated in commit |
|
@copilot review the coding guidelines and habits of this report to update the agentic workflows |
Reviewed actual test files in pkg/ and actions/setup/js/ to update the workflow with accurate patterns: Go tests: - Both t.Errorf/t.Fatalf (stdlib) and assert./require. (testify) detected - Track t.Run() for table-driven tests (the preferred pattern) - Track forbidden_mocks (gomock/testify/mock/.EXPECT()/.On()) as hard red flag - Add build tag check: every *_test.go must start with //go:build !integration or //go:build integration - No mock libraries used in Go (project guideline: "no mocks or test suites") JavaScript tests: - Test framework is vitest, NOT jest: vi.fn(), vi.spyOn(), vi.mock() - Primary test extension is .test.cjs (245 files), not .spec.* or .test.ts - scripts/ uses .test.js; drop all .spec.* and .test.ts references - Mocking external I/O (fs, process.stderr, global.core) is legitimate; flag only internal business-logic mocking without behavioral assertions New/updated red flags: - Mock libraries in Go = hard coding-guideline violation (always REQUEST_CHANGES) - Missing //go:build tag on new Go test file = hard violation - No assertion messages in Go tests (guideline: always include context) Scoring / Failure Condition: - Fail on: >30% implementation tests OR any coding-guideline violation - Step 8: three distinct review outcomes (pass, quality-fail, guideline-fail) - PR comment table: add 🚨 Coding-guideline violations row Agent-Logs-Url: https://github.com/github/gh-aw/sessions/a5773904-75eb-4e0c-9ebe-a61f6ffd5e14 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Updated in commit Go patterns (from
JavaScript patterns (from
Scoring / review logic:
|
| description: Analyzes test quality beyond code coverage percentages on every PR, detecting implementation-detail tests, happy-path-only tests, test inflation, and duplication | ||
| on: | ||
| pull_request: | ||
| types: [opened, synchronize, ready_for_review] |
There was a problem hiding this comment.
Done in 9901039 — trigger is now ready_for_review only.
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/92c32c4a-f3e8-4b6a-a0ac-67642f8faa8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new agentic workflow (test-quality-sentinel) intended to assess the quality of newly added/modified tests (Go + JavaScript) on ready_for_review PRs, and adjusts ignore rules so the workflow artifacts are committed.
Changes:
- Added
Test Quality Sentinelworkflow definition (.md) and generated lockfile (.lock.yml). - Updated
.gitignoreto stop ignoring the new workflow artifacts. - Updated
architecture-guardian.lock.ymlto use the same Copilot driver invocation pattern.
Show a summary per file
| File | Description |
|---|---|
.gitignore |
Adds negation rules so test-quality-sentinel workflow files aren’t ignored by test-* globs. |
.github/workflows/test-quality-sentinel.md |
New workflow prompt/instructions for analyzing changed test diffs and producing a PR comment + review. |
.github/workflows/test-quality-sentinel.lock.yml |
Compiled workflow output for the new workflow. |
.github/workflows/architecture-guardian.lock.yml |
Updates Copilot CLI execution command to run via copilot_driver.cjs. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
.github/workflows/test-quality-sentinel.md:161
- The vitest awk scanner for JS tests is unlikely to work on real diffs because it (1) requires
it(/test(to be flush-left after+(no indentation), and (2) ends a test block only on a line matching+}), which won’t match the common+ });form (semicolon + indentation). This can cause missing/incorrect per-test counts. Loosen the regexes to allow whitespace and optional;on the closing line.
git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | awk '
/^\+(it|test)\(/ {
if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks
match($0, /(it|test)\(["'"'"']([^"'"'"']+)/, arr); test_name=arr[2]; assertions=0; errors=0; mocks=0
}
test_name && /^\+.*expect\(/ { assertions++ }
test_name && /^\+.*(\.toThrow|\.rejects|\.toThrowError)/ { errors++ }
test_name && /^\+.*(vi\.mock|vi\.spyOn|vi\.fn)/ { mocks++ }
test_name && /^\+\}\)/ { print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks; test_name="" }
END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks }
- Files reviewed: 3/4 changed files
- Comments generated: 2
| git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*_test.go' | grep -E "^\+func Test" | ||
|
|
||
| # For JavaScript (.test.cjs is the primary format; .test.js used in scripts/) | ||
| git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\(" |
There was a problem hiding this comment.
The JS test extraction grep only matches lines that start with +(it|test|describe)( immediately after the +. In typical diffs, test blocks are indented (e.g. + it(...)), so this will miss most JS tests. Consider allowing leading whitespace after + (e.g. ^\+\s*(it|test|describe)\().
This issue also appears on line 152 of the same file.
| git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\(" | |
| git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+\s*(it|test|describe)\(" |
| test_name && /^\+.*(gomock\.|testify\/mock|\.EXPECT\(\)|\.On\(|\.Return\()/ { forbidden_mocks++ } | ||
| test_name && /^\+\}$/ { print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks; test_name="" } | ||
| END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks } | ||
| ' | ||
| ``` | ||
|
|
||
| Key signals for Go tests in this codebase: | ||
| - **Assertions (accepted forms)**: | ||
| - testify: `assert.Equal`, `assert.NoError`, `assert.Error`, `require.Equal`, `require.NoError`, etc. | ||
| - stdlib: `t.Errorf(...)`, `t.Fatalf(...)`, `t.Error(...)` | ||
| - **Error coverage**: `assert.Error` / `require.Error`, `assert.NoError` / `require.NoError`, `t.Fatalf` / `t.Errorf` explicitly checking error returns | ||
| - **Table-driven tests**: `t.Run()` over a `tests []struct{...}` slice — the preferred pattern in this codebase; a single table-driven test covers all its sub-cases, so credit all included error / edge-case rows | ||
| - **Assertion messages**: guidelines require a descriptive message argument on every assertion call — e.g. `assert.Equal(t, expected, actual, "descriptive context")` not bare `assert.Equal(t, expected, actual)` | ||
| - **Forbidden**: any use of `gomock`, `testify/mock`, `.EXPECT()`, `.On()`, `.Return()` in Go tests violates the project's "no mocks" guideline; flag immediately | ||
|
|
There was a problem hiding this comment.
The forbidden Go mock detection treats any .Return( as a hard-violation signal (forbidden_mocks++) and documents .Return() as forbidden, but the PR description’s forbidden list is gomock, testify/mock, .EXPECT(), .On() (no .Return()). As written, .Return( could also match unrelated fluent APIs and create false positives. Recommend aligning the forbidden list with the PR description and/or only counting .Return( when it’s clearly part of a mock chain (e.g., alongside gomock/.EXPECT()), rather than matching any .Return( occurrence.
Add entries to the examples documentation for the five new workflows merged on 2026-04-08: - Test Quality Sentinel (#25353): PR-triggered test quality analysis - Approach Validator (#25354): label-triggered multi-agent approach review - Design Decision Gate (#25323): ADR enforcement before implementation merges - Refactoring Cadence (#25352): daily code health tracking with auto-issue - Architecture Guardian (#25334): daily code structure violation detection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
High test counts create false safety — 500+ tests feel reassuring until vibe-coding reveals the design was wrong. This workflow analyzes what tests actually enforce, not just that they exist.
Workflow:
test-quality-sentinelTriggers on
ready_for_reviewPRs. Analyzes new/modified Go (*_test.go) and JavaScript (*.test.cjs,*.test.js) tests; detects but does not score Python/Rust tests.Analysis pipeline
Step 1–2: Identifies changed test files; checks new Go test files for mandatory
//go:build !integration///go:build integrationbuild tags; extracts up to 50 new/modified test functions (newly added prioritized over modified; sampling noted in PR comment)Step 3: Structural analysis via
awkgrep patterns on the diff:t.Errorf/t.Fatalf) and testify (assert.*/require.*) assertions,t.Run()table-driven tests, and forbidden mock library calls (gomock,testify/mock,.EXPECT(),.On())expect()assertions,.toThrow()/.rejectserror paths, and vitest mocking (vi.mock,vi.spyOn,vi.fn)Step 4: AI reviews each test against 3 questions:
Step 5: Test inflation check — flags files where new test lines > 2× new production lines (
foo_test.go → foo.go,foo.test.cjs → foo.cjs)Step 6: Scores 0–100 across four weighted dimensions:
Step 7: Posts PR comment with score, per-test classification table, flagged tests with improvement suggestions, and a 🚨 Coding-guideline violations row
Step 8: Submits
REQUEST_CHANGESreview if >30% of new tests are implementation tests or if any coding-guideline violation is detected;APPROVEotherwiseRed flags detected
fs,process.stderr,global.core) are exemptgomock,testify/mock,.EXPECT(),.On()— project guideline is "no mocks or test suites; test real component interactions"//go:buildtag on new Go test file (hard violation): every*_test.gomust start with//go:build !integrationor//go:build integration.gitignorechangeAdded negation exception for
test-quality-sentinel.*— the existingtest-*glob pattern was treating this production workflow as a temporary test artifact.