feat: add test-quality-sentinel agentic workflow by Copilot · Pull Request #25353 · github/gh-aw

Copilot · 2026-04-08T17:36:34Z

High test counts create false safety — 500+ tests feel reassuring until vibe-coding reveals the design was wrong. This workflow analyzes what tests actually enforce, not just that they exist.

Workflow: `test-quality-sentinel`

Triggers on ready_for_review PRs. Analyzes new/modified Go (*_test.go) and JavaScript (*.test.cjs, *.test.js) tests; detects but does not score Python/Rust tests.

Analysis pipeline

Step 1–2: Identifies changed test files; checks new Go test files for mandatory //go:build !integration / //go:build integration build tags; extracts up to 50 new/modified test functions (newly added prioritized over modified; sampling noted in PR comment)
Step 3: Structural analysis via awk grep patterns on the diff:
- Go: detects both stdlib (t.Errorf/t.Fatalf) and testify (assert.*/require.*) assertions, t.Run() table-driven tests, and forbidden mock library calls (gomock, testify/mock, .EXPECT(), .On())
- JavaScript: detects expect() assertions, .toThrow()/.rejects error paths, and vitest mocking (vi.mock, vi.spyOn, vi.fn)
Step 4: AI reviews each test against 3 questions:
- What design invariant does this test enforce?
- What would break if this test were deleted?
- Behavioral contract or implementation detail?
Step 5: Test inflation check — flags files where new test lines > 2× new production lines (foo_test.go → foo.go, foo.test.cjs → foo.cjs)
Step 6: Scores 0–100 across four weighted dimensions:

Dimension Weight

Behavioral coverage (% design tests) 40 pts

Error/edge case coverage 30 pts

Low duplication 20 pts

Proportional growth (no inflation) 10 pts
Step 7: Posts PR comment with score, per-test classification table, flagged tests with improvement suggestions, and a 🚨 Coding-guideline violations row
Step 8: Submits REQUEST_CHANGES review if >30% of new tests are implementation tests or if any coding-guideline violation is detected; APPROVE otherwise

Red flags detected

Mock-heavy (vitest) with no observable-behavior assertion — external I/O mocks (fs, process.stderr, global.core) are exempt
Go mock libraries used (hard violation): gomock, testify/mock, .EXPECT(), .On() — project guideline is "no mocks or test suites; test real component interactions"
Missing //go:build tag on new Go test file (hard violation): every *_test.go must start with //go:build !integration or //go:build integration
Happy-path only (no error/edge cases)
Test inflation ratio >2:1 (test lines vs. production lines added)
Duplicated assertion patterns across 3+ test functions
Zero assertions in test body
Go assertions missing descriptive message arguments (guideline requires them)

`.gitignore` change

Added negation exception for test-quality-sentinel.* — the existing test-* glob pattern was treating this production workflow as a temporary test artifact.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Create test-quality-sentinel workflow that analyzes test quality beyond code coverage percentages on every PR. The workflow: - Triggers on pull_request (opened, synchronize, ready_for_review) - Detects implementation-detail tests, happy-path-only tests, test inflation, and duplication via AST + regex analysis - AI agent answers 3 quality questions per test (design invariant, value if deleted, contract vs. implementation) - Posts Test Quality Score (0-100) comment with per-test feedback - Distinguishes design tests vs. implementation tests - Fails check (REQUEST_CHANGES review) if >30% of new tests are low-value - Supports Python (pytest) and Rust (#[test]) as primary languages - Add .gitignore exception since workflow name starts with 'test-' Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- Clarify Step 1: mark Python/Rust as 'analyzed', Go/JS as 'detected but not scored' - Remove Go-only test extraction command from Step 2 - Replace Go mock libraries (mockery/testify/mock) with Rust equivalents (mockall) - Clarify inflation penalty formula as binary (deduct all 10 pts if ANY file exceeds 2:1) - Remove Go-specific t.Errorf reference from calibration guidelines - Add explicit sampling instructions (Step 2 + PR comment note) for the 50-test limit - Scope test inflation mapping to Python and Rust only Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2026-04-08T18:10:20Z

@copilot this codebase operates on Go and JavaScript

The codebase operates on Go and JavaScript, not Python and Rust. - Replace Python (pytest) and Rust (#[test]) with Go (*_test.go) and JavaScript/TypeScript (*.test.*, *.spec.*) as primary analysis targets - Remove python3/rustfmt bash tool allowances; add node - Rewrite Step 3 with Go awk analysis (testify/gomock signals, t.Run table-driven test credit) and JavaScript awk analysis (jest/sinon) - Update Step 2 diff commands to target _test.go and *.test.{js,ts,cjs} - Update Red Flags section: reference jest.mock/jest.spyOn and gomock - Update Step 5 inflation mapping: foo_test.go→foo.go, foo.test.ts→foo.ts - Update PR comment language section: Go/JS counts - Update Guidelines: language support, calibration (assert.Error, toThrow, t.Run table-driven test credit), Analysis Scope - Fix illustrative example rows in comment template to use Go paths Agent-Logs-Url: https://github.com/github/gh-aw/sessions/180da1af-6c69-40ea-bcf1-4a3424cfeba7 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-08T18:19:56Z

@copilot this codebase operates on Go and JavaScript

Updated in commit 2674b42. The primary analysis targets are now Go (*_test.go — using testify/gomock signal detection, with credit for table-driven t.Run tests) and JavaScript/TypeScript (*.test.*, *.spec.* — using jest/sinon signal detection). Python and Rust are now detected-only (noted in the report but not scored).

pelikhan · 2026-04-08T18:23:44Z

@copilot review the coding guidelines and habits of this report to update the agentic workflows

Reviewed actual test files in pkg/ and actions/setup/js/ to update the workflow with accurate patterns: Go tests: - Both t.Errorf/t.Fatalf (stdlib) and assert./require. (testify) detected - Track t.Run() for table-driven tests (the preferred pattern) - Track forbidden_mocks (gomock/testify/mock/.EXPECT()/.On()) as hard red flag - Add build tag check: every *_test.go must start with //go:build !integration or //go:build integration - No mock libraries used in Go (project guideline: "no mocks or test suites") JavaScript tests: - Test framework is vitest, NOT jest: vi.fn(), vi.spyOn(), vi.mock() - Primary test extension is .test.cjs (245 files), not .spec.* or .test.ts - scripts/ uses .test.js; drop all .spec.* and .test.ts references - Mocking external I/O (fs, process.stderr, global.core) is legitimate; flag only internal business-logic mocking without behavioral assertions New/updated red flags: - Mock libraries in Go = hard coding-guideline violation (always REQUEST_CHANGES) - Missing //go:build tag on new Go test file = hard violation - No assertion messages in Go tests (guideline: always include context) Scoring / Failure Condition: - Fail on: >30% implementation tests OR any coding-guideline violation - Step 8: three distinct review outcomes (pass, quality-fail, guideline-fail) - PR comment table: add 🚨 Coding-guideline violations row Agent-Logs-Url: https://github.com/github/gh-aw/sessions/a5773904-75eb-4e0c-9ebe-a61f6ffd5e14 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-08T18:44:06Z

@copilot review the coding guidelines and habits of this report to update the agentic workflows

Updated in commit 167a9ef. Changes based on reviewing the actual test files and coding guidelines:

Go patterns (from pkg/):

Awk now tracks both stdlib (t.Errorf/t.Fatalf) and testify (assert.*/require.*) assertions — both styles are used
t.Run() table-driven tests get per-row credit (this is the preferred pattern per guidelines)
gomock/testify/mock/.EXPECT()/.On() are now a hard red flag — the project guideline is "no mocks or test suites; test real component interactions"
New: build tag check — every *_test.go must start with //go:build !integration or //go:build integration; missing tag = hard violation → REQUEST_CHANGES
New: flags assertion calls missing descriptive message arguments (guidelines require them)

JavaScript patterns (from actions/setup/js/):

vitest throughout (not jest): mock primitives updated to vi.fn(), vi.spyOn(), vi.mock()
Primary test extension is .test.cjs (245 files); .test.js for scripts; removed all .spec.*/.test.ts references
External I/O mocking (fs, process.stderr, global.core) is explicitly noted as legitimate; only flags internal business-logic mocking

Scoring / review logic:

Failure now triggers on >30% implementation tests or any coding-guideline violation (mock library in Go, missing build tag)
Three distinct REQUEST_CHANGES messages: quality fail, guideline fail

pelikhan · 2026-04-08T18:46:48Z

.github/workflows/test-quality-sentinel.md

+description: Analyzes test quality beyond code coverage percentages on every PR, detecting implementation-detail tests, happy-path-only tests, test inflation, and duplication
+on:
+  pull_request:
+    types: [opened, synchronize, ready_for_review]


@copilot remove opened, synchronize

Done in 9901039 — trigger is now ready_for_review only.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/92c32c4a-f3e8-4b6a-a0ac-67642f8faa8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Adds a new agentic workflow (test-quality-sentinel) intended to assess the quality of newly added/modified tests (Go + JavaScript) on ready_for_review PRs, and adjusts ignore rules so the workflow artifacts are committed.

Changes:

Added Test Quality Sentinel workflow definition (.md) and generated lockfile (.lock.yml).
Updated .gitignore to stop ignoring the new workflow artifacts.
Updated architecture-guardian.lock.yml to use the same Copilot driver invocation pattern.

Show a summary per file

File	Description
`.gitignore`	Adds negation rules so `test-quality-sentinel` workflow files aren’t ignored by `test-*` globs.
`.github/workflows/test-quality-sentinel.md`	New workflow prompt/instructions for analyzing changed test diffs and producing a PR comment + review.
`.github/workflows/test-quality-sentinel.lock.yml`	Compiled workflow output for the new workflow.
`.github/workflows/architecture-guardian.lock.yml`	Updates Copilot CLI execution command to run via `copilot_driver.cjs`.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

.github/workflows/test-quality-sentinel.md:161

The vitest awk scanner for JS tests is unlikely to work on real diffs because it (1) requires it( / test( to be flush-left after + (no indentation), and (2) ends a test block only on a line matching +}), which won’t match the common + }); form (semicolon + indentation). This can cause missing/incorrect per-test counts. Loosen the regexes to allow whitespace and optional ; on the closing line.

git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | awk '
/^\+(it|test)\(/ {
  if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks
  match($0, /(it|test)\(["'"'"']([^"'"'"']+)/, arr); test_name=arr[2]; assertions=0; errors=0; mocks=0
}
test_name && /^\+.*expect\(/ { assertions++ }
test_name && /^\+.*(\.toThrow|\.rejects|\.toThrowError)/ { errors++ }
test_name && /^\+.*(vi\.mock|vi\.spyOn|vi\.fn)/ { mocks++ }
test_name && /^\+\}\)/ { print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks; test_name="" }
END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "mocks=" mocks }

Files reviewed: 3/4 changed files
Comments generated: 2

Copilot · 2026-04-08T19:19:30Z

.github/workflows/test-quality-sentinel.md

+git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*_test.go' | grep -E "^\+func Test"
+
+# For JavaScript (.test.cjs is the primary format; .test.js used in scripts/)
+git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\("


The JS test extraction grep only matches lines that start with +(it|test|describe)( immediately after the +. In typical diffs, test blocks are indented (e.g. + it(...)), so this will miss most JS tests. Consider allowing leading whitespace after + (e.g. ^\+\s*(it|test|describe)\().

This issue also appears on line 152 of the same file.

Suggested change

git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+(it|test|describe)\("

git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '*.test.cjs' '*.test.js' | grep -E "^\+\s*(it|test|describe)\("

Copilot · 2026-04-08T19:19:31Z

.github/workflows/test-quality-sentinel.md

+test_name && /^\+.*(gomock\.|testify\/mock|\.EXPECT\(\)|\.On\(|\.Return\()/ { forbidden_mocks++ }
+test_name && /^\+\}$/ { print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks; test_name="" }
+END { if (test_name) print test_name, "assertions=" assertions, "errors=" errors, "table_driven=" table_driven, "forbidden_mocks=" forbidden_mocks }
+'
+```
+
+Key signals for Go tests in this codebase:
+- **Assertions (accepted forms)**:
+  - testify: `assert.Equal`, `assert.NoError`, `assert.Error`, `require.Equal`, `require.NoError`, etc.
+  - stdlib: `t.Errorf(...)`, `t.Fatalf(...)`, `t.Error(...)`
+- **Error coverage**: `assert.Error` / `require.Error`, `assert.NoError` / `require.NoError`, `t.Fatalf` / `t.Errorf` explicitly checking error returns
+- **Table-driven tests**: `t.Run()` over a `tests []struct{...}` slice — the preferred pattern in this codebase; a single table-driven test covers all its sub-cases, so credit all included error / edge-case rows
+- **Assertion messages**: guidelines require a descriptive message argument on every assertion call — e.g. `assert.Equal(t, expected, actual, "descriptive context")` not bare `assert.Equal(t, expected, actual)`
+- **Forbidden**: any use of `gomock`, `testify/mock`, `.EXPECT()`, `.On()`, `.Return()` in Go tests violates the project's "no mocks" guideline; flag immediately
+


The forbidden Go mock detection treats any .Return( as a hard-violation signal (forbidden_mocks++) and documents .Return() as forbidden, but the PR description’s forbidden list is gomock, testify/mock, .EXPECT(), .On() (no .Return()). As written, .Return( could also match unrelated fluent APIs and create false positives. Recommend aligning the forbidden list with the PR description and/or only counting .Return( when it’s clearly part of a mock chain (e.g., alongside gomock/.EXPECT()), rather than matching any .Return( occurrence.

Add entries to the examples documentation for the five new workflows merged on 2026-04-08: - Test Quality Sentinel (#25353): PR-triggered test quality analysis - Approach Validator (#25354): label-triggered multi-agent approach review - Design Decision Gate (#25323): ADR enforcement before implementation merges - Refactoring Cadence (#25352): daily code health tracking with auto-issue - Architecture Guardian (#25334): daily code structure violation detection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Initial plan

0d91476

Copilot AI assigned Copilot and pelikhan Apr 8, 2026

Copilot started work on behalf of pelikhan April 8, 2026 17:36 View session

Copilot AI linked an issue Apr 8, 2026 that may be closed by this pull request

[plan] Create "test-quality-sentinel" agentic workflow #25320

Closed

9 tasks

Copilot AI and others added 3 commits April 8, 2026 17:41

Initial plan for test-quality-sentinel workflow

ec2ae4e

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1fbdf81c-aa8b-4755-923c-c922412fb447 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Create test-quality-sentinel agentic workflow~~ feat: add test-quality-sentinel agentic workflow Apr 8, 2026

Copilot AI requested a review from pelikhan April 8, 2026 17:59

Copilot finished work on behalf of pelikhan April 8, 2026 17:59

Copilot started work on behalf of pelikhan April 8, 2026 18:10 View session

Copilot finished work on behalf of pelikhan April 8, 2026 18:21

Copilot started work on behalf of pelikhan April 8, 2026 18:23 View session

github-actions bot mentioned this pull request Apr 8, 2026

[aw] No-Op Runs #25214

Open

pelikhan reviewed Apr 8, 2026

View reviewed changes

Copilot finished work on behalf of pelikhan April 8, 2026 18:47

Copilot AI requested a review from pelikhan April 8, 2026 18:47

Copilot started work on behalf of pelikhan April 8, 2026 18:47 View session

fix: trigger test-quality-sentinel on ready_for_review only

9901039

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/92c32c4a-f3e8-4b6a-a0ac-67642f8faa8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan April 8, 2026 18:54

pelikhan approved these changes Apr 8, 2026

View reviewed changes

pelikhan marked this pull request as ready for review April 8, 2026 19:14

Copilot AI review requested due to automatic review settings April 8, 2026 19:14

pelikhan merged commit cd97145 into main Apr 8, 2026

pelikhan deleted the copilot/create-test-quality-sentinel-workflow branch April 8, 2026 19:14

Copilot started reviewing on behalf of pelikhan April 8, 2026 19:15 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add test-quality-sentinel agentic workflow#25353

feat: add test-quality-sentinel agentic workflow#25353
pelikhan merged 7 commits intomainfrom
copilot/create-test-quality-sentinel-workflow

Copilot AI commented Apr 8, 2026 •

edited

Loading

Uh oh!

pelikhan commented Apr 8, 2026

Uh oh!

Copilot AI commented Apr 8, 2026

Uh oh!

pelikhan commented Apr 8, 2026

Uh oh!

Copilot AI commented Apr 8, 2026

Uh oh!

pelikhan Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Dimension	Weight
Behavioral coverage (% design tests)	40 pts
Error/edge case coverage	30 pts
Low duplication	20 pts
Proportional growth (no inflation)	10 pts

	git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '.test.cjs' '.test.js' \| grep -E "^\+(it\|test\|describe)\("
	git diff ${{ github.event.pull_request.base.sha }}...HEAD -- '.test.cjs' '.test.js' \| grep -E "^\+\s*(it\|test\|describe)\("

Conversation

Copilot AI commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Workflow: test-quality-sentinel

Analysis pipeline

Red flags detected

.gitignore change

Uh oh!

pelikhan commented Apr 8, 2026

Uh oh!

Copilot AI commented Apr 8, 2026

Uh oh!

pelikhan commented Apr 8, 2026

Uh oh!

Copilot AI commented Apr 8, 2026

Uh oh!

pelikhan Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Apr 8, 2026 •

edited

Loading

Workflow: `test-quality-sentinel`

`.gitignore` change