test: expand script coverage and add dev setup bootstrap by aryeko · Pull Request #36 · go-modkit/benchmarks

aryeko · 2026-02-07T22:22:10Z

Type

Summary

This PR expands parity and benchmark-script test coverage, refactors duplicated script helper logic, and improves local/CI setup ergonomics with a bootstrap script and Make wrappers. It also updates CI to use Make-based setup instead of inline install scripts and removes actionlint from CI per request.

Changes

Added Go parity tests in cmd/parity-test/main_test.go for:
- seedTarget disabled/missing/non-2xx paths
- runScenario status/header/invalid-JSON/default-method paths
- matcher interpolation edge cases
Added deterministic fixtures and expected outputs under tests/fixtures/.
Added Python unit tests and shell integration tests under tests/unit/ and tests/integration/.
Added test infrastructure/config:
- pytest.ini, .coveragerc, .bats-version
- Make targets: test-go, test-python, test-shell, test-scripts, test-coverage-python
Refactored shared script helpers:
- shell: scripts/lib.sh
- python: scripts/benchlib/io_utils.py
- updated consumers in benchmark scripts.
Added JSON Schema validation via jsonschema in scripts/validate-result-schemas.py.
Added policy JSON source stats-policy.json and JSON-first loading with YAML fallback compatibility.
Added scripts/setup-dev-env.sh with:
- --ci mode
- --subset install groups
- Make wrappers: setup-dev-env, setup-dev-env-ci, setup-dev-env-ci-scripts
Updated CI scripts job to use Make entrypoints (no inline install scripts).
Updated docs to reflect setup/testing/policy changes.

Follow-up fixes from review comments

Fixed goroutine-unsafe t.Fatalf usage in TestRunScenario_DefaultMethodIsGET by asserting request method on test goroutine via channel.
Fixed potential mixed-type sort crash in JSON Schema error sorting by stringifying err.path parts.
Hardened scripts/setup-dev-env.sh install helpers so failed brew/apt installs return cleanly under set -e instead of aborting unexpectedly.

Validation

go test ./...
.venv/bin/pytest tests/unit
make test-shell
PATH="$PWD/.venv/bin:$PATH" python3 scripts/generate-report.py
PATH="$PWD/.venv/bin:$PATH" make benchmark-schema-validate
PATH="$PWD/.venv/bin:$PATH" make ci-benchmark-quality-check
PATH="$PWD/.venv/bin:$PATH" make report-disclaimer-check
PATH="$PWD/.venv/bin:$PATH" make methodology-changelog-check
PATH="$PWD/.venv/bin:$PATH" make publication-sync-check
bash -n scripts/setup-dev-env.sh
bash scripts/setup-dev-env.sh --help

Checklist

Code and docs follow project conventions
Tests updated/added for behavior changes
Parity contract/fixtures updated if API behavior changed
Related design/docs updated if matcher semantics changed

Resolves

Resolves #

Summary by CodeRabbit

New Features
- Added multi-language test support (Python + shell), deterministic test fixtures, and a one-command local dev environment bootstrap.
Documentation
- Switched policy/config references to JSON with YAML fallback; expanded CONTRIBUTING and tooling docs; added fixture and test guidance; removed legacy rollout and repository-metadata guides.
Chores
- Enhanced CI and Make targets to run tests and coverage; added result/schema validation, path-safety checks, and a quality policy JSON.

coderabbitai · 2026-02-07T22:22:27Z

📝 Walkthrough

Walkthrough

Adds multi-language tests and fixtures, centralizes path/IO utilities (Bash + Python), prefers stats-policy.json with YAML fallback, expands Makefile/CI to run script tests and coverage, introduces a dev environment bootstrap script, and updates many scripts to use the shared utilities and JSON policy loader.

Changes

Cohort / File(s)	Summary
Testing infra & CI `\.bats-version`, `pytest.ini`, `.github/workflows/ci.yml`, `Makefile`	Add Bats/Pytest config; CI step replaced to install script toolchain and run script unit tests; many Makefile targets added for test-go/python/shell/scripts and coverage aggregation.
Dev bootstrap & tooling `scripts/setup-dev-env.sh`, `README.md`, `.gitignore`	New bootstrap script to install/select tooling (Go/Python/bats/hyperfine/etc.), README updated with usage and setup examples; `.venv/` ignored.
Shared IO & path utilities `scripts/lib.sh`, `scripts/benchlib/io_utils.py`	Introduce Bash helpers (resolve_path, ensure_path_under_root) and Python benchlib I/O (ensure_under_root, read_json, write_json, load_json_policy).
Script refactor: path & policy handling `scripts/run-all.sh`, `scripts/run-single.sh`, `scripts/benchmark-measure.py`, `scripts/benchmark-quality-check.py`, `scripts/environment-manifest.py`, `scripts/validate-result-schemas.py`	Replace ad-hoc path and JSON/YAML handling with shared utilities; prefer `stats-policy.json` with YAML fallback; add JSON Schema validation integration; centralize safe JSON writes and under-root checks.
Policy & docs `stats-policy.json`, `METHODOLOGY.md`, `docs/architecture.md`, `docs/guides/benchmark-publication-policy.md`, `docs/guides/benchmark-workflow.md`	Add JSON policy file and update docs to reference `stats-policy.json` while noting YAML backward compatibility.
New tests & fixtures (unit, integration, parity) `tests/unit/...`, `tests/integration/...`, `tests/fixtures/...`, `cmd/parity-test/main_test.go`	Add many pytest unit tests, bats integration tests, deterministic fixtures (raw/summary), and extensive Go parity tests exercising HTTP scenarios and seed behavior.
Environment manifest & helpers `scripts/environment-manifest.py`, `tests/unit/test_environment_manifest.py`, `tests/integration/helpers/load.bash`	Move manifest I/O to benchlib helpers, add write_json_safe, and add test helpers for temp results directories.
Removed docs `ROLLOUT_CHECKLIST.md`, `docs/guides/repository-metadata.md`, `docs/guides/AGENTS.md`	Delete rollout checklist and repository-metadata guide; remove one AGENTS.md row entry.
Test runner helpers & integration tests `tests/integration/test_run_scripts.bats`, `tests/integration/helpers/load.bash`, `scripts/run-all.sh`, `scripts/run-single.sh`, `scripts/lib.sh`	Centralize path resolution/validation in `lib.sh`, update run scripts to use it, and add bats tests asserting argument and path-safety guards.
Fixtures & expected outputs `tests/fixtures/...`, `tests/fixtures/summary/expected-summary.json`, `tests/fixtures/summary/expected-report.md`	Add deterministic raw fixtures (ok/skipped/invalid), summary/expected report fixtures used by unit tests and report generation tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: migrate benchmark quality pipeline to policy + OSS tooling (#4) #29: Overlaps on benchmark quality checker, Makefile CI targets, and stats-policy handling — closely related to policy/CI changes here.
feat: implement benchmark environment fingerprinting and manifest artifacts #28: Refactors environment-manifest and I/O helpers; overlaps with benchlib.io_utils and manifest changes.
feat: add benchmark result schemas and validation gates #30: Touches schema validation and validate-result-schemas integration; related to jsonschema additions and schema handling.

Poem

🐰 I hopped through scripts and fixtures bright,

Pytest and Bats keeping tests in sight,
JSON policies snug and paths checked right,
CI hums softly through the night,
The little rabbit cheers the build’s green light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 3.08% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: expanding test coverage for scripts and adding a development environment bootstrap script.
Description check	✅ Passed	The PR description is comprehensive, well-structured, and follows the template with all required sections completed: Type, Summary, Changes, Validation, Checklist, and Resolves.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch docs/remove-redundant-rollout-metadata

No actionable comments were generated in the recent review. 🎉

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-02-07T22:23:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@cmd/parity-test/main_test.go`:
- Around line 245-267: The handler in TestRunScenario_DefaultMethodIsGET calls
t.Fatalf from the httptest handler goroutine which is invalid; change the test
to capture the request method via a channel (e.g., create a methodCh before
creating httptest.NewServer, have the handler send r.Method into methodCh
instead of calling t.Fatalf), then in the main test goroutine receive from
methodCh (with a select + timeout or using t.Fatal on timeout) and assert that
the received method equals http.MethodGet; update the httptest handler used in
TestRunScenario_DefaultMethodIsGET and keep runScenario call the same.

In `@scripts/validate-result-schemas.py`:
- Around line 74-81: The sort key for validation errors currently uses
key=lambda err: list(err.path), which can mix ints and strs and raise TypeError;
update the sorting to convert each path element to string (e.g., key=lambda err:
[str(p) for p in err.path]) so comparisons are consistent, keeping the rest of
the logic (Draft202012Validator, validator.iter_errors, errors, first.path,
path_suffix) unchanged.

🧹 Nitpick comments (5)

tests/fixtures/README.md (1)

7-12: Add language specifier to fenced code block.

The directory structure block should have a language identifier for consistent markdown linting.

📝 Suggested fix

-```
+```text
 tests/fixtures/
 ├── raw/           # Raw benchmark result JSON files
 ├── summary/       # Expected summary.json outputs
 └── compose/       # Docker compose test files

</details>

</blockquote></details>
<details>
<summary>scripts/lib.sh (1)</summary><blockquote>

`1-2`: **Consider adding strict mode for safety if sourced after variable expansion.**

While library files sourced early typically inherit strict mode from the parent script, adding `set -euo pipefail` here would provide defense-in-depth if this file is ever sourced after errors have already occurred or if sourced in isolation during testing.


<details>
<summary>♻️ Optional: Add strict mode</summary>

```diff
 #!/usr/bin/env bash
+set -euo pipefail
 
 resolve_path() {

tests/integration/test_run_scripts.bats (1)

13-21: Consider adding output assertions for more robust test validation.

The tests verify exit status but don't validate error messages. Adding output checks would catch regressions in error messaging.
♻️ Optional: Add output assertions
 `@test` "run-single rejects unknown framework" {
   run bash scripts/run-single.sh unknown-framework
   [ "$status" -ne 0 ]
+  [[ "$output" == *"Unknown framework"* ]]
 }
 
 `@test` "run-all rejects raw dir outside results/latest" {
   run env RESULTS_RAW_DIR="/tmp" bash scripts/run-all.sh
   [ "$status" -ne 0 ]
+  [[ "$output" == *"must"* ]] # Validates path rejection message
 }

scripts/benchlib/io_utils.py (1)

15-28: Consider friendlier JSON errors for CLI callers.
json.loads(...) will surface stack traces on missing/malformed files. Wrapping with SystemExit keeps CLI output consistent.

🔧 Possible refinement

-def read_json(path: Path):
-    return json.loads(path.read_text(encoding="utf-8"))
+def read_json(path: Path):
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except FileNotFoundError as exc:
+        raise SystemExit(f"JSON file not found: {path}") from exc
+    except json.JSONDecodeError as exc:
+        raise SystemExit(
+            f"Malformed JSON in {path}: {exc.msg} at line {exc.lineno}, column {exc.colno}"
+        ) from exc
@@
 def load_json_policy(path: Path, default_on_missing=None):
     if not path.exists():
         if default_on_missing is not None:
             return default_on_missing
         raise SystemExit(f"Policy file not found: {path}")
-    return json.loads(path.read_text(encoding="utf-8"))
+    return read_json(path)

cmd/parity-test/main_test.go (1)

280-316: Make negative contract cases assert the shared validation logic.
Right now the test re-checks field values directly, so it will pass even if the fixture contract checks change. Consider extracting the contract validation into a helper and asserting the helper rejects these cases.
♻️ Suggested change for this test
-			if tc.scenario.Request.Path != "" && tc.scenario.Response.Status > 0 {
-				t.Fatalf("expected invalid fixture contract for case %q", tc.name)
-			}
+			if isValidScenarioContract(tc.scenario) {
+				t.Fatalf("expected invalid fixture contract for case %q", tc.name)
+			}
Example helper to add near TestParityFixtures_AreWellFormed and reuse there too:
func isValidScenarioContract(s Scenario) bool {
	if s.Name == "" {
		return false
	}
	if s.Request.Path == "" {
		return false
	}
	if s.Response.Status <= 0 {
		return false
	}
	return true
}

cmd/parity-test/main_test.go

scripts/setup-dev-env.sh

scripts/validate-result-schemas.py

aryeko added 4 commits February 7, 2026 22:59

docs: remove redundant rollout and metadata guides

035962c

test: expand parity and benchmark script test coverage

b18d2c3

refactor: consolidate benchmark script helpers and schema checks

4010b35

chore: add dev bootstrap script and refresh docs

5a0b5d9

coderabbitai bot requested changes Feb 7, 2026

View reviewed changes

cmd/parity-test/main_test.go Show resolved Hide resolved

scripts/setup-dev-env.sh Show resolved Hide resolved

scripts/validate-result-schemas.py Show resolved Hide resolved

aryeko added 2 commits February 8, 2026 00:36

chore: route CI installs through make and setup script modes

350c88c

fix: address reviewer-reported test and setup edge cases

95192d9

coderabbitai bot approved these changes Feb 7, 2026

View reviewed changes

aryeko merged commit 062de02 into main Feb 7, 2026
8 checks passed

aryeko deleted the docs/remove-redundant-rollout-metadata branch February 7, 2026 22:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: expand script coverage and add dev setup bootstrap#36

test: expand script coverage and add dev setup bootstrap#36
aryeko merged 6 commits intomainfrom
docs/remove-redundant-rollout-metadata

aryeko commented Feb 7, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov bot commented Feb 7, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aryeko commented Feb 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Type

Summary

Changes

Follow-up fixes from review comments

Validation

Checklist

Resolves

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov bot commented Feb 7, 2026

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aryeko commented Feb 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 7, 2026 •

edited

Loading