Skip to content

test: expand script coverage and add dev setup bootstrap#36

Merged
aryeko merged 6 commits intomainfrom
docs/remove-redundant-rollout-metadata
Feb 7, 2026
Merged

test: expand script coverage and add dev setup bootstrap#36
aryeko merged 6 commits intomainfrom
docs/remove-redundant-rollout-metadata

Conversation

@aryeko
Copy link
Contributor

@aryeko aryeko commented Feb 7, 2026

Type

  • feat - New capability
  • fix - Bug fix
  • docs - Documentation only
  • test - Test changes
  • chore - Tooling, CI, maintenance

Summary

This PR expands parity and benchmark-script test coverage, refactors duplicated script helper logic, and improves local/CI setup ergonomics with a bootstrap script and Make wrappers. It also updates CI to use Make-based setup instead of inline install scripts and removes actionlint from CI per request.

Changes

  • Added Go parity tests in cmd/parity-test/main_test.go for:
    • seedTarget disabled/missing/non-2xx paths
    • runScenario status/header/invalid-JSON/default-method paths
    • matcher interpolation edge cases
  • Added deterministic fixtures and expected outputs under tests/fixtures/.
  • Added Python unit tests and shell integration tests under tests/unit/ and tests/integration/.
  • Added test infrastructure/config:
    • pytest.ini, .coveragerc, .bats-version
    • Make targets: test-go, test-python, test-shell, test-scripts, test-coverage-python
  • Refactored shared script helpers:
    • shell: scripts/lib.sh
    • python: scripts/benchlib/io_utils.py
    • updated consumers in benchmark scripts.
  • Added JSON Schema validation via jsonschema in scripts/validate-result-schemas.py.
  • Added policy JSON source stats-policy.json and JSON-first loading with YAML fallback compatibility.
  • Added scripts/setup-dev-env.sh with:
    • --ci mode
    • --subset install groups
    • Make wrappers: setup-dev-env, setup-dev-env-ci, setup-dev-env-ci-scripts
  • Updated CI scripts job to use Make entrypoints (no inline install scripts).
  • Updated docs to reflect setup/testing/policy changes.

Follow-up fixes from review comments

  • Fixed goroutine-unsafe t.Fatalf usage in TestRunScenario_DefaultMethodIsGET by asserting request method on test goroutine via channel.
  • Fixed potential mixed-type sort crash in JSON Schema error sorting by stringifying err.path parts.
  • Hardened scripts/setup-dev-env.sh install helpers so failed brew/apt installs return cleanly under set -e instead of aborting unexpectedly.

Validation

go test ./...
.venv/bin/pytest tests/unit
make test-shell
PATH="$PWD/.venv/bin:$PATH" python3 scripts/generate-report.py
PATH="$PWD/.venv/bin:$PATH" make benchmark-schema-validate
PATH="$PWD/.venv/bin:$PATH" make ci-benchmark-quality-check
PATH="$PWD/.venv/bin:$PATH" make report-disclaimer-check
PATH="$PWD/.venv/bin:$PATH" make methodology-changelog-check
PATH="$PWD/.venv/bin:$PATH" make publication-sync-check
bash -n scripts/setup-dev-env.sh
bash scripts/setup-dev-env.sh --help

Checklist

  • Code and docs follow project conventions
  • Tests updated/added for behavior changes
  • Parity contract/fixtures updated if API behavior changed
  • Related design/docs updated if matcher semantics changed

Resolves

Resolves #

Summary by CodeRabbit

  • New Features

    • Added multi-language test support (Python + shell), deterministic test fixtures, and a one-command local dev environment bootstrap.
  • Documentation

    • Switched policy/config references to JSON with YAML fallback; expanded CONTRIBUTING and tooling docs; added fixture and test guidance; removed legacy rollout and repository-metadata guides.
  • Chores

    • Enhanced CI and Make targets to run tests and coverage; added result/schema validation, path-safety checks, and a quality policy JSON.

@coderabbitai
Copy link

coderabbitai bot commented Feb 7, 2026

📝 Walkthrough

Walkthrough

Adds multi-language tests and fixtures, centralizes path/IO utilities (Bash + Python), prefers stats-policy.json with YAML fallback, expands Makefile/CI to run script tests and coverage, introduces a dev environment bootstrap script, and updates many scripts to use the shared utilities and JSON policy loader.

Changes

Cohort / File(s) Summary
Testing infra & CI
\.bats-version, pytest.ini, .github/workflows/ci.yml, Makefile
Add Bats/Pytest config; CI step replaced to install script toolchain and run script unit tests; many Makefile targets added for test-go/python/shell/scripts and coverage aggregation.
Dev bootstrap & tooling
scripts/setup-dev-env.sh, README.md, .gitignore
New bootstrap script to install/select tooling (Go/Python/bats/hyperfine/etc.), README updated with usage and setup examples; .venv/ ignored.
Shared IO & path utilities
scripts/lib.sh, scripts/benchlib/io_utils.py
Introduce Bash helpers (resolve_path, ensure_path_under_root) and Python benchlib I/O (ensure_under_root, read_json, write_json, load_json_policy).
Script refactor: path & policy handling
scripts/run-all.sh, scripts/run-single.sh, scripts/benchmark-measure.py, scripts/benchmark-quality-check.py, scripts/environment-manifest.py, scripts/validate-result-schemas.py
Replace ad-hoc path and JSON/YAML handling with shared utilities; prefer stats-policy.json with YAML fallback; add JSON Schema validation integration; centralize safe JSON writes and under-root checks.
Policy & docs
stats-policy.json, METHODOLOGY.md, docs/architecture.md, docs/guides/benchmark-publication-policy.md, docs/guides/benchmark-workflow.md
Add JSON policy file and update docs to reference stats-policy.json while noting YAML backward compatibility.
New tests & fixtures (unit, integration, parity)
tests/unit/..., tests/integration/..., tests/fixtures/..., cmd/parity-test/main_test.go
Add many pytest unit tests, bats integration tests, deterministic fixtures (raw/summary), and extensive Go parity tests exercising HTTP scenarios and seed behavior.
Environment manifest & helpers
scripts/environment-manifest.py, tests/unit/test_environment_manifest.py, tests/integration/helpers/load.bash
Move manifest I/O to benchlib helpers, add write_json_safe, and add test helpers for temp results directories.
Removed docs
ROLLOUT_CHECKLIST.md, docs/guides/repository-metadata.md, docs/guides/AGENTS.md
Delete rollout checklist and repository-metadata guide; remove one AGENTS.md row entry.
Test runner helpers & integration tests
tests/integration/test_run_scripts.bats, tests/integration/helpers/load.bash, scripts/run-all.sh, scripts/run-single.sh, scripts/lib.sh
Centralize path resolution/validation in lib.sh, update run scripts to use it, and add bats tests asserting argument and path-safety guards.
Fixtures & expected outputs
tests/fixtures/..., tests/fixtures/summary/expected-summary.json, tests/fixtures/summary/expected-report.md
Add deterministic raw fixtures (ok/skipped/invalid), summary/expected report fixtures used by unit tests and report generation tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through scripts and fixtures bright,

Pytest and Bats keeping tests in sight,
JSON policies snug and paths checked right,
CI hums softly through the night,
The little rabbit cheers the build’s green light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: expanding test coverage for scripts and adding a development environment bootstrap script.
Description check ✅ Passed The PR description is comprehensive, well-structured, and follows the template with all required sections completed: Type, Summary, Changes, Validation, Checklist, and Resolves.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch docs/remove-redundant-rollout-metadata

No actionable comments were generated in the recent review. 🎉


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Feb 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@cmd/parity-test/main_test.go`:
- Around line 245-267: The handler in TestRunScenario_DefaultMethodIsGET calls
t.Fatalf from the httptest handler goroutine which is invalid; change the test
to capture the request method via a channel (e.g., create a methodCh before
creating httptest.NewServer, have the handler send r.Method into methodCh
instead of calling t.Fatalf), then in the main test goroutine receive from
methodCh (with a select + timeout or using t.Fatal on timeout) and assert that
the received method equals http.MethodGet; update the httptest handler used in
TestRunScenario_DefaultMethodIsGET and keep runScenario call the same.

In `@scripts/validate-result-schemas.py`:
- Around line 74-81: The sort key for validation errors currently uses
key=lambda err: list(err.path), which can mix ints and strs and raise TypeError;
update the sorting to convert each path element to string (e.g., key=lambda err:
[str(p) for p in err.path]) so comparisons are consistent, keeping the rest of
the logic (Draft202012Validator, validator.iter_errors, errors, first.path,
path_suffix) unchanged.
🧹 Nitpick comments (5)
tests/fixtures/README.md (1)

7-12: Add language specifier to fenced code block.

The directory structure block should have a language identifier for consistent markdown linting.

📝 Suggested fix
-```
+```text
 tests/fixtures/
 ├── raw/           # Raw benchmark result JSON files
 ├── summary/       # Expected summary.json outputs
 └── compose/       # Docker compose test files
</details>

</blockquote></details>
<details>
<summary>scripts/lib.sh (1)</summary><blockquote>

`1-2`: **Consider adding strict mode for safety if sourced after variable expansion.**

While library files sourced early typically inherit strict mode from the parent script, adding `set -euo pipefail` here would provide defense-in-depth if this file is ever sourced after errors have already occurred or if sourced in isolation during testing.


<details>
<summary>♻️ Optional: Add strict mode</summary>

```diff
 #!/usr/bin/env bash
+set -euo pipefail
 
 resolve_path() {
tests/integration/test_run_scripts.bats (1)

13-21: Consider adding output assertions for more robust test validation.

The tests verify exit status but don't validate error messages. Adding output checks would catch regressions in error messaging.

♻️ Optional: Add output assertions
 `@test` "run-single rejects unknown framework" {
   run bash scripts/run-single.sh unknown-framework
   [ "$status" -ne 0 ]
+  [[ "$output" == *"Unknown framework"* ]]
 }
 
 `@test` "run-all rejects raw dir outside results/latest" {
   run env RESULTS_RAW_DIR="/tmp" bash scripts/run-all.sh
   [ "$status" -ne 0 ]
+  [[ "$output" == *"must"* ]] # Validates path rejection message
 }
scripts/benchlib/io_utils.py (1)

15-28: Consider friendlier JSON errors for CLI callers.
json.loads(...) will surface stack traces on missing/malformed files. Wrapping with SystemExit keeps CLI output consistent.

🔧 Possible refinement
-def read_json(path: Path):
-    return json.loads(path.read_text(encoding="utf-8"))
+def read_json(path: Path):
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except FileNotFoundError as exc:
+        raise SystemExit(f"JSON file not found: {path}") from exc
+    except json.JSONDecodeError as exc:
+        raise SystemExit(
+            f"Malformed JSON in {path}: {exc.msg} at line {exc.lineno}, column {exc.colno}"
+        ) from exc
@@
 def load_json_policy(path: Path, default_on_missing=None):
     if not path.exists():
         if default_on_missing is not None:
             return default_on_missing
         raise SystemExit(f"Policy file not found: {path}")
-    return json.loads(path.read_text(encoding="utf-8"))
+    return read_json(path)
cmd/parity-test/main_test.go (1)

280-316: Make negative contract cases assert the shared validation logic.
Right now the test re-checks field values directly, so it will pass even if the fixture contract checks change. Consider extracting the contract validation into a helper and asserting the helper rejects these cases.

♻️ Suggested change for this test
-			if tc.scenario.Request.Path != "" && tc.scenario.Response.Status > 0 {
-				t.Fatalf("expected invalid fixture contract for case %q", tc.name)
-			}
+			if isValidScenarioContract(tc.scenario) {
+				t.Fatalf("expected invalid fixture contract for case %q", tc.name)
+			}

Example helper to add near TestParityFixtures_AreWellFormed and reuse there too:

func isValidScenarioContract(s Scenario) bool {
	if s.Name == "" {
		return false
	}
	if s.Request.Path == "" {
		return false
	}
	if s.Response.Status <= 0 {
		return false
	}
	return true
}

@aryeko aryeko merged commit 062de02 into main Feb 7, 2026
8 checks passed
@aryeko aryeko deleted the docs/remove-redundant-rollout-metadata branch February 7, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant