Add Accuracy Gate CI workflow with error-handled scripts for improved PR reliability #40

Copilot · 2025-09-30T18:40:17Z

Overview

This PR implements an Accuracy Gate CI workflow and supporting infrastructure to improve the reliability and auditability of pull requests in the codesandbox-client repository.

Changes

GitHub Actions Workflow

.github/workflows/accuracy-gate.yml: New workflow that runs on every pull request
- Uses Node.js 20 with npm caching for faster builds
- Executes verification steps with a 25-minute timeout
- Uploads logs as artifacts even on failure for debugging and audit purposes

Scripts

scripts/common.sh: Shared bash utilities providing:
- Strict error handling with set -Eeuo pipefail
- Timestamped logging to ./logs directory
- Error and exit traps for comprehensive audit trails
- Retry function with linearly increasing backoff (base delay multiplied by the attempt number) for network resilience
scripts/verify.sh: Main verification script that:
- Runs npm ci with retry logic to handle transient network issues
- Executes lint, typecheck, and test commands defensively using --if-present flags
- Falls back to a plain test run (no jest-junit reporter, no seed) whenever the first test invocation exits non-zero
- Supports deterministic testing with SEED environment variable
scripts/run.js: Problem-solving framework implementing:
- Structured define-plan-execute-validate workflow
- Global error handlers for unhandled rejections and exceptions
- Audit record (spec, steps, completion timestamp) returned alongside the result
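
To illustrate the retry behaviour listed for scripts/common.sh, here is a minimal sketch. It is not the function shipped in this PR: the name retry_cmd, the argument order (tries, delay, then the command and its arguments), and the flaky demo function are assumptions made for illustration. The delay grows linearly with the attempt number, as in the PR's helper.

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical helper, not the retry() shipped in this PR.
# Usage: retry_cmd TRIES DELAY COMMAND [ARGS...]
# Sleeps DELAY * attempt seconds between attempts (linear backoff).
retry_cmd() {
  local tries="$1" delay="$2" n=0
  shift 2
  until "$@"; do
    n=$((n + 1))
    if (( n >= tries )); then
      return 1   # give up after TRIES failed attempts
    fi
    sleep $((delay * n))
  done
}

# Demo: a command that fails twice, then succeeds on the third attempt.
attempts=0
flaky() {
  attempts=$((attempts + 1))
  (( attempts >= 3 ))
}

retry_cmd 5 0 flaky
echo "succeeded after $attempts attempts"
```

Taking the command as separate arguments ("$@") lets a call such as retry_cmd 3 3 npm ci run npm with ci as its argument, rather than looking for a single command literally named "npm ci".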

Documentation

docs/problem-solving-checklist.md: Accuracy checklist documenting best practices for:
- Problem definition and planning
- Evidence gathering and validation
- Execution with deterministic seeds and bounded retries
- Documentation and artifact attachment

Key Features

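✅ No Breaking Changes: The lint, typecheck, and test steps use --if-present, so npm scripts that a repository does not define are skipped rather than failed, and no package.json modifications are required. Note that npm ci still needs a lockfile (package-lock.json or npm-shrinkwrap.json).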

✅ Robust Error Handling: Strict bash error handling and JavaScript error listeners ensure failures are caught and logged

✅ Always Upload Logs: The workflow uses if: always() to ensure diagnostic logs are available even when builds fail

✅ Defensive Design: Retry logic and fallback mechanisms handle transient failures gracefully
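
The strict error handling mentioned above can be seen in isolation with a small sketch. It writes a throwaway script that enables set -Eeuo pipefail with ERR and EXIT traps, loosely modelled on scripts/common.sh, and runs it as a child process. The temp file and the variable names (script, status, output) are illustrative assumptions, not part of this PR.

```shell
#!/usr/bin/env bash
# Write a tiny strict-mode script to a temp file (hypothetical demo content).
script="$(mktemp)"
cat > "$script" <<'EOF'
set -Eeuo pipefail
# Log the failing command, then exit with its status (as common.sh does).
trap 'status=$?; echo "ERR status=$status cmd=${BASH_COMMAND}"; exit "$status"' ERR
# Always log the final exit status, on success or failure.
trap 'echo "EXIT status=$?"' EXIT
echo "step 1"
false
echo "never reached"
EOF

# Run it as a separate process and capture both its output and exit status.
status=0
output="$(bash "$script")" || status=$?
rm -f "$script"

printf '%s\n' "$output"
echo "child exit status: $status"
```

The demo runs the script as a child process on purpose: bash ignores errexit (and does not fire the ERR trap) for commands on the left-hand side of || or &&, so calling the same code as a function followed by || true would hide the failure.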

Testing

The workflow will run automatically on this PR, demonstrating the accuracy gate in action. Logs will be uploaded as artifacts for inspection.

Future Work

These files can be replicated to other active repositories (mithril, smoke-tests, updatecli, WasabiDoc, coinbase-pro-node, DefiLlama-Adapters, zodiac-modifier-roles, eas-sdk, stacks-core, api-docs, mempool, pancake-frontend, lodestar) with minimal adjustments (primarily Node.js version configuration).

Original prompt

Implement an Accuracy Gate CI workflow and supporting, error-handled scripts to improve reliability and auditability of PRs.

Add the following files with exact contents:

scripts/common.sh:
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
LOG_DIR="${LOG_DIR:-./logs}"; mkdir -p "$LOG_DIR"
LOG_FILE="${LOG_FILE:-$LOG_DIR/run_$(date -u +%Y%m%dT%H%M%SZ).log}"

log() { printf "[%s] %s\n" "$(date -u +%FT%TZ)" "$*" | tee -a "$LOG_FILE"; }
trap 'status=$?; line=${BASH_LINENO[0]:-?}; log "ERR status=$status line=$line cmd=${BASH_COMMAND}"; exit $status' ERR
trap 'log "EXIT status=$?"' EXIT

retry() { local tries="${2:-3}" delay="${3:-2}" n=0; until "$1"; do n=$((n+1)); (( n>=tries )) && return 1; sleep $((delay*n)); done; }

scripts/verify.sh:
#!/usr/bin/env bash
set -Eeuo pipefail
source "$(dirname "$0")/common.sh"
log "Start verification"
: "${CI:=false}" "${SEED:=42}"
export NODE_OPTIONS="--max-old-space-size=4096"
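# retry runs "$1" as a single command name, so wrap `npm ci` in a function
# instead of passing it as one quoted string (which would not be found).
npm_ci() { npm ci; }
retry npm_ci 3 3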
npm run lint --if-present
npm run typecheck --if-present
npm test --if-present -- --ci --runInBand --seed="$SEED" --reporters=default --reporters=jest-junit || npm test --if-present -- --ci --runInBand
log "Verification complete"

scripts/run.js:
process.on('unhandledRejection', e => { console.error('[unhandledRejection]', e); process.exit(1); });
process.on('uncaughtException', e => { console.error('[uncaughtException]', e); process.exit(1); });
export async function solve({define, plan, execute, validate}) {
  const ctx = { startedAt: new Date().toISOString() };
  const spec = await define(); if (!spec?.goal || !spec?.constraints) throw new Error('Invalid spec');
  const steps = await plan(spec); if (!Array.isArray(steps) || !steps.length) throw new Error('Empty plan');
  const result = await execute(steps, spec);
  const verdict = await validate(result, spec); if (verdict !== true) throw new Error('Validation failed');
  return { result, audit: { spec, steps, endedAt: new Date().toISOString() } };
}

.github/workflows/accuracy-gate.yml:
name: Accuracy Gate
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    timeout-minutes: 25
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - name: Verify
        run: |
          chmod +x scripts/*.sh || true
          ./scripts/verify.sh
      - uses: actions/upload-artifact@v4
        if: always()
        with: { name: logs, path: logs }

docs/problem-solving-checklist.md:
# Problem-Solving Accuracy Checklist
- Define: goal, constraints, assumptions, stakeholders, success criteria, deadline.
- Evidence: inputs, data sources, error budgets, known edge cases.
- Plan: options with trade-offs, chosen path, rollback.
- Execute: deterministic seed, idempotent steps, timeouts, retries (bounded).
- Validate: assertions, golden tests, invariants, acceptance criteria met.
- Document: decisions, risks, follow-ups; attach logs/artifacts.

Acceptance criteria:

- Workflow runs on every PR and uploads logs even on failure.
- Scripts are executable and use strict error handling.
- No changes to existing build/test commands required; script is defensive (uses --if-present).
- Passing CI on a smoke PR demonstrates the gate works.

After merging this PR, replicate the same files to other active repos (those with recent PRs: mithril, smoke-tests, updatecli, WasabiDoc, coinbase-pro-node, DefiLlama-Adapters, zodiac-modifier-roles, eas-sdk, stacks-core, api-docs, mempool, pancake-frontend, lodestar). We'll open follow-up PRs with identical contents, adjusting Node version if repo requires different runtime.

This pull request was created as a result of the prompt above from Copilot chat.


snyk-io · 2025-09-30T18:42:31Z

🎉 Snyk checks have passed. No issues have been found so far.

✅ security/snyk check is complete. No issues have been found.

Co-authored-by: ivan09069 <202161141+ivan09069@users.noreply.github.com>

Initial plan

8c1a796

Copilot AI assigned Copilot and ivan09069 Sep 30, 2025

Copilot started work on behalf of ivan09069 September 30, 2025 18:40 View session

Add Accuracy Gate CI workflow and supporting scripts

da844e9

Co-authored-by: ivan09069 <202161141+ivan09069@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add Accuracy Gate CI and error-handled scripts~~ Add Accuracy Gate CI workflow with error-handled scripts for improved PR reliability Sep 30, 2025

Copilot AI requested a review from ivan09069 September 30, 2025 18:46

Copilot finished work on behalf of ivan09069 September 30, 2025 18:46

ivan09069 marked this pull request as ready for review October 1, 2025 01:43

ivan09069 merged commit f0cef25 into main Oct 1, 2025
1 of 2 checks passed

2 participants
