Copilot AI commented Sep 30, 2025

Overview

This PR implements an Accuracy Gate CI workflow and supporting infrastructure to improve the reliability and auditability of pull requests in the codesandbox-client repository.

Changes

GitHub Actions Workflow

  • .github/workflows/accuracy-gate.yml: New workflow that runs on every pull request
    • Uses Node.js 20 with npm caching for faster builds
    • Executes verification steps with a 25-minute timeout
    • Uploads logs as artifacts even on failure for debugging and audit purposes

Scripts

  • scripts/common.sh: Shared bash utilities providing:

    • Strict error handling with set -Eeuo pipefail
    • Timestamped logging to ./logs directory
    • Error and exit traps for comprehensive audit trails
    • Retry function with linearly increasing backoff (delay × attempt number) for network resilience
  • scripts/verify.sh: Main verification script that:

    • Runs npm ci with retry logic to handle transient network issues
    • Executes lint, typecheck, and test commands defensively using --if-present flags
    • Falls back to a plain test run (no seed, no jest-junit reporter) if the first test invocation fails
    • Supports deterministic testing with SEED environment variable
  • scripts/run.js: Problem-solving framework implementing:

    • Structured define-plan-execute-validate workflow
    • Global error handlers for unhandled rejections and exceptions
    • Audit record (spec, steps, end timestamp) returned alongside the result for tracking execution
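
The retry behavior listed above can be sketched as follows. Two details of the helper quoted in the original prompt are worth noting: it runs `"$1"` as a single word, so a call such as `retry "npm ci" 3 3` would look for one executable literally named `npm ci`, and its delay grows as `delay*n` rather than doubling. The sketch below is an assumption about how a variant could look, not the merged `scripts/common.sh`; the names `retry_cmd` and `flaky` are hypothetical. It takes the command as separate words and doubles the delay after each failure.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the merged scripts/common.sh.
# Usage: retry_cmd <tries> <initial_delay_seconds> <command> [args...]
retry_cmd() {
  local tries="$1" delay="$2" attempt=1
  shift 2
  # "$@" keeps the command and its arguments as separate words.
  until "$@"; do
    if [ "$attempt" -ge "$tries" ]; then
      return 1               # bounded: give up after <tries> attempts
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))   # exponential backoff: d, 2d, 4d, ...
    attempt=$(( attempt + 1 ))
  done
}

# Demo: a command that fails twice, then succeeds on the third call.
calls=0
flaky() { calls=$(( calls + 1 )); [ "$calls" -ge 3 ]; }

retry_cmd 5 0 flaky && echo "succeeded after $calls calls"
```

With this signature, the dependency install step would be written as `retry_cmd 3 3 npm ci`.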

Documentation

  • docs/problem-solving-checklist.md: Accuracy checklist documenting best practices for:
    • Problem definition and planning
    • Evidence gathering and validation
    • Execution with deterministic seeds and bounded retries
    • Documentation and artifact attachment

Key Features

No Breaking Changes: Scripts use the --if-present flag, so npm scripts that are missing from package.json are skipped instead of failing, and no package.json modifications are required

Robust Error Handling: Strict bash error handling and JavaScript error listeners ensure failures are caught and logged
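
A minimal sketch of how the strict mode and traps interact, assuming the trap layout from the quoted `scripts/common.sh` but logging to stdout only (the `tee` to a log file is left out). The demo script is written to a temporary file and run in a separate bash process so that its exit does not end the calling shell.

```bash
#!/usr/bin/env bash
# Illustrative only: a throwaway script that mimics the ERR/EXIT traps.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
set -Eeuo pipefail
log() { printf '[%s] %s\n' "$(date -u +%FT%TZ)" "$*"; }
trap 'status=$?; log "ERR status=$status cmd=${BASH_COMMAND}"; exit "$status"' ERR
trap 'log "EXIT status=$?"' EXIT
log "before failure"
false                 # failing command: ERR trap fires, then the EXIT trap
log "never reached"
EOF

# Capture the output and exit status of the demo script.
output="$(bash "$demo" 2>&1)" && demo_status=0 || demo_status=$?
rm -f "$demo"

printf '%s\n' "$output"
echo "script exit status: $demo_status"
```

The output shows the `before failure` line, an `ERR status=1 cmd=false` line, and an `EXIT status=1` line, each with a UTC timestamp; the line after `false` is never logged.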

Always Upload Logs: The workflow uses if: always() to ensure diagnostic logs are available even when builds fail

Defensive Design: Retry logic and fallback mechanisms handle transient failures gracefully
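
One caveat with an `A || B` fallback is that `B` runs whenever `A` exits non-zero, including when tests genuinely fail, so the suite may run twice. A possible alternative, sketched here as an assumption and not as what the PR does, is to decide on the reporter flags before running the tests. The function names `reporter_flags` and `jest_junit_installed` are hypothetical; the probe is passed in as a command so the logic can be exercised without Node.js.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose Jest reporter flags up front.

# Probe: succeeds only if jest-junit can be resolved from the current directory.
jest_junit_installed() {
  node -e "require.resolve('jest-junit')" >/dev/null 2>&1
}

# Usage: reporter_flags <probe-command>
reporter_flags() {
  local probe="$1"
  if "$probe"; then
    echo "--reporters=default --reporters=jest-junit"
  else
    echo "--reporters=default"
  fi
}

# Demo with stand-in probes, then with the real one.
echo "with reporter:    $(reporter_flags true)"
echo "without reporter: $(reporter_flags false)"
echo "this machine:     $(reporter_flags jest_junit_installed)"
```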

Testing

The workflow will run automatically on this PR, demonstrating the accuracy gate in action. Logs will be uploaded as artifacts for inspection.

Future Work

These files can be replicated to other active repositories (mithril, smoke-tests, updatecli, WasabiDoc, coinbase-pro-node, DefiLlama-Adapters, zodiac-modifier-roles, eas-sdk, stacks-core, api-docs, mempool, pancake-frontend, lodestar) with minimal adjustments (primarily Node.js version configuration).

Original prompt

Implement an Accuracy Gate CI workflow and supporting, error-handled scripts to improve reliability and auditability of PRs.

Add the following files with exact contents:

```bash name=scripts/common.sh
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
LOG_DIR="${LOG_DIR:-./logs}"; mkdir -p "$LOG_DIR"
LOG_FILE="${LOG_FILE:-$LOG_DIR/run_$(date -u +%Y%m%dT%H%M%SZ).log}"

log() { printf "[%s] %s\n" "$(date -u +%FT%TZ)" "$*" | tee -a "$LOG_FILE"; }
trap 'status=$?; line=${BASH_LINENO[0]:-?}; log "ERR status=$status line=$line cmd=${BASH_COMMAND}"; exit $status' ERR
trap 'log "EXIT status=$?"' EXIT

retry() { local tries="${2:-3}" delay="${3:-2}" n=0; until "$1"; do n=$((n+1)); (( n>=tries )) && return 1; sleep $((delay*n)); done; }
```

```bash name=scripts/verify.sh
#!/usr/bin/env bash
set -Eeuo pipefail
source "$(dirname "$0")/common.sh"
log "Start verification"
: "${CI:=false}" "${SEED:=42}"
export NODE_OPTIONS="--max-old-space-size=4096"
retry "npm ci" 3 3
npm run lint --if-present
npm run typecheck --if-present
npm test --if-present -- --ci --runInBand --seed="$SEED" --reporters=default --reporters=jest-junit || npm test --if-present -- --ci --runInBand
log "Verification complete"
```

```javascript name=scripts/run.js
process.on('unhandledRejection', e => { console.error('[unhandledRejection]', e); process.exit(1); });
process.on('uncaughtException', e => { console.error('[uncaughtException]', e); process.exit(1); });
export async function solve({define, plan, execute, validate}) {
  const ctx = { startedAt: new Date().toISOString() };
  const spec = await define(); if (!spec?.goal || !spec?.constraints) throw new Error('Invalid spec');
  const steps = await plan(spec); if (!Array.isArray(steps) || !steps.length) throw new Error('Empty plan');
  const result = await execute(steps, spec);
  const verdict = await validate(result, spec); if (verdict !== true) throw new Error('Validation failed');
  return { result, audit: { spec, steps, endedAt: new Date().toISOString() } };
}
```

```yaml name=.github/workflows/accuracy-gate.yml
name: Accuracy Gate
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    timeout-minutes: 25
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - name: Verify
        run: |
          chmod +x scripts/*.sh || true
          ./scripts/verify.sh
      - uses: actions/upload-artifact@v4
        if: always()
        with: { name: logs, path: logs }
```

````markdown name=docs/problem-solving-checklist.md
# Problem-Solving Accuracy Checklist
- Define: goal, constraints, assumptions, stakeholders, success criteria, deadline.
- Evidence: inputs, data sources, error budgets, known edge cases.
- Plan: options with trade-offs, chosen path, rollback.
- Execute: deterministic seed, idempotent steps, timeouts, retries (bounded).
- Validate: assertions, golden tests, invariants, acceptance criteria met.
- Document: decisions, risks, follow-ups; attach logs/artifacts.
````

Acceptance criteria:

  • Workflow runs on every PR and uploads logs even on failure.
  • Scripts are executable and use strict error handling.
  • No changes to existing build/test commands required; script is defensive (uses --if-present).
  • Passing CI on a smoke PR demonstrates the gate works.
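
The second criterion (scripts are executable and use strict error handling) could be checked mechanically. The following is a hypothetical sketch, not part of the PR; `check_script` is an invented name, and it only looks for the exact `set -Eeuo pipefail` line used in the files above.

```bash
#!/usr/bin/env bash
# Hypothetical check: is the script executable and does it enable strict mode?
check_script() {
  local script="$1"
  [ -x "$script" ] || { echo "FAIL $script: not executable"; return 1; }
  grep -q '^set -Eeuo pipefail' "$script" || { echo "FAIL $script: no strict mode"; return 1; }
  echo "OK $script"
}

# Demo against two throwaway files instead of the real scripts/ directory.
tmpdir="$(mktemp -d)"
printf '#!/usr/bin/env bash\nset -Eeuo pipefail\n' > "$tmpdir/good.sh"
printf '#!/usr/bin/env bash\necho hi\n' > "$tmpdir/bad.sh"
chmod +x "$tmpdir/good.sh" "$tmpdir/bad.sh"

good_result="$(check_script "$tmpdir/good.sh")"
bad_result="$(check_script "$tmpdir/bad.sh")" || true
echo "$good_result"
echo "$bad_result"
rm -rf "$tmpdir"
```

In a repository checkout, the same function could be called as `check_script scripts/verify.sh` and `check_script scripts/common.sh`.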

After merging this PR, replicate the same files to other active repos (those with recent PRs: mithril, smoke-tests, updatecli, WasabiDoc, coinbase-pro-node, DefiLlama-Adapters, zodiac-modifier-roles, eas-sdk, stacks-core, api-docs, mempool, pancake-frontend, lodestar). We'll open follow-up PRs with identical contents, adjusting Node version if repo requires different runtime.

*This pull request was created as a result of the prompt above, submitted from Copilot chat.*



snyk-io bot commented Sep 30, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found.

Co-authored-by: ivan09069 <202161141+ivan09069@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Accuracy Gate CI and error-handled scripts" to "Add Accuracy Gate CI workflow with error-handled scripts for improved PR reliability" Sep 30, 2025
Copilot AI requested a review from ivan09069 September 30, 2025 18:46
Copilot finished work on behalf of ivan09069 September 30, 2025 18:46
@ivan09069 ivan09069 marked this pull request as ready for review October 1, 2025 01:43
@ivan09069 ivan09069 merged commit f0cef25 into main Oct 1, 2025
1 of 2 checks passed
2 participants