
Expand regression coverage and harden Electron E2E #1918

Open
nwparker wants to merge 1 commit into stablyai:main from nwparker:nwparker/regression-tests

nwparker (Contributor) commented May 15, 2026

Summary

This PR expands Orca's regression coverage across main-process services, renderer stores/UI harnesses, pane/terminal infrastructure, and Electron E2E tests. It also hardens the Electron Playwright suite around deterministic setup, platform-correct shortcuts, explicit readiness checks, and isolated seeded repositories.

The branch is test-focused. No production runtime behavior is intended to change outside the test/coverage tooling and dependency updates needed to support the expanded regression suite.

Rebase Note

Rebased onto origin/main after PR #2070 landed.

  • Current branch head: 07d5ca0657
  • Current base includes 6eae8fa540 ci: shard electron e2e workflow (#2070)
  • Preserved the #2070 E2E speedups: the shared build artifact workflow, 5-way Electron E2E sharding in CI, Linux E2E GPU guardrails, and lazy electron-updater behavior are all now inherited from main rather than duplicated in this branch.
  • Conflict resolution kept both sides where compatible, then updated stale test expectations to match the post-#2070 APIs on current main.

Coverage Snapshot

Latest verified command:

pnpm run coverage

Result:

Test Files  693 passed | 1 skipped (694)
Tests       7457 passed | 10 skipped (7467)

Coverage summary from V8:

| Metric     | Covered | Total  | Coverage |
|------------|---------|--------|----------|
| Statements | 46,774  | 68,085 | 68.69%   |
| Branches   | 27,877  | 46,577 | 59.85%   |
| Functions  | 8,763   | 13,214 | 66.31%   |
| Lines      | 45,727  | 66,300 | 68.96%   |

xychart-beta
    title "Post-Rebase Vitest Coverage"
    x-axis ["Statements", "Branches", "Functions", "Lines"]
    y-axis "Coverage %" 0 --> 100
    bar [68.69, 59.85, 66.31, 68.96]
flowchart LR
    A[Regression Coverage PR] --> B[Main Process]
    A --> C[Renderer Stores + Hooks]
    A --> D[Terminal + Pane Infrastructure]
    A --> E[Electron E2E Harness]
    B --> B1[Browser, GitHub, Linear, Usage, Runtime, Filesystem, Speech]
    C --> C1[Browser, GitHub, Linear, UI slices, issue metadata]
    D --> D1[PaneManager, TerminalPane lifecycle, keyboard, context menu, layout]
    E --> E1[Seeded repos, platform shortcuts, readiness polling, restart flows]

The previous local regression-test baseline was roughly 65.19% line coverage; this branch now verifies 68.96% lines after rebasing onto the larger current main test surface.
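
The percentages in the coverage summary can be re-derived from the covered/total counts. The sketch below does that arithmetic. The counts are copied from the summary; the `truncatedPercent` helper is a hypothetical name, and truncating rather than rounding to two decimals is an inference from the reported figures (Functions works out to roughly 66.316%, which would round to 66.32), not a statement about how the coverage reporter formats its output.

```typescript
// Covered/total counts copied from the V8 coverage summary in this PR description.
type Metric = { name: string; covered: number; total: number };

const metrics: Metric[] = [
  { name: "Statements", covered: 46774, total: 68085 },
  { name: "Branches", covered: 27877, total: 46577 },
  { name: "Functions", covered: 8763, total: 13214 },
  { name: "Lines", covered: 45727, total: 66300 },
];

// Assumption: percentages are truncated (not rounded) to two decimals.
function truncatedPercent(covered: number, total: number): number {
  return Math.floor((covered * 10000) / total) / 100;
}

const percents = metrics.map((m) => truncatedPercent(m.covered, m.total));

// Line-coverage change against the earlier local baseline, in percentage points.
const previousLinePct = 65.19;
const lineDelta = Math.round((percents[3] - previousLinePct) * 100) / 100;

metrics.forEach((m, i) => console.log(`${m.name}: ${percents[i]}%`));
console.log(`Line coverage change: +${lineDelta} points`);
```

Under that truncation assumption the computed values match the four reported percentages, and the line-coverage change from the earlier baseline is 3.77 percentage points.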

What Changed

Regression coverage manifest and tooling

  • Added tests/regression-coverage-manifest.json as a curated map of target areas and coverage intent.
  • Added tools/regression-coverage-report.mjs plus package script wiring for reporting regression coverage progress.
  • Updated package metadata/lockfile for the test/coverage tooling changes.
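
To make the manifest idea concrete, here is a minimal sketch of how a curated map of target areas could be compared against measured coverage. Everything in it is an assumption for illustration: the actual schema of tests/regression-coverage-manifest.json and the logic of tools/regression-coverage-report.mjs are not shown in this description, and the field names (`area`, `intent`, `targetLinePct`) and the `reportProgress` function are hypothetical.

```typescript
// Hypothetical manifest entry; the real manifest schema may differ.
interface ManifestEntry {
  area: string; // target area identifier
  intent: string; // free-text description of what the tests should lock down
  targetLinePct: number; // hypothetical per-area line-coverage goal
}

// Hypothetical per-area measurement, keyed by the same area identifier.
interface AreaCoverage {
  coveredLines: number;
  totalLines: number;
}

interface ProgressRow {
  area: string;
  actualLinePct: number;
  targetLinePct: number;
  meetsTarget: boolean;
}

// Join the manifest against measured coverage; areas with no data count as 0%.
function reportProgress(
  manifest: ManifestEntry[],
  coverage: Record<string, AreaCoverage>,
): ProgressRow[] {
  return manifest.map((entry) => {
    const measured = coverage[entry.area];
    const actual =
      measured && measured.totalLines > 0
        ? (measured.coveredLines / measured.totalLines) * 100
        : 0;
    return {
      area: entry.area,
      actualLinePct: actual,
      targetLinePct: entry.targetLinePct,
      meetsTarget: actual >= entry.targetLinePct,
    };
  });
}
```

Treating unmeasured areas as 0% is one possible design choice; it makes a missing area show up as a gap in the report instead of being dropped silently.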

Main-process regression coverage

Added or expanded tests for high-risk main-process contracts, including:

  • Browser cookie import, browser profile selection, CDP bridge behavior, and browser IPC.
  • Claude/Codex usage scanners and persisted usage stores.
  • GitHub project-view parsing and fallback behavior.
  • Filesystem IPC, watcher behavior, local git routing, and SSH git provider routing.
  • Linear workspace-aware client storage plus issue/team service behavior.
  • Rate-limit service behavior.
  • Runtime browser/git/file command routing, including local and SSH paths.
  • Speech model-manager state transitions, archive integrity checks, HTTPS enforcement, extraction, cancellation, and deletion.

Renderer and store regression coverage

Added or expanded tests for renderer-facing behavior, including:

  • Browser, GitHub, Linear, and UI Zustand slices.
  • Source Control render behavior.
  • Manage Sessions settings rendering.
  • TerminalPane render, lifecycle, context menu, keyboard handler, expand/collapse, and layout behavior.
  • Pane manager tree/DOM operations and stable pane identity behavior.
  • Issue metadata hooks and cache invalidation.
  • Rich markdown keyboard handling.

Several renderer tests intentionally use narrow fake DOM/React harnesses. Where that is the case, comments explain why: these are structural regression tests around large, hard-to-mount surfaces, not replacements for user-flow E2E coverage.
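
As a rough illustration of what a narrow fake DOM harness can look like, the sketch below models only the node operations a structural test touches and checks that moving a pane keeps the same node object. It is not taken from the branch: `FakeNode` and `movePane` are hypothetical names, and the real harnesses may fake a different surface.

```typescript
// Minimal stand-in for the few DOM node methods the code under test uses.
class FakeNode {
  id: string;
  children: FakeNode[] = [];
  parent: FakeNode | null = null;

  constructor(id: string) {
    this.id = id;
  }

  // Like the DOM, appending a node that already has a parent moves it.
  appendChild(child: FakeNode): FakeNode {
    if (child.parent) child.parent.removeChild(child);
    child.parent = this;
    this.children.push(child);
    return child;
  }

  removeChild(child: FakeNode): FakeNode {
    const index = this.children.indexOf(child);
    if (index === -1) throw new Error(`${child.id} is not a child of ${this.id}`);
    this.children.splice(index, 1);
    child.parent = null;
    return child;
  }
}

// Hypothetical function under test: reparent a pane without recreating it,
// so callers holding a reference keep a stable identity.
function movePane(pane: FakeNode, target: FakeNode): void {
  target.appendChild(pane);
}

const left = new FakeNode("left");
const right = new FakeNode("right");
const pane = left.appendChild(new FakeNode("pane-1"));
movePane(pane, right);
console.log(left.children.length, right.children[0] === pane);
```

A harness like this can only assert structure (parentage, ordering, identity); anything that depends on layout or real events still needs the E2E suite.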

Electron E2E hardening

Reviewed and improved the Electron E2E tests with a focus on deterministic Playwright behavior:

  • Centralized Cmd/Ctrl shortcut handling in tests/e2e/helpers/shortcuts.ts so specs follow renderer platform logic instead of hardcoding macOS assumptions.
  • Moved seeded git repo creation into tests/e2e/helpers/seeded-git-repo.ts and reused it from global setup and fixtures.
  • Replaced shell-string setup commands with execFileSync argument arrays where practical, including Electron build and git setup paths.
  • Replaced Electron readiness waitForFunction calls with expect.poll assertions that have explicit failure messages.
  • Removed fixed sleeps from touched specs and replaced them with state, event, or renderer-frame checks where possible.
  • Kept Electron-specific lifecycle cleanup intact: each test still gets isolated userData, and restart-persistence tests still own their persistent launch session explicitly.
  • Kept the E2E fixture compatible with the faster post-#2070 parallel headless run.
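
The shortcut centralization described in the list above can be sketched as a small pure helper. This is not the contents of tests/e2e/helpers/shortcuts.ts: the names `Platform`, `primaryModifier`, and `shortcut` are hypothetical, and how the real helper learns the platform from the renderer is not shown here.

```typescript
// Assumption: the platform string is obtained elsewhere (for example from the renderer).
type Platform = "darwin" | "win32" | "linux";

// Playwright key names: "Meta" is Cmd on macOS, "Control" is Ctrl on other platforms.
function primaryModifier(platform: Platform): "Meta" | "Control" {
  return platform === "darwin" ? "Meta" : "Control";
}

// Build a "+"-joined key combination such as "Control+Shift+T".
function shortcut(platform: Platform, ...keys: string[]): string {
  return [primaryModifier(platform), ...keys].join("+");
}

console.log(shortcut("darwin", "k"));
console.log(shortcut("linux", "Shift", "T"));
```

In a spec, the resulting string could be passed to Playwright's `page.keyboard.press`, which accepts combinations in this `Modifier+Key` form, so no spec needs to hardcode `Meta` or `Control` itself.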

Validation

| Check | Command | Result |
|-------|---------|--------|
| Lint | pnpm exec oxlint | Passed: 0 warnings, 0 errors |
| Typecheck | pnpm run tc | Passed |
| Full Vitest coverage | pnpm run coverage | Passed: 693 files, 7457 tests |
| Focused post-rebase Vitest fallout | pnpm exec vitest run --config config/vitest.config.ts ... | Passed: 8 files, 104 tests |
| Full Electron headless E2E | SKIP_BUILD=1 pnpm exec playwright test --config tests/playwright.config.ts --project=electron-headless | Passed: 94 passed, 1 skipped, 2.1m |
| Full Electron headful E2E | SKIP_BUILD=1 pnpm exec playwright test --config tests/playwright.config.ts --project=electron-headful | Passed: 7 passed, 44.8s |

Notes:

  • pnpm run test:e2e was also run to rebuild the Electron E2E output after the rebase. That run exposed a focused terminal-attention marker issue introduced during conflict resolution; the test was fixed and then the full headless/headful Electron projects were rerun successfully with SKIP_BUILD=1 against that build.
  • The locally installed stably CLI was checked earlier, but it exposes only the auth and dev subcommands, not stably test, so validation used the repository's Playwright runner.

Review Guidance

Recommended review order:

  1. Start with the E2E harness changes:
    • tests/e2e/helpers/seeded-git-repo.ts
    • tests/e2e/global-setup.ts
    • tests/e2e/helpers/orca-app.ts
    • tests/e2e/helpers/orca-restart.ts
    • tests/e2e/helpers/shortcuts.ts
  2. Review touched E2E specs for waiting and shortcut changes.
  3. Review new test-only coverage additions by domain:
    • main runtime/browser/git/files tests
    • usage stores/scanners
    • renderer terminal/pane-manager tests
    • store slice tests
  4. Treat long test files as intentional regression harnesses. They include max-line disables where splitting would duplicate setup or hide the cross-case contract being tested.

Risk

Primary risk is test-maintenance cost, not runtime behavior. This PR adds many tests and broadens mocked harness coverage, so future refactors may need to update tests that now lock down previously untested contracts. The E2E changes are intended to reduce flake risk by removing arbitrary sleeps and using explicit readiness/state checks.
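
To illustrate why explicit readiness checks tend to be less flaky than fixed sleeps, here is a standalone sketch of state-based polling with a descriptive failure message. It is not Playwright's implementation and not code from this branch; in the suite itself `expect.poll` plays this role. The `pollUntil` name and its options are hypothetical.

```typescript
interface PollOptions {
  timeoutMs: number; // give up after this long
  intervalMs: number; // delay between reads
  message: string; // explains what was being waited for
}

// Re-read a value until it satisfies isReady, or fail with a clear message.
// Unlike a fixed sleep, this returns as soon as the state is ready.
async function pollUntil<T>(
  read: () => T | Promise<T>,
  isReady: (value: T) => boolean,
  opts: PollOptions,
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  let last: T = await read();
  while (!isReady(last)) {
    if (Date.now() >= deadline) {
      throw new Error(`${opts.message} (last value: ${JSON.stringify(last)})`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
    last = await read();
  }
  return last;
}
```

A fixed sleep either waits too long on fast machines or not long enough on slow ones; polling on state bounds the wait by the timeout and reports the last observed value when it fails.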

nwparker changed the title from "Add regression coverage measurement and settings E2E" to "Expand regression coverage and harden Electron E2E" on May 15, 2026
- test: expand regression coverage and harden e2e

- Add regression coverage measurement and settings E2E
nwparker force-pushed the nwparker/regression-tests branch from da118e5 to 07d5ca0 on May 16, 2026 07:40