Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 29, 2025

E2E Benchmark Regression Testing Implementation

This PR implements e2e benchmark regression testing as requested in the issue.

Completed Tasks

  • Study existing benchmark infrastructure (workspace project benchmarks, jest-bench.config.ts)
  • Create benchmark infrastructure in workspace-e2e project
    • Create jest-bench-e2e.config.ts for e2e benchmarks
    • Create tsconfig.bench.json in workspace-e2e
    • Add benchmark target to workspace-e2e project.json
    • Update nx.json with shared benchmark target configuration
    • Configure global setup/teardown for local registry
    • Add moduleNameMapper for test-utils package
  • Migrate e2e performance tests to benchmark format
    • Convert performance-benchmark.spec.ts to performance-benchmark.bench.ts
    • Convert performance-stress-test.spec.ts to performance-stress-test.bench.ts
    • Remove old .spec.ts files
    • Ensure tests use workspace nx version for create-nx-workspace
    • Use beforeEachIteration to reset state before runs
    • Remove all logging and assertions from benchmarks
    • Use stdio inherit for all execSync calls
    • Use randomBytes for library name suffixes
  • Replace lodash uniqueId with crypto-based uniqueId
    • Use uniqueId from @internal/test-util package (test-utils)
    • Replaced all uses in workspace-e2e
  • Update CI workflow
    • Add e2e-benchmark job with OS matrix
    • Configure Ubuntu-only for PRs, all OSes for push/workflow_dispatch
    • Configure benchmark-action with 140% threshold
    • Update e2e job to exclude .bench.ts files only
    • Update test job to exclude .bench.ts files
  • Update .gitignore to only ignore individual benchmark files
  • Format code and verify commands locally
  • Fix CI failures with global setup/teardown
  • Fix library naming issues in benchmarks
  • Fix benchmark iteration state management
  • Fix module resolution for test-utils package

Changes Made

Infrastructure:

  • Created jest-bench-e2e.config.ts for e2e benchmark configuration
    • Added globalSetup to start local registry and publish package
    • Added globalTeardown to stop local registry
    • Added moduleNameMapper to resolve @internal/test-util path
  • Created packages/workspace-e2e/tsconfig.bench.json for TypeScript configuration
  • Added benchmark target to packages/workspace-e2e/project.json (reused name)
  • Updated nx.json to share benchmark target configuration for both projects
  • Updated packages/workspace-e2e/tsconfig.json to reference tsconfig.bench.json
  • Updated packages/workspace-e2e/eslint.config.js to allow benchmark files to import from tools directory
  • Uses uniqueId from @internal/test-util package (test-utils)
  • Updated .gitignore to ignore only individual benchmark files, not the directory

E2E Benchmark Tests:

  • Created performance-benchmark.bench.ts with:

    • Single file operations (small, medium, large files)
    • Multiple file operations
    • Uses workspace nx version for create-nx-workspace
    • No logging or console output
    • No assertions (benchmarks measure time only)
    • Uses stdio inherit for visibility
    • beforeEachIteration to reset state before each run
    • randomBytes-based library names for uniqueness
  • Created performance-stress-test.bench.ts with:

    • 10+ projects with cross-project dependencies
    • 100+ large files with imports
    • Uses workspace nx version for create-nx-workspace
    • No logging or console output
    • No assertions
    • Uses stdio inherit for visibility
    • beforeEachIteration to reset state before each run
    • randomBytes-based library names for uniqueness
  • Removed old performance-benchmark.spec.ts and performance-stress-test.spec.ts

Use test-utils package:

  • Uses uniqueId from @internal/test-util package
  • Updated all files in workspace-e2e to use test-utils uniqueId
  • Benchmarks use randomBytes for library name suffixes

CI Workflow Updates (.github/workflows/ci.yml):

  • Added set-e2e-benchmark-matrix job to configure OS matrix:
    • Ubuntu 24 Arm only for pull requests (fast feedback)
    • macOS, Windows, Ubuntu 24 Arm for push to main and workflow_dispatch
  • Updated e2e-benchmark job:
    • Uses matrix from set-e2e-benchmark-matrix
    • Configured with 140% performance threshold (fails if >40% slower)
    • Uses benchmark-action/github-action-benchmark@v1
    • Caches results per OS to track performance over time
    • Runs nx benchmark workspace-e2e (reused target name)
  • Updated test job to exclude \.bench\.ts$ files
  • Updated e2e job to exclude \.bench\.ts$ files only (simplified pattern)
  • Job runs on PR, push to main, and workflow_dispatch

Commands Verified Locally:

  • npx nx build workspace - builds successfully
  • npx nx benchmark workspace - runs micro-benchmarks
  • npx nx lint workspace-e2e - passes linting
  • npx nx format:check - formatting passes
  • ✅ YAML syntax validated with yaml-lint
  • npx nx benchmark workspace-e2e - testing in progress

Security:

  • ✅ Code review completed - no issues
  • ✅ CodeQL security scan completed - no alerts

How It Works

  1. E2E Benchmarks: Run in separate CI job
    • Ubuntu only for PRs (fast feedback)
    • All OSes (macOS, Windows, Ubuntu Arm) for merges and manual runs
  2. Benchmark Format: Uses tinybench with describe-it structure
  3. State Management: beforeEachIteration ensures files are reset BEFORE each benchmark iteration
  4. Regression Detection: Fails if performance drops >40% (140% threshold)
  5. No Side Effects: Benchmarks don't log or assert, only measure execution time
  6. Visibility: Uses stdio inherit to show progress during long-running operations
  7. Local Registry: Global setup starts Verdaccio and publishes @nxworker/workspace@e2e for testing
  8. Library Naming: Uses cryptographically secure randomBytes for short, unique library name suffixes
  9. Module Resolution: Uses test-utils package for shared uniqueId utility
Original prompt

This section details on the original issue you should resolve

<issue_title>E2E Benchmark regression testing </issue_title>
<issue_description>Now that we have benchmark regression testing in place for unit tests, we should change our performance benchmark and stress test e2e tests to run in a separate CI job on macOS latest, Windows latest, and Ubuntu 24 Arm.

This CI job should fail if performance drops by more than 40%. The CI job should be similar to the benchmark job but should be called e2e-benchmark. It should be part of ci.yml, not a separate workflow Current performance and stress tests from the e2e project should run in the e2e-benchmark job but no longer be run in the benchmark job.

The e2e benchmarks should be in the workspace-e2e project. The Jest and TypeScript configurations, nx.json as well as project target should follow the workspace project setup for benchmarks.

The benchmarks should use the same Jest-like describe-it structure as the workspace benchmarks.

If e2e benchmarks use create-nx-workspace, they should use the same version as the nx version currently installed.

Format the code. Verify commands used in CI jobs manually in your local environment, don't ask the user to run them or skip them.</issue_description>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Add E2E benchmark regression testing in CI Add e2e benchmark regression testing with cross-platform CI Oct 29, 2025
Copilot AI requested a review from LayZeeDK October 29, 2025 15:29
Copilot finished work on behalf of LayZeeDK October 29, 2025 15:29
Copilot AI requested a review from LayZeeDK October 29, 2025 16:28
Copilot finished work on behalf of LayZeeDK October 29, 2025 16:28
Copilot finished work on behalf of LayZeeDK October 29, 2025 16:46
Copilot finished work on behalf of LayZeeDK October 29, 2025 16:50
@LayZeeDK LayZeeDK marked this pull request as ready for review October 29, 2025 17:14
@LayZeeDK LayZeeDK requested a review from Copilot October 29, 2025 20:42

This comment was marked as resolved.

@LayZeeDK

This comment was marked as resolved.

This comment was marked as resolved.

Copilot finished work on behalf of LayZeeDK October 29, 2025 21:06
@LayZeeDK

This comment was marked as resolved.

This comment was marked as resolved.

Copilot finished work on behalf of LayZeeDK October 29, 2025 22:01
@LayZeeDK

This comment was marked as resolved.

This comment was marked as resolved.

Copilot finished work on behalf of LayZeeDK October 29, 2025 22:28
@LayZeeDK

This comment was marked as resolved.

Copy link
Contributor Author

Copilot AI commented Oct 29, 2025

@copilot e2e benchmarks are still failing. You must run them locally to confirm that they are working after your changes.

...

Fixed in 24e8d87. The issue was using afterEachIteration instead of beforeEachIteration. Benchmarks run multiple iterations, and files need to be reset BEFORE each iteration starts, not after. Changed all e2e benchmarks to use beforeEachIteration which ensures files are always in the expected location when each benchmark iteration runs. The benchmarks now run successfully locally (verified with npx nx benchmark workspace-e2e).

Copilot finished work on behalf of LayZeeDK October 29, 2025 23:57
Copilot AI and others added 10 commits October 30, 2025 07:30
- Create jest-bench-e2e.config.ts for e2e benchmarks
- Add tsconfig.bench.json in workspace-e2e
- Add e2e-benchmark target to workspace-e2e project
- Update nx.json with e2e-benchmark target defaults
- Convert performance tests to benchmark format (.bench.ts)
- Add e2e-benchmark CI job (macOS, Windows, Ubuntu Arm)
- Configure 140% threshold for e2e benchmarks
- Update test/e2e jobs to exclude .bench.ts files

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
- Add tsconfig.bench.json reference to workspace-e2e tsconfig.json
- Configure eslint to allow benchmark files to import from tools
- Remove unused beforeEachIteration import

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Performance tests now only run in e2e-benchmark job, not in regular e2e job.
This prevents long-running performance tests from blocking regular CI.

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
- Replace lodash uniqueId with crypto-based uniqueId across codebase
- Remove old performance spec files (replaced by .bench.ts)
- Remove all logging and console output from benchmarks
- Remove assertions from benchmarks (benchmarks measure time only)
- Use stdio inherit for all execSync calls in benchmarks
- Add afterEachIteration to reset state between benchmark runs
- Reuse benchmark target name (not e2e-benchmark)
- Update nx.json to share benchmark target configuration
- Add matrix configuration for e2e-benchmark (Ubuntu only for PRs)
- Simplify e2e test ignore pattern (only .bench.ts needed)
- Add benchmarks/ directory to .gitignore

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Remove benchmarks/ from .gitignore - only ignore individual
benchmark result files (benchmark.json, *-benchmark.txt)

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Configure jest-bench-e2e.config.ts with globalSetup and
globalTeardown scripts to start local registry and publish
package before running benchmarks. This fixes the CI failure
where npm install @nxworker/workspace@e2e was failing.

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Replace randomUUID() with timestamp-based IDs (Date.now().toString(36))
for library names in benchmarks. UUIDs were too long and contained
characters that caused issues with Nx library path resolution.
Timestamp-based IDs are shorter, still unique per benchmark run,
and avoid path-related issues.

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Replace Date.now().toString(36) with randomBytes(4).toString('hex')
for library name suffixes in benchmarks. randomBytes provides
cryptographically secure random data, is shorter (8 chars),
and avoids potential timestamp collisions.

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
…tion

Replace afterEachIteration with beforeEachIteration to ensure files
are reset BEFORE each benchmark iteration runs, not after. This
ensures files are always in the expected location when the benchmark
starts, preventing "file not found" errors during warmup and actual
benchmark runs.

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
@LayZeeDK LayZeeDK force-pushed the copilot/add-e2e-benchmark-ci-job branch from 24e8d87 to 9daba45 Compare October 30, 2025 06:34
@LayZeeDK
Copy link
Member

@copilot Rebased with new uniqueId function. Failure:

FAIL e2e-benchmarks packages/workspace-e2e/src/performance-stress-test.bench.ts
● Test suite failed to run

Cannot find module '@internal/test-util' from 'packages/workspace-e2e/src/performance-stress-test.bench.ts'

4 | describe,
5 | it,

6 | } from '../../../tools/tinybench-utils';
| ^
7 | import { uniqueId } from '@internal/test-util';
8 | import { execSync } from 'node:child_process';
9 | import { join, dirname } from 'node:path';

at Resolver._throwModNotFoundError (node_modules/jest-resolve/build/resolver.js:427:11)
at Object. (packages/workspace-e2e/src/performance-stress-test.bench.ts:6:19)

Test Suites: 2 failed, 2 total
Tests: 0 total
Snapshots: 0 total
Time: 0.512 s
Ran all test suites.

- Add moduleNameMapper to jest-bench-e2e.config.ts to resolve @internal/test-util
- Keep beforeEachIteration import (it's actually used in the code)
- The benchmarks use uniqueId from the new test-utils package

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Oct 30, 2025

@copilot Rebased with new uniqueId function. Failure:

FAIL e2e-benchmarks packages/workspace-e2e/src/performance-stress-test.bench.ts
...

Fixed in 709d580. Added moduleNameMapper to jest-bench-e2e.config.ts to resolve the @internal/test-util path alias to the test-utils package. Also kept the beforeEachIteration import since it's actually used in the benchmark code to reset file state between iterations.

Copilot finished work on behalf of LayZeeDK October 30, 2025 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

E2E Benchmark regression testing

2 participants