
Conversation

@frontegg-david frontegg-david (Contributor) commented Jan 29, 2026

Summary by CodeRabbit

  • New Features

    • Full performance testing suite with metrics, leak & parallel-leak checks, regression detection, baseline handling, and consolidated JSON/Markdown reports.
    • EventStore support for SSE resumability (in-memory and Redis options).
    • Demo "reset" tools added to several demo apps for deterministic runs.
  • Tests

    • New per-project performance test targets and extensive end-to-end perf suites across many demo apps.
  • Chores

    • CI workflows, scripts, and npm scripts to fetch/publish baselines and consolidate/upload perf reports; perf-results ignored in VCS.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai coderabbitai bot (Contributor) commented Jan 29, 2026

📝 Walkthrough

Adds a performance-testing subsystem (types, metrics, leak detection, fixtures, reporter, baseline store, report generator), per-demo Jest perf configs and Nx test:perf targets, EventStore implementations and wiring (memory/Redis), CI workflows and scripts for fetching/publishing baselines, many e2e perf tests, and related testing-library exports/refactors.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **CI workflows & scripts**<br>`.github/workflows/perf.yml`, `.github/workflows/perf-release.yml`, `scripts/perf-fetch-baseline.mjs`, `scripts/perf-publish-baseline.mjs`, `package.json`, `.gitignore` | Adds perf CI matrix and publish flow, scripts to fetch/publish baselines, npm scripts (`test:perf`, `perf:fetch-baseline`, `perf:publish-baseline`), and ignores `perf-results/`. |
| **Perf framework core**<br>`libs/testing/src/perf/...`, `libs/testing/src/perf/index.ts`, `libs/testing/src/index.ts` | New comprehensive perf API: types, metrics collector, GC helpers, leak detector (sync/async/parallel), perf fixtures, perfTest harness, global measurement aggregation and exports. |
| **Baseline, regression & reporting**<br>`libs/testing/src/perf/baseline-store.ts`, `libs/testing/src/perf/regression-detector.ts`, `libs/testing/src/perf/report-generator.ts`, `libs/testing/src/perf/jest-perf-reporter.ts`, `libs/testing/src/perf/jest-perf-reporter.js` | Baseline storage/formatting, regression detection, report generation (JSON/MD), and a Jest reporter (TS + CJS wrapper) that emits artifacts and can fail on regressions. |
| **Demo e2e perf projects & tests**<br>`apps/e2e/demo-e2e-*/jest.perf.config.ts`, `apps/e2e/demo-e2e-*/project.json`, `apps/e2e/demo-e2e-*/e2e/*.perf.test.ts` | Adds per-demo Jest perf configs, Nx `test:perf` targets, and many perf test suites (basic, leak, parallel) across demo apps. |
| **EventStore & transport integration**<br>`libs/sdk/src/transport/event-stores/*`, `libs/sdk/src/transport/index.ts`, `libs/sdk/src/common/types/options/transport/*`, `libs/sdk/src/scope/scope.instance.ts`, `libs/sdk/src/transport/adapters/*` | Introduces EventStore factory, Memory and Redis EventStore implementations, and schema/types; wires eventStore into Scope and streamable transport; removes the prior inline in-memory store. |
| **EventStore tests & infra**<br>`libs/sdk/src/transport/event-stores/__tests__/*`, `libs/sdk/src/transport/event-stores/memory.event-store.ts`, `.../redis.event-store.ts`, `.../event-store.factory.ts`, `.../index.ts` | Adds unit tests and production-ready Memory/Redis stores, factory, and centralized exports. |
| **Testing library exports & refactors**<br>`libs/testing/src/index.ts`, `libs/testing/project.json`, `libs/testing/src/auth/*`, `libs/testing/src/client/*`, `libs/testing/src/transport/*`, `libs/testing/src/fixtures/*` | Re-exports perf APIs, adds perf build entrypoints, defensive guards, lazy crypto loading, centralized transport accessor, safer SSE parsing, and minor test fixes. |
| **Demo reset tools**<br>`apps/e2e/demo-e2e-public/.../notes-reset.tool.ts`, `apps/e2e/demo-e2e-public/.../index.ts`, `apps/e2e/demo-e2e-codecall/.../crm-reset.tool.ts`, `.../index.ts` | Adds reset tools (NotesResetTool, CrmResetTool) and registers them in demo apps for test setup/reset flows. |
| **Small edits / housekeeping**<br>`apps/e2e/demo-e2e-codecall/e2e/codecall.e2e.test.ts`, `apps/e2e/demo-e2e-serverless/src/.../cold-start-test.tool.ts` | Adjusts a test expectation (+1 tool) and shortens a simulated cold-start delay. |

Sequence Diagram(s)

sequenceDiagram
  participant GH as GitHub Actions
  participant Runner as Runner
  participant Matrix as Perf Matrix Jobs
  participant Artifacts as Artifact Storage
  participant Publish as Publish Job / Scripts
  participant GHRelease as GitHub Release API

  GH->>Runner: trigger perf workflow (push/PR/dispatch)
  Runner->>Matrix: start per-project jobs (matrix)
  Matrix-->>Artifacts: upload per-project perf reports
  Runner->>Publish: start publish-baseline job (on success)
  Publish->>Artifacts: download all perf artifacts
  Publish->>Publish: merge reports → baseline.json
  Publish->>GHRelease: fetch release, update body with baseline
  GHRelease-->>Publish: release updated
  Publish-->>Artifacts: upload consolidated perf-baseline artifact

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped through heaps and time,
Counting bytes and requests in rhyme,
Baselines tucked in release notes bright,
Reports and artifacts wrapped up tight,
A rabbit cheers — perf tests take flight! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Add performance testing tools and configurations' accurately and concisely describes the main change: introducing a comprehensive performance testing framework with tools, utilities, configurations, and workflows across the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.67%, which is sufficient. The required threshold is 65.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch add-performance-tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In @.github/workflows/perf.yml:
- Around line 18-21: Add an explicit top-level permissions block to restrict
GITHUB_TOKEN for this workflow (the job named "perf") instead of relying on
defaults; determine which scopes "perf" actually needs (e.g., contents: read,
actions: read, id-token: write, or others) and add only those minimal entries
under a top-level permissions: key so the workflow no longer runs with broad
default permissions.

In `@apps/e2e/demo-e2e-redis/e2e/redis-stress.perf.test.ts`:
- Around line 104-134: The test is calling non-existent tools vault-store and
vault-retrieve; either implement those tool handlers or update the test to use
the VaultApp exported tool names. Replace vault-store with create-vault-entry
(or update-vault-entry/add-credential as appropriate) and replace vault-retrieve
with get-vault-entry (or get-credentials) in the stress test (and other tests
referencing vault-store/vault-retrieve), or add new tool registrations named
exactly vault-store and vault-retrieve that delegate to the existing VaultApp
methods so the mcp.tools.call invocations succeed.

In `@libs/testing/src/perf/baseline-store.ts`:
- Around line 95-106: In createFromMeasurements, remove the non-null assertion
on grouped.get(key)!; instead safely retrieve the array from the Map using
grouped.get(key) into a local variable (e.g., const arr = grouped.get(key)), if
it is undefined initialize a new array and set it back into grouped
(grouped.set(key, arr)), then push the measurement into that array—this avoids
the `!` operator while keeping the same grouping logic using the grouped Map and
key variables.

In `@libs/testing/src/perf/index.ts`:
- Around line 41-154: The PR exposes many public perf testing APIs (e.g.,
perfTest, PerfFixtures/PerfTestFixtures,
PerfBaseline/TestBaseline/MetricBaseline, LeakDetector, MetricsCollector,
RegressionDetector, ReportGenerator, BaselineStore and related types exported
from libs/testing/src/perf/index.ts) but no docs exist under
docs/draft/docs/testing/; add documentation pages in that folder that describe
each public API, its purpose, key methods/types, basic usage examples, and links
between related items (metrics-collector, leak-detector, perf-fixtures,
baseline-store, regression-detector, report-generator); ensure exported type
names (PerfTestFn, PerfFixturesImpl, PerfBaseline, MetricBaseline, LeakDetector,
MetricsCollector, RegressionDetector, ReportGenerator, BaselineStore) are
covered and include code snippets for common workflows (creating fixtures,
running perfTest, using baseline store and generating reports).

In `@libs/testing/src/perf/leak-detector.ts`:
- Around line 347-466: generateIntervals currently allows intervalSize <= 0
which can produce Infinity/loops; validate and guard at the start of
generateIntervals (or where it's called from analyzeLeakPattern) to ensure
intervalSize is a positive integer (e.g., if intervalSize <= 0 then set
intervalSize = 1 or throw a clear error), and adjust the calculation of
numIntervals = Math.ceil(samples.length / intervalSize) accordingly so
subsequent loops cannot become infinite; update any callers (analyzeLeakPattern)
if you choose to coerce the value so behavior is explicit.
- Around line 223-306: detectLeakParallel can leave clients connected if a
worker Promise rejects because the Promise.all that runs operations can
short-circuit before the disconnect block; wrap the parallel worker execution in
a try/finally (or use Promise.allSettled) so that the clients cleanup always
runs regardless of failures. Specifically, around the operations.map Promise.all
that produces workerResults, ensure you catch errors only to allow the finally
block to run the disconnect loop over the clients array (referencing clients and
client.disconnect) and then rethrow the original error so callers still observe
the failure.
- Around line 167-192: In detectLeakSync, remove the non-null assertions on
global.gc and instead cache the GC function once: if forceGc is true call
isGcAvailable(); if it returns false throw a clear error (e.g., "GC requested
but not available"); otherwise store const gc = global.gc and call gc() where
needed (replace global.gc!() calls). Reference detectLeakSync, isGcAvailable and
global.gc when making the change so all GC invocations use the cached gc
function and proper error handling instead of '!' assertions.
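The three leak-detector fixes above can be sketched together; all names (`generateIntervals`, `resolveGc`, the client shape) are assumptions based on the review comments, not the actual API:

```typescript
// 1. Guard intervalSize before deriving interval boundaries, so the loop count
//    can never become Infinity.
function generateIntervals(sampleCount: number, intervalSize: number): Array<[number, number]> {
  if (!Number.isInteger(intervalSize) || intervalSize <= 0) {
    throw new Error(`intervalSize must be a positive integer, got ${intervalSize}`);
  }
  const numIntervals = Math.ceil(sampleCount / intervalSize);
  const intervals: Array<[number, number]> = [];
  for (let i = 0; i < numIntervals; i++) {
    intervals.push([i * intervalSize, Math.min((i + 1) * intervalSize, sampleCount)]);
  }
  return intervals;
}

// 2. Resolve the GC function once instead of repeated `global.gc!()` assertions.
function resolveGc(forceGc: boolean): (() => void) | undefined {
  if (!forceGc) return undefined;
  const gc = (globalThis as { gc?: () => void }).gc;
  if (typeof gc !== 'function') {
    throw new Error('GC requested but not available; run node with --expose-gc');
  }
  return gc;
}

// 3. Always disconnect clients, even when a parallel worker rejects.
interface PerfClient {
  disconnect(): Promise<void>;
}

async function runParallel(clients: PerfClient[], workers: Array<() => Promise<void>>): Promise<void> {
  try {
    await Promise.all(workers.map((run) => run()));
  } finally {
    // Cleanup runs regardless of failures; the original rejection still propagates.
    for (const client of clients) {
      await client.disconnect().catch(() => undefined);
    }
  }
}
```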

In `@libs/testing/src/perf/perf-setup.ts`:
- Around line 37-55: The beforeAll mocks console.log/info/debug but afterAll
mistakenly reassigns them to themselves; declare and capture originalLog,
originalInfo, originalDebug (analogous to originalWarn/originalError) before
replacing them, use jest.fn() in beforeAll to mock, and then restore the
originals in afterAll (restore originalLog -> console.log, originalInfo ->
console.info, originalDebug -> console.debug) while keeping the existing
originalWarn/originalError handling intact; update the variables near where
originalWarn and originalError are declared and reference the same
beforeAll/afterAll blocks.
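The capture-then-restore pattern the comment asks for can be sketched outside Jest like this (in the real file the no-ops would be `jest.fn()` inside `beforeAll`/`afterAll`):

```typescript
// Sketch: capture originals before mocking so afterAll can genuinely restore them,
// rather than reassigning the mocks to themselves.
const originalLog = console.log;
const originalInfo = console.info;
const originalDebug = console.debug;

function muteConsole(): void {
  const noop = (): void => undefined;
  console.log = noop;
  console.info = noop;
  console.debug = noop;
}

function restoreConsole(): void {
  console.log = originalLog;
  console.info = originalInfo;
  console.debug = originalDebug;
}
```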

In `@libs/testing/src/perf/perf-test.ts`:
- Around line 163-191: The createTestFixtures function uses non-null assertions
on serverInstance and tokenFactory (used in McpTestClient.create and
createAuthFixture) which can be null after initializeSharedResources; replace
these assertions with explicit null checks and throw clear, specific errors if
serverInstance or tokenFactory are missing (e.g., "serverInstance not
initialized, missing baseUrl/server config" and "tokenFactory not initialized");
ensure you validate currentConfig.baseUrl/server before calling
McpTestClient.create and validate tokenFactory before calling createAuthFixture,
and only call perfImpl.baseline when perfImpl exists—update createTestFixtures
to perform these guards and surface descriptive exceptions instead of relying on
!.
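One way to replace those `!` assertions is a small guard helper; the names in the usage comment (`serverInstance`, `tokenFactory`) come from the review, not the actual module:

```typescript
// Sketch: explicit null check with a descriptive error instead of a non-null assertion.
function requireInitialized<T>(value: T | null | undefined, name: string): T {
  if (value === null || value === undefined) {
    throw new Error(
      `${name} not initialized; did initializeSharedResources run with a valid baseUrl/server config?`,
    );
  }
  return value;
}

// Hypothetical usage inside createTestFixtures:
// const server = requireInitialized(serverInstance, 'serverInstance');
// const tokens = requireInitialized(tokenFactory, 'tokenFactory');
```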

In `@scripts/perf-fetch-baseline.mjs`:
- Around line 21-90: The gh API call in getReleaseComments currently constructs
a shell command via exec('gh api repos/${owner}/${repo}/releases/tags/${tag}'),
which allows command injection if tag is untrusted; change this to a
shell-argument-safe call (e.g., import and use execFileSync or spawnSync and
pass ['api','repos',owner+'/'+repo,'releases','tags',tag] as separate args) or
validate/normalize tag against a strict regex before use, and update error
handling in getReleaseComments (and any other places that call exec with
user-supplied values) to handle null/throwing results consistently.
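A sketch combining both suggested mitigations, validating the tag against a strict pattern and then invoking `gh` via `execFileSync` so no shell ever parses the value; the endpoint path mirrors the review comment and should be adjusted to the real script:

```typescript
import { execFileSync } from 'node:child_process';

// Conservative allow-list for release tags (assumption; tune to your tag scheme).
const SAFE_TAG = /^[A-Za-z0-9._-]+$/;

function fetchReleaseJson(owner: string, repo: string, tag: string): string {
  if (!SAFE_TAG.test(tag)) {
    throw new Error(`Refusing suspicious tag: ${tag}`);
  }
  // Arguments are passed as an array, never interpolated into a shell string.
  return execFileSync('gh', ['api', `repos/${owner}/${repo}/releases/tags/${tag}`], {
    encoding: 'utf8',
  });
}
```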

In `@scripts/perf-publish-baseline.mjs`:
- Line 219: The code interpolates the user-provided tag directly into shell
commands (see the exec call that creates releaseJson and the later exec usage at
the other occurrence), which allows command injection; create and use a
sanitization/escaping helper (e.g., sanitizeShellArg or escapeShellArg) and call
it on tag before passing into exec (replace direct `${tag}` with the sanitized
value), and apply the same change for the other exec invocation at the later
occurrence so all shell commands use the escaped tag.
- Around line 258-262: The code is using require('fs').writeFileSync and
require('fs').unlinkSync in an ESM .mjs file; replace those calls to use the
already-imported writeFileSync and unlinkSync functions (used where tmpFile is
written and later unlinked around the exec(`gh release edit "${tag}"
--notes-file "${tmpFile}"`) call) and ensure the top-level import includes
writeFileSync and unlinkSync if not already present.
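For the call sites that must keep using `exec` with an interpolated tag, the suggested escaping helper could look like this hypothetical POSIX single-quote escaper (name and usage assumed from the review):

```typescript
// Sketch: wrap the value in single quotes; each embedded single quote becomes
// the '\'' sequence (close quote, escaped quote, reopen quote).
function escapeShellArg(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical usage:
// exec(`gh release edit ${escapeShellArg(tag)} --notes-file ${escapeShellArg(tmpFile)}`);
```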
🟡 Minor comments (16)
scripts/perf-fetch-baseline.mjs-199-202 (1)

199-202: Guard against missing tests in baseline payload.
If the baseline JSON lacks a tests key, this log line will throw and mask an otherwise successful fetch.

✅ Safer logging
-  console.log(`[perf-fetch-baseline] Tests: ${Object.keys(baseline.tests).length}`);
+  const testCount = baseline?.tests ? Object.keys(baseline.tests).length : 0;
+  console.log(`[perf-fetch-baseline] Tests: ${testCount}`);
apps/e2e/demo-e2e-errors/e2e/errors-parallel.perf.test.ts-49-55 (1)

49-55: Assert expected errors instead of silently swallowing them.
If throw-validation-error stops throwing, these tests will still pass because the catch blocks ignore the outcome. Please fail the iteration when no error is raised.

💡 Proposed fix (apply in both error-handling blocks)
-          try {
-            await client.tools.call('throw-validation-error', {
-              field: `field-${counter++}`,
-            });
-          } catch {
-            // Expected to throw
-          }
+          let didThrow = false;
+          try {
+            await client.tools.call('throw-validation-error', {
+              field: `field-${counter++}`,
+            });
+          } catch {
+            didThrow = true;
+          }
+          if (!didThrow) {
+            throw new Error('Expected throw-validation-error to throw');
+          }
-            try {
-              await client.tools.call('throw-validation-error', { field: 'test' });
-            } catch {
-              // Expected to throw
-            }
+            let didThrow = false;
+            try {
+              await client.tools.call('throw-validation-error', { field: 'test' });
+            } catch {
+              didThrow = true;
+            }
+            if (!didThrow) {
+              throw new Error('Expected throw-validation-error to throw');
+            }

Also applies to: 86-90

scripts/perf-publish-baseline.mjs-127-133 (1)

127-133: Add defensive checks when merging report summaries.

The code assumes each report has a valid summary object with all required numeric fields. If a malformed report is encountered (e.g., from a corrupted file or schema change), this will throw an undefined property access error.

🛡️ Proposed defensive fix
   for (const report of reports) {
+    if (!report.summary) {
+      console.warn(`Report missing summary, skipping`);
+      continue;
+    }
-    merged.summary.totalTests += report.summary.totalTests;
-    merged.summary.passedTests += report.summary.passedTests;
-    merged.summary.warningTests += report.summary.warningTests;
-    merged.summary.failedTests += report.summary.failedTests;
-    merged.summary.leakTests += report.summary.leakTests;
+    merged.summary.totalTests += report.summary.totalTests ?? 0;
+    merged.summary.passedTests += report.summary.passedTests ?? 0;
+    merged.summary.warningTests += report.summary.warningTests ?? 0;
+    merged.summary.failedTests += report.summary.failedTests ?? 0;
+    merged.summary.leakTests += report.summary.leakTests ?? 0;
     merged.projects.push(...report.projects);
   }
apps/e2e/demo-e2e-standalone/e2e/standalone-stress.perf.test.ts-40-59 (1)

40-59: Test "mixed operations" only calls parent-hello - same as the first test.

This test is named "mixed operations" but only invokes parent-hello, making it a duplicate of the first test. Compare with the UI stress test file which properly mixes different operations (html-table, markdown-report, static-badge).

Consider either:

  1. Actually mixing different operations (e.g., alternating between parent-hello and tools.list)
  2. Renaming if this is intentionally testing the same operation under different conditions
🔧 Suggested fix to actually mix operations
  perfTest('stress test: 1000 mixed operations', async ({ mcp, perf }) => {
    let callIndex = 0;
    const result = await perf.checkLeak(
      async () => {
-       await mcp.tools.call('parent-hello', { name: `User ${callIndex++}` });
+       if (callIndex % 2 === 0) {
+         await mcp.tools.call('parent-hello', { name: `User ${callIndex}` });
+       } else {
+         await mcp.tools.list();
+       }
+       callIndex++;
      },
.github/workflows/perf-release.yml-1-16 (1)

1-16: Add explicit GITHUB_TOKEN permissions.

The workflow currently relies on default token permissions. Add an explicit permissions block scoped to only what's needed: contents: write for the release body update performed by perf:publish-baseline.

 name: Publish Performance Baseline
+
+permissions:
+  contents: write
 
 on:
   release:
     types: [published]

(Note: actions: write is not required—actions/upload-artifact does not use GITHUB_TOKEN or the GitHub API.)

apps/e2e/demo-e2e-cache/e2e/cache-basic.perf.test.ts-75-81 (1)

75-81: The calculated duration values are not being used for performance assertions.

The perf.measure() API only accepts a label parameter and captures system metrics (memory, CPU). While nonCachedDuration and cachedDuration are correctly calculated, they're only logged but never used for assertions or comparisons. Consider either:

  • Removing the manual duration calculations if not needed
  • Using these values with perf.assertThresholds() to validate performance expectations
  • Capturing and comparing the PerfSnapshot objects returned by measure() instead
apps/e2e/demo-e2e-remember/e2e/remember-stress.perf.test.ts-94-126 (1)

94-126: Seed keys for recall/check in mixed operations.
If recall-value or check-memory errors on missing keys, this test will exercise error paths and skew perf metrics or fail. Seed keys and keep the keyspace consistent.

🔧 Suggested fix
   perfTest('stress test: 1000 mixed memory operations', async ({ mcp, perf }) => {
+    const seedCount = 100;
+    for (let i = 0; i < seedCount; i++) {
+      await mcp.tools.call('remember-value', {
+        key: `mix-key-${i}`,
+        value: `mix-val-${i}`,
+        scope: 'session',
+      });
+    }
+
     let callIndex = 0;
     const result = await perf.checkLeak(
       async () => {
         const op = callIndex++ % 4;
         if (op === 0) {
           await mcp.tools.call('remember-value', {
-            key: `mix-key-${callIndex}`,
+            key: `mix-key-${callIndex % seedCount}`,
             value: `mix-val-${callIndex}`,
             scope: 'session',
           });
         } else if (op === 1) {
-          await mcp.tools.call('recall-value', { key: `mix-key-${callIndex % 100}` });
+          await mcp.tools.call('recall-value', { key: `mix-key-${callIndex % seedCount}` });
         } else if (op === 2) {
           await mcp.tools.call('list-memories', {});
         } else {
-          await mcp.tools.call('check-memory', { key: `mix-key-${callIndex % 100}` });
+          await mcp.tools.call('check-memory', { key: `mix-key-${callIndex % seedCount}` });
         }
       },
apps/e2e/demo-e2e-public/e2e/public-stress.perf.test.ts-15-39 (1)

15-39: Reset the store before the create‑note stress run for isolation.

If the store already contains data (from earlier runs or parallel suites), the leak profile and throughput can be skewed. Resetting upfront makes the test deterministic, like the other scenarios.

✅ Suggested fix
   perfTest('stress test: 1000 create-note operations', async ({ mcp, perf }) => {
+    await mcp.tools.call('notes-reset', {});
     let counter = 0;
     const result = await perf.checkLeak(
.github/workflows/perf.yml-96-103 (1)

96-103: Hardcoded directory list is incomplete and will miss most app reports.

The fallback list only includes demo-e2e-cache, demo-e2e-skills, and demo-e2e-redis, but this PR adds many more apps (transport-recreation, config, codecall, notifications, multiapp, etc.). Consider dynamically discovering directories.

🛠️ Suggested dynamic discovery
-              const dirs = ['demo-e2e-cache', 'demo-e2e-skills', 'demo-e2e-redis'];
-              for (const dir of dirs) {
-                const path = `perf-results/${dir}/report.md`;
-                if (fs.existsSync(path)) {
-                  report += fs.readFileSync(path, 'utf8') + '\n\n---\n\n';
+              const perfResultsDir = 'perf-results';
+              if (fs.existsSync(perfResultsDir)) {
+                const dirs = fs.readdirSync(perfResultsDir, { withFileTypes: true })
+                  .filter(d => d.isDirectory())
+                  .map(d => d.name);
+                for (const dir of dirs) {
+                  const path = `${perfResultsDir}/${dir}/report.md`;
+                  if (fs.existsSync(path)) {
+                    report += fs.readFileSync(path, 'utf8') + '\n\n---\n\n';
+                  }
                 }
               }
.github/workflows/perf.yml-63-67 (1)

63-67: Silent failure masking may hide test failures.

Using || true swallows all errors, making it impossible to detect when individual app tests fail. Consider capturing exit codes and failing the workflow if any test fails, or at minimum logging which apps failed.

🛠️ Suggested improvement
+            FAILED_APPS=""
             for app in $APPS; do
-              yarn nx run $app:test:perf || true
+              if ! yarn nx run $app:test:perf; then
+                FAILED_APPS="$FAILED_APPS $app"
+              fi
             done
+            if [ -n "$FAILED_APPS" ]; then
+              echo "::warning::Some apps failed:$FAILED_APPS"
+            fi
libs/testing/src/index.ts-271-351 (1)

271-351: Add missing resetBaselineStore export to the barrel file.

The exports are well-organized with clear section headers, but resetBaselineStore is missing. This function is exported from the perf module and is part of the public API surface needed for test cleanup between runs. It should be included in the Baseline Storage section alongside BaselineStore, parseBaselineFromComment, formatBaselineAsComment, and getBaselineStore.

Affected export section
  // Baseline Storage
  BaselineStore,
  parseBaselineFromComment,
  formatBaselineAsComment,
  getBaselineStore,
  resetBaselineStore,  // ← Add this
libs/testing/src/perf/report-generator.ts-289-301 (1)

289-301: Treat 0 CPU time as a valid value.
The current truthy check hides legitimate zero measurements.

Proposed fix
-      const cpuTime = m.final?.cpu.total ? formatMicroseconds(m.final.cpu.total) : 'N/A';
+      const cpuTotal = m.final?.cpu.total;
+      const cpuTime = cpuTotal !== undefined ? formatMicroseconds(cpuTotal) : 'N/A';
libs/testing/src/perf/baseline-store.ts-60-66 (1)

60-66: Handle baseline paths without a directory component.
If baselinePath has no /, substring(0, lastIndexOf('/')) yields '' and ensureDir('') can fail.

Proposed fix
-    const dir = this.baselinePath.substring(0, this.baselinePath.lastIndexOf('/'));
-    await ensureDir(dir);
+    const lastSlash = this.baselinePath.lastIndexOf('/');
+    const dir = lastSlash > 0 ? this.baselinePath.slice(0, lastSlash) : '.';
+    if (dir !== '.') {
+      await ensureDir(dir);
+    }
libs/testing/src/perf/report-generator.ts-187-195 (1)

187-195: Differentiate regression vs improvement markers.
Both branches use !, so improvements are indistinguishable in Markdown.

Proposed fix
-          const emoji = r.status === 'regression' ? '!' : '!';
+          const emoji = r.status === 'regression' ? '!' : '~';
libs/testing/src/perf/baseline-store.ts-32-39 (1)

32-39: Add documentation for new performance baseline APIs.

The following public APIs under libs/testing/src/perf/ are newly exported but lack documentation in docs/draft/docs:

  • BaselineStore class
  • getBaselineStore(), resetBaselineStore() functions
  • parseBaselineFromComment(), formatBaselineAsComment(), buildReleaseCommentsUrl() helpers

Please add documentation in docs/draft/docs/testing/ (avoid direct edits under docs/docs/**).

libs/testing/src/perf/perf-fixtures.ts-35-41 (1)

35-41: Remove the redundant type annotation flagged by ESLint.

The number type on private startTime: number = 0; is inferrable from the literal 0 initializer and the no-inferrable-types rule will flag this. TypeScript can automatically infer the type, so the explicit annotation is unnecessary.

🔧 Proposed fix
-  private startTime: number = 0;
+  private startTime = 0;
🧹 Nitpick comments (22)
apps/e2e/demo-e2e-standalone/e2e/standalone-parallel.perf.test.ts (1)

18-31: Extract shared perf constants to avoid drift and keep worker IDs aligned.

Line 18 hard-codes 1000, and the perf params are duplicated in both tests. Hoisting these into shared constants keeps the counter base in sync with iterations and makes tuning safer.

♻️ Suggested refactor
 import { perfTest, expect } from '@frontmcp/testing';

+const WORKERS = 5;
+const ITERATIONS = 1000;
+const WARMUP_ITERATIONS = 10;
+const INTERVAL_SIZE = 200;
+const MEMORY_THRESHOLD_BYTES = 200 * 1024 * 1024;
+const MIN_RPS = 200;
+const MAX_GROWTH_BYTES = 200 * 1024;
+
 perfTest.describe('Standalone Parallel Stress Testing', () => {
   perfTest.use({
     server: 'apps/e2e/demo-e2e-standalone/src/main.ts',
     project: 'demo-e2e-standalone',
     publicMode: true,
   });

   perfTest('parallel stress: 5000 total parent-hello operations', async ({ perf, server }) => {
     const result = await perf.checkLeakParallel(
       (client, workerId) => {
-        let counter = workerId * 1000;
+        let counter = workerId * ITERATIONS;
         return async () => {
           await client.tools.call('parent-hello', {
             name: `User ${counter++}`,
           });
         };
       },
       {
-        iterations: 1000,
-        workers: 5,
-        threshold: 200 * 1024 * 1024, // 200MB for 5000 total operations
-        warmupIterations: 10,
-        intervalSize: 200,
+        iterations: ITERATIONS,
+        workers: WORKERS,
+        threshold: MEMORY_THRESHOLD_BYTES, // 200MB for 5000 total operations
+        warmupIterations: WARMUP_ITERATIONS,
+        intervalSize: INTERVAL_SIZE,
         clientFactory: () => server.createClient(),
       },
     );

     console.log(
       `[PARALLEL] parent-hello: ${result.totalRequestsPerSecond.toFixed(1)} req/s total ` +
         `(${result.workersUsed} workers × ${result.totalIterations / result.workersUsed} iterations)`,
     );

-    expect(result.totalRequestsPerSecond).toBeGreaterThan(200);
-    expect(result.growthRate).toBeLessThan(200 * 1024);
+    expect(result.totalRequestsPerSecond).toBeGreaterThan(MIN_RPS);
+    expect(result.growthRate).toBeLessThan(MAX_GROWTH_BYTES);
   });

   perfTest('parallel stress: 5000 total tool listings', async ({ perf, server }) => {
     const result = await perf.checkLeakParallel(
       (client) => async () => {
         await client.tools.list();
       },
       {
-        iterations: 1000,
-        workers: 5,
-        threshold: 200 * 1024 * 1024,
-        warmupIterations: 10,
-        intervalSize: 200,
+        iterations: ITERATIONS,
+        workers: WORKERS,
+        threshold: MEMORY_THRESHOLD_BYTES,
+        warmupIterations: WARMUP_ITERATIONS,
+        intervalSize: INTERVAL_SIZE,
         clientFactory: () => server.createClient(),
       },
     );

     console.log(
       `[PARALLEL] tools.list: ${result.totalRequestsPerSecond.toFixed(1)} req/s total ` +
         `(${result.workersUsed} workers)`,
     );

-    expect(result.totalRequestsPerSecond).toBeGreaterThan(200);
-    expect(result.growthRate).toBeLessThan(200 * 1024);
+    expect(result.totalRequestsPerSecond).toBeGreaterThan(MIN_RPS);
+    expect(result.growthRate).toBeLessThan(MAX_GROWTH_BYTES);
   });
 });

Also applies to: 40-41, 50-55, 64-65

apps/e2e/demo-e2e-public/project.json (1)

53-59: Consider failing the perf target when no tests are matched.
passWithNoTests: true can hide misconfigurations or missing perf tests; for perf runs it’s usually better to fail fast.

🔧 Suggested change
-        "passWithNoTests": true
+        "passWithNoTests": false
apps/e2e/demo-e2e-notifications/e2e/notifications-parallel.perf.test.ts (1)

1-5: Header comment mentions 4 test cases but only 3 are implemented.

The file header references four parallel test cases including trigger-resource-change, but only three tests exist in the file. Either add the missing test or update the header to reflect the actual content.

libs/testing/src/perf/regression-detector.ts (2)

103-131: Remove or use the unused _formatter parameter.

The _formatter parameter is declared but never used in the method body. The underscore prefix indicates intentional omission, but the formatters are imported at line 14, suggesting this might have been intended for use in building human-readable output.

Consider either removing the parameter or using it to format values in the returned MetricRegression object.

♻️ Option 1: Remove unused parameter
   private checkMetric(
     name: string,
     baseline: number,
     current: number,
-    _formatter: (value: number) => string,
   ): MetricRegression {

And update the call sites (lines 49, 53-58, 63) to remove the formatter argument.

♻️ Option 2: Use formatter in output (if human-readable values are desired)
   private checkMetric(
     name: string,
     baseline: number,
     current: number,
-    _formatter: (value: number) => string,
+    formatter: (value: number) => string,
   ): MetricRegression {
     const absoluteChange = current - baseline;
     const changePercent = baseline > 0 ? (absoluteChange / baseline) * 100 : 0;
     // ... status logic ...
     return {
       metric: name,
       baseline,
       current,
+      baselineFormatted: formatter(baseline),
+      currentFormatted: formatter(current),
       changePercent,
       absoluteChange,
       status,
     };
   }

Note: This would require updating the MetricRegression type in types.ts.


215-225: Singleton getter always recreates instance when config is provided.

The condition if (!globalDetector || config) means passing any config (even an empty object {}) will always create a new instance, potentially surprising callers who expect to get the existing singleton with updated config.

♻️ Consider explicit reset or merge behavior

If the intent is to allow config updates, consider documenting this behavior or providing a separate updateConfig method. If the intent is one-time initialization:

 export function getRegressionDetector(config?: RegressionConfig): RegressionDetector {
-  if (!globalDetector || config) {
+  if (!globalDetector) {
     globalDetector = new RegressionDetector(config);
+  } else if (config) {
+    console.warn('RegressionDetector already initialized; ignoring new config. Use resetRegressionDetector() first.');
   }
   return globalDetector;
 }
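The surprise is easy to reproduce in isolation. A minimal sketch of the same guard shape (the `Detector` class here is a stand-in, not the real `RegressionDetector`):

```typescript
// Minimal repro of the `if (!instance || config)` guard:
// any truthy config, even {}, replaces the existing singleton.
class Detector {
  constructor(public readonly cfg?: object) {}
}

let instance: Detector | undefined;

function getDetector(cfg?: object): Detector {
  if (!instance || cfg) {
    instance = new Detector(cfg);
  }
  return instance;
}

const a = getDetector();
const b = getDetector({}); // empty config still rebuilds the instance
console.log(a === b); // false — callers expecting the cached singleton are surprised
```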
apps/e2e/demo-e2e-cimd/e2e/cimd-stress.perf.test.ts (1)

17-37: Consider extracting repeated configuration constants.

The same configuration values (iterations: 1000, threshold: 100 * 1024 * 1024, warmupIterations: 20, intervalSize: 100, growth rate limit 200 * 1024) are duplicated across all four tests. Extracting these to named constants at the top of the file would improve maintainability and make threshold adjustments easier.

♻️ Suggested refactor
+const STRESS_CONFIG = {
+  iterations: 1000,
+  threshold: 100 * 1024 * 1024, // 100MB
+  warmupIterations: 20,
+  intervalSize: 100,
+} as const;
+
+const MAX_GROWTH_RATE = 200 * 1024; // 200KB per iteration
+
 perfTest.describe('CIMD Sequential Stress Testing', () => {
   // ... then use STRESS_CONFIG and MAX_GROWTH_RATE in tests
apps/e2e/demo-e2e-cimd/e2e/cimd-parallel.perf.test.ts (1)

17-121: Verify growthRate units and consider making thresholds configurable to reduce CI flakiness.
threshold is in bytes (200 * 1024 * 1024), while growthRate is compared to 200 * 1024. If growthRate is also measured in bytes, the limit is 200KB per iteration and may be unintentionally strict; if it is already in KB, the current value is fine. Separately, fixed RPS thresholds can be unstable across CI runners, so consider env overrides or named constants to clarify intent.

Run this to confirm the unit semantics and avoid false positives:

#!/bin/bash
# Locate checkLeakParallel implementation and growthRate units.
rg -n "checkLeakParallel" -S --type=ts -C3
rg -n "growthRate" -S --type=ts -C3
♻️ Optional: make limits explicit/configurable (apply pattern across tests)
+const PERF_MIN_RPS = Number(process.env.PERF_MIN_RPS ?? 200);
+const PERF_GROWTH_RATE_MAX = Number(process.env.PERF_GROWTH_RATE_MAX ?? 200 * 1024);
+const PERF_LEAK_THRESHOLD_BYTES = Number(process.env.PERF_LEAK_THRESHOLD_BYTES ?? 200 * 1024 * 1024);
+const PERF_WORKERS = Number(process.env.PERF_WORKERS ?? 5);
+const PERF_ITERATIONS = Number(process.env.PERF_ITERATIONS ?? 1000);
 
   perfTest('parallel stress: 5000 total tools.list operations', async ({ perf, server }) => {
     const result = await perf.checkLeakParallel(
       (client) => async () => {
         await client.tools.list();
       },
       {
-        iterations: 1000,
-        workers: 5,
-        threshold: 200 * 1024 * 1024, // 200MB for 5000 total operations
+        iterations: PERF_ITERATIONS,
+        workers: PERF_WORKERS,
+        threshold: PERF_LEAK_THRESHOLD_BYTES,
         warmupIterations: 10,
         intervalSize: 200,
         clientFactory: () => server.createClient(),
       },
     );
@@
-    expect(result.totalRequestsPerSecond).toBeGreaterThan(200);
-    expect(result.growthRate).toBeLessThan(200 * 1024);
+    expect(result.totalRequestsPerSecond).toBeGreaterThan(PERF_MIN_RPS);
+    expect(result.growthRate).toBeLessThan(PERF_GROWTH_RATE_MAX);
   });
apps/e2e/demo-e2e-redis/e2e/redis-leak.perf.test.ts (1)

62-85: Mixed operations test intentionally reuses keys - verify this is the intended behavior.

The key generation counter++ % 10 causes key reuse (only 10 unique keys cycled), while the first two tests use unique keys per iteration. This is likely intentional to test memory behavior under key reuse scenarios, but it's worth confirming this asymmetry is deliberate.

Additionally, note that get-session-data (line 74) may fail on first access if the key hasn't been set yet (when counter % 3 === 2 occurs before counter % 3 === 1 for the same key).
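The get-before-set interleaving can be demonstrated with a hedged sketch. The loop shape below (key from `counter % 10`, operation from `counter % 3`) is an assumption about the test's structure, not the actual code:

```typescript
// Assumed shape of the mixed-operations loop: key = counter % 10,
// op = counter % 3 (1 = set, 2 = get). It shows that some keys are
// read before any write has ever touched them.
const firstSetAt = new Map<string, number>();
const getsBeforeSet: string[] = [];

for (let counter = 0; counter < 30; counter++) {
  const key = `session-${counter % 10}`;
  const mode = counter % 3;
  if (mode === 1 && !firstSetAt.has(key)) firstSetAt.set(key, counter);
  if (mode === 2 && !firstSetAt.has(key)) getsBeforeSet.push(key);
}

console.log(getsBeforeSet); // [ 'session-2', 'session-5', 'session-8' ]
```

Seeding those keys up front (or tolerating a miss in the get path) would make the test deterministic.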

apps/e2e/demo-e2e-agents/e2e/agents-stress.perf.test.ts (1)

40-61: Confusing expression construction due to counter increment side effect.

Line 45 uses counter both before and after the post-increment in the same template literal:

expression: `${counter} + ${counter++ % 100}`,

Due to left-to-right evaluation, both interpolations read the same pre-increment value, so this produces "0 + 0", "1 + 1", "2 + 2", and so on — always n + (n % 100), i.e. n + n while counter stays below 100. If the intent was to create varied expressions, consider separating the increment:

Suggested clarification
  perfTest('stress test: 1000 calculator-agent operations', async ({ mcp, perf }) => {
    let counter = 0;
    const result = await perf.checkLeak(
      async () => {
+        const a = counter++;
+        const b = counter % 100;
        await mcp.tools.call('calculator-agent', {
-          expression: `${counter} + ${counter++ % 100}`,
+          expression: `${a} + ${b}`,
        });
      },
apps/e2e/demo-e2e-codecall/jest.perf.config.ts (1)

1-52: Consider extracting a shared perf Jest base config to reduce duplication.

These perf configs appear nearly identical across demos; a small shared factory or base config would minimize drift and simplify future changes.

Please verify any shared-config approach still aligns with Jest config typing and the repo’s current Jest/SWC versions.

apps/e2e/demo-e2e-direct/e2e/direct-stress.perf.test.ts (1)

15-105: Optional: assert a minimum throughput instead of only logging it.

Right now requestsPerSecond is logged but not enforced; adding a minimum threshold would turn throughput regressions into test failures.
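A self-contained sketch of the idea — the `StressResult` shape, the `PERF_MIN_RPS` variable, and the 50 rps floor are assumptions for illustration, not the library's types:

```typescript
// Hedged sketch: in the real test this would boil down to
// `expect(result.requestsPerSecond).toBeGreaterThan(MIN_RPS)`.
interface StressResult {
  iterations: number;
  durationMs: number;
}

function requestsPerSecond(r: StressResult): number {
  return (r.iterations / r.durationMs) * 1000;
}

// Env override keeps the floor tunable per CI runner to reduce flakiness.
const MIN_RPS = Number(process.env.PERF_MIN_RPS ?? 50);

const result: StressResult = { iterations: 1000, durationMs: 4000 };
const rps = requestsPerSecond(result);
console.log(rps); // 250

if (rps <= MIN_RPS) {
  throw new Error(`Throughput ${rps} rps is below the ${MIN_RPS} rps floor`);
}
```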

apps/e2e/demo-e2e-codecall/e2e/codecall-parallel.perf.test.ts (2)

73-77: Consider ensuring resetClient is properly disposed after use.

The resetClient is created to reset the CRM store before the leak detection test, but it's not explicitly closed or disposed. While this may be handled automatically by the test framework or garbage collection, explicitly disposing the client after use would make the cleanup intent clearer and prevent potential resource accumulation in long test runs.

Suggested improvement
   perfTest('parallel stress: 5000 total mixed CRM operations', async ({ perf, server }) => {
     // Reset CRM store before leak detection to prevent accumulation from warmup
     const resetClient = await server.createClient();
     await resetClient.tools.call('crm-reset', {});
+    // Consider adding: await resetClient.close?.(); or similar disposal if available

1-6: Documentation mentions 5 test cases but only 4 are implemented.

The file header and AI summary mention "activities-list operations" and "activities-stats operations" as separate test cases, but these are only exercised within the mixed CRM operations test (Lines 92-96), not as standalone parallel stress tests. If separate tests were intended, consider adding them for completeness.

apps/e2e/demo-e2e-cache/e2e/cache-basic.perf.test.ts (1)

83-86: Consider adding a meaningful performance assertion.

The comment on Line 83-84 acknowledges that "Actual performance assertions would need baseline data," but the only assertion is a 30-second timeout. While acceptable for an initial implementation, consider adding at least a relative comparison assertion (e.g., cached should be noticeably faster than non-cached) to validate the cache is providing benefit.

Suggested improvement
       // Log performance comparison
       console.log(`Non-cached (20 ops): ${nonCachedDuration}ms`);
       console.log(`Cached (20 ops): ${cachedDuration}ms`);

-      // Just ensure test completes within timeout
-      // Actual performance assertions would need baseline data
+      // Cached operations should provide at least some benefit
+      // (using generous margin to avoid flaky tests)
+      expect(cachedDuration).toBeLessThan(nonCachedDuration * 2);
       perf.assertThresholds({ maxDurationMs: 30000 });
apps/e2e/demo-e2e-errors/e2e/errors-stress.perf.test.ts (1)

15-38: Consider adding throughput assertions for consistency with parallel tests.

The sequential stress tests only assert growthRate but not throughput (requestsPerSecond). The parallel tests assert both:

expect(result.totalRequestsPerSecond).toBeGreaterThan(200);
expect(result.growthRate).toBeLessThan(200 * 1024);

For consistency and to catch performance regressions, consider adding throughput assertions to sequential tests as well (with appropriately lower thresholds since they're single-threaded).

Suggested improvement
     // Check growth rate is reasonable (< 200KB/iter)
     expect(result.growthRate).toBeLessThan(200 * 1024);
+    // Single-threaded baseline (adjust threshold as needed)
+    expect(result.requestsPerSecond).toBeGreaterThan(50);

Also applies to: 97-115

apps/e2e/demo-e2e-skills/e2e/skills-stress.perf.test.ts (1)

64-90: Mixed operations test only cycles through loadSkills and searchSkills.

The mixed operations test cycles through 3 operations, but 2 of them are loadSkills calls (Lines 70 and 74). Consider diversifying to include other skill-related operations if available, or updating the comment to reflect that it's testing "loadSkills variations and searchSkills" rather than a broader mix.

apps/e2e/demo-e2e-cache/e2e/cache-leak.perf.test.ts (1)

15-33: Avoid unbounded cache keys in this leak test.
Using Math.random() creates a new cache entry on every iteration, so the heap grows by design, which makes the "no leak" check noisy. Consider cycling through a fixed key set instead.

♻️ Suggested change (bounded key set)
-    const result = await perf.checkLeak(
-      async () => {
-        await mcp.tools.call('expensive-operation', {
-          operationId: `leak-test-${Math.random()}`,
-          complexity: 2,
-        });
-      },
+    const leakKeys = Array.from({ length: 10 }, (_, i) => `leak-test-${i}`);
+    let leakIndex = 0;
+    const result = await perf.checkLeak(
+      async () => {
+        const operationId = leakKeys[leakIndex++ % leakKeys.length];
+        await mcp.tools.call('expensive-operation', {
+          operationId,
+          complexity: 2,
+        });
+      },
       {
         iterations: 30,
         threshold: 5 * 1024 * 1024, // 5MB threshold
         warmupIterations: 5,
       },
     );
apps/e2e/demo-e2e-redis/e2e/redis-parallel.perf.test.ts (3)

21-52: Remove unused mcp fixture from destructuring.

The mcp fixture is destructured in the test function signature but never used. Only perf and server are actually utilized.

🔧 Proposed fix
   (SKIP_REDIS ? perfTest.skip : perfTest)(
     'parallel stress: 5000 total session set/get operations',
-    async ({ mcp, perf, server }) => {
+    async ({ perf, server }) => {

54-87: Remove unused mcp fixture from destructuring.

Same issue as the first test - mcp is not used.

🔧 Proposed fix
   (SKIP_REDIS ? perfTest.skip : perfTest)(
     'parallel stress: 5000 total vault store/retrieve operations',
-    async ({ mcp, perf, server }) => {
+    async ({ perf, server }) => {

129-168: Remove unused mcp fixture from destructuring.

The mcp fixture is destructured but not used in this test.

🔧 Proposed fix
   (SKIP_REDIS ? perfTest.skip : perfTest)(
     'parallel stress: 5000 total mixed Redis operations',
-    async ({ mcp, perf, server }) => {
+    async ({ perf, server }) => {
apps/e2e/demo-e2e-public/e2e/public-parallel.perf.test.ts (2)

45-71: Consider disconnecting resetClient after use.

The resetClient is created for store reset but is never disconnected. While this may be cleaned up automatically when the test ends, explicitly disconnecting is a better practice to avoid potential resource leaks in the test harness.

🔧 Proposed fix
   perfTest('parallel stress: 5000 total list-notes operations', async ({ perf, server }) => {
     // Reset store before leak check to prevent accumulated data from previous tests
     const resetClient = await server.createClient();
     await resetClient.tools.call('notes-reset', {});
+    await resetClient.disconnect();

     const result = await perf.checkLeakParallel(

73-107: Same suggestion: disconnect resetClient after use.

🔧 Proposed fix
   perfTest('parallel stress: 5000 total mixed operations', async ({ perf, server }) => {
     // Reset store before leak check to prevent accumulated data from previous tests
     const resetClient = await server.createClient();
     await resetClient.tools.call('notes-reset', {});
+    await resetClient.disconnect();

     const result = await perf.checkLeakParallel(

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@libs/testing/src/auth/mock-oauth-server.ts`:
- Around line 1099-1120: computeCodeChallenge currently throws when
_sha256Base64url isn't loaded because no code path in this class preloads crypto
utils; fix by ensuring the grant handler uses the async helper: call and await
computeCodeChallengeAsync(verifier, method) (which itself awaits
loadCryptoUtils()) instead of computeCodeChallenge, or alternatively call await
loadCryptoUtils() during server init or at the start of the grant handling flow
before calling computeCodeChallenge; reference computeCodeChallenge,
computeCodeChallengeAsync, loadCryptoUtils, _sha256Base64url, and the grant
handler when making the change.
- Around line 51-63: The three module-scoped variables (_randomBytes,
_sha256Base64url, _base64urlEncode) are declared as definitely assigned but may
be uninitialized; change their declarations to allow undefined (e.g. append |
undefined) and update loadCryptoUtils to check for undefined (e.g. if
(_randomBytes === undefined) { const utils = await import(...); assign all
three) so all three are guaranteed assigned after loadCryptoUtils returns; also
ensure call sites either await loadCryptoUtils() first or narrow/assert the
variables before use (or throw if still undefined) to satisfy strict
definite-assignment rules while keeping lazy loading via loadCryptoUtils.
- Around line 1079-1093: generateCode currently uses crypto.getRandomValues
directly; change it to consistently use frontmcp/utils by making generateCode
async and delegating to generateCodeAsync (i.e., have generateCode return the
Promise from generateCodeAsync) so all random/encoding uses _randomBytes and
_base64urlEncode via loadCryptoUtils instead of the WebCrypto API; update the
generateCode signature and any callers as needed to handle the Promise<string>.

In `@libs/testing/src/perf/report-generator.ts`:
- Around line 190-196: The branch in report-generator.ts that builds regressions
lines treats both cases the same (const emoji = r.status === 'regression' ? '!'
: '!'), losing the distinction between regression and warning; update the
assignment for emoji in the loop that iterates regressionsWithIssues (use the r
variable and r.status) so it selects different markers (e.g., '❗' or '🔴' for
'regression' and '⚠️' for warnings/other non-ok statuses) before calling
lines.push(`- ${emoji} ${r.message}`) to restore clear visual distinction.
- Around line 24-29: Public perf reporting APIs ReportGenerator, saveReports,
and createReportGenerator were added but no draft docs were created; add a draft
documentation page for the testing perf-reporting APIs in the project's docs
draft area that documents each exported symbol (ReportGenerator class,
saveReports function, createReportGenerator factory), their public
methods/parameters/return values, example usage snippets, and any
configuration/behavior notes, and update the docs index/navigation so these new
draft pages are discoverable.
🧹 Nitpick comments (6)
libs/testing/src/perf/leak-detector.ts (1)

167-198: Missing warning when GC is unavailable in sync path.

The async detectLeak method warns when forceGc is requested but GC is unavailable (lines 124-126). The sync detectLeakSync method silently skips GC, which can yield inaccurate results with no indication to the user.

Proposed fix for consistency
   detectLeakSync(operation: () => void, options?: LeakDetectionOptions): LeakDetectionResult {
     const opts = { ...DEFAULT_OPTIONS, ...options };
     const { iterations, threshold, warmupIterations, forceGc, intervalSize } = opts;
 
+    // Warn if GC is not available
+    if (forceGc && !isGcAvailable()) {
+      console.warn('[LeakDetector] Manual GC not available. Run Node.js with --expose-gc for accurate results.');
+    }
+
     // Run warmup iterations
     for (let i = 0; i < warmupIterations; i++) {
libs/testing/src/client/mcp-test-client.ts (1)

955-970: Remove unused ensureConnected() method.

The new getConnectedTransport() method supersedes ensureConnected() - both perform the same validation, but getConnectedTransport() also returns the typed transport. Since ensureConnected() is private with no callers in this file, it should be removed to avoid code duplication.

♻️ Proposed fix
-  private ensureConnected(): void {
-    if (!this.transport || !this.transport.isConnected()) {
-      throw new Error('Not connected to MCP server. Call connect() first.');
-    }
-  }
-
   /**
    * Get the transport, throwing if not connected.
-   * This method is used after ensureConnected() to get a non-null transport.
+   * This method centralizes connectivity checks and provides type-safe transport access.
    */
   private getConnectedTransport(): McpTransport {
     if (!this.transport || !this.transport.isConnected()) {
       throw new Error('Not connected to MCP server. Call connect() first.');
     }
     return this.transport;
   }
libs/testing/src/perf/perf-test.ts (2)

285-289: Consider using dot notation for object property access.

The bracket notation with string literals is functionally correct but dot notation is more idiomatic:

     users: {
-      admin: users['admin'],
-      user: users['user'],
-      readOnly: users['readOnly'],
+      admin: users.admin,
+      user: users.user,
+      readOnly: users.readOnly,
     },

331-382: Consider extracting shared test wrapper logic to reduce duplication.

The perfTestWithFixtures, skip, and only functions duplicate the same fixture setup/teardown pattern. Extract a helper to DRY this up:

♻️ Proposed refactor
+function createTestWrapper(fn: PerfTestFn, testName: string) {
+  return async () => {
+    const { fixtures, perfImpl } = await createTestFixtures(testName);
+    let testFailed = false;
+    try {
+      await fn(fixtures);
+    } catch (error) {
+      testFailed = true;
+      throw error;
+    } finally {
+      await cleanupTestFixtures(fixtures, perfImpl, testFailed);
+    }
+  };
+}
+
 function perfTestWithFixtures(name: string, fn: PerfTestFn): void {
-  it(name, async () => {
-    const { fixtures, perfImpl } = await createTestFixtures(name);
-    let testFailed = false;
-    try {
-      await fn(fixtures);
-    } catch (error) {
-      testFailed = true;
-      throw error;
-    } finally {
-      await cleanupTestFixtures(fixtures, perfImpl, testFailed);
-    }
-  });
+  it(name, createTestWrapper(fn, name));
 }
 
 // ... use function unchanged ...
 
 function skip(name: string, fn: PerfTestFn): void {
-  it.skip(name, async () => {
-    const { fixtures, perfImpl } = await createTestFixtures(name);
-    let testFailed = false;
-    try {
-      await fn(fixtures);
-    } catch (error) {
-      testFailed = true;
-      throw error;
-    } finally {
-      await cleanupTestFixtures(fixtures, perfImpl, testFailed);
-    }
-  });
+  it.skip(name, createTestWrapper(fn, name));
 }
 
 function only(name: string, fn: PerfTestFn): void {
-  it.only(name, async () => {
-    const { fixtures, perfImpl } = await createTestFixtures(name);
-    let testFailed = false;
-    try {
-      await fn(fixtures);
-    } catch (error) {
-      testFailed = true;
-      throw error;
-    } finally {
-      await cleanupTestFixtures(fixtures, perfImpl, testFailed);
-    }
-  });
+  it.only(name, createTestWrapper(fn, name));
 }
apps/e2e/demo-e2e-skills/e2e/skills-basic.perf.test.ts (1)

20-30: Add a duration threshold for consistent performance monitoring.

The test only checks memory overhead; adding maxDurationMs ensures slowdowns don't slip through unnoticed, consistent with other tests in this file.

🛠️ Suggested change
-      perf.assertThresholds({ maxHeapDelta: 25 * 1024 * 1024 }); // 25MB
+      perf.assertThresholds({
+        maxHeapDelta: 25 * 1024 * 1024, // 25MB
+        maxDurationMs: 60000,
+      });
libs/testing/src/perf/report-generator.ts (1)

331-338: Strengthen the parallel-result type guard to validate field types.

The current guard only checks property presence. The downstream code at lines 140 and 161–163 assumes workersUsed, totalRequestsPerSecond, and totalIterations are numbers, and perWorkerStats is iterable. Add type checks to prevent runtime errors:

🔧 Suggested guard hardening
   private isParallelResult(result: unknown): result is ParallelLeakDetectionResult {
-    return (
-      typeof result === 'object' &&
-      result !== null &&
-      'workersUsed' in result &&
-      'perWorkerStats' in result &&
-      'totalRequestsPerSecond' in result
-    );
+    if (typeof result !== 'object' || result === null) return false;
+    const r = result as {
+      workersUsed?: unknown;
+      perWorkerStats?: unknown;
+      totalRequestsPerSecond?: unknown;
+      totalIterations?: unknown;
+    };
+    return (
+      typeof r.workersUsed === 'number' &&
+      typeof r.totalRequestsPerSecond === 'number' &&
+      typeof r.totalIterations === 'number' &&
+      Array.isArray(r.perWorkerStats)
+    );
   }

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/perf-release.yml:
- Around line 13-17: Add an explicit permissions block to the perf job (and the
other job at lines 74-77) to scope the workflow token; for baseline publishing
and release asset uploads set at minimum permissions: contents: write (for
creating/releases & assets) and actions: write (for actions artifact upload),
and add packages: write only if publishing package artifacts; update the job
named "perf" to include this permissions: mapping and adjust the same for the
perf:publish-baseline, actions/*-artifact and release publishing jobs as needed,
then verify the workflow still performs artifact uploads and release publishes.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@libs/sdk/src/transport/event-stores/event-store.factory.ts`:
- Around line 16-48: Remove the duplicate local EventStoreConfig definition and
instead import and re-export the single shared interface from the central
definitions used across the SDK (the existing interfaces.ts common types);
update any references in this file to use the imported EventStoreConfig symbol
so there is one source of truth, or if this file must be authoritative, export
its EventStoreConfig and replace the duplicate in the common file to import from
here—ensure only one exported EventStoreConfig symbol is referenced by
functions/classes in this module (e.g., by callers of event-store.factory's
public APIs).

In `@libs/sdk/src/transport/event-stores/memory.event-store.ts`:
- Around line 62-67: The current fallback always returns 'default-stream' when
this.index.get(lastEventId) is missing; change it to derive the stream id from
lastEventId (parse the stream portion of lastEventId) and use that as the
fallback StreamId, only using 'default-stream' if lastEventId is malformed or
contains no stream part; update the logic around meta /
this.index.get(lastEventId) to compute fallbackStream (from lastEventId), ensure
you call this.streams.has(fallbackStream) and this.streams.set(fallbackStream,
[]) when needed, and return fallbackStream instead of the hardcoded freshStream.
- Around line 42-48: The current ID generation uses list.length + 1 which can
collide after eviction/TTL; replace it with a per-stream monotonic counter
stored on the instance (e.g., add a Map like this.streamCounters) and use
++counter to build the EventId instead of list.length + 1 so IDs never reuse
even if events are removed; update where IDs are generated (the block creating
id and rec), persist the incremented counter for streamId, and ensure
this.index.set(id, { streamId, idx: list.length - 1 }) still uses the correct
idx after pushing the StoredEvent.

In `@libs/sdk/src/transport/event-stores/redis.event-store.ts`:
- Around line 79-87: Replace the generic throw new Error(...) in the provider
check with the project MCP error type (e.g., MCPError) and a unique MCP error
code (e.g., 'UNSUPPORTED_PROVIDER' or 'ERR_UNSUPPORTED_PROVIDER'); update the
import to bring in MCPError if missing; also fix the comment above the provider
check to remove the incorrect "Fall back to a simple key-value based approach"
line and instead state that Vercel KV lacks Redis Streams (XADD/XRANGE) and we
therefore throw an MCP-coded error when provider (the provider variable) is
'vercel-kv' or '@vercel/kv'.

In `@libs/sdk/src/transport/index.ts`:
- Around line 35-42: The new public exports createEventStore, MemoryEventStore,
RedisEventStore and types EventStoreConfig, EventStoreResult,
MemoryEventStoreOptions, RedisEventStoreOptions are missing documentation; add a
new docs draft page under docs/draft/docs (do not edit docs/docs) that documents
these APIs: describe each exported symbol (createEventStore factory,
MemoryEventStore and RedisEventStore classes) their constructor/config options
(EventStoreConfig, MemoryEventStoreOptions, RedisEventStoreOptions), returned
shapes (EventStoreResult), usage examples for instantiating and using
createEventStore with both MemoryEventStore and RedisEventStore, and note any
resumability guarantees; link or reference the existing transport README where
appropriate and ensure the new doc file path and front-matter match project docs
conventions.
🧹 Nitpick comments (6)
libs/sdk/src/transport/event-stores/memory.event-store.ts (1)

20-23: Eviction description mismatches implementation.

The code evicts by oldest timestamp (FIFO across streams), not LRU. Please update the comment to avoid misleading users.

✏️ Suggested doc tweak
- * - Max events limit with LRU eviction
+ * - Max events limit with oldest-event eviction (FIFO across streams)
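The distinction matters for cross-stream behavior. A small illustrative sketch of oldest-timestamp eviction (not the MemoryEventStore internals — the `Stored` shape is invented for the example):

```typescript
// Oldest-timestamp eviction removes the globally oldest event regardless
// of which stream it belongs to — FIFO across streams, not per-stream LRU.
interface Stored {
  id: string;
  ts: number;
}

function evictOldest(list: Stored[]): Stored[] {
  const oldest = list.reduce((min, e) => (e.ts < min.ts ? e : min));
  return list.filter((e) => e !== oldest);
}

const events: Stored[] = [
  { id: 'streamA:1', ts: 100 },
  { id: 'streamB:1', ts: 50 }, // oldest overall, evicted first
  { id: 'streamA:2', ts: 150 },
];

console.log(evictOldest(events).map((e) => e.id)); // [ 'streamA:1', 'streamA:2' ]
```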
libs/sdk/src/transport/event-stores/__tests__/memory.event-store.test.ts (1)

93-107: Use fake timers for deterministic TTL testing instead of increasing sleep durations.

The test currently uses a 10ms TTL with a 20ms real sleep, which can be flaky on busy CI due to non-deterministic timing. While increasing margins helps, a better approach is to use Jest's fake timers—already established in similar tests throughout the codebase (e.g., elicit-handler.test.ts, cache-memory.provider.test.ts). This ensures deterministic, repeatable test behavior.

💡 Deterministic timing with fake timers
    it('should not replay expired events', async () => {
      // Use very short TTL for testing
+     jest.useFakeTimers();
      const store = new MemoryEventStore({ ttlMs: 10 });
      const streamId = 'test-stream' as StreamId;
      const message: JSONRPCMessage = { jsonrpc: '2.0', method: 'test', id: 1 };

      const eventId = await store.storeEvent(streamId, message);

      // Store another event for replay reference
      const message2: JSONRPCMessage = { jsonrpc: '2.0', method: 'test2', id: 2 };
      await store.storeEvent(streamId, message2);

      // Wait for TTL to expire
-     await new Promise((r) => setTimeout(r, 20));
+     jest.advanceTimersByTime(20);

      // Replay - should get nothing because events are expired
      const replayed: JSONRPCMessage[] = [];
      await store.replayEventsAfter(eventId, {
        send: async (_id, msg) => {
          replayed.push(msg);
        },
      });

      // Second event should be expired too
      expect(replayed).toHaveLength(0);
+     jest.useRealTimers();
    });
libs/sdk/src/scope/scope.instance.ts (1)

126-135: Consider extracting EventStore initialization to a helper function.

The inline EventStore initialization adds logic directly to initialize(). For consistency with the codebase pattern (e.g., registerSkillCapabilities for skill registration), consider extracting this to a helper function like initializeEventStore().

That said, the current implementation is simple and self-contained, so this is a minor consistency concern rather than a blocking issue.

♻️ Optional refactor to extract helper

Create a new file libs/sdk/src/transport/event-stores/event-store-scope.helper.ts:

import type { EventStore } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import type { FrontMcpLogger, EventStoreConfig } from '../../common';
import { createEventStore } from './event-store.factory';

export function initializeEventStore(
  config: EventStoreConfig | undefined,
  logger: FrontMcpLogger,
): EventStore | undefined {
  if (!config?.enabled) {
    return undefined;
  }
  const { eventStore } = createEventStore(config, logger);
  logger.info('EventStore initialized for SSE resumability', {
    provider: config.provider ?? 'memory',
  });
  return eventStore;
}

Then simplify scope.instance.ts:

-    // Initialize EventStore for SSE resumability support (optional)
-    // Disabled by default because Claude.ai's client doesn't handle priming events correctly
-    const eventStoreConfig = transportConfig?.eventStore;
-    if (eventStoreConfig?.enabled) {
-      const { eventStore } = createEventStore(eventStoreConfig, this.logger);
-      this._eventStore = eventStore;
-      this.logger.info('EventStore initialized for SSE resumability', {
-        provider: eventStoreConfig.provider ?? 'memory',
-      });
-    }
+    // Initialize EventStore for SSE resumability support (optional)
+    this._eventStore = initializeEventStore(transportConfig?.eventStore, this.logger);

Based on learnings: "Keep scope.instance.ts lean by using helper functions for feature-specific registration logic"

libs/sdk/src/transport/event-stores/__tests__/event-store.factory.test.ts (2)

41-51: Test doesn't verify that options are actually applied.

This test creates an EventStore with custom maxEvents and ttlMs but only verifies the instance type and result type. Consider verifying the options were passed through to the store.

💡 Suggested improvement
     it('should respect maxEvents and ttlMs options', () => {
       const result = createEventStore({
         enabled: true,
         provider: 'memory',
         maxEvents: 5000,
         ttlMs: 120000,
       });

       expect(result.eventStore).toBeInstanceOf(MemoryEventStore);
       expect(result.type).toBe('memory');
+      // Verify options were applied (MemoryEventStore exposes these via internal state)
+      const store = result.eventStore as MemoryEventStore;
+      // If MemoryEventStore doesn't expose these, consider adding getters for testing
+      // or verify behavior by storing events and checking TTL/eviction
     });

Alternatively, add a behavioral test that stores more than maxEvents and verifies eviction occurs.


64-79: Consider using toBeInstanceOf for Redis store verification.

The test verifies result.eventStore is defined but doesn't confirm it's actually a RedisEventStore instance. For consistency with the memory provider tests, consider importing and checking the instance type.

♻️ Suggested improvement
+import { RedisEventStore } from '../redis.event-store';

 // ... in the test
     it('should create RedisEventStore when redis config is provided', () => {
       const result = createEventStore({
         enabled: true,
         provider: 'redis',
         redis: {
           provider: 'redis',
           host: 'localhost',
           port: 6379,
         },
       });

-      expect(result.eventStore).toBeDefined();
+      expect(result.eventStore).toBeInstanceOf(RedisEventStore);
       expect(result.type).toBe('redis');
     });
libs/sdk/src/common/types/options/transport/schema.ts (1)

225-231: Remove redundant .optional() before .default() in Zod 4 schemas.

In Zod 4, .default() implicitly makes a field optional on input and ensures a defined output value. The .optional() before .default() is redundant and can be simplified. Note that redis correctly uses .optional() without a default since it should remain undefined when not provided.

♻️ Suggested simplification
 export const eventStoreConfigSchema = z.object({
   enabled: z.boolean().default(false),
-  provider: z.enum(['memory', 'redis']).optional().default('memory'),
-  maxEvents: z.number().int().positive().optional().default(10000),
-  ttlMs: z.number().int().positive().optional().default(300000),
+  provider: z.enum(['memory', 'redis']).default('memory'),
+  maxEvents: z.number().int().positive().default(10000),
+  ttlMs: z.number().int().positive().default(300000),
   redis: redisOptionsSchema.optional(),
 });

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/perf.yml:
- Around line 3-17: The workflow_dispatch input "apps" is declared but never
used. Either wire it into the job matrix so manual runs respect it: read the
input (github.event.inputs.apps), parse it into the strategy.matrix used by the
perf jobs, and fall back to all apps when the input equals "all". Or simply
delete the workflow_dispatch.inputs.apps definition to avoid a misleading UI.
- Around line 102-110: The report job's permissions block only sets
pull-requests: write which disables all other scopes (including contents),
causing actions/checkout@v4 to fail; update the report job's permissions (the
permissions block under the report job) to include contents: read in addition to
pull-requests: write so the checkout action has the required read access.

@agentfront agentfront deleted a comment from github-actions bot Jan 30, 2026
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@scripts/perf-publish-baseline.mjs`:
- Around line 127-134: The loop that aggregates individual reports may read
missing or malformed summary fields (report.summary.*), causing NaN in merged
totals; update the aggregation in the for-of over reports to defensively default
missing values (e.g., use report.summary?.totalTests ?? 0,
report.summary?.passedTests ?? 0, etc.) when adding to merged.summary, and guard
pushing projects (e.g., only push report.projects if
Array.isArray(report.projects) or report.projects?.length) so merged.projects
stays consistent.
- Around line 162-168: Guard access to measurement.timing before reading its
fields: replace direct uses of measurement.timing.durationMs (and any other
measurement.timing.*) with safe access (e.g., measurement.timing?.durationMs)
and provide defensive defaults for mean/stdDev/min/max/p95 (e.g., 0 or null) and
set sampleCount to 1 only when measurement.timing exists otherwise 0; update the
object construction where durationMs, stdDev, min, max, p95, and sampleCount are
set so they won’t throw if measurement.timing is undefined.
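The defensive defaults described in both items above can be sketched as follows. The report and measurement shapes (`summary.totalTests`, `timing.durationMs`, and so on) are assumptions inferred from the script's field names:

```typescript
// Sketch of the defensive aggregation: default missing fields to 0 so a
// malformed report cannot poison the merged totals with NaN, and guard
// measurement.timing before reading its fields.
interface PerfReport {
  summary?: { totalTests?: number; passedTests?: number };
  projects?: unknown[];
}

interface Measurement {
  timing?: { durationMs?: number; stdDev?: number; min?: number; max?: number; p95?: number };
}

function mergeReports(reports: PerfReport[]) {
  const merged = { summary: { totalTests: 0, passedTests: 0 }, projects: [] as unknown[] };
  for (const report of reports) {
    merged.summary.totalTests += report.summary?.totalTests ?? 0;
    merged.summary.passedTests += report.summary?.passedTests ?? 0;
    // Only push projects when they are actually an array.
    if (Array.isArray(report.projects)) {
      merged.projects.push(...report.projects);
    }
  }
  return merged;
}

function toBaselineEntry(measurement: Measurement) {
  const timing = measurement.timing;
  return {
    durationMs: timing?.durationMs ?? 0,
    stdDev: timing?.stdDev ?? 0,
    min: timing?.min ?? 0,
    max: timing?.max ?? 0,
    p95: timing?.p95 ?? 0,
    // Count a sample only when timing data actually exists.
    sampleCount: timing ? 1 : 0,
  };
}

const merged = mergeReports([
  { summary: { totalTests: 4, passedTests: 4 }, projects: ['a'] },
  {}, // malformed report: no summary, no projects
]);
console.log(merged.summary.totalTests); // 4, not NaN
```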
🧹 Nitpick comments (2)
scripts/perf-publish-baseline.mjs (2)

269-276: Ensure temp file cleanup on failure.

If exec throws an unexpected error (as opposed to returning null), the temp file at tmpFile won't be deleted. Consider using a try/finally block.

♻️ Proposed fix with try/finally
   // Update release using gh CLI
   const tmpFile = `/tmp/release-body-${Date.now()}.md`;
-  require('fs').writeFileSync(tmpFile, newBody);
-
-  const result = exec(`gh release edit "${tag}" --notes-file "${tmpFile}"`);
-  require('fs').unlinkSync(tmpFile);
-
-  return result !== null;
+  writeFileSync(tmpFile, newBody);
+
+  try {
+    const result = exec(`gh release edit "${tag}" --notes-file "${tmpFile}"`);
+    return result !== null;
+  } finally {
+    unlinkSync(tmpFile);
+  }

281-314: Consider removing async keyword.

The main function is declared async but contains no await expressions. While this works, it's unnecessary and slightly misleading.

♻️ Proposed change
-async function main() {
+function main() {

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@libs/testing/src/perf/leak-detector.ts`:
- Around line 295-299: Clamp and validate intervalSize before using it in the
modulo check so it cannot be <= 0; for example, compute a safeInterval =
Math.max(1, intervalSize) (or similar) and use safeInterval in the condition
that calls forceFullGc, updating both occurrences where the code checks "if
(forceGc && i > 0 && i % intervalSize === 0)" (and the duplicate at the other
location) to "if (forceGc && i > 0 && i % safeInterval === 0)" so GC runs
deterministically and avoids NaN/divide-by-zero behavior when calling
forceFullGc.
- Around line 101-102: The new public APIs LeakDetector (class), assertNoLeak
(function), and createLeakDetector (function) are missing docs in
docs/draft/docs/**; add a new or update an existing markdown page under
docs/draft/docs that documents each symbol: provide a short description, public
surface (constructor/signature), parameter explanations, return values, example
usage snippets showing common patterns, and any error/edge-case behavior;
reference the implementation in libs/testing/src/perf/leak-detector.ts and
ensure the page filename and front-matter follow the repo's docs conventions so
the new APIs are discoverable.
- Around line 240-262: Client creation with clientFactory can throw and
currently runs before the try/finally, leaking already-created clients; wrap the
sequential creation loop in a try/catch (or add a surrounding try) and on any
error call the existing disconnectClients helper to await disconnecting all
clients in the clients array, then rethrow the error so upstream handling
remains the same; update the block that builds clients (the for loop that calls
clientFactory and pushes into clients) to perform this cleanup on failure.
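The cleanup requirement in the last item above can be sketched as a generic pattern. `FakeClient` and the factory below are illustrative stand-ins, not the real leak-detector types:

```typescript
// Sketch of cleanup-on-failure: if creating client N throws, the N-1 clients
// already created must be disconnected before rethrowing, so a failed run
// does not leak connections.
interface FakeClient {
  disconnect(): Promise<void>;
}

async function createClients(
  count: number,
  clientFactory: (i: number) => Promise<FakeClient>,
): Promise<FakeClient[]> {
  const clients: FakeClient[] = [];
  try {
    for (let i = 0; i < count; i++) {
      clients.push(await clientFactory(i));
    }
    return clients;
  } catch (err) {
    // Best-effort cleanup of the clients created so far, then rethrow.
    await Promise.allSettled(clients.map((c) => c.disconnect()));
    throw err;
  }
}

const disconnected: number[] = [];
try {
  await createClients(3, async (i) => {
    if (i === 2) throw new Error('boom');
    return { disconnect: async () => { disconnected.push(i); } };
  });
} catch (err) {
  console.log((err as Error).message, disconnected); // boom [ 0, 1 ]
}
```

`Promise.allSettled` (rather than `Promise.all`) ensures one failing disconnect does not prevent the others from running.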
🧹 Nitpick comments (5)
scripts/perf-fetch-baseline.mjs (2)

94-106: Redundant JSON parsing of releaseJson.

The same releaseJson string is parsed twice: once at line 96 and again at line 106. This is wasteful and could introduce subtle bugs if the parsing logic diverges.

♻️ Proposed fix
   let releaseId;
+  let releaseData;
   try {
-    const release = JSON.parse(releaseJson);
-    releaseId = release.id;
+    releaseData = JSON.parse(releaseJson);
+    releaseId = releaseData.id;
   } catch {
     console.error('Failed to parse release JSON');
     return [];
   }

   // Get comments for this release
   // Note: GitHub doesn't have a direct API for release comments,
   // so we check if the baseline is in the release body instead
-  const releaseData = JSON.parse(releaseJson);

   // Check release body for baseline

199-203: Guard against missing baseline.release and baseline.timestamp before logging.

If parseBaseline returns an object without release or timestamp properties, line 201 will log undefined. Consider defensive access.

🛡️ Proposed fix
   writeFileSync(OUTPUT_PATH, JSON.stringify(baseline, null, 2));
   console.log(`[perf-fetch-baseline] Baseline written to: ${OUTPUT_PATH}`);
-  console.log(`[perf-fetch-baseline] Baseline from: ${baseline.release} (${baseline.timestamp})`);
+  console.log(`[perf-fetch-baseline] Baseline from: ${baseline.release ?? 'unknown'} (${baseline.timestamp ?? 'unknown'})`);
   const testCount = baseline?.tests ? Object.keys(baseline.tests).length : 0;
   console.log(`[perf-fetch-baseline] Tests: ${testCount}`);
.github/workflows/perf-release.yml (1)

119-126: Tag input is used without validation.

The tag output is derived from user input (github.event.inputs.tag) for workflow_dispatch triggers. While GitHub sanitizes most injection vectors in ${{ }} expressions, consider validating the tag format to catch typos and ensure it matches expected release tag patterns.

🛡️ Proposed validation
       - name: Determine release tag
         id: tag
         run: |
           if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
-            echo "TAG=${{ github.event.inputs.tag }}" >> $GITHUB_OUTPUT
+            TAG="${{ github.event.inputs.tag }}"
+            if [[ ! "$TAG" =~ ^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$ ]]; then
+              echo "::error::Invalid tag format: $TAG (expected semver like v1.2.3)"
+              exit 1
+            fi
+            echo "TAG=$TAG" >> $GITHUB_OUTPUT
           else
             echo "TAG=${{ github.event.release.tag_name }}" >> $GITHUB_OUTPUT
           fi
.github/workflows/perf.yml (1)

227-229: Bot comment detection could match unrelated comments.

The search for existing bot comments uses a generic string match '## Performance Test Results'. If another user or bot posts a comment with this heading, it could be incorrectly identified and updated. Consider adding a unique marker.

♻️ Proposed marker-based detection
+            // Add unique marker to identify our comments
+            const BOT_MARKER = '<!-- perf-test-bot -->';
+            report = BOT_MARKER + '\n' + report;
+
             const botComment = comments.find(comment =>
-              comment.body.includes('## Performance Test Results')
+              comment.body.includes(BOT_MARKER)
             );
scripts/perf-publish-baseline.mjs (1)

248-268: Baseline removal logic may leave artifacts in edge cases.

The logic to find and remove existing baseline sections handles the ## Performance Baseline header and trailing stats, but the lastIndexOf for section start could incorrectly match an unrelated ## Performance Baseline heading earlier in the document. Consider anchoring the search more precisely.

♻️ Proposed safer section removal
   if (startIdx !== -1 && endIdx !== -1) {
     // Find the start of the baseline section (including header)
-    const sectionStart = newBody.lastIndexOf('## Performance Baseline', startIdx);
+    // Only look for header within reasonable proximity (e.g., 200 chars before marker)
+    const searchStart = Math.max(0, startIdx - 200);
+    const sectionStart = newBody.indexOf('## Performance Baseline', searchStart);
-    const actualStart = sectionStart !== -1 ? sectionStart : startIdx;
+    const actualStart = (sectionStart !== -1 && sectionStart < startIdx) ? sectionStart : startIdx;
     const actualEnd = endIdx + BASELINE_END_MARKER.length;

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@libs/testing/src/perf/leak-detector.ts`:
- Around line 298-299: The throughput calculation can produce Infinity when
durations are zero; update the calculations in the leak-detector to guard
against zero-duration runs by first clamping workerDurationMs and
globalDurationMs to a minimum of 1 (or treating zero as infinity-safe value)
before dividing, then compute requestsPerSecond = (iterations /
safeWorkerDurationMs) * 1000 and similarly for the global requests/sec using
globalDurationMs; update the occurrences around the requestsPerSecond
calculation and the analogous calculation using globalDurationMs so both use the
safe duration variables (workerDurationMs and globalDurationMs) to ensure a zero
duration yields 0 throughput instead of Infinity.
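One way to apply the clamp described above, sketched as a pure function; the real code computes these inline from worker and global timers, and clamping to 1 ms yields a finite (if inflated) rate rather than Infinity:

```typescript
// Sketch of the throughput guard: clamp the duration to >= 1 ms before
// dividing, so a 0 ms run can never produce Infinity.
function requestsPerSecond(iterations: number, durationMs: number): number {
  const safeDurationMs = Math.max(1, durationMs);
  return (iterations / safeDurationMs) * 1000;
}

console.log(requestsPerSecond(500, 250)); // 2000
console.log(requestsPerSecond(500, 0));   // 500000 — finite, not Infinity
console.log(Number.isFinite(requestsPerSecond(500, 0))); // true
```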

In `@scripts/perf-publish-baseline.mjs`:
- Around line 81-90: When reading the consolidated report at consolidatedPath,
validate the parsed object shape before returning: after JSON.parse assign to
report and check that report is an object, report.summary exists (and is an
object) and report.projects exists and is an array (e.g.,
Array.isArray(report.projects)); if any of these checks fail, log a warning
about malformed consolidated report and do NOT return report so the code falls
back to the existing fallback logic; update the block around consolidatedPath /
report parsing to perform these guards (references: consolidatedPath, report,
report.summary, report.projects).
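The shape guard can be sketched as a type predicate; the expected fields (`summary`, `projects`) follow the script's usage and are assumptions about its report format:

```typescript
// Sketch of the consolidated-report shape guard: accept the parsed value only
// when summary is an object and projects is an array; otherwise the caller
// should log a warning and fall back.
function isConsolidatedReport(value: unknown): value is { summary: object; projects: unknown[] } {
  if (typeof value !== 'object' || value === null) return false;
  const report = value as { summary?: unknown; projects?: unknown };
  return (
    typeof report.summary === 'object' &&
    report.summary !== null &&
    Array.isArray(report.projects)
  );
}

console.log(isConsolidatedReport({ summary: {}, projects: [] })); // true
console.log(isConsolidatedReport({ summary: {} }));               // false — projects missing
console.log(isConsolidatedReport('not a report'));                // false
console.log(isConsolidatedReport(null));                          // false
```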
🧹 Nitpick comments (1)
.github/workflows/perf.yml (1)

138-149: Sort project directories for deterministic report order.
Keeps the summary table and details stable across runs, reducing PR comment churn.

♻️ Proposed refactor
-              projectDirs = fs.readdirSync('perf-results').filter(d =>
-                d.startsWith('demo-e2e-') && fs.statSync(`perf-results/${d}`).isDirectory()
-              );
+              projectDirs = fs.readdirSync('perf-results')
+                .filter(d =>
+                  d.startsWith('demo-e2e-') && fs.statSync(`perf-results/${d}`).isDirectory()
+                )
+                .sort();

github-actions bot commented Jan 30, 2026

Performance Test Results

Status: ✅ All tests passed

Summary

| Project | Tests | Passed | Warnings | Failed | Leaks |
| --- | ---: | ---: | ---: | ---: | ---: |
| ✅ demo-e2e-agents | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-cache | 11 | 11 | 0 | 0 | 0 |
| ✅ demo-e2e-codecall | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-config | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-direct | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-elicitation | 1 | 1 | 0 | 0 | 0 |
| ✅ demo-e2e-errors | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-hooks | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-multiapp | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-notifications | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-openapi | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-providers | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-public | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-redis | 14 | 14 | 0 | 0 | 0 |
| ✅ demo-e2e-remember | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-remote | 5 | 5 | 0 | 0 | 0 |
| ✅ demo-e2e-serverless | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-skills | 15 | 15 | 0 | 0 | 0 |
| ✅ demo-e2e-standalone | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-transport-recreation | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-ui | 4 | 4 | 0 | 0 | 0 |

Total: 100 tests across 21 projects

📊 View full report in workflow run


Generated at: 2026-01-30T05:22:29.682Z
Commit: e4c81ee9

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@libs/testing/src/server/port-registry.ts`:
- Line 192: When binding to '::' (the server.listen call in this file and the
similar call inside isPortAvailable), catch errors from listen and if the error
code is 'EAFNOSUPPORT' retry binding on IPv4 loopback '127.0.0.1' instead of
failing; update the logic around the server.listen callback in the port
reservation function and the isPortAvailable function so they attempt IPv6
first, and on EAFNOSUPPORT re-invoke listen with '127.0.0.1' and proceed
normally.
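The retry decision can be sketched in isolation, with a hypothetical `bind` callback standing in for `server.listen` so no real sockets are needed:

```typescript
// Sketch of the IPv6-first bind with IPv4 fallback: try '::', and only when
// the failure is EAFNOSUPPORT (address family unsupported) retry on IPv4
// loopback; any other error still fails the reservation.
type BindFn = (host: string) => void; // throws on bind failure

function listenWithIpv4Fallback(bind: BindFn): string {
  try {
    bind('::');
    return '::';
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === 'EAFNOSUPPORT') {
      // IPv6 unsupported on this host: fall back to IPv4 loopback.
      bind('127.0.0.1');
      return '127.0.0.1';
    }
    throw err;
  }
}

// Simulate a host without IPv6 support:
const noV6: BindFn = (host) => {
  if (host === '::') {
    throw Object.assign(new Error('bind failed'), { code: 'EAFNOSUPPORT' });
  }
};
console.log(listenWithIpv4Fallback(noV6));     // 127.0.0.1
console.log(listenWithIpv4Fallback(() => {})); // ::
```

Keeping the error-code check narrow matters: an `EADDRINUSE` on '::' should surface as a failure, not silently retry on a different address.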

@agentfront agentfront deleted a comment from github-actions bot Jan 30, 2026
@frontegg-david frontegg-david merged commit 348150d into main Jan 30, 2026
58 checks passed
@frontegg-david frontegg-david deleted the add-performance-tests branch January 30, 2026 05:58