Skip to content

Conversation

@frontegg-david
Copy link
Contributor

@frontegg-david frontegg-david commented Jan 5, 2026

Summary by CodeRabbit

  • New Features

    • Runtime loop protection with configurable max-iterations, static infinite-loop detection, configurable rapid-enumeration thresholds with per-operation overrides, pluggable custom scoring analyzer, and integrated performance benchmarking/reporting.
  • Tests

    • Expanded unit, integration, and extensive performance suites covering loop guards, infinite-loop detection, rapid-enumeration, double-VM, worker-pool, memory, latency, and correctness.
  • Documentation

    • Guides for tuning rapid-enumeration and using the __maxIterations runtime global.
  • Chores

    • Ignore perf-results.json in VCS.

✏️ Tip: You can customize this high-level summary in your review settings.

@frontegg-david
Copy link
Contributor Author

frontegg-david commented Jan 5, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@coderabbitai
Copy link

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

Adds AST-level infinite-loop detection and injects per-loop runtime iteration guards using a new __maxIterations runtime global; propagates configurable rapid-enumeration thresholds across double‑VM boundaries; exposes iteration limits in runtimes and worker sandboxes; and adds a performance benchmarking framework (tests, reporter, CI integration).

Changes

Cohort / File(s) Change summary
AST guard & loop transforms
libs/ast-guard/src/rules/infinite-loop.rule.ts, libs/ast-guard/src/rules/index.ts, libs/ast-guard/src/agentscript-transformer.ts, libs/ast-guard/src/presets/agentscript.preset.ts, libs/ast-guard/src/__tests__/*
New InfiniteLoopRule with static truthiness checks; transformer injects per-loop __iter_N counters and __maxIterations guards; preset permits __maxIterations; extensive tests added.
Double‑VM rapid‑enumeration & config propagation
libs/enclave-vm/src/double-vm/types.ts, libs/enclave-vm/src/double-vm/parent-vm-bootstrap.ts, libs/enclave-vm/src/double-vm/double-vm-wrapper.ts, libs/enclave-vm/src/double-vm/suspicious-patterns.ts, libs/enclave-vm/src/types.ts, libs/enclave-vm/README.md
Add rapidEnumerationThreshold and rapidEnumerationOverrides to parent validation types and serialized bootstrap; suspicious-patterns resolve thresholds with per-operation overrides; wrapper/bootstrap propagate settings and include them in serialized config.
Runtime iteration limit exposure
libs/enclave-vm/src/safe-runtime.ts, libs/enclave-vm/src/enclave.ts, libs/enclave-vm/src/adapters/worker-pool/worker-script.ts, libs/enclave-vm/src/__tests__/enclave.multi-tenant.spec.ts
Expose __maxIterations from config into safe runtime and worker sandbox; validator presets include __maxIterations; tests updated to pass maxIterations.
Performance test infra & reporter
libs/enclave-vm/src/__tests__/perf/utils/*, libs/enclave-vm/src/__tests__/perf/*.spec.ts, libs/enclave-vm/src/__tests__/perf/setup.ts, libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts, libs/enclave-vm/jest.perf.config.ts, libs/enclave-vm/project.json, scripts/format-perf-summary.mjs, .github/workflows/push.yml
New benchmarking utilities (perf runners, statistics, memory tracking), custom Jest perf reporter writing perf-results.json, perf Jest config & Nx target, CI job/reporting flow, and formatter script for GitHub Actions.
Performance tests & threshold updates
libs/enclave-vm/src/__tests__/double-vm.security.spec.ts, libs/enclave-vm/src/__tests__/enclave.infinite-loop-attacks.spec.ts, libs/enclave-vm/src/__tests__/perf/*, libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
Many new/updated perf and security tests; rapid‑enumeration default raised to 30 and pagination/per‑operation override scenarios added—review assertions and thresholds.
Exports / re-exports
libs/ast-guard/src/rules/index.ts, libs/enclave-vm/src/__tests__/perf/utils/index.ts, libs/enclave-vm/src/scoring/index.ts
New re-exports: infinite-loop rule, perf utilities/types, and scoring CustomAnalyzer type.
Worker / VM adapter typing & sandbox
libs/enclave-vm/src/adapters/vm-adapter.ts, libs/enclave-vm/src/adapters/worker-pool/worker-script.ts
Tightened sandbox typing; created protected sandbox proxy (unused with vm.Context); worker sandbox sets __maxIterations from config.
Scoring / analyzer / sidecar changes
libs/enclave-vm/src/scoring/*, libs/enclave-vm/src/sidecar/reference-resolver.ts, libs/enclave-vm/src/double-vm/double-vm-wrapper.ts, libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
Added CustomAnalyzer type and integration in local-LLM scorer (init/dispose/analyze hooks), improved type-safety, safer property access, and rapid-enum config propagation; string errors specially handled in double-vm wrapper.
Misc tooling & config
.gitignore, .prettierignore, eslint.config.mjs, libs/enclave-vm/eslint.config.mjs, .github/workflows/push.yml
Added perf-results.json to .gitignore; updated prettier ignores; adjusted ESLint rules and ignored deps; CI workflow enhanced with artifact caching and improved test reporting.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Transformer as AgentScript Transformer
  participant ChildVM as Child VM (sandbox)
  participant ParentVM as Parent VM (validator)
  participant Reporter as Perf Reporter

  rect rgb(240,248,255)
    note left of Transformer: Build-time loop transformation
    Transformer->>Transformer: scan AST & collect loops
    Transformer->>Transformer: inject per-loop counter\n(let __iter_N = 0) and guard\n(if (++__iter_N > __maxIterations) throw)
  end

  rect rgb(245,255,240)
    note left of ChildVM: Runtime execution & validation
    ChildVM->>ChildVM: execute transformed code\n(loop checks reference __maxIterations)
    ChildVM->>ParentVM: emit rapid-enum events / ops
    ParentVM-->>ChildVM: validate using\nrapidEnumerationThreshold / overrides
  end

  rect rgb(255,248,240)
    note left of Reporter: Performance reporting
    ChildVM->>Reporter: recordMetric(...)
    Reporter->>Reporter: aggregate metrics\nwrite perf-results.json
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hopped through loops both brave and small,
I planted counters so none would stall.
Metrics in baskets, perf tales to share,
Guards and thresholds stitched with careful care.
JSON tucked away — carrots saved for repair.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically describes the main changes: adding iteration counter injection and infinite loop detection for loop transformations, which are the core features introduced across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @libs/ast-guard/src/rules/infinite-loop.rule.ts:
- Around line 100-102: Remove the unreachable check for the `'!!'` operator in
the unary-expression handling: the AST never produces operator `'!!'`
(double-bang is parsed as two nested `UnaryExpression`s with operator `'!'`), so
delete the branch checking `node.operator === '!!'` in the isAlwaysTruthy logic
(or the method handling UnaryExpression) and rely on existing handling for `'!'`
and nested expressions instead.

In @libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts:
- Around line 233-242: The loop intends to prevent GC while measuring memory but
using "void obj" does nothing; replace that with retaining the created objects
in a local array (e.g., retained = []), push each obj returned by factory() into
that array after calling captureMemory(), use footprints.push(after.heapUsed -
before.heapUsed) as before, and only clear the retained array (retained.length =
0) when you want to allow GC (for example after the loop or at a deliberate
checkpoint); remove the no-op "void obj" so the reference is actually preserved.

In @libs/enclave-vm/src/double-vm/parent-vm-bootstrap.ts:
- Around line 733-735: Add a short docs section to
docs/draft/docs/libraries/enclave.mdx documenting the runtime global
__maxIterations: explain that scripts executed by the enclave VM can read the
global __maxIterations (set from the maxIterations configuration), that it is
exposed as a non-writable, non-configurable number for safety, and show a
minimal example snippet of reading it from user code plus any
migration/compatibility notes. Ensure the section name is clear (e.g., "Runtime
globals — __maxIterations") and include a cross-reference to the existing
maxIterations configuration documentation.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/ast-guard/src/agentscript-transformer.ts (1)

351-418: Consider handling the case where parentBody is not found.

If findParentBody returns null or the loop isn't found in the parent body (line 413), the counter declaration won't be inserted, but the iteration check referencing __iter_N will still be injected. This could cause a ReferenceError at runtime.

While this scenario may be unlikely with valid JavaScript, it would be safer to handle this edge case explicitly.

🔎 Suggested defensive check
     // Insert counter declaration before the loop
     const parentBody = findParentBody(ancestors);
     if (parentBody) {
       const loopIndex = parentBody.indexOf(node);
       if (loopIndex >= 0) {
         parentBody.splice(loopIndex, 0, createCounterDeclaration(counterName));
+      } else {
+        // Loop not found in parent body - this shouldn't happen with valid AST
+        // but if it does, remove the injected check to avoid ReferenceError
+        console.warn(`Loop transformation warning: could not insert counter declaration`);
       }
+    } else {
+      console.warn(`Loop transformation warning: could not find parent body for counter declaration`);
     }
🧹 Nitpick comments (13)
libs/ast-guard/src/rules/infinite-loop.rule.ts (1)

128-134: Consider edge case: isAlwaysFalsy(null) returns true for missing node.

Line 129 returns true when node is null/undefined, which is correct for internal usage (no expression means the check is "vacuously falsy" in the negation context). However, the Literal case comment on line 133 mentions undefined, but undefined is an Identifier, not a Literal in JS AST. The code handles this correctly via the Identifier case on line 142-145, so this is just a documentation nit.

🔎 Suggested comment fix
       case 'Literal':
-        // false, 0, "", null, undefined
+        // false, 0, "", null (note: `undefined` is an Identifier, not a Literal)
         return !node.value && node.value !== undefined;
libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts (3)

32-50: Potential naming collision: BenchmarkReport interface exists in statistics.ts with different shape.

The relevant_code_snippets shows libs/enclave-vm/src/__tests__/perf/utils/statistics.ts already exports a BenchmarkReport interface with a different structure (containing latency, throughput, memory fields). This could cause confusion or import conflicts when both are used together.

Consider renaming this interface to BenchmarkTestReport or BenchmarkSuiteReport to differentiate it from the statistics-focused report.

🔎 Proposed rename
-export interface BenchmarkReport {
+export interface BenchmarkSuiteReport {
   timestamp: string;
   duration: number;
   // ... rest of interface
 }

Then update usages in buildReport and printSummary methods.


110-124: Consider race condition in concurrent test scenarios.

The loadMetricsFromFile and saveMetricsToFile functions perform read-modify-write without locking. If Jest runs tests in parallel, metrics could be lost due to concurrent file access.

This is likely acceptable for performance tests (which typically run sequentially), but worth noting if parallel execution is ever enabled.


156-172: Test metrics not associated with test results.

In buildReport, testResults array is populated with empty metrics: [] arrays (line 169), but the allMetrics from getAllMetrics() are stored separately in report.metrics. This means individual test results don't show their associated metrics in the JSON output.

This may be intentional design, but if you want metrics associated with specific tests, consider matching metrics to test results by test name.

libs/enclave-vm/src/double-vm/suspicious-patterns.ts (1)

66-91: Consider simplifying the config access logic.

The current implementation has redundant checks and complex fallback paths. Lines 70-73 check globalThis.validationConfig twice using different access patterns, but they're semantically equivalent.

🔎 Suggested simplification
     let threshold = 30; // Default fallback
     try {
-      // eslint-disable-next-line @typescript-eslint/no-explicit-any
-      const config =
-        (globalThis as any).validationConfig ||
-        (typeof (globalThis as any)['validationConfig'] !== 'undefined'
-          ? (globalThis as any)['validationConfig']
-          : null);
-      if (!config) {
-        // Try direct access (works in parent VM where validationConfig is a local variable)
-        // @ts-expect-error validationConfig exists in parent VM runtime scope
-        const localConfig = typeof validationConfig !== 'undefined' ? validationConfig : null;
-        if (localConfig) {
-          const defaultThreshold = localConfig.rapidEnumerationThreshold ?? 30;
-          const overrides = localConfig.rapidEnumerationOverrides ?? {};
-          threshold = overrides[operationName] ?? defaultThreshold;
-        }
-      } else {
-        const defaultThreshold = config.rapidEnumerationThreshold ?? 30;
-        const overrides = config.rapidEnumerationOverrides ?? {};
-        threshold = overrides[operationName] ?? defaultThreshold;
+      // Try globalThis first, then local scope (parent VM injects validationConfig)
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const config = (globalThis as any).validationConfig
+        // @ts-expect-error validationConfig exists in parent VM runtime scope
+        ?? (typeof validationConfig !== 'undefined' ? validationConfig : null);
+      
+      if (config) {
+        const defaultThreshold = config.rapidEnumerationThreshold ?? 30;
+        const overrides = config.rapidEnumerationOverrides ?? {};
+        threshold = overrides[operationName] ?? defaultThreshold;
       }
     } catch {
libs/enclave-vm/src/double-vm/double-vm-wrapper.ts (1)

153-164: Type narrowing issue with string error handling.

The code at line 124 casts error to Error (const err = error as Error), but then line 154 checks typeof err === 'string'. While this works at runtime (TypeScript casts don't affect runtime values), it's misleading and bypasses type safety.

🔎 Suggested improvement for clearer type handling
-    } catch (error: unknown) {
-      const err = error as Error;
+    } catch (error: unknown) {
       // Update stats
       stats.duration = Date.now() - startTime;
       stats.endTime = Date.now();
 
       // Report memory usage if tracking was enabled
       if (memoryTracker) {
         const memSnapshot = memoryTracker.getSnapshot();
         stats.memoryUsage = memSnapshot.peakTrackedBytes;
       }
 
       // Handle MemoryLimitError specially
-      if (err instanceof MemoryLimitError) {
+      if (error instanceof MemoryLimitError) {
         return {
           success: false,
           error: {
             name: 'MemoryLimitError',
-            message: err.message,
+            message: error.message,
             code: 'MEMORY_LIMIT_EXCEEDED',
-            data: { usedBytes: err.usedBytes, limitBytes: err.limitBytes },
+            data: { usedBytes: error.usedBytes, limitBytes: error.limitBytes },
           },
           stats,
         };
       }
 
       // Determine whether to sanitize stack traces
       const shouldSanitize = config.sanitizeStackTraces ?? true;
 
       // Handle thrown strings (e.g., iteration limit exceeded throws a string literal)
-      if (typeof err === 'string') {
+      if (typeof error === 'string') {
         return {
           success: false,
           error: {
             name: 'DoubleVMExecutionError',
-            message: err,
+            message: error,
             code: 'DOUBLE_VM_EXECUTION_ERROR',
           },
           stats,
         };
       }
 
+      const err = error as Error;
       return {
         success: false,
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (3)

80-84: Duplicate forceGC implementation with different return type.

This forceGC returns void, while memory-utils.ts (lines 43-49) has forceGC returning boolean to indicate whether GC was actually triggered. Consider consolidating to a single implementation in one module and re-exporting, or align the return types for consistency across the utility suite.


96-100: Unused name parameter.

The name parameter is declared but never used in the function body. Either remove it or use it (e.g., for logging or labeling results).

🔎 Proposed fix
 export async function benchmark(
-  name: string,
   fn: () => Promise<unknown>,
   options?: Partial<BenchmarkOptions>,
 ): Promise<BenchmarkSample[]> {

Or alternatively, include it in sample metadata if tracing is desired.


164-175: Unused name parameter and missing cooldown support.

Similar to benchmark, the name parameter is unused. Additionally, benchmarkSync doesn't apply cooldownMs during warmup iterations (unlike the async version at lines 111-113), which creates an inconsistency in behavior between the two functions.

libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (2)

181-189: Duplicate formatBytes implementation.

This function is identical to formatBytes in statistics.ts (lines 222-230). Consider consolidating into a single shared utility to avoid divergence.


137-143: Redundant return type annotation.

The return type Omit<MemoryTrackingResult<T>, 'samples'> & { samples: MemorySnapshot[] } is equivalent to MemoryTrackingResult<T> since samples is already MemorySnapshot[] in the interface. Consider simplifying to just MemoryTrackingResult<T>.

libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (2)

50-65: percentile assumes pre-sorted input.

The function expects a sorted array but this isn't enforced. Consider either renaming to percentileFromSorted to make the precondition explicit, or sorting internally for safety. Currently, calculateLatencyStats sorts before calling, which is correct, but other callers might miss this requirement.

🔎 Proposed fix (option A: rename for clarity)
-export function percentile(sorted: number[], p: number): number {
+export function percentileFromSorted(sortedValues: number[], p: number): number {

220-231: Duplicate formatBytes implementation.

This is identical to memory-utils.ts (lines 181-189). Consolidate into one location.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbe952f and f13ae75.

📒 Files selected for processing (29)
  • .gitignore
  • libs/ast-guard/src/__tests__/agentscript-transformer.spec.ts
  • libs/ast-guard/src/__tests__/infinite-loop.rule.spec.ts
  • libs/ast-guard/src/agentscript-transformer.ts
  • libs/ast-guard/src/presets/agentscript.preset.ts
  • libs/ast-guard/src/rules/index.ts
  • libs/ast-guard/src/rules/infinite-loop.rule.ts
  • libs/enclave-vm/jest.perf.config.ts
  • libs/enclave-vm/project.json
  • libs/enclave-vm/src/__tests__/double-vm.security.spec.ts
  • libs/enclave-vm/src/__tests__/enclave.infinite-loop-attacks.spec.ts
  • libs/enclave-vm/src/__tests__/enclave.multi-tenant.spec.ts
  • libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/setup.ts
  • libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts
  • libs/enclave-vm/src/__tests__/perf/utils/index.ts
  • libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts
  • libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts
  • libs/enclave-vm/src/__tests__/perf/utils/statistics.ts
  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
  • libs/enclave-vm/src/adapters/worker-pool/worker-script.ts
  • libs/enclave-vm/src/double-vm/double-vm-wrapper.ts
  • libs/enclave-vm/src/double-vm/parent-vm-bootstrap.ts
  • libs/enclave-vm/src/double-vm/suspicious-patterns.ts
  • libs/enclave-vm/src/double-vm/types.ts
  • libs/enclave-vm/src/enclave.ts
  • libs/enclave-vm/src/safe-runtime.ts
  • libs/enclave-vm/src/types.ts
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/ast-guard/src/rules/index.ts
  • libs/enclave-vm/src/safe-runtime.ts
  • libs/enclave-vm/src/double-vm/parent-vm-bootstrap.ts
  • libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/utils/index.ts
  • libs/enclave-vm/src/adapters/worker-pool/worker-script.ts
  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
  • libs/enclave-vm/jest.perf.config.ts
  • libs/enclave-vm/src/types.ts
  • libs/ast-guard/src/__tests__/agentscript-transformer.spec.ts
  • libs/enclave-vm/src/__tests__/double-vm.security.spec.ts
  • libs/enclave-vm/src/double-vm/double-vm-wrapper.ts
  • libs/enclave-vm/src/double-vm/suspicious-patterns.ts
  • libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts
  • libs/ast-guard/src/agentscript-transformer.ts
  • libs/enclave-vm/src/double-vm/types.ts
  • libs/ast-guard/src/rules/infinite-loop.rule.ts
  • libs/enclave-vm/src/enclave.ts
  • libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts
  • libs/enclave-vm/src/__tests__/enclave.infinite-loop-attacks.spec.ts
  • libs/ast-guard/src/__tests__/infinite-loop.rule.spec.ts
  • libs/enclave-vm/project.json
  • libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts
  • libs/enclave-vm/src/__tests__/enclave.multi-tenant.spec.ts
  • libs/enclave-vm/src/__tests__/perf/setup.ts
  • libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts
  • libs/enclave-vm/src/__tests__/perf/utils/statistics.ts
  • libs/ast-guard/src/presets/agentscript.preset.ts
🧬 Code graph analysis (9)
libs/ast-guard/src/__tests__/agentscript-transformer.spec.ts (1)
libs/ast-guard/src/agentscript-transformer.ts (1)
  • transformAgentScript (89-191)
libs/enclave-vm/src/__tests__/double-vm.security.spec.ts (2)
libs/enclave-vm/src/types.ts (1)
  • ToolHandler (518-518)
libs/enclave-vm/src/enclave.ts (1)
  • Enclave (156-705)
libs/enclave-vm/src/double-vm/suspicious-patterns.ts (2)
libs/enclave-vm/src/double-vm/types.ts (2)
  • SuspiciousPattern (30-44)
  • OperationHistory (15-22)
libs/enclave-vm/src/double-vm/index.ts (2)
  • SuspiciousPattern (16-16)
  • OperationHistory (19-19)
libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts (1)
libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (1)
  • BenchmarkReport (41-48)
libs/ast-guard/src/__tests__/infinite-loop.rule.spec.ts (2)
libs/ast-guard/src/rules/infinite-loop.rule.ts (1)
  • InfiniteLoopRule (32-171)
libs/ast-guard/src/presets/agentscript.preset.ts (1)
  • createAgentScriptPreset (179-434)
libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (2)
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (1)
  • forceGC (80-84)
libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (1)
  • formatBytes (223-231)
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (1)
libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (1)
  • forceGC (44-50)
libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (2)
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (1)
  • BenchmarkSample (11-18)
libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (1)
  • formatBytes (181-189)
libs/ast-guard/src/presets/agentscript.preset.ts (1)
libs/ast-guard/src/rules/infinite-loop.rule.ts (1)
  • InfiniteLoopRule (32-171)
🔇 Additional comments (60)
.gitignore (1)

100-100: LGTM!

Appropriately ignoring the generated performance results file to keep it out of version control.

libs/enclave-vm/src/__tests__/double-vm.security.spec.ts (5)

535-551: LGTM!

The test correctly validates the new >30 threshold for rapid enumeration detection. The assertion expect(callCount).toBeLessThanOrEqual(32) properly accounts for the threshold check happening after some calls succeed.


553-586: LGTM!

Good addition of a legitimate pagination test case that verifies normal usage patterns (20 calls) are not blocked by the new 30-call threshold.


621-690: LGTM!

Well-structured tests for custom rapidEnumerationThreshold configuration covering both low threshold (triggering detection early) and high threshold (allowing more calls) scenarios.


692-763: LGTM!

Thorough testing of per-operation rapidEnumerationOverrides feature, verifying that different operations can have independent thresholds while respecting the default for unconfigured operations.


1444-1459: LGTM!

Test updated consistently with the new >30 threshold for rapid enumeration detection.

libs/ast-guard/src/rules/infinite-loop.rule.ts (2)

40-82: LGTM!

The validate method correctly implements configurable loop detection using acorn-walk, with sensible defaults (all loop types enabled). The handler registration pattern is clean and efficient.


32-38: Remove documentation requirement: InfiniteLoopRule is not exported as a public API.

InfiniteLoopRule is used internally by the agentscript preset but is not exported from libs/ast-guard/src/index.ts. It is therefore not a public API change subject to the documentation guidelines for libs/** libraries. No documentation update is required.

Likely an incorrect or invalid review comment.

libs/enclave-vm/src/__tests__/perf/setup.ts (1)

1-9: LGTM!

The intentional no-op is well-documented, explaining that metrics are managed by the reporter to enable cross-file collection during test runs.

libs/enclave-vm/src/enclave.ts (1)

653-655: LGTM!

Adding __maxIterations to the allowed globals list is necessary for the loop transformation feature. This enables transformed loops to access the iteration limit at runtime while passing AST validation.

libs/ast-guard/src/rules/index.ts (1)

22-22: LGTM!

Standard barrel export for the new InfiniteLoopRule, consistent with other rule exports in this file.

libs/enclave-vm/src/adapters/worker-pool/worker-script.ts (3)

335-337: LGTM! Consistent iteration limit exposure.

The __maxIterations global is properly exposed to transformed loop code with a sensible default of 10000, matching the fallback used in the safe loop wrappers throughout this file (lines 417, 448, 478).


733-736: LGTM! Inner VM iteration limit properly exposed.

The __maxIterations value is correctly injected into the inner VM context, making the iteration limit available to transformed loop code. The value is sourced from the maxIterations parameter passed to the bootstrap generator.

Note: Line 733 also adds parallel: createSecureProxy(innerParallel) which appears to be a separate feature addition (parallel execution support). Consider whether this should be mentioned in the PR objectives or split into a separate change for clarity.


182-183: No action needed. The rapidEnumerationThreshold and rapidEnumerationOverrides fields are properly typed in the SerializableParentValidationConfig interface as number and Record<string, number> respectively.

libs/ast-guard/src/presets/agentscript.preset.ts (3)

204-207: LGTM! Proper allowlist addition for runtime global.

Adding __maxIterations to the allowed globals prevents false positives from the UnknownGlobalRule when validating transformed code that references this runtime-injected iteration limit. The comment clearly documents its purpose.


367-372: Excellent defense-in-depth approach!

Adding the InfiniteLoopRule provides early detection of obvious infinite loop patterns during static analysis, complementing the runtime iteration guards. The explanatory comment is clear and helpful. Good placement after ForbiddenLoopRule (step 7) as a logical extension (step 7b).


17-17: InfiniteLoopRule is properly exported.

The rule is exported from libs/ast-guard/src/rules/infinite-loop.rule.ts and re-exported via the index file, allowing the import in the preset file to work correctly. No issues found.

libs/enclave-vm/src/safe-runtime.ts (1)

572-574: LGTM! Consistent iteration limit exposure across adapters.

The __maxIterations field properly exposes the configured iteration limit to the safe runtime, enabling loop transformation logic to reference this value. This is consistent with similar additions in worker-script.ts (lines 335-337) and parent-vm-bootstrap.ts (line 735).

libs/enclave-vm/project.json (1)

9-16: Performance test configuration file verified and properly configured.

The libs/enclave-vm/jest.perf.config.ts file exists and is well-configured for performance benchmarking. The configuration includes appropriate settings: extended timeout (2 minutes per test), serial execution to prevent benchmark interference, and custom JSON reporter for results output.

libs/enclave-vm/src/__tests__/enclave.multi-tenant.spec.ts (1)

307-335: LGTM!

The maxIterations: 150000 configuration is correctly added to allow the 100k iteration loop in this test to complete without hitting the default iteration limit. This change is necessary given the new loop iteration guards being introduced in this PR.

libs/enclave-vm/src/__tests__/enclave.infinite-loop-attacks.spec.ts (3)

24-42: LGTM! Comprehensive AST-level infinite loop detection tests.

The test suite effectively validates that obvious infinite loop patterns like for(;;) are blocked at AST validation time. The test correctly verifies both failure status and error message content.


80-118: Good coverage for ForbiddenLoopRule behavior.

These tests correctly document that while(true) and do{}while(true) are blocked by the ForbiddenLoopRule (since while/do-while are disabled by default in AgentScript preset), distinct from the InfiniteLoopRule that targets infinite for loops.


139-157: Good runtime iteration limit test.

This test validates the runtime enforcement of iteration limits for loops that pass AST validation but exceed the configured maxIterations (100) at runtime. The bounded 1000-iteration loop is syntactically valid but exceeds the limit.

libs/enclave-vm/src/double-vm/suspicious-patterns.ts (1)

35-54: Good documentation of the configurable threshold behavior.

The updated docblock clearly explains the default threshold of 30, legitimate use cases, and configuration options. This helps maintainers understand the rationale behind the threshold values.

libs/enclave-vm/src/types.ts (1)

985-995: LGTM!

The new rapidEnumerationThreshold and rapidEnumerationOverrides defaults are appropriately added to DEFAULT_DOUBLE_VM_CONFIG. The default threshold of 30 aligns with the fallback value in the detection logic, and the empty overrides object provides a clean starting point for per-operation customization.

libs/ast-guard/src/__tests__/agentscript-transformer.spec.ts (3)

241-260: Good test for basic for loop iteration counter injection.

The test correctly verifies that:

  1. A counter declaration is added (let __iter_N = 0)
  2. The __maxIterations check is present
  3. The original loop structure is preserved

This ensures the transformation adds safety without breaking semantics.


305-323: Good test for unique counter names in nested loops.

Verifying that nested loops get unique counter names (__iter_0, __iter_1) is critical to prevent variable shadowing issues that could cause incorrect iteration counting.


338-376: Good tests for break/continue preservation.

These tests verify that the iteration counter injection doesn't break the semantics of break and continue statements, which is essential for correct loop behavior.

libs/ast-guard/src/agentscript-transformer.ts (2)

276-307: Good design: throwing string literal instead of Error object.

The decision to throw a string literal rather than constructing an Error object (line 300-302) is clever - it avoids requiring the Error constructor to be accessible in the sandbox while still providing a meaningful error message. The enclave can catch and wrap this appropriately.


401-418: Good approach: reverse-order transformation to preserve indices.

Processing loops in reverse order (line 403) is the correct approach when inserting declarations before loops - it ensures that insertions don't affect the indices of loops processed later.

libs/enclave-vm/jest.perf.config.ts (1)

1-30: Well-structured performance test configuration.

The Jest config is appropriately designed for performance benchmarks:

  • maxWorkers: 1 ensures consistent measurements without parallel interference
  • testTimeout: 120000 provides adequate time for long-running benchmarks
  • Coverage is correctly disabled to avoid instrumentation overhead
  • Custom reporter for structured output to perf-results.json
  • Setup file properly clears metrics between runs
libs/enclave-vm/src/double-vm/types.ts (2)

102-114: Well-structured addition of rapid enumeration configuration fields.

The new rapidEnumerationThreshold and rapidEnumerationOverrides fields are properly documented with JSDoc comments including default values and usage examples. The types are consistent with the serializable counterpart.

As per coding guidelines for libs/**, when public APIs change, ensure there is a matching docs/draft/docs/** update. Please verify documentation is updated to reflect these new configuration options.


136-137: LGTM!

The serializable config fields mirror the parent config correctly, maintaining type consistency for cross-VM boundary serialization.

libs/ast-guard/src/__tests__/infinite-loop.rule.spec.ts (5)

1-11: LGTM! Well-organized test file with comprehensive coverage.

The test file is well-structured with clear organization covering:

  • Infinite patterns for all loop types (for, while, do-while)
  • Valid patterns that should not be flagged
  • Configuration options testing
  • Integration with AgentScript preset
  • Edge cases and error location verification

12-124: Comprehensive for-loop test coverage.

Good coverage of both infinite patterns (missing test, always-true, truthy literals, negations) and valid patterns (bounded, variable conditions, always-false). The tests correctly expect for(;0;) and for(;false;) to pass validation since they never execute.


126-285: LGTM! Good coverage of while and do-while patterns.

Correctly tests JavaScript-specific truthy semantics (empty arrays and objects are truthy in JS). The do-while tests properly handle the "executes at least once" behavior with while(false).


287-392: LGTM! Good configuration and integration coverage.

The configuration tests verify all boolean options are respected. Integration tests with the AgentScript preset confirm the rule works within the full validation pipeline, including proper handling of allowedLoops configuration.


394-475: LGTM! Comprehensive edge case coverage.

Good testing of:

  • Nested loops reporting multiple issues
  • Special identifier handling (Infinity as truthy, undefined/NaN as falsy)
  • Empty body loops (for(;;);)
  • Error location accuracy with line numbers

These edge cases align with the isAlwaysTruthy/isAlwaysFalsy implementation in the rule.

libs/enclave-vm/src/__tests__/perf/utils/index.ts (1)

1-5: LGTM! Clean barrel export structure.

The file properly centralizes performance utility exports with appropriate separation of runtime and type-only exports.

libs/enclave-vm/src/double-vm/double-vm-wrapper.ts (1)

424-425: LGTM!

The new rapid enumeration fields are correctly serialized from the parent validation config. Both types (number and Record<string, number>) are natively serializable for cross-VM boundary passing.

libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (4)

1-50: LGTM! Well-structured worker pool performance test suite.

Good setup with:

  • Extended timeout (60s) for worker spawn overhead
  • Configurable helper function for worker enclave creation
  • Explicit double VM disabled to isolate worker adapter testing

52-121: LGTM! Startup latency tests are well-designed.

Cold and warm startup tests properly isolate the measured behavior with dispose/recreate patterns. The assertions (5s for cold, 3s for first execution) provide reasonable upper bounds.


123-385: LGTM! Comprehensive latency and load testing.

Good patterns used:

  • Controlled concurrency with Promise.race and inFlight tracking
  • Comparison against VM adapter for overhead analysis
  • Queue exhaustion testing with intentionally small pool

The latency assertions are reasonable for the measured operations.


387-578: LGTM! Good coverage of scaling, memory, and stability.

Thorough testing of:

  • Throughput scaling with different pool sizes
  • Memory footprint and leak detection
  • Long-running stability with degradation checks

The use of global.gc() when available helps ensure accurate memory measurements.

libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts (2)

1-85: LGTM! Well-designed double VM overhead comparison.

The test provides valuable comparative metrics between double VM enabled vs disabled, with proper warmup iterations and cleanup. The assertion that double VM should be "at least 50% of single" (line 82) ensures it's not somehow faster due to measurement variance, while the "less than 10x slower" bound is reasonable for the security benefit.


87-422: LGTM! Comprehensive double VM performance characterization.

Good coverage of:

  • Performance across all security levels
  • Context creation (cold vs warm)
  • Tool call proxy overhead and linear scaling verification
  • Validation/pattern detection impact
  • Memory overhead comparison

The tests provide valuable performance baselines for the double VM feature.

libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts (3)

1-46: LGTM! Well-structured cold start latency tests.

Good use of it.each for parameterized testing across security levels. Proper create/dispose pattern for cold start measurement with metric recording for JSON output.


48-225: LGTM! Comprehensive latency and throughput testing.

Good coverage of:

  • Warm start latency with benchmark utility
  • Security level comparison
  • Throughput for various code patterns (simple, tool calls, loops)

The loop throughput test at line 192-193 correctly uses the maxIterations configuration introduced in this PR.


227-397: LGTM! Valuable overhead and memory profiling.

Good testing of:

  • Tool call overhead scaling with per-call overhead calculation
  • Code complexity scaling to understand parse/transform costs
  • Memory footprint and leak detection

These metrics provide useful baselines for performance monitoring.

libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (2)

41-57: LGTM!

The measureAsync and measureSync implementations are clean and correctly capture timing with performance.now().

Also applies to: 59-75


121-156: LGTM!

The measurement phase correctly handles errors, captures memory when requested, and applies GC/cooldown between iterations.

libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (3)

26-50: LGTM!

The captureMemory and forceGC functions are straightforward and correct. Returning a boolean from forceGC is useful for callers to know if GC was actually performed.


52-132: LGTM!

The trackMemory function properly handles interval-based sampling with cleanup in both success and error paths. The peak tracking and delta calculations are correct.


191-204: LGTM!

The formatMemorySummary function provides a clean, readable output format for memory tracking results.

libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (5)

75-83: Population vs. sample standard deviation.

This computes population standard deviation (dividing by n). For sample statistics, dividing by (n-1) is typically preferred for unbiased estimation. If this is intentional for benchmark analysis, consider adding a brief comment to clarify.


1-6: LGTM!

The interfaces are well-structured and provide comprehensive coverage of latency, throughput, and memory statistics needed for benchmark reporting.

Also applies to: 7-49


85-121: LGTM!

The calculateLatencyStats function correctly handles empty input, sorts the data internally, and computes all relevant percentiles.


123-166: LGTM!

The throughput and memory stats calculations are correct, with proper handling of edge cases (empty samples, missing memory data).


233-262: LGTM!

The compareReports function provides a straightforward regression detection mechanism. The 10% threshold for latency increase or throughput decrease is a reasonable default for flagging regressions.

libs/enclave-vm/src/double-vm/parent-vm-bootstrap.ts (1)

182-183: Rapid enumeration config fields are properly defined with defaults and handled gracefully.

The rapidEnumerationThreshold and rapidEnumerationOverrides fields are correctly defined as required fields in SerializableParentValidationConfig (types.ts lines 136-137) with sensible defaults in DEFAULT_DOUBLE_VM_CONFIG (rapidEnumerationThreshold: 30 and rapidEnumerationOverrides: {}). The suspicious pattern detection code in suspicious-patterns.ts (lines 79-86) defensively handles these fields with nullish coalescing operators, ensuring graceful fallback even though the fields are required in the type.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @libs/ast-guard/src/rules/infinite-loop.rule.ts:
- Around line 131-133: The comment for the case 'Literal' branch is inconsistent
with the condition "return !node.value && node.value !== undefined" because it
lists `undefined` as a falsy value to detect while the code explicitly excludes
it; update the comment next to the case 'Literal' (and/or near the node.value
check) to accurately state that undefined is excluded here (since undefined is
handled elsewhere, e.g., the Identifier branch) so the comment matches the
condition's intent.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f13ae75 and 946413d.

📒 Files selected for processing (5)
  • docs/draft/docs/libraries/enclave.mdx
  • libs/ast-guard/src/rules/infinite-loop.rule.ts
  • libs/enclave-vm/README.md
  • libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts
  • libs/enclave-vm/src/double-vm/suspicious-patterns.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/enclave-vm/src/tests/perf/utils/memory-utils.ts
🧰 Additional context used
📓 Path-based instructions (3)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/double-vm/suspicious-patterns.ts
  • libs/enclave-vm/README.md
  • libs/ast-guard/src/rules/infinite-loop.rule.ts
docs/draft/docs/**

⚙️ CodeRabbit configuration file

docs/draft/docs/**: This folder holds the draft/source docs that humans are expected to edit. When authors want to add or change documentation, they should do it here. The Codex workflow uses these drafts, together with the code diff, to generate the latest docs under docs/docs/. As a reviewer: - Encourage contributors to add/update content here instead of docs/docs/. - It is fine to do structural/content feedback here (clarity, examples, etc).

Files:

  • docs/draft/docs/libraries/enclave.mdx
docs/**

⚙️ CodeRabbit configuration file

docs/**: Repository documentation for the SDK, using MDX and hosted by Mintlify. See more specific rules for: - docs/docs/** (latest rendered docs, automation-only) - docs/v/** (archived versions, read-only) - docs/draft/docs/** (human-editable drafts) - docs/blogs/** (blogs, human edited) - docs/docs.json (Mintlify navigation)

Files:

  • docs/draft/docs/libraries/enclave.mdx
🧬 Code graph analysis (1)
libs/enclave-vm/src/double-vm/suspicious-patterns.ts (2)
libs/enclave-vm/src/double-vm/types.ts (2)
  • SuspiciousPattern (30-44)
  • OperationHistory (15-22)
libs/enclave-vm/src/double-vm/index.ts (2)
  • SuspiciousPattern (16-16)
  • OperationHistory (19-19)
🔇 Additional comments (12)
libs/ast-guard/src/rules/infinite-loop.rule.ts (6)

7-16: LGTM: Clean API design.

The interface provides clear, granular control over which loop types to check, with well-documented defaults.


32-38: LGTM: Appropriate metadata and constructor.

The rule metadata is well-defined, and ERROR severity with enabled-by-default is appropriate for infinite loop detection.


40-82: LGTM: Solid implementation of loop detection logic.

The conditional handler registration based on options is clean, and the detection logic correctly identifies missing test conditions and always-truthy patterns for each loop type.


87-122: LGTM: Comprehensive truthiness detection.

The logic correctly handles literals, unary expressions, special identifiers, and object/array expressions. The comment at lines 98-99 explains that !!x is parsed as two nested UnaryExpressions, which properly addresses the past review concern about the !! operator check.


151-169: LGTM: Robust error reporting.

The method defensively handles optional location data and provides clear, informative error messages with useful debugging context.


7-16: Add InfiniteLoopRule exports to main library index and update documentation.

InfiniteLoopRule and InfiniteLoopOptions are exported from libs/ast-guard/src/rules/index.ts but are missing from the main library export file (libs/ast-guard/src/index.ts). All other validation rules follow the pattern of being explicitly exported from the main index. Add these exports to maintain consistency.

Additionally, per the coding guidelines for libs/**, new public APIs must have matching documentation under docs/draft/docs/**. The ast-guard.mdx documentation currently does not mention the InfiniteLoopRule or document its options. Add documentation covering this new validation rule and its configuration options.

libs/enclave-vm/src/double-vm/suspicious-patterns.ts (2)

38-55: LGTM! Excellent documentation of legitimate use cases and configuration.

The expanded docblock clearly explains the purpose, legitimate use cases (pagination/iteration), configuration options, and DoS considerations. This will help developers understand when and how to tune the threshold.


63-79: Original review comment is not applicable—the code is working as designed.

The parent VM bootstrap (lines 178-184 of parent-vm-bootstrap.ts) properly initializes validationConfig as a const before the detection functions execute. Since these functions are serialized and injected into the bootstrap scope at runtime, validationConfig is guaranteed to be available in their execution context. The try-catch with @ts-expect-error is a defensive pattern that correctly handles compile-time type constraints while relying on runtime scope availability—this is the intended design, not a workaround for configuration issues.

docs/draft/docs/libraries/enclave.mdx (2)

87-87: LGTM! Documentation accurately reflects the new configurable threshold.

The updated description correctly shows the new default threshold of 30 and indicates it's configurable.


92-108: Excellent addition! Clear configuration examples for rapid enumeration tuning.

This new section provides practical examples of how to configure both the default threshold and per-operation overrides. The examples (search, users.list) demonstrate common pagination scenarios.

libs/enclave-vm/README.md (2)

102-103: LGTM! Appropriate API documentation update.

The addition of doubleVm configuration option to the interface documentation with a reference to the detailed section is correct.


240-346: The Double VM documentation is accurate and consistent with the implementation. All configuration types, default values, pattern IDs, and examples match the codebase. The boundary condition described (">30 times in 5 seconds") is mathematically correct and matches the code's use of the > operator (which blocks at 31+ calls).

No changes needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
.github/workflows/push.yml (2)

60-65: Remove redundant error handling.

The command includes || true (line 64) which forces the exit code to 0, making continue-on-error: true (line 65) redundant. Choose one approach for clarity.

🔎 Suggested fix
       - name: Run performance tests
         id: perf-tests
         run: |
           echo "Running performance benchmarks..."
-          npx nx run enclave-vm:test-perf --passWithNoTests || true
+          npx nx run enclave-vm:test-perf --passWithNoTests
         continue-on-error: true

67-78: Consider adding error handling for the formatting script.

If the formatting script fails (e.g., due to malformed JSON), this step will fail without continue-on-error, potentially blocking the workflow. Consider whether failures in performance summary generation should be non-blocking.

🔎 Optional: Add continue-on-error if non-blocking is desired
       - name: Generate performance summary
         if: always()
         run: |
           if [ -f "perf-results.json" ]; then
             echo "## 📊 Performance Test Results" >> $GITHUB_STEP_SUMMARY
             echo "" >> $GITHUB_STEP_SUMMARY
             node scripts/format-perf-summary.mjs perf-results.json >> $GITHUB_STEP_SUMMARY
           else
             echo "## ⚠️ Performance Tests" >> $GITHUB_STEP_SUMMARY
             echo "" >> $GITHUB_STEP_SUMMARY
             echo "No performance results found. Tests may have been skipped or failed to run." >> $GITHUB_STEP_SUMMARY
           fi
+        continue-on-error: true
scripts/format-perf-summary.mjs (4)

23-23: Add error handling for JSON parsing.

If perf-results.json contains malformed JSON, the script will throw an uncaught exception. Consider wrapping the parse operation in a try-catch block to provide a user-friendly error message.

🔎 Suggested error handling
+let report;
+try {
-const report = JSON.parse(readFileSync(resultsPath, 'utf-8'));
+  report = JSON.parse(readFileSync(resultsPath, 'utf-8'));
+} catch (error) {
+  console.log('## ⚠️ Invalid Performance Results\n');
+  console.log(`Failed to parse \`${resultsFile}\`: ${error.message}`);
+  process.exit(0);
+}

39-47: Validate JSON structure before accessing properties.

The code assumes report.summary exists and has total, passed, and failed properties. If the JSON structure is unexpected, this will throw. Consider adding validation or defensive checks.

🔎 Suggested validation
 // Test summary
 console.log('### Test Summary\n');
-const { summary } = report;
+const summary = report.summary || { total: 0, passed: 0, failed: 0 };
 const passRate = summary.total > 0 ? ((summary.passed / summary.total) * 100).toFixed(1) : 0;
 console.log(`✅ **${summary.passed}/${summary.total}** tests passed (${passRate}%)\n`);

55-60: Add validation for metrics structure.

The code iterates over metrics without verifying that each value is an array. If the JSON structure is unexpected (e.g., metrics[testName] is not an array), this will throw.

🔎 Suggested defensive check
   const allMetrics = [];
   for (const [testName, metrics] of Object.entries(report.metrics)) {
+    if (!Array.isArray(metrics)) continue;
     for (const metric of metrics) {
       allMetrics.push({ testName, ...metric });
     }
   }

97-98: Add fallback for missing timestamp.

If report.timestamp is missing, the footer will display "undefined". Consider using optional chaining with a fallback value.

🔎 Suggested fix
 // Footer
-console.log(`---\n*Generated at ${report.timestamp}*`);
+console.log(`---\n*Generated at ${report.timestamp || new Date().toISOString()}*`);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 946413d and be7101b.

📒 Files selected for processing (2)
  • .github/workflows/push.yml
  • scripts/format-perf-summary.mjs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
.github/workflows/push.yml (1)

80-87: LGTM!

The artifact upload step is well-configured with appropriate error handling (if-no-files-found: ignore) and retention settings.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts:
- Around line 178-180: The awaiting call to await this.getEmbedding(prompt) in
local-llm.scorer.ts performs an expensive embedding inference and its result is
discarded; remove this call (since initialize() already validates the pipeline)
or, if you actually need the embedding, assign it to a variable and incorporate
it into the scoring logic (e.g., use the embedding in analyzePrompt or the
scoring function); if the intention was only to verify pipeline readiness,
replace the call with a lightweight readiness check such as using initialize()
or an explicit pipelineLoaded flag instead of running getEmbedding.
- Around line 201-204: The code calls await this.getEmbedding(prompt) but never
uses the result, wasting CPU on every scoring request; either remove that call
from the scoring path in local-llm.scorer.ts (and in scoreWithClassification) or
actually use the embedding: assign const embedding = await
this.getEmbedding(prompt) and feed it into the similarity-based scoring logic
(or a VectoriaDB lookup) instead of the current keyword heuristics;
alternatively replace the call with a lightweight readiness helper (e.g.,
ensurePipelineLoaded or a boolean check) if you only need to warm the model.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.prettierignore (1)

15-23: Clarify the intent: broad docs/ ignore makes specific patterns redundant.

Line 18 adds docs/ as a broad ignore pattern that matches the entire docs/ directory and all its contents. This makes the specific patterns in lines 21–23 (docs/live/docs.json, docs/docs.draft.json, docs/docs.backup.json) redundant—Prettier will skip all files under docs/ once the broader pattern matches.

Decide on the intent:

  • If all documentation should be ignored: keep line 18 and remove lines 21–23.
  • If only specific files should be ignored: remove line 18.
🧹 Nitpick comments (1)
scripts/format-perf-summary.mjs (1)

95-95: Hardcoded threshold comparison operator may be inaccurate.

Line 95 assumes all thresholds are upper bounds by hardcoding <=. However, some performance metrics have lower-bound thresholds (e.g., minimum throughput, minimum operations per second) where the comparison should be >=. Consider including comparison operator information in the metric object or document that only upper-bound thresholds are supported.

💡 Possible enhancement

If the upstream performance reporter can provide comparison operators in the metric object, you could render them dynamically:

-    const threshold = metric.threshold !== undefined ? `<= ${metric.threshold} ${metric.unit || ''}` : '-';
+    const operator = metric.operator || '<=';
+    const threshold = metric.threshold !== undefined ? `${operator} ${metric.threshold} ${metric.unit || ''}` : '-';
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be7101b and db0e780.

📒 Files selected for processing (11)
  • .github/workflows/push.yml
  • .prettierignore
  • eslint.config.mjs
  • libs/enclave-vm/eslint.config.mjs
  • libs/enclave-vm/src/adapters/vm-adapter.ts
  • libs/enclave-vm/src/double-vm/double-vm-wrapper.ts
  • libs/enclave-vm/src/scoring/feature-extractor.ts
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
  • libs/enclave-vm/src/scoring/scoring-gate.ts
  • libs/enclave-vm/src/sidecar/reference-resolver.ts
  • scripts/format-perf-summary.mjs
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/push.yml
  • libs/enclave-vm/src/double-vm/double-vm-wrapper.ts
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/sidecar/reference-resolver.ts
  • libs/enclave-vm/src/scoring/feature-extractor.ts
  • libs/enclave-vm/src/adapters/vm-adapter.ts
  • libs/enclave-vm/eslint.config.mjs
  • libs/enclave-vm/src/scoring/scoring-gate.ts
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Performance
  • GitHub Check: Test
🔇 Additional comments (7)
eslint.config.mjs (1)

14-20: LGTM! Standard best practice for intentional unused variables.

The no-unused-vars rule with underscore ignore patterns is a well-established convention that allows developers to explicitly mark intentional unused variables and arguments. This is particularly useful for runtime placeholders and callback parameters that must match a specific signature but aren't used.

libs/enclave-vm/eslint.config.mjs (1)

12-15: LGTM! Appropriate exclusion for test-only dependency.

Adding @jest/reporters to the ignored dependencies list is correct, as test-only dependencies used by benchmark reporters don't need to be treated as production dependencies. The explanatory comment clearly documents the rationale.

libs/enclave-vm/src/scoring/feature-extractor.ts (1)

274-274: LGTM! Type-safety improvement.

The explicit type cast as string is safe and appropriate here, as the isStaticName guard on line 273 already ensures toolNameNode.value is a string. This improves type-safety without changing runtime behavior.

libs/enclave-vm/src/sidecar/reference-resolver.ts (1)

46-49: LGTM!

The refactored type guard correctly uses an intermediate typed variable with bracket-notation property access. Short-circuit evaluation ensures obj['__type'] and obj['__parts'] are only accessed after confirming value is a non-null object. This is a clean, defensive pattern for type guards on unknown values with no change to public API behavior.

libs/enclave-vm/src/scoring/scoring-gate.ts (2)

17-18: LGTM: Clean import restructuring following TypeScript best practices.

The separation of type-only imports (line 17) from value imports (line 18) using the import type syntax is excellent for tree-shaking and clearly distinguishes compile-time types from runtime values. The removal of unused imports (ExtractedFeatures, DEFAULT_SCORING_CONFIG) keeps the module boundary clean.


30-40: Configuration normalization consolidates logic effectively.

Delegating configuration normalization to normalizeScoringConfig (line 34) is a good separation of concerns, centralizing the handling of both legacy and new progressive modes in the types module. The constructor remains clean and the public API is unchanged, maintaining backward compatibility for this SDK library.

libs/enclave-vm/src/adapters/vm-adapter.ts (1)

411-411: Good type safety improvement.

Replacing any with Record<string, unknown> provides better type safety while maintaining the necessary flexibility for dynamic property assignment.

Also applies to: 415-415

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts:
- Around line 85-120: The test disposes the wrong instance inside the loop: it
calls enclave.dispose() (the outer instance) on each iteration which will fail
after the first dispose; remove the repeated enclave.dispose() call from inside
the loop and either dispose the outer enclave once before the loop or eliminate
the outer createWorkerEnclave() entirely and only use the per-iteration
newEnclave created by createWorkerEnclave(); ensure you only call dispose() on
newEnclave inside the loop (and on the outer enclave exactly once if you keep
it).
🧹 Nitpick comments (2)
libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (2)

292-339: Consider tightening the p99 threshold.

Line 338 uses a 50x multiplier (stats.p99 < stats.p50 * 50) to account for CI variability. While CI environments can be noisy, this threshold is extremely permissive and may not catch significant performance regressions. For example, if p50 is 10ms, p99 could be up to 500ms and still pass.

Consider either:

  • Using a tighter multiplier (e.g., 10-20x) if historical data supports it
  • Adding an absolute upper bound in addition to the relative check
  • Documenting why such a loose threshold is necessary based on observed CI behavior

501-529: Consider documenting the --expose-gc requirement.

The test checks for global.gc availability (line 518) before forcing garbage collection, which is good defensive coding. However, for this test to be fully effective, Node.js must be run with the --expose-gc flag.

Consider adding a comment or console warning when GC is not available, or document this requirement in the test suite setup to ensure consistent CI configuration.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db0e780 and 6ccfbd3.

📒 Files selected for processing (1)
  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
🧬 Code graph analysis (1)
libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (7)
libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (4)
  • calculateLatencyStats (88-121)
  • calculateReport (171-182)
  • formatReport (187-218)
  • formatBytes (223-231)
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (1)
  • benchmark (96-159)
scripts/format-perf-summary.mjs (1)
  • report (23-23)
libs/enclave-vm/src/__tests__/perf/utils/index.ts (1)
  • recordMetrics (4-4)
libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts (1)
  • recordMetrics (76-94)
libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (2)
  • trackMemory (55-132)
  • formatBytes (181-189)
libs/enclave-vm/src/sidecar/reference-resolver.ts (1)
  • resolve (167-213)
🔇 Additional comments (9)
libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (9)

1-15: LGTM! Appropriate imports and timeout configuration.

The 60-second timeout is reasonable for worker thread spawn overhead, and all necessary utilities are imported.


20-50: LGTM! Well-structured helper with appropriate defaults.

The configuration correctly isolates the worker adapter for testing by disabling double VM and VM-level memory limits.


54-83: LGTM! Cold startup test correctly measures pool initialization.

The test properly disposes the enclave after each measurement iteration.


123-232: LGTM! Comprehensive latency benchmarks with proper cleanup.

The tests appropriately compare worker pool overhead against VM adapter baseline and include tool call scenarios.


234-290: LGTM! Proper concurrent load handling with metrics recording.

The test correctly maintains concurrency level using Promise.race and records metrics for CI reporting.


341-385: LGTM! Queue exhaustion test properly validates backpressure behavior.

The test correctly forces queueing by using more concurrent requests than available workers.


388-444: LGTM! Throughput scaling test with appropriate lack of strict assertions.

The test correctly avoids making strict scaling factor assertions, which is appropriate given the variability in CI environments.


447-499: LGTM! Memory tracking tests with reasonable thresholds.

The tests properly use the trackMemory utility which forces GC for accurate baseline measurements.


532-578: LGTM! Stability test appropriately validates performance over time.

The test correctly measures latency degradation across batches and uses a reasonable 50% threshold to detect stability issues.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Fix all issues with AI Agents 🤖
In @libs/enclave-vm/README.md:
- Around line 376-432: The README example incorrectly nests scoring under a
`scoring` property; update the Enclave constructor example so `scorer` and
`localLlm` are top-level properties on the EnclaveConfig (not under `scoring`),
keeping `customAnalyzer: myAnalyzer` inside `localLlm`; adjust the example that
constructs `new Enclave({...})` to pass `scorer: 'local-llm'` and `localLlm: {
modelId: 'Xenova/all-MiniLM-L6-v2', customAnalyzer: myAnalyzer }` alongside
`toolHandler`, and ensure references to the CustomAnalyzer interface (analyze,
initialize, dispose) remain unchanged.
- Around line 101-104: Export PartialDoubleVmConfig from the package entry so
consumers can import it: open libs/enclave-vm/src/index.ts and add
PartialDoubleVmConfig to the re-exports from ./types (the same place other types
are exported), ensuring the public API matches CreateEnclaveOptions.doubleVm
which references PartialDoubleVmConfig; update any export list that re-exports
types from ./types to include PartialDoubleVmConfig.

In @libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts:
- Around line 601-623: The test name claims the custom analyzer is initialized
but never asserts initializeFn was called; update the test to either (A)
explicitly assert initializeFn was invoked when the model loads by mocking model
load to succeed and adding expect(initializeFn).toHaveBeenCalled() after await
scorer.initialize(), or (B) if model load is non-deterministic keep current
behavior but rename the test to something like "should be ready with custom
analyzer configured" and remove the misleading assertion about initialization;
refer to LocalLlmScorer, LocalLlmConfig, customAnalyzer, and initializeFn in
your changes.

In @libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts:
- Around line 122-126: The custom analyzer's initialize() should be called
whenever a custom analyzer is provided, even if the HuggingFace model fails to
load; update the logic in the constructor/initialization flow of
local-llm.scorer.ts so that if config.customAnalyzer is present you await
config.customAnalyzer.initialize() (wrap in try/catch so initialization errors
are logged but don't prevent fallbackToRules), and only set ready = true after
either model load or the analyzer initialize attempt completes; this ensures
score() can safely rely on the custom analyzer being initialized when
configured.

In @scripts/format-perf-summary.mjs:
- Around line 95-96: The comment incorrectly references threshold.type; update
the comment to use the correct property name thresholdType (matching how
metric.thresholdType is used in the code and on line 97) so the docstring aligns
with the actual property name and clarifies that for lower-bound thresholds set
thresholdType = 'min'.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/enclave-vm/README.md (1)

82-86: Document the __maxIterations runtime global in the README.

The codebase exposes __maxIterations as a runtime global (in safe-runtime.ts:573 and enclave.ts:656), allowing user code to access the iteration limit. However, it is not documented in the README. Add it to the "Custom Globals" section (around line 514) or create a "Runtime Globals" section documenting both __maxIterations and the loop enforcement wrappers (__safe_for, __safe_forOf, etc.) along with their purpose.

🧹 Nitpick comments (7)
libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (1)

334-337: Consider tightening the p99 threshold multiplier.

The 50x multiplier between p50 and p99 is quite lenient and may mask performance regressions, especially in cases where the median is very low. While CI variability is a valid concern, a 10-20x multiplier with a lower absolute bound might better catch meaningful degradation.

scripts/format-perf-summary.mjs (1)

72-82: Consider adding validation for metric properties.

While the code checks for array and object types, individual metric properties (name, value, unit) aren't validated. If the JSON is malformed, this could result in rendering "Unknown" or "N/A" values without clear indication that the data is invalid.

🔎 Optional: Add property validation
     for (const metric of testMetrics) {
-      if (metric && typeof metric === 'object') {
+      if (metric && typeof metric === 'object' && typeof metric.name === 'string' && typeof metric.value === 'number') {
         allMetrics.push({ testName, ...metric });
       }
     }

This ensures only well-formed metrics are included in the report.

libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts (2)

86-119: Consider asserting on p95 in addition to p50.

The test computes both p50 and p95 for each security level but only asserts on p50 (line 118). Since the test logs both metrics, consider adding an assertion for p95 consistency as well.

🔎 Optional: Add p95 assertion
   // PERMISSIVE should generally be faster than STRICT
   expect(results['PERMISSIVE'].p50).toBeLessThanOrEqual(results['STRICT'].p50 * 1.5);
+  expect(results['PERMISSIVE'].p95).toBeLessThanOrEqual(results['STRICT'].p95 * 1.5);
 });

195-230: Verify the low throughput threshold and consider testing iteration limit enforcement.

Two observations:

  1. Low threshold (line 229): The test expects throughput > 5 exec/sec for loops, which is significantly lower than the 100 exec/sec for simple code and 50 exec/sec for tool calls. While loops may have overhead, this threshold seems very permissive. Consider if this should be higher (e.g., 20-30 exec/sec).

  2. Iteration limit not verified: The test uses maxIterations: 10000 (line 198), which relates to the PR's infinite loop detection feature. However, the test only runs 100 iterations (line 203), well under the limit. Consider adding a test case that verifies the iteration limit is enforced when exceeded.

Based on PR objectives mentioning "iteration counter injection and infinite loop detection," ensuring the limit is tested would strengthen confidence in the feature.

libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts (3)

142-194: Reconsider the weak assertion for context creation overhead.

The assertion at line 192 requires only that coldStats.mean >= warmStats.mean * 0.5, allowing cold start to be faster than warm execution. The comment acknowledges "sometimes cold is faster due to CPU caching," which suggests this test may not reliably validate context creation overhead.

Suggestions:

  1. If CPU caching effects are significant, consider increasing the sample size or using a more controlled measurement approach.
  2. Alternatively, document that this test is primarily observational rather than a strict validation.
  3. Consider asserting on a percentile (e.g., p50 or p75) which might be more stable than mean.

The current assertion is too permissive and may not catch regression in context creation performance.


265-318: Consider documenting the rationale for the 40x scaling assertion.

The test asserts that 20 tool calls should take less than 40x the time of 1 call (line 317). This implies allowing up to 2x overhead beyond linear scaling (20x). While reasonable, the rationale for this specific multiplier isn't documented.

🔎 Optional: Add explanatory comment
   // 20 calls should be less than 40x the time of 1 call (accounting for overhead)
+  // (Linear would be 20x; allowing 2x factor for per-call overhead and base cost)
   expect(ratio).toBeLessThan(40);

321-383: Consider adding a test for iteration limit enforcement.

The validation overhead test includes a loop (lines 338-340) but doesn't verify that the iteration counter injection or infinite loop detection features work correctly. Given the PR objectives mention "iteration counter injection and infinite loop detection," consider adding a test case that verifies:

  1. Loops exceeding maxIterations are detected and terminated
  2. The error message clearly indicates an iteration limit was exceeded

This would provide more complete coverage of the PR's core features.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91f1e02 and 15fd465.

📒 Files selected for processing (12)
  • .prettierignore
  • README.md
  • libs/enclave-vm/README.md
  • libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts
  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
  • libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts
  • libs/enclave-vm/src/scoring/index.ts
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
  • libs/enclave-vm/src/scoring/types.ts
  • scripts/format-perf-summary.mjs
🚧 Files skipped from review as they are similar to previous changes (2)
  • .prettierignore
  • libs/enclave-vm/src/tests/perf/utils/benchmark-reporter.ts
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts
  • libs/enclave-vm/src/scoring/types.ts
  • libs/enclave-vm/src/scoring/index.ts
  • libs/enclave-vm/README.md
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
  • libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts
  • libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts
  • libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts
🧬 Code graph analysis (3)
libs/enclave-vm/src/scoring/types.ts (2)
libs/enclave-vm/src/scoring/index.ts (3)
  • CustomAnalyzer (71-71)
  • ExtractedFeatures (52-52)
  • RiskSignal (60-60)
libs/enclave-vm/src/index.ts (2)
  • ExtractedFeatures (70-70)
  • RiskSignal (77-77)
libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts (5)
libs/enclave-vm/src/__tests__/perf/utils/statistics.ts (4)
  • calculateLatencyStats (88-121)
  • calculateReport (171-182)
  • formatReport (187-218)
  • formatBytes (223-231)
libs/enclave-vm/src/__tests__/perf/utils/benchmark-reporter.ts (2)
  • recordMetrics (89-117)
  • recordMetric (61-84)
libs/enclave-vm/src/__tests__/perf/utils/index.ts (2)
  • recordMetrics (4-4)
  • recordMetric (4-4)
libs/enclave-vm/src/__tests__/perf/utils/perf-utils.ts (1)
  • benchmark (96-159)
libs/enclave-vm/src/__tests__/perf/utils/memory-utils.ts (2)
  • trackMemory (55-132)
  • formatBytes (181-189)
libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts (2)
libs/enclave-vm/src/scoring/types.ts (2)
  • CustomAnalyzer (379-404)
  • LocalLlmConfig (409-478)
libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (1)
  • LocalLlmScorer (65-439)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test
  • GitHub Check: Performance
🔇 Additional comments (32)
libs/enclave-vm/src/__tests__/perf/worker-pool.perf.spec.ts (8)

1-15: LGTM!

The imports are well-organized and the extended timeout appropriately accounts for worker thread spawn overhead.


17-50: LGTM!

The helper function provides a clean abstraction for creating worker pool enclaves with sensible defaults and appropriate configuration for performance testing.


52-112: LGTM! Previous disposal issue has been resolved.

The disposal logic is now correct—each enclave instance is properly created within the loop scope and disposed after use. Both startup tests correctly measure their intended metrics with appropriate thresholds.


114-223: LGTM!

All three execution latency tests are well-structured with proper warmup, measurement, cleanup, and reasonable thresholds. The comparison test appropriately measures both adapters under identical conditions.


340-384: LGTM!

The queue exhaustion test correctly validates queueing behavior by using a small pool and high concurrency, with appropriate assertions.


387-443: LGTM!

The throughput scaling test properly measures and compares performance across different pool sizes with appropriate cleanup.


445-531: LGTM!

The memory tests are well-structured, properly clean up resources, and include appropriate safeguards (GC availability check, cleanup delays) for reliable measurements.


533-579: LGTM!

The stability test appropriately measures performance consistency over time with a reasonable degradation threshold.

libs/enclave-vm/README.md (2)

240-269: Documentation alignment verified—no action needed.

The draft documentation at docs/draft/docs/libraries/enclave.mdx correctly documents both rapidEnumerationThreshold (line 100) and the __maxIterations runtime global (lines 158–172) as referenced in this README update. Documentation is aligned and in the proper draft location.


240-295: All documentation claims verified as accurate. The rapidEnumerationThreshold default of 30, the 5-second detection window, all five suspicious pattern IDs (EXFIL_LIST_SEND, RAPID_ENUMERATION, CREDENTIAL_EXFIL, BULK_OPERATION, DELETE_AFTER_ACCESS), and the Record<string, number> type for rapidEnumerationOverrides are correctly documented and match the implementation in libs/enclave-vm/src/double-vm/.

README.md (1)

93-97: Security metrics are well-supported by the actual test suite.

The security test counts in the README are verified: actual test suite contains 1852 total test blocks with 1380 security-related tests, exceeding the claimed 1184+. The test file structure confirms multiple attack vector suites (attack-matrix, combined-attacks, constructor-obfuscation-attacks, runtime-attack-vectors, helper-escape), validating the 150+ blocked vectors claim. Feature additions (AI Scoring Gate, Reference Sidecar) are properly reflected in test directory structure (scoring and sidecar modules). The metrics are accurate and conservative.

libs/enclave-vm/src/scoring/types.ts (2)

359-404: LGTM! Well-designed extensibility interface.

The CustomAnalyzer interface provides a clean extension point with appropriate lifecycle hooks. The documentation and example clearly demonstrate usage.


456-477: LGTM! Clear documentation of the custom analyzer integration.

The field addition is well-documented with a practical example showing how to integrate external LLM scoring.

libs/enclave-vm/src/scoring/index.ts (1)

69-72: LGTM! Properly exposed in the public API.

The CustomAnalyzer type is correctly exported and well-categorized under extensibility types.

libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts (7)

565-599: LGTM! Comprehensive test of custom analyzer integration.

The test properly verifies that the custom analyzer is invoked and its results are used in the scoring output.


625-643: LGTM! Proper verification of cleanup.

The test correctly verifies that the custom analyzer's dispose() method is invoked during scorer disposal.


645-669: LGTM! Validates default heuristic behavior.

The test properly verifies that built-in heuristics are used when no custom analyzer is configured.


671-698: LGTM! Validates proper data flow to custom analyzer.

The test confirms that both the generated prompt and extracted features are correctly passed to the custom analyzer.


700-723: LGTM! Proper error handling validation.

The test confirms that the scorer gracefully falls back to rule-based scoring when the custom analyzer fails, maintaining system resilience.


725-749: LGTM! Good edge case coverage.

The test ensures that custom analyzer scores exceeding the valid range are properly clamped to 100.


751-775: LGTM! Validates risk level calculation.

The test confirms that custom analyzer scores are correctly mapped to risk levels.

libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (5)

4-11: LGTM! Improved documentation.

The updated documentation clearly explains the built-in heuristics and custom analyzer options with practical examples.

Also applies to: 43-63


152-175: LGTM! Robust custom analyzer integration with fallback.

The implementation properly uses the custom analyzer when configured and gracefully falls back to rule-based scoring on errors.


210-226: LGTM! Wasteful embedding computation removed.

The refactored scoring methods now delegate to analyzePrompt, which properly routes to either the custom analyzer or built-in heuristics. This addresses the previous review comments about unused embedding computations and improves performance.

Also applies to: 231-253


308-324: LGTM! Clean separation of concerns.

The new analyzePrompt method provides a clear dispatch point between custom analyzer and built-in heuristics. The renamed analyzeWithHeuristics method makes its purpose explicit.

Also applies to: 326-413


432-438: LGTM! Proper resource cleanup.

The dispose method correctly cleans up the custom analyzer using optional chaining.

libs/enclave-vm/src/__tests__/perf/enclave.perf.spec.ts (4)

15-46: LGTM!

The cold start latency test correctly measures initialization and first execution time by creating fresh Enclave instances for each iteration. The thresholds (p50 < 100ms, p95 < 500ms) are reasonable benchmarks for cold start performance.


233-290: LGTM!

The tool call overhead test effectively characterizes the per-call cost by measuring across different call counts (0, 1, 5, 10, 20) and computing the incremental overhead. The assertion (line 288) correctly validates that increased tool calls result in increased execution time.


292-340: LGTM!

The code complexity scaling test appropriately measures how execution time scales with code size by varying the number of variables (1, 10, 50, 100). The assertion that the largest code completes under 200ms (line 338) provides a reasonable performance bound.


342-402: LGTM!

The memory usage tests provide comprehensive coverage of memory behavior: single instance footprint, repeated execution growth, and dispose cleanup verification. The use of trackMemory utility and conditional GC invocation (lines 390-392) ensures accurate measurement. The thresholds (10MB, 100MB, 20MB) are reasonable bounds for detecting memory leaks or excessive usage.

libs/enclave-vm/src/__tests__/perf/double-vm.perf.spec.ts (2)

16-85: LGTM - Note on permissive assertions.

The test correctly measures the overhead of double VM compared to single VM. The assertions (lines 82, 84) are intentionally permissive to account for the security benefit of double VM:

  • Lower bound: double VM >= 0.5x single VM (prevents unrealistic speedup)
  • Upper bound: double VM p95 < 10x single VM p95 (prevents excessive overhead)

These sanity-check bounds are reasonable for a performance characterization test.


385-422: LGTM!

The memory overhead test correctly compares single VM and double VM memory usage. The test properly measures peak heap usage, disposes of resources, and validates that double VM memory footprint remains under 20MB. The use of trackMemory utility ensures accurate measurement.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @libs/enclave-vm/src/index.ts:
- Line 34: The export PartialDoubleVmConfig is missing explicit documentation in
the Double VM docs; update docs/draft/docs/libraries/enclave.mdx to add a
dedicated section for PartialDoubleVmConfig that names the exported type,
describes its purpose (a partial/optional config for Double VM initialization),
lists and explains each optional property, shows usage examples (when to pass a
PartialDoubleVmConfig vs the full config), and cross-references the public API
export PartialDoubleVmConfig so consumers understand what they import from the
library.
🧹 Nitpick comments (2)
scripts/format-perf-summary.mjs (1)

65-109: LGTM! Robust metrics flattening and formatting logic.

The key metrics section handles the complex flattening, sorting, and formatting well:

  • Type validation before array/object iteration (lines 74, 78)
  • Failures prioritized in sort (line 86) for better CI visibility
  • 30-metric limit (line 93) keeps output manageable
  • thresholdType logic (line 97) correctly handles both min and max thresholds
  • Comprehensive fallback values throughout

If metric names or values might contain markdown special characters (|, *, etc.), consider escaping them to prevent table formatting issues. However, this is low-risk for a CI-focused script.

🔎 Example escaping helper

Add before line 65:

// Escape markdown special characters in table cells
const escapeMarkdown = (str) => String(str).replace(/[|*_`[\]]/g, '\\$&');

Then on line 102, wrap dynamic values:

-    console.log(`| ${name} | ${value} ${unit} | ${threshold} | ${status} |`);
+    console.log(`| ${escapeMarkdown(name)} | ${escapeMarkdown(value)} ${escapeMarkdown(unit)} | ${threshold} | ${status} |`);
libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (1)

343-372: Consider reporting all matching risk keywords, not just the first.

The current implementation breaks after detecting the first critical or high-risk keyword. If a prompt contains multiple high-risk terms (e.g., both "password" and "secret"), only one signal is generated. Reporting all matches would provide richer context for security reviewers and more accurately reflect the risk profile.

🔎 Proposed enhancement to report all keywords
     // Check for critical keywords
+    const criticalMatches: string[] = [];
     for (const keyword of RISK_KEYWORDS.critical) {
       if (promptLower.includes(keyword)) {
-        const score = 25;
-        totalScore += score;
-        signals.push({
-          id: 'ML_CRITICAL_KEYWORD',
-          score,
-          description: `Critical security keyword detected: ${keyword}`,
-          level: 'critical',
-          context: { keyword },
-        });
-        break; // Only add once
+        criticalMatches.push(keyword);
       }
     }
+    if (criticalMatches.length > 0) {
+      const score = 25;
+      totalScore += score;
+      signals.push({
+        id: 'ML_CRITICAL_KEYWORD',
+        score,
+        description: `Critical security keyword(s) detected: ${criticalMatches.join(', ')}`,
+        level: 'critical',
+        context: { keywords: criticalMatches },
+      });
+    }

     // Check for high risk keywords
+    const highRiskMatches: string[] = [];
     for (const keyword of RISK_KEYWORDS.high) {
       if (promptLower.includes(keyword)) {
-        const score = 15;
-        totalScore += score;
-        signals.push({
-          id: 'ML_HIGH_RISK_KEYWORD',
-          score,
-          description: `High risk keyword detected: ${keyword}`,
-          level: 'high',
-          context: { keyword },
-        });
-        break;
+        highRiskMatches.push(keyword);
       }
     }
+    if (highRiskMatches.length > 0) {
+      const score = 15;
+      totalScore += score;
+      signals.push({
+        id: 'ML_HIGH_RISK_KEYWORD',
+        score,
+        description: `High risk keyword(s) detected: ${highRiskMatches.join(', ')}`,
+        level: 'high',
+        context: { keywords: highRiskMatches },
+      });
+    }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd465 and 9131b7d.

📒 Files selected for processing (5)
  • libs/enclave-vm/README.md
  • libs/enclave-vm/src/index.ts
  • libs/enclave-vm/src/scoring/__tests__/local-llm-scorer.spec.ts
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
  • scripts/format-perf-summary.mjs
🚧 Files skipped from review as they are similar to previous changes (2)
  • libs/enclave-vm/README.md
  • libs/enclave-vm/src/scoring/tests/local-llm-scorer.spec.ts
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/index.ts
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
🧬 Code graph analysis (2)
libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (2)
libs/enclave-vm/src/scoring/scorers/rule-based.scorer.ts (1)
  • RuleBasedScorer (61-364)
libs/enclave-vm/src/scoring/types.ts (3)
  • ExtractedFeatures (201-226)
  • ScoringResult (261-286)
  • RiskSignal (231-256)
scripts/format-perf-summary.mjs (1)
libs/enclave-vm/src/sidecar/reference-resolver.ts (1)
  • resolve (167-213)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test
  • GitHub Check: Performance
🔇 Additional comments (5)
scripts/format-perf-summary.mjs (2)

1-63: LGTM! Well-structured script with comprehensive error handling.

The setup, file validation, JSON parsing, and initial output sections are all well-implemented:

  • Appropriate exit codes (0 for missing file to avoid failing CI, 1 for actual errors)
  • Type validation before accessing properties
  • Fallback values throughout (e.g., 'N/A', default objects)
  • Division-by-zero protection on line 58

111-135: LGTM! Clean detailed results section with good UX.

The collapsible detailed results section and footer are well-implemented:

  • Collapsible <details> section improves readability for large result sets
  • Type validation before accessing result properties (line 120)
  • Fallback values for all fields
  • Timestamp fallback on line 134 ensures footer always displays
libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (3)

138-142: Good addition—custom analyzer now initialized regardless of model load outcome.

This addresses the past review comment about ensuring custom analyzer initialization is called consistently. The try-catch wrapper is appropriate since the custom analyzer may not depend on the model, and initialization failures shouldn't prevent fallback scoring from working.


158-210: LGTM—scoring flow refactored correctly with proper fallback handling.

The new scoring flow properly prioritizes the custom analyzer when configured, falls back to rule-based scoring when the pipeline is unavailable, and includes error handling at each branch. The delegation to scoreWithClassification and scoreWithSimilarity is clean, and all error paths lead to appropriate fallback behavior.


442-442: LGTM—custom analyzer disposal handled correctly.

The optional chaining ensures the custom analyzer's dispose() method is called when present, properly cleaning up resources. This is consistent with how fallbackScorer disposal is handled.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @.github/workflows/push.yml:
- Around line 135-141: The "Restore build artifacts" cache step (uses:
actions/cache/restore@v4, key: build-${{ github.sha }}) can silently miss and
cause flaky builds; update that step to include fail-on-cache-miss: true to fail
fast on a miss, or add an explicit fallback (e.g., run npm ci / restore
artifacts) immediately after the cache restore to handle cache misses
deterministically; ensure the change is applied to the step named "Restore build
artifacts" in .github/workflows/push.yml.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/push.yml (1)

46-64: Build artifacts may be cached even when the build fails.

The cache save step at lines 58-64 runs unconditionally after the build steps. If both the initial build (line 48) and the retry build (line 56) fail, the workflow will still cache potentially incomplete or broken artifacts, which will then be restored by the test and performance jobs.

Consider conditioning the cache save on build success:

🔎 Proposed fix
+      - name: Track build success
+        id: build-status
+        if: steps.build.outcome == 'success' || success()
+        run: echo "success=true" >> $GITHUB_OUTPUT
+
       - name: Cache build artifacts
+        if: steps.build-status.outputs.success == 'true'
         uses: actions/cache/save@v4
         with:
           path: |
             node_modules
             libs/*/dist
           key: build-${{ github.sha }}

Alternatively, fail the build job explicitly when both build attempts fail, so dependent jobs don't run at all.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9131b7d and 01df5fb.

📒 Files selected for processing (1)
  • .github/workflows/push.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Performance
  • GitHub Check: Test
🔇 Additional comments (3)
.github/workflows/push.yml (3)

101-114: Good addition for test failure visibility.

The failure logging to $GITHUB_STEP_SUMMARY with context extraction and output truncation is a practical approach. The explicit failure step at lines 112-114 ensures the job properly signals failure status.


161-168: LGTM!

The artifact upload configuration is well-structured with appropriate handling for missing files and reasonable retention.


147-159: The formatting script is properly implemented and handles edge cases.

scripts/format-perf-summary.mjs exists and includes comprehensive error handling: JSON parsing errors are caught with user-friendly messages, missing fields default to safe values, and invalid nested structures are skipped or validated before processing. The script exits gracefully for missing files and with error codes for invalid data, making the workflow's continue-on-error: true appropriate.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
scripts/format-perf-summary.mjs (1)

109-109: Consider using nullish coalescing for better falsy value handling.

The || operator on line 109 could incorrectly display 'N/A' for valid falsy non-numeric values like boolean false. While performance metrics are unlikely to be boolean values in practice, using ?? would be more technically correct.

🔎 Suggested refinement
-    const value = typeof metric.value === 'number' ? metric.value.toFixed(2) : escapeMarkdown(String(metric.value || 'N/A'));
+    const value = typeof metric.value === 'number' ? metric.value.toFixed(2) : escapeMarkdown(String(metric.value ?? 'N/A'));
libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (1)

33-38: Remove or implement unused keyword categories.

The medium and low risk keyword categories are defined but never used in analyzeWithHeuristics. Consider either implementing detection for these categories or removing them to avoid dead code.

🔎 Option 1: Remove unused categories
 const RISK_KEYWORDS = {
   critical: ['password', 'secret', 'apikey', 'token', 'credential', 'private_key'],
   high: ['exfiltration', 'send', 'webhook', 'upload', 'transfer', 'email'],
-  medium: ['limit:999', 'bulk', 'batch', 'all', 'wildcard'],
-  low: ['loop', 'iterate', 'list', 'query'],
 };

Option 2: Implement detection (add to analyzeWithHeuristics after line 384)

// Check for medium risk keywords
const mediumMatches: string[] = [];
for (const keyword of RISK_KEYWORDS.medium) {
  if (promptLower.includes(keyword)) {
    mediumMatches.push(keyword);
  }
}
if (mediumMatches.length > 0) {
  const score = Math.min(20, 10 + (mediumMatches.length - 1) * 3);
  totalScore += score;
  signals.push({
    id: 'ML_MEDIUM_RISK_KEYWORD',
    score,
    description: `Medium risk keyword${mediumMatches.length > 1 ? 's' : ''} detected: ${mediumMatches.join(', ')}`,
    level: 'medium',
    context: { keywords: mediumMatches },
  });
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01df5fb and f6e91bf.

📒 Files selected for processing (3)
  • docs/draft/docs/libraries/enclave.mdx
  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
  • scripts/format-perf-summary.mjs
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/draft/docs/libraries/enclave.mdx
🧰 Additional context used
📓 Path-based instructions (1)
libs/**

⚙️ CodeRabbit configuration file

libs/**: Contains publishable SDK libraries. Review for API correctness, breaking changes, and consistency with docs. When public APIs change, ensure there is a matching docs/draft/docs/** update (not direct edits under docs/docs/**).

Files:

  • libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test
  • GitHub Check: Performance
🔇 Additional comments (11)
scripts/format-perf-summary.mjs (6)

1-22: Well-structured script with good documentation and defensive helper function.

The shebang, imports, and escapeMarkdown helper are implemented correctly. The nullish coalescing operator on line 20 properly handles null/undefined values, and the escaping covers the necessary markdown special characters for table cells.


24-48: Excellent error handling and validation.

The file handling gracefully manages missing files (exit 0) while properly failing on parse errors (exit 1). The validation logic and error messages are clear and appropriate for a CI script.


50-73: Clean output formatting with proper defensive coding.

The environment and test summary sections handle missing data gracefully with fallback values, and the pass rate calculation safely avoids division by zero. The conditional output for failed tests keeps the summary clean.


75-108: Metrics processing logic is well-implemented.

The flattening, sorting, and threshold formatting logic is correct. The comment on lines 105-106 now correctly references thresholdType (the previous issue was properly addressed), and the threshold operator logic on line 107 handles both upper-bound and lower-bound thresholds appropriately.


110-119: Good handling of metric limits and formatting.

The 30-metric limit is clearly communicated with the overflow message, and the remaining formatting logic (unit, name escaping) is implemented correctly.


121-145: Detailed results and footer are well-implemented.

The collapsible details section provides good UX, defensive validation prevents issues with malformed entries, and the timestamp fallback ensures the footer always has a valid date. All values are properly escaped for markdown output.

libs/enclave-vm/src/scoring/scorers/local-llm.scorer.ts (5)

4-8: Documentation improvements look good.

The updated documentation clearly describes the flexible scoring architecture with built-in heuristics and custom analyzer options. The examples are helpful for users.

Also applies to: 43-63


158-181: Scoring flow logic is correct.

The prioritization of custom analyzer with graceful fallback to rule-based scorer is well-structured. The control flow ensures that when a custom analyzer is configured, it's attempted first, and only falls back on errors.

Also applies to: 183-190


342-384: Enhanced keyword detection logic is well-designed.

The multi-match detection with progressive scoring and caps is a good improvement. It provides more detailed signals by reporting all matching keywords while preventing score inflation through the caps.


436-445: Public API additions are well-designed.

The getConfig() and isUsingFallback() methods provide useful introspection capabilities while properly protecting internal state through readonly typing.


454-454: Custom analyzer disposal is correctly implemented.

The disposal of custom analyzer resources uses proper optional chaining and follows the same pattern as the fallback scorer disposal.

@frontegg-david frontegg-david merged commit 6a12330 into main Jan 6, 2026
6 checks passed
@frontegg-david frontegg-david deleted the ehance-for-iteration branch January 6, 2026 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants