Feat/test improvements #1579

amikofalvy · 2026-01-24T06:12:06Z

No description provided.

- Use Turbo's --filter='...[origin/main]' to only run checks on packages that have changed since origin/main - Add fallback to run full check if origin/main doesn't exist (fresh clone) - Add documentation comments explaining the affected-only behavior This significantly speeds up pre-push validation when only a few packages are modified, targeting < 30 seconds with cache. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Document the setup command: npx turbo login && npx turbo link - Document expected behavior: FULL TURBO and cache hits - Document how to verify caching: pnpm turbo run build --summarize - Include CI/CD configuration for TURBO_TOKEN and TURBO_TEAM Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…asks - Add test:fast task with dependsOn=['build'], cache=true for fast unit tests - Add test:slow task with dependsOn=['build'], cache=true for database-backed tests - Add test:integration task with dependsOn=['build'], cache=false for integration tests This enables the test pyramid optimization by separating tests into tiers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add test:fast script: turbo test:fast - Add test:slow script: turbo test:slow - Add test:integration script: turbo test:integration Enables running different test tiers from root with pnpm test:fast/slow/integration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create agents-api/vitest.fast.config.ts based on existing config - Set include to ['**/*.test.ts', '**/*.test.tsx'] - Set exclude to include *.slow.test.ts and *.integration.test.ts - Set testTimeout to 5000ms (5 seconds) - Enable fileParallelism for parallel test execution - Add test:fast script to agents-api/package.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create agents-api/vitest.slow.config.ts for database-backed tests - Set include to ['**/*.slow.test.ts'] - Set testTimeout to 60000ms (60 seconds) - Keep setup files that initialize PGlite from original config - Add test:slow script to agents-api/package.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create packages/agents-core/vitest.fast.config.ts - Set include to ['**/*.test.ts', '**/*.test.tsx'] - Set exclude for *.slow.test.ts, *.integration.test.ts, and integration/ - Set testTimeout to 5000ms (5 seconds) - Add test:fast script to packages/agents-core/package.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create packages/agents-core/vitest.slow.config.ts - Set include to ['**/*.slow.test.ts'] - Set testTimeout to 60000ms (60 seconds) - Add test:slow script to packages/agents-core/package.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Modify check:prepush task to depend on test:fast instead of test - This makes pre-push checks faster by running only fast unit tests - Slow and integration tests still run in CI for full coverage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create SLOW_TESTS_TO_RENAME.md listing 35 test files that use database - Tests identified by searching for createTestClient and PGlite imports - Document reason for classification (PGlite setup, migrations, timeouts) - Include renaming instructions using git mv Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rename 5 test files from *.test.ts to *.slow.test.ts: - agent.test.ts -> agent.slow.test.ts - agentFull.test.ts -> agentFull.slow.test.ts - apiKeys.test.ts -> apiKeys.slow.test.ts - artifactComponents.test.ts -> artifactComponents.slow.test.ts - contextConfigs.test.ts -> contextConfigs.slow.test.ts These tests use database operations and are now excluded from test:fast. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rename 30 additional test files from *.test.ts to *.slow.test.ts: - CRUD route tests (credentialReferences, credentialStores, dataComponents, etc.) - Evaluation CRUD tests (datasets, evaluators, etc.) - Data layer tests (agentFull, artifactComponents, conversations, etc.) - Run domain tests (dataChat, delegationTaskCreation) Delete SLOW_TESTS_TO_RENAME.md as all tests are now renamed. Total: 35 tests moved to slow tier, significantly reducing test:fast time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rename 4 test files that use database operations to *.slow.test.ts: - cascading-delete.test.ts -> cascading-delete.slow.test.ts - projectLifecycle.test.ts -> projectLifecycle.slow.test.ts - projectMetadata.test.ts -> projectMetadata.slow.test.ts - client.test.ts -> client.slow.test.ts These tests use PGlite/createTestClient and are now categorized properly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add scripts/generate-test-db-snapshot.ts that creates pre-compiled PGlite database snapshots with all migrations applied. This enables tests to skip migration execution for faster initialization. The script: - Creates PGlite instances for both manage and runtime databases - Applies all Drizzle migrations from packages/agents-core/drizzle/ - Exports database state using PGlite's dumpDataDir('gzip') method - Saves snapshots to test-fixtures/ directory Added `pnpm test:generate-snapshot` command to root package.json. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Modify test database client functions to automatically load pre-compiled snapshots when available, significantly speeding up test initialization. - Add snapshot loading support to createTestManageDatabaseClient - Add snapshot loading support to createTestRuntimeDatabaseClient - Add snapshot loading support to *NoMigrations variants - Update agents-api setup.ts to skip migrations when snapshots exist - Update agents-core setup.ts with improved logging - Add test-fixtures/ to .gitignore Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add step to generate PGlite database snapshots before running tests. This pre-compiles database migrations so tests don't need to run them individually, speeding up test initialization. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Document analysis of why agents-cli tests run sequentially: - Identified sequential execution settings in vitest.config.ts and vitest.config.ci.ts - Analyzed 38 test files to understand parallelization blockers - Identified key blockers: global mock pollution, file system race conditions, process state modifications, and child process execution - Documented which tests need sequential execution (command tests with heavy mocking) - Documented which tests could safely run in parallel (generator tests, pure utilities) - Provided recommendations for potential parallelization strategy Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Mark US-017 as complete with notes about findings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Created vitest.parallel.config.ts for agents-cli that enables parallel execution for tests identified as safe in US-017 analysis: - Generator tests (13 files) - pure code generation functions - Pure utility tests (package-manager, url, json-comparator, templates, ci-environment) - Component parser test Config enables fileParallelism=true, singleThread=false, maxConcurrency=8 for up to ~6x speedup on generator tests while keeping existing sequential config for tests with global mocks, file system operations, or process spawning. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ared setup - Parallelize entity creation in createMultipleAgents helpers using Promise.all - agent.slow.test.ts: Two-phase parallel creation (agents, then sub-agents) - subAgents.slow.test.ts: Direct parallel sub-agent creation - Add shared tenant/project setup using beforeAll in tools.slow.test.ts - Convert 3 "not found" tests to use shared tenant infrastructure - Add OPTIMIZATION comments documenting the changes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add conditional logic to run affected-only checks on pull_request events - Use Turbo's --filter='...[origin/main]' to detect affected packages - Full check suite still runs on push to main/master for complete validation - Fetch main branch for comparison when running affected-only checks - Add comments explaining the affected-only testing strategy Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Configure Vitest to output JUnit XML test reports with per-test timing data for identifying slow tests and tracking performance trends. - Add JUnit reporter to vitest.config.ci.ts with output to test-results/ - Add CI step to upload test-results directory as GitHub Actions artifact - Document how to access and analyze timing reports in AGENTS.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

changeset-bot · 2026-01-24T06:12:11Z

⚠️ No Changeset found

Latest commit: 7974331

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

vercel · 2026-01-24T06:12:11Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
agents-api	Ready	Preview, Comment	Jan 24, 2026 6:15am
agents-docs	Error		Jan 24, 2026 6:15am
agents-manage-ui	Ready	Preview, Comment	Jan 24, 2026 6:15am

claude · 2026-01-24T06:12:29Z

Claude finished @amikofalvy's task —— View job

TypeScript Staff Engineer & System Architect Review

Reviewing this comprehensive test infrastructure overhaul. This PR introduces significant changes to testing strategy, CI/CD, and build processes.

Todo List:

Analyze test separation strategy (slow vs fast)
Review CI/CD pipeline modifications
Examine database setup and snapshot generation
Evaluate parallel testing implementation
Check for architectural and design considerations
Assess potential impacts and risks
Provide comprehensive feedback and recommendations

Executive Summary

This PR implements a sophisticated testing infrastructure overhaul with three key innovations:

Test categorization (fast vs slow) with separate configurations
Database snapshot generation for faster test initialization
Affected-only testing in CI to reduce build times

While the technical execution is solid, there are several architectural concerns and potential risks that warrant attention before merging.

🔥 Critical Issues

1. Database Snapshot Management & Race Conditions

Location: packages/agents-core/src/db/manage/test-manage-client.ts:34-82

Issue: The snapshot loading strategy has multiple concerning patterns:

if (snapshotPath) {
  const snapshotData = readFileSync(snapshotPath); // Synchronous file I/O
  const blob = new Blob([snapshotData], { type: 'application/gzip' });
  client = new PGlite({ loadDataDir: blob });
} else {
  client = new PGlite();
  // Falls back to migration
}

Problems:

Race condition risk: Multiple test files loading the same snapshot simultaneously could cause file contention
Memory overhead: Each test loads the entire compressed snapshot into memory (potentially 10-50MB per test instance)
Synchronous I/O: Blocking reads in test setup violates Node.js best practices
Snapshot versioning: No mechanism to detect stale snapshots after schema changes

Recommendation: Implement snapshot versioning with checksums and async loading:

// Add version checking
const snapshotVersion = await getSnapshotVersion(snapshotPath);
const currentSchemaVersion = await getCurrentSchemaVersion();
if (snapshotVersion !== currentSchemaVersion) {
  // Regenerate or fall back to migrations
}

2. Parallel Test Resource Management

Location: agents-api/vitest.fast.config.ts:24-27

Issue: Both fast and slow configs use identical threading parameters:

poolOptions: {
  threads: {
    maxThreads: 10,
    minThreads: 4,
  },
}

Problems:

Resource contention: Fast tests (5s timeout) and slow tests (60s timeout) competing for the same thread pool
Memory scaling: 10 threads × database snapshots × test isolation = potential 500MB-1GB memory usage
No resource bounds: No memory limits or CPU throttling for database operations

Recommendation: Differentiate resource allocation by test type:

// Fast tests - CPU bound, low memory
maxThreads: 8, minThreads: 2

// Slow tests - I/O bound, higher memory
maxThreads: 4, minThreads: 1

⚠️ High-Impact Concerns

3. CI Strategy Complexity vs. Value Trade-off

Location: .github/workflows/ci.yml:81-99

Issue: The affected-only testing strategy adds significant complexity:

if [ "${{ github.event_name }}" == "pull_request" ]; then
  git fetch origin main --depth=1 2>/dev/null || git fetch origin master --depth=1 2>/dev/null || true
  pnpm turbo run lint typecheck test --filter='...[origin/main]'

Concerns:

Silent failures: The || true could hide legitimate git fetch failures
Inconsistent behavior: Different execution paths for PRs vs pushes increases CI debugging complexity
False positives: Turbo's change detection might miss transitive dependencies (e.g., schema changes affecting tests in other packages)

Alternative approach: Consider GitHub's built-in changed files API:

- uses: dorny/paths-filter@v2
  with:
    filters: |
      agents-api: 'agents-api/**'
      agents-core: 'packages/agents-core/**'

4. Test File Naming Convention Inconsistency

Location: 39 renamed files from .test.ts to .slow.test.ts

Issue: The mass renaming creates maintenance overhead without clear categorization criteria.

Missing guidelines:

What qualifies as a "slow" test? (Database I/O? Duration? Setup complexity?)
How do developers know which config to use for new tests?
What's the migration strategy for existing tests that become slow over time?

Recommendation: Document explicit criteria in CONTRIBUTING.md:

## Test Categories
- **Fast tests** (`.test.ts`): < 5s, no database, pure logic
- **Slow tests** (`.slow.test.ts`): Database operations, external APIs, file I/O
- **Integration tests** (`.integration.test.ts`): End-to-end workflows

🧐 Architectural Design Concerns

5. Database Setup Strategy Anti-pattern

Location: agents-api/src/__tests__/setup.ts:127-167

Issue: Complex migration logic in global setup with path resolution heuristics:

const isInPackageDir = process.cwd().includes('agents-api');
const manageMigrationsPath = isInPackageDir
  ? '../packages/agents-core/drizzle/manage'
  : './packages/agents-core/drizzle/manage';

Problems:

Fragile path resolution: Breaks if directory structure changes or tests run from different contexts
Hidden dependencies: Migration paths are implicit, making the system harder to debug
Monorepo coupling: Tight coupling between test setup and repository structure

Better approach: Use explicit configuration or environment variables:

const MANAGE_MIGRATIONS_DIR = process.env.MANAGE_MIGRATIONS_DIR 
  || resolve(__dirname, '../../packages/agents-core/drizzle/manage');

6. Lack of Test Performance Monitoring

Location: CI configuration uploads test timing but no analysis

Missing capabilities:

No alerting for test performance regressions
No trending analysis of slow tests over time
No automatic promotion of fast tests that become slow

Recommendation: Implement performance budgets:

# In CI
- name: Check test performance budget
  run: |
    if [ "$(cat test-results/timing.json | jq '.maxDuration')" -gt "300000" ]; then
      echo "Test duration exceeded 5 minute budget"
      exit 1
    fi

🔍 Implementation Quality Issues

7. Error Handling Gaps

Location: Multiple files lack comprehensive error handling

Examples:

scripts/generate-test-db-snapshot.ts:42-47: Migration failures only log but don't provide recovery guidance
packages/agents-core/src/db/manage/test-manage-client.ts:110-111: Database cleanup errors silently ignored

8. Missing Integration Tests for New Infrastructure

Critical gap: No tests validate that:

Snapshot generation produces valid databases
Fast/slow test separation works correctly
Affected-only testing catches real issues

💡 Positive Aspects

The PR demonstrates solid engineering practices:

Comprehensive documentation: The CLI parallelization analysis in TEST_PARALLELIZATION_NOTES.md shows thoughtful analysis
Incremental adoption: Allowing fallback to migrations when snapshots aren't available
CI optimization: Affected-only testing will significantly reduce CI times for focused changes
Resource management: Thoughtful thread pool configuration for different test types

🎯 Recommendations

Immediate Actions (Before Merge):

Add snapshot versioning to prevent stale snapshot issues
Implement resource budgets for memory and CPU usage
Add integration tests for the new testing infrastructure
Document test categorization criteria for future maintainers

Follow-up Work (Post-merge):

Monitor test performance and adjust resource allocation
Implement test performance budgets in CI
Consider test sharding if the test suite continues growing

Long-term Considerations:

Database test strategy evolution: Consider containerized databases vs in-memory for better production parity
Test infrastructure as code: Move configuration to a dedicated directory for better maintenance

Summary

This is a well-engineered solution to real performance problems, but the complexity introduced requires careful management. The database snapshot strategy is innovative but needs safeguards against race conditions and stale data. The affected-only CI testing is valuable but adds debugging complexity.

Recommendation: Approve with required changes - address the snapshot versioning and resource management issues before merge, then monitor closely in the first few weeks to ensure the new infrastructure performs as expected.

amikofalvy and others added 25 commits January 24, 2026 02:15

[US-017] chore: update PRD and progress log

9525caa

Mark US-017 as complete with notes about findings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

[US-018] chore: update PRD and progress log

4bcb3cc

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

[US-020] chore: update PRD and progress log

8d57366

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

[US-021] chore: update PRD and progress log

7974331

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vercel bot deployed to Preview – agents-api January 24, 2026 06:13 View deployment

vercel bot deployed to Preview – agents-manage-ui January 24, 2026 06:14 View deployment

vercel bot had a problem deploying to Preview – agents-docs January 24, 2026 06:15 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/test improvements #1579

Feat/test improvements #1579

Uh oh!

amikofalvy commented Jan 24, 2026

Uh oh!

changeset-bot bot commented Jan 24, 2026

Uh oh!

vercel bot commented Jan 24, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Feat/test improvements #1579

Are you sure you want to change the base?

Feat/test improvements #1579

Uh oh!

Conversation

amikofalvy commented Jan 24, 2026

Uh oh!

changeset-bot bot commented Jan 24, 2026

⚠️ No Changeset found

Uh oh!

vercel bot commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TypeScript Staff Engineer & System Architect Review

Todo List:

Executive Summary

🔥 Critical Issues

1. Database Snapshot Management & Race Conditions

2. Parallel Test Resource Management

⚠️ High-Impact Concerns

3. CI Strategy Complexity vs. Value Trade-off

4. Test File Naming Convention Inconsistency

🧐 Architectural Design Concerns

5. Database Setup Strategy Anti-pattern

6. Lack of Test Performance Monitoring

🔍 Implementation Quality Issues

7. Error Handling Gaps

8. Missing Integration Tests for New Infrastructure

💡 Positive Aspects

🎯 Recommendations

Immediate Actions (Before Merge):

Follow-up Work (Post-merge):

Long-term Considerations:

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Jan 24, 2026 •

edited

Loading

claude bot commented Jan 24, 2026 •

edited

Loading