fix(function): isolated-vm worker pool to prevent single-worker bottleneck + execution user id resolution #3155

Merged
waleedlatif1 merged 10 commits into staging from fix/pool
Feb 7, 2026
@waleedlatif1
Collaborator

@waleedlatif1 waleedlatif1 commented Feb 6, 2026

Summary

  • Replaced single isolated-vm worker process with a configurable pool (default 4 workers)
  • Executions are distributed across workers using least-loaded selection
  • Workers spawn lazily and clean up after idle timeout
  • Added env vars for pool tuning (IVM_POOL_SIZE, IVM_MAX_CONCURRENT, IVM_MAX_PER_WORKER, IVM_WORKER_IDLE_TIMEOUT_MS, IVM_QUEUE_TIMEOUT_MS)
  • Defaults are permissive (10k concurrent, 2500/worker, 5min queue timeout) — no behavior change for existing users
  • Fixed user ID resolution to be consistent with billing, for both function owner keys and permission groups
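
The pool behavior listed above (least-loaded selection, lazy spawn, idle cleanup) can be sketched roughly as follows. This is a minimal illustration, not the code in `isolated-vm.ts`: `WorkerSlot`, `PoolOptions`, and `WorkerPoolSketch` are hypothetical names, and real worker processes are replaced by plain counters so the selection logic stays visible.

```typescript
// Hypothetical shapes for illustration; not the actual types in isolated-vm.ts.
interface WorkerSlot {
  id: number
  activeExecutions: number
  lastUsedAt: number
}

interface PoolOptions {
  poolSize: number // cf. IVM_POOL_SIZE
  maxPerWorker: number // cf. IVM_MAX_PER_WORKER
  idleTimeoutMs: number // cf. IVM_WORKER_IDLE_TIMEOUT_MS
}

class WorkerPoolSketch {
  private workers: WorkerSlot[] = []
  private nextId = 0

  constructor(private readonly options: PoolOptions) {}

  get size(): number {
    return this.workers.length
  }

  /** Picks a slot for the next execution, or returns null when the pool is saturated. */
  acquire(now: number = Date.now()): WorkerSlot | null {
    // Least-loaded selection among existing workers that still have headroom.
    let best: WorkerSlot | null = null
    for (const worker of this.workers) {
      if (worker.activeExecutions >= this.options.maxPerWorker) continue
      if (best === null || worker.activeExecutions < best.activeExecutions) best = worker
    }

    // Lazy spawn: add a worker only when no existing worker is idle and there is room.
    const allBusy = best === null || best.activeExecutions > 0
    if (allBusy && this.workers.length < this.options.poolSize) {
      best = { id: this.nextId++, activeExecutions: 0, lastUsedAt: now }
      this.workers.push(best)
    }

    if (best === null) return null
    best.activeExecutions += 1
    best.lastUsedAt = now
    return best
  }

  release(slot: WorkerSlot, now: number = Date.now()): void {
    slot.activeExecutions = Math.max(0, slot.activeExecutions - 1)
    slot.lastUsedAt = now
  }

  /** Idle cleanup: drops workers with no active executions past the idle timeout. */
  reapIdle(now: number = Date.now()): number {
    const before = this.workers.length
    this.workers = this.workers.filter(
      (w) => w.activeExecutions > 0 || now - w.lastUsedAt < this.options.idleTimeoutMs
    )
    return before - this.workers.length
  }
}
```

In this sketch a caller that receives `null` from `acquire()` would queue the execution instead of dispatching it; how the actual implementation handles saturation is described only at a high level in this PR.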

Type of Change

  • Bug fix

Testing

Tested manually. Added test.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Feb 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Feb 7, 2026 0:36am


@waleedlatif1
Collaborator Author

@cursor review

@greptile-apps
Contributor

greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

  • Replaces single isolated-vm worker execution with a configurable worker pool, distributing work to the least-loaded worker and draining queued executions with weighted fairness.
  • Adds queue sizing, per-owner queued/active limits, and optional Redis-based distributed in-flight leasing to enforce per-owner concurrency across instances.
  • Updates function/execute and workflow execute routes to pass owner/actor context so scheduling and billing/permission checks attribute work to the authenticated user.
  • Extends env config with IVM_* tuning and safety limits (fetch URL/options/response/stdout caps), and adds/updates Vitest coverage for pool/queue/limits behavior.
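
As a rough illustration of the "weighted fairness" drain mentioned in the summary, the sketch below performs a weighted round-robin over per-owner FIFO queues. It is only one plausible reading of the description: `QueuedJob` and `drainWeighted` are hypothetical names, and taking the burst size from the head job's `ownerWeight` is an assumption rather than something this PR confirms.

```typescript
// Hypothetical job shape; ownerKey/ownerWeight mirror the parameter names in the PR.
interface QueuedJob {
  id: string
  ownerKey: string
  ownerWeight: number
}

/**
 * Returns up to `capacity` jobs in dispatch order. Each round visits every owner
 * once and takes up to `ownerWeight` jobs from that owner's FIFO queue.
 */
function drainWeighted(jobs: QueuedJob[], capacity: number): QueuedJob[] {
  // Group by owner, preserving arrival order within each owner.
  const byOwner = new Map<string, QueuedJob[]>()
  for (const job of jobs) {
    const queue = byOwner.get(job.ownerKey)
    if (queue) queue.push(job)
    else byOwner.set(job.ownerKey, [job])
  }

  const order: QueuedJob[] = []
  while (order.length < capacity && byOwner.size > 0) {
    for (const [ownerKey, queue] of Array.from(byOwner.entries())) {
      const head = queue[0]
      if (head === undefined) {
        byOwner.delete(ownerKey)
        continue
      }
      // Assumption: weights below 1 still get one job per round.
      const burst = Math.max(1, Math.floor(head.ownerWeight))
      const take = Math.min(burst, queue.length, capacity - order.length)
      order.push(...queue.splice(0, take))
      if (queue.length === 0) byOwner.delete(ownerKey)
      if (order.length >= capacity) break
    }
  }
  return order
}
```

With this scheme an owner with weight 2 gets two slots per round while an owner with weight 1 gets one, so a heavy submitter cannot starve others even when its queue is much longer.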

Confidence Score: 4/5

  • This PR is reasonably safe to merge, but only after addressing the two previously-raised isolated-vm.ts issues.
  • Core pool/queue/lease changes are cohesive and tests cover key behaviors (spawn failure recovery, queue limits, per-owner limits, Redis lease fail-closed, and weighted drain ordering). Confidence is reduced due to two existing review threads on isolated-vm.ts that still require author action before merge.
  • apps/sim/lib/execution/isolated-vm.ts

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/execution/isolated-vm.ts | Major rewrite implementing worker pool architecture with fair scheduling. Adds multi-worker management, weighted round-robin dispatch, distributed Redis leasing for cross-replica rate limiting, and per-owner queue limits. Complex but well-structured with proper error handling and resource cleanup. Two issues previously flagged at lines 269 (bootstrap injection) and 594 (cleanupWorker). Otherwise solid implementation. |
| apps/sim/lib/core/config/env.ts | Adds 17 new IVM_* environment variables for worker pool configuration with sensible defaults. All string types with optional defaults following existing patterns. No issues. |
| apps/sim/app/api/workflows/[id]/execute/route.ts | Adds useAuthenticatedUserAsActor logic for client sessions and personal API keys. Correctly determines when to bill authenticated user vs workspace account. No issues. |
| apps/sim/app/api/function/execute/route.ts | Adds ownerKey and ownerWeight parameters to executeInIsolatedVM call for fair scheduling. Uses user:userId format. Simple and correct integration. |
| apps/sim/lib/execution/isolated-vm.test.ts | New comprehensive test suite covering worker pool recovery from spawn failures, queue capacity limits, per-owner limits, distributed lease limits, Redis unavailability, weighted scheduling, and fetch payload limits. Good coverage of edge cases. |
| apps/sim/lib/execution/sandbox-fetch-proxy.ts | Adds/enforces fetch proxy input/output caps (URL length, options JSON size, response size) used by isolated-vm worker execution. |
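
Since the IVM_* variables are described as string-typed with defaults, reading them might look something like the sketch below. `readPositiveInt` and `readPoolConfig` are hypothetical helpers, not functions from `env.ts`. The defaults for pool size, max concurrent, max per worker, and queue timeout come from the PR description; the idle-timeout default is not stated anywhere in this thread, so the value used here is a placeholder assumption.

```typescript
type Env = Record<string, string | undefined>

/** Parses a positive integer from a string env var, falling back on missing or invalid input. */
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === '') return fallback
  const parsed = Number(raw)
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback
}

interface PoolConfig {
  poolSize: number
  maxConcurrent: number
  maxPerWorker: number
  workerIdleTimeoutMs: number
  queueTimeoutMs: number
}

function readPoolConfig(env: Env): PoolConfig {
  return {
    poolSize: readPositiveInt(env, 'IVM_POOL_SIZE', 4),
    maxConcurrent: readPositiveInt(env, 'IVM_MAX_CONCURRENT', 10_000),
    maxPerWorker: readPositiveInt(env, 'IVM_MAX_PER_WORKER', 2_500),
    // ASSUMPTION: the real default is not given in this PR; 60s is a placeholder.
    workerIdleTimeoutMs: readPositiveInt(env, 'IVM_WORKER_IDLE_TIMEOUT_MS', 60_000),
    queueTimeoutMs: readPositiveInt(env, 'IVM_QUEUE_TIMEOUT_MS', 5 * 60_000),
  }
}
```

Falling back on invalid values, rather than throwing, is a design choice of this sketch; the project may instead validate these variables in its env schema.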

Sequence Diagram

sequenceDiagram
  participant Client
  participant API as /api/function/execute
  participant IVM as isolated-vm pool
  participant Redis as Redis (optional)
  participant W as Worker Process

  Client->>API: POST execute(code, auth)
  API->>IVM: run({ownerKey, ownerWeight, code})
  opt Redis configured
    IVM->>Redis: eval acquireLease(ownerKey, ttl)
    Redis-->>IVM: granted/denied
    alt denied
      IVM-->>API: 429/limit error
      API-->>Client: error
    end
  end
  IVM->>IVM: enqueue or pick least-loaded worker
  IVM->>W: send execution request
  W-->>IVM: result/error
  opt Redis configured
    IVM->>Redis: eval releaseLease(ownerKey)
  end
  IVM-->>API: output
  API-->>Client: output
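
The acquire/run/release bracket in the diagram, combined with the fail-closed behavior the review mentions for Redis unavailability, could be sketched as below. `LeaseStore`, `LeaseDeniedError`, `InMemoryLeaseStore`, and `runWithLease` are all hypothetical names; the in-memory store merely stands in for a Redis-backed one and ignores the TTL.

```typescript
// Hypothetical interface; a Redis-backed implementation would sit behind it.
interface LeaseStore {
  acquire(ownerKey: string, ttlMs: number): Promise<boolean>
  release(ownerKey: string): Promise<void>
}

class LeaseDeniedError extends Error {
  constructor(ownerKey: string) {
    super(`in-flight limit reached for ${ownerKey}`)
    this.name = 'LeaseDeniedError'
  }
}

/** In-memory stand-in for demonstration only; the TTL is ignored here. */
class InMemoryLeaseStore implements LeaseStore {
  private readonly counts = new Map<string, number>()
  constructor(private readonly maxPerOwner: number) {}

  async acquire(ownerKey: string, _ttlMs: number): Promise<boolean> {
    const current = this.counts.get(ownerKey) ?? 0
    if (current >= this.maxPerOwner) return false
    this.counts.set(ownerKey, current + 1)
    return true
  }

  async release(ownerKey: string): Promise<void> {
    const current = this.counts.get(ownerKey) ?? 0
    this.counts.set(ownerKey, Math.max(0, current - 1))
  }

  inflight(ownerKey: string): number {
    return this.counts.get(ownerKey) ?? 0
  }
}

async function runWithLease<T>(
  store: LeaseStore,
  ownerKey: string,
  ttlMs: number,
  run: () => Promise<T>
): Promise<T> {
  let granted = false
  try {
    granted = await store.acquire(ownerKey, ttlMs)
  } catch {
    // Fail closed: if the store is unreachable, treat the lease as denied.
    granted = false
  }
  if (!granted) throw new LeaseDeniedError(ownerKey)

  try {
    return await run()
  } finally {
    // Always release, even when the execution throws; release errors are swallowed.
    await store.release(ownerKey).catch(() => {})
  }
}
```

A route handler could translate `LeaseDeniedError` into the 429-style response shown in the diagram.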

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

@icecrasher321 icecrasher321 changed the title fix(executor): isolated-vm worker pool to prevent single-worker bottleneck fix(function): isolated-vm worker pool to prevent single-worker bottleneck Feb 6, 2026
@icecrasher321
Collaborator

@cursor review

@icecrasher321 icecrasher321 changed the title fix(function): isolated-vm worker pool to prevent single-worker bottleneck fix(function): isolated-vm worker pool to prevent single-worker bottleneck + execution user id resolution Feb 6, 2026
@icecrasher321
Collaborator

@cursor review

1 similar comment
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Contributor

@greptile-apps greptile-apps bot left a comment

11 files reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Feb 6, 2026

Additional Comments (1)

apps/sim/lib/execution/isolated-vm.ts
Distributed lease leaked on enqueue

executeInIsolatedVM() acquires a distributed in-flight lease before deciding to enqueue, but on the enqueue path the lease is never released because enqueueExecution() doesn’t complete the promise and .finally() won’t run until the queued job times out or is executed. That means queued requests hold Redis leases for their entire queue wait, quickly exhausting IVM_DISTRIBUTED_MAX_INFLIGHT_PER_OWNER and causing subsequent requests from the same owner to be rejected even though they’re only queued. Fix by acquiring the lease only when dispatching to a worker, or by releasing the lease immediately when enqueueing (and reacquiring on dispatch).
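
The first suggested fix, acquiring the lease only at dispatch time, might look roughly like this. It is a simplified sketch rather than the change that was merged: `InflightLeases` is a synchronous in-memory stand-in for the Redis-backed lease, and `DispatchTimeLeasing` and `PendingJob` are hypothetical names.

```typescript
/** Synchronous in-memory stand-in for the distributed per-owner in-flight lease. */
class InflightLeases {
  private readonly counts = new Map<string, number>()
  constructor(private readonly maxPerOwner: number) {}

  tryAcquire(ownerKey: string): boolean {
    const current = this.counts.get(ownerKey) ?? 0
    if (current >= this.maxPerOwner) return false
    this.counts.set(ownerKey, current + 1)
    return true
  }

  release(ownerKey: string): void {
    const current = this.counts.get(ownerKey) ?? 0
    if (current <= 1) this.counts.delete(ownerKey)
    else this.counts.set(ownerKey, current - 1)
  }

  held(ownerKey: string): number {
    return this.counts.get(ownerKey) ?? 0
  }
}

interface PendingJob {
  id: string
  ownerKey: string
}

class DispatchTimeLeasing {
  private readonly queue: PendingJob[] = []
  private running = 0

  constructor(
    private readonly leases: InflightLeases,
    private readonly workerCapacity: number
  ) {}

  /** Enqueueing takes no lease, so waiting jobs do not count as in-flight. */
  enqueue(job: PendingJob): void {
    this.queue.push(job)
  }

  /** Starts queued jobs while capacity and leases allow; returns the ids started. */
  dispatch(): string[] {
    const started: string[] = []
    let i = 0
    while (i < this.queue.length && this.running < this.workerCapacity) {
      const job = this.queue[i]
      if (job !== undefined && this.leases.tryAcquire(job.ownerKey)) {
        this.queue.splice(i, 1)
        this.running += 1
        started.push(job.id)
      } else {
        i += 1 // Owner is at its limit: leave the job queued and try the next one.
      }
    }
    return started
  }

  /** Releases both the worker slot and the lease when a job finishes. */
  complete(job: PendingJob): void {
    this.running = Math.max(0, this.running - 1)
    this.leases.release(job.ownerKey)
  }
}
```

Because the lease is tied to dispatch rather than enqueue, an owner with many queued jobs holds only as many leases as it has jobs actually running.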

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

Contributor

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

@waleedlatif1 waleedlatif1 merged commit 0ca25bb into staging Feb 7, 2026
12 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/pool branch February 7, 2026 02:34

2 participants