
fix(serverless-hono): defer waitUntil cleanup to prevent tool crashes…#1191

Merged
omeraplak merged 1 commit into VoltAgent:main from ravyg:fix/serverless-waituntil-cleanup
Apr 8, 2026

Conversation

@ravyg
Contributor

@ravyg ravyg commented Apr 6, 2026

Summary

The finally block in toCloudflareWorker(), toVercelEdge(), and toDeno() calls cleanup() as soon as the Response object is returned — before streaming
and tool execution complete. This clears the global ___voltagent_wait_until while tools are still using it, causing crashes.

Fix: schedule cleanup through the platform's own waitUntil() so it runs only after all pending promises (streaming, tools, observability exports) have settled.
Falls back to synchronous cleanup when waitUntil is unavailable (non-serverless environments).

Fixes #1186

PR Checklist

What is the current behavior?

When an agent runs inside Cloudflare Workers (or Vercel Edge / Deno) and calls a tool that takes more than a few seconds, the agent crashes after the
tool-input-available event. This happens because:

  1. withWaitUntil(executionCtx) stores the platform's waitUntil function in a global (___voltagent_wait_until)
  2. The response is returned from app.fetch()
  3. The finally block fires immediately and calls cleanup(), which clears the global
  4. Streaming and tool execution are still in progress — they try to use the now-cleared global and crash
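
The four steps above can be reproduced in isolation. The following is a minimal sketch, not the adapter code: `withWaitUntilSketch`, `slowTool`, and `handleRequest` are hypothetical stand-ins, and only the global name `___voltagent_wait_until` comes from this PR.

```typescript
type WaitUntil = (promise: Promise<unknown>) => void;

const globals = globalThis as unknown as { ___voltagent_wait_until?: WaitUntil };

// Step 1: store the platform's waitUntil in a global and hand back a cleanup function.
function withWaitUntilSketch(waitUntil: WaitUntil): () => void {
  globals.___voltagent_wait_until = waitUntil;
  return () => {
    globals.___voltagent_wait_until = undefined;
  };
}

// Stand-in for a slow tool that reads the global only after its work is done.
async function slowTool(log: string[]): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 20));
  log.push(globals.___voltagent_wait_until ? "tool: global present" : "tool: global missing");
}

// Steps 2-3: the handler returns its response and `finally` clears the global at once.
function handleRequest(log: string[]): { response: string; toolDone: Promise<void> } {
  const cleanup = withWaitUntilSketch(() => {});
  try {
    const toolDone = slowTool(log); // step 4: still running after the return
    return { response: "streaming...", toolDone };
  } finally {
    cleanup();
    log.push("finally: cleanup ran");
  }
}

async function runRepro(): Promise<string[]> {
  const log: string[] = [];
  await handleRequest(log).toolDone;
  return log; // ["finally: cleanup ran", "tool: global missing"]
}
```

The tool observes a missing global because `finally` runs as soon as the handler returns, while the tool's timer is still pending.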

What is the new behavior?

Cleanup is deferred through the platform's own waitUntil() mechanism. The new deferCleanup() helper:

  • If the platform context has waitUntil — schedules cleanup as a microtask through it (runs after streaming/tools/exports finish)
  • If waitUntil throws (response already committed) or isn't available — falls back to synchronous cleanup (same behavior as before)
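
A rough sketch of the two branches described above, assuming a context shape with an optional `waitUntil` method. The real `WaitUntilContext` type and the merged `deferCleanup()` body are not shown in this conversation, so the details here (including the run-once guard) are assumptions.

```typescript
// Assumed minimal shape of a platform context; the real type may differ.
interface WaitUntilContext {
  waitUntil?: (promise: Promise<unknown>) => void;
}

function deferCleanup(
  context: WaitUntilContext | null | undefined,
  cleanup: () => void,
): void {
  // Guard so cleanup never runs twice if waitUntil throws after the promise was created.
  let done = false;
  const runOnce = (): void => {
    if (!done) {
      done = true;
      cleanup();
    }
  };

  const waitUntil = context?.waitUntil;
  if (typeof waitUntil !== "function") {
    runOnce(); // no platform hook: same synchronous behavior as before
    return;
  }

  try {
    // Call as a method so the platform keeps its `this` binding.
    waitUntil.call(context, Promise.resolve().then(runOnce));
  } catch {
    runOnce(); // waitUntil threw, e.g. because the response was already committed
  }
}
```

Note that a single microtask hop only postpones cleanup past the current synchronous work; the review comments further down this thread raise exactly that point.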

One file changed, one helper function added, applied to all three platform methods (toCloudflareWorker, toVercelEdge, toDeno). No API changes, fully
backward compatible.

Notes for reviewers

  • 7 new tests added for deferCleanup() covering: deferred cleanup, null/undefined/no-waitUntil fallbacks, waitUntil-throws fallback, and non-function waitUntil
  • The 2 pre-existing test failures in wait-until-wrapper.spec.ts reproduce on upstream main — not related to this change
  • The @voltagent/resumable-streams test failure is also pre-existing on main (vitest config issue)
  • Build passes across all 28 packages
  • No new dependencies added
  • Docs not needed — internal behavior fix with no API surface change

Summary by cubic

Defers cleanup of the serverless waitUntil context in @voltagent/serverless-hono to keep ___voltagent_wait_until alive until all background tasks settle, preventing crashes on Cloudflare Workers, Vercel Edge, and Deno. Adds deferCleanup() and applies it to toCloudflareWorker, toVercelEdge, and toDeno. Fixes #1186.

  • Bug Fixes
    • Wraps ___voltagent_wait_until with a tracking proxy and schedules cleanup via platform waitUntil(); handles late-registered promises.
    • Falls back to synchronous cleanup when waitUntil is missing or throws; no API changes.
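
One way a tracking wrapper of this kind could look, as a sketch only: `createTrackedWaitUntil` and its return shape are hypothetical names, not the code in commit 1c9a310. It records every promise passed through the wrapper and re-checks the set after each pass, so promises registered while earlier ones are still pending are also awaited.

```typescript
type WaitUntil = (promise: Promise<unknown>) => void;

function createTrackedWaitUntil(platformWaitUntil: WaitUntil, cleanup: () => void) {
  const pending = new Set<Promise<void>>();

  // Wrapper to install in place of the raw platform function.
  const waitUntil: WaitUntil = (promise) => {
    // Track a copy that never rejects so one failed task cannot skip cleanup.
    pending.add(promise.then(() => undefined, () => undefined));
    platformWaitUntil(promise);
  };

  // Loop until the set is empty; late registrations land in a later pass.
  async function drain(): Promise<void> {
    while (pending.size > 0) {
      const batch = [...pending];
      await Promise.all(batch);
      for (const settled of batch) pending.delete(settled);
    }
  }

  // Call from the handler's `finally`: one platform registration covers the whole drain.
  function scheduleCleanup(): Promise<void> {
    const finished = drain().then(cleanup);
    platformWaitUntil(finished);
    return finished;
  }

  return { waitUntil, scheduleCleanup };
}
```

A limitation of this sketch: if nothing has been registered by the time `scheduleCleanup()` is called, the set is already empty and cleanup runs almost immediately.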

Written for commit 1c9a310. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented premature cleanup in serverless and edge runtimes so streaming responses, background tasks, and long-running tools complete before cleanup runs.
  • Documentation

    • Added a release note documenting the updated cleanup behavior across supported serverless platforms.
  • Tests

    • Added tests validating deferred cleanup behavior and fallback handling when platform waitUntil is unavailable.

@changeset-bot

changeset-bot bot commented Apr 6, 2026

🦋 Changeset detected

Latest commit: 1c9a310

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @voltagent/serverless-hono | Patch |


@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a39fb6d5-544a-40e0-a900-bc0465337bb7

📥 Commits

Reviewing files that changed from the base of the PR and between 33946ed and 1c9a310.

📒 Files selected for processing (3)
  • .changeset/fix-serverless-waituntil-cleanup.md
  • packages/serverless-hono/src/serverless-provider.ts
  • packages/serverless-hono/src/utils/defer-cleanup.spec.ts
✅ Files skipped from review due to trivial changes (2)
  • .changeset/fix-serverless-waituntil-cleanup.md
  • packages/serverless-hono/src/utils/defer-cleanup.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/serverless-hono/src/serverless-provider.ts

📝 Walkthrough


Defers clearing of the global ___voltagent_wait_until by scheduling cleanup through the platform waitUntil() when available, replacing immediate cleanup in finally blocks so streaming/background tasks can complete before cleanup runs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changelog & Metadata: .changeset/fix-serverless-waituntil-cleanup.md | Adds a Changeset documenting the patch that updates cleanup behavior for toCloudflareWorker(), toVercelEdge(), and toDeno() to use waitUntil() rather than immediate cleanup. |
| Core Implementation: packages/serverless-hono/src/serverless-provider.ts | Introduces exported deferCleanup(context, cleanup) which uses context.waitUntil to defer cleanup (with synchronous fallback). Replaces direct cleanup() calls in finally blocks of platform adapters and narrows context typing to `WaitUntilContext |
| Test Coverage: packages/serverless-hono/src/utils/defer-cleanup.spec.ts | New Vitest suite covering deferred cleanup behavior, fallback paths (null/undefined/non-callable), error handling when waitUntil throws, and ensuring cleanup runs once after all registered promises settle. |

Sequence Diagram(s)

sequenceDiagram
    participant Handler as Request Handler
    participant Global as globalThis.___voltagent_wait_until
    participant Platform as Platform.waitUntil
    participant Tools as Background / Streaming Tools
    Handler->>Global: install wrapper that forwards to Platform.waitUntil
    Handler->>Tools: return Response (streaming/background tasks continue)
    Handler->>Handler: finally -> deferCleanup(context, cleanup)
    note right of Handler: deferCleanup checks context.waitUntil
    alt context.waitUntil is callable
        Handler->>Platform: schedule cleanup via waitUntil(promises)
        Tools->>Global: register promises via wrapper
        Platform->>Platform: wait for all promises (including late registrations)
        Platform->>Handler: invoke cleanup after settle
    else no waitUntil
        Handler->>Handler: call cleanup() synchronously
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble bugs and patch the trail,
A waitUntil hug to mend the tale.
Streams can hum and tools can play,
Cleanup waits — then hops away. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main fix: deferring waitUntil cleanup to prevent tool crashes in serverless environments. It directly relates to the core change in the PR. |
| Description check | ✅ Passed | The PR description comprehensively covers current behavior, new behavior, includes the related issue link, and confirms all checklist items are satisfied (commit convention, tests, changesets). |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1186 by implementing deferred cleanup via the platform's waitUntil mechanism, preventing crashes when tools run for several seconds by keeping the global ___voltagent_wait_until alive until all operations complete. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: the deferCleanup helper, updates to three platform adapters, a changeset entry, and comprehensive tests for the new functionality. No extraneous modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/serverless-hono/src/utils/defer-cleanup.spec.ts (1)

76-79: Avoid as any in the test case.

Line 78 weakens type safety in violation of the coding guideline. Replace with as unknown as WaitUntilContext to safely test the scenario where waitUntil is not a function while preserving type information.

♻️ Suggested change
-    const context = { waitUntil: "not a function" } as any;
+    const context = { waitUntil: "not a function" } as unknown as WaitUntilContext;
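
For illustration, here is how a runtime type guard would treat the non-function `waitUntil` case that this test exercises. This is a sketch under assumptions: `isWaitUntilContext` is a hypothetical helper, and the `WaitUntilContext` shape is guessed from how it is used in this review.

```typescript
// Assumed shape; the package's real WaitUntilContext may differ.
interface WaitUntilContext {
  waitUntil: (promise: Promise<unknown>) => void;
}

// Narrows an unknown platform context without resorting to `as any`.
function isWaitUntilContext(value: unknown): value is WaitUntilContext {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { waitUntil?: unknown }).waitUntil === "function"
  );
}

// The scenario from the test: waitUntil exists but is a string.
const badContext = { waitUntil: "not a function" } as unknown as WaitUntilContext;
// A well-formed context for comparison.
const goodContext = { waitUntil: (_promise: Promise<unknown>): void => {} };
```

The double cast keeps the declared type of `badContext` as `WaitUntilContext`, so later misuse is still type-checked, while the guard rejects it at runtime.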
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/serverless-hono/src/utils/defer-cleanup.spec.ts` around lines 76 -
79, Replace the unsafe cast in the test "should handle context with non-function
waitUntil" by changing the context declaration to use a double-cast through
unknown into the proper WaitUntilContext type (i.e., replace "as any" with "as
unknown as WaitUntilContext") so the test still supplies waitUntil: "not a
function" while preserving TypeScript type-safety; update the variable named
context in defer-cleanup.spec.ts accordingly and keep the rest of the test
(cleanup = vi.fn()) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/serverless-hono/src/serverless-provider.ts`:
- Line 76: The code uses unsafe "as any" casts when calling deferCleanup and
withWaitUntil; instead narrow the unknown context values to WaitUntilContext |
null | undefined before passing them. Locate the Cloudflare branch where
executionCtx is used with deferCleanup/withWaitUntil, the Vercel branch where
context is passed, and the Deno branch where info is passed, and replace the
casts by checking/typing those values (e.g., const waitCtx = executionCtx as
WaitUntilContext | null | undefined) or by a type guard, then call
withWaitUntil(waitCtx, ...) and deferCleanup(waitCtx, cleanup) so type safety is
preserved for the functions deferCleanup and withWaitUntil.
- Around line 19-23: The current use of Promise.resolve().then(cleanup) after
context.waitUntil(...) does not guarantee cleanup runs after all
subsequently-registered waitUntil promises; replace the microtask trick with an
explicit synchronization barrier: introduce a deferred "lifetime" Promise and
reference-counting or open/close semantics (e.g., increment a counter when
background work starts and decrement when it finishes) that you resolve when no
more background tasks remain, then call context.waitUntil(lifetimePromise) and
only run cleanup after that lifetimePromise settles; update any code paths that
previously pushed promises via context.waitUntil to instead register against
this shared lifetime barrier so cleanup (the cleanup function and the
___voltagent_wait_until global management) always runs last.
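
The reference-counted "lifetime" barrier suggested in the second inline comment could be sketched roughly as follows. `LifetimeBarrier`, `track`, and `seal` are hypothetical names chosen for illustration; nothing here is taken from the package.

```typescript
// Resolves `done` once the barrier is sealed and every tracked task has settled.
class LifetimeBarrier {
  private openCount = 0;
  private sealed = false;
  private resolveDone: () => void = () => {};
  readonly done: Promise<void>;

  constructor() {
    this.done = new Promise<void>((resolve) => {
      this.resolveDone = resolve;
    });
  }

  // Register background work; the count drops when it settles, whether it succeeds or fails.
  track(work: Promise<unknown>): void {
    this.openCount++;
    const close = (): void => {
      this.openCount--;
      this.maybeResolve();
    };
    work.then(close, close);
  }

  // Call once the handler has returned; the barrier cannot resolve before this.
  seal(): void {
    this.sealed = true;
    this.maybeResolve();
  }

  private maybeResolve(): void {
    if (this.sealed && this.openCount === 0) this.resolveDone();
  }
}
```

Under this design the adapter would pass `barrier.done.then(cleanup)` to the platform's `waitUntil()` once, and route all later registrations through `track()`. Work registered after the count has already reached zero would still arrive too late, so this narrows the race rather than eliminating it.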


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cd775cb-2228-442e-80cc-29f213f94881

📥 Commits

Reviewing files that changed from the base of the PR and between 3776cb6 and 33946ed.

📒 Files selected for processing (3)
  • .changeset/fix-serverless-waituntil-cleanup.md
  • packages/serverless-hono/src/serverless-provider.ts
  • packages/serverless-hono/src/utils/defer-cleanup.spec.ts

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/serverless-hono/src/serverless-provider.ts">

<violation number="1" location="packages/serverless-hono/src/serverless-provider.ts:21">
P2: Cleanup deferral uses an immediate microtask, which can run before all async tool/stream waitUntil usage completes, so global waitUntil may still be cleared too early.</violation>
</file>
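
The ordering concern in this finding can be checked with a small standalone sketch. It assumes the deferral is a single `Promise.resolve().then(cleanup)` hop, as the review comments describe, and uses a timer as a stand-in for a tool call; `microtaskOrder` is a hypothetical name.

```typescript
// Records the order of events when cleanup is deferred by one microtask.
async function microtaskOrder(): Promise<string[]> {
  const events: string[] = [];
  const cleanup = (): void => {
    events.push("cleanup");
  };

  // Deferral by one microtask hop.
  const deferred = Promise.resolve().then(cleanup);

  // Stand-in for a tool that finishes on a later timer or I/O callback.
  const tool = new Promise<void>((resolve) => {
    setTimeout(() => {
      events.push("tool finished");
      resolve();
    }, 10);
  });

  await Promise.all([deferred, tool]);
  return events; // ["cleanup", "tool finished"]
}
```

Microtasks are drained before any timer callback fires, so cleanup runs first regardless of how short the timer is.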


… in Cloudflare Workers

The `finally` block in toCloudflareWorker/toVercelEdge/toDeno calls
cleanup() as soon as the Response object is returned — before streaming
and tool execution complete. This clears the global ___voltagent_wait_until
while tools are still using it, causing crashes.

Fix: schedule cleanup through the platform's own waitUntil() so it runs
only after all pending promises (streaming, tools, observability exports)
have settled. Falls back to synchronous cleanup when waitUntil is
unavailable (non-serverless environments).

Fixes VoltAgent#1186
@ravyg ravyg force-pushed the fix/serverless-waituntil-cleanup branch from 33946ed to 1c9a310 Compare April 6, 2026 19:12
@omeraplak
Copy link
Copy Markdown
Member

Hey @ravyg,
Thank you so much 🔥

@omeraplak omeraplak merged commit a21275f into VoltAgent:main Apr 8, 2026
22 checks passed


Development

Successfully merging this pull request may close these issues.

[BUG] time consuming tools are crashing in cloudflare workers

2 participants