Skip to content

Add streaming support for large file downloads (>2GB)#3222

Draft
epeicher wants to merge 9 commits intoWordPress:trunkfrom
epeicher:add-streaming-response-support
Draft

Add streaming support for large file downloads (>2GB)#3222
epeicher wants to merge 9 commits intoWordPress:trunkfrom
epeicher:add-streaming-response-support

Conversation

@epeicher
Copy link
Contributor

@epeicher epeicher commented Jan 30, 2026

Motivation for the change, related issues

When plugins like All-in-One WP Migration try to export large sites (>2GB), the download fails due to JavaScript's Uint8Array size limitations. The entire response was being buffered into memory before being sent to the client, which fails for files exceeding ~2GB.

This change adds streaming response support to the Playground CLI server, allowing large file downloads to stream directly to the client without buffering the entire response in memory.

  • Related to STU-73

Implementation details

@php-wasm/universal:

  • php-response.ts: Add getHeadersStream() to expose headers stream for Comlink serialization
  • api.ts: Fix StreamedPHPResponse transfer handler to include ReadableStreams in the transferables array (they weren't being transferred before)
  • php-request-handler.ts: Add requestStreamed() method for streaming requests
  • php-worker.ts: Add requestStreamed() method delegating to request handler

@wp-playground/cli:

  • load-balancer.ts: Add handleRequestStreamed() for load-balanced streaming
  • run-cli.ts: Use handleRequestStreamed instead of handleRequest
  • start-server.ts: Rewrite with streaming support:
    • Use duck-typing to detect StreamedPHPResponse (instanceof doesn't work across Comlink boundaries)
    • Stream responses when Content-Length is missing (unknown size) to safely handle potentially large responses
    • Stream responses with Content-Disposition: attachment or binary content types
    • Buffer small responses with known Content-Length for efficiency

Testing Instructions (or ideally a Blueprint)

Blueprint to install All in One Migration: install-aio-migration-blueprint.json

  1. Start the CLI server with a local WordPress installation mounted:
    npx nx dev playground-cli server --wp=6.8 --php=8.4 --login --auto-mount=/path/to/your/wordpress
    This mounts your local WordPress to /wordpress so exported files are written to the local filesystem.
  2. Login to wp-admin (default credentials: admin / password, or use --login to auto-login)
  3. Install and activate the All-in-One WP Migration plugin
  4. Create a large site (or import existing content to make it >2GB)
  5. Export the site using All-in-One WP Migration
  6. Verify the download completes successfully without memory errors

This change adds streaming response support to the Playground CLI server,
fixing an issue where downloading files larger than 2GB would fail due to
JavaScript's Uint8Array size limitations.

The problem occurred when plugins like All-in-One WP Migration tried to
export large sites - the entire response was buffered into memory before
being sent, which fails for files exceeding ~2GB.

Changes:

@php-wasm/universal:
- php-response.ts: Add getHeadersStream() to expose headers stream for
  Comlink serialization
- api.ts: Fix StreamedPHPResponse transfer handler to include ReadableStreams
  in the transferables array (they weren't being transferred before)
- php-request-handler.ts: Add requestStreamed() method for streaming requests
- php-worker.ts: Add requestStreamed() method delegating to request handler

@wp-playground/cli:
- load-balancer.ts: Add handleRequestStreamed() for load-balanced streaming
- run-cli.ts: Use handleRequestStreamed instead of handleRequest
- start-server.ts: Rewrite with streaming support:
  - Use duck-typing to detect StreamedPHPResponse (instanceof doesn't work
    across Comlink boundaries)
  - Stream responses when Content-Length is missing (unknown size) to safely
    handle potentially large responses
  - Stream responses with Content-Disposition: attachment or binary types
  - Buffer small responses with known Content-Length for efficiency
Fixes lint errors and test failures introduced by the streaming changes:

- php-request-handler.ts: Import StreamedPHPResponse as type-only since
  it's only used in type annotations. Fix unused expression by using
  optional chaining for cookie store.

- start-server.ts: Change to `import type` since all imports from
  @php-wasm/universal are only used as types.

- load-balancer.ts: Add .catch() to trackingPromise to prevent unhandled
  rejection warnings when requestStreamed throws an error. The actual
  error handling is done in start-server.ts.

- run-cli.spec.ts: Update error handling test to override requestStreamed
  instead of request, matching the new streaming API.
@epeicher epeicher force-pushed the add-streaming-response-support branch from e0e4ce6 to 862a955 Compare January 30, 2026 18:20
* @param request - PHP Request data.
* @returns A StreamedPHPResponse or PHPResponse for non-PHP files.
*/
async requestStreamed(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just replace the request() method with a streamed version? If anyone needs a buffered response, StreamedPHPResponse is easy to convert to PHPResponse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was initially considered, but I decided to adopt a backwards-compatible approach. However, I have applied an approach based on your suggestion while maintaining backwards compatibility as part of 59bb36e, which removes much duplication.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, PHPResponse has a different interface than StreamedPHPResponse. One way to solve this is by adding a few getters to StreamedPHPResponse to make it compatible with any code expecting a PHPResponse. That would allow us to only have a single request() method. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my Claude Code session, this is the response, and after asking multiple follow-up questions, I agree with that. Maybe that could be tackled as a follow-up? 🤔. The explanation below:

That's an interesting alternative. Let me analyze whether making StreamedPHPResponse API-compatible with PHPResponse would work.

The Core Challenge

The interfaces are fundamentally different:

// PHPResponse - sync properties
response.headers        // Record<string, string[]>
response.httpStatusCode // number
response.bytes          // Uint8Array

// StreamedPHPResponse - async properties  
response.headers        // Promise<Record<string, string[]>>
response.httpStatusCode // Promise<number>
response.stdoutBytes    // Promise<Uint8Array>

Options to Make Them Compatible

Option A: Add sync getters to StreamedPHPResponse that work after buffering

class StreamedPHPResponse {
  private _buffered?: PHPResponse;
  
  async buffer(): Promise<this> {
    this._buffered = await PHPResponse.fromStreamedResponse(this);
    return this;
  }
  
  // Sync getters that throw if not buffered
  get bytes(): Uint8Array {
    if (!this._buffered) throw new Error('Call buffer() first');
    return this._buffered.bytes;
  }
}

But callers would need to change from:

const response = await handler.request(req);
console.log(response.headers);

to:

const response = await (await handler.request(req)).buffer();
console.log(response.headers);

Option B: Make PHPResponse properties also async

This would be a massive breaking change affecting all callers.

Option C: Return a union type with type guards

Keep both types separate but provide good ergonomics for checking which one you have.

My Assessment

The current approach (having request() use requestStreamed() internally) gives us:

  • Zero breaking changes
  • No code duplication
  • Clean separation between buffered and streaming use cases

Making StreamedPHPResponse compatible would either require breaking changes or awkward APIs like mandatory .buffer() calls.

Copy link
Collaborator

@adamziel adamziel Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. How about adding a parameter to request(), then? I'd rather avoid a new method if we add it and then follow-up to remove it, that would also be a BC break.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this PR is mostly there if we make requestStreamed() private. Although the return signature of request() would grow so it would be breaking on the type level but not on the runtime level. Maybe we can turn it into request<T = PHPResponse>(): Promise<T> to keep the type-level compat?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh. I guess a new method such as requestStreamed() makes this more visible to the API consumer, and there's not much duplication anyway. Alright, let's do this. But let's clean up some things about this PR – I'll leave notes inline in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you for all the notes. I will review them all.

Eliminates code duplication by having request() call requestStreamed()
and convert the result to a buffered PHPResponse. This addresses the
code review feedback about having two nearly identical methods.

Changes:
- Simplified request() to ~25 lines (was ~160 lines)
- Removed #spawnPHPAndDispatchRequest() - no longer needed
- Removed #dispatchToPHP() - no longer needed
- Removed unused PHPExecutionFailureError import

Benefits:
- URL resolution logic now exists only in requestStreamed()
- Backwards compatible - request() still returns PHPResponse
- Callers who need streaming can use requestStreamed() directly
- ~130 lines removed from the codebase
);

if (primaryPhp.isDir(fsPath)) {
// Ensure directory URIs have a trailing slash. Otherwise,
Copy link
Collaborator

@adamziel adamziel Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we retain all the inline documentation? Or did you find these doc strings not accurate or useful?

Copy link
Contributor Author

@epeicher epeicher Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely we should retain it, this was removed by mistake. Added as part of 3930991

/**
* Spawns a new PHP instance and dispatches a request to it with streaming.
*/
async #spawnPHPAndDispatchRequestStreamed(
Copy link
Collaborator

@adamziel adamziel Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to rename these methods? They're private – can we just make spawnPHPAndDispatchRequest return the StreamedPHPResponse now? Ditto for dispatchToPHPStreamed. Also, I know this is such a nitpick, but can we move these back after serveStaticFile? It's difficult to see what the changes are in the diff when the entire block of code is deleted on the left side and added on the right side.

Copy link
Contributor Author

@epeicher epeicher Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we don't need to rename them after the simplification in previous commits. I will keep the original name and move them back after serverStaticFile (this is not a nit at all). Done as part of be71915

exitCode: Promise<number>
) {
this.headersStream = headers;
this.#headersStream = headers;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encapsulation – the raw stream is an internal detail that can only be read once, so we expose it through getHeadersStream() for serialization and headers for parsed access, preventing accidental
misuse.

}

/**
* Determines if a response should be streamed based on its headers.
Copy link
Collaborator

@adamziel adamziel Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just always stream the response? Then we won't need any of these new functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! The shouldStreamResponse logic was added to avoid streaming overhead for small responses (regular page loads), but you're right — the overhead is minimal and always streaming would simplify things significantly.

We could remove:

  • shouldStreamResponse()
  • handleBufferedResponse()
  • The conditional buffering in handleStreamedResponse() (lines 161-179)

And just always use the streaming path. This would cut ~50 lines of code. I am going to try that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been removed as suggested so we always stream the response. This has been done as part of 2338e0d

* Returns a StreamedPHPResponse that allows processing the response
* body incrementally without buffering in memory.
*/
async handleRequestStreamed(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just make handleRequest() just always return StreamedPHPResponse? I don't think anyone relies on those classes outside of Playground CLI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Here are the alternatives:

Option A: Single method that always buffers (like PHPRequestHandler.request())

  • handleRequest() internally calls handleRequestStreamed() then converts via PHPResponse.fromStreamedResponse()
  • Callers who need streaming use worker.requestStreamed() directly
  • Pros: No code duplication, backward-compatible API
  • Cons: Adds buffering overhead for callers who don't need streaming

Option B: Replace handleRequest() with streaming, update callers

  • Since this is internal to the CLI, we update start-server.ts to handle StreamedPHPResponse
  • Pros: Simplest implementation, streaming by default
  • Cons: Requires updating caller code to handle async properties

I'd suggest Option A for consistency with PHPRequestHandler, or Option B if we want streaming to be the default. Happy to implement either — what do you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been implemented as part of 2338e0d, now there is only StreamedPHPResponse

const payload = {
__type: 'StreamedPHPResponse',
headers: (obj as any)['headersStream'],
headers: headersStream,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was wrong with the original code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the getHeadersStream is better, as the headerStream field is private

type WorkerLoad = {
worker: RemoteAPI<PlaygroundCliWorker>;
activeRequests: Set<Promise<PHPResponse>>;
activeRequests: Set<Promise<PHPResponse | void>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why void?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleRequest() adds Promise (line 76), while handleRequestStreamed() adds a Promise tracking promise (line 124) to wait for stream completion. The Set needs to accept both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been added as part of 2338e0d, now is Promise<void> as it will be always streamed

epeicher and others added 2 commits February 3, 2026 19:47
They were removed by mistake
- Rename #spawnPHPAndDispatchRequestStreamed → #spawnPHPAndDispatchRequest
- Rename #dispatchToPHPStreamed → #dispatchToPHP
- Move both methods after #serveStaticFile to restore original location

This addresses review feedback: since these are private methods and there's
now only one version (streaming), the "Streamed" suffix is redundant.
Moving them back to their original position makes the diff easier to review.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor Author

@epeicher epeicher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your great review, @adamziel! I have responded to all the comments, and I will work on making the response always streamed. I will update the PR with the results.

);

if (primaryPhp.isDir(fsPath)) {
// Ensure directory URIs have a trailing slash. Otherwise,
Copy link
Contributor Author

@epeicher epeicher Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely we should retain it, this was removed by mistake. Added as part of 3930991

const payload = {
__type: 'StreamedPHPResponse',
headers: (obj as any)['headersStream'],
headers: headersStream,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the getHeadersStream is better, as the headerStream field is private

Comment on lines +266 to +274
// ReadableStreams must be explicitly transferred
return [
payload,
[
headersStream as unknown as Transferable,
obj.stdout as unknown as Transferable,
obj.stderr as unknown as Transferable,
exitCodePort,
],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change ensures all three ReadableStreams (headersStream, stdout, stderr) plus the MessagePort (exitCodePort) are properly transferred across the worker boundary, which is required for streaming responses to work through Comlink.

* @param request - PHP Request data.
* @returns A StreamedPHPResponse or PHPResponse for non-PHP files.
*/
async requestStreamed(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you for all the notes. I will review them all.

/**
* Spawns a new PHP instance and dispatches a request to it with streaming.
*/
async #spawnPHPAndDispatchRequestStreamed(
Copy link
Contributor Author

@epeicher epeicher Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we don't need to rename them after the simplification in previous commits. I will keep the original name and move them back after serverStaticFile (this is not a nit at all). Done as part of be71915

exitCode: Promise<number>
) {
this.headersStream = headers;
this.#headersStream = headers;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encapsulation – the raw stream is an internal detail that can only be read once, so we expose it through getHeadersStream() for serialization and headers for parsed access, preventing accidental
misuse.

type WorkerLoad = {
worker: RemoteAPI<PlaygroundCliWorker>;
activeRequests: Set<Promise<PHPResponse>>;
activeRequests: Set<Promise<PHPResponse | void>>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleRequest() adds Promise (line 76), while handleRequestStreamed() adds a Promise tracking promise (line 124) to wait for stream completion. The Set needs to accept both.

* Returns a StreamedPHPResponse that allows processing the response
* body incrementally without buffering in memory.
*/
async handleRequestStreamed(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Here are the alternatives:

Option A: Single method that always buffers (like PHPRequestHandler.request())

  • handleRequest() internally calls handleRequestStreamed() then converts via PHPResponse.fromStreamedResponse()
  • Callers who need streaming use worker.requestStreamed() directly
  • Pros: No code duplication, backward-compatible API
  • Cons: Adds buffering overhead for callers who don't need streaming

Option B: Replace handleRequest() with streaming, update callers

  • Since this is internal to the CLI, we update start-server.ts to handle StreamedPHPResponse
  • Pros: Simplest implementation, streaming by default
  • Cons: Requires updating caller code to handle async properties

I'd suggest Option A for consistency with PHPRequestHandler, or Option B if we want streaming to be the default. Happy to implement either — what do you prefer?

}

/**
* Determines if a response should be streamed based on its headers.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! The shouldStreamResponse logic was added to avoid streaming overhead for small responses (regular page loads), but you're right — the overhead is minimal and always streaming would simplify things significantly.

We could remove:

  • shouldStreamResponse()
  • handleBufferedResponse()
  • The conditional buffering in handleStreamedResponse() (lines 161-179)

And just always use the streaming path. This would cut ~50 lines of code. I am going to try that

…ring

- Remove conditional buffering based on response size in start-server.ts
- Rename handleRequestStreamed() to handleRequest() in LoadBalancer
- Always use streaming for all PHP responses
- Fix PHP instance leak when #dispatchToPHP throws an error

The previous code did not release the PHP instance if dispatch failed,
causing "Too many PHP instances" errors when exceptions occurred.
Add StreamedPHPResponse.fromPHPResponse() to convert buffered responses
to streamed format. This allows uniform response handling throughout
the CLI server stack:

- LoadBalancer.handleRequest() now always returns StreamedPHPResponse
- start-server.ts simplified to single handleStreamedResponse() path
- Removed duck-typing checks and handleBufferedResponse()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants