Add streaming support for large file downloads (>2GB)#3222
Add streaming support for large file downloads (>2GB)#3222epeicher wants to merge 9 commits intoWordPress:trunkfrom
Conversation
This change adds streaming response support to the Playground CLI server,
fixing an issue where downloading files larger than 2GB would fail due to
JavaScript's Uint8Array size limitations.
The problem occurred when plugins like All-in-One WP Migration tried to
export large sites - the entire response was buffered into memory before
being sent, which fails for files exceeding ~2GB.
Changes:
@php-wasm/universal:
- php-response.ts: Add getHeadersStream() to expose headers stream for
Comlink serialization
- api.ts: Fix StreamedPHPResponse transfer handler to include ReadableStreams
in the transferables array (they weren't being transferred before)
- php-request-handler.ts: Add requestStreamed() method for streaming requests
- php-worker.ts: Add requestStreamed() method delegating to request handler
@wp-playground/cli:
- load-balancer.ts: Add handleRequestStreamed() for load-balanced streaming
- run-cli.ts: Use handleRequestStreamed instead of handleRequest
- start-server.ts: Rewrite with streaming support:
- Use duck-typing to detect StreamedPHPResponse (instanceof doesn't work
across Comlink boundaries)
- Stream responses when Content-Length is missing (unknown size) to safely
handle potentially large responses
- Stream responses with Content-Disposition: attachment or binary types
- Buffer small responses with known Content-Length for efficiency
Fixes lint errors and test failures introduced by the streaming changes: - php-request-handler.ts: Import StreamedPHPResponse as type-only since it's only used in type annotations. Fix unused expression by using optional chaining for cookie store. - start-server.ts: Change to `import type` since all imports from @php-wasm/universal are only used as types. - load-balancer.ts: Add .catch() to trackingPromise to prevent unhandled rejection warnings when requestStreamed throws an error. The actual error handling is done in start-server.ts. - run-cli.spec.ts: Update error handling test to override requestStreamed instead of request, matching the new streaming API.
e0e4ce6 to
862a955
Compare
| * @param request - PHP Request data. | ||
| * @returns A StreamedPHPResponse or PHPResponse for non-PHP files. | ||
| */ | ||
| async requestStreamed( |
There was a problem hiding this comment.
Should we just replace the request() method with a streamed version? If anyone needs a buffered response, StreamedPHPResponse is easy to convert to PHPResponse
There was a problem hiding this comment.
That was initially considered, but I decided to adopt a backwards-compatible approach. However, I have applied an approach based on your suggestion while maintaining backwards compatibility as part of 59bb36e, which removes much duplication.
There was a problem hiding this comment.
Ah, PHPResponse has a different interface than StreamedPHPResponse. One way to solve this is by adding a few getters to StreamedPHPResponse to make it compatible with any code expecting a PHPResponse. That would allow us to only have a single request() method. WDYT?
There was a problem hiding this comment.
From my Claude Code session, this is the response, and after asking multiple follow-up questions, I agree with that. Maybe that could be tackled as a follow-up? 🤔. The explanation below:
That's an interesting alternative. Let me analyze whether making StreamedPHPResponse API-compatible with PHPResponse would work.
The Core Challenge
The interfaces are fundamentally different:
// PHPResponse - sync properties
response.headers // Record<string, string[]>
response.httpStatusCode // number
response.bytes // Uint8Array
// StreamedPHPResponse - async properties
response.headers // Promise<Record<string, string[]>>
response.httpStatusCode // Promise<number>
response.stdoutBytes // Promise<Uint8Array>Options to Make Them Compatible
Option A: Add sync getters to StreamedPHPResponse that work after buffering
class StreamedPHPResponse {
private _buffered?: PHPResponse;
async buffer(): Promise<this> {
this._buffered = await PHPResponse.fromStreamedResponse(this);
return this;
}
// Sync getters that throw if not buffered
get bytes(): Uint8Array {
if (!this._buffered) throw new Error('Call buffer() first');
return this._buffered.bytes;
}
}But callers would need to change from:
const response = await handler.request(req);
console.log(response.headers);to:
const response = await (await handler.request(req)).buffer();
console.log(response.headers);Option B: Make PHPResponse properties also async
This would be a massive breaking change affecting all callers.
Option C: Return a union type with type guards
Keep both types separate but provide good ergonomics for checking which one you have.
My Assessment
The current approach (having request() use requestStreamed() internally) gives us:
- Zero breaking changes
- No code duplication
- Clean separation between buffered and streaming use cases
Making StreamedPHPResponse compatible would either require breaking changes or awkward APIs like mandatory .buffer() calls.
There was a problem hiding this comment.
Ah, good point. How about adding a parameter to request(), then? I'd rather avoid a new method if we add it and then follow-up to remove it, that would also be a BC break.
There was a problem hiding this comment.
I guess this PR is mostly there if we make requestStreamed() private. Although the return signature of request() would grow so it would be breaking on the type level but not on the runtime level. Maybe we can turn it into request<T = PHPResponse>(): Promise<T> to keep the type-level compat?
There was a problem hiding this comment.
Meh. I guess a new method such as requestStreamed() makes this more visible to the API consumer, and there's not much duplication anyway. Alright, let's do this. But let's clean up some things about this PR – I'll leave notes inline in other places.
There was a problem hiding this comment.
Great! Thank you for all the notes. I will review them all.
Eliminates code duplication by having request() call requestStreamed() and convert the result to a buffered PHPResponse. This addresses the code review feedback about having two nearly identical methods. Changes: - Simplified request() to ~25 lines (was ~160 lines) - Removed #spawnPHPAndDispatchRequest() - no longer needed - Removed #dispatchToPHP() - no longer needed - Removed unused PHPExecutionFailureError import Benefits: - URL resolution logic now exists only in requestStreamed() - Backwards compatible - request() still returns PHPResponse - Callers who need streaming can use requestStreamed() directly - ~130 lines removed from the codebase
| ); | ||
|
|
||
| if (primaryPhp.isDir(fsPath)) { | ||
| // Ensure directory URIs have a trailing slash. Otherwise, |
There was a problem hiding this comment.
Can we retain all the inline documentation? Or did you find these doc strings not accurate or useful?
There was a problem hiding this comment.
Absolutely we should retain it, this was removed by mistake. Added as part of 3930991
| /** | ||
| * Spawns a new PHP instance and dispatches a request to it with streaming. | ||
| */ | ||
| async #spawnPHPAndDispatchRequestStreamed( |
There was a problem hiding this comment.
Do we have to rename these methods? They're private – can we just make spawnPHPAndDispatchRequest return the StreamedPHPResponse now? Ditto for dispatchToPHPStreamed. Also, I know this is such a nitpick, but can we move these back after serveStaticFile? It's difficult to see what the changes are in the diff when the entire block of code is deleted on the left side and added on the right side.
There was a problem hiding this comment.
Actually, we don't need to rename them after the simplification in previous commits. I will keep the original name and move them back after serverStaticFile (this is not a nit at all). Done as part of be71915
| exitCode: Promise<number> | ||
| ) { | ||
| this.headersStream = headers; | ||
| this.#headersStream = headers; |
There was a problem hiding this comment.
Encapsulation – the raw stream is an internal detail that can only be read once, so we expose it through getHeadersStream() for serialization and headers for parsed access, preventing accidental
misuse.
| } | ||
|
|
||
| /** | ||
| * Determines if a response should be streamed based on its headers. |
There was a problem hiding this comment.
Can we just always stream the response? Then we won't need any of these new functions.
There was a problem hiding this comment.
Good point! The shouldStreamResponse logic was added to avoid streaming overhead for small responses (regular page loads), but you're right — the overhead is minimal and always streaming would simplify things significantly.
We could remove:
shouldStreamResponse()handleBufferedResponse()- The conditional buffering in
handleStreamedResponse()(lines 161-179)
And just always use the streaming path. This would cut ~50 lines of code. I am going to try that
There was a problem hiding this comment.
This has been removed as suggested so we always stream the response. This has been done as part of 2338e0d
| * Returns a StreamedPHPResponse that allows processing the response | ||
| * body incrementally without buffering in memory. | ||
| */ | ||
| async handleRequestStreamed( |
There was a problem hiding this comment.
can we just make handleRequest() just always return StreamedPHPResponse? I don't think anyone relies on those classes outside of Playground CLI.
There was a problem hiding this comment.
Good idea! Here are the alternatives:
Option A: Single method that always buffers (like PHPRequestHandler.request())
handleRequest()internally callshandleRequestStreamed()then converts viaPHPResponse.fromStreamedResponse()- Callers who need streaming use
worker.requestStreamed()directly - Pros: No code duplication, backward-compatible API
- Cons: Adds buffering overhead for callers who don't need streaming
Option B: Replace handleRequest() with streaming, update callers
- Since this is internal to the CLI, we update
start-server.tsto handleStreamedPHPResponse - Pros: Simplest implementation, streaming by default
- Cons: Requires updating caller code to handle async properties
I'd suggest Option A for consistency with PHPRequestHandler, or Option B if we want streaming to be the default. Happy to implement either — what do you prefer?
There was a problem hiding this comment.
This has been implemented as part of 2338e0d, now there is only StreamedPHPResponse
| const payload = { | ||
| __type: 'StreamedPHPResponse', | ||
| headers: (obj as any)['headersStream'], | ||
| headers: headersStream, |
There was a problem hiding this comment.
What was wrong with the original code?
There was a problem hiding this comment.
I think using the getHeadersStream is better, as the headerStream field is private
| type WorkerLoad = { | ||
| worker: RemoteAPI<PlaygroundCliWorker>; | ||
| activeRequests: Set<Promise<PHPResponse>>; | ||
| activeRequests: Set<Promise<PHPResponse | void>>; |
There was a problem hiding this comment.
handleRequest() adds Promise (line 76), while handleRequestStreamed() adds a Promise tracking promise (line 124) to wait for stream completion. The Set needs to accept both.
There was a problem hiding this comment.
This has been added as part of 2338e0d, now is Promise<void> as it will be always streamed
They were removed by mistake
- Rename #spawnPHPAndDispatchRequestStreamed → #spawnPHPAndDispatchRequest - Rename #dispatchToPHPStreamed → #dispatchToPHP - Move both methods after #serveStaticFile to restore original location This addresses review feedback: since these are private methods and there's now only one version (streaming), the "Streamed" suffix is redundant. Moving them back to their original position makes the diff easier to review. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| ); | ||
|
|
||
| if (primaryPhp.isDir(fsPath)) { | ||
| // Ensure directory URIs have a trailing slash. Otherwise, |
There was a problem hiding this comment.
Absolutely we should retain it, this was removed by mistake. Added as part of 3930991
| const payload = { | ||
| __type: 'StreamedPHPResponse', | ||
| headers: (obj as any)['headersStream'], | ||
| headers: headersStream, |
There was a problem hiding this comment.
I think using the getHeadersStream is better, as the headerStream field is private
| // ReadableStreams must be explicitly transferred | ||
| return [ | ||
| payload, | ||
| [ | ||
| headersStream as unknown as Transferable, | ||
| obj.stdout as unknown as Transferable, | ||
| obj.stderr as unknown as Transferable, | ||
| exitCodePort, | ||
| ], |
There was a problem hiding this comment.
This change ensures all three ReadableStreams (headersStream, stdout, stderr) plus the MessagePort (exitCodePort) are properly transferred across the worker boundary, which is required for streaming responses to work through Comlink.
| * @param request - PHP Request data. | ||
| * @returns A StreamedPHPResponse or PHPResponse for non-PHP files. | ||
| */ | ||
| async requestStreamed( |
There was a problem hiding this comment.
Great! Thank you for all the notes. I will review them all.
| /** | ||
| * Spawns a new PHP instance and dispatches a request to it with streaming. | ||
| */ | ||
| async #spawnPHPAndDispatchRequestStreamed( |
There was a problem hiding this comment.
Actually, we don't need to rename them after the simplification in previous commits. I will keep the original name and move them back after serverStaticFile (this is not a nit at all). Done as part of be71915
| exitCode: Promise<number> | ||
| ) { | ||
| this.headersStream = headers; | ||
| this.#headersStream = headers; |
There was a problem hiding this comment.
Encapsulation – the raw stream is an internal detail that can only be read once, so we expose it through getHeadersStream() for serialization and headers for parsed access, preventing accidental
misuse.
| type WorkerLoad = { | ||
| worker: RemoteAPI<PlaygroundCliWorker>; | ||
| activeRequests: Set<Promise<PHPResponse>>; | ||
| activeRequests: Set<Promise<PHPResponse | void>>; |
There was a problem hiding this comment.
handleRequest() adds Promise (line 76), while handleRequestStreamed() adds a Promise tracking promise (line 124) to wait for stream completion. The Set needs to accept both.
| * Returns a StreamedPHPResponse that allows processing the response | ||
| * body incrementally without buffering in memory. | ||
| */ | ||
| async handleRequestStreamed( |
There was a problem hiding this comment.
Good idea! Here are the alternatives:
Option A: Single method that always buffers (like PHPRequestHandler.request())
handleRequest()internally callshandleRequestStreamed()then converts viaPHPResponse.fromStreamedResponse()- Callers who need streaming use
worker.requestStreamed()directly - Pros: No code duplication, backward-compatible API
- Cons: Adds buffering overhead for callers who don't need streaming
Option B: Replace handleRequest() with streaming, update callers
- Since this is internal to the CLI, we update
start-server.tsto handleStreamedPHPResponse - Pros: Simplest implementation, streaming by default
- Cons: Requires updating caller code to handle async properties
I'd suggest Option A for consistency with PHPRequestHandler, or Option B if we want streaming to be the default. Happy to implement either — what do you prefer?
| } | ||
|
|
||
| /** | ||
| * Determines if a response should be streamed based on its headers. |
There was a problem hiding this comment.
Good point! The shouldStreamResponse logic was added to avoid streaming overhead for small responses (regular page loads), but you're right — the overhead is minimal and always streaming would simplify things significantly.
We could remove:
shouldStreamResponse()handleBufferedResponse()- The conditional buffering in
handleStreamedResponse()(lines 161-179)
And just always use the streaming path. This would cut ~50 lines of code. I am going to try that
…ring - Remove conditional buffering based on response size in start-server.ts - Rename handleRequestStreamed() to handleRequest() in LoadBalancer - Always use streaming for all PHP responses - Fix PHP instance leak when #dispatchToPHP throws an error The previous code did not release the PHP instance if dispatch failed, causing "Too many PHP instances" errors when exceptions occurred.
Add StreamedPHPResponse.fromPHPResponse() to convert buffered responses to streamed format. This allows uniform response handling throughout the CLI server stack: - LoadBalancer.handleRequest() now always returns StreamedPHPResponse - start-server.ts simplified to single handleStreamedResponse() path - Removed duck-typing checks and handleBufferedResponse()
Motivation for the change, related issues
When plugins like All-in-One WP Migration try to export large sites (>2GB), the download fails due to JavaScript's Uint8Array size limitations. The entire response was being buffered into memory before being sent to the client, which fails for files exceeding ~2GB.
This change adds streaming response support to the Playground CLI server, allowing large file downloads to stream directly to the client without buffering the entire response in memory.
Implementation details
@php-wasm/universal:
php-response.ts: AddgetHeadersStream()to expose headers stream for Comlink serializationapi.ts: FixStreamedPHPResponsetransfer handler to include ReadableStreams in the transferables array (they weren't being transferred before)php-request-handler.ts: AddrequestStreamed()method for streaming requestsphp-worker.ts: AddrequestStreamed()method delegating to request handler@wp-playground/cli:
load-balancer.ts: AddhandleRequestStreamed()for load-balanced streamingrun-cli.ts: UsehandleRequestStreamedinstead ofhandleRequeststart-server.ts: Rewrite with streaming support:StreamedPHPResponse(instanceofdoesn't work across Comlink boundaries)Content-Disposition: attachmentor binary content typesTesting Instructions (or ideally a Blueprint)
Blueprint to install All in One Migration: install-aio-migration-blueprint.json
/wordpressso exported files are written to the local filesystem.admin/password, or use--loginto auto-login)