pipeFrom to allow writable streams to "pull" at their own pace #146
What I planned to do is: […]

This way, only the amount of data needed/acceptable is communicated. We could add an option to […]. This approach allows a […]. This looks like it adds a lot of complexity, but even after revising, I think some part of the above will still be needed. Please see below.

I think your […]. If so, we can also address this issue by adding "some methods" on the […].
Thanks for this. It is great to have it well thought out and written down. I'll try to process it all over the next couple of days of meetings :).
My current thinking (which may not be very good) is that we subsume all amount-specific negotiation into byte streams and their specific protocol, e.g. if we had […]. What do you think? I am not sure it is a good design. On the one hand, it isolates the complexity into only a few places, and confines it to byte streams. But on the other hand it seems like a weakness of our model that we are doing that kind of byte-stream specialization, and not giving that kind of amount-negotiation capability to other streams. In which case something more general involving the kind of stuff you propose would be good. Or perhaps some combination of the stuff you propose and the current "strategy" idea: the strategy idea was designed to abstract away all the byte-related stuff in one small place, but if it ends up leaking elsewhere anyway, then perhaps that should be reconsidered.
I agree it doesn't really make much of a difference. You can have specialized methods on either side. But somehow it seems more natural to specialize the pulling process than to specialize the pushing process. I wish I had a better argument.
I haven't thought too much about how to achieve the suggested goals I laid out to @domenic that are important for high performance networking. Let me sketch out some ideas for what I would naively do:
@willchan Many of your ideas are captured by the W3C streams spec I was co-authoring, as it initially aimed to provide a streaming interface only for bytes. I can update it to provide a prototype while incorporating the ideas established here, using the same identifiers and API surface as much as possible.
I think this is too low-level to be useful in a web-exposed way. Lack of queuing is too big of a footgun to expose to users. To me it is fundamental to what we mean by "stream" in JS. If we think these kind of things are valuable as a potential building block then they should be named something else.
To me the biggest concern is that if you get a byte stream it should interoperate with an object stream easily. E.g. you can write a transform from one to the other. Or if you have a generic stream-consuming mechanism, it should be able to consume a specialized byte stream, perhaps slightly slower or with less-controlled backpressure. That is what is behind my current vision, wherein ReadableByteStream derives from ReadableStream. Then all of the ReadableStream methods have sane semantics when used by a consumer that expects a generic ReadableStream, but consumers who are willing to special-case for (or demand) a ReadableByteStream can use the more-specific functionality it exposes. One of those consumers would probably be WritableByteStream.
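A toy illustration of this derivation idea, assuming minimal stand-in classes (synchronous, no queuing) rather than the spec's real interfaces: a `ReadableByteStream` that extends `ReadableStream`, so a consumer written against the generic interface keeps working, while a byte-aware consumer can use the extra amount-controlled read.

```javascript
// Illustrative stand-ins only, not the spec's classes.
class ReadableStream {
  constructor(chunks) { this.chunks = [...chunks]; }
  read() { return this.chunks.shift(); } // generic chunk-at-a-time read
}

class ReadableByteStream extends ReadableStream {
  // Byte-specific extra capability: read exactly n bytes.
  readBytes(n) { return this.chunks.splice(0, n); }
}

// A consumer that only assumes the generic ReadableStream interface;
// it can consume a ReadableByteStream without special-casing.
function drain(stream) {
  const out = [];
  let c;
  while ((c = stream.read()) !== undefined) out.push(c);
  return out;
}
```

A consumer could also feature-detect (`stream instanceof ReadableByteStream`) and opt into `readBytes` for better-controlled backpressure, falling back to `read` otherwise.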
Would you be up for doing it as a prototype in this repo? Maybe reference-implementation/lib/experimental/* or similar? I agree some concrete prototypes would help.
Will do.
Great! Let me give you commit access so you can work more freely in that area. We should both do pull requests and/or work in branches w.r.t. non-experimental changes, of course.
I'm fairly agnostic about how they are exposed. I agree that it's a footgun most users should not want to use. I'd be fine with renaming them to something else if that's preferred. It does raise the question, though, of whether these low-level APIs would be exposed in standard web APIs like fetch().
I agree with this statement. The key part here is that you can transform from one to another. An object stream could wrap a byte stream and deserialize the bytes into objects and serialize the objects into bytes.
So being able to transform between an object stream and a byte stream doesn't necessarily mean they need to share the same interface. I'm OK with it if that's what folks want. There are some tradeoffs here, because sharing the same interface means either giving up some performance or complicating the API with an I/O model different from the traditional stream I/O model. I'm in particular thinking of partial reads/writes.
This API is more awkward than I would like, as it means you are no longer able to write code that does not care which it gets. It would be sad if the story was "to interface with any user code, wrap your stream-returned-from-fetch in an object stream." Or worse, "check if the code you're passing it to special-cases byte streams." It's better for the story to be "pass the stream-returned-from-fetch to the stream-consuming API," with the stream-consuming API able to be transparently upgraded in the future to specialize for byte streams. But I can see how it gives a clean separation.
Right. Giving up performance is not OK. (Things like forcing asynchrony might be acceptable, but certainly not buffer copying or suboptimal backpressure negotiation.) So the question is how complicated things get. I remain hopeful that you can layer on a small addition for the partial read/write calls, but it may turn out to be not-that-small. In which case a wrapper-esque approach (`ReadableStream.fromPipe` or something) might be necessary.
In an offline thread @acolwell mentioned it would be useful for a writable stream to know whether it was being piped to. We got a bit off-topic here, but the framework outlined in the OP could be used to accomplish this, although in a kind of strange way. (You would subclass `WritableStream`, and the subclass would override `pipeFrom` to delegate to `super.pipeFrom` but also record whether or not it was ongoing.) His concrete use case is making MSE's `SourceBuffer` backed by a writable stream, and disabling higher-level operations on the `SourceBuffer` while it is being piped to. We should probably figure out a way to address that use case. It could be as simple as a […].

The issue right now is essentially a conflict between designing piping as a higher-level algorithm that just uses the public APIs to transfer data, versus wanting to make it more specialized in certain cases (super-important for performance). I need to think harder about that.
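A minimal sketch of that subclass-and-record approach. The synchronous pipe loop and the `PipeAwareSink` / `beingPipedTo` names are invented for illustration; they are not spec APIs.

```javascript
// Minimal stand-in for a writable stream with a default pipeFrom loop.
class WritableStream {
  constructor() { this.chunks = []; }
  pipeFrom(src) {
    let c;
    while ((c = src.read()) !== undefined) this.chunks.push(c);
    return this;
  }
}

// Subclass that delegates to super.pipeFrom but records whether a pipe
// is ongoing, so higher-level operations could be disabled meanwhile.
class PipeAwareSink extends WritableStream {
  constructor() {
    super();
    this.beingPipedTo = false; // e.g. a SourceBuffer-like object could consult this
  }
  pipeFrom(src) {
    this.beingPipedTo = true;
    try {
      return super.pipeFrom(src);
    } finally {
      this.beingPipedTo = false; // reset even if the pipe throws
    }
  }
}
```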
This spins out of the "Questions about the Fetch API" thread on the whatwg list, in particular @willchan's reply around here and the subsequent follow-ups. We had a video-chat conversation that helped clarify things and I want to capture them here.
Our current piping algorithm essentially says: whenever the source has data, and the dest is not exerting backpressure, read from the source and write to dest. @willchan calls this "push", because the dest does not really get to decide how it consumes from source. He explains that a "pull" model would be better, wherein you give source to the underlying sink implementation (probably via dest), which then grabs data out of it as it determines is necessary.
This is most important for high-performance binary streams which will allow reading of specific amounts of bytes (#111), because the writable stream implementer (e.g., the UA) could then use smart algorithms to figure out exactly what size chunks they want to try transferring, depending on e.g. how well the streaming has gone so far, what type of network they are on, and other such factors.
This functionality will not be useful for most writable streams: only those that know how to be smart about consuming the data. Object streams in particular are unlikely to want to use this.
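The push/pull contrast above can be sketched as two pipe loops, assuming invented `ByteSource` / `SmartSink` classes with a hypothetical `readBytes(n)` amount-specific read (in the spirit of #111) and a `desiredChunkSize()` hook for the sink's smarts:

```javascript
// Illustrative only: not spec APIs.
class ByteSource {
  constructor(bytes) { this.bytes = bytes; this.pos = 0; }
  get done() { return this.pos >= this.bytes.length; }
  readBytes(n) { // amount-specific read, as byte streams would allow
    const out = this.bytes.slice(this.pos, this.pos + n);
    this.pos += out.length;
    return out;
  }
}

class SmartSink {
  constructor() { this.writes = []; }
  desiredChunkSize() { return 4; } // the sink tunes this, e.g. to the network
  write(chunk) { this.writes.push(chunk); }
}

// Pull: the sink is handed the source and grabs exactly the amounts
// it determines are necessary, instead of the generic push loop
// deciding when and how much to transfer.
function pullPipe(source, sink) {
  while (!source.done) {
    sink.write(source.readBytes(sink.desiredChunkSize()));
  }
}
```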
The tentative idea I had for solving this was something like the following:

- Add `WritableStream.prototype.pipeFrom`, and move all of the existing code in `ReadableStream.prototype.pipeTo` into that. This is the default pipe-from implementation. This is possible since the pipe code does not depend on any internals, just the public API, and in fact it could be a standalone function.
- Have `ReadableStream.prototype.pipeTo(dest)` become essentially `dest.pipeFrom(this)`. So `pipeTo` just becomes a convenience so that authors can write things in right-to-left order.
- Streams that want custom behavior would subclass `WritableStream` and override the `pipeFrom` method to include custom logic.

This actually solves a number of other problems:

- It helps give a framework for streams to "recognize" each other, e.g. for off-main-thread piping (#97) via things like splice: writable streams backed by file descriptors can recognize that a readable stream representing a file descriptor is being piped to them, and then do splicing instead of the usual algorithm, but fall back to `super(...args)` if they do not recognize the stream.
- It also allows dest streams to apply other stream-specific logic. For example, a stream representing an HTTP request body to be sent out could recognize a file descriptor stream being piped to it, get the file's length, and then set that on its `Content-Length` header. The popular `request` package in Node.js makes extensive use of these sorts of tricks.
- These could be done the other way around, too, via an overridden `pipeTo`, but this kind of recognition seems more the dest's responsibility than the source's.

I am optimistic that this does not really add any complexity to the default case, while adding good flexibility for the fastest-possible implementations of high-performance binary streams.
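The delegation above can be sketched with minimal stand-in classes (synchronous, no queuing, so the pattern is visible); `FastPathSink` and its bulk-transfer fast path are hypothetical, not spec APIs:

```javascript
class WritableStream {
  constructor() { this.chunks = []; }
  write(chunk) { this.chunks.push(chunk); }
  // Default pipe-from: the generic loop, using only the public API.
  pipeFrom(src) {
    let chunk;
    while ((chunk = src.read()) !== undefined) this.write(chunk);
    return this;
  }
}

class ReadableStream {
  constructor(data) { this.data = [...data]; }
  read() { return this.data.shift(); }
  // pipeTo is now just a right-to-left convenience over dest.pipeFrom.
  pipeTo(dest) { return dest.pipeFrom(this); }
}

// A sink that "recognizes" a particular kind of source and takes a
// fast path (think splice), falling back to the default algorithm
// for sources it does not recognize.
class FastPathSink extends WritableStream {
  pipeFrom(src) {
    if (src instanceof ReadableStream) {
      this.chunks.push(...src.data); // bulk transfer instead of chunk-by-chunk
      src.data.length = 0;
      return this;
    }
    return super.pipeFrom(src); // unrecognized source: generic loop
  }
}
```

Note that the default `pipeFrom` touches only `read` and `write`, which is what makes it possible to move the existing `pipeTo` code into it unchanged.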