
Maybe giving async readInto (and read?) another shot #290

Closed
domenic opened this issue Feb 28, 2015 · 8 comments
domenic commented Feb 28, 2015

While working through https://gist.github.com/domenic/65921459ef7a31ec2839 for #289 I was a bit worried by the concerns mentioned in the section "Reading a byte stream into ArrayBuffers" about memory fragmentation. I don't think there's a good way to address them without a true readInto-style API. But previously we discussed how this would have to take the form readInto(sourceAB, offset, count) -> Promise<{ bytesRead, newAB }> due to the transfer restrictions, and how this is ugly.

I have a slight reformulation of it that might not be 100% ugly. It hinges on a very old discussion I had with myself, about EOS versus { value, done } design for async read(). My thoughts have always been that { value, done } is theoretically better, but practically more annoying, and so I picked EOS.

But, if we go back to { value, done } for async read(), then I think the async readInto signature proposed above seems less crazy. Picking nice variable names, we could have

rbs.read(bytesDesired)
    -> Promise<{ value: ArrayBuffer, done: boolean }>

rbs.readInto(sourceAB, offset, bytesDesired)
    -> Promise<{ value: ArrayBuffer, done: boolean, bytesRead: number }>

Here, given rbs.readInto(source, 0, 1024).then(result => ...), we would have that result.value is a transfer of source.
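The contract above could be sketched with a mock. Everything here is hypothetical (makeMockByteStream and drain are illustrative names, not spec APIs), and the "transfer" is only simulated by handing the same ArrayBuffer back as result.value:

```javascript
// A minimal sketch of the proposed readInto contract, assuming the
// { value, done, bytesRead } shape discussed above. The mock hands the
// caller's buffer back as `value` to stand in for a real transfer.
function makeMockByteStream(chunks) {
  let i = 0;
  return {
    readInto(sourceAB, offset, bytesDesired) {
      if (i >= chunks.length) {
        return Promise.resolve({ value: sourceAB, done: true, bytesRead: 0 });
      }
      const view = new Uint8Array(sourceAB, offset, bytesDesired);
      const chunk = chunks[i++];
      const n = Math.min(chunk.length, bytesDesired);
      view.set(chunk.subarray(0, n));
      return Promise.resolve({ value: sourceAB, done: false, bytesRead: n });
    }
  };
}

async function drain(rbs) {
  let buffer = new ArrayBuffer(1024);
  const received = [];
  while (true) {
    const { value, done, bytesRead } = await rbs.readInto(buffer, 0, 1024);
    if (done) break;
    received.push(new Uint8Array(value, 0, bytesRead).slice()); // copy out
    buffer = value; // value is (conceptually) a transfer of the buffer passed in
  }
  return received;
}
```

The point of the `buffer = value` reassignment is that, with real transfer semantics, the original reference would be detached, so the consumer must thread the returned buffer through each iteration.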

What do people think?


Here is an alternative. We could go with the design in https://gist.github.com/domenic/65921459ef7a31ec2839 (async pull(buffer_opt) + sync read()). But, we could entirely separately add async readInto, ugliness and all, to ReadableByteStreams. We wouldn't make read() async at all---neither { value, done } nor EOS. We would just add this separate async readInto method for use cases that want to avoid potential memory fragmentation and get the best level of control that I think we can give in JS.

In this case we might shape the resulting object something like { transferred, bytesRead } instead of { value, done, bytesRead } since we don't need to try for symmetry with read().
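A rough sketch of that alternative shape, with all names hypothetical: sync read() for ordinary chunks alongside a standalone async readInto whose result does not mirror read():

```javascript
// Sketch of the alternative design: sync read() plus a separate async
// readInto returning { transferred, bytesRead }. "transferred" stands in
// for a real ArrayBuffer transfer of sourceAB; makeMockRBS is a mock.
function makeMockRBS(chunks) {
  let i = 0;
  return {
    read() {
      // Synchronous read of whatever has already been pulled.
      return i < chunks.length ? chunks[i++] : null;
    },
    readInto(sourceAB, offset, bytesDesired) {
      const chunk = i < chunks.length ? chunks[i++] : new Uint8Array(0);
      const n = Math.min(chunk.length, bytesDesired);
      new Uint8Array(sourceAB, offset, n).set(chunk.subarray(0, n));
      return Promise.resolve({ transferred: sourceAB, bytesRead: n });
    }
  };
}
```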


The other thing I could use is a sanity check from people that (a) my reasoning in the gist is sound and (b) that fragmentation is a problem we want to solve.


domenic commented Feb 28, 2015

In #libuv IRC @indutny was kind enough to explain to me one use case for readInto that came up in io.js TLS stream work. Apparently OpenSSL has its own set of buffers that you are meant to put data into, where it will get encrypted. In JS terms, he did

socketStream.readInto(openSSL.putStuffInThisBuffer, ...)

Without readInto, he would have had to allocate a new buffer, then copy it into openSSL.putStuffInThisBuffer. This is beyond the fragmentation problem from my gist. Rather, it's a constraint imposed by the library you're using (in this case OpenSSL).
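The extra allocation and copy can be made concrete with a small sketch (makeSingleChunkStream and readThenCopy are hypothetical names; the destination buffer stands in for the library-owned buffer, e.g. OpenSSL's):

```javascript
// Illustration of the cost readInto avoids: without it, each read allocates
// a fresh ArrayBuffer that must then be copied into the buffer the consuming
// library actually requires. Mock, not a spec API.
function makeSingleChunkStream(bytes) {
  let done = false;
  return {
    read() {
      if (done) return Promise.resolve({ value: undefined, done: true });
      done = true;
      // Fresh allocation on every read.
      return Promise.resolve({ value: bytes.slice().buffer, done: false });
    }
  };
}

async function readThenCopy(stream, destAB) {
  const { value, done } = await stream.read();
  if (done) return 0;
  // The extra copy that readInto(destAB, ...) would eliminate.
  new Uint8Array(destAB).set(new Uint8Array(value));
  return value.byteLength;
}
```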

You could imagine a JS library that behaves in a similar way. However, it's not too plausible... it's unclear what the benefits of a library disallowing bring-your-own-buffer in that way would be.

And even if it were plausible, there's no way we can accommodate it with our restriction against observable data races :(. Maybe if Mozilla convinces everyone to add SharedArrayBuffers to the platform.

Maybe we should not bother with readInto because the use cases for when you'd want to use it and also deal with the transferring are too rare? We could even add the transfer version later if we find a use case for it.

@piscisaureus

I think I would punt on the issue and just stick to async read for now. readInto is useful only for performance reasons; it enables nothing that couldn't be done with an async read(bytes). So you might as well wait and get feedback from users (on use cases) and implementers (on optimization constraints).

@indutny's use case is fair, but only because of OpenSSL API limitations.
It's easy to think of alternative solutions though - read into an abstract buffer whose contents are not accessible from JS, for example.


indutny commented Feb 28, 2015

@piscisaureus I doubt there is a way to parse TLS properly and securely without actually buffering the contents of the incoming TCP packets. If you are going to read into such a buffer, double allocation and memory copying would be wasteful and would destroy performance (as it did in the node.js 0.10 TLS implementation).

To conclude: there is nothing OpenSSL-specific or hacky about the way we use .readInto() (under another name, though) in io.js. It is perfectly normal to assume that this method will be performance-critical for protocol implementations, where buffering is inevitable.

@piscisaureus

I doubt there is a way to parse TLS properly and securely without actually buffering the contents of the incoming TCP packets

I agree, but the buffered data doesn't have to be accessible from javascript. It simply needs to be accessible by both libuv and openssl.

@piscisaureus

I agree that my argument kind of devolves to "you could do it in C" - not super useful.

My point still stands, though, that readInto() is performance-only, so there's no harm in deciding on it later.


indutny commented Feb 28, 2015

I don't see any point in postponing the decision on this, especially considering that this issue was created for discussion and for gathering enough reasoning :)

It simply needs to be accessible by both libuv and openssl.

I don't think that this is possible in zero-copy mode without sharing the buffer with JS.


domenic commented Feb 28, 2015

Yeah, I don't think it's going to be possible to have a readInto that allows @indutny's use case to be replicated in JS without shared memory :(. And that is blocked on a lot more than me (although I guess io.js Buffers are shared, so there's a loophole for io.js at least).

The question is whether there is sufficient use case left for a transfer-based readInto, even though it would not be able to address @indutny's use case. The long linked gist has some that basically boil down to "avoid memory fragmentation", when compared with pull(buff_opt) + sync read(). I'm not sure those are enough.


domenic commented Mar 11, 2015

Rolling this into the larger meta-issue at #253. For those watching from home, the result is indeed going to be something similar to an "async readInto", albeit with some transferring going on to prevent observable data races.

domenic closed this as completed Mar 11, 2015