Reading binary into an existing buffer #111
Comments
Just had a quick in-person conversation with @willchan. Tentatively it seems like the right approach is a readInto(arrayBuffer, offset, size) method. Just like the generic read(), it is up to the stream implementation / underlying source to decide how much of the requested region to actually fill. Still to figure out: how/whether this ties into the writable side and piping between them. This is straightforward enough, and important enough, that I will try to get it written up pretty soon. Although probably not before transforms.
Augh, it doesn't quite work. Channeling @willchan again: When you ask the kernel to fill up a buffer for you, it might say "error, not ready: try again later." In that case you wait until it's ready and try again. This is kind of the intention behind our async wait() + sync readInto() structure, but it doesn't quite match up. The kernel API requires that you provide the buffer initially, even if in the end it tells you "not ready." So we can't know, until the kernel is ready, whether the stream can transition from "waiting" to "readable". The most straightforward fix is to combine them into an async readInto(). Note that Node (which essentially also uses async wait() + sync read()) does do the dummy buffer copy, I believe. (Will confirm.) Time to go re-read @isaacs's original message...
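A minimal sketch of the dummy-copy problem described above (all names hypothetical, not a proposed API): for wait() to resolve, the implementation has to pull data into internal storage first, and the later sync readInto() can then only copy out of that storage into the caller's buffer.

```javascript
// Hypothetical sketch: with async wait() + sync readInto(), wait() cannot
// hand the kernel the caller's buffer (it doesn't have one yet), so data
// must be staged internally and readInto() pays an extra copy.
class BufferedByteSource {
  constructor(kernelBytes) {
    this.kernel = new Uint8Array(kernelBytes); // stand-in for kernel-side data
    this.internal = new Uint8Array(0);         // the staging buffer we'd rather not need
    this.pos = 0;
  }
  // wait() learns readiness by pulling data aside. In a real implementation
  // this would be a kernel read() into internal storage (copy #1); here a
  // view over the fake kernel data stands in for it.
  async wait() {
    const chunk = this.kernel.subarray(this.pos, this.pos + 4);
    this.pos += chunk.length;
    this.internal = chunk;
  }
  // The sync readInto() can then only copy *out of* the staging area: copy #2.
  readInto(arrayBuffer, offset, size) {
    const n = Math.min(size, this.internal.length);
    new Uint8Array(arrayBuffer, offset, n).set(this.internal.subarray(0, n));
    this.internal = this.internal.subarray(n);
    return n;
  }
}

// Fill dest completely using the wait()/readInto() pairing.
async function fillExact(source, dest) {
  let offset = 0;
  while (offset < dest.byteLength) {
    await source.wait(); // stages data internally first
    offset += source.readInto(dest, offset, dest.byteLength - offset);
  }
  return dest;
}

fillExact(new BufferedByteSource([1, 2, 3, 4, 5, 6]), new ArrayBuffer(6))
  .then(buf => console.log(Array.from(new Uint8Array(buf)))); // logs [1..6]
```

The data arrives correctly, but every byte is touched twice: once landing in the staging area, once moving to the destination.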
My understanding of what Node is doing is that it's always calling read() from the kernel to populate its internal buffer. Therefore, it's easy for it to tell if the stream is "waiting". But if you don't use this internal buffer (which is undesirable for certain apps; let me know if you need more explanation), you need to determine readability from the kernel. The normal way of doing that is calling read() and providing a buffer. Alternatively, you could mimic Node's approach with internal buffers and do a dummy read() with MSG_PEEK and a 1-byte buffer (to minimize the buffer copies) to determine the read/wait state, but doing this at all is wasteful.

For high-performance systems, you want to minimize the number of system calls you invoke, because they're relatively expensive when you're doing it for each connection and you've got a bunch of connections. So you don't want to issue "dummy" read() calls. You would call read() with the buffer you want to read into, and if the kernel doesn't have data, you use epoll_ctl() to add the socket to the list of sockets to monitor.

Note that writable streams have the analogous issue. If you use an internal buffer, you can always tell whether or not the stream is in a "waiting" state. But if you don't, and you go straight to the kernel, it can always return EAGAIN to tell you to try writing again later, and you use select or epoll to determine when to do so. And unlike read(), where we could use dummy read()s with MSG_PEEK, you can't do a "dummy" write to determine write vs. wait state.
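The retry pattern described here (call read() with the real destination buffer; on "not ready" register interest and retry later with the same buffer) can be simulated in JS. This is purely illustrative: the names are hypothetical, and real code would use C with read()/epoll.

```javascript
// Simulation of the pattern above: the destination buffer is supplied on
// every attempt, so a "not ready" answer (like EAGAIN) costs no dummy
// reads and no staging copies -- we just wait and retry.
const EAGAIN = Symbol('EAGAIN');

// A source that reports "not ready" on every other call, like a socket.
function makeFlakySource(bytes) {
  let pos = 0, ready = false;
  return {
    read(view) {
      if (!ready) { ready = true; return EAGAIN; } // not ready: try again later
      ready = false;
      const n = Math.min(view.length, bytes.length - pos);
      for (let i = 0; i < n; i++) view[i] = bytes[pos + i];
      pos += n;
      return n; // bytes written directly into the caller's view
    },
  };
}

// Stand-in for "add the fd to the epoll set and wait": here, just a timer.
const waitReadable = () => new Promise(resolve => setTimeout(resolve, 0));

async function readRetrying(source, view) {
  for (;;) {
    const n = source.read(view); // same buffer supplied up front, every time
    if (n !== EAGAIN) return n;
    await waitReadable();        // epoll_wait() analogue
  }
}

const out = new Uint8Array(4);
readRetrying(makeFlakySource([7, 8, 9, 10]), out)
  .then(n => console.log(n, Array.from(out)));
```

The key property is that the "not ready" path allocates nothing and copies nothing, which is exactly what the wait() + sync readInto() split cannot express.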
So, we use
But this still doesn't work. To determine the Another problem is determination of whether the result of Hmm, can we use this to give the
The What you pointed out here will be important if we want to implement some class with the same interface as the
"done" may mean writing all data to the kernel, or some other operation in general. Unlike the readable-stream problem we're discussing, there's no issue of a wasteful memory copy: we can just have a queue to hold pending ArrayBuffers, I think.
Can't
Right, this is the copy we want to avoid. I think the cost here will be much worse than the cost of having to move work from sync to async. So at this point I am feeling that the best API is promise-returning readInto. The alternative would be some kind of crazy inside-out or C-style API that preserves maximum efficiency. E.g.

This raises the question of whether promise-returning read(), and giving up on wait() + sync read(), might work. I am really hesitant but am willing to be proven wrong by data per #120. (BTW I am at TAG meetings through Wednesday, so that is why work there is slightly stalled.)

This could also solve many of the other issues you mention about state transitions, I suppose? I'd want some solid data first, as this is a big change and we'd have to be sure before making it.
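A minimal sketch of the promise-returning readInto shape being discussed (names hypothetical): because the caller's buffer is handed over up front, a not-yet-ready source can simply delay resolving the promise instead of forcing a dummy copy through internal storage.

```javascript
// Hypothetical promise-returning readInto(): the destination buffer is
// supplied up front, so the source can wait for readiness and then write
// directly into it, with no internal staging buffer.
class DirectByteSource {
  constructor(bytes) {
    this.data = new Uint8Array(bytes);
    this.pos = 0;
  }
  readInto(arrayBuffer, offset, size) {
    // Simulate "not ready yet": resolve on a later turn, then copy once,
    // straight into the caller's buffer. Reads are capped at 3 bytes to
    // force multiple round trips.
    return new Promise(resolve => {
      setTimeout(() => {
        const n = Math.min(size, this.data.length - this.pos, 3);
        new Uint8Array(arrayBuffer, offset, n)
          .set(this.data.subarray(this.pos, this.pos + n));
        this.pos += n;
        resolve(n); // bytes actually read; 0 would signal EOF
      }, 0);
    });
  }
}

// Fill a single contiguous buffer with repeated reads.
async function readFully(source, arrayBuffer) {
  let offset = 0;
  while (offset < arrayBuffer.byteLength) {
    const n = await source.readInto(arrayBuffer, offset,
                                    arrayBuffer.byteLength - offset);
    if (n === 0) break; // EOF before the buffer was full
    offset += n;
  }
  return offset; // total bytes read
}

const buf = new ArrayBuffer(5);
readFully(new DirectByteSource([10, 20, 30, 40, 50]), buf)
  .then(n => console.log(n, Array.from(new Uint8Array(buf)))); // n === 5
```

The cost moved from an extra memcpy per read to an extra promise per read, which is the sync-to-async trade-off weighed above.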
I think the idea we had was that instead of having the writable stream signal how much data it wants, and pipeTo pushing that much data into it, we should use the pipeFrom idea from #146 to allow the writable stream to pull the appropriate amount of data. What do you think? Again, I haven't had time to read through your reply in #146, so I am probably missing something.
I agree, definitely.
Good catch, I will update that post.
Yes, it's the copy to the internal buffer that we want to avoid if we want to:
It's true that a
I'd like to complement this proposal by applying it also to the initial readInto(arraybuffer, offset = 0, maxDesired = dest.byteLength - offset)
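The defaulting rule in that signature can be sketched as below (a hypothetical helper, not part of any proposal): offset defaults to 0, and maxDesired defaults to the room remaining after offset.

```javascript
// Sketch of the defaulting rule in the quoted signature:
// readInto(arraybuffer, offset = 0, maxDesired = arraybuffer.byteLength - offset)
function normalizeReadIntoArgs(arrayBuffer, offset = 0,
                               maxDesired = arrayBuffer.byteLength - offset) {
  if (offset < 0 || offset > arrayBuffer.byteLength) {
    throw new RangeError('offset out of range');
  }
  if (maxDesired < 0 || offset + maxDesired > arrayBuffer.byteLength) {
    throw new RangeError('maxDesired exceeds available room');
  }
  return { offset, maxDesired };
}

console.log(normalizeReadIntoArgs(new ArrayBuffer(16)));    // { offset: 0, maxDesired: 16 }
console.log(normalizeReadIntoArgs(new ArrayBuffer(16), 4)); // { offset: 4, maxDesired: 12 }
```

Note that a JS default-parameter expression may reference earlier parameters, so the maxDesired default falls out directly.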
In this sentence, you're talking about your bullet point 2, i.e. cases where we want to use Sorry, but I want to sort out the problems at the different levels of optimization we're trying to solve.
Revised version of #111 (comment)

The underlying sink implements ReadableByteStream

Properties of the ReadableByteStream prototype

readInto(arraybuffer, offset, size)
Moved to https://github.com/tyoshino/streams/blob/bytestream/BinaryExtension.md |
<3
Added reference implementation. Test coverage is not so high yet, but maybe enough for an initial commit.
Closing as progress on this is happening and is tracked by other issues. |
@dherman brought this up originally, and then it came up in a conversation with @willchan and @slightlyoff.
The idea would be something like:
There's lots of bikeshedding that could be done here (overload instead of separate method? What does the prior art in other languages and libraries do? How to handle overflows? etc.). But the argument is quite sound in principle: it would allow you to allocate a single contiguous array buffer, then fill it with multiple reads from the stream.
The alternative now would be concatenating array buffers, which (a) probably involves a copy in all existing browsers; (b) even if it were optimized (in a similar manner to string concatenation), would cause memory fragmentation/cache locality problems compared to the single contiguous buffer.
This would be for binary (and string?) streams only. It may be a good use case for a BinaryReadableStream subclass.