Description
I hope this is the right place to ask the question and that this hasn't already been discussed to death elsewhere. Feel free to direct me elsewhere or close the issue if I've missed something.
In short, I am one of the maintainers of exceljs (which is basically a transform stream taking in a read stream, unzipping its contents, running an XML parser on the unzipped chunks, and then emitting back Excel-related events), and we're in the process of adding support for async iteration via `for await...of` (exceljs/exceljs#1135).
In doing that, we've noticed that `for await...of` is significantly slower than the current `.on('data', ...)`-based approach. Our benchmark is not a microbenchmark but a full end-to-end benchmark, including creating and analyzing Excel objects in memory (exceljs/exceljs#1139). Switching to `for await...of` (vs. handling the events in sync callbacks) decreased performance by around 60%.
I have debugged this issue (lddubeau/saxes#32). In short, it arises because for every chunk/event passed into our transform stream, we emit an order of magnitude more chunks/events. What's causing the performance difference is that the callback-based code runs through these emitted chunks/events mostly synchronously, whereas the current implementation of `Symbol.asyncIterator` on `Readable` calls `setImmediate` between each event, which is quite expensive. I wrote a simple microbenchmark to compare `for...of` against `for await...of` on the same array or iterator, and the difference is around 10x.
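A minimal sketch of such a microbenchmark (the array size, the summing workload, and the timing approach here are illustrative, not the exact benchmark from the issue):

```javascript
// Compare synchronous iteration with async iteration over the same data.
// The async version pays at least one scheduling hop per element, which
// dominates when the per-element work is tiny.

const data = Array.from({ length: 100_000 }, (_, i) => i);

function syncSum() {
  let sum = 0;
  for (const x of data) sum += x;
  return sum;
}

async function asyncSum() {
  async function* gen() {
    yield* data;
  }
  let sum = 0;
  for await (const x of gen()) sum += x;
  return sum;
}

async function main() {
  let t = process.hrtime.bigint();
  const a = syncSum();
  const syncNs = process.hrtime.bigint() - t;

  t = process.hrtime.bigint();
  const b = await asyncSum();
  const asyncNs = process.hrtime.bigint() - t;

  console.log(`sync:  ${syncNs} ns (sum=${a})`);
  console.log(`async: ${asyncNs} ns (sum=${b})`);
  console.log(`slowdown: ${(Number(asyncNs) / Number(syncNs)).toFixed(1)}x`);
}

main();
```

Note that a bare async generator only costs a microtask per element; iterating a `Readable`, which schedules a `setImmediate` (a full macrotask) per chunk, is more expensive still.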
So we've come up with this 'hack': instead of emitting all of these chunks/events one by one as our transform produces them, we now gather them up in an array and emit that once. Phrased another way: instead of calling `this.push()` for every Excel-related event we produce, we call `this.events.push()` many times per chunk written into our stream (where `this.events` is just an array initialized in the constructor), and then finally call `this.push(this.events)` once we're done consuming the chunk (resetting `this.events` to an empty array again). Clever, but now consuming the stream is ugly. Instead of a single loop, we now write:
```js
// We'd like to write this, but it's slow
for await (const chunk of readable) { ... }

// This is within 2% of the runtime of the callback-based approach,
// but not very ergonomic
for await (const chunks of readable) {
  for (const chunk of chunks) { ... }
}
```
I think this performance issue will bite a lot of people because it's so easy to fall into and, at least to me, came as a surprise. I remember reading that `readline` has similar performance issues (and, similarly to the above, it produces many more events than it takes in), so it would probably also see performance improvements from this approach.
My question boils down to this: Is there a fundamental reason in the spec around async iteration or streams that forces a trip through `setImmediate` even when the read buffer still has data in it (i.e., when `.next()` could be resolved synchronously)? Is it something that V8 can/will eventually optimize? If no to both questions, what should library authors do to give users all the advantages of async iteration without sacrificing performance?
Roping in @BridgeAR as a fellow performance nerd and the only one I know here ;)