
Improve throughput of read streams by transferring multiple records at once #70

Closed
@vweevers

Description


Working on the level-bench benchmarks got me thinking. Currently level-iterator-stream ignores the size argument of stream._read(size). Per tick it transfers only 1 db record from the underlying iterator to the stream's buffer. I think we can be smarter about this, by passing the knowledge that size records are requested all the way down to the db (in the case of leveldown, down to the C++, potentially replacing its current read-ahead cache mechanism).

In pseudo-code it would look something like this (ignoring error handling for a moment):

// level-iterator-stream
ReadStream.prototype._read = function (size) {
  var self = this

  // Fetch <size> records from the db, then call "visit" repeatedly within a tick
  this._iterator.visit(size, function visit (record) {
    // Record is either null, an object { key, value }, or just a key or value
    self.push(record)
  })
}
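For reference, here's a minimal sketch (just an illustration, not a proposal for the final shape) of how such a visit() could be shimmed in JS on top of the existing next() callback API of abstract-leveldown. A native implementation in leveldown's C++ would instead fetch the whole batch in one call:

var AbstractIterator = require('abstract-leveldown').AbstractIterator

// Hypothetical shim, ignoring error handling like the pseudo-code above
AbstractIterator.prototype.visit = function (size, fn) {
  var self = this
  var remaining = size

  this.next(function loop (err, key, value) {
    // abstract-leveldown signals the end of the iterator with an
    // undefined key and value; mirror that with a null record
    if (key === undefined && value === undefined) return fn(null)

    fn({ key: key, value: value })

    // Keep fetching until <size> records have been visited
    if (--remaining > 0) self.next(loop)
  })
}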

This also avoids allocating 3 callback functions per record. Alternatively:

this._iterator.nextv(size, function (records) { // aka "chunks" in streams
  for (let record of records) self.push(record)
})
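With error and end handling filled in (assuming nextv() calls back error-first and with an empty array once the iterator is exhausted, which is just my guess at the contract), the _read implementation might become:

ReadStream.prototype._read = function (size) {
  var self = this

  this._iterator.nextv(size, function (err, records) {
    if (err) return self.destroy(err)

    // Assumed contract: an empty array means the iterator is exhausted
    if (records.length === 0) return self.push(null)

    for (var i = 0; i < records.length; i++) {
      self.push(records[i])
    }
  })
}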

Or if streams were to get a .pushv method similar to .writev:

this._iterator.nextv(size, function (records) {
  self.pushv(records)
})
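Lacking a real .pushv(), it could be approximated in userland with a helper like the one below (hypothetical; the open question is whether a native pushv in streams could shave off per-chunk overhead):

// Hypothetical helper: push an array of chunks and preserve the
// backpressure signal by returning the result of the last push()
function pushv (stream, chunks) {
  var keepReading = true

  for (var i = 0; i < chunks.length; i++) {
    keepReading = stream.push(chunks[i])
  }

  return keepReading
}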

/cc @mcollina: could such an API be faster? I'm also wondering how _read() behaves when the stream is consumed as an async iterator. Is size always 1 in that case, or does the stream read ahead?
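For context, by async iterator consumption I mean something like:

// Node.js readable streams expose Symbol.asyncIterator,
// so a read stream can be consumed like this
for await (const record of db.createReadStream()) {
  // Question: does each turn of this loop trigger _read(1), or does
  // the stream read ahead up to its highWaterMark?
  console.log(record)
}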

@peakji @ralphtheninja /cc @kesla
