Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc, stream: incomplete readable._read description #45072

Open
Antonius-S opened this issue Oct 19, 2022 · 7 comments
Open

doc, stream: incomplete readable._read description #45072

Antonius-S opened this issue Oct 19, 2022 · 7 comments
Labels
doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem.

Comments

@Antonius-S
Copy link

Affected URL(s)

https://nodejs.org/dist/latest-v19.x/docs/api/stream.html#readable_readsize

Description of the problem

Now docs say

When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method.

but they don't say what to do if data is not yet available from the resource. Also, the overall text creates an impression (at least for me) that it's enough to implement this method only for the stream to work. Only the example for readable.push sheds some light on how it is supposed to deal with resources that are being written while reading. If that's a recommended way, I think it should be mentioned as such and also underline that this way fits for Node's Readable streams as well.

@Antonius-S Antonius-S added the doc Issues and PRs related to the documentations. label Oct 19, 2022
@VoltrexKeyva VoltrexKeyva added the stream Issues and PRs related to the stream subsystem. label Oct 20, 2022
@juanarbol
Copy link
Member

cc @nodejs/streams

@benjamingr
Copy link
Member

PR welcome :]

@Antonius-S
Copy link
Author

PR welcome :]

I'd create it if I knew how things supposed to work! Stream API is too overburdened with many hidden catches. Someone with true knowledge has to explain things first.

@mcollina
Copy link
Member

but they don't say what to do if data is not yet available from the resource. Also, the overall text creates an impression (at least for me) that it's enough to implement this method only for the stream to work.

This is not how things work. It's totally enough to implement _read() and call this.push() within it.

Only the example for readable.push sheds some light on how it is supposed to deal with resources that are being written while reading.

I don't understand this sentence. Can you make an example, ideally with code?


I think the source of confusion is that _read() is called to "this.push() some data" at the earliest convenience of the implementor.

In order to safely (= without memory issues) implement a Readable you need to implement _read() to push() some data. Using only one of the two will inevitably lead to memory problems.

@Antonius-S
Copy link
Author

@mcollina thanks for answering!

This is not how things work. It's totally enough to implement _read() and call this.push() within it.

Well, ok, suppose I have this

'use strict';

const fs = require('fs');
const stream = require('stream');

class ReadableWrapper extends stream.Readable
{
  constructor(options, stm)
  {
    // Fix for Node < 14 bug with wrong autoDestroy init
    if (options && options.autoDestroy === undefined)
      options.autoDestroy = true;
    super(options);
    this._stream = stm;
  }

  _read(size)
  {
    const buf = this._stream.read(size);
    console.log('read: ', buf ? buf.length : 'null');
    this.push(buf);
  }
}

const rd = new ReadableWrapper({}, fs.createReadStream('somebigfile.bin'));
rd.on('data', (data) => console.log(data.length));

and when launching we only get

read:  null

output. On the 1st read the source stream isn't ready yet so it returns null and the stream ends. Predictable.
But if I follow the logic (like: no data => wait, got data => write it) and try pushing null only if the source stream has ended:

  _read(size)
  {
    const buf = this._stream.read(size);
    console.log('read: ', buf ? buf.length : 'null');
    if (buf !== null)
      this.push(buf);
    else if (this._stream.readableEnded)
      this.push(null);
  }

nothing changes! _read is called only once.
Now trying to adapt what push() example says

'use strict';

const fs = require('fs');
const stream = require('stream');

class ReadableWrapper extends stream.Readable
{
  constructor(options, stm)
  {
    // Fix for Node < 14 bug with wrong autoDestroy init
    if (options && options.autoDestroy === undefined)
      options.autoDestroy = true;
    super(options);
    this._stream = stm;
    this._stream.pause();

    // Every time there's data, push it into the internal buffer.
    this._stream.ondata = (chunk) => {
      console.log('on data');
      // If push() returns false, then stop reading from source.
      if (!this.push(chunk))
        this._stream.pause();
    };

    // When the source ends, push the EOF-signaling `null` chunk.
    this._stream.onend = () => {
      console.log('on end');
      this.push(null);
    };
  }

  _read(size)
  {
    console.log('read');
    this._stream.resume();
  }
}

const rd = new ReadableWrapper({}, fs.createReadStream('somebigfile.bin'));
rd.on('data', (data) => console.log(data.length));

and... only one 'read', that is, _read was called once. So I'm completely lost in how Node streams are supposed to be wrapped.

I don't understand this sentence. Can you make an example, ideally with code?

I mean streams receiving data while we read it (sockets, pipes, live log files and so on). IOW, a stream that might return null at some moment but later some data could arrive.

@mcollina
Copy link
Member

Here is the first recommendation: do not wrap a stream. If you need another stream, create a PassThrough and pipeline() them together. I might suspect you are trying to clone the stream: use http://npm.im/cloneable-readable instead.

Second, take a look at the docs about the read() method and 'readable' event: you are missing a critical part on how you read() manually from a stream. In your readable event listener, you. Looking at your examples, it seems you don't have a clear distinction between flowing and non flowing mode of a stream.

Third, take a look at the simpler APIs: async iterators, Readable.from() and all the other novelties we shipped recently. Unfortunately, working at a lower level with streams is hard and we cannot do much to fix it.

@Antonius-S
Copy link
Author

Antonius-S commented Oct 27, 2022

I might suspect you are trying to clone the stream

No, actually I'm trying to create something similar to http.Response that is Duplex itself but relies on another stream for data transfer. The underlying stream is supposed to be socket but I used file for simplicity and was trying to implement the most hard Readable part. Writable is much easier.

I'll take a look at the stuff you advised, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants