
Callback APIs should come first. Evented APIs should be scaffolded around them #188

Closed
@bjouhier


I'm posting this as a follow-up to #92, #89 and #153. This issue is particularly acute around streams, but the layering of events on top of callbacks is actually a more general issue in node APIs (the connect event, for example, could/should be layered on top of a connect(cb) call).

Counter truth 1: Stream APIs must be evented.

Wrong: read(cb) is sufficient for a readable stream API and write(data, cb) is sufficient for a writable stream. With these APIs, pipe can be implemented in 3 lines of code.
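To make this concrete, here is a minimal sketch of such a pipe, written with raw callbacks (so it takes a few more lines than it would with generators or promises). The arrayReader and arrayWriter helpers are hypothetical in-memory stand-ins invented for the illustration; they call back synchronously, which a real source would not do.

```javascript
// Hypothetical in-memory reader: read(cb) yields each item, then undefined at the end.
function arrayReader(items) {
  var i = 0;
  return {
    read: function (cb) {
      cb(null, i < items.length ? items[i++] : undefined);
    },
  };
}

// Hypothetical in-memory writer: write(data, cb) collects items; undefined marks the end.
function arrayWriter() {
  var result = [];
  return {
    result: result,
    ended: false,
    write: function (data, cb) {
      if (data === undefined) this.ended = true;
      else result.push(data);
      cb(null);
    },
  };
}

// pipe: read one item, write it, and repeat until the end marker has been written.
function pipe(reader, writer, cb) {
  (function next() {
    reader.read(function (err, data) {
      if (err) return cb(err);
      writer.write(data, function (err) {
        if (err || data === undefined) return cb(err);
        next();
      });
    });
  })();
}

var writer = arrayWriter();
var pipeError = 'not called yet';
pipe(arrayReader([1, 2, 3]), writer, function (err) { pipeError = err; });
```

The writer only asks for the next item once its own callback has fired, so a slow writer automatically slows the reader down.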

Counter truth 2: Counter truth 1 is true because a callback API cannot handle backpressure.

Wrong again. It is all a question of encapsulation: the low-level resource you are interacting with may have an evented API (pause/resume on the reading side, drain on the writing side), but you can encapsulate this low-level event handling into a callback API (read(cb) and write(data, cb)).
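On the writing side, the encapsulation could look roughly like the sketch below. It assumes a low-level sink whose write(data) returns false when it is full and which emits 'drain' when it can accept more, which is the convention of node's writable streams. The fakeSink helper and its flush method are hypothetical and exist only to drive the example.

```javascript
var EventEmitter = require('events');

// Wrap an evented sink into a single write(data, cb) call.
function callbackWriter(sink) {
  return {
    write: function (data, cb) {
      if (sink.write(data)) return cb(null);          // room left: complete at once
      sink.once('drain', function () { cb(null); });  // full: complete after 'drain'
    },
  };
}

// Hypothetical sink for the demonstration: it always reports "full"
// and only drains when flush() is called.
function fakeSink() {
  var sink = new EventEmitter();
  sink.chunks = [];
  sink.write = function (data) {
    sink.chunks.push(data);
    return false;
  };
  sink.flush = function () { sink.emit('drain'); };
  return sink;
}

var sink = fakeSink();
var done = false;
callbackWriter(sink).write('a', function (err) { done = true; });
var doneBeforeDrain = done; // the callback has not fired yet
sink.flush();               // now it fires
```

The caller never sees 'drain'; it just gets its callback later when the sink is slow.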

Once you have the callback APIs you don't need to worry about backpressure. It will happen naturally through the event loop. You will reason in terms of buffering rather than backpressure. Think about it: if data comes in fast you'll have to buffer it and then pause the input. Then, when someone calls your read, you can resume the stream to refill the buffer. Backpressure is handled inside the read call; it does not need to be exposed to the consumers of your stream. In other words, what comes in must come out: if nobody's ready to read, all you can do is buffer, and you'd better close the tap.
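A rough sketch of that buffering logic on the reading side, assuming a low-level source that emits 'data' and 'end' and supports pause() and resume(). The highWaterMark parameter and the fakeSource helper are illustrative assumptions, and 'error' events are left out to keep the sketch short.

```javascript
var EventEmitter = require('events');

// Wrap an evented source into a single read(cb) call.
function callbackReader(source, highWaterMark) {
  var buffer = [];     // chunks received but not yet read
  var ended = false;
  var pending = null;  // callback of a read that is waiting for data

  source.on('data', function (chunk) {
    if (pending) { var cb = pending; pending = null; return cb(null, chunk); }
    buffer.push(chunk);
    if (buffer.length >= highWaterMark) source.pause(); // close the tap
  });
  source.on('end', function () {
    ended = true;
    if (pending) { var cb = pending; pending = null; cb(null, undefined); }
  });

  return {
    read: function (cb) {
      if (buffer.length > 0) {
        var chunk = buffer.shift();
        if (buffer.length < highWaterMark) source.resume(); // reopen the tap
        return cb(null, chunk);
      }
      if (ended) return cb(null, undefined);
      pending = cb; // nothing buffered: wait for the next 'data' or 'end'
      source.resume();
    },
  };
}

// Hypothetical source for the demonstration: it only records its paused state.
function fakeSource() {
  var source = new EventEmitter();
  source.paused = false;
  source.pause = function () { source.paused = true; };
  source.resume = function () { source.paused = false; };
  return source;
}

var source = fakeSource();
var reader = callbackReader(source, 2);
source.emit('data', 'a');
source.emit('data', 'b');
var pausedWhenFull = source.paused;  // buffer reached the limit of 2
var got = [];
reader.read(function (err, chunk) { got.push(chunk); });
var pausedAfterRead = source.paused; // the read made room again
source.emit('end');
reader.read(function (err, chunk) { got.push(chunk); });
reader.read(function (err, chunk) { got.push(chunk); });
```

The consumer only ever calls read; pause and resume stay hidden inside the wrapper.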

Truth 1: A callback API is easy to document, an evented one is not.

With a callback API, you just need to document what the method does and what its parameters are; you don't even need to say that cb will be called as cb(err) if there is an error and cb(null, result) otherwise because this is a general rule in node. You don't even need to document that cb will only be called once when the operation completes (successfully or not) because this too is a general rule.

With an evented API you need to document all the events and their parameters, but this is not all: you also need to document how the events are sequenced and what assumptions the consumer of the API can make about their sequencing. If you are on the producer side (implementing a stream) you must make sure that you meet these sequencing rules. This is the part that gets really tricky and is the source of so many questions/issues.

Truth 2: It is very easy to scaffold an evented API around a callback API. The reverse is more difficult.

Proof:

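// Wraps a callback-style read(cb) so that each call also emits the matching event.
// The returned function must be installed as a method of an EventEmitter.
function eventify(read) {
  return function(cb) {
    var self = this;
    // Keep `this` so that read can itself be a method of the stream.
    read.call(self, function(err, data) {
      if (err) self.emit('error', err);
      else if (data === undefined) self.emit('end'); // undefined marks the end of the stream
      else self.emit('data', data);
      cb(err, data);
    });
  };
}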

Truth 3: Rigorous error handling is possible (even easy) with a callback API.

With raw callbacks this is still tricky (but possible). It becomes easy with promises, generators and some of the other async solutions.

The big difference between a callback API and an evented API is that pipe will naturally take a callback in a callback API. The signature will be reader.pipe(writer, cb). The callback is called when the pipe operation completes. If the pipe fails cb will receive the error.

Also, it is better to have separate transform and pipe calls. The transform calls do not take a callback; they just pass errors along the chain. Only the pipe call takes a callback, and it is always at the end of the chain. So the chain looks like source.transform(t1).transform(t2).pipe(writer, cb);

No error is lost in such a chain. If something fails, the error will always end up in the pipe callback.
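Here is a small sketch of how errors can travel down such a chain. The makeStream helper is a hypothetical minimal wrapper written for this illustration (it is not the ez-streams API), and the transform functions are assumed to have the async signature fn(data, cb).

```javascript
// Hypothetical minimal stream: an object with read(cb) plus two chainable helpers.
function makeStream(read) {
  return {
    read: read,
    // transform(fn) returns a new stream and takes no callback of its own.
    transform: function (fn) {
      var upstream = this;
      return makeStream(function (cb) {
        upstream.read(function (err, data) {
          if (err || data === undefined) return cb(err, data); // pass errors and end along
          fn(data, cb);
        });
      });
    },
    // pipe(writer, cb) is the only call that takes a callback.
    pipe: function (writer, cb) {
      var self = this;
      (function next() {
        self.read(function (err, data) {
          if (err) return cb(err);
          writer.write(data, function (err) {
            if (err || data === undefined) return cb(err);
            next();
          });
        });
      })();
    },
  };
}

var items = [1, 2, 3];
var source = makeStream(function (cb) { cb(null, items.shift()); });
var written = [];
var writer = {
  write: function (data, cb) {
    if (data !== undefined) written.push(data);
    cb(null);
  },
};

var pipeError = null;
source
  .transform(function (n, cb) { cb(null, n * 10); })
  .transform(function (n, cb) {
    if (n === 20) cb(new Error('boom at ' + n)); // fails on the second item
    else cb(null, n + 1);
  })
  .pipe(writer, function (err) { pipeError = err; });
```

The failure in the second transform is never handled in the middle of the chain; it surfaces in the pipe callback, after the first item has been written.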

Truth 4: a stream API can be content agnostic

No need to distinguish a text mode, a binary mode and an object mode at the stream level. The core API can just handle data in an agnostic way. The only thing that's needed to keep the API simple is a sentinel value for the end of stream: undefined is the natural candidate.

Truth 5: a callback API lends itself naturally to a monadic, extensible API.

With a callback API, all it takes to implement a readable stream is a simple read(cb) call. All the fancier stream APIs can be scaffolded around this single call, in a monadic style.

The monadic API will combine two flavors of calls: reducer calls (like pipe) that terminate a chain and take a continuation callback parameter and non-reducer calls (like transform) that produce another stream and can be chained. A chain that only contains non-reducers does not pull anything; it just lazily sits there. The reducer at the end of the chain triggers the pull.
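The laziness can be illustrated with a short sketch. The lazyStream wrapper, its map non-reducer and its toArray reducer are hypothetical names chosen for this example, and map is kept synchronous here only for brevity.

```javascript
// Hypothetical stream wrapper: read(cb) yields undefined at the end of the stream.
function lazyStream(read) {
  return {
    read: read,
    // Non-reducer: returns a new stream and reads nothing by itself.
    map: function (fn) {
      var upstream = this;
      return lazyStream(function (cb) {
        upstream.read(function (err, data) {
          if (err || data === undefined) return cb(err, data);
          cb(null, fn(data));
        });
      });
    },
    // Reducer: terminates the chain, takes the continuation callback, triggers the pulls.
    toArray: function (cb) {
      var self = this, result = [];
      (function next() {
        self.read(function (err, data) {
          if (err) return cb(err);
          if (data === undefined) return cb(null, result);
          result.push(data);
          next();
        });
      })();
    },
  };
}

var reads = 0; // counts how often the underlying source is actually pulled
var items = ['a', 'b'];
var chain = lazyStream(function (cb) { reads++; cb(null, items.shift()); })
  .map(function (s) { return s.toUpperCase(); })
  .map(function (s) { return s + '!'; });
var readsBeforeReducer = reads; // nothing has been pulled yet

var output;
chain.toArray(function (err, result) { output = result; });
```

Building the chain does not touch the source; only the toArray call at the end starts pulling.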

Wrapping up

The tone is probably not right but I would like to shake the coconut tree. I just see all these discussions going forever around streams being complex, error handling being problematic, etc., when there is a simple way to address all this.

I know that I'm touching a sensitive point and that this may well get closed with a laconic comment.

If streams were simple and well understood and if error handling was not a problem any more I would not post this. But this is not the case and the debates are coming back (see recent discussions). I know that it is "very late" to change things but apparently io.js is there to shake things up and maybe take some new directions. So I'm trying one last time.

I have a working implementation of all this (https://github.com/Sage/ez-streams) and we are using it extensively in our product. So this is not a fantasy. My goal is not to have core take it literally, just to consider the concept behind it.

Note: there are lots of similarities with @dominictarr's event-stream (https://github.com/dominictarr/event-stream) and with lazy.js (http://danieltao.com/lazy.js/). The main difference is that all the functions that may be grafted on the chain (transforms, filters, mappers) are async functions by default in ez-streams.


Labels: stream (Issues and PRs related to the stream subsystem.)
