Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream: add _end method to write streams (equivilent to _flush) #2314

Closed
wants to merge 1 commit into from

Conversation

calvinmetcalf
Copy link
Contributor

Adds the ability to for write streams to have an _end method which acts
similarly to the _flush method that transform streams have but is called before
the finish event is emitted and if asynchronous delays the stream from
finishing. The end option may also be passed in order to set it.

Inspired to finally write this after losing a day to an error that ended up being because I was listening to finish but writing something in flush.

Method name is bikeshedable

previous discussion here nodejs/readable-stream#112

@brendanashworth brendanashworth added stream Issues and PRs related to the stream subsystem. semver-minor PRs that contain new features and should be released in the next minor version. labels Aug 6, 2015
by child classes, and if so, will be called by the internal Writable
class methods only.

When the stream ends this function will be called before the stream the stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double the stream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed thanks!

@indutny
Copy link
Member

indutny commented Aug 8, 2015

Is it the same as indutny@f3503ba ? Or do we need to replace this in io.js?

@calvinmetcalf
Copy link
Contributor Author

The difference is that it blocks prefinish to make it more predictable when
it's async

On Sat, Aug 8, 2015, 6:44 PM Fedor Indutny notifications@github.com wrote:

Is it the same as indutny/io.js@f3503ba
indutny@f3503ba
? Or do we need to replace this in io.js?


Reply to this email directly or view it on GitHub
#2314 (comment).

@ronkorving
Copy link
Contributor

Oh I like this, very useful when a written stream needs footer data.

@chrisdickinson
Copy link
Contributor

This seems (superficially) very close to _flush in Transform, though that method does something completely different (instead of preventing finish, it gates ending the readable side of the stream.) It feels like there's potential for confusion here. If we were to introduce this mechanism, it'd be ideal if Transform (& other "prefinish" users) could take advantage of it as well.

Slightly more worrying to me: the problem this solves is "the writable needs to produce data before it closes." This seems antithetical to what a writable should be doing — consuming data. It could be achieved by putting a transform between the readable source of data and its ultimate destination (or by nesting the transform inside the readable and piping directly from readable → writable.) Maybe it's a matter of documenting this more explicitly: "Node does not offer a mechanism for writable streams to produce their own data, or otherwise write to themselves, absent a transform stream."

@calvinmetcalf
Copy link
Contributor Author

transform would be able to make use of it, and that would likely be something prefinish things would want to move to, but it's not backwards compatible.

you can get this behavior with a transform stream but it's somewhat more complex as you need to write to the transform stream but listen to the finish event from the writable stream, if you want to make this available as a purely writable stream (from an api consumers perspective) there are some hacky work arounds to do so.

Some concrete examples include

  • a writable stream to upload a large amount of file to s3/google cloud storage, instead of doing it one at a time to decrease latency we want to have multiple uploads 'in flight' at the same time, there are a couple relatively strait forward ways to do with with transform streams but all of them break how the 'finish' event work (aka if you where to do .on('finish', function () {process.exit()}) that wouldn't guarantee that all your data would be written.
  • a writable stream to upload to google maps engine, due to a couple reasons including the pricing policy we wanted to minimize the number of api calls to insert data (this was a client ask) so we needed to batch insert the data into calls of at most 50, again could do with with a transform but it's hacky or breaks how write streams should work.

some less concrete examples include writing to a data source where you need to close it asynchronously at the end or add a trailer to some sort of data source with non streaming access, the key is not that it is hard to do it inside of your own app, the hard part is when exposing it as a reusable api, what I'm trying to avoid is where when you stream to a file you listen for finish but when you stream to s3 then you need to listen for end.

@calvinmetcalf
Copy link
Contributor Author

@indutny there is a difference between your implementations besides timing, yours is on _writableState with the signature stream, callback while mine is on the stream with just the callback as an argument

@calvinmetcalf calvinmetcalf force-pushed the stream-write-end branch 3 times, most recently from 1c38a26 to 149d3ae Compare August 13, 2015 19:08
@calvinmetcalf
Copy link
Contributor Author

@indutny added the test from your commit into this (modified to not use writableState) and fixed this so it will actually pass that (previously didn't handle ._write callbacks)

@indutny
Copy link
Member

indutny commented Aug 14, 2015

LGTM

@@ -144,6 +145,9 @@ function Writable(options) {

if (typeof options.writev === 'function')
this._writev = options.writev;

if (typeof options.end === 'function')
this._end = options.end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a Symbol here instead of this._end?

The thought is that we can encourage new-style constructor use while also avoiding overloading a property that might be in use in the wild.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could but

  • it would be inconsistent with how the other methods work
  • after making the change the next thing I'd have to do is make a regex to undo it for readable-stream

@chrisdickinson
Copy link
Contributor

I'm still not 100% comfortable with this change.

My discomfort stems from:

  • Adding a new state to an already complex state machine.
    • This state fills roughly the same role as both "prefinish" and _flush, with subtle differences.
  • Overriding the use of a publicly available property (we could avoid this with a Symbol, potentially?)
  • The problem seems addressable through means other than adding a new state to Writable streams.

In both stated cases, the problem seems like it could be solved by nesting a transform and a writable inside an exposed writable:

source -> outerWritable
         /    \
      txf  -> innerWritable

The writable side of the Transform stream is exposed on the outerWritable, and the innerWritable isn't exposed to the outside world at all. The buffering (in the Google API case) or the emission of final framing data would be handled by txf.

@calvinmetcalf
Copy link
Contributor Author

The difference between this and prefinish/flush is that this acts like people tend to assume flush works and how you'd probably make flush work if you could do it all over. I don't advocate keeping both this and flush long term for different uses rather a next step could be to depreciated flush and replace it with end in the documentation.

@chrisdickinson
Copy link
Contributor

The difference between this and prefinish/flush is that this acts like people tend to assume flush works and how you'd probably make flush work if you could do it all over.

I don't think it's how I would make flush work — writable streams currently don't hold onto any data that would necessitate a flush. They also don't generate data (IOW, they don't .write to themselves.) Adding an _end to Writable encourages the writable to write data to itself. I think "prefinish" also suffers from this, a bit — maybe the answer isn't "prefinish" or _end; maybe we should introduce better primitives for nesting streams?

@calvinmetcalf
Copy link
Contributor Author

I don't think it's how I would make flush work — writable streams currently don't hold onto any data that would necessitate a flush.

I was referring to how unintuitive it is that the finish event is not blocked by flush meaning that the semantics of whether a stream you are writing to is done are different depending on whether it is a transform or a writable leading to a refactor hazard.

The other thing besides just caching and batching is resource shutdown, nesting stream primitives would likely be overkill for a 'shut this down async' in a writable stream.

@calvinmetcalf
Copy link
Contributor Author

so still think this is a good idea, if for no other reason than whatwg streams also support this method for the purpose of 'asynchronously closing the underlying resources'

@jasnell
Copy link
Member

jasnell commented Nov 16, 2015

@chrisdickinson @calvinmetcalf ... any further thoughts on this one?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Nov 16, 2015
@calvinmetcalf
Copy link
Contributor Author

I'm still in favor of this and would find it super useful

@jasnell
Copy link
Member

jasnell commented Nov 16, 2015

@calvinmetcalf ... could I ask you to take a moment to rebase and update then?

@calvinmetcalf calvinmetcalf force-pushed the stream-write-end branch 2 times, most recently from a50f752 to 3aea1c6 Compare November 16, 2015 17:49
@olalonde
Copy link
Contributor

@mcollina thanks for that link. @nodejs/streams is the streams working group? How do I join it?

@mcollina
Copy link
Member

@olalonde open an issue on https://github.com/nodejs/readable-stream, and we'll move it from there! :) Just start contributing.

Moving this forward would be a nice contribution!

@mafintosh
Copy link
Member

I see this is tagged as a minor bump. This should be a major bump IMO. I have a bunch of streams that use ._end as an internal method that this would break.

@mcollina mcollina added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Sep 12, 2016
@jasnell
Copy link
Member

jasnell commented Sep 12, 2016

Just a heads up, as a semver-major if the hope is to land this in v7.x, it would need to land today before noon. It could land after the v7.x branch is cut with CTC approval but the goal is to set that bar very high so that we can get v7.x as stable as possible before the release in October.

@mcollina
Copy link
Member

@jasnell this will likely go into v8, there is definitely no time for v7.

@mcollina
Copy link
Member

I would like this (or similar) to land in Node.js v8.

@calvinmetcalf what do you think? should we do a fresh PR/implementations, or update this?

cc @yoshuawuyts @chrisdickinson @thealphanerd

@MylesBorins
Copy link
Contributor

@mcollina are you still thinking about this?

@mcollina
Copy link
Member

@thealphanerd definitely yes. It should happen, it would fix a lot of issues on implementations.

@shaoshuai0102
Copy link

Any progress on this? ping @calvinmetcalf

@fhinkel
Copy link
Member

fhinkel commented Mar 26, 2017

Feel free to reopen this, if you get back to working on it.

@fhinkel fhinkel closed this Mar 26, 2017
@fhinkel fhinkel reopened this Mar 26, 2017
@fhinkel fhinkel closed this Mar 26, 2017
@mcollina
Copy link
Member

@fhinkel, this is something we are still discussing and we would like to do, I'm reopening.

cc @nodejs/streams.

@lrlna
Copy link
Contributor

lrlna commented May 4, 2017

from the @nodejs/streams wg meeting:

  • rename to ._final to signify a writable stream end the same way ._flush works for a readable
  • run through citgm to make sure ._final doesn't break; this would pull this down to a semver-minor
  • after this is merged, consider a documentation deprecation warning

@calvinmetcalf calvinmetcalf mentioned this pull request May 4, 2017
4 tasks
@calvinmetcalf
Copy link
Contributor Author

superseded by #12828

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-major PRs that contain breaking changes and should be released in the next major version. stalled Issues and PRs that are stalled. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.