
Pipeline aborts all HTTP requests #32105

Closed
szmarczak opened this issue Mar 5, 2020 · 27 comments

Comments

@szmarczak
Member

  • Version: >= 13.10.0
  • Platform: all
  • Subsystem: stream

This bug breaks hundreds of packages, including Got, Yarn and Renovate.

What steps will reproduce the bug?

const {PassThrough, pipeline} = require('stream');
const https = require('https');

const body = new PassThrough();
const request = https.request('https://example.com');

request.once('abort', () => {
	console.log('The request has been aborted');
});

pipeline(
	body,
	request,
	error => {
		console.log(`Pipeline errored: ${!!error}`);
	}
);

body.end();

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

Pipeline errored: false

What do you see instead?

Pipeline errored: false
+The request has been aborted

Additional information

This is the culprit: https://github.com/nodejs/node/pull/31940/files#diff-eefad5e8051f1634964a3847b691ff84R36

->

https://github.com/nxtedition/node/blob/5a55b1df97a1f93c5fb81bf758050d81524ae834/lib/internal/streams/destroy.js#L166

/cc @ronag
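
For readers following the links above, a simplified paraphrase of the destroy logic being pointed at (a sketch, not the actual Node source): pipeline tears every stream down on completion, and a stream that exposes abort(), such as http.ClientRequest, is aborted rather than destroyed, which is what emits the unexpected 'abort' event.

// Simplified paraphrase of the linked destroy logic (not the actual Node
// internals): anything that looks like a ClientRequest is abort()ed when
// pipeline tears its streams down, whether or not an error occurred.
function destroyer(stream, err) {
  if (typeof stream.abort === 'function') {
    // ClientRequest path: aborts even after the body was written successfully.
    return stream.abort();
  }
  if (typeof stream.destroy === 'function') {
    return stream.destroy(err);
  }
}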

@ronag
Member

ronag commented Mar 5, 2020

I'll take a look asap

@ronag
Member

ronag commented Mar 5, 2020

Are you able to create a self-contained test case? I tried the following on master but was unable to make it fail:

  const server = http.createServer((req, res) => {
    setTimeout(() => {
      res.write('ASDASD');
      res.end();
    }, 1e3);
  });

  server.listen(0, () => {
    const req = http.request({
      port: server.address().port
    });

    const body = new PassThrough();
    pipeline(
      body,
      req,
      common.mustCall((err) => {
        assert(!err);
        server.close();
      })
    );
    body.end();
  });

@szmarczak
Member Author

You've missed

request.once('abort', common.mustNotCall());
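
Putting the two snippets together, the failing self-contained test would look roughly like this (a sketch assuming the Node core test harness, where common comes from test/common):

'use strict';
const common = require('../common'); // Node core test helper (assumed location)
const assert = require('assert');
const http = require('http');
const { PassThrough, pipeline } = require('stream');

const server = http.createServer((req, res) => {
  setTimeout(() => {
    res.write('ASDASD');
    res.end();
  }, 1e3);
});

server.listen(0, () => {
  const req = http.request({
    port: server.address().port
  });

  // The missing assertion: the request must not be aborted after the
  // body has been piped into it successfully.
  req.once('abort', common.mustNotCall());

  const body = new PassThrough();
  pipeline(
    body,
    req,
    common.mustCall((err) => {
      assert(!err);
      server.close();
    })
  );
  body.end();
});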

@ronag
Member

ronag commented Mar 5, 2020

Ah, I think this is actually a problem with the http_client; I guess it shouldn't emit 'abort' if it has gracefully ended.

@mcollina

@szmarczak
Member Author

No, you misunderstood me. Let me explain...

@szmarczak
Member Author

In #31940 you introduced this change:

https://github.com/nodejs/node/pull/31940/files#diff-eefad5e8051f1634964a3847b691ff84R36

It calls destroyer(stream, error), which calls request.abort() even though it should not.

@ronag
Member

ronag commented Mar 5, 2020

even though it should not

Why do you think it should not?

@szmarczak
Member Author

pipeline is meant to send data from one stream to another. It shouldn't destroy streams that have been successfully ended. I mean, it should call request.abort() only if an error occurs.

@ronag
Member

ronag commented Mar 5, 2020

pipeline is meant to send data from one stream to another. It shouldn't destroy streams that have been successfully ended. I mean, it should call request.abort() only if an error occurs.

I disagree with this. The current semantics of pipeline is to destroy all streams on completion. This is because not all streams properly call destroy on completion, and the idea of pipeline is to normalize this behavior.

I might be wrong here, and it might also be a bad idea. Why do you think it shouldn't be destroyed?

And if we don't call destroy, how do we properly handle streams that do not properly destroy themselves? Changing this could break things in the other direction, where we end up leaking resources.
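
For concreteness, a minimal sketch of the completion semantics described above (assuming a plain Writable that never destroys itself): once the data has been copied, pipeline() tears both streams down.

const { PassThrough, Writable, pipeline } = require('stream');

const src = new PassThrough();
// A writable that never calls destroy() on its own after finishing.
const dst = new Writable({
  write(chunk, encoding, callback) {
    callback();
  }
});

pipeline(src, dst, (err) => {
  // With the semantics described above, pipeline normalizes teardown:
  // both streams should end up with destroyed === true, even though
  // dst never destroyed itself.
  console.log(err, src.destroyed, dst.destroyed);
});

src.end('hello');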

@ronag
Member

ronag commented Mar 5, 2020

A quick fix to resolve the breakage is to special-case the request and simply not destroy it like the rest. But I don't think that is a long-term solution. Do we have a more fundamental problem?

@ronag
Member

ronag commented Mar 5, 2020

@mcollina: I think we need your guidance here.

@ronag
Member

ronag commented Mar 5, 2020

@szmarczak What is the fundamental breaking change here? That the request is aborted/destroyed or that 'abort' is emitted?

@szmarczak
Member Author

That the request is aborted.

@ronag
Member

ronag commented Mar 5, 2020

That the request is aborted.

If it has completed. Why is this a problem? Aborting a completed request is basically a noop except for the 'abort' event?

Is the problem here that only the writable side has completed when the pipeline callback is invoked and the readable side is still to be consumed?

@szmarczak
Member Author

I disagree with this.

Well, then you should've made an exception for HTTP requests, as this change makes #30869 and #31054 moot, which is a major breaking change.

If it has completed.

No, it hasn't. It has only sent the request, but it hasn't received the response yet.

Is the problem here that only the writable side has completed when the pipeline callback is invoked and the readable side is still to be consumed?

Exactly! :)
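
To make the breakage concrete, the pattern at stake is roughly the following sketch (using example.com as a stand-in, like the original repro): the body is streamed with pipeline(), and the response is consumed afterwards.

const { PassThrough, pipeline } = require('stream');
const https = require('https');

const body = new PassThrough();
const request = https.request('https://example.com', (response) => {
  // On the affected 13.10.x releases the request has already been
  // aborted by pipeline's teardown, so the response side is cut off.
  response.pipe(process.stdout);
});

pipeline(body, request, (error) => {
  // The body was sent fine; the pipeline itself does not error.
  console.log(`Pipeline errored: ${!!error}`);
});

body.end();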

@ronag
Member

ronag commented Mar 5, 2020

@szmarczak: Not trying to be a pain. I agree it's a breaking change. I'm just trying to understand whether this is a more fundamental problem that we've overlooked.

@ronag
Member

ronag commented Mar 5, 2020

Was hoping to try and sort this out quickly but I have to leave. Hopefully I'll be able to continue digging into this later tonight. Sorry for the delay.

@ronag
Member

ronag commented Mar 5, 2020

@mcollina: this sounds to me like we should not destroy the last stream sent to pipeline if it is (still) readable in order to allow composition.
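
A simplified sketch of that proposal (not the actual Node patch that eventually landed): on successful completion, the final stream is left alone if its readable side is still usable.

// Simplified sketch of the proposed behavior (not the actual Node patch):
// only the last stream is treated specially, and only when there was no
// error and its readable side is still usable.
function shouldDestroy(stream, error, isLastStream) {
  if (error) {
    return true; // always tear everything down on error
  }
  if (!isLastStream) {
    return true; // intermediate streams are still destroyed on completion
  }
  const stillReadable =
    stream.readable ||
    (stream._readableState && !stream._readableState.ended);
  return !stillReadable; // keep a still-readable tail alive for composition
}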

@szmarczak
Member Author

Sure. I think that for Duplex streams you should wait until the readable side has ended. But since ClientRequest is not a Duplex stream, a workaround is needed.

And if we don't call destroy, how do we properly handle streams that do not properly destroy themselves?

You may want to write detailed instructions in the docs about this. I see the reason behind destroying read streams, which is a good point!

A quick fix to resolve the breakage is to special-case the request and simply not destroy it like the rest. But I don't think that is a long-term solution.

I think that ClientRequest should be a Duplex stream. The API would be much easier to use. But I agree with you.
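
For illustration only, a rough userland sketch of that idea (a hypothetical requestAsDuplex helper, not a Node API): wrap the request and its eventual response into one Duplex, so that "body written" and "response consumed" look like a single stream to pipeline.

const { Duplex } = require('stream');
const https = require('https');

// Hypothetical helper, for illustration only: expose a ClientRequest plus
// its eventual response as a single Duplex stream.
function requestAsDuplex(url, options) {
  const req = https.request(url, options);
  let res = null;

  const duplex = new Duplex({
    write(chunk, encoding, callback) {
      req.write(chunk, encoding, callback);
    },
    final(callback) {
      req.end(callback);
    },
    read() {
      if (res) res.resume(); // crude backpressure: resume when more data is wanted
    }
  });

  req.on('response', (response) => {
    res = response;
    response.on('data', (chunk) => {
      if (!duplex.push(chunk)) response.pause();
    });
    response.on('end', () => duplex.push(null));
  });
  req.on('error', (err) => duplex.destroy(err));

  return duplex;
}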

@MylesBorins
Contributor

Hey all,

just thinking through appropriate solutions here.

Seems like the simplest thing to do would be to revert #31940, either on master until a non-breaking approach can be found, or alternatively on 13.x only, labeling the PR as Semver-Major and allowing the behavior to land in 14.x (although that seems like kicking the can down the road).

Alternatively if a fix could be found in short order we could do a release with that fix landed.

In either case it would be good to get something out next week to deal with this.

As an aside, we should have caught this in CITGM; it is unfortunate we didn't. It would be good to land a test with the revert / fix to ensure we don't regress here, or if we do, it is very intentional. got is included in CITGM, but is currently set to skip. After this problem is resolved it would likely be a good idea to get it un-skipped / not marked flaky on at least a single platform so we will catch future regressions.

@ronag
Member

ronag commented Mar 5, 2020

I believe I can have a solution that can land on both 13 and master within some hours. Not at a computer at the moment.

ronag added a commit to nxtedition/node that referenced this issue Mar 5, 2020
If the last stream in a pipeline is still usable/readable
don't destroy it to allow further composition.

Fixes: nodejs#32105
ronag added a commit to nxtedition/node that referenced this issue Mar 5, 2020
If the last stream in a pipeline is still usable/readable
don't destroy it to allow further composition.

Fixes: nodejs#32105
Backport-PR-URL: nodejs#32111
ronag closed this as completed in 4d93e10 Mar 8, 2020
@ronag
Member

ronag commented Mar 9, 2020

For those returning to this issue due to breakage in 13.10.x, please see the backport PR here: #32111
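
Until that release is out, one possible userland workaround on the affected 13.10.x versions (a sketch, not something suggested in the thread) is to avoid making the request the final pipeline target and wire the request leg up by hand:

// Sketch of a possible workaround on the affected 13.10.x releases (not
// taken from the thread): use plain .pipe() for the request leg, since
// .pipe() does not abort the destination when the source ends.
const { PassThrough } = require('stream');
const https = require('https');

const body = new PassThrough();
const request = https.request('https://example.com');

// Forward errors manually, since .pipe() does not do this for us.
body.once('error', (err) => request.destroy(err));

body.pipe(request);
body.end();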

MylesBorins pushed a commit that referenced this issue Mar 9, 2020
If the last stream in a pipeline is still usable/readable
don't destroy it to allow further composition.

Fixes: #32105
Backport-PR-URL: #32111
PR-URL: #32110
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@darksabrefr

Is there any chance to get this before 13.11.0 (#32185), in a real patch version? I'm surprised that a fix for a major bug like this takes days to be published while the Node ecosystem is really affected by this problem.

@BridgeAR
Member

@darksabrefr v13.x is not an LTS release, and we explicitly have the odd-numbered release lines to detect issues like these early, so that they either do not get backported to the LTS release lines or only get backported together with fixes.

Publishing a new release is not as easy as just pushing a button; it requires a significant amount of time. The project itself is all open source and people must find the time to do it. If you would like to contribute and help out, that would be great!

That aside: we have certain rules about when commits may land and when they may not. The bug was fixed almost immediately after it was reported, but merging requires reviewing the code and waiting 48 hours so that multiple collaborators get the chance to have a look at the change. Publishing a minor or a patch version also takes the same amount of time.

We have lots of tests in place and try hard to guarantee stability for all changes. This is just very difficult with a state machine as complex as streams. I am very grateful for the fast response from @ronag, @MylesBorins and everyone else involved to make sure that a fixed version is released as soon as possible.

@MylesBorins
Contributor

MylesBorins commented Mar 11, 2020

@darksabrefr some extra context that might be helpful. There was another regression introduced in 13.9.0 that broke building from source tarballs. We did not get a fix in time for 13.10.0, as the releasers were not aware of the issue before pushing for the release. This in turn necessitated an emergency 13.10.1, which went out last Wednesday. This issue was opened the following day, and we didn't yet have a clear solution. The earliest we could get a solution in would have been Friday, and we do our best not to do releases on Fridays, so as to avoid weekend fire drills.

I started working on a new 13.x release on Monday, having something basic prepared and running in CI that evening. For the current release line, as mentioned above by @BridgeAR, we include any commits that are not Semver-Minor that land cleanly on the release line. We of course work really hard to ensure that if we break things we fix them, as you can see from the above release going out 24 hours after the original 13.10.0, but we do not offer the same guaranteed support that we do for LTS. It is for this reason that I generally advise folks to only use LTS in production.

While working on a fix for the streams bug, a number of missing patches were identified that made it harder to ensure the same behavior on master and 13.x, specifically that the fix actually fixed things and could land. There were also more regressions that were identified.

Multiple people have been putting in time since last Thursday to get this out, so I hope you can understand that the situation is a bit complicated.

@MylesBorins
Contributor

MylesBorins commented Mar 11, 2020

Also worth mentioning that we might be having a further delay due to an infrastructure issue in our build cluster

nodejs/build#2217

edit: this is now resolved

@darksabrefr

I want to thank @MylesBorins for the clarifications and @ronag and @szmarczak for the fast interventions.

I would just temper the advice to use only LTS in production. Like other people, I contribute to Node, in an invisible way, by using and testing non-LTS versions day after day, on various (but not critical) real-life projects in various situations, and by describing issues, here or in other related repos. @BridgeAR: code and commits are not the only way to contribute to a project ;-)

My only reproach is the lack of communication around the patch release for this breaking bug (and the other related bugs that @MylesBorins mentioned too). There is no one to blame concerning the bug itself on this non-LTS channel. But shouldn't a confirmed breaking bug tied to an identified commit be reverted and the revert deployed immediately (or as soon as possible)? I understand that's not always as simple as reverting a single commit, but the basic idea is to not leave the latest stable version broken for days. A latest release that is too unstable can lead to insufficient real-life testing of it, with people relying more on the LTS versions, which in turn makes backports to LTS under-tested and potentially buggy too.
