cluster,net: fix random port keying #7043

Closed

Conversation

@mscdex (Contributor) commented May 29, 2016

Checklist
  • tests and code linting pass
  • a test and/or benchmark is included
  • the commit message follows commit guidelines
Affected core subsystem(s)
  • cluster
  • net
Description of change

This commit fixes an issue where sockets that are bound to a random port are keyed on port 0 instead of the actual port that was assigned to the socket.
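
For illustration, a minimal sketch of the symptom as described in this thread (behavior varies: with exclusive: true the second bind fails with EADDRINUSE instead):

'use strict';
const cluster = require('cluster');
const net = require('net');

if (cluster.isMaster) {
  cluster.fork();
} else {
  const a = net.createServer();
  a.listen(0, function() {
    const b = net.createServer();
    b.listen(0, function() {
      // Before the fix, both handles are keyed on port "0" in the master,
      // so both servers can end up on the same OS-assigned port.
      console.log(a.address().port, b.address().port);
      process.exit(0);
    });
  });
}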

@mscdex added the cluster and net labels on May 29, 2016
@mscdex (Contributor, Author) commented May 29, 2016

if (!errno)
  handles[key] = handle;
data = handle.data;
handle.key = key;
A contributor commented on these lines:

Shouldn't this be within the !errno check?

@mscdex (Contributor, Author) replied May 29, 2016

It shouldn't really matter. I just did it there so that it's overwritten regardless of whether there happens to be another reference to the handle now or in the future.

@mscdex force-pushed the cluster-fix-random-port-handling branch from 9baa080 to dc76278 on May 29, 2016 09:11
@mscdex (Contributor, Author) commented May 29, 2016

Fixed Windows issue. CI again: https://ci.nodejs.org/job/node-test-pull-request/2845/

@mscdex (Contributor, Author) commented May 29, 2016

I'm not sure how to label this one (e.g. semver-major or not). It could break backwards compatibility if people are relying on the previous behavior (for example the parallel/test-dgram-exclusive-implicit-bind.js test), but it could be seen as a bug fix to some degree.

What does everyone else think?

@Trott (Member) commented May 29, 2016

What does everyone else think?

Maybe run CITGM?

@mscdex (Contributor, Author) commented May 29, 2016

@mscdex (Contributor, Author) commented May 30, 2016

CITGM looks green.

assert.equal(common.PORT + 2, address3.port);
assert.equal(common.PORT + 3, address4.port);

assert.strictEqual(typeof address2.port, 'number');
A contributor commented on these lines:

Could you DRY this by moving the checks into a function?

@mscdex force-pushed the cluster-fix-random-port-handling branch from dc76278 to 21090a0 on June 7, 2016 03:20
@mscdex (Contributor, Author) commented Jun 7, 2016

@cjihrig Addressed your comments

CI again: https://ci.nodejs.org/job/node-test-pull-request/2940/

@mscdex (Contributor, Author) commented Jun 11, 2016

/cc @nodejs/collaborators Any thoughts?

@ronkorving (Contributor) commented:

This sounds like a very welcome change. I never understood why on a cluster, port 0 would just mean each worker would be assigned a random port. This solution works, although imho there is one concern:
If I run more than 1 server from a Node cluster, I can't use 2 random ports (1 for each service).

This commit fixes an issue where sockets that are bound to a random
port are keyed on port 0 instead of the actual port that was assigned
to the socket.
@mscdex force-pushed the cluster-fix-random-port-handling branch from 21090a0 to d87bc5f on June 13, 2016 04:53
@mscdex (Contributor, Author) commented Jun 13, 2016

@ronkorving I've fixed that now and included a new test. Please take a look and let me know if that was what you had in mind.

@ronkorving (Contributor) commented:

Conceptually, how would I create both an HTTP server and a separate TCP server that both listen on a random port and are both clustered? I'm not cluster-savvy enough atm to cast a very critical eye on your solution. But humor me please on the conceptual level :)

@mscdex (Contributor, Author) commented Jun 13, 2016

@ronkorving So you're asking to put this new functionality behind a new listen() option so that you can still have the old behavior? If not, I'm not sure what you mean by "clustered."

Without this PR, all servers bound in workers that listen on port "0" refer to the exact same port (whatever it ultimately gets resolved to by the OS) because the master includes the port number in the key name in its handle table. So if you want to spin up another server on a random port within any of the workers, you're SOL: the master will see a server already listening on port "0" and you will either get unexpected behavior (if you have different types of servers sharing the port) or an EADDRINUSE (in the exclusive: true case).

With this PR, workers can listen on as many random ports as they want. This means that port "0" handles are no longer permanently stored by the master (they get re-indexed using the real port after the port is bound) and thus are not automatically shared amongst all workers when they all listen on port "0." However, the port it ultimately resolves to will still be shareable (assuming you don't set exclusive: true), so if one worker listens on a random port, you basically have to let the other workers know the real port to listen on. That requires a little more work, since you have to do some message passing to the other workers, but the upside is that workers are no longer all locked to one and the same random port.

I hope that makes some sense.

@ronkorving (Contributor) commented Jun 13, 2016

I don't want the old behavior at all (I think it's useless). But there are 2 valid behaviors imho, that hopefully could co-exist.

Scenario 1:

  • I have an HTTP server on a random port
  • I run my Node.js setup in cluster mode
  • Every worker's HTTP server listens on the same random port that was determined only once

Scenario 2:

  • I have an HTTP service (a) on a random port
  • I have another HTTP service (b) on another random port
  • I run my Node.js setup in cluster mode
  • Every worker's HTTP server (a) listens on the same random port that was determined only once (eg: 1234)
  • Every worker's HTTP server (b) listens on the same random port that was determined only once (eg: 4321)

I understand from your comments that scenario 1 is possible, and that scenario 2 is also possible but requires some more userland intervention?

@mscdex (Contributor, Author) commented Jun 13, 2016

With this PR it's the same process no matter the scenario. If a worker listens on a random port, the real port needs to be broadcasted to the other workers in order for them to listen on that same port. This can actually be done a couple of different ways:

Worker 1 could do something like:

server.listen(0, function() {
  process.send({ type: 'rest', port: this.address().port });
});
server2.listen(0, function() {
  process.send({ type: 'healthcheck', port: this.address().port });
});

and in the master:

worker1.on('message', function(msg) {
  if (msg.type && msg.port) {
    Object.keys(cluster.workers).forEach(function(id) {
      var worker = cluster.workers[id];
      // Don't echo the message back to the worker that sent it
      if (worker === worker1)
        return;
      worker.send(msg);
    });
  }
});

The other workers then watch for those messages and listen on the port(s).
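
The receiving side in the other workers might look like this (a sketch, assuming the same server/server2 as above):

process.on('message', function(msg) {
  if (msg.type === 'rest')
    server.listen(msg.port);       // join the shared port worker 1 reported
  else if (msg.type === 'healthcheck')
    server2.listen(msg.port);
});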

OR

Use the master's listening event to broadcast the port(s) to the other workers. This is probably less feasible since you won't have any context associated with the address/port passed to the event handler (e.g. which server is which).

@sam-github (Contributor) commented:

With this PR, workers can listen on as many random ports as they want. This means that port "0" handles are no longer permanently stored by the master (they get re-indexed using the real port after the port is bound) and thus are not automatically shared amongst all workers when they all listen on port "0."

First, this is absolutely semver-major: the fact that all cluster workers listen on the same port when an app listens on port 0 is documented.

Second, I get that there are multiple use cases, but I'd suggest that an API that expresses the intent would be better than breaking the current use case, particularly when the break only partially enables the other use cases: you still have to write a bunch more code to get anything useful.

Right now, the common case (one listen on 0 in a single worker's .js file) works with no additional code.

This PR breaks that use: it would make every worker listen on a different port, which means it's effectively not clustered.

It breaks it in favour of a model where, if you listen multiple times in one worker, every single listen port is ephemeral... which strikes me as wholly undesirable. So now you have to write complex cluster code to coordinate, code that isn't even possible to write (there is no worker-to-worker messaging supported directly by the cluster lib) if you don't control the implementation of the cluster master, which you don't with most supervisor-type code bases that auto-cluster applications.

If some way could be found so that the listens could be on some special token (random bad API example design), maybe ephemeral:http and ephemeral:https, it would be better: with no cluster, both of those would be the same as listening on 0 (they would get unique ports), but with cluster, all the workers would get the same port for http, and a different (but shared across workers) port for https.
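
To make that concrete, listen calls under such a scheme might look like this (the tokens are sam-github's deliberately throwaway examples, not a real API):

server.listen('ephemeral:http', function() {
  // All workers using this token would share one OS-assigned port.
});
server2.listen('ephemeral:https', function() {
  // A different token: a different, but still cluster-wide shared, port.
});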

@mscdex (Contributor, Author) commented Jun 14, 2016

@sam-github I've already proposed putting it behind an option if that's what everyone wants.

@sam-github (Contributor) commented:

@mscdex where? I'm not finding that proposal in the PR comments.

What kind of option? node CLI option? an option to cluster.setup()?

@sam-github (Contributor) commented:

@mscdex also, what about my suggestion of reworking the API so that, instead of some global switch, we allow the code doing the listen to say what it wants: true ephemeral, or shared ephemeral? These global everything-one-way-or-the-other switches, I feel, eventually run afoul of code that is combined, wherein some code wants one, some the other, and you are stuck...

Also, I'd be interested in what the use case is for having every worker listening on worker-specific/unclustered ports. This is the cluster module... not clustering seems odd.

@mscdex (Contributor, Author) commented Jul 13, 2016

Is an additional boolean config option acceptable for this then? If so, does anyone have a good name they want to throw out there?

@trevnorris (Contributor) commented:

Sounds like a good option to me, but don't have any good ideas for the name.

@jasnell (Member) commented Aug 9, 2016

pulled the dont-land-on-v4.x label since it's a semver-major that wouldn't land there anyway

@jasnell (Member) commented Aug 9, 2016

@mscdex @Trott @trevnorris ... ping. what's next on this?

@mscdex (Contributor, Author) commented Aug 9, 2016

@jasnell If most people are in favor of adding this behavior under a separate option, then I will go ahead and add one.

@jasnell (Member) commented Aug 9, 2016

seems like a good idea to me. /cc @nodejs/collaborators

@imyller (Member) commented Sep 24, 2016

CI with current master: https://ci.nodejs.org/job/node-test-pull-request/4261/

@gibfahn (Member) commented Sep 25, 2016

Something went wrong with that last CI run (node-test-pull-request never triggered node-test-commit), rerunning...

CI: https://ci.nodejs.org/job/node-test-pull-request/4277/

@mscdex (Contributor, Author) commented Sep 25, 2016

Don't worry about CI right now, I need to change this PR still.

@imyller added the wip label on Sep 25, 2016
@imyller (Member) commented Sep 25, 2016

Added in-progress label

@jasnell (Member) commented Mar 1, 2017

@mscdex ... still want to pursue this?

@jasnell added the stalled label on Mar 1, 2017
@mscdex (Contributor, Author) commented Mar 1, 2017

@jasnell At some point, yes.

@@ -591,6 +633,16 @@ function workerInit() {
    send(message, function(reply, handle) {
      if (obj._setServerData) obj._setServerData(reply.data);

      if (+options.port === 0 && reply.sockname) {
A member commented on these lines:

Nit: as the +options.port coercion is done twice, you might just move that into a variable above?
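
i.e. something along these lines (illustrative only, not the actual patch):

const port = +options.port;
// ...both checks can then read the variable:
if (port === 0 && reply.sockname) {
  // ...
}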

@bnoordhuis (Member) commented Jun 8, 2017

I think this PR can be made more general. Proposal:

  1. Add a shared: true/false message property to the 'queryServer' command.
  2. In the child, do message.shared = (message.port !== 0)
  3. In the master, only do handle.add(...) if message.shared === false (send but don't cache.)
  4. If sending fails, close the handle, don't send it to another worker (implied by not caching it.)

That opens a path for supporting unshared sockets for more use cases than just the port 0 scenario.

EDIT: The unshared behavior should be opt-in for users, though. We can't and shouldn't break the current behavior.
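
For illustration, a toy sketch of the master-side dispatch under this proposal; the names (onQueryServer, createHandle, message.seq) are assumptions for the sketch, not the actual cluster internals:

'use strict';
const handles = new Map();  // master-side cache of shared listen handles

function onQueryServer(worker, message, createHandle) {
  // Step 2 normally happens in the child: shared unless the port is ephemeral.
  const shared = message.shared !== undefined ? message.shared
                                              : (message.port !== 0);
  if (!shared) {
    // Steps 3-4: hand this worker its own handle and never cache it; if the
    // send fails, close the handle instead of passing it to another worker.
    const handle = createHandle(message);
    worker.send({ ack: message.seq }, handle, function(err) {
      if (err) handle.close();
    });
    return;
  }
  // Existing behavior: cache by address/port key and share across workers.
  const key = message.address + ':' + message.port;
  if (!handles.has(key)) handles.set(key, createHandle(message));
  worker.send({ ack: message.seq }, handles.get(key));
}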

@BridgeAR (Member) commented:

Ping @mscdex

@mscdex (Contributor, Author) commented Aug 27, 2017

@BridgeAR I'm not going to have the time to fix this up unfortunately. If someone else wants to work on it or do it from scratch in the way that @bnoordhuis suggested, that's fine by me.

@BridgeAR (Member) commented Dec 5, 2017

Closing due to long inactivity. If someone wants to pick it up, please feel free to open a new PR with the suggested code change.

@BridgeAR closed this on Dec 5, 2017
Labels
  • cluster Issues and PRs related to the cluster subsystem.
  • net Issues and PRs related to the net subsystem.
  • semver-major PRs that contain breaking changes and should be released in the next major version.
  • stalled Issues and PRs that are stalled.
  • wip Issues and PRs that are still a work in progress.