
Set ZMQ_IDENTITY attribute of router sockets proposal #74

Closed

Conversation

JohanMabille
Member

Set ZMQ_IDENTITY of router sockets

Current specification

A Jupyter kernel uses ROUTER sockets to receive messages on the shell and control channels. The current specification puts no constraint on the ZMQ_IDENTITY attribute of these sockets. Therefore, most implementations (if not all) rely on the default behavior of ZMQ, which is to set a random value upon connection.

This allows these sockets to be used in the DEALER (client) - ROUTER (kernel) or REQ (client) - ROUTER (kernel) patterns, but prevents using the ROUTER (client) - ROUTER (kernel) pattern.
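To make the asymmetry concrete, here is a minimal pyzmq sketch (the inproc endpoint and client identity are illustrative, not part of any spec): a ROUTER can only address a peer whose identity it already knows, so the client must learn the kernel ROUTER's identity up front.

```python
import time

import zmq

ctx = zmq.Context.instance()
endpoint = "inproc://shell"  # hypothetical endpoint, for illustration only

# Kernel side: a ROUTER whose identity is set to its own endpoint,
# as proposed, so clients can derive it without extra exchanges.
kernel = ctx.socket(zmq.ROUTER)
kernel.set(zmq.IDENTITY, endpoint.encode("utf-8"))
kernel.bind(endpoint)

# Client side: another ROUTER. Unlike DEALER, a ROUTER must prepend
# the destination identity to every outgoing message.
client = ctx.socket(zmq.ROUTER)
client.set(zmq.IDENTITY, b"client-1")  # hypothetical client identity
client.connect(endpoint)
time.sleep(0.1)  # let the identity handshake complete

# With a random kernel identity the client could not build this routing
# frame; with identity == endpoint it can.
client.send_multipart([endpoint.encode("utf-8"), b"kernel_info_request"])
sender, payload = kernel.recv_multipart()
```

The kernel sees the client's identity as the first frame and can reply by prepending it, which is the usual ROUTER reply pattern.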

Proposed enhancement

We propose setting the ZMQ_IDENTITY to a value known by the clients, so that it is possible to implement
a ROUTER - ROUTER pattern. This pattern is particularly useful when a single client wants to talk to many
kernels and we want to avoid opening a lot of sockets. It is also a natural fit for a multiplexer component
that routes messages from many clients to different kernels.

A common pattern is to use the socket's endpoint as its identity. Since the client already knows which endpoint
it will connect to, this avoids complex additional exchanges between the client and the kernel, or complex additional
configuration.
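The many-kernels use case could then look like the following sketch (endpoints and identities are hypothetical): one client-side ROUTER multiplexes over several kernel ROUTERs, addressing each by its endpoint-derived identity.

```python
import time

import zmq

ctx = zmq.Context.instance()

# Hypothetical shell endpoints for two kernels; under this proposal each
# kernel ROUTER sets its ZMQ_IDENTITY to its own endpoint.
endpoints = ["inproc://kernel-1-shell", "inproc://kernel-2-shell"]

kernels = []
for ep in endpoints:
    k = ctx.socket(zmq.ROUTER)
    k.set(zmq.IDENTITY, ep.encode("utf-8"))
    k.bind(ep)
    kernels.append(k)

# A single client ROUTER reaches every kernel: one socket to poll
# instead of one DEALER per kernel.
client = ctx.socket(zmq.ROUTER)
client.set(zmq.IDENTITY, b"mux-client")  # hypothetical client identity
for ep in endpoints:
    client.connect(ep)
time.sleep(0.1)  # let the identity handshakes complete

# Address each kernel by the identity the client already knows.
for ep in endpoints:
    client.send_multipart([ep.encode("utf-8"), b"ping"])

received = [k.recv_multipart() for k in kernels]
```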

Impact on existing implementations

This requires a minor change in all existing kernels, but it is a single-line change.

@minrk
Member

minrk commented Aug 24, 2021

IPython Parallel uses ROUTER-ROUTER a lot because it reverses the connection direction (kernels and clients connect to a controller instead of clients connecting directly to kernels). In IPP, the identity of all engine ROUTERs is the engine's UUID.

I'm not sure I would recommend building a connecting client this way (I think it would not be a good fit in jupyter-server, for instance), but I've no problem recommending that kernels set identities.

The only theoretical reason I see not to use the endpoint as the identity is that a socket can bind to multiple endpoints, which would make this ambiguous, but that's not currently possible in the Jupyter spec, and I don't see it as likely. Though I suppose technically, a kernel could have bound a single ROUTER to both shell and control URLs and would still satisfy the protocol.

@JohanMabille
Member Author

I'm not sure I would recommend building a connecting client this way (I think it would not be a good fit in jupyter-server, for instance), but I've no problem recommending that kernels set identities.

What drawbacks do you see with this pattern? In the long run, we could reduce the number of sockets in the Jupyter server, for instance (one ROUTER for all the shell channels, one for all the control channels, fewer sockets to poll).

The only theoretical reason I see not to use the endpoint as the identity is that a socket can bind to multiple endpoints, which would make this ambiguous, but that's not currently possible in the Jupyter spec, and I don't see it as likely. Though I suppose technically, a kernel could have bound a single ROUTER to both shell and control URLs and would still satisfy the protocol.

We could clarify that in the protocol spec, explicitly stating that the shell and control sockets must each bind to a single endpoint. The split of the shell and control channels has already been integrated into the spec, so kernels should not use a single ROUTER for both shell and control anymore.

The advantage of using the endpoint is that the client already knows it, so we don't have to set up an additional complex mechanism. But we could also imagine that the client passes the ROUTER identities to the kernel via the connection file (maybe this would give more flexibility).
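As a sketch of the connection-file variant (the `*_identity` keys below are purely hypothetical and not part of the current connection-file format):

```python
import json

# Hypothetical extension of the connection file: the client chooses the
# ROUTER identities and communicates them to the kernel at startup.
connection_info = {
    "transport": "tcp",
    "ip": "127.0.0.1",
    "shell_port": 53001,
    "control_port": 53002,
    "shell_identity": "kernel-abc123-shell",      # hypothetical key
    "control_identity": "kernel-abc123-control",  # hypothetical key
}
serialized = json.dumps(connection_info, indent=2)
```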

Co-authored-by: David Brochart <david.brochart@gmail.com>
@JohanMabille
Member Author

JohanMabille commented Aug 24, 2021

Here are the changes that would be required in ipykernel if this proposal gets approved:

diff --git a/ipykernel/kernelapp.py b/ipykernel/kernelapp.py
index a85dd6c..352090a 100644
--- a/ipykernel/kernelapp.py
+++ b/ipykernel/kernelapp.py
@@ -193,9 +193,14 @@ class IPKernelApp(BaseIPythonApplication, InteractiveShellApp,
         iface = '%s://%s' % (self.transport, self.ip)
         if self.transport == 'tcp':
             if port <= 0:
+                # We need to select random ports ourselves instead
+                # of using bind_to_random_port if we go with socket
+                # identities
                 port = s.bind_to_random_port(iface)
             else:
-                s.bind("tcp://%s:%i" % (self.ip, port))
+                endpoint = "tcp://%s:%i" % (self.ip, port)
+                s.set(zmq.IDENTITY, endpoint.encode('utf-8'))
+                s.bind(endpoint)
         elif self.transport == 'ipc':
             if port <= 0:
                 port = 1
@@ -205,7 +210,9 @@ class IPKernelApp(BaseIPythonApplication, InteractiveShellApp,
                     path = "%s-%i" % (self.ip, port)
             else:
                 path = "%s-%i" % (self.ip, port)
-            s.bind("ipc://%s" % path)
+            endpoint = "ipc://%s" % path
+            s.set(zmq.IDENTITY, endpoint.encode('utf-8'))
+            s.bind(endpoint)
         return port
 
     def _bind_socket(self, s, port):

The relevant function can be found here

@kevin-bates
Member

@JohanMabille, Thanks for opening this - it seems like a useful enhancement. I had a couple of questions, mostly regarding backward compatibility and the fact that not all kernels will be updated in a timely manner.

The advantage of using the endpoint is that the client already knows it

Does this make an assumption that the client always specifies the ports for the kernel to use? I'm curious what the timing of things is relative to port acquisition and client & kernel.

From a backward-compatibility standpoint, how does a client know if it can utilize a specific ZMQ_IDENTITY or that the kernel (if the kernel is sending the ports to the client, i.e., the handshaking pattern) is utilizing a specific ZMQ_IDENTITY?

Is this capability advertised in the kernelspec?

But we could also imagine that the client passes the ROUTER identities to the kernel via the connection file (maybe this would give more flexibility).

If the client simply conveys the identities, would this address the handshaking pattern?

@SylvainCorlay
Member

The advantage of using the endpoint is that the client already knows it

Does this make an assumption that the client always specifies the ports for the kernel to use? I'm curious what the timing of things is relative to port acquisition and client & kernel.

From a backward-compatibility standpoint, how does a client know if it can utilize a specific ZMQ_IDENTITY or that the kernel (if the kernel is sending the ports to the client, i.e., the handshaking pattern) is utilizing a specific ZMQ_IDENTITY?

With respect to the new handshaking pattern, my understanding is that this would be simple to adapt, since the contract is for the identity to be exactly the same as the endpoint...

I think this proposal is backward compatible because it only adds a new identity. Now, with respect to kernels advertising that they satisfy that constraint, I think it goes with a (minor) protocol version bump... (A client requiring this could decide to ignore kernels not implementing this new constraint.)

@JohanMabille
Member Author

The advantage of using the endpoint is that the client already knows it

Does this make an assumption that the client always specifies the ports for the kernel to use? I'm curious what the timing of things is relative to port acquisition and client & kernel.

From a backward-compatibility standpoint, how does a client know if it can utilize a specific ZMQ_IDENTITY or that the kernel (if the kernel is sending the ports to the client, i.e., the handshaking pattern) is utilizing a specific ZMQ_IDENTITY?

The only assumption made is "the identity of the socket is its endpoint". How the endpoint is set and made known to the client is orthogonal to that assumption. With the current implementation, where the client passes a connection file to the kernel, the client already knows the endpoints. If we implement the handshake pattern, where the kernel communicates its ports, the client will be able to build the identity from that information.

Regarding the transition, we can bump the version of the protocol if this gets merged, and the client will know whether a kernel supports this feature.
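A client-side check could then be as simple as this sketch, using the protocol_version field of kernel_info_reply (the 5.5 cutoff is hypothetical; the actual version would be fixed if the proposal were merged):

```python
def supports_router_identity(protocol_version: str) -> bool:
    """Return True if the kernel's protocol version implies endpoint
    identities on its ROUTER sockets (hypothetical 5.5 cutoff)."""
    major, minor = (int(part) for part in protocol_version.split(".")[:2])
    return (major, minor) >= (5, 5)
```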

@minrk
Member

minrk commented Aug 26, 2021

I realize the identity=endpoint requirement would also break IPython Parallel's bind_kernel functionality (turning an existing engine into a regular bound Jupyter kernel by calling bind on its already-connected and identified sockets), because IPython Parallel relies on engine identities for routing and cannot use these URLs as its identities.

So I don't think any standard clients should be built that require this ROUTER-ROUTER functionality, because we already have things that depend on the base protocol not defining what the identity should be (that's different from saying that it should be set and knowable).

What drawbacks do you see with this pattern?

Mostly having to do with what happens while connections are being established - DEALER sockets have more appropriate wait-for-peer when sending before there is a peer to receive the message. ROUTER sockets can either drop messages or raise errors, neither of which is probably what we would want, so this adds further 'wait-for-connect' complexity to Jupyter clients to avoid lost messages / errors (like we have seen with iopub). This is going to be an issue any time a kernel starts or restarts, or a new client connects.
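The drop-vs-error behavior described here can be seen directly (a minimal pyzmq sketch; the endpoint and peer name are made up):

```python
import zmq

ctx = zmq.Context.instance()
router = ctx.socket(zmq.ROUTER)
router.bind("inproc://mandatory-demo")  # hypothetical endpoint

# Default ROUTER behavior: a message addressed to an unknown peer
# is silently dropped -- no error, the message is simply gone.
router.send_multipart([b"no-such-peer", b"lost"])

# With ZMQ_ROUTER_MANDATORY set, the same send raises EHOSTUNREACH,
# so the sender at least learns the message was not delivered.
router.set(zmq.ROUTER_MANDATORY, 1)
errno = None
try:
    router.send_multipart([b"no-such-peer", b"lost"])
except zmq.ZMQError as exc:
    errno = exc.errno
```

Neither behavior is what a Jupyter client wants during startup, which is why DEALER's wait-for-peer semantics are more forgiving while connections are being established.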

In the long run, we could reduce the number of sockets in the jupyter server for instance

I really don't think we should adopt this pattern in jupyter-server, or any official Jupyter client. I would reserve it for specialized many-kernel clients that need to minimize their FD count in exchange for some added complexity.

@SylvainCorlay
Member

I would reserve it for specialized many-kernel clients that need to minimize their FD count in exchange for some added complexity.

Yes, that use case was the original motivation.

@JohanMabille
Member Author

Given the incompatibility of this approach with IPython Parallel, and the fact that this would not reduce the FD count on macOS, let's close this proposal.
