UDP Design Notes
We would like to be able to use UDP as a transport for the SP protocols. Furthermore, we would like to be able to use some features of UDP, most notably the ability to directly broadcast or multicast, to deliver messages to more than one participant, and to receive such messages.
Under the hood, NNG has a concept of "pipes" which are fundamentally based on "persistent" connections, where there are only two parties in the pipe. This convention is fairly central to NNG, so breaking it is challenging.
We can create a Unicast solution by implementing a communications protocol on top of UDP, and below the core SP layer, that would permit us to identify a connected party. This is analogous to TCP, but done entirely in UDP, and there would be no requirement for session ordering. We would implement the "connection" management layer, and "pipes" for each connection. Presumably we would also have a protocol for keep-alives. UDP being best effort, we'd send a close announcement on shutdown, but it would be unacknowledged. (This is just for the benefit of allowing our peer to reclaim resources for the associated "connection".)
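As a rough illustration of the connection layer described above, the sketch below frames each datagram with a small header carrying a message type and a connection identifier. The field layout, the type codes, and the names `Frame`, `DATA`, `KEEPALIVE`, and `CLOSE` are all assumptions made for illustration; none of this is a defined NNG wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical message types; not part of any defined wire format.
DATA, KEEPALIVE, CLOSE = 1, 2, 3

# Hypothetical 8-byte header in network byte order:
# 1-byte type, 1 reserved byte, 2-byte payload length, 4-byte connection id.
_HEADER = struct.Struct("!BBHI")


@dataclass(frozen=True)
class Frame:
    kind: int
    conn_id: int
    payload: bytes = b""

    def encode(self) -> bytes:
        """Serialize the header followed by the payload."""
        return _HEADER.pack(self.kind, 0, len(self.payload), self.conn_id) + self.payload

    @staticmethod
    def decode(datagram: bytes) -> "Frame":
        """Parse one datagram; reject truncated or inconsistent input."""
        if len(datagram) < _HEADER.size:
            raise ValueError("datagram shorter than header")
        kind, _reserved, length, conn_id = _HEADER.unpack_from(datagram)
        body = datagram[_HEADER.size:]
        if len(body) != length:
            raise ValueError("length field does not match datagram size")
        return Frame(kind, conn_id, body)
```

A receiver could look up the pipe by `conn_id`, refresh its idle timer on `KEEPALIVE`, and release it on `CLOSE`. Since the close is unacknowledged, an idle timeout would still be needed as a fallback.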
Multicast connections like this could be formed in a similar way, where we use multicast to discover parties, and build up a mesh connection graph of all peers. Presumably we could add some logic (maybe a message option?) so that a multicast / broadcast message is placed on the wire only once, rather than once per peer. (Perhaps we can create a "special" pipe for such messages.) SP layer protocols that multicast could specify this in their send -- "use multicast when possible" or something like that.
This has the advantage of retaining the pipe relationships, but it might be a little strange for administration. Also, in order to keep the connection graph alive, we might want or need to set up some kind of keep-alives, and this might require managing a lot of state on peers (depending on the size of the network). For many protocols, like "PUB", there is little value in this.
For Unicast, we can create a "connected" pipe in the same way that we do for TCP ... just as the BSD socket API supports connect() for UDP. The act of establishing a "connection" is nothing more than setting up the default destination for message send. For listen/bind(), there is no accept() call. This would allow traffic to flow in a completely stateless manner. One question is whether we can retain the "pipe" information about the peer in this way. For the "connect" side this is a non-issue, but as we "listen" to accept inbound traffic from any peer, we would have to create pipes on message arrival. Presumably we would want some mechanism to purge those pipes (expire the cache?). This may require a whole new set of operations around managing the pipe cache.
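The pipe cache idea for the "listen" side can be sketched as follows. This is a minimal sketch under stated assumptions: the `PipeCache` class, its `touch` and `purge` methods, and the idle-timeout policy are hypothetical and do not correspond to any existing NNG API. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple

Addr = Tuple[str, int]  # (host, port), as returned by recvfrom()


class PipeCache:
    """Hypothetical cache of ephemeral "pipes" keyed by peer address."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._last_seen: Dict[Addr, float] = {}

    def touch(self, peer: Addr) -> bool:
        """Record traffic from peer; return True if a new pipe was created."""
        is_new = peer not in self._last_seen
        self._last_seen[peer] = self._clock()
        return is_new

    def purge(self) -> List[Addr]:
        """Drop peers idle for longer than the TTL; return the dropped addresses."""
        now = self._clock()
        expired = [p for p, seen in self._last_seen.items() if now - seen > self._ttl]
        for p in expired:
            del self._last_seen[p]
        return expired

    def __len__(self) -> int:
        return len(self._last_seen)
```

The receive loop would call `touch()` with the source address of every inbound datagram and run `purge()` periodically. A real implementation would also need a cap on the number of entries, since any sender can cause a pipe to be created.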
Unicast multicast -- building upon the above, we could allow connecting to -- or listening to -- multicast (or possibly broadcast) traffic. Again there is some question about managing the pipe lists.
We could eliminate all of this if we just dispensed with the ability to identify a single peer node when the peer sends traffic to a non-unicast address. This would make it impossible to reply to a sender, or even identify a sender precisely. But for some use cases (PUB/SUB), that might be fine. (Specifically this would be ok for PUB/SUB, and for protocols that are "stateless" such as PAIR or BUS. But those latter protocols would likely need to handle discriminating peers in their application layer, if they needed it.)
The absence of any backpressure would create problems for applications using PUSH/PULL or PAIR and relying on "mostly reliable" semantics. It would be very easy for a sender to overwhelm a receiver. Applications should moderate their rate, or use a sequence number or similar scheme to detect lost messages, if lost messages are a concern.
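An application-level sequence number scheme of the kind suggested above might look like the following sketch. It assumes the sender stamps each message with a 32-bit counter that wraps around; the `LossDetector` class and the choice of counter width are illustrative assumptions, not part of any SP protocol.

```python
class LossDetector:
    """Counts messages skipped by a single sender, using a wrapping 32-bit sequence number."""

    MOD = 1 << 32

    def __init__(self) -> None:
        self._expected = None  # next sequence number we expect to see
        self.lost = 0          # running total of skipped messages

    def observe(self, seq: int) -> int:
        """Record an arriving sequence number; return how many messages were skipped."""
        if self._expected is None:
            gap = 0  # first message: nothing to compare against
        else:
            gap = (seq - self._expected) % self.MOD
            if gap >= self.MOD // 2:
                # seq is behind what we expect: a late or duplicate datagram.
                # It is ignored here, so a reordered message stays counted as lost.
                return 0
        self._expected = (seq + 1) % self.MOD
        self.lost += gap
        return gap
```

A receiver would keep one detector per sender. Because UDP may reorder datagrams, the count is an upper bound on loss rather than an exact figure.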
Multicast (and broadcast included here) means that a message can have multiple recipients. Some patterns would not be able to do reasonable things with this. For example, a REP reply should never go to a multicast address. (Perhaps REP could support listening to a multicast message, but probably the SURVEYOR pattern is better for this use case.) Use of multicast with PAIR is paradigm breaking entirely, and it isn't clear whether PIPELINE semantics make sense with multicast or not.
The patterns that have sensible ways of thinking about non-unicast delivery:
- PUB/SUB -- this is obvious -- PUB could send to multicast or broadcast, and SUB could receive. There is no reply here at all.
- BUS - this is basically a poor man's mesh network, and could work well. There are no promises being broken by multicast at all.
- SURVEYOR/RESPONDENT - SURVEYOR could send a multicast message, and get a unicast reply from a RESPONDENT. (It makes no sense to have the RESPONDENT send non-unicast replies). This is the situation where we have the need for ephemeral pipes or something like that for replies. Probably the pattern could place a hold on the "pipe" (creating one I suppose) to keep it from going away until the reply is generated?
- SAMPLER/POSTER - this is the same as PUB/SUB (SAMPLER/POSTER is on a private tree, but we expect to bring it to NNG at some point.)
The following patterns don't make sense for multicast:
- REQ/REP - While the idea of a single REQ outbound could satisfy this, SURVEYOR seems a more natural use case.
- PAIR - PAIR is specifically about backpressure between a 1:1 pair of peers.
- PAIR POLYAMOROUS - I don't want to think about this... will be replaced by a MESH protocol in the future that will need to be multicast aware. Will treat as ordinary PAIR for now.
One question is how much we want the protocols to be aware of whether the transports underneath are multicast.
I think we need this. Basically we don't want the protocol to send a separate copy of a multicast message to every peer. Instead, we want per-peer copies to go only to peers that are reachable in unicast mode only.
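One way to picture this is a send planner that emits a single multicast datagram for the peers covered by the group, plus individual copies for the unicast-only peers. The `Peer` record, its `in_group` flag, and `plan_sends` are hypothetical names used only for this sketch; how a protocol would actually learn group membership is left open.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Addr = Tuple[str, int]


@dataclass(frozen=True)
class Peer:
    addr: Addr
    in_group: bool  # True if the peer is known to receive the multicast group


def plan_sends(peers: List[Peer], group: Optional[Addr]) -> List[Addr]:
    """Return the destination addresses needed to deliver one outbound message."""
    dests: List[Addr] = []
    if group is not None and any(p.in_group for p in peers):
        dests.append(group)  # one datagram covers every group member
    for p in peers:
        if group is None or not p.in_group:
            dests.append(p.addr)  # unicast-only peers still need their own copy
    return dests
```

With two group members and one unicast-only peer, this yields two datagrams instead of three. The saving grows with the number of peers in the group.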
UDP has a maximum message size of 64K (including headers) based on IP header payload limitations. (Under IPv6 there is a way to send larger messages, but that has gotten almost zero adoption, and we should not rely on it.)
Additionally, message fragmentation and reassembly are particularly painful. So much so that IPv6 basically made fragmentation almost impossible to use (IPv6 routers never fragment packets in transit; only the sending host may). UDP lacks the safeguards to reasonably handle retries, which means that fragment reassembly is likely to be problematic for many implementations.
We could implement another streaming layer on top, such as KCP or QUIC, but that seems to defeat some of the purpose of UDP, and would not coexist well with multicast at all.
Thus, we propose a simple solution --
- Maximum message size of ~64KB. We can return an "error" to the caller if they try to send a message that is too large, or we can just silently drop it. (Or drop it, but log that we did.)
- Reassembly dependent upon the underlying OS.
- Expose some form of option to send with DO NOT FRAGMENT bit set.
- Strongly recommend in documentation that UDP messages be limited to what fits in a single packet at the link MTU (typically 1500 bytes for Ethernet, but in some cases smaller ... IPv6 says 1280 is the minimum ... and sometimes larger on networks with jumbo frames, but rarely over 9000).
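The size limits above can be made concrete with a little arithmetic. The sketch below assumes an IPv4 header without options (20 bytes), an IPv6 header without extension headers (40 bytes), and the 8-byte UDP header. It ignores any SP or transport header that NNG itself might add. The function names and the three result labels are hypothetical.

```python
UDP_HEADER = 8
IPV4_HEADER = 20  # assumes no IP options
IPV6_HEADER = 40  # assumes no extension headers


def max_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in one unfragmented packet at the given MTU."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - UDP_HEADER


def classify(size: int, mtu: int = 1500, ipv6: bool = False) -> str:
    """Classify a payload size as 'ok', 'fragments', or 'too_large'."""
    # The 16-bit length fields cap a datagram at 65535 bytes. For IPv4 that
    # total includes the IP header; for IPv6 it covers only what follows the
    # fixed header.
    ceiling = 65535 - UDP_HEADER - (0 if ipv6 else IPV4_HEADER)
    if size > ceiling:
        return "too_large"  # cannot be sent as one datagram at all
    if size > max_payload(mtu, ipv6):
        return "fragments"  # relies on IP fragmentation and OS reassembly
    return "ok"
```

For a standard Ethernet MTU of 1500 this gives 1472 bytes of payload over IPv4. At the IPv6 minimum MTU of 1280 it gives 1232 bytes. A transport could return an error for `too_large` and, when the do-not-fragment option is set, for `fragments` as well.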