Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Improve priority groups #6087

Closed
Tracked by #989
arkpar opened this issue May 20, 2020 · 7 comments · Fixed by #7700
Closed
Tracked by #989

Improve priority groups #6087

arkpar opened this issue May 20, 2020 · 7 comments · Fixed by #7700
Labels
J0-enhancement An additional feature request.

Comments

@arkpar
Copy link
Member

arkpar commented May 20, 2020

Priority groups API should allow specifying maximum slots for a group and a strategy for picking peers. (E.g ordered or based on reputation).

See discussion in #6078

@arkpar arkpar added the J0-enhancement An additional feature request. label May 20, 2020
@andresilva
Copy link
Contributor

I have followed the discussion on #6078 and I have the same opinion. Here's what I think some of the behavior should be:

  • All priority groups have a maximum number of slots (do we distinguish between in/out here?);
  • Reserved nodes are a priority group where number of slots is equal to the list of reserved nodes (so we are never restricted from connecting to all of them);
  • --reserved-only implies that only the reserved priority group is enabled (any other connection is not allowed);

There are some questions though, given that the priority group has a maximum number of slots how does the peerset make an intelligent decision regarding who to connect to? This logic must exist in some higher layer (e.g. the authority discovery module will be able to distinguish between sentries for the same node), does the user of the peerset just provide some peer rank for ordering the list (not sure if this will be enough)? Also given that the priority group is capped will the peerset also use the normal reputation logic for these peers now as well?

I am also thinking that if priority groups have slots what is the distinction exactly from the normal "non-priority"/global group? I guess if we add slots to the priority group then we can just call them peer groups (or whatever), and the global peers just have their own group as well. Does the peerset need to have a global view among all peers in all groups, or just needs to care about each group distinctly? (we might need some global view for synchronization among the different groups, e.g. we might discover an authority through the normal DHT discovery as well as it being added to the authority discovery priority group).

@tomaka
Copy link
Contributor

tomaka commented Jun 2, 2020

cc paritytech/polkadot-sdk#358

@tomaka
Copy link
Contributor

tomaka commented Jun 2, 2020

Initially, the way I intended to design things (being priority groups were introduced) is that multiple different peersets instances would exist at the same time.

In Polkadot, if I'm not mistaken, we have several overlay networks:

  • An overlay network of only the validators of the relay chain.
  • An overlay network of all the full nodes of the relay chain.
  • For each parachain, an overlay network containing the collators and the validators assigned to said parachain.

For example a relay chain validator would maintain three peersets (relay chain validators, relay chain nodes, parachain), or four during parachain transitions.

The advantage is that we can for example use the peerset for the relay chain, but actually replace with peerset with something and use mechanisms other than reputations for the relay chain validators or the parachain, if it is ever needed.

@infinity0
Copy link
Contributor

infinity0 commented Jul 9, 2020

Going by the above comment, I am a bit worried that this is trying to shoehorn multiple things into the wrong abstraction.

The first two overly networks you mention are effectively the same thing, they are running the same protocol, it is only for various security concerns that I suggested having an additional topology that connects the validators together.

However, the parachain networking is running a completely different protocol, and the structure and data flow characteristics are very different.

  • collators may or may not be connected to each other, but that is something for the parachain to organise, and not strictly a responsibility of polkadot or the parachain networking component, in my eyes. Instead, collators would just directly connect to a randomly-selected validator
  • the validator set changes every few rounds, so everyone will be connecting and disconnecting from everyone else continually
  • the data being sent over this protocol will be generally larger but also less latency-critical than the stuff on the main gossip network

So, reusing the same abstraction for these 2 different protocols, is a hard & dangerous task. I'm not saying it's impossible but I do want to make it clear that not reusing some common abstraction, is also a perfectly fine option and possibly even easier.

@infinity0
Copy link
Contributor

Also, slightly unsure what the word "priority" means in relation to all the other stuff being discussed here. "Groups" is obvious.

@tomaka
Copy link
Contributor

tomaka commented Jul 10, 2020

To give some context, here's some explanation of how the code works at the moment:

  • Libp2p is what handles connecting to other nodes.

  • When libp2p has established a connection, it instantiates a couple of "protocols handlers" structs. Substrate provides its own protocols handlers, one per protocol, and the gossiping-related protocol handlers are initially in a pending state (called "initial" state in the code). It is this code.

  • A few moments later, Substrate will process the event about the newly-established connection and decide, in accordance with the peerset, whether or not the notifications protocols handlers should be enabled or disabled. If enabled, we'll try to initiate gossiping upon this connection, if disabled we won't. It is this code.

  • In addition, the peerset tells the networking which nodes it should be connected to. If the peerset requests a node it's not yet connected to, the networking asks libp2p to connect to it.

  • The peerset can theoretically stop requesting a node from the networking at any moment, in which case the networking will close the gossiping and then the connection as a whole.

  • In order to give a bit of control over the peerset, it has these concepts of "priority group" and "reserved nodes", which are a list of nodes that the peerset will continuously request from the networking. At the moment, the authority-discovery module periodically sets a certain priority group to a list of 10 validators, meaning that the peerset will always ask the networking to be connected to these 10 validators and never ask to disconnect from them.

At the moment, there exists one global peerset for everything. In other words, if the peerset says yes to a connection then we gossip on it with all protocols, and if the peerset says no then we don't gossip on it.

This could be changed by having multiple peersets, one for each overlay network, but we could also bypass the concept of peerset.
The mechanism that permits giving a different purpose for each connection is there with the protocols handlers, but what is missing is the code that keeps track of what the overlay networks should be and informs the networking about which gossiping protocols a certain connection should use.

(disclaimer: at the moment enabling gossiping protocols is an all-or-nothing not only because of the single peerset but also because of backwards-compatibility reasons with the legacy substream, cc #5670)

@tomaka
Copy link
Contributor

tomaka commented Oct 14, 2020

While I've previously been in favour of having multiple peersets, after some more brainstorming/prototyping I'm now more in favour of teaching a single peerset about the various set of the peers that we might want to be connected to. I'm afraid that we will run into design issues later on if we create multiple peersets.

The state of the peerset would therefore consist in a list of PeerIds, each of them associated with:

  • A reputation value.
  • A list of overlay networks the PeerId is believed to belong to.

The list of all possible overlay networks would be passed at initialization when configuring the Peerset.

Rather than adding the authorities to a priority group, the authority-discovery system would inform the Peerset that the validator nodes belong to the overlay network of validators. Priority groups would be gone.

Slots exist only in the context of a specific overlay network. In other words, the Peerset would try to fill a certain number of slots for each overlay network, and slots are no longer a global thing. That means that when the peerset emits a connection request, it mentions which overlay network is concerned. On an incoming connection, we would mention which overlay network the remote desires to "open".

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J0-enhancement An additional feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants