This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

[client/network] Remove unused argument #13444

Merged

Conversation

melekes
Contributor

@melekes melekes commented Feb 23, 2023

Refs https://github.com/paritytech/infrastructure-alerts/issues/26

  • remove unused `ban` argument
  • improve the `💔 Called on_block_justification with a bad peer ID` error message

```
sync: Too many full nodes, rejecting 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ
sync: 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ disconnected
```

These two log lines are enough to understand that we've refused to connect to the given peer.
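The mechanical part of the change can be sketched as follows. This is a hypothetical reduction, not the actual Substrate code: the real `disconnect_peer_inner` lives in `client/network/src/protocol.rs`, and the struct, field names, and `String`-based peer IDs here are stand-ins chosen for illustration.

```rust
use std::collections::HashSet;

// Minimal stand-in for the protocol handler; names are hypothetical,
// loosely modelled on the commit messages in this PR.
struct Protocol {
    connected: HashSet<String>,
}

impl Protocol {
    // Before this PR the signature carried a `ban` parameter that was
    // never acted upon; afterwards the dead parameter is simply gone:
    //   fn disconnect_peer_inner(&mut self, peer: &str, ban: Option<Duration>)
    fn disconnect_peer_inner(&mut self, peer: &str) {
        self.connected.remove(peer);
    }
}

fn main() {
    let mut p = Protocol {
        connected: HashSet::from(["peer-a".to_string()]),
    };
    p.disconnect_peer_inner("peer-a");
    // The peer is disconnected; no ban bookkeeping is involved.
    assert!(p.connected.is_empty());
}
```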
@melekes melekes changed the title Anton/13295 sync issues [client/network] Ban peer after disconnecting Feb 23, 2023
@altonen altonen requested a review from a team February 23, 2023 07:54
@altonen
Contributor

altonen commented Feb 23, 2023

I wonder why the banning mechanism was disabled. Could you add a test case showing that the peer is actually banned when we disconnect from it?

@melekes
Contributor Author

melekes commented Feb 23, 2023

I wonder why the banning mechanism was disabled.

I think only @tomaka knows.

Could you add a test case showing that the peer is actually banned when we disconnect from it?

sure

@tomaka
Contributor

tomaka commented Feb 23, 2023

A ban is not actually about banning the peer completely; it is simply an indication not to establish an outgoing substream to the same peer again.

The peer is still allowed to establish outgoing connections to us. This is completely intended and has always been the case.

This whole thing has had emergent behaviors in the past, where peers kept banning each other or spamming each other, using tons of CPU. Even though the nodes were working as intended, the combination of the behavior of both sides led to issues that we didn't expect. I can't stress enough that modifications to this banning behavior will probably have issues that are very hard to predict.
I would in general strongly recommend not modifying the logic in behavior.rs unless you really, really know what you're doing and/or are ready to extensively test things and look at graphs for hours.

@melekes
Contributor Author

melekes commented Feb 24, 2023

This whole thing has had emergent behaviors in the past, where peers kept banning each other or spamming each other, using tons of CPU.

well, that's happening right now because the remote peer does not get banned and immediately tries to open a connection to us. this leads to an endless loop if the node has too many peers.

I would in general strongly recommend not modifying the logic in behavior.rs unless you really, really know what you're doing and/or are ready to extensively test things and look at graphs for hours.

reasonable concern 👍

@tomaka
Contributor

tomaka commented Feb 24, 2023

well, that's happening right now because the remote peer does not get banned and immediately tries to open a connection to us. this leads to an endless loop if the node has too many peers.

The intended solution to that is that it's the remote that politely makes sure to not try connecting again if we are full. In other words, if A opens a substream to B and B is full then A should ban B and shouldn't try again. This is normally already the case in the code.
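The "A bans B and shouldn't try again" rule can be sketched as a small timed ban set. This is purely illustrative: `BanList`, its methods, and the `String` peer IDs are not Substrate APIs, and the ban duration is an arbitrary example value.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative ban list: after being rejected by a full peer, we record
// a ban and refuse to dial that peer again until the ban expires.
struct BanList {
    banned_until: HashMap<String, Instant>,
    ban_duration: Duration,
}

impl BanList {
    fn new(ban_duration: Duration) -> Self {
        Self { banned_until: HashMap::new(), ban_duration }
    }

    // Called when the remote refused us, e.g. "Too many full nodes".
    fn on_rejected(&mut self, peer: &str) {
        self.banned_until
            .insert(peer.to_string(), Instant::now() + self.ban_duration);
    }

    // Called before dialing: skip peers whose ban has not expired yet.
    fn may_dial(&self, peer: &str) -> bool {
        match self.banned_until.get(peer) {
            Some(until) => Instant::now() >= *until,
            None => true,
        }
    }
}

fn main() {
    let mut bans = BanList::new(Duration::from_secs(10));
    assert!(bans.may_dial("peer-b"));
    bans.on_rejected("peer-b"); // B told us it is full
    assert!(!bans.may_dial("peer-b")); // A must not retry immediately
}
```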

A node needs outgoing substreams in order to be guaranteed to function properly, which is why making sure that it is able to find outgoing substreams to well-behaving nodes is important. You can't function with inbound substreams alone, because that would make you vulnerable to eclipse attacks.

Accepting inbound substreams, on the other hand, is nothing more than a service that a node graciously provides. A node doesn't need to protect itself from inbound substreams, because it's not a problem if a misbehaving/malicious node connects.

You could say that banning bad peers that establish substreams to us helps free slots for the good peers. However, this is in no way a protection against an attacker that wants to squat a node. We went with the opinion that since there shouldn't be buggy peers and you can't guard against malicious peers, there's also no need to create defenses against them. Instead, energy is better spent avoiding/fixing buggy peers in the first place.

@melekes
Contributor Author

melekes commented Feb 24, 2023

this leads me to a question: why does our node try to establish outgoing connections if it knows it's full (all 40 slots are occupied)?

@tomaka
Contributor

tomaka commented Feb 24, 2023

this leads me to a question: why does our node try to establish outgoing connections if it knows it's full (all 40 slots are occupied)?

It shouldn't, but I feel like there might be some confusion.

All nodes are allowed to "connect" to any other node at any time, as in TCP/WebRTC/QUIC/etc. connections.
This is necessary in order for the DHT, the collations, etc. to work.

There's a limit to the number of incoming "connections" but it's very large (several thousands), and there's no limit to the number of outgoing connections.

The slots (the 40 slots or whatever) and banning system only apply to block announces substreams. There are N out slots, meaning that we'll have at most N (here, 40) outgoing block announces substreams, and N in slots, meaning that we'll accept at most N incoming block announces substreams.
The transactions and grandpa substreams are also obligated to follow the block announces substream. In other words we only open/maintain transactions & grandpa substreams with the same peers that already have a block announces substream. So these two are also concerned, but indirectly.
The slots and banning system do not apply to other substreams (kademlia, identify, ping, collation, block requests, etc.)

Note that in my answers above I make sure to talk about substreams, because the term "connection" is vague and confusing.
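The slot model described above can be illustrated with a toy counter. Everything here is an assumption for illustration: `SlotCounts` and its methods are not Substrate APIs, and the real accounting lives in the peerset, not in a struct like this.

```rust
// Toy model of the in/out slot limits that apply only to the
// block-announces substream; other substreams (kademlia, identify,
// ping, block requests, ...) are not counted against these limits.
struct SlotCounts {
    out_slots: u32,
    in_slots: u32,
    max_out: u32,
    max_in: u32,
}

impl SlotCounts {
    fn new(max_out: u32, max_in: u32) -> Self {
        Self { out_slots: 0, in_slots: 0, max_out, max_in }
    }

    // Outgoing block-announces substream: opened only while under the limit.
    fn try_open_outbound(&mut self) -> bool {
        if self.out_slots < self.max_out {
            self.out_slots += 1;
            true
        } else {
            false
        }
    }

    // Incoming block-announces substream: accepted only while under the limit.
    fn try_accept_inbound(&mut self) -> bool {
        if self.in_slots < self.max_in {
            self.in_slots += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut slots = SlotCounts::new(40, 40);
    for _ in 0..40 {
        assert!(slots.try_open_outbound());
    }
    // All 40 out slots occupied: no further outgoing substreams are opened.
    assert!(!slots.try_open_outbound());
    // Inbound slots are accounted separately from outbound ones.
    assert!(slots.try_accept_inbound());
}
```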

@melekes melekes changed the title [client/network] Ban peer after disconnecting [client/network] Remove unused argument Feb 27, 2023
@melekes melekes marked this pull request as ready for review February 27, 2023 08:44
@melekes melekes self-assigned this Feb 27, 2023
@melekes melekes added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Feb 27, 2023
@melekes
Contributor Author

melekes commented Feb 27, 2023

There's a limit to the number of incoming "connections" but it's very large (several thousands), and there's no limit to the number of outgoing connections.
The slots (the 40 slots or whatever) and banning system only apply to block announces substreams.

Thanks for explaining this @tomaka 🙏

@melekes melekes requested a review from a team February 27, 2023 08:50
@melekes melekes removed the request for review from a team February 27, 2023 08:50
client/network/src/protocol.rs (review comment, outdated and resolved)
@dmitry-markin dmitry-markin requested a review from a team February 27, 2023 09:04
Contributor

@altonen altonen left a comment


LGTM but please restore the removed print.

I don't have access to the logs of the issue, but if there are hundreds of messages per second from one peer, we need to look into why that peer does not back off after getting rejected but instead keeps retrying without delay. If that is the case, it doesn't sound like correct behavior to me.
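A rejected peer is expected to back off rather than retry immediately. One common scheme for this is exponential backoff; the sketch below is illustrative only (the function name, base delay, and cap are assumptions, not what Substrate implements).

```rust
use std::time::Duration;

// Illustrative exponential backoff: each consecutive rejection doubles
// the delay before the next dial attempt, up to a fixed cap.
fn backoff_delay(consecutive_rejections: u32) -> Duration {
    let base = Duration::from_secs(1);
    let cap = Duration::from_secs(64);
    // Clamp the shift so large rejection counts don't overflow.
    let factor = 1u64 << consecutive_rejections.min(6);
    std::cmp::min(base * factor as u32, cap)
}

fn main() {
    assert_eq!(backoff_delay(0), Duration::from_secs(1));
    assert_eq!(backoff_delay(3), Duration::from_secs(8));
    // Delays are capped rather than growing without bound.
    assert_eq!(backoff_delay(10), Duration::from_secs(64));
}
```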

@melekes
Contributor Author

melekes commented Feb 27, 2023

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 4ad96cf into paritytech:master Feb 27, 2023
@melekes melekes deleted the anton/13295-sync-issues branch February 27, 2023 10:43
ukint-vs pushed a commit to gear-tech/substrate that referenced this pull request Apr 10, 2023
* improve error message

* removed unused argument

* docs: disconnect_peer_inner no longer accepts `ban`

* remove redundant trace message

```
sync: Too many full nodes, rejecting 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ
sync: 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ disconnected
```

is enough to understand that we've refused to connect to the given peer

* Revert "removed unused argument"

This reverts commit c87f755.

* ban peer for 10s after disconnect

* do not accept incoming conns if peer was banned

* Revert "do not accept incoming conns if peer was banned"

This reverts commit 7e59d05.

* Revert "ban peer for 10s after disconnect"

This reverts commit 3859201.

* Revert "Revert "removed unused argument""

This reverts commit f1dc623.

* format code

* Revert "remove redundant trace message"

This reverts commit a87e65f.
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023