Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node sends 1k "Duplicate gossip" msgs #5379

Open
ordian opened this issue Aug 16, 2024 · 4 comments
Open

Node sends 1k "Duplicate gossip" msgs #5379

ordian opened this issue Aug 16, 2024 · 4 comments
Labels
I2-bug The node fails to follow expected behavior. T0-node This PR/Issue is related to the topic “node”.

Comments

@ordian
Copy link
Member

ordian commented Aug 16, 2024

We had a spike of 1k warning messages from a single peer on Kusama:
https://grafana.teleport.parity.io/goto/Vs5GETjIR?orgId=1

WARN tokio-runtime-worker peerset: Report 12D3KooWJPXURGJ1nMa3FwA8aeSZJEpXnaVX36Gyr8LQBPUmHiWz: -4 to -1987730577. Reason: Duplicate gossip. Banned, disconnecting.

where the reputation was very slowly (-4) adjusted after being banned.
The node is https://apps.turboflakes.io/?chain=kusama#/validator/J11jfzJQAkxKJTZozq3SDRAsjZkSQjzoRCUQ5oVBxWsecJT?mode=history which has a low grading (F), but would still good to know the reason why "Duplicate gossip" was triggered.

cc @paritytech/networking

@ordian ordian added T0-node This PR/Issue is related to the topic “node”. I2-bug The node fails to follow expected behavior. labels Aug 16, 2024
@dmitry-markin
Copy link
Contributor

dmitry-markin commented Aug 16, 2024

Thanks for reporting. If it happens we see this error triggered by an offending node under our control, we should try collecting logs from it with -l gossip=trace for the period of time it misbehaves.

Edited. See the next message of @alexggh.

@alexggh
Copy link
Contributor

alexggh commented Aug 16, 2024

The logs are bit misleading because after we banned a node we print the log at every reputation update.
https://github.com/paritytech/polkadot-sdk/blob/master/substrate/client/network/src/peer_store.rs#L265

So you actually have too look for the first occurence of Banned to understand the root cause, so in this case the culprit for banning was.

2024-08-16 00:42:54.458  WARN tokio-runtime-worker peerset: 
Report 12D3KooWJPXURGJ1nMa3FwA8aeSZJEpXnaVX36Gyr8LQBPUmHiWz: -536870912 to -1825819477. 
Reason: No requested block data. Banned, disconnecting.

@ordian ordian changed the title Node banned for "Duplicate gossip" Node sends 1k "Duplicate gossip" msgs Aug 16, 2024
@lexnv
Copy link
Contributor

lexnv commented Aug 16, 2024

We ve seen this before indeed, also reported by some external operator.

I have this on my todo list, will publish a pr Monday probably, to make the reputation warning of a peer under the banned threshold more explicit.

As to the initial warning which caused the ban, i think we can stil look into that

@ordian
Copy link
Member Author

ordian commented Aug 16, 2024

As to the initial warning which caused the ban, i think we can stil look into that

I see 3 places this log is issued from: 2 in chain sync:

} else {
debug!(target: LOG_TARGET, "Unexpected gap block response from {peer_id}");
return Err(BadPeer(*peer_id, rep::NO_BLOCK))
}
},
PeerSyncState::DownloadingStale(_) => {
peer.state = PeerSyncState::Available;
if blocks.is_empty() {
debug!(target: LOG_TARGET, "Empty block response from {peer_id}");
return Err(BadPeer(*peer_id, rep::NO_BLOCK))
}

and one in warp sync:
if blocks.is_empty() {
debug!(
target: LOG_TARGET,
"Downloading target block failed: empty block response from {peer_id}",
);
return Err(BadPeer(peer_id, rep::NO_BLOCK))
}

Unfortunately, we don't seem to have sync=debug logs enabled on our Kusama validators.

github-merge-queue bot pushed a commit that referenced this issue Aug 21, 2024
This PR aims to make the logging from the peer store a bit more clear.

In the past, we aggressively produced warning logs from the peer store
component, even in cases where the reputation change was not malicious.
This has led to an extensive number of logs, as well to node operator
confusion.

In this PR, we produce a warning message if:
- The peer crosses the banned threshold for the first time. This is the
actual reason of a ban
- The peer misbehaves again while being banned. This may happen during a
batch peer report

cc @paritytech/networking 

Part of: #5379.

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

No branches or pull requests

4 participants