Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fork id the default and try to recover the DiscoveryPeer for incoming connections from the PeerTable #5628

Merged
merged 43 commits into from
Jan 22, 2024

Conversation

pinges
Copy link
Contributor

@pinges pinges commented Jun 21, 2023

PR description

This PR introduces these changes:

  • Make "--filter-on-enr-fork-id" default to true. That means that Besu by default will request the ENR from peers it has bonded with, which usually contains the fork id. The fork id then is used to find out whether that peer is on the same chain.
  • When other peers are initiating a handshake with us we now will check our PeerTable to see whether we have bonded with that peer. If the peer is found in the PeerTable we can check if the fork id is available to find out whether that peer is on the same chain.

Only if the fork id indicates that the peer is on the same chain we will try to establish a connection with that peer.

Currently Besu tries to connect to each peer in the PeerTable every 30 seconds. If a peer responds with an ENR that contains the fork id that peer will only be tried if that fork id passes the check. Checking the fork id at that point will safe Besu from doing a handshake, exchange the HelloMessage, and potentially exchanging status messages with a peer before finding out that the peer is on a different chain. Metrics produced during the testing of these changes indicate that on mainnet about 70-75% of peers are failing the fork id check.

Fixed Issue(s)

#5272

…scoveryPeer for incoming connections from the PeerTable

Signed-off-by: Stefan <stefan.pingel@consensys.net>
@github-actions
Copy link

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

pinges added 11 commits June 26, 2023 10:22
Signed-off-by: Stefan <stefan.pingel@consensys.net>
Signed-off-by: Stefan <stefan.pingel@consensys.net>
Signed-off-by: Stefan <stefan.pingel@consensys.net>
Signed-off-by: Stefan <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
@pinges pinges marked this pull request as ready for review December 7, 2023 04:26
pinges and others added 7 commits December 15, 2023 13:20
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
pinges and others added 16 commits December 22, 2023 14:40
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments and def needs a changelog entry

@macfarla
Copy link
Contributor

I'm happy with this PR but would like @fab-10 or someone else from chupa to take a look before it gets merged

Copy link
Contributor

@fab-10 fab-10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor logs opt and a question about the rate of connection attempts

Comment on lines 416 to 418
.setMessage("Mismatched network id: {}, peer {}")
.addArgument(status.networkId())
.addArgument(peer.getShortNodeId())
.log();
LOG.atTrace()
.setMessage("Mismatched network id: {}, EthPeer {}")
.addArgument(status.networkId())
.addArgument(peer)
.addArgument(getPeerOrPeerId(peer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid doing the computation when debug level is not enabled, in the addArgument you need to use a lambda. I cannot make a suggestion since there are deleted lines, but the code should look like:

.addArgument(status::networkId)
.addArgument(() -> getPeerOrPeerId(peer))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated all the places where we do log information to use lambdas where possible.

Comment on lines 422 to 427
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching fork id: {}")
.addArgument(getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(forkId)
.log();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching fork id: {}")
.addArgument(getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(forkId)
.log();
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching fork id: {}")
.addArgument(() -> getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(forkId)
.log();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.a.

Comment on lines 430 to 435
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching genesis hash: {}")
.addArgument(getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(status.genesisHash())
.log();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching genesis hash: {}")
.addArgument(getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(status.genesisHash())
.log();
LOG.atDebug()
.setMessage("{} has matching network id ({}), but non-matching genesis hash: {}")
.addArgument(() -> getPeerOrPeerId(peer))
.addArgument(networkId)
.addArgument(status::genesisHash)
.log();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.a.

peer.disconnect(DisconnectReason.SUBPROTOCOL_TRIGGERED);
} else if (mergePeerFilter.isPresent()
&& mergePeerFilter.get().disconnectIfPoW(status, peer)) {
LOG.atDebug()
.setMessage("Post-merge disconnect: peer still PoW {}")
.addArgument(peer.getShortNodeId())
.addArgument(getPeerOrPeerId(peer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.addArgument(getPeerOrPeerId(peer))
.addArgument(() -> getPeerOrPeerId(peer))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.a.

.setMessage("Received status message from {}: {} with connection {}")
.addArgument(peer)
.addArgument(status)
.addArgument(message.getConnection())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.addArgument(message.getConnection())
.addArgument(message::getConnection)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.a.

Comment on lines 385 to +392
void attemptPeerConnections() {
LOG.trace("Initiating connections to discovered peers.");
rlpxAgent.connect(
final Stream<DiscoveryPeer> toTry =
streamDiscoveredPeers()
.filter(peer -> peer.getStatus() == PeerDiscoveryStatus.BONDED)
.filter(peerDiscoveryAgent::checkForkId)
.sorted(Comparator.comparing(DiscoveryPeer::getLastAttemptedConnection)));
.sorted(Comparator.comparing(DiscoveryPeer::getLastAttemptedConnection));
toTry.forEach(rlpxAgent::connect);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood this correctly, this is called every 30 seconds, how many peers can be attempted every time?
Asking to see if we are doing too many attempts, and if there is a need to put some limits, so we try at max tot connections at time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is reducing the number of peers we are trying to connect to to the ones that have a valid fork id or did not send a fork id.
I agree that we should look into limiting this, e.g. if we do have reached max peers. But that is for another time :-)

Copy link
Contributor Author

@pinges pinges Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a new task to make sure that we can have a look at optimising that!
https://app.zenhub.com/workspaces/team-revenant-5e6accf93892a67e1d7a7f34/issues/gh/hyperledger/besu/6442

Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing I spotted in the changelog but it doesn't necessarily need to be fixed in your PR

- The time that can be spent selecting transactions during block creation is not capped at 5 seconds for PoS and PoW networks, and for PoA networks, at 75% of the block period specified in the genesis, this to prevent possible DoS in case a single transaction is taking too long to execute, and to have a stable block production rate, but it could be a breaking change if an existing network used to have transactions that takes more time to executed that the newly introduced limit, if it is mandatory for these network to keep processing these long processing transaction, then the default value of `block-txs-selection-max-time` or `poa-block-txs-selection-max-time` needs to be tuned accordingly.
- Requesting the Ethereum Node Record (ENR) to acquire the fork id from bonded peers is now enabled by default, so the following change has been made [#5628](https://github.com/hyperledger/besu/pull/5628):
- `--Xfilter-on-enr-fork-id` has been removed. To disable the feature use `--filter-on-enr-fork-id=false`.
- The time that can be spent selecting transactions during block creation is not capped at 5 seconds for PoS and PoW networks, and for PoA networks, at 75% of the block period specified in the genesis, this to prevent possible DoS in case a single transaction is taking too long to execute, and to have a stable block production rate, but it could be a breaking change if an existing network used to have transactions that takes more time to executed that the newly introduced limit, if it is mandatory for these network to keep processing these long processing transaction, then the default value of `block-txs-selection-max-time` or `poa-block-txs-selection-max-time` needs to be tuned accordingly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The time that can be spent selecting transactions during block creation is not capped at 5 seconds for PoS and PoW networks, and for PoA networks, at 75% of the block period specified in the genesis, this to prevent possible DoS in case a single transaction is taking too long to execute, and to have a stable block production rate, but it could be a breaking change if an existing network used to have transactions that takes more time to executed that the newly introduced limit, if it is mandatory for these network to keep processing these long processing transaction, then the default value of `block-txs-selection-max-time` or `poa-block-txs-selection-max-time` needs to be tuned accordingly.
- The time that can be spent selecting transactions during block creation is now capped at 5 seconds for PoS and PoW networks, and for PoA networks, at 75% of the block period specified in the genesis, this to prevent possible DoS in case a single transaction is taking too long to execute, and to have a stable block production rate, but it could be a breaking change if an existing network used to have transactions that takes more time to executed that the newly introduced limit, if it is mandatory for these network to keep processing these long processing transaction, then the default value of `block-txs-selection-max-time` or `poa-block-txs-selection-max-time` needs to be tuned accordingly.

@pinges pinges merged commit 921bc17 into hyperledger:main Jan 22, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants