Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework p2p req resp protocol #11781

Open
wants to merge 50 commits into
base: develop
Choose a base branch
from
Open

Rework p2p req resp protocol #11781

wants to merge 50 commits into from

Conversation

anacrolix
Copy link
Contributor

@anacrolix anacrolix commented Sep 6, 2024

Primarily addresses #11779.

  • Adds the ability to configure the SyncClient peer rate limit and ReqRespServer global rate limit via the SetupP2P interface.
  • Adds a test that has a bunch of syncers connect to a sequencer after a large number of blocks have been lost to gossip.
  • Adds PayloadSource to L2PayloadIn handlers, so it can be differentiated between gossip and alt sync.
  • Changes GossipIn to L2PayloadIn in a few places where it's not just gossip.
  • Removes non-functional/unused active range request tracking. There's just one gap at a time.

Reworks p2p alt sync to:

  • Prioritize the next trusted block and blocks that can be promoted soonest.
  • Adds request concurrency so a single peer can't block progression.
  • Retry failed requests to other peers.
  • Correctly wait if an error is received from a server.
  • Avoid overflowing quarantine and trusted caches.
  • Always promote the very next trusted block.
  • Handle range request updates without throwing away or forgetting about existing requests.

TODO/open questions/future PRs (not blocking unless someone flags them):

  • Should I add a metric in OpNode where it can differentiate the L2 unsafe payload source? Do we have a precedent for handling error and non error cases (separate metrics?).
  • Limit the number of blocks to quarantine, and evict blocks furthest from the promote head.
  • Add a rate limit error code to the req/resp protocol so clients don't penalize servers for rate limiting them.
  • Trigger range request updates when the unsafe head moves. I put the code in place for this but it doesn't seem to trigger.
  • The active request block list could be a slice instead of a map.
  • Possibly handle abandoning the gap entirely when we get an empty range request.

@anacrolix anacrolix marked this pull request as ready for review September 9, 2024 07:26
@anacrolix anacrolix requested review from a team as code owners September 9, 2024 07:26
@anacrolix
Copy link
Contributor Author

The contracts-bedrock-tests failure is unrelated I think...

This is ready for review.

op-node/p2p/sync.go Outdated Show resolved Hide resolved
op-node/p2p/sync.go Outdated Show resolved Hide resolved
op-e2e/alt-sync_test.go Outdated Show resolved Hide resolved
Also figured out answers to some of the TODOs
@anacrolix
Copy link
Contributor Author

Thanks @tynes, addressed your nits, and answered the TODO question so it's not lingering forever as an unknown.

@@ -4,6 +4,11 @@ go 1.21

require (
github.com/BurntSushi/toml v1.4.0
github.com/anacrolix/chansync v0.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How important is it to use these packages? We need to be really sure when adding new packages to be certain there aren't backdoors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely need chansync or I'd just end up duplicating all that again.

Some of the generic stuff is exceptionally useful for avoiding a lot of verbosity in Go.

The rest I was using for debugging.

@sebastianst sebastianst removed the request for review from zhwrd September 18, 2024 09:56
@@ -223,6 +192,7 @@ type SyncPeerScorer interface {
// If the user does sync a long range of blocks through this mechanism,
// it does end up traversing through the chain, but receives the blocks in reverse order.
// It is up to the user to persist the blocks for later processing, or drop & resync them if persistence is limited.
// TODO: Should this be renamed to AltSyncClient?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a GitHub ticket for this TODO.

Ignore this finding from todos_require_linear.

Copy link
Contributor

semgrep-app bot commented Sep 18, 2024

Semgrep found 1 nil-check-after-call finding:

Potential p2pmetrics nil dereference when NewBandwidthCounter is called

Ignore this finding from nil-check-after-call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants