Skip to content

Conversation

etan-status
Copy link
Contributor

The p2p specs state:

Non-faulty, optimistic nodes may send blocks which result in an INVALID response from an execution engine. To prevent network segregation between optimistic and non-optimistic nodes, transmission of an INVALID execution payload via the Req/Resp domain SHOULD NOT cause a node to be down-scored or disconnected. Transmission of a block which is invalid due to any consensus layer rules (i.e., not execution layer rules) MAY result in down-scoring or disconnection.

They further state:

  • Optimistic nodes must listen to block gossip to obtain a view of the head of the chain.
  • Therefore, optimistic nodes must propagate gossip blocks. Otherwise, they'd be censoring.
  • If optimistic nodes will propagate blocks via gossip, then they must respond to requests for the parent via RPC.
  • Therefore, optimistic nodes must send optimistic blocks via RPC.

It is unclear where the requirement to respond to parent comes from (in theory the parent could be outside the sync range...), and it is also vague what "RPC" means. However, the intention can be inferred when looking at practical triggers for Req/Resp messages:

  • Status message advertising a locally unknown root
  • Attestation on gossip that refers to a locally unknown block
  • Gossip message of a descendant block with an unknown parent block
  • Sidecar referring to unknown block

As we participate in gossip while optimistically synced, we trigger this request logic in other peers. Currently, Nimbus answers with empty resp when the primary head is optimistically synced (i.e., EL is syncing). This is actively hurting the sync performance of others, as such a resp asserts explicit non-existence of the requested data.

We could improve that by answering with 3: ResourceUnavailable error to indicate that we are actively syncing / have pruned the data, and let the peer ask someone else for the data. Further, we could also serve the known-good data up to the finalized checkpoint or the latest valid head.

However, withholding data that we actually have available doesn't help the network, especially so in difficult to sync non-finality scenarios or when genesis syncing from archive node peers. Therefore, recommending to remove the checks that prevent answering Req/Resp while opt synced.

The p2p specs state:

> Non-faulty, optimistic nodes may send blocks which result in an INVALID response from an execution engine. To prevent network segregation between optimistic and non-optimistic nodes, transmission of an INVALID execution payload via the Req/Resp domain SHOULD NOT cause a node to be down-scored or disconnected. Transmission of a block which is invalid due to any consensus layer rules (i.e., not execution layer rules) MAY result in down-scoring or disconnection.

They further state:

> - Optimistic nodes must listen to block gossip to obtain a view of the head of the chain.
> - Therefore, optimistic nodes must propagate gossip blocks. Otherwise, they'd be censoring.
> - If optimistic nodes will propagate blocks via gossip, then they must respond to requests for the parent via RPC.
> - Therefore, optimistic nodes must send optimistic blocks via RPC.

- https://github.com/ethereum/consensus-specs/blob/v1.5.0/specs/bellatrix/p2p-interface.md#why-allow-invalid-payloads-on-the-p2p-network

It is unclear where the requirement to respond to parent comes from
(in theory the parent could be outside the sync range...), and it is
also vague what "RPC" means. However, the intention can be inferred
when looking at practical triggers for Req/Resp messages:

- Status message advertising a locally unknown root
- Attestation on gossip that refers to a locally unknown block
- Gossip message of a descendant block with an unknown parent block
- Sidecar referring to unknown block

As we participate in gossip while optimistically synced, we trigger this
request logic in other peers. Currently, Nimbus answers with empty resp
when the primary head is optimistically synced (i.e., EL is syncing).
This is actively hurting the sync performance of others, as such a resp
asserts explicit non-existence of the requested data.

We could improve that by answering with `3: ResourceUnavailable` error
to indicate that we are actively syncing / have pruned the data, and let
the peer ask someone else for the data. Further, we could also serve the
known-good data up to the finalized checkpoint or the latest valid head.

However, withholding data that we actually have available doesn't help
the network, especially so in difficult to sync non-finality scenarios
or when genesis syncing from archive node peers. Therefore, recommending
to remove the checks that prevent answering Req/Resp while opt synced.
@etan-status etan-status mentioned this pull request May 27, 2025
@@ -487,10 +470,6 @@ p2pProtocol BeaconSync(version = 1,
for i in startIndex..endIndex:
for k in reqColumns:
if dag.db.getDataColumnSidecarSZ(blockIds[i].root, ColumnIndex k, bytes):
if blockIds[i].slot.epoch >= dag.cfg.DENEB_FORK_EPOCH and
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was DENEB. DataColumns are a Fulu feature, and executionValid is bellatrix related...

Copy link
Contributor

@agnxsh agnxsh May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should make it FULU, might be a regression since the time we used Deneb for peerdas devnets, but overall, getting removed so won't matter anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FULU makes equally no sense, same as DENEB xD But yes, proposing to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok yes, executionValid isn't related to Fulu in anyway, not sure why i put DENEB earlier, my bad :")

@cheatfate
Copy link
Contributor

This is one of the reasons why nimbus nodes provides empty block responses. It happen when nimbus node is optimistically synced - it will return empty response for any blocks_by_range/blobs_by_range/columns_by_range request.

Copy link

Unit Test Results

       15 files  ±0    2 630 suites  ±0   1h 16m 51s ⏱️ -5s
  6 531 tests ±0    6 007 ✔️ ±0  524 💤 ±0  0 ±0 
45 326 runs  ±0  44 578 ✔️ ±0  748 💤 ±0  0 ±0 

Results for commit 9eca256. ± Comparison against base commit 58f5309.

@tersec tersec merged commit 7a509e5 into unstable May 27, 2025
12 checks passed
@tersec tersec deleted the dev/etan/bf-serveopt branch May 27, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants