-
Notifications
You must be signed in to change notification settings - Fork 896
Don't create child lookup if parent is faulty #7118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't create child lookup if parent is faulty #7118
Conversation
I found other instances of the same issue. For example, lookup id: 62124, block_root: 0x48d5c54417da73dbec50da
|
In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
beacon_node/network/src/sync/manager.rs:920
- The variable 'parent_root' is not defined in this scope. Consider extracting it from 'block_component' before using it in the debug log.
debug!(?block_root, ?parent_root, "No lookup created for child and parent");
beacon_node/network/src/sync/manager.rs:928
- The variable 'parent_root' is not available in this context. Extract and define it (e.g. via block_component.parent_root()) to ensure the debug log has access to the correct parent identifier.
debug!(%block_root, %parent_root, reason, "Ignoring unknown parent request");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Would you mind adding a regression test for this?
I made an attempt but didn't end up finishing it (and it didn't work), but I think a test would like this?
// Regression test for https://github.com/sigp/lighthouse/pull/7118
// Trigger an unknown parent block that references a previously failed chain and assert
// that the lookup is not created.
let mut child_block = rig.rand_block();
*child_block.message_mut().parent_root_mut() = chain_hash;
rig.trigger_unknown_parent_block(peer_id, child_block.into());
rig.expect_no_active_lookups();
All required checks have passed and there are no merge conflicts. This pull request may now be ready for another review. |
@dapplion I'll add a test for this - would like to get this merged. |
@dapplion Added a test that fails on |
// At this point, the chain should have been deemed too deep and pruned. | ||
// The tip root should have been inserted into failed chains. | ||
rig.assert_failed_chain(tip_root); | ||
rig.expect_no_penalty_for(peer_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test follows current behaviour where we do NOT penalize peers for sending us long chains that maybe be canonical (honest behaviour).
// THEN: The extending block should not create a lookup because the tip was inserted into failed chains. | ||
rig.expect_no_active_lookups(); | ||
// AND: The peer should be penalized for extending a failed chain. | ||
rig.expect_single_penalty(peer_id, "failed_chain"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we then penalize peers who do the same action. This is inconsistent and can lead to penalties for honest peers. We should not penalize peers who extend known long chains. We could have to hashsets
- failed_chains: reject descendant lookups and penalize
- long_chains: reject descendant lookups and don't penalize
We should fix this in a future PR
✅ good to merge @jimmygchen |
Issue Addressed
Issue discovered on PeerDAS devnet (node
lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io
). Summary:0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12
0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12
is added to the failed chains cache0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12
0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12
and hit a processor errorUnknownParent
, hitting this linelighthouse/beacon_node/network/src/sync/block_lookups/mod.rs
Lines 686 to 688 in bf955c7
search_parent_of_child
does not create a parent lookup because the parent root is in the failed chain cache. However, we have already marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.Now we have a lookup (the child of
0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12
) that is awaiting a parent lookup that doesn't exist: hence stuck.Impact
This bug can affect Mainnet as well as PeerDAS devnets.
This bug may stall lookup sync for a few minutes (up to
LOOKUP_MAX_DURATION_STUCK_SECS = 15 min
) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot ofWARN
logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.This bug is triggered if:
Proposed Changes
Ensure that we never create (or leave existing) a lookup that references a non-existing parent.
I added
must_use
lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So ifsearch_parent_of_child
returnsfalse
now returnLookupRequestError::Failed
instead ofLookupResult::Pending
.As a bonus I have a added more logging and reason strings to the errors