Fix lookup disconnect peer #5815

Merged 2 commits on May 20, 2024


dapplion (Collaborator)

Issue Addressed

While looking through Holesky nodes' logs for "Lookup maybe stuck" messages, I found the following state:

May 19 09:00:04.308 DEBG Lookup maybe stuck                      summary: SingleBlockLookup { id: 6956, block_request_state: BlockRequestState { state: SingleLookupRequestState { state: AwaitingDownload, available_peers: 0, failed_processing: 0, failed_downloading: 0 } }, blob_request_state: BlobRequestState { state: SingleLookupRequestState { state: AwaitingDownload, available_peers: 1, failed_processing: 0, failed_downloading: 0 } }, block_root: 0x77921f4f47635e2a95efb4753625523aaba24a4636b7dae8544b45e894c04661, awaiting_parent: None, created: Instant { tv_sec: 20313759, tv_nsec: 73955749 } }, block_root: 0x77921f4f47635e2a95efb4753625523aaba24a4636b7dae8544b45e894c04661, id: 6956, service: lookup_sync, service: sync

This state is problematic because the block request has 0 available peers while the blob request has 1. The peer sets of all requests should be identical; the cause of the mismatch is the early return (the short-circuiting &&) in this function:

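    pub fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
        // && short-circuits: if the block request's remove_peer returns false,
        // the blob request's remove_peer is never called and it keeps the peer.
        self.block_request_state.state.remove_peer(peer_id)
            && self.blob_request_state.state.remove_peer(peer_id)
    }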

The tests did not catch the bug because the covered test case always returned an RPCError for every active request. If a lookup sends only a block request (no blob request) and the peer disconnects, the lookup may get stuck.
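A minimal, hypothetical model of the failure follows; it does not use Lighthouse's real types. RequestState, Lookup, remove_peer_buggy, the u64 stand-in for PeerId, and the rule that remove_peer returns true only when the request has no peers left and nothing in flight are all assumptions made for illustration. It reproduces the scenario described above: the block request is in flight to the only peer, the blob request has not been sent, and that peer disconnects.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for libp2p's PeerId.
type PeerId = u64;

/// Hypothetical per-component request state that owns its own peer set.
struct RequestState {
    available_peers: HashSet<PeerId>,
    downloading: bool, // true while an RPC request is in flight
}

impl RequestState {
    /// Removes `peer_id`. Returns true only if no peers are left and no
    /// response is pending (assumed semantics, for illustration only).
    fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
        self.available_peers.remove(peer_id);
        self.available_peers.is_empty() && !self.downloading
    }
}

/// Hypothetical lookup holding one request state per component.
struct Lookup {
    block: RequestState,
    blob: RequestState,
}

impl Lookup {
    /// Mirrors the buggy pattern: `&&` skips the blob request whenever the
    /// block request returns false.
    fn remove_peer_buggy(&mut self, peer_id: &PeerId) -> bool {
        self.block.remove_peer(peer_id) && self.blob.remove_peer(peer_id)
    }
}

fn main() {
    let peer: PeerId = 1;
    let mut lookup = Lookup {
        // Block request in flight to `peer`; blob request not yet sent.
        block: RequestState { available_peers: HashSet::from([peer]), downloading: true },
        blob: RequestState { available_peers: HashSet::from([peer]), downloading: false },
    };
    // The block request returns false (still downloading), so the blob
    // request's remove_peer never runs.
    lookup.remove_peer_buggy(&peer);
    println!(
        "block peers: {}, blob peers: {}",
        lookup.block.available_peers.len(),
        lookup.blob.available_peers.len()
    );
}
```

Running it prints `block peers: 0, blob peers: 1`, the same 0/1 split in available peers as in the log above. Calling both remove_peer methods unconditionally would avoid the short-circuit, but the duplicated peer sets could still drift apart in other code paths.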

Proposed Changes

Duplicating the list of peers (that is, peers that claim to have imported the block's full set of components) across the block and blob requests is not necessary. This PR hoists the peer set out of the request state and into the lookup struct, which indirectly fixes the early-return bug.

It also adds a test covering the case where a lookup loses all its peers but does not receive an RPCError.
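The change described above can be sketched the same way, again with hypothetical types rather than Lighthouse's real ones. The peer set lives once on Lookup, the per-component RequestState only tracks download progress, and remove_peer (assumed semantics) reports that the lookup should be dropped once it has no peers left and no request in flight, which is the case where no RPCError will ever arrive to unstick it.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for libp2p's PeerId.
type PeerId = u64;

/// Hypothetical per-component state: it no longer owns a peer set.
enum RequestState {
    AwaitingDownload,
    Downloading,
}

/// Hypothetical lookup: a single peer set shared by block and blob requests,
/// so the two can never disagree about which peers are available.
struct Lookup {
    peers: HashSet<PeerId>,
    block: RequestState,
    blob: RequestState,
}

impl Lookup {
    /// Removes the peer from the shared set. Returns true if the lookup has
    /// no peers left and nothing in flight, i.e. it should be dropped.
    fn remove_peer(&mut self, peer_id: &PeerId) -> bool {
        self.peers.remove(peer_id);
        self.peers.is_empty() && !self.has_request_in_flight()
    }

    fn has_request_in_flight(&self) -> bool {
        matches!(self.block, RequestState::Downloading)
            || matches!(self.blob, RequestState::Downloading)
    }
}

fn main() {
    // Nothing in flight: no RPCError will ever arrive for this lookup.
    let mut idle = Lookup {
        peers: HashSet::from([1, 2]),
        block: RequestState::AwaitingDownload,
        blob: RequestState::AwaitingDownload,
    };
    assert!(!idle.remove_peer(&1)); // peer 2 is still available
    assert!(idle.remove_peer(&2)); // no peers left: drop the lookup

    // Block request in flight: keep the lookup until its response or error.
    let mut in_flight = Lookup {
        peers: HashSet::from([3]),
        block: RequestState::Downloading,
        blob: RequestState::AwaitingDownload,
    };
    assert!(!in_flight.remove_peer(&3));
    println!("ok");
}
```

Keeping one set on the lookup removes the need to keep per-request copies in sync, so there is no longer a second remove_peer call that an early return could skip.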

realbigsean (Member) left a comment:

Nice, this makes a lot of sense considering the other recent changes.

realbigsean (Member):

@mergify queue


mergify bot commented May 20, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 2a87016

@mergify mergify bot merged commit 2a87016 into sigp:unstable May 20, 2024
28 checks passed
@dapplion dapplion deleted the fix-lookup-disconnect-peer branch May 22, 2024 08:49
2 participants