Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/downloader: resolve local header by hash for beacon sync #24691

Merged
merged 5 commits into from
Apr 14, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 11 additions & 2 deletions eth/downloader/beaconsync.go
Original file line number Diff line number Diff line change
Expand Up @@ -254,7 +254,7 @@ func (d *Downloader) findBeaconAncestor() (uint64, error) {
// fetchBeaconHeaders feeds skeleton headers to the downloader queue for scheduling
// until sync errors or is finished.
func (d *Downloader) fetchBeaconHeaders(from uint64) error {
head, _, err := d.skeleton.Bounds()
head, tail, err := d.skeleton.Bounds()
if err != nil {
return err
}
Expand All @@ -266,8 +266,17 @@ func (d *Downloader) fetchBeaconHeaders(from uint64) error {
)
for i := 0; i < maxHeadersProcess && from <= head.Number.Uint64(); i++ {
header := d.skeleton.Header(from)

// The header is not found in skeleton space, try to resolve the header
// reversely via skeleton tail. Note we can't retrieve the header by
// number directly, since there is no guarantee it's the header we want.
if header == nil {
header = d.getAncestorHeader(from, tail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea. The new getAncestorHeader is a linear scanning from tail down to from. Given that we're in a loop already, that would be an O(n^2) blowup.

However, I'm unsure I understand why this lookup was introduced in the first place. When filling the chain from the beacon headers, why are we looking at the chain? IMHO if something's missing from the beacon chain, we should abort sync, not "sync from ourselves".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the use case was that if I just switched to merge transition then I may not have 64 headers yet, so the pivot is pre-merge (and pre-skeleton) and thus feeding pivot and post-pivot headers needs a local lookup. Still feels kind of wonky, very weird special case that will keep haunting us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, with the scanning thing, we're scanning over and over and over again the exact same header range. We could have stopped after one scan and collected all the needed headers.

Also IMHO if this case should handle some very short (transition length) range, we should put some cap on the iteration count, otherwise it's asking for trouble if someone manages to iterate all the way back to the genesis somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for adding lookup in local chain is exactly what you said, in this special scenario we need the feed the header from local. But this scenario is kind of common right now, we have a bunch of panics recently are all relevant with this scenario.

And also it's true that the scanning range should be short, we can limit the scanning length to be 64 to avoid the bad situation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think adding a 64 limit should be welcome as that should cover the pivot leaking into the legacy sync part, but should still prevent any weird attacks. Perhaps let's add an error log if we can't find the needed header within the fsMinFullBlocks threshold and return an error + tear down sync to avoid the panics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps I would also add a warning log for the normal case - just for a few months before/after the merge - so we can see if/when it's happening and see if the bug is indeed just a transition issue or if it appears somewhere else too that can be fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All pushed, please take another look

}
// The header is still missing, the beacon sync is corrupted and bail out
// the error here.
if header == nil {
header = d.lightchain.GetHeaderByNumber(from)
return fmt.Errorf("missing beacon header %d", from)
}
headers = append(headers, header)
hashes = append(hashes, headers[i].Hash())
Expand Down
27 changes: 23 additions & 4 deletions eth/downloader/downloader.go
Original file line number Diff line number Diff line change
Expand Up @@ -159,9 +159,6 @@ type LightChain interface {
// GetHeaderByHash retrieves a header from the local chain.
GetHeaderByHash(common.Hash) *types.Header

// GetHeaderByNumber retrieves a block header from the local chain by number.
GetHeaderByNumber(number uint64) *types.Header

// CurrentHeader retrieves the head header from the local chain.
CurrentHeader() *types.Header

Expand Down Expand Up @@ -486,7 +483,8 @@ func (d *Downloader) syncWithPeer(p *peerConnection, hash common.Hash, td, ttd *
// Retrieve the pivot header from the skeleton chain segment but
// fallback to local chain if it's not found in skeleton space.
if pivot = d.skeleton.Header(number); pivot == nil {
pivot = d.lightchain.GetHeaderByNumber(number)
_, tail, _ := d.skeleton.Bounds() // error is already checked
pivot = d.getAncestorHeader(number, tail)
}
// Print an error log and return directly in case the pivot header
// is still not found. It means the skeleton chain is not linked
Expand Down Expand Up @@ -1772,3 +1770,24 @@ func (d *Downloader) DeliverSnapPacket(peer *snap.Peer, packet snap.Packet) erro
return fmt.Errorf("unexpected snap packet type: %T", packet)
}
}

// getAncestorHeader retrieves the header with given number. Different from obtaining
// the canonical header through the number directly, a verified tail header is used
// here as the base point and obtain the header eventually by resolving the parent.
func (d *Downloader) getAncestorHeader(number uint64, last *types.Header) *types.Header {
var current = last
for {
parent := d.lightchain.GetHeaderByHash(current.ParentHash)
if parent == nil {
break // The chain is not continuous
}
if parent.Number.Uint64() < number {
break // It shouldn't happen though
}
if parent.Number.Uint64() == number {
return parent
}
current = parent
}
return nil
}