
Rationalize Downloader Webpeer Discovery #14170

@mh0lt

Description


The current webpeer discovery is error-prone and has accumulated several spot fixes for specific bugs.

The underlying code flow needs to be rationalised so that downloader start-up and metadata discovery follow a consistent model, with well-defined state managed in one place in the code.
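Here is a minimal sketch of what "well-defined state managed in one place" could look like. It assumes start-up is modelled as a linear sequence of phases owned by a single tracker. The phase names (`StateInit`, `StatePreverifiedLoaded`, `StateMetadataDiscovered`, `StateVerified`, `StateReady`) and the `Tracker` type are hypothetical illustrations, not existing Erigon code. The point is only that every transition goes through one guarded method that rejects out-of-order steps.

```go
package main

import (
	"fmt"
	"sync"
)

// DiscoveryState is a HYPOTHETICAL set of downloader start-up phases;
// the names are illustrative and are not existing Erigon types.
type DiscoveryState int

const (
	StateInit               DiscoveryState = iota
	StatePreverifiedLoaded                 // preverified.toml fetched/loaded
	StateMetadataDiscovered                // torrent metadata and webpeers resolved
	StateVerified                          // data files checked against .torrent files
	StateReady                             // downloader running normally
)

var stateNames = [...]string{"Init", "PreverifiedLoaded", "MetadataDiscovered", "Verified", "Ready"}

func (s DiscoveryState) String() string {
	if s < 0 || int(s) >= len(stateNames) {
		return fmt.Sprintf("DiscoveryState(%d)", int(s))
	}
	return stateNames[s]
}

// legalNext maps each state to the single state it may advance to.
var legalNext = map[DiscoveryState]DiscoveryState{
	StateInit:               StatePreverifiedLoaded,
	StatePreverifiedLoaded:  StateMetadataDiscovered,
	StateMetadataDiscovered: StateVerified,
	StateVerified:           StateReady,
}

// Tracker is the one place that owns start-up state; every transition
// goes through Advance, which rejects out-of-order steps.
type Tracker struct {
	mu    sync.Mutex
	state DiscoveryState
}

// Advance moves to the requested state only if it is the legal successor.
func (t *Tracker) Advance(to DiscoveryState) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if want, ok := legalNext[t.state]; !ok || want != to {
		return fmt.Errorf("illegal transition %s -> %s", t.state, to)
	}
	t.state = to
	return nil
}

// State returns the current phase.
func (t *Tracker) State() DiscoveryState {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.state
}

func main() {
	tr := &Tracker{}
	for _, s := range []DiscoveryState{StatePreverifiedLoaded, StateMetadataDiscovered, StateVerified, StateReady} {
		if err := tr.Advance(s); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("now in", tr.State())
	}
}
```

A real downloader would likely also need error and retry states. The single-successor table is kept deliberately simple here.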

Once this is done, we can look at adding functionality that makes the start-up and maintenance of files more consistent, such as:

What I thought it does: fetch a new preverified.toml, then canonicalize the set of on-disk snapshots/$x.torrent files against the new preverified.toml, and then verify data files against the torrent files.

Because of this, if you ever end up with "bad snapshots" (really meaning bad torrent files), you're somewhat stuck with them: the node itself has no mechanism to get you out of this situation. To repair such a node, you'd actually have to shut it down and delete (I think) snapshots/preverified.toml, snapshots/prohibit_new_downloads.lock, all snapshots/*.torrent files, and the whole downloader/ dir (and then maybe also reset the snapshot stage progress in the chaindata DB?). That is all fiddly, which is why the advice is usually just "blow away the whole node", and I feel like this is a missed opportunity.

By "canonicalize the set of on-disk torrent files", I picture something like:
1. Ensure all .torrent files that should be there according to preverified.toml are there; if not, fetch the missing ones.
2. If there are any .torrent files on disk without corresponding entries in the latest preverified.toml, blow those .torrent files away, along with their associated snapshot data files.
3. Compute the metainfo hash of each on-disk .torrent file and compare it to the metainfo hash in the new preverified.toml; if they don't match, re-download the torrent file. (Why not blow away the data file too? Ideally, when a version update fixes a minor corruption in a file, this lets the torrent client find the pieces of the file that have changed on verify, and then re-download only those pieces on leech.)
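The three canonicalization steps above can be sketched as a pure planning function. It compares the hashes listed in preverified.toml with hashes computed from the on-disk .torrent files and decides an action per file. `Plan`, `Action` and the action names are hypothetical. Hashes are treated as opaque strings, and this sketch leaves out both how they are computed and how the actions are carried out.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is what a HYPOTHETICAL canonicalization pass decides for one file.
type Action int

const (
	FetchTorrent         Action = iota // in preverified.toml, no .torrent on disk
	DeleteTorrentAndData               // .torrent on disk, not in preverified.toml
	RedownloadTorrent                  // hash mismatch: replace .torrent, keep data file
)

func (a Action) String() string {
	switch a {
	case FetchTorrent:
		return "fetch .torrent"
	case DeleteTorrentAndData:
		return "delete .torrent and data file"
	case RedownloadTorrent:
		return "re-download .torrent (keep data for piece-level repair)"
	}
	return fmt.Sprintf("Action(%d)", int(a))
}

// Plan compares preverified (file name -> expected hash) with onDisk
// (file name -> hash computed from the on-disk .torrent). Files whose
// hashes already match need no action and are left out of the result.
func Plan(preverified, onDisk map[string]string) map[string]Action {
	plan := make(map[string]Action)
	for name, want := range preverified {
		got, ok := onDisk[name]
		switch {
		case !ok:
			plan[name] = FetchTorrent
		case got != want:
			plan[name] = RedownloadTorrent
		}
	}
	for name := range onDisk {
		if _, ok := preverified[name]; !ok {
			plan[name] = DeleteTorrentAndData
		}
	}
	return plan
}

func main() {
	preverified := map[string]string{"a.seg": "h1", "b.seg": "h2", "c.seg": "h3"}
	onDisk := map[string]string{"a.seg": "h1", "b.seg": "stale", "d.seg": "h4"}
	plan := Plan(preverified, onDisk)
	names := make([]string, 0, len(plan))
	for name := range plan {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, plan[name])
	}
}
```

Keeping the decision separate from the side effects makes the plan easy to log before anything is deleted. A hash mismatch maps to replacing only the .torrent, so the data file can be repaired piece by piece.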

(In other words, this is basically an MVP, synchronous version of the larger design I described here a few months ago, just above.)
Of course, this "canonicalize torrents from preverified.toml" step could never run currently, because once snapshots/prohibit_new_downloads.lock exists, Erigon stops caring about updating snapshots from the network.

But my understanding is that prohibit_new_downloads.lock is not really meant to "prohibit new downloads", but rather to "prohibit downloads of new segments": that is, not to bother downloading segments after torrent sync completes, because Erigon will have built those locally, so downloading them just wastes time on boot for no gain.

But while node operators don't want to waste time (potentially stalling useful work their node is doing) downloading segment files they have already built locally, they would likely care very much about downloading new versions of files for existing segments that repair corruption in previous versions of those files, and would definitely be willing to wait for such a download to complete.

I think all that is needed to get this working (besides making --downloader.verify trigger the above canonicalization steps before actual torrent verification) would be to change snapshots/prohibit_new_downloads.lock from a persisted set of snapshot types to freeze into a persisted map from snapshot types to the max block/tx cursor position (the same range-looking keys used in the snapshot filenames) that is meant to be "owned" by the torrent-backed files in the snapshot store.

Then, in this snapshot canonicalization process, any keys in a newly fetched preverified.toml whose cursor-range minimum is above the max cursor position specified in prohibit_new_downloads.lock would simply be ignored, as if they were not listed in the file.
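Here is a sketch of the proposed map-form lock and the filtering rule. It assumes snapshot names of the form `<version>-<from>-<to>-<type>.<ext>` (e.g. `v1-000000-000500-headers.seg`) and a lock decoded into a map from snapshot type to max cursor. `parseSegmentName`, `filterPreverified` and the map representation are hypothetical. Following the wording above, an entry is dropped only when its range minimum is strictly above the recorded max. The sketch does not settle whether the lock should store an inclusive max or the exclusive end of the last owned range.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// segmentRange is the type and range encoded in a snapshot name, ASSUMING
// the "<version>-<from>-<to>-<type>.<ext>" form.
type segmentRange struct {
	Type     string
	From, To uint64
}

// parseSegmentName (HYPOTHETICAL) returns ok=false for names that do not
// fit the assumed form.
func parseSegmentName(name string) (segmentRange, bool) {
	parts := strings.SplitN(name, "-", 4)
	if len(parts) != 4 {
		return segmentRange{}, false
	}
	from, err1 := strconv.ParseUint(parts[1], 10, 64)
	to, err2 := strconv.ParseUint(parts[2], 10, 64)
	if err1 != nil || err2 != nil {
		return segmentRange{}, false
	}
	typ := strings.TrimSuffix(parts[3], filepath.Ext(parts[3]))
	return segmentRange{Type: typ, From: from, To: to}, true
}

// filterPreverified (HYPOTHETICAL) drops preverified entries whose range
// starts above the max cursor recorded for their type in the map-form lock.
// Types absent from the lock, and unparseable names, are kept unchanged.
func filterPreverified(entries map[string]string, owned map[string]uint64) map[string]string {
	kept := make(map[string]string, len(entries))
	for name, hash := range entries {
		if r, ok := parseSegmentName(name); ok {
			if maxCursor, bounded := owned[r.Type]; bounded && r.From > maxCursor {
				continue // beyond the torrent-owned range: built locally, ignore
			}
		}
		kept[name] = hash
	}
	return kept
}

func main() {
	owned := map[string]uint64{"headers": 500}
	entries := map[string]string{
		"v1-000000-000500-headers.seg": "h1",
		"v1-001000-001500-headers.seg": "h2",
		"v1-001000-001500-bodies.seg":  "h3",
		"other-file.txt":               "h4",
	}
	kept := filterPreverified(entries, owned)
	fmt.Println(len(kept), "of", len(entries), "entries kept")
}
```

Names that do not match the assumed pattern, and types with no entry in the lock, pass through unfiltered, so the rule only ever narrows what canonicalization considers.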
