[wip] Snapshot Baselines #9742
Context:
Example:
Answer questions:
See also https://github.com/ledgerwatch/erigon/blob/e35/cmd/downloader/readme.md#e3-datadir-structure
Fixed downloads are expected to be immutable, which means that the file contents and their associated hash are not expected to change. In order to prevent disruption on the downloading node, the hash of a fixed download is read from the pre-verified hashes in `chains.toml` when a process is first started and written to the downloads section of `snapshot-lock`.
After this, the process will attempt to download the hashed file from the torrent. Once the file is downloaded, the process will re-check its contents for consistency on each restart. If the file is inconsistent with the `snapshot-lock` hash, it will be re-downloaded.
1. "will re-check the file contents for consistency on each process restart": which file? If it is a data file, why and how does it need to be checked on each restart?
2. "If the file is inconsistent with the `snapshot-lock` hash it will be re-downloaded": in other words, "`snapshot-lock` is the new source of truth". What is wrong with the current source of truth? What problem are you solving?
3. Can points 1 and 2 be solved by removing the `ts.DisableInitialPieceCheck = true` line? https://github.com/ledgerwatch/erigon/blob/devel/erigon-lib/downloader/util.go#L300
4. "preserve read performance via the file's page-cache": not only because of this. Re-reading 10 TB of data on every startup is also a "too paranoid" mode. To ensure consistency we have the following toolbox:
   4.1. ACID DB: it has some tricks here https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/mdbx_piece_completion.go#L75 and https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/downloader.go#L2106 and https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/downloader.go#L1161
   4.2. fsync on files. It is here https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/torrent.go#L2142 and here is the `Flush` method of the `mmap` backend (which we are using now): https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/storage/mmap.go#L75 However, I don't see a `Flush` method in the non-mmap backend: https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/storage/file.go#L109
   4.3. Checking the hash once must be enough, but maybe we do it "less than once" because of `ts.DisableInitialPieceCheck = true`.
This branch contains an update to snapshot-json.lock to deal with e3 `baseline` downloads. A baseline is an initial download which is then incremented by the local node.
Status:
Initially this PR is just an update to the downloader readme to describe the intended functionality.
Open Questions:
Also, what are the implications for the network of multiple downloads of large files with relatively small deltas?