[wip] Snapshot Baselines #9742

Closed

mh0lt (Contributor) commented on Mar 18, 2024

This branch contains an update to snapshot-json.lock to deal with e3 baseline downloads.

A baseline is an initial download which is then incremented by the local node.

Status:
Initially this PR is just an update to the downloader readme to describe the intended functionality.

Open Questions:

  • Once a baseline is changed locally, when is it republished, and how do other nodes find out about it?
  • When a baseline update is seen by the node, what does it do?
    • Ignore it?
    • Check it against its own hashes?
    • Update its own data (if it has stalled at the baseline, for example)?

Also, what are the implications for the network of multiple downloads of large files with relatively small deltas?

AskAlexSharov (Collaborator) commented on Mar 18, 2024

Context:

  1. E3 has 3 types of files. snapshots/history and snapshots/idx are similar to E2: each file is just a piece of history, and its merging stops at some point. snapshots/domain is very different: it stores the "latest state" (like the PlainState table in E2), and its merging never ends.
  2. E3 shards files by txNum instead of blockNum. Moreover, there is a new concept, the step: const HistoryV3AggregationStep = 1_562_500, and files are sharded by stepNum.
  3. One file can span many steps.
  4. If a file is big enough it becomes Seedable: const Erigon3SeedableSteps = 64.

Example:

  • snapshots/domain/v1-accounts.0-32.kv - Not Seedable
  • snapshots/domain/v1-accounts.0-64.kv - Seedable
  • snapshots/domain/v1-accounts.0-96.kv - Not Seedable
  • snapshots/domain/v1-accounts.0-128.kv - Seedable
  5. v1-accounts.0-128.kv stores the "latest state at the end of step 127" (all snapshots have [from, to) semantics). That is, this file stores the "latest state" (like the PlainState table in E2) at TxNum=128*HistoryV3AggregationStep.
    Merge example: v1-accounts.0-64.kv + v1-accounts.64-128.kv = v1-accounts.0-128.kv
    Inside, .kv files are like a map: key -> value, nothing fancy.
    v1-accounts.64-128.kv stores all updates to the latest state that happened in steps [64, 128), but only the latest value of each key; there is no history in .kv files (history is in history/*.v files).

  6. Reasoning for the never-ending merge: the GetLatests(key) operation needs to be fast, and reducing the number of files from N to log(N) may help. Also, if the never-ending merge stopped, files would store the same keys repeatedly (popular accounts are always being updated) and disk usage would grow. The size of .kv files matters because they are hot data (from the block-execution point of view).
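
The naming and merge rules above can be sketched in Go. This is an illustration only, not Erigon's code: stepRange, parseStepRange, isSeedable, endTxNum and mergeLatest are hypothetical names, and the seedable rule (range length is a multiple of Erigon3SeedableSteps) is an assumption inferred from the four example files, so the real rule may differ.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// Values quoted in the discussion above.
const (
	HistoryV3AggregationStep = 1_562_500
	Erigon3SeedableSteps     = 64
)

// stepRange is a hypothetical helper: the half-open step interval [From, To).
type stepRange struct{ From, To uint64 }

// parseStepRange extracts the step range from a name such as
// "snapshots/domain/v1-accounts.0-128.kv".
func parseStepRange(name string) (stepRange, error) {
	parts := strings.Split(path.Base(name), ".")
	if len(parts) != 3 {
		return stepRange{}, fmt.Errorf("unexpected file name: %q", name)
	}
	fromStr, toStr, ok := strings.Cut(parts[1], "-")
	if !ok {
		return stepRange{}, fmt.Errorf("no step range in %q", name)
	}
	from, err := strconv.ParseUint(fromStr, 10, 64)
	if err != nil {
		return stepRange{}, err
	}
	to, err := strconv.ParseUint(toStr, 10, 64)
	if err != nil {
		return stepRange{}, err
	}
	if from >= to {
		return stepRange{}, fmt.Errorf("empty step range in %q", name)
	}
	return stepRange{From: from, To: to}, nil
}

// isSeedable applies the ASSUMED rule: the number of steps covered
// is a multiple of Erigon3SeedableSteps.
func isSeedable(r stepRange) bool {
	return (r.To-r.From)%Erigon3SeedableSteps == 0
}

// endTxNum is the TxNum at which the file's "latest state" is taken.
func endTxNum(r stepRange) uint64 {
	return r.To * HistoryV3AggregationStep
}

// mergeLatest models a .kv merge as a map merge: values from the
// newer range win, so only the latest value per key survives.
func mergeLatest(older, newer map[string]string) map[string]string {
	out := make(map[string]string, len(older)+len(newer))
	for k, v := range older {
		out[k] = v
	}
	for k, v := range newer {
		out[k] = v
	}
	return out
}
```

Under these assumptions, v1-accounts.0-128.kv parses to [0, 128), counts as seedable, and holds the state at TxNum 128 * 1_562_500, while v1-accounts.128-129.kv does not count as seedable.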

Answers to the questions:

  • On what a node does when it sees a baseline update: an E3 node is similar to an E2 node. It downloads files only at first start and after that only produces and seeds them. So if a new (generated/merged) file is Seedable, the node just creates a .torrent file for it and seeds it (removing the smaller files). After v1-accounts.0-128.kv the node will create v1-accounts.128-129.kv.
  • On the network implications of multiple downloads of large files with relatively small deltas: the smallest Seedable file covers Erigon3SeedableSteps*HistoryV3AggregationStep = 64 * 1_562_500 = 100M transactions, which means the delta of the latest state is very big.

See also https://github.com/ledgerwatch/erigon/blob/e35/cmd/downloader/readme.md#e3-datadir-structure
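
As a rough sketch of the "seed the merged file, remove the smaller files" step and of the delta-size arithmetic, assuming files are identified only by their step ranges. The names stepRange, covers, supersededBy and minSeedableTxs are hypothetical and do not come from the Erigon code base.

```go
package main

// Values quoted in the discussion above.
const (
	HistoryV3AggregationStep = 1_562_500
	Erigon3SeedableSteps     = 64
)

// stepRange is a hypothetical helper: the half-open step interval [From, To).
type stepRange struct{ From, To uint64 }

// covers reports whether outer contains inner and is not the same range.
func covers(outer, inner stepRange) bool {
	return outer != inner && outer.From <= inner.From && inner.To <= outer.To
}

// supersededBy returns the existing ranges that a newly merged file
// makes redundant, i.e. the "smaller files" that can be removed.
func supersededBy(merged stepRange, existing []stepRange) []stepRange {
	var out []stepRange
	for _, r := range existing {
		if covers(merged, r) {
			out = append(out, r)
		}
	}
	return out
}

// minSeedableTxs is the number of transactions covered by the
// smallest seedable file: 64 * 1_562_500 = 100_000_000.
func minSeedableTxs() uint64 {
	return uint64(Erigon3SeedableSteps) * HistoryV3AggregationStep
}
```

For example, once a file for [0, 128) exists, the files for [0, 64) and [64, 128) are reported as superseded, while a later file for [128, 129) is left alone.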


Fixed downloads are expected to be immutable, which means that the file contents and their associated hash are expected not to change. In order to prevent disruption on the downloading node, the hash of a fixed download is read from the pre-verified hashes in `chains.toml` when a process is first started and written to the downloads section of `snapshot-lock`.

After this the process will attempt to download the hashed file from the torrent and once it is downloaded will re-check the file contents for consistency on each process restart. If the file is inconsistent with the `snapshot-lock` hash it will be re-downloaded.

  1. "will re-check the file contents for consistency on each process restart": which file? If it is the data file, why and how does it need to be checked on each restart?

  2. "If the file is inconsistent with the snapshot-lock hash it will be re-downloaded": in other words, "snapshot-lock is the new source of truth". What is wrong with the current source of truth? What problem are you solving?

  3. Could points 1 and 2 be solved by removing the ts.DisableInitialPieceCheck = true line: https://github.com/ledgerwatch/erigon/blob/devel/erigon-lib/downloader/util.go#L300 ?

  4. "preserve read performance via the file's page-cache": not only because of this, but also because re-reading 10 TB of data on every startup is a "too paranoid" mode. To ensure consistency we have the following toolbox:

4.1. ACID DB: it has some tricks here https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/mdbx_piece_completion.go#L75 and https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/downloader.go#L2106 and https://github.com/ledgerwatch/erigon/blob/1879c764c1dd0be566f1b42371d1fbe14e3f2a00/erigon-lib/downloader/downloader.go#L1161

4.2. fsync on files. It's here https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/torrent.go#L2142 and here is the Flush method of the mmap backend (which we use now): https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/storage/mmap.go#L75 I don't see a Flush method in the non-mmap backend: https://github.com/anacrolix/torrent/blob/59ec9d6dd211210d30edc31e3bb1ca2b8785fb50/storage/file.go#L109

4.3. Checking the hash once must be enough, but maybe we do it "less than once" because of ts.DisableInitialPieceCheck = true

@AskAlexSharov AskAlexSharov changed the title Snapshot Baselines [wip] Snapshot Baselines Apr 15, 2024
@mh0lt mh0lt closed this Apr 27, 2024
@AskAlexSharov AskAlexSharov deleted the snaplock_with_baselines branch July 5, 2024 06:42