
Support Warp sync from non genesis #35

Open
bkchr opened this issue Sep 13, 2022 · 13 comments
Labels
I5-enhancement An additional feature request.

Comments

@bkchr
Member

bkchr commented Sep 13, 2022

Currently, warp sync always starts from genesis. We should make it start from the latest available finalized block. This should make warp syncing work after a restart.

We should add multiple zombienet tests that abort the execution of the syncing node at different stages, to ensure that we can always continue syncing after the next restart.

@arkpar
Member

arkpar commented Sep 13, 2022

The challenge here is that we'd need to delete the existing state from the database somehow.

@bkchr
Member Author

bkchr commented Sep 13, 2022

> The challenge here is that we'd need to delete the existing state from the database somehow.

Isn't this just the same as pruning?

@arkpar
Member

arkpar commented Sep 13, 2022

Each block execution produces a set of trie nodes that must be inserted into or deleted from the state db to reflect insertions/deletions in the trie. Pruning simply delays deletions until some later point in time by keeping temporary journals of what has been deleted. Warping to a new state means there's no delta update to the trie; instead, the old trie must be deleted and the new one populated from the warp snapshot.
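The difference between journaled pruning and a warp can be illustrated with a toy model. This is a minimal sketch, not Substrate's state-db: `ToyStateDb`, `import_block` and `warp_to` are hypothetical names, string keys stand in for node hashes, and the pruning window is an assumed parameter. Block imports write insertions immediately but only journal deletions, which are applied once the block leaves the window; a warp adds the snapshot's nodes but has no delta telling it which old nodes to remove.

```python
from typing import Dict, Set


class ToyStateDb:
    """Hypothetical trie-node store with journaled (delayed) deletions."""

    def __init__(self, pruning_window: int) -> None:
        self.nodes: Set[str] = set()             # nodes currently in the STATE column
        self.journal: Dict[int, Set[str]] = {}   # block number -> deletions it produced
        self.pruning_window = pruning_window

    def import_block(self, number: int, inserted: Set[str], deleted: Set[str]) -> None:
        # Insertions produced by executing the block are written immediately.
        self.nodes |= inserted
        # Deletions are only journaled, so recent states remain readable.
        self.journal[number] = set(deleted)
        # Blocks that fell out of the window have their deletions applied for real.
        for old in sorted(self.journal):
            if old <= number - self.pruning_window:
                self.nodes -= self.journal.pop(old)

    def warp_to(self, snapshot: Set[str]) -> None:
        # A snapshot describes only the new trie. Nothing says which old nodes
        # are obsolete, so they all stay in `self.nodes`.
        self.nodes |= snapshot


db = ToyStateDb(pruning_window=2)
db.import_block(1, {"a", "b"}, set())
db.import_block(2, {"c"}, {"a"})   # "a" is deleted by block 2, but only journaled
db.import_block(3, {"d"}, set())
db.import_block(4, set(), set())   # block 2 leaves the window, so "a" is removed
db.warp_to({"x", "y"})
print(sorted(db.nodes))  # ['b', 'c', 'd', 'x', 'y']: b, c, d are leftovers of the old state
```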

Deleting the whole trie may be done either by enumerating the trie to get all the nodes (slow) or by clearing the STATE column in the DB. However with the latter approach if sync is interrupted after the old state has been deleted, but the new is not imported yet, you'll end up with no state at all.
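The two deletion strategies can be sketched against a toy column modelled as a dict from node key to child keys. This is a hypothetical representation (real trie nodes are encoded and hashed), and `delete_by_enumeration` / `replace_by_clearing` are illustrative names. Enumeration walks the old trie from its root and costs one lookup per node; clearing is a single operation, but a simulated interruption while importing the snapshot leaves neither a complete old state nor a complete new one.

```python
from typing import Dict, List, Optional, Set

Column = Dict[str, List[str]]  # hypothetical: node key -> keys of child nodes


def reachable(column: Column, root: str) -> Set[str]:
    """Every stored node reachable from `root`; one lookup per node (the slow part)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        key = stack.pop()
        if key in seen or key not in column:
            continue
        seen.add(key)
        stack.extend(column[key])
    return seen


def delete_by_enumeration(column: Column, old_root: str) -> None:
    # Slow path: walk the old trie and delete every node it references.
    for key in reachable(column, old_root):
        del column[key]


def replace_by_clearing(column: Column, chunks: List[Column], crash_at: Optional[int] = None) -> None:
    # Fast path: drop the whole column, then import the snapshot chunk by chunk.
    column.clear()  # from here on the old state is gone
    for i, chunk in enumerate(chunks):
        if i == crash_at:
            raise RuntimeError("simulated interruption")  # e.g. the node is killed
        column.update(chunk)


old = {"r": ["a", "b"], "a": [], "b": []}
delete_by_enumeration(old, "r")
print(old)  # {}

col = {"r": ["a", "b"], "a": [], "b": []}
try:
    replace_by_clearing(col, [{"R": ["x"]}, {"x": []}], crash_at=1)
except RuntimeError:
    pass
print(col)  # {'R': ['x']}: old root gone, new root points at a missing child
```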

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/community-bootnodes-and-snapshot-providers/630/1

@melekes
Contributor

melekes commented Jun 13, 2023

> Deleting the whole trie may be done either by enumerating the trie to get all the nodes (slow) or by clearing the STATE column in the DB. However with the latter approach if sync is interrupted after the old state has been deleted, but the new is not imported yet, you'll end up with no state at all.

But the latter approach is still "better" because it's more or less atomic, no? Whereas the former approach (enumerating the trie) leaves the DB in a corrupted state.

@bkchr
Member Author

bkchr commented Jun 13, 2023

The best would probably be to clear and write the new data to the state column in one go. This would be some kind of "overwrite transaction" to the db.

@arkpar
Member

arkpar commented Jun 13, 2023

> But the latter approach is still "better" because it's more or less atomic, no? Whereas the former approach (enumerating the trie) leaves the DB in a corrupted state.

Well, you can warp sync and insert the new state first, and then enumerate-delete the old one. This way, even if deletion is interrupted, you may end up with junk in the DB, but the node will still work.
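The ordering described here (import the new state first, then enumerate-delete the old one) can be sketched in the same kind of toy column. All names are hypothetical, and the sketch assumes nodes are content-addressed, so the old and new tries may share nodes; the cleanup therefore skips anything still reachable from the new root. An interruption during cleanup only leaves unreachable junk, and re-running the cleanup removes it.

```python
from typing import Dict, List, Optional, Set

Column = Dict[str, List[str]]  # hypothetical: node key (hash) -> child keys


def reachable(column: Column, root: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [root]
    while stack:
        key = stack.pop()
        if key in seen or key not in column:
            continue
        seen.add(key)
        stack.extend(column[key])
    return seen


def is_complete(column: Column, root: str) -> bool:
    """True if every node referenced from `root` is present, i.e. the state is usable."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        if key not in column:
            return False
        seen.add(key)
        stack.extend(column[key])
    return True


def warp_then_cleanup(column: Column, new_nodes: Column, new_root: str,
                      old_root: str, crash_at: Optional[int] = None) -> None:
    # 1. Import the complete new trie first; after this the new state is usable.
    column.update(new_nodes)
    # 2. Enumerate-delete the old trie, skipping nodes shared with the new one
    #    (assumption: identical nodes have identical keys, so they are shared).
    stale = sorted(reachable(column, old_root) - reachable(column, new_root))
    for i, key in enumerate(stale):
        if i == crash_at:
            raise RuntimeError("simulated interruption")
        del column[key]


col = {"old": ["s", "o1"], "s": [], "o1": []}
new = {"new": ["s", "n1"], "n1": []}
try:
    warp_then_cleanup(col, new, "new", "old", crash_at=1)
except RuntimeError:
    pass
print(is_complete(col, "new"), "old" in col)  # True True: usable, with junk left over
warp_then_cleanup(col, new, "new", "old")      # re-running finishes the cleanup
print(sorted(col))  # ['n1', 'new', 's']
```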

> The best would probably be to clear and write the new data to the state column in one go. This would be some kind of "overwrite transaction" to the db.

This is indeed the simplest way, as long as everything fits in memory. However, if we want incremental writes to the DB while state sync is in progress, it becomes more complicated. Theoretically, sync should insert into temporary columns and then replace the state and state_meta columns once it is done.
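A sketch of the temporary-column approach, assuming a hypothetical `ToyDb` whose columns are Python dicts (this is not a real Substrate, RocksDB or ParityDb API). Incremental sync writes go to `state_tmp` / `state_meta_tmp`, which readers never consult; `commit_swap` then installs the staged columns under the live names with a single reference assignment, standing in for whatever atomic rename or overwrite transaction a real database would have to provide. If the process dies before the swap, the live columns are untouched and the staging columns can be discarded on restart.

```python
from typing import Dict, Iterable


class ToyDb:
    """Hypothetical DB whose columns are dicts; not a real database API."""

    def __init__(self) -> None:
        self.columns: Dict[str, dict] = {"state": {}, "state_meta": {}}

    def write_staging(self, column: str, items: dict) -> None:
        # Incremental writes during state sync go to a temporary column
        # that normal readers never look at.
        self.columns.setdefault(column + "_tmp", {}).update(items)

    def commit_swap(self, names: Iterable[str]) -> None:
        # Build the new column map off to the side, then install it with one
        # assignment, so a reader sees either all old or all new columns.
        new_map = dict(self.columns)
        for name in names:
            new_map[name] = new_map.pop(name + "_tmp", {})
        self.columns = new_map

    def discard_staging(self) -> None:
        # After an interrupted sync, leftover temporary columns are dropped.
        self.columns = {k: v for k, v in self.columns.items() if not k.endswith("_tmp")}


db = ToyDb()
db.columns["state"]["old_root"] = "old trie"
db.write_staging("state", {"new_root": "new trie"})
db.write_staging("state_meta", {"last_canonical": 40})  # hypothetical metadata key
print(db.columns["state"])  # {'old_root': 'old trie'}: live state untouched during sync
db.commit_swap(["state", "state_meta"])
print(db.columns["state"], db.columns["state_meta"])  # {'new_root': 'new trie'} {'last_canonical': 40}
```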

It makes more sense to sync to a new DB rather than fiddle with columns. Block pruning logic, offline storage, etc. might all leave junk when the block number suddenly jumps by, e.g., +1M. Also, Polkadot storage needs to be able to handle this.

@bkchr
Member Author

bkchr commented Jun 13, 2023

> Also polkadot storage needs to be able to handle this.

What do you mean by Polkadot storage?

@arkpar
Member

arkpar commented Jun 13, 2023

The parachains db, av-store, etc. For example, I'm not sure whether av-store pruning will handle the case where the block number suddenly jumps from 1M to 2M.

Basically, all the subsystems that rely on finality notifications being sent at least once every few hundred blocks need to be checked to see whether they clean up correctly when a million blocks are suddenly finalized.

@bkchr
Member Author

bkchr commented Jun 13, 2023

> Warping to a new state means there's no delta update to the trie,

Thinking about this again: when we warp sync to block X, we finalize this block. When we finalize this block, the db logic should start to prune all the old blocks, and thus we should generate the required delta updates for the trie? Other logic in the node should see the finality notification and then be able to react and do its own pruning?

For sure, none of this is super performant, and we could maybe include some information indicating that we skipped a lot of blocks.

@arkpar
Member

arkpar commented Jun 14, 2023

I'm not sure what you mean. Pruning finalized state X simply means that trie nodes that were removed in X as part of the state transition are now actually removed from the database. This won't help clean up the existing trie nodes. E.g., suppose there's a trie node that is inserted at block 10 and removed in block 30. The client is at block 20 and warps to 40. The trie node will not be removed from the database, because the "delta" that removed it was generated during execution of block 30, and the client skipped that block when doing warp sync. The only way to get to this node now (and remove it) is to take the state root for block 10 and iterate the trie.
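The block 10/20/30/40 example can be checked mechanically. In this hypothetical model, each trie node records the block that inserts it and the block whose execution produces its deletion delta; the sketch ignores pruning delays and assumes the client executed every block up to its current height. A node is orphaned if it is still stored when the client warps but is absent from the target trie, since the skipped blocks never produce its deletion.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class NodeLifetime:
    """Hypothetical record of one trie node's lifetime."""
    key: str
    inserted_at: int  # block whose execution inserts the node
    removed_at: int   # block whose execution produces the deletion delta


def orphans_after_warp(nodes: List[NodeLifetime], current: int, target: int) -> Set[str]:
    """Nodes left in the DB that no delta will ever delete after warping current -> target."""
    # Blocks 1..current were executed, so their deletion deltas were applied.
    stored = {n.key for n in nodes if n.inserted_at <= current < n.removed_at}
    # The warp snapshot contains exactly the nodes alive at the target block.
    in_target = {n.key for n in nodes if n.inserted_at <= target < n.removed_at}
    # Blocks current+1..target are skipped, so their deletions never happen.
    return stored - in_target


nodes = [
    NodeLifetime("x", inserted_at=10, removed_at=30),   # the node from the example
    NodeLifetime("y", inserted_at=12, removed_at=100),  # still alive at block 40
]
print(orphans_after_warp(nodes, current=20, target=40))  # {'x'}
print(orphans_after_warp(nodes, current=35, target=40))  # set()
```

Warping from block 35 instead leaves nothing behind, because block 30 was executed normally and its delta deleted the node.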

> Other logic in the node should see the finality notification and then be able to react and do its own pruning?

Probably. I'm just saying that other modules might not expect to clean up after finalizing huge chains. We might see things like querying long tree routes, or loading a lot of stuff into memory again. It makes more sense for them to either start with a fresh db after a warp sync, or handle some kind of special signal that tells them that ALL data must be cleared.

@bkchr
Member Author

bkchr commented Jun 30, 2023

Yeah sorry, you are right and I had mixed up some stuff.

@the-right-joyce the-right-joyce transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I5-enhancement An additional feature request. and removed J0-enhancement labels Aug 25, 2023
kostekIV added a commit to Cardinal-Cryptography/polkadot-sdk that referenced this issue Nov 8, 2023
…ytech#35)

* rm force_delayed_canonicalize fn

* rm integrity check
@bkchr
Member Author

bkchr commented Dec 18, 2023

When this is implemented, we should think about what to do with old justifications/headers of era changes: #2710 (comment)

liuchengxu added a commit to subcoin-project/polkadot-sdk that referenced this issue Sep 20, 2024
Projects
Status: Backlog 🗒
Development

No branches or pull requests

6 participants