Ensure that state snapshot works for new shards #11600
Longarithm added the A-storage (Area: storage and databases) and A-resharding (Area: State resharding) labels on Jun 17, 2024
github-merge-queue bot pushed a commit that referenced this issue on Jun 18, 2024
Solution to #11583.

The current logic for updating a shard's flat storage doesn't work for memtrie loading in one rare case. If a shard has no active validators and receives no transactions or receipts, the block at the flat storage head gets GC-d, and the attempt to read its state root for the assertion naturally panics.

This happens because of non-strict mode, which itself exists to make `StateSnapshot` work. But essentially it is enough to **not** move the flat storage head past `epoch_last_block.chunks(shard_id).prev_block_hash()`. The flat storage state **past** this block exactly corresponds to the state we are syncing; see also #11600. So this is exactly the new flat head candidate we compute and pass to `update_flat_head`. Passing tests show that non-strict mode is not needed.

After this change, a GC problem can still occur if there are no chunks for the shard or no finality in the stored epochs, which we already assume does not happen during development.

Nayduck will be at https://nayduck.nearone.org/#/run/149

## Practical example

One edge case in which the state snapshot still works is when the client has just processed the **second** block in an epoch. Then the last final block is not earlier than the last block of the previous epoch, so the new flat head is not earlier than `prev_block_hash` of the last chunk for our shard in that block. The state snapshot therefore still works.

For the old implementation, this was guaranteed because, although we passed the last final block, we made _two steps back by non-empty state transitions_. The first jump is guaranteed to skip the last block **because it may contain validator updates**, and the second jump is guaranteed to skip the last chunk. So the guarantees are the same.

## test_load_memtrie_after_empty_chunks

* Add GCActor to the TestLoop. It clears blocks in the background and doesn't need external control.
* Ensure that shard 0 has no validators and has only empty chunks for a long time.
* Unload the memtrie for shard 0 and load it back. I checked that in non-strict mode, as before the fix, it panics.
* Additionally, check that if the 2 chunks at the end of an epoch are always missing and we always move the flat head to the final known block, then snapshotting always fails, so accounting for the latest chunk is actually needed!
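
The flat head candidate rule described in the commit message can be illustrated with a toy model. This is only a sketch, not the nearcore implementation: blocks are identified by height instead of hash, `Block`, `ChunkHeader`, and `flat_head_candidate` are hypothetical names, and this `update_flat_head` is a simplified stand-in for the real function. The model also assumes that a block with a missing chunk carries over the shard's previous chunk header, and that the flat head never moves backward.

```rust
/// Hypothetical, simplified chunk header: it only records the height of the
/// block the chunk was built on (standing in for `prev_block_hash()`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChunkHeader {
    prev_block_height: u64,
}

/// Hypothetical block: one chunk header per shard. Assumption of this model:
/// when a shard's chunk is missing, the header of its latest chunk is kept.
struct Block {
    chunks: Vec<ChunkHeader>,
}

/// Candidate for the new flat head of `shard_id`: the block preceding the
/// latest chunk of that shard, as seen from `block`.
/// Returns `None` if the shard does not exist in this block.
fn flat_head_candidate(block: &Block, shard_id: usize) -> Option<u64> {
    block.chunks.get(shard_id).map(|chunk| chunk.prev_block_height)
}

/// Simplified stand-in for `update_flat_head`: move the head forward to the
/// candidate, but never backward (an assumption of this sketch).
fn update_flat_head(current_head: u64, candidate: Option<u64>) -> u64 {
    match candidate {
        Some(height) if height > current_head => height,
        _ => current_head,
    }
}
```

For example, if shard 1 has had no new chunk since height 5, its candidate stays at 5 regardless of how far finality has advanced, so the flat head for that shard does not move past the state its latest chunk was built on.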
shreyan-gupta pushed a commit that referenced this issue on Jun 18, 2024
I suspect we want to support state snapshots for shards which were freshly created. However, in this line
nearcore/core/store/src/trie/state_snapshot.rs
Line 100 in 61c67c6
cc @wacban