
Ensure that state snapshot works for new shards #11600

Open
Longarithm opened this issue Jun 17, 2024 · 0 comments
Labels
A-resharding Area: State resharding A-storage Area: storage and databases

Comments

@Longarithm
Member

I suspect we want to support state snapshots for freshly created shards. However, this line

```rust
if let Some(chunk) = block.chunks().get(shard_uid.shard_id as usize) {
    // ...
}
```

assumes that shard UIDs belong to the previous epoch's shard layout only. We may want to support the new shard layout as well.
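To illustrate the concern, here is a minimal sketch in which every type is a hypothetical stand-in, not the nearcore `Block`, chunk header, or `ShardLayout` API. It assumes that the chunks of a previous-epoch block are indexed by the previous layout's shard ids, and that the new layout can map each new shard id to a parent shard id. Under those assumptions, indexing directly by a new-layout shard id picks the wrong chunk or none at all.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a chunk header.
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    shard_id: u64,
}

/// Hypothetical stand-in for a block of the previous epoch: its chunks are
/// indexed by shard ids of the *previous* shard layout.
struct Block {
    chunks: Vec<ChunkHeader>,
}

/// Hypothetical new shard layout: maps each new shard id to its parent
/// shard id in the previous layout.
struct ShardLayout {
    parent_of: HashMap<u64, u64>,
}

/// Mirrors the quoted line: index the chunks directly by shard id.
fn chunk_naive(block: &Block, shard_id: u64) -> Option<&ChunkHeader> {
    block.chunks.get(shard_id as usize)
}

/// Layout-aware lookup: translate a new-layout shard id to its parent
/// shard id before indexing into the previous epoch's block.
fn chunk_for_new_shard<'a>(
    block: &'a Block,
    new_layout: &ShardLayout,
    shard_id: u64,
) -> Option<&'a ChunkHeader> {
    let parent = *new_layout.parent_of.get(&shard_id)?;
    block.chunks.get(parent as usize)
}
```

For example, with a hypothetical layout in which old shard 0 splits into new shards 0 and 1 and old shard 1 becomes new shard 2, `chunk_naive` returns old shard 1's chunk for new shard 1 and `None` for new shard 2, whereas `chunk_for_new_shard` returns the parent's chunk in both cases.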

cc @wacban

Longarithm added the A-storage (Area: storage and databases) and A-resharding (Area: State resharding) labels on Jun 17, 2024
github-merge-queue bot pushed a commit that referenced this issue Jun 18, 2024
Solution to #11583.

The current logic for updating a shard's flat storage breaks memtrie
loading in a rare case. If a shard has no active validators and receives
no transactions or receipts, the block at the flat storage head gets
garbage-collected, and the attempt to read its state root for an
assertion naturally panics.

This can happen because of non-strict mode, which is used to make
`StateSnapshot` work.

However, it is enough to **not** move the flat storage head past
`epoch_last_block.chunks(shard_id).prev_block_hash()`. The flat storage
state **after** this block corresponds exactly to the state we are
syncing (see also #11600). So this is exactly the new flat head
candidate we compute and pass to `update_flat_head`. The passing tests
show that non-strict mode is not needed.
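The rule above can be sketched as follows, under loudly simplifying assumptions: blocks on the final chain are identified by height, so "not past block X" becomes "height <= X", and the flat head is assumed never to move backwards. `new_flat_head` and its parameters are hypothetical and do not reflect the actual `update_flat_head` signature.

```rust
/// Hypothetical simplified model: blocks are identified by height.
/// `last_chunk_prev` stands for the height of
/// `epoch_last_block.chunks(shard_id).prev_block_hash()`.
fn new_flat_head(current_head: u64, last_final: u64, last_chunk_prev: u64) -> u64 {
    // Candidate: follow finality, but never go past the prev block of the
    // shard's last chunk in the epoch.
    let candidate = last_final.min(last_chunk_prev);
    // Assumption: the flat head only moves forward.
    current_head.max(candidate)
}
```

When finality is ahead of the cap, the head stops at the cap; when finality is behind it, the head simply follows the last final block.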

After this change, a GC problem remains possible if there are no chunks
for the shard or no finality within the stored epochs; we already assume
during development that neither happens.

The Nayduck run will be at https://nayduck.nearone.org/#/run/149

## Practical example

One edge case where the state snapshot still works is when the client
has just processed the **second** block of an epoch. Then the last final
block is no earlier than the last block of the previous epoch, so the new
flat head is no earlier than the `prev_block_hash` of the last chunk for
our shard in that block. Hence the state snapshot still works.
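A worked version of this edge case, assuming a simplified finality model with no skipped heights, in which the last final block is two heights below the head. All function names are hypothetical. If the previous epoch ends at height 100, the second block of the new epoch is at height 102, its last final block is 100, and the shard's last chunk in block 100 has a prev block at height at most 99, so the capped candidate lands exactly on that prev block.

```rust
/// Hypothetical simplified finality model: assuming no skipped heights,
/// the last final block is two heights below the head.
fn last_final_height(head: u64) -> u64 {
    head.saturating_sub(2)
}

/// Follow finality, but never go past the prev block of the shard's last
/// chunk in the previous epoch (`last_chunk_prev`).
fn flat_head_candidate(head: u64, last_chunk_prev: u64) -> u64 {
    last_final_height(head).min(last_chunk_prev)
}
```

Since `last_chunk_prev` is always below the previous epoch's last block, the candidate equals `last_chunk_prev` as soon as finality reaches that last block.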

In the old implementation, this was guaranteed because, although we
passed the last final block, we then took _two steps back over non-empty
state transitions_. The first step is guaranteed to skip the last block,
**because it may contain validator updates**; the second step is
guaranteed to skip the last chunk. So the guarantees are the same.
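One plausible reading of "two steps back over non-empty state transitions" is sketched below; it is a hypothetical model, not the old nearcore code. Heights index a slice of flags saying whether the block at that height contains a new chunk for the shard, and each step moves to the state just before the nearest such block.

```rust
/// Hypothetical model: `has_new_chunk[h]` is true if the block at height `h`
/// contains a new chunk for the shard, i.e. a non-empty state transition.
/// Starting at `start`, each step finds the nearest height <= the current one
/// with a new chunk and moves to the height just below it (the state before
/// that transition). Returns None if history runs out.
fn step_back_non_empty(has_new_chunk: &[bool], start: usize, steps: usize) -> Option<usize> {
    if start >= has_new_chunk.len() {
        return None;
    }
    let mut current = start;
    for _ in 0..steps {
        // Nearest block at or below `current` that changes the state.
        let with_chunk = (0..=current).rev().find(|&h| has_new_chunk[h])?;
        // Move to the state before that transition.
        current = with_chunk.checked_sub(1)?;
    }
    Some(current)
}
```

For example, with flags `[true, true, true, false, true]` and a start at height 4, the first step skips block 4 (landing at 3) and the second skips block 2 (landing at 1).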

## test_load_memtrie_after_empty_chunks

* Add `GCActor` to the TestLoop. It clears blocks in the background and
doesn't need external control.
* Ensure that shard 0 has no validators and only empty chunks for a
long time.
* Unload the memtrie for shard 0 and load it back. I checked that in
non-strict mode, i.e. before the fix, this panics.
* Additionally, check that if the last 2 chunks at the end of an epoch
are always missing and we always move the flat head to the latest final
block, snapshotting always fails, so accounting for the latest chunk is
actually needed!
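The last check can be modelled as follows, assuming no skipped heights (so a chunk's prev block is one height below the block that contains it) and using hypothetical helper names. With chunks for shard 0 missing at heights 99 and 100 in an epoch ending at 100, the snapshot needs the flat state at height 97; moving the flat head to a final block at height 100 passes it, while capping the head at 97 keeps snapshotting possible.

```rust
/// Hypothetical model: `has_new_chunk[h]` is true if the block at height `h`
/// contains a new chunk for shard 0. Returns the prev-block height of the
/// last new chunk at or below `epoch_last`, i.e. the block whose post-state
/// the snapshot needs.
fn last_chunk_prev_height(has_new_chunk: &[bool], epoch_last: usize) -> Option<usize> {
    let last = epoch_last.min(has_new_chunk.len().checked_sub(1)?);
    let with_chunk = (0..=last).rev().find(|&h| has_new_chunk[h])?;
    with_chunk.checked_sub(1)
}

/// Snapshotting is possible only if the flat head has not moved past
/// the needed block.
fn snapshot_possible(flat_head: usize, needed: usize) -> bool {
    flat_head <= needed
}
```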
shreyan-gupta pushed a commit that referenced this issue Jun 18, 2024