Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: extract state sync logic from Chain #12833

Merged
merged 3 commits into from
Jan 31, 2025

Conversation

Longarithm
Copy link
Member

@Longarithm Longarithm commented Jan 29, 2025

Main result: number of LoC in chain/chain/src/chain.rs goes from 4690 to 4125.

Please let me know if you want me to split this in multiple PRs. I can do this but I'm very tempted to just go forward because there are no functional changes...

Biggest part (568+ lines)

It is moving state sync-related methods from chain/chain/src/chain.rs to chain/chain/src/state_sync/adapter.rs. This is achieved by creating ChainStateSyncAdapter which needs access to much less data than Chain.

Another change (156+ lines)

To achieve the previous one, I wanted to read chain data from ChainStoreAdapter, not ChainStore. We are currently refactoring ChainStore and I didn't want to create new instances of it. We can allow read-only access to data which already exist in storage.

For this, in turn, I needed to make get_chunk_clone_from_header, get_block_header_on_chain_by_height and get_incoming_receipts_for_shard require only ChainStoreAdapter access. I was able to move these to chain/chain/src/store/utils.rs without changing logic.

Another bit of context: these methods previously required ChainStoreAccess in order to be queried by ChainStoreUpdate. But as @shreyan-gupta discovered, we don't actually need that, at least in almost all places of the code. So it was natural to me to simplify access to these functions already.

Bonus

It was very easy to move get_state_sync_info to ShardTracker and StateRequestTracker to new state sync folder - a nice example of hiding implementation details.

@Longarithm Longarithm changed the title wip: chain state sync adapter wip: extract state sync logic from Chain Jan 29, 2025
@Longarithm Longarithm changed the title wip: extract state sync logic from Chain refactor: extract state sync logic from Chain Jan 29, 2025
@Longarithm Longarithm marked this pull request as ready for review January 29, 2025 20:27
@Longarithm Longarithm requested a review from a team as a code owner January 29, 2025 20:27
Copy link

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 77.99043% with 138 lines in your changes missing coverage. Please review.

Project coverage is 70.38%. Comparing base (94abfca) to head (b694808).
Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
chain/chain/src/state_sync/adapter.rs 75.56% 55 Missing and 43 partials ⚠️
chain/chain/src/store/utils.rs 82.29% 12 Missing and 5 partials ⚠️
chain/epoch-manager/src/shard_tracker.rs 88.23% 0 Missing and 6 partials ⚠️
tools/state-viewer/src/state_parts.rs 0.00% 5 Missing ⚠️
chain/chain/src/chain.rs 88.46% 0 Missing and 3 partials ⚠️
chain/chain/src/store/mod.rs 0.00% 3 Missing ⚠️
tools/replay-archive/src/cli.rs 0.00% 3 Missing ⚠️
chain/chain/src/chain_update.rs 91.66% 0 Missing and 1 partial ⚠️
...client/src/stateless_validation/shadow_validate.rs 0.00% 1 Missing ⚠️
nearcore/src/state_sync.rs 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12833      +/-   ##
==========================================
+ Coverage   70.37%   70.38%   +0.01%     
==========================================
  Files         848      850       +2     
  Lines      174870   174908      +38     
  Branches   174870   174908      +38     
==========================================
+ Hits       123064   123113      +49     
+ Misses      46547    46540       -7     
+ Partials     5259     5255       -4     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.41% <0.00%> (-0.01%) ⬇️
linux 69.98% <77.99%> (+<0.01%) ⬆️
linux-nightly 70.02% <77.99%> (+<0.01%) ⬆️
pytests 1.70% <0.00%> (-0.01%) ⬇️
sanity-checks 1.52% <0.00%> (-0.01%) ⬇️
unittests 70.22% <77.99%> (+0.01%) ⬆️
upgradability 0.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

.chain_store_update
.get_block_header_on_chain_by_height(&sync_hash, chunk.height_included())?;
let block_header = get_block_header_on_chain_by_height(
&self.chain_store_update.chain_store(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part a no-op? So before we were calling the ChainStoreAccess trait method on the ChainStoreUpdate<'a>, and it looks like that one checks the in memory stuff before going to the disk with . chain_store, but after this change we will skip any of the in-memory changes. Maybe it's fine if I'm not really sure if this function won't ever be called with relevant uncommitted changes, not sure. But if you meant for this part to be a no-op, maybe you could just have get_block_header_on_chain_by_height() take a T: ChainStoreAccess?

Copy link
Member Author

@Longarithm Longarithm Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I have short comment below about that, but it seems better to copy-paste it here as well :)
So, this method uses only block headers, and 1) they are already synced 2) save_block_header is never called here, so this is a no-op.
As for ChainStoreAccess - this is the part of some larger refactoring:

The existing design is to abstract away ChainStoreAccess to handle both ChainStore and ChainStoreUpdate.
While idea makes sense, in reality we don't seem to get much from it. The hypothesis is that originally ChainStoreUpdate was made to commit many blocks to access temporary updates. But in reality, we almost always commit one block at a time, and we can fix it if we don't. So all the complexity ChainStoreUpdate has can likely be removed. I learned yesterday that @shreyan-gupta is testing this idea in #12162.

That's why I lean towards reducing usages to ChainStoreAccess :) If you want, I can make this a separate PR, because that's indeed a serious change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also say that the "set_state_finalize" moves forward from current height to sync_hash while get_block_header_on_chain_by_height moves backwards, starting from sync_hash, so there is no chance to reuse saved data.

self.chain_store_update.get_block_header_on_chain_by_height(&sync_hash, height);
// Note that block headers are already synced and can be taken
// from store on disk.
let block_header_result = get_block_header_on_chain_by_height(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above for this one

Copy link
Contributor

@shreyan-gupta shreyan-gupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the change!

@Longarithm Longarithm added this pull request to the merge queue Jan 31, 2025
Merged via the queue into near:master with commit f24decd Jan 31, 2025
28 of 29 checks passed
@Longarithm Longarithm deleted the sync-adapter branch January 31, 2025 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants