
Split state on parts #983

Open

SmaGMan wants to merge 31 commits into master from feat/split-state-storage-persistent

Conversation

@SmaGMan (Member) commented Dec 18, 2025

RATIONALE

Support sharding of the local state data into parts: the shard accounts cell tree is split into parts by shard at the configured split_depth. The top of the state tree is stored in the main cells DB, while each part subtree is stored in its own physical database that can be placed on a separate disk.


Pull Request Checklist

NODE CONFIGURATION MODEL CHANGES

[Yes]

Added core_storage.state_parts

...
"core_storage": {
  ...
  "state_parts": {
    "split_depth": N, // depth of the state split on 2^N parts
    "part_dirs" : {
      // key - hex representation of part shard prefix; value - path to the part database
      "a000000000000000": "path/to/cells-part-a000000000000000",
      ...
    }
  }
  ...
}
...
  • when split_depth is 0, no parts are used;
  • a custom database path can be set even for a single part, so only one part database can be moved to a separate disk if required;
  • if the path to a part database is not specified, the relative path "cells-parts/cells-part-{shard prefix hex}" is used.

The default value is state_parts: null, which means no parts are configured.
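For illustration, setting split_depth: 2 splits the accounts tree into 2^2 = 4 parts. The prefix and custom path below are an assumed example (only one part is redirected to a hypothetical separate disk; the remaining parts fall back to the default relative paths):

...
"core_storage": {
  ...
  "state_parts": {
    "split_depth": 2,
    "part_dirs": {
      "a000000000000000": "/mnt/disk2/cells-part-a000000000000000"
    }
  }
  ...
}
...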

BLOCKCHAIN CONFIGURATION MODEL CHANGES

[None]


COMPATIBILITY

Affected features:

  • [State]
  • [Persistent State]
  • [Storage. Blocks]
  • [Storage. States]
  • [Rpc]

Fully compatible.

State will be saved with parts if they are specified in the config. If a state was saved with parts, it will be read with parts; if it was saved without parts (e.g. before the update), it will be read without.

The parts map (key: cell hash, value: shard prefix) will be saved to CellsDB.shard_states right after the root cell hash. If no parts are used, nothing is added to the ShardStates value, so existing values in the ShardStates table are treated as "no parts used".
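Purely as an illustration (the actual on-disk encoding is defined by the PR code, not here), the value can be pictured as the 32-byte root hash followed by the parts map, where an empty map reduces to the old single-hash format:

// Hypothetical sketch of a shard_states value layout: the state root hash
// followed by (part root cell hash, shard prefix) pairs. With no parts the
// buffer contains only the root hash, i.e. the pre-update format.
fn encode_shard_state_value(root_hash: &[u8; 32], parts: &[([u8; 32], u64)]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(32 + parts.len() * 40);
    buf.extend_from_slice(root_hash);
    for (part_root_hash, shard_prefix) in parts {
        buf.extend_from_slice(part_root_hash);
        buf.extend_from_slice(&shard_prefix.to_be_bytes());
    }
    buf
}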

A new flag HAS_STATE_PARTS = 1 << 13 was added to the BlockHandle bit flags. It means that either no parts were used or all required state parts were successfully stored in their separate storages. BlockHandle.has_state() now returns true only when both the new flag and the old HAS_STATE_MAIN = 1 << 3 are set.
The migration script (0.0.4 -> 0.0.5) sets HAS_STATE_PARTS for all existing block handles.

One more new flag, HAS_PERSISTENT_SHARD_STATE_PARTS = 1 << 14, was added. It means that either no parts were used when the persistent state was stored or all required persistent parts were successfully stored. BlockHandle.has_persistent_shard_state() now returns true only when both the new flag and the old HAS_PERSISTENT_SHARD_STATE_MAIN = 1 << 4 are set.
The migration script (0.0.4 -> 0.0.5) also sets HAS_PERSISTENT_SHARD_STATE_PARTS for existing block handles that have the HAS_PERSISTENT_SHARD_STATE_MAIN flag.
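For reference, a minimal sketch of the combined checks described above; the flag values come from this description, while the standalone functions are simplified stand-ins for the actual BlockHandle methods:

const HAS_STATE_MAIN: u32 = 1 << 3;
const HAS_STATE_PARTS: u32 = 1 << 13;
const HAS_PERSISTENT_SHARD_STATE_MAIN: u32 = 1 << 4;
const HAS_PERSISTENT_SHARD_STATE_PARTS: u32 = 1 << 14;

// has_state(): true only when both the main and the parts flags are set.
fn has_state(flags: u32) -> bool {
    let mask = HAS_STATE_MAIN | HAS_STATE_PARTS;
    flags & mask == mask
}

// has_persistent_shard_state(): the same rule for the persistent-state flags.
fn has_persistent_shard_state(flags: u32) -> bool {
    let mask = HAS_PERSISTENT_SHARD_STATE_MAIN | HAS_PERSISTENT_SHARD_STATE_PARTS;
    flags & mask == mask
}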

However, parts configuration changes (e.g. from 4 to 8 partitions, or from 8 to 2 or 0) are not automatically compatible; support for this will be implemented in a separate task.

Persistent state files will be split into a main file and part files if the persistent state was stored while state split into parts was configured, e.g.:

  • main file: {block_id}.boc
  • parts files: {block_id}_part_{shard prefix hex}.boc

When we store the persistent main file, we take the part roots, then take their children and turn them into absent cells. This keeps the shard state tree root_hash unchanged and consistent while not storing the part subtrees in the main file, only their roots. As a result, each part subtree root cell is stored both in the main file and in its part file.

We additionally store a metadata file, {block_id}.meta.json:

{
  "part_split_depth": 2
}

It stores the split depth that was used to store the persistent parts. It is used to preload persistent states into the descriptor cache for the RPC server.
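A minimal sketch of reading this file, assuming serde/serde_json are available; only the part_split_depth field comes from this PR, the struct and function names are illustrative:

use serde::Deserialize;

// Illustrative reader for `{block_id}.meta.json`.
#[derive(Deserialize)]
struct PersistentStateMeta {
    part_split_depth: u8,
}

fn read_persistent_state_meta(path: &std::path::Path) -> anyhow::Result<PersistentStateMeta> {
    let data = std::fs::read(path)?;
    Ok(serde_json::from_slice(&data)?)
}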

When a node finds a persistent state, it reads the part_split_depth value. Then the node stores the state from the main file. If part_split_depth > 0, it calculates the parts info (hash - prefix) and requests the parts using the prefixes. Then it stores the parts and checks that their root hashes match the calculated parts info.
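Purely as an outline of this flow; every type and method below is a stand-in, not the actual node API from this PR:

struct PartInfo {
    root_hash: [u8; 32],
    shard_prefix: u64,
}

struct MainState;
impl MainState {
    // Derive (root hash, shard prefix) pairs for the parts from the part
    // root cells kept in the stored main state (stubbed here).
    fn calc_parts_info(&self, _split_depth: u8) -> Vec<PartInfo> {
        Vec::new()
    }
}

struct Node;
impl Node {
    fn read_meta_split_depth(&self) -> u8 { 0 }
    fn download_and_store_main_state(&self) -> MainState { MainState }
    // Request a part by its shard prefix, store it, and return the root hash
    // of what was actually stored.
    fn download_and_store_part(&self, _shard_prefix: u64) -> [u8; 32] { [0; 32] }
}

fn sync_persistent_state(node: &Node) -> Result<(), String> {
    let split_depth = node.read_meta_split_depth();
    let main_state = node.download_and_store_main_state();
    if split_depth > 0 {
        for part in main_state.calc_parts_info(split_depth) {
            let stored_root = node.download_and_store_part(part.shard_prefix);
            if stored_root != part.root_hash {
                return Err("part root hash mismatch".to_string());
            }
        }
    }
    Ok(())
}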

Previous persistent state file format is fully backward compatible.

Manual compatibility tests were passed:

  • set core_storage.state_parts = null or remove param from config
  • build last master version
  • gen local network
just gen_network 1 --force
  • run node
 just node 1
  • run 20k transfers test
 ./transfers-20k.sh
  • stop node
  • build the feat/split-state-storage-persistent branch version
  • run node without reset
 just node 1
  • see successful core db migration to 0.0.5 in logs
  • continue 20k transfers test, ensure all is going well
 ./transfers-20k.sh --continue
  • stop node
  • set up 4 parts in .temp/config1.json
...
  "state_parts" : {
    "split_depth": 2
  }
...
  • run node without reset
 just node 1
  • continue 20k transfers test, ensure all is going well
 ./transfers-20k.sh --continue
  • stop node
  • move some parts databases
 mkdir .temp/db1/cells-parts-moved
 mv .temp/db1/cells-parts/cells-part-a000000000000000 .temp/db1/cells-parts-moved/
 mv .temp/db1/cells-parts/cells-part-6000000000000000 .temp/db1/cells-parts-moved/
  • set up paths to moved databases in .temp/config1.json
...
  "state_parts" : {
    "split_depth": 2,
	"part_dirs": {
	  "a000000000000000": "/workspace/tycho/.temp/db1/cells-parts-moved/cells-part-a000000000000000",
      "6000000000000000": "cells-parts-moved/cells-part-6000000000000000"
	}
  }
...
  • run node without reset
 just node 1
  • continue 20k transfers test, ensure all is going well
 ./transfers-20k.sh --continue

Manual persistent state test 1:

  • set 4 parts in the node config.json
...
  "state_parts" : {
    "split_depth": 2
  }
...
  • gen local network of 4 nodes
just gen_network 4 --force
  • run 3 nodes with a hack that makes every key block persistent
RUSTFLAGS="--cfg tycho_unstable" HACK_EACH_KEY_BLOCK_IS_PERSISTENT=1 just node 1
RUSTFLAGS="--cfg tycho_unstable" HACK_EACH_KEY_BLOCK_IS_PERSISTENT=1 just node 2
RUSTFLAGS="--cfg tycho_unstable" HACK_EACH_KEY_BLOCK_IS_PERSISTENT=1 just node 3
  • run 20k transfers test to deploy 20k wallets
 ./transfers-20k.sh
  • stop transfers and change bc config params to force a key block
target/debug/tycho tool bc set-param --rpc http://localhost:8001 --key <key> \
22 \
'{
  "bytes": {
    "underload": 1000,
    "soft_limit": 5000,
    "hard_limit": 6000
  },
  "gas": {
    "underload": 900000,
    "soft_limit": 15000000,
    "hard_limit": 20000000
  },
  "lt_delta": {
    "underload": 1000,
    "soft_limit": 10000,
    "hard_limit": 20000
  }
}'
  • check that persistent state was created with parts
ls -lah .temp/db1/files/states
  • run node 4 without zerostate (make a separate run-node4.sh script where --import-zerostate arg is omitted)
RUSTFLAGS="--cfg tycho_unstable" HACK_EACH_KEY_BLOCK_IS_PERSISTENT=1 ./scripts/run-node4.sh
  • check that node 4 downloaded the zerostate and persistent state from other nodes, successfully stored them, and joined further block collation

If nodes 1-3 were configured without state split into parts and the persistent state was created without parts, it will still be correctly downloaded and stored on node 4 (which is configured with 4 parts). There will be a single persistent state file, but the state will be stored split across 4 different storages.

SPECIAL DEPLOYMENT ACTIONS

[Not Required]

Without additional changes to the node config, the node works with a single part, without any split.


PERFORMANCE IMPACT

[Expected impact]

  • Better performance on non-empty states (~20-30%)
    • master: degradation on 20k transfers from empty to 30kk state: from ~35k tps to ~15-20k tps
    • 4 local parts: degradation on 20k transfers from empty to 30kk state: from ~35k tps to ~20-30k tps
  • No states GC lag growth
  • Faster state store

TESTS

Unit Tests

  • test_preload_persistent_states
  • test_store_shard_state_from_file

Network Tests

[No coverage]

Manual Tests

Performance testing:

  • 20k transfers
  • 30k transfers
  • deploy 30kk accounts
  • 20k transfers
  • 30k transfers

(metrics are in the PERFORMANCE IMPACT block)

github-actions bot commented Dec 18, 2025

🧪 Network Tests

To run network tests for this PR, use:

gh workflow run network-tests.yml -f pr_number=983

Available test options:

  • Run all tests: gh workflow run network-tests.yml -f pr_number=983
  • Run specific test: gh workflow run network-tests.yml -f pr_number=983 -f test_selection=ping-pong

Test types: destroyable, ping-pong, one-to-many-internal-messages, fq-deploy, nft-index, persistent-sync

Results will be posted as workflow runs in the Actions tab.

@github-actions

❌ Python formatting check failed in CI.

Please run just fmt_py locally and push the updated files.

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 59.75568% with 1186 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.69%. Comparing base (a96d518) to head (5d89709).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
core/src/storage/shard_state/mod.rs 51.00% 276 Missing and 16 partials ⚠️
...storage/persistent_state/parts/impls/local_impl.rs 35.13% 140 Missing and 4 partials ⚠️
core/src/storage/persistent_state/mod.rs 63.08% 99 Missing and 28 partials ⚠️
...e/src/storage/persistent_state/descriptor_cache.rs 57.19% 105 Missing and 14 partials ⚠️
core/src/storage/shard_state/cell_storage.rs 59.22% 73 Missing and 11 partials ⚠️
core/src/storage/db/migrations.rs 0.00% 60 Missing ⚠️
core/src/storage/persistent_state/tests.rs 84.22% 7 Missing and 46 partials ⚠️
core/src/block_strider/starter/cold_boot.rs 0.00% 46 Missing ⚠️
...src/storage/persistent_state/shard_state/writer.rs 71.23% 35 Missing and 7 partials ⚠️
core/src/storage/shard_state/store_state_raw.rs 68.00% 30 Missing and 10 partials ⚠️
... and 21 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #983      +/-   ##
==========================================
+ Coverage   54.40%   54.69%   +0.29%     
==========================================
  Files         402      406       +4     
  Lines       67237    69440    +2203     
  Branches    67237    69440    +2203     
==========================================
+ Hits        36582    37982    +1400     
- Misses      28846    29549     +703     
- Partials     1809     1909     +100     

☔ View full report in Codecov by Sentry.


@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch from cfd4ce6 to 18f40f7 Compare December 18, 2025 18:09
@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch from 38f8a69 to 4146a59 Compare December 24, 2025 10:43
@SmaGMan SmaGMan marked this pull request as ready for review December 24, 2025 10:45
@SmaGMan SmaGMan marked this pull request as draft December 24, 2025 10:53
@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch from 7f35ecc to 9723205 Compare December 24, 2025 11:23
@SmaGMan SmaGMan marked this pull request as ready for review December 24, 2025 14:22
@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch 2 times, most recently from 8d2bc28 to af375e8 Compare December 27, 2025 19:13
@SmaGMan SmaGMan changed the title Split state storage on parts Split state persistent on parts Dec 27, 2025
@SmaGMan SmaGMan marked this pull request as draft December 27, 2025 19:23
@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch 7 times, most recently from 929cb77 to 45c1d4e Compare January 26, 2026 07:21
@SmaGMan SmaGMan changed the title Split state persistent on parts Split state on parts Jan 26, 2026
@SmaGMan SmaGMan marked this pull request as ready for review January 26, 2026 08:52
@SmaGMan SmaGMan linked an issue Jan 26, 2026 that may be closed by this pull request
@SmaGMan SmaGMan force-pushed the feat/split-state-storage-persistent branch from 5291a09 to 5d89709 Compare January 30, 2026 17:51
let index = indices_buffer[i];
let child_hash = unsafe { *keys[i].cast::<[u8; 32]>() };
stack.push((index, StackItem::New(child_hash)));

Check failure

Code scanning / CodeQL

Access of invalid pointer (High)

This operation dereferences a pointer that may be invalid.

Copilot Autofix

AI 16 days ago

General fix: Avoid storing raw pointers that may be null or otherwise invalid and later dereferencing them with unsafe. Instead, store the data itself (here, the 32‑byte hash) or a safe reference whose lifetime is clearly valid, and let Rust’s type system enforce safety.

Concrete approach here:

  • Replace keys from [*const u8; 4] with an array of owned hashes: [[u8; 32]; 4].
  • When iterating for hash in &references_buffer, copy *hash directly into keys[preload_count] instead of storing hash.as_ptr().
  • Later, when preloading, use let child_hash = keys[i]; instead of dereferencing a raw pointer.
  • This removes the need for unsafe in this block and makes keys always contain valid, fully initialized values for indices < preload_count.

Changes are all local to core/src/storage/persistent_state/shard_state/writer.rs in the shown region around lines 465 and 511. No new methods or imports are required; HashBytes is already known to be a [u8; 32]-like type, and copying it by value is fine.

Suggested changeset 1
core/src/storage/persistent_state/shard_state/writer.rs

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/core/src/storage/persistent_state/shard_state/writer.rs b/core/src/storage/persistent_state/shard_state/writer.rs
--- a/core/src/storage/persistent_state/shard_state/writer.rs
+++ b/core/src/storage/persistent_state/shard_state/writer.rs
@@ -462,7 +462,7 @@
                     let mut reference_indices = SmallVec::with_capacity(references_buffer.len());
 
                     let mut indices_buffer = [0; 4];
-                    let mut keys = [std::ptr::null(); 4];
+                    let mut keys: [[u8; 32]; 4] = [[0; 32]; 4];
                     let mut preload_count = 0;
 
                     for hash in &references_buffer {
@@ -473,7 +473,7 @@
                                 entry.insert((remap_index, false));
 
                                 indices_buffer[preload_count] = remap_index;
-                                keys[preload_count] = hash.as_ptr();
+                                keys[preload_count] = *hash;
                                 preload_count += 1;
 
                                 remap_index
@@ -482,7 +482,7 @@
                                 let (remap_index, written) = *entry.get();
                                 if !written {
                                     indices_buffer[preload_count] = remap_index;
-                                    keys[preload_count] = hash.as_ptr();
+                                    keys[preload_count] = *hash;
                                     preload_count += 1;
                                 }
                                 remap_index
@@ -508,7 +508,7 @@
 
                         for i in 0..preload_count {
                             let index = indices_buffer[i];
-                            let child_hash = unsafe { *keys[i].cast::<[u8; 32]>() };
+                            let child_hash = keys[i];
                             stack.push((index, StackItem::New(child_hash)));
                         }
                     }
EOF

Development

Successfully merging this pull request may close these issues.

  • Implement support for partitions in persistent states
  • Implement shard accounts state sharding into local partitions
