feat: prune old blocks and states on finalization (#170)
Conversation
🤖 Kimi Code Review

Review Summary: The PR introduces state and block pruning functionality to manage storage growth. While the implementation is generally correct, there are several critical issues that need addressing.

- Critical Issues
- Security & Correctness Issues
- Performance Issues
- Minor Issues
- Recommendations

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

I could not run tests in this environment.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

PR #170 Review: Prune old blocks and states on finalization

Overall: The design is sound — states before blocks, protected roots, compile-time invariant assertion, correct ordering of operations. A few issues worth fixing before merge.

Medium:
Greptile Summary

This PR introduces retention-based pruning.

Issues found:

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| crates/storage/src/store.rs | Adds prune_old_states and prune_old_blocks with correct retention logic and protected-root guarantees, but the reported prune count is inaccurate in steady state (counts block-header candidates instead of actual deletions), orphaned states are never discovered, both methods redundantly scan BlockHeaders, and the ordering requirement between the two pub methods is only documented, not enforced. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant Store
    participant Backend as StorageBackend
    Caller->>Store: update_checkpoints(checkpoints)
    Store->>Backend: begin_write() — commit new head/justified/finalized
    Backend-->>Store: ok
    alt finalized.slot > old_finalized_slot
        Store->>Store: prune_live_chain(finalized.slot)
        Store->>Store: prune_gossip_signatures(finalized.slot)
        Store->>Store: prune_attestation_data_by_root(finalized.slot)
        Store->>Store: prune_aggregated_payload_table(×2)
        Note over Store,Backend: States pruned BEFORE blocks (headers needed for slot lookup)
        Store->>Backend: begin_read() — scan all BlockHeaders
        Backend-->>Store: (root_bytes, slot) entries
        Store->>Store: sort by slot desc, skip STATES_TO_KEEP (900), filter protected
        Store->>Backend: begin_write() — delete_batch(States, keys_to_delete)
        Backend-->>Store: pruned_states (candidate count, NOT actual deletes)
        Store->>Backend: begin_read() — scan all BlockHeaders (again)
        Backend-->>Store: (root_bytes, slot) entries
        Store->>Store: sort by slot desc, skip BLOCKS_TO_KEEP (21600), filter protected
        Store->>Backend: begin_write() — delete_batch(BlockHeaders + BlockBodies + BlockSignatures)
        Backend-->>Store: pruned_blocks
        Store->>Caller: log "Pruned finalized data"
    end
```
Last reviewed commit: 9b373c5
```rust
let keys_to_delete: Vec<Vec<u8>> = entries
    .into_iter()
    .skip(STATES_TO_KEEP)
    .filter(|(key, _)| !protected.contains(key))
    .map(|(key, _)| key)
    .collect();

let count = keys_to_delete.len();
if count > 0 {
    let mut batch = self.backend.begin_write().expect("write batch");
    batch
        .delete_batch(Table::States, keys_to_delete)
        .expect("delete old states");
    batch.commit().expect("commit");
}
count
```
**`pruned_states` count is severely inflated in steady state**

`keys_to_delete.len()` counts the number of *block header roots* that are older than `STATES_TO_KEEP`, not the number of states that were actually removed from the `States` table. `delete_batch` silently skips missing keys, so there is no feedback on what was actually deleted.

In the steady state (running for > 1 day), `prune_old_states` will see ~`BLOCKS_TO_KEEP` = 21 600 block header entries on every call. It will compute ~`BLOCKS_TO_KEEP − STATES_TO_KEEP` = 20 700 "candidate" roots and return that number — even though almost none of those states exist any more (they were deleted in earlier pruning rounds). The `pruned_states = 20700` log line will appear on **every** finalization update despite only one new state actually being pruned.

The same issue exists in `prune_old_blocks` at line 675, though there it is less misleading because block headers are only deleted once.
A straightforward fix is to count only keys that actually exist before deleting, or to track the real deletion count. For example:
```rust
// In prune_old_states, filter to keys that actually exist in States before returning count
let view = self.backend.begin_read().expect("read view");
let keys_to_delete: Vec<Vec<u8>> = keys_to_delete
    .into_iter()
    .filter(|k| view.get(Table::States, k).expect("get").is_some())
    .collect();
drop(view);
let count = keys_to_delete.len();
```
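The mismatch between candidate count and actual deletions can also be seen in isolation. Below is a self-contained toy (an in-memory `HashMap` stands in for the `States` table and `delete_counting` for the real batch delete; none of these names are the store's actual API): counting removals that succeed gives the true number, while the candidate list overstates it once earlier rounds have already deleted most keys.

```rust
use std::collections::HashMap;

// Toy delete that counts only keys which actually existed, mirroring the
// "track the real deletion count" suggestion above.
fn delete_counting(states: &mut HashMap<Vec<u8>, u64>, candidates: &[Vec<u8>]) -> usize {
    candidates
        .iter()
        .filter(|key| states.remove(*key).is_some()) // remove() reports whether the key existed
        .count()
}

fn main() {
    let mut states: HashMap<Vec<u8>, u64> = HashMap::new();
    states.insert(vec![1], 100); // only one of the candidates still exists

    // Three candidate roots; two were pruned in an earlier round.
    let candidates = vec![vec![1], vec![2], vec![3]];
    let candidate_count = candidates.len();
    let actual = delete_counting(&mut states, &candidates);

    assert_eq!(candidate_count, 3); // the inflated number the current code reports
    assert_eq!(actual, 1);          // what was really deleted
    println!("candidates = {candidate_count}, actually deleted = {actual}");
}
```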
```rust
pub fn prune_old_states(&mut self, protected_roots: &[H256]) -> usize {
    let view = self.backend.begin_read().expect("read view");

    // Collect (root_bytes, slot) from BlockHeaders to determine state age.
    let mut entries: Vec<(Vec<u8>, u64)> = view
        .prefix_iterator(Table::BlockHeaders, &[])
        .expect("iterator")
        .filter_map(|res| res.ok())
        .map(|(key, value)| {
            let header = BlockHeader::from_ssz_bytes(&value).expect("valid header");
            (key.to_vec(), header.slot)
        })
        .collect();
    drop(view);

    if entries.len() <= STATES_TO_KEEP {
        return 0;
    }
```
**Orphaned states are never pruned**

`prune_old_states` uses the `BlockHeaders` table as the source of truth for which state roots exist and how old they are. Any state stored in the `States` table whose root does not have a corresponding entry in `BlockHeaders` (e.g., a pre-anchor genesis state, or a state written independently for testing/recovery) will be invisible to this function and will never be pruned, regardless of how old it is.

Additionally, if `prune_old_blocks` is called before `prune_old_states` (both methods are `pub`), block headers for the oldest slots will already have been deleted. `prune_old_states` will then only see `BLOCKS_TO_KEEP` headers; the states corresponding to the headers that were just deleted will have no slot information and will be permanently orphaned.

The comment in `update_checkpoints` correctly documents the required ordering, but the `pub` visibility of both methods creates an API footgun. Consider making them `pub(crate)` or `pub(super)` if they are not intended to be called from outside the storage crate, or combining them into a single `prune_old_data` method that enforces the correct order internally.
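The combined-method suggestion can be sketched with a toy `Store` (the field, method bodies, and return values are stand-ins, not the real implementation): the individual pruning steps lose their `pub` visibility, and the single public entry point fixes the states-before-blocks order in code rather than in a doc comment.

```rust
// Toy Store: prune_old_blocks asserts that state pruning already ran,
// and prune_old_data is the only public way to trigger either step.
struct Store {
    states_pruned: bool,
}

impl Store {
    fn new() -> Self {
        Store { states_pruned: false }
    }

    // Would become private (or pub(crate)) in the real store.
    fn prune_old_states(&mut self) -> usize {
        self.states_pruned = true;
        0 // stand-in for the number of states deleted
    }

    fn prune_old_blocks(&mut self) -> usize {
        // Headers needed for slot lookup are intact only if states went first.
        debug_assert!(self.states_pruned, "states must be pruned before blocks");
        0 // stand-in for the number of blocks deleted
    }

    // Ordering is enforced here, not merely documented.
    pub fn prune_old_data(&mut self) -> (usize, usize) {
        let states = self.prune_old_states();
        let blocks = self.prune_old_blocks();
        (states, blocks)
    }
}

fn main() {
    let mut store = Store::new();
    let (states, blocks) = store.prune_old_data();
    println!("pruned {states} states, {blocks} blocks");
}
```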
```rust
pub fn prune_old_blocks(&mut self, protected_roots: &[H256]) -> usize {
    let view = self.backend.begin_read().expect("read view");

    let mut entries: Vec<(Vec<u8>, u64)> = view
        .prefix_iterator(Table::BlockHeaders, &[])
        .expect("iterator")
        .filter_map(|res| res.ok())
        .map(|(key, value)| {
            let header = BlockHeader::from_ssz_bytes(&value).expect("valid header");
            (key.to_vec(), header.slot)
        })
        .collect();
    drop(view);

    if entries.len() <= BLOCKS_TO_KEEP {
        return 0;
    }
```
**Duplicate full scan of `BlockHeaders`**

Both `prune_old_states` (lines 597-605) and `prune_old_blocks` (lines 647-655) independently iterate the complete `BlockHeaders` table, parse every SSZ-encoded header, and build the same `(root_bytes, slot)` vector. This means every finalization that triggers both functions does **two** full sequential scans and two full sort passes over the same data.

Since `update_checkpoints` always calls them back-to-back, the sorted entries could be computed once and passed as an argument, or the two functions could be merged into a single `prune_old_data` method that shares the scan result.
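The shared-scan shape can be sketched as follows (a plain `HashMap` stands in for the `BlockHeaders` scan, and `sorted_entries`/`prune_candidates` are illustrative names, not the store's API): build the sorted `(root, slot)` list once, then derive both deletion sets from the same slice with different retention windows.

```rust
use std::collections::{HashMap, HashSet};

// One scan + one sort, done once per finalization.
fn sorted_entries(headers: &HashMap<Vec<u8>, u64>) -> Vec<(Vec<u8>, u64)> {
    let mut entries: Vec<(Vec<u8>, u64)> =
        headers.iter().map(|(root, &slot)| (root.clone(), slot)).collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1)); // newest first
    entries
}

// Shared by both pruning passes: skip the newest `keep`, drop protected roots.
fn prune_candidates(
    entries: &[(Vec<u8>, u64)],
    keep: usize,
    protected: &HashSet<Vec<u8>>,
) -> Vec<Vec<u8>> {
    entries
        .iter()
        .skip(keep)
        .filter(|(root, _)| !protected.contains(root))
        .map(|(root, _)| root.clone())
        .collect()
}

fn main() {
    let headers: HashMap<Vec<u8>, u64> =
        [(vec![1], 10), (vec![2], 20), (vec![3], 30), (vec![4], 40)].into();
    let protected: HashSet<Vec<u8>> = [vec![1]].into();

    // Both retention windows reuse the same sorted slice.
    let entries = sorted_entries(&headers);
    let state_deletes = prune_candidates(&entries, 2, &protected); // STATES_TO_KEEP stand-in
    let block_deletes = prune_candidates(&entries, 3, &protected); // BLOCKS_TO_KEEP stand-in

    assert_eq!(state_deletes, vec![vec![2]]); // root 1 is beyond the window but protected
    assert!(block_deletes.is_empty());
    println!("state deletes: {state_deletes:?}");
}
```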
Summary

- Adds pruning of the `BlockHeaders`, `BlockBodies`, `BlockSignatures`, and `States` tables
- Retains ~1 day of blocks (`BLOCKS_TO_KEEP = 21,600`) and ~1 hour of states (`STATES_TO_KEEP = 900`)

Motivation
Previously only the `LiveChain` index and signature/attestation tables were pruned on finalization. The four main data tables grew indefinitely, causing unbounded disk growth.

Design
Two new methods on `Store`:

- `prune_old_states(protected_roots)`: scans `BlockHeaders` for slot info, keeps the newest `STATES_TO_KEEP` states plus protected roots, deletes from `States`
- `prune_old_blocks(protected_roots)`: same pattern with `BLOCKS_TO_KEEP`, deletes from `BlockHeaders`/`BlockBodies`/`BlockSignatures`

Called in `update_checkpoints()` after existing pruning, states before blocks (state pruning uses headers for slot lookup).

Protected roots (finalized + justified) are always retained regardless of the retention window, preventing panics in code paths that expect these blocks/states to exist.
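One review above mentions a compile-time invariant assertion. A minimal sketch of what that might look like for the two retention constants (the values come from this PR's description; the `const _: () = assert!(…)` form is an illustrative guess, not necessarily the PR's exact code):

```rust
// Retention windows from the PR description.
const STATES_TO_KEEP: usize = 900;    // ~1 hour of states
const BLOCKS_TO_KEEP: usize = 21_600; // ~1 day of blocks

// States are pruned using BlockHeaders for slot lookup, so every retained
// state must still have its header: the state window must fit inside the
// block window. Violations fail the build rather than orphaning states.
const _: () = assert!(STATES_TO_KEEP <= BLOCKS_TO_KEEP);

fn main() {
    println!("keep {STATES_TO_KEEP} states, {BLOCKS_TO_KEEP} blocks");
}
```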