
perf(l1): use Bytes for trie values to enable O(1) clones #6057

Open
pablodeymo wants to merge 38 commits into main from perf/avoid-copying-trie-leaves

Conversation

@pablodeymo (Contributor)

Summary

  • Change ValueRLP type from Vec<u8> to Bytes (reference-counted) for O(1) clones in trie cache lookups
  • Storage reads return Bytes (zero-copy from RocksDB)
  • Storage writes use references (RocksDB passes directly to put_cf)
  • In-memory backend copies on write (acceptable for testing, prioritizing RocksDB perf)

Key Changes

Trie layer:

  • ValueRLP = Bytes instead of Vec<u8>
  • LeafNode and BranchNode store Bytes values
  • Added BytesWrapper for rkyv serialization

Storage API:

  • get() returns Option<Bytes> - zero-copy from RocksDB's Vec<u8>
  • put_batch(&[(&[u8], &[u8])]) - references passed directly to RocksDB
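
The slice-based batch signature can be sketched with a minimal, hypothetical trait (the names below are illustrative, not the actual ethrex storage API; only the in-memory backend copies, while RocksDB's put_cf copies internally):

```rust
// Hypothetical sketch of a slice-based batched write API. Callers borrow
// existing buffers, so no allocation happens on the caller side; the
// backend decides whether to copy.
trait KvWriteBatch {
    fn put_batch(&mut self, table: &'static str, pairs: &[(&[u8], &[u8])]);
}

// A toy in-memory backend: copies on write, as the PR accepts for testing.
struct MemBatch(Vec<(&'static str, Vec<u8>, Vec<u8>)>);

impl KvWriteBatch for MemBatch {
    fn put_batch(&mut self, table: &'static str, pairs: &[(&[u8], &[u8])]) {
        for (k, v) in pairs {
            // Copy here; a RocksDB-backed impl would pass the slices
            // straight to put_cf instead.
            self.0.push((table, k.to_vec(), v.to_vec()));
        }
    }
}

fn main() {
    let key = b"account-key".to_vec();
    let val = b"rlp-encoded-value".to_vec();
    let mut batch = MemBatch(Vec::new());
    batch.put_batch("nodes", &[(key.as_slice(), val.as_slice())]);
    assert_eq!(batch.0.len(), 1);
    assert_eq!(batch.0[0].0, "nodes");
}
```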

Performance focus:

  • RocksDB (production): writes pass slices directly, reads are zero-copy
  • In-memory (testing): stores Bytes, copies on write
  • Trie cache lookups: O(1) Bytes clones instead of O(n) Vec copies
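
Why the clone is O(1): bytes::Bytes is reference-counted, so cloning bumps a counter instead of copying the payload. A std-only stand-in (Arc<[u8]>; SharedBytes is a hypothetical name, not a type from this PR) shows the behavior:

```rust
use std::sync::Arc;

// Stand-in for bytes::Bytes: an immutable, reference-counted byte buffer.
#[derive(Clone)]
struct SharedBytes(Arc<[u8]>);

impl SharedBytes {
    fn from_vec(v: Vec<u8>) -> Self {
        SharedBytes(Arc::from(v))
    }
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

fn main() {
    let value = SharedBytes::from_vec(vec![0u8; 1 << 20]); // 1 MiB payload
    let cached = value.clone(); // O(1): refcount increment, no byte copy
    // Both handles point at the same allocation.
    assert!(std::ptr::eq(value.as_slice(), cached.as_slice()));
    assert_eq!(Arc::strong_count(&value.0), 2);
}
```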

Test plan

  • cargo check --workspace passes
  • Run existing tests
  • Benchmark trie operations to measure improvement

…tory

Reorganize state_healing.rs and storage_healing.rs into a shared
sync/healing/ module structure with clearer naming conventions:

- Create sync/healing/ directory with mod.rs, types.rs, state.rs, storage.rs
- Rename MembatchEntryValue to HealingQueueEntry
- Rename MembatchEntry to StorageHealingQueueEntry
- Rename Membatch type to StorageHealingQueue
- Rename children_not_in_storage_count to missing_children_count
- Rename membatch variables to healing_queue throughout
- Extract shared HealingQueueEntry and StateHealingQueue types to types.rs
- Update sync.rs imports to use new healing module
Reorganize snap protocol code for better maintainability:

- Split rlpx/snap.rs into rlpx/snap/ directory:
  - codec.rs: RLP encoding/decoding for snap messages
  - messages.rs: Snap protocol message types
  - mod.rs: Module re-exports

- Split snap.rs into snap/ directory:
  - constants.rs: Snap sync constants and configuration
  - server.rs: Snap protocol server implementation
  - mod.rs: Module re-exports

- Move snap server tests to dedicated tests/ directory
- Update imports in p2p.rs, peer_handler.rs, and code_collector.rs
Document the phased approach for reorganizing snap sync code:
- Phase 1: rlpx/snap module split
- Phase 2: snap module split with server extraction
- Phase 3: healing module unification
Split the large sync.rs (1631 lines) into focused modules:

- sync/full.rs (~260 lines): Full sync implementation
  - sync_cycle_full(), add_blocks_in_batch(), add_blocks()

- sync/snap_sync.rs (~1100 lines): Snap sync implementation
  - sync_cycle_snap(), snap_sync(), SnapBlockSyncState
  - store_block_bodies(), update_pivot(), block_is_stale()
  - validate_state_root(), validate_storage_root(), validate_bytecodes()
  - insert_accounts(), insert_storages() (both rocksdb and non-rocksdb)

- sync.rs (~285 lines): Orchestration layer
  - Syncer struct with start_sync() and sync_cycle()
  - SyncMode, SyncError, AccountStorageRoots types
  - Re-exports for public API
…p/client.rs

Move all snap protocol client-side request methods from peer_handler.rs
to a dedicated snap/client.rs module:
- request_account_range and request_account_range_worker
- request_bytecodes
- request_storage_ranges and request_storage_ranges_worker
- request_state_trienodes
- request_storage_trienodes

Also moves related types: DumpError, RequestMetadata, SnapClientError,
RequestStateTrieNodesError, RequestStorageTrieNodes.

This reduces peer_handler.rs from 2,060 to 670 lines (~68% reduction),
leaving it focused on ETH protocol methods (block headers/bodies).

Added SnapClientError variant to SyncError for proper error handling.
Updated plan_snap_sync.md to mark Phase 4 as complete.
…napError type

Implement Phase 5 of snap sync refactoring plan - Error Handling.

- Create snap/error.rs with unified SnapError enum covering all snap protocol errors
- Update server functions (process_account_range_request, process_storage_ranges_request,
  process_byte_codes_request, process_trie_nodes_request) to return Result<T, SnapError>
- Remove SnapClientError and RequestStateTrieNodesError, consolidate into SnapError
- Keep RequestStorageTrieNodesError struct for request ID tracking in storage healing
- Add From<SnapError> for PeerConnectionError to support error propagation in message handlers
- Update sync module to use SyncError::Snap variant
- Update healing modules (state.rs, storage.rs) to use new error types
- Move DumpError struct to error.rs module
- Update test return types to use SnapError
- Mark Phase 5 as completed in plan document

All phases of the snap sync refactoring are now complete.
Change missing_children_count from u64 to usize in HealingQueueEntry
and node_missing_children function to match StorageHealingQueueEntry
and be consistent with memory structure counting conventions.
Resolve conflicts from #5977 and #6018 merge to main:
- Keep modular sync structure (sync.rs delegates to full.rs and snap_sync.rs)
- Keep snap client code in snap/client.rs (removed from peer_handler.rs)
- Add InsertingAccountRanges metric from #6018 to snap_sync.rs
- Remove unused info import from peer_handler.rs
@pablodeymo pablodeymo requested a review from a team as a code owner January 28, 2026 21:00
@greptile-apps

greptile-apps bot commented Jan 28, 2026

Greptile Overview

Greptile Summary

This PR refactors the trie value type from Vec<u8> to Bytes (reference-counted) to enable O(1) clones during cache lookups, improving performance for production workloads using RocksDB.

Key changes:

  • Changed ValueRLP type alias from Vec<u8> to Bytes throughout trie layer
  • Updated storage API: reads return Bytes (zero-copy from RocksDB), writes accept slice references
  • Added BytesWrapper for rkyv serialization compatibility
  • RocksDB backend: zero-copy conversion from Vec<u8> to Bytes on reads, direct slice passing on writes
  • In-memory backend: stores Bytes, copies on write (acceptable tradeoff for testing)
  • Made verify_range generic over AsRef<[u8]> for flexibility
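
The AsRef<[u8]> change means verify_range no longer cares whether callers hold Vec<u8>, Bytes, or plain slices. A minimal hypothetical helper (total_len is not from the PR) illustrates the pattern:

```rust
// Generic over any value type viewable as a byte slice, so Vec<u8>,
// &[u8], and reference-counted buffer types all work without conversion.
fn total_len<V: AsRef<[u8]>>(values: &[V]) -> usize {
    values.iter().map(|v| v.as_ref().len()).sum()
}

fn main() {
    let owned: Vec<Vec<u8>> = vec![vec![1, 2], vec![3]];
    let borrowed: Vec<&[u8]> = vec![&[1, 2], &[3]];
    assert_eq!(total_len(&owned), 3);
    assert_eq!(total_len(&borrowed), 3);
}
```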

Performance impact:

  • Trie cache lookups: O(1) Bytes clones (ref count increment) vs O(n) Vec copies
  • RocksDB reads: zero-copy wrap of returned Vec<u8>
  • RocksDB writes: pass references directly to put_cf

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The refactoring is well-executed with consistent changes across all layers. The conversion from Vec to Bytes is straightforward and follows best practices: zero-copy on reads, reference-based on writes. The rkyv serialization wrapper is correctly implemented. All conversion points properly handle the type change.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| crates/common/trie/node.rs | Updated ValueRLP type from Vec to Bytes; added Vec to ValueOrHash conversion |
| crates/common/trie/rkyv_utils.rs | Added BytesWrapper for rkyv serialization of Bytes type |
| crates/common/trie/trie.rs | Changed ValueRLP type alias from Vec to Bytes |
| crates/storage/api/mod.rs | Changed storage API to return Bytes and accept slice references for put operations |
| crates/storage/backend/rocksdb.rs | Updated RocksDB backend to return Bytes from get operations and accept slice references for put_batch |
| crates/storage/backend/in_memory.rs | Changed in-memory storage to use Bytes keys/values; copies on write as expected for testing |
| crates/storage/store.rs | Updated all storage calls to handle Bytes return type and convert put_batch calls to use slice references |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Trie as Trie Layer
    participant TrieDB as TrieDB Interface
    participant Storage as Storage API
    participant RocksDB as RocksDB Backend
    
    Note over App,RocksDB: Read Path (Zero-Copy)
    App->>Trie: get(key) -> Option<ValueRLP>
    Trie->>TrieDB: get(Nibbles) -> Option<Vec<u8>>
    TrieDB->>Storage: get(table, key) -> Option<Bytes>
    Storage->>RocksDB: get_cf(cf, key)
    RocksDB-->>Storage: Vec<u8>
    Storage-->>TrieDB: Bytes::from(Vec) (zero-copy)
    TrieDB-->>Trie: Vec<u8>
    Trie-->>App: Bytes (O(1) clone)
    
    Note over App,RocksDB: Write Path (Reference-Based)
    App->>Trie: insert(key, value)
    Trie->>TrieDB: put_batch(Vec<(Nibbles, Bytes)>)
    TrieDB->>Storage: put_batch(table, &[(&[u8], &[u8])])
    Storage->>RocksDB: put_cf(cf, key_slice, value_slice)
    Note over RocksDB: RocksDB copies slices internally
    RocksDB-->>Storage: Ok
    Storage-->>TrieDB: Ok
    TrieDB-->>Trie: Ok
    Trie-->>App: Ok
    
    Note over Trie: Cache Lookup (O(1) Clone)
    Trie->>Trie: Clone Bytes (ref count increment)

@github-actions

github-actions bot commented Jan 28, 2026

Lines of code report

Total lines added: 46
Total lines removed: 0
Total lines changed: 46

Detailed view
+------------------------------------------+-------+------+
| File                                     | Lines | Diff |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/db.rs          | 96    | +1   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/logger.rs      | 46    | +1   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/node.rs        | 357   | +5   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/node/branch.rs | 574   | +4   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/node/leaf.rs   | 306   | +4   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/rkyv_utils.rs  | 49    | +15  |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/rlp.rs         | 137   | +1   |
+------------------------------------------+-------+------+
| ethrex/crates/common/trie/trie.rs        | 467   | +3   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/api/mod.rs         | 38    | +1   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/backend/rocksdb.rs | 318   | +3   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/layering.rs        | 173   | +1   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs           | 2432  | +4   |
+------------------------------------------+-------+------+
| ethrex/crates/storage/trie.rs            | 151   | +3   |
+------------------------------------------+-------+------+

@pablodeymo changed the title from "Use Bytes for trie values to enable O(1) clones" to "perf(l1) use Bytes for trie values to enable O(1) clones" on Jan 28, 2026
@pablodeymo changed the title from "perf(l1) use Bytes for trie values to enable O(1) clones" to "perf(l1): use Bytes for trie values to enable O(1) clones" on Jan 28, 2026
@github-actions bot added the "L1 Ethereum client" and "performance" (block execution throughput and performance in general) labels on Jan 28, 2026
@github-project-automation github-project-automation bot moved this to In Review in ethrex_l1 Jan 29, 2026
@github-actions

Kimi AI Code Review

This PR aims to improve the performance of trie operations by using Bytes for trie values, enabling O(1) clones. Let's review the changes based on the provided criteria:

Code Correctness and Potential Bugs

  1. Change to Bytes Type: The PR replaces Vec<u8> with Bytes in various parts of the trie implementation. This is done consistently across different modules (db.rs, node.rs, rlp.rs, etc.). The change appears to be correct, as it maintains the functionality while improving performance.

  2. Zero-Copy Conversions: The PR uses zero-copy conversions where possible, which is a good practice. However, it's important to ensure that the conversions are done correctly and do not lead to invalid memory access. For example, in rlp.rs, Bytes::copy_from_slice(value) is used; note that this copies the data at construction, and the O(1) benefit applies to subsequent clones of the resulting Bytes rather than to the conversion itself.

Security Vulnerabilities

  1. Memory Safety: The use of Bytes and zero-copy conversions does not introduce any new security vulnerabilities related to memory safety. The code still ensures that the data is valid and properly managed.

  2. Consensus Rules and EIP Compliance: The changes do not affect the consensus rules or EIP compliance, as they are purely performance optimizations.

Performance Implications

  1. O(1) Clones: The use of Bytes enables O(1) clones, which is a significant performance improvement, especially for trie operations that involve frequent cloning of values.

  2. Memory Usage: The zero-copy conversions help in reducing memory usage, as they avoid unnecessary data copies.

Rust Best Practices and Idiomatic Patterns

  1. Use of Bytes: The use of Bytes is idiomatic in Rust for handling byte slices efficiently. It aligns with Rust's focus on zero-cost abstractions and performance.

  2. Consistent Style: The code style is consistent with Rust best practices, and the changes are well-integrated with the existing codebase.

Memory Safety and Proper Error Handling

  1. Memory Safety: The changes do not introduce any memory safety issues. The use of Bytes and zero-copy conversions is safe, as long as the underlying data remains valid.

  2. Error Handling: The error handling remains consistent with the existing codebase. The PR does not introduce any new error handling mechanisms but maintains the existing ones.

Code Readability and Maintainability

  1. Readability: The code remains readable, and the changes are well-documented through comments and commit messages.

  2. Maintainability: The changes are maintainable and do not introduce any complexity that would hinder future maintenance.

Ethereum-Specific Considerations

  1. EVM Opcode Correctness and Gas Accounting: The changes do not affect EVM opcode correctness or gas accounting, as they are focused on trie performance optimizations.

  2. State Trie and Storage Operations: The changes improve the performance of state trie and storage operations, which is critical for Ethereum clients.

  3. RLP Encoding/Decoding Correctness: The changes maintain the correctness of RLP encoding and decoding, as they only optimize the underlying data representation.

Overall, the PR looks good and addresses the performance optimization goal effectively. The changes are well-implemented and do not introduce any new issues. The use of Bytes for trie values is a significant improvement for performance, and the zero-copy conversions help in reducing memory usage.

Suggested Change:

  • In rlp.rs, consider adding a comment explaining why Bytes::copy_from_slice is used, to ensure that future maintainers understand the reasoning behind this choice.
// Use zero-copy conversion to avoid unnecessary data copies
let value = Bytes::copy_from_slice(value);

Final Thoughts:

The PR is well-executed and aligns with the goals of improving trie performance. It is ready to be merged after addressing the suggested change.


Automated review by Kimi (Moonshot AI)

@pablodeymo (Contributor, Author)

Regarding the suggested comment for Bytes::copy_from_slice: skipping this because the suggested comment ("zero-copy conversion") would be misleading. copy_from_slice does copy the data - the zero-copy benefit comes later when cloning Bytes (O(1) ref-count increment). The code is self-explanatory in this context.
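
The distinction can be shown with a std-only stand-in (Arc<[u8]> in place of bytes::Bytes; the helper name is hypothetical): construction copies the source once, while later clones only bump a refcount:

```rust
use std::sync::Arc;

// Stand-in for Bytes::copy_from_slice: building the buffer copies `src`.
fn copy_from_slice(src: &[u8]) -> Arc<[u8]> {
    Arc::from(src) // allocates and memcpys the source bytes
}

fn main() {
    let src = vec![1u8, 2, 3];
    let owned = copy_from_slice(&src);
    // Construction copied: the two buffers are distinct allocations.
    assert!(!std::ptr::eq(src.as_slice(), &*owned));
    // Cloning afterwards is the O(1) part: both handles share one allocation.
    let shared = owned.clone();
    assert!(std::ptr::eq(&*owned, &*shared));
}
```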

@github-actions

Kimi AI Code Review

This PR aims to improve trie value handling by using Bytes from the bytes crate to enable O(1) clones. Here's a detailed review:

Code Correctness and Potential Bugs

  1. Change to Bytes Type: The PR correctly replaces Vec<u8> with Bytes for trie values. This change is propagated throughout the codebase, including in TrieDB trait methods and related structs and implementations.

    Good: Consistent replacement of Vec<u8> with Bytes.

  2. Clone and Copy Semantics: The use of Bytes allows for O(1) cloning due to its internal implementation as a reference-counted, contiguous array. This is correctly leveraged in the code.

    Good: Proper utilization of Bytes for efficient cloning.

Security Vulnerabilities

  1. Memory Safety: The use of Bytes does not introduce any memory safety issues. It correctly manages memory through its internal reference counting.

    Good: No memory safety issues introduced.

  2. Data Integrity: The PR maintains data integrity by ensuring that trie values are correctly encoded and decoded when stored and retrieved.

    Good: Data integrity is preserved.

Performance Implications

  1. O(1) Cloning: The primary goal of using Bytes is to enable O(1) cloning of trie values. This is achieved by the Bytes type's design.

    Good: Performance improvement through efficient cloning.

  2. Memory Usage: The reference-counted nature of Bytes can lead to increased memory usage in scenarios with many clones, but this is a trade-off for the performance gain in cloning.

    Neutral: Potential increase in memory usage, but justified by performance benefits.

Rust Best Practices and Idiomatic Patterns

  1. Use of Bytes: The PR correctly identifies and uses Bytes for efficient handling of byte slices, which is idiomatic in Rust for such use cases.

    Good: Adherence to Rust best practices.

  2. Code Readability: The code remains readable, with clear intentions and minimal changes to the existing logic, focusing on type replacements.

    Good: High code readability.

Memory Safety and Proper Error Handling

  1. Error Handling: The PR maintains proper error handling by propagating errors through the Result and Option types, ensuring that failures are handled gracefully.

    Good: Consistent error handling.

  2. Memory Safety: The use of Bytes does not compromise memory safety, as it correctly manages the lifecycle of the underlying data.

    Good: Maintains memory safety.

Code Readability and Maintainability

  1. Consistency: The PR maintains consistency in the use of Bytes across the board, making the codebase easier to maintain and understand.

    Good: High consistency in code changes.

  2. Documentation and Comments: While the PR does not add specific comments explaining the Bytes usage, the existing documentation and comments are sufficient to understand the changes.

    Neutral: Could benefit from additional comments explaining the rationale behind using Bytes.

Ethereum-Specific Considerations

  1. Consensus Rules and EIP Compliance: The PR does not affect consensus rules or EIP compliance, as it focuses on internal data handling.

    Good: No impact on consensus or EIP compliance.

  2. State Trie and Storage Operations: The changes correctly handle state trie and storage operations, ensuring that the trie structure and its operations remain valid.

    Good: Maintains trie structure integrity.

Conclusion

The PR is well-executed, focusing on improving performance through the use of Bytes for trie values. It adheres to Rust best practices, maintains memory safety, and ensures data integrity. The changes are consistent and readable, making the codebase easier to maintain.

Suggested Change: Consider adding comments explaining the use of Bytes and its benefits, especially in areas where the type is first introduced or where its usage significantly impacts performance.

Overall, the PR is ready to be merged after incorporating the suggested change.


Automated review by Kimi (Moonshot AI)

@github-actions

github-actions bot commented Jan 30, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 65.105 ± 0.241 | 64.799 | 65.516 | 1.00 ± 0.00 |
| head | 64.996 ± 0.086 | 64.839 | 65.115 | 1.00 |

pablodeymo and others added 4 commits February 4, 2026 17:57
in the multisync monitoring script (docker_monitor.py).

The sync completion logs already contain per-phase completion markers
(e.g. "✓ BLOCK HEADERS complete: 25,693,009 headers in 0:29:00")
but this data was not surfaced in the Slack messages or run summaries.

This adds a parse_phase_timings() function that reads saved container
logs and extracts timing, count, and duration for all 8 snap sync
phases: Block Headers, Account Ranges, Account Insertion, Storage
Ranges, Storage Insertion, State Healing, Storage Healing, and
Bytecodes. The breakdown is appended to both the Slack notification
(as a code block per network instance) and the text-based run log
(run_history.log and per-run summary.txt). When a phase did not
complete (e.g. on a failed run), it is simply omitted from the
breakdown.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@pablodeymo pablodeymo mentioned this pull request Feb 5, 2026
Base automatically changed from refactor/snapsync-healing-unification to main February 6, 2026 21:52
Copilot AI review requested due to automatic review settings February 6, 2026 22:47
Copilot AI left a comment

Pull request overview

This PR refactors trie/storage value handling to use reference-counted bytes::Bytes instead of Vec<u8> to make clones O(1) and enable more zero-copy paths (especially for RocksDB reads/writes).

Changes:

  • Switch trie value RLP type to Bytes and update trie node structures + serialization to support it.
  • Update the storage backend API: reads return Bytes, and batched writes accept slices of (&[u8], &[u8]) to reduce allocations.
  • Add snap server tests and update various storage read/decode call sites to work with Bytes.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
|---|---|
| crates/storage/api/mod.rs | Changes storage API to return Bytes and take slice-based batches. |
| crates/storage/backend/rocksdb.rs | Makes RocksDB reads return Bytes and updates batch write signature. |
| crates/storage/backend/in_memory.rs | Switches in-memory tables to Bytes and updates read/write methods (but currently has trait-signature mismatches). |
| crates/storage/trie.rs | Adapts trie DB wrapper to Bytes values and slice-based put_batch. |
| crates/storage/store.rs | Updates decode paths and write batching to accommodate Bytes-returning storage reads. |
| crates/storage/layering.rs | Updates trie wrapper put_batch signature to use Bytes. |
| crates/common/trie/trie.rs | Changes ValueRLP to Bytes and adjusts insert/commit plumbing. |
| crates/common/trie/node/*.rs | Updates leaf/branch node value types + constructors and rkyv serialization for Bytes. |
| crates/common/trie/rlp.rs | Decodes trie node values into Bytes. |
| crates/common/trie/rkyv_utils.rs | Adds BytesWrapper remote for rkyv serialization. |
| crates/common/trie/node.rs | Adjusts commit accumulation for Bytes values (currently reintroduces copies in one spot). |
| crates/common/trie/verify_range.rs | Generalizes verification to accept value types as AsRef<[u8]>. |
| crates/common/trie/trie_sorted.rs | Adjusts empty/default values and conversions for Bytes. |
| crates/common/trie/logger.rs | Updates trie logger put/put_batch signatures to use Bytes. |
| crates/common/trie/db.rs | Updates TrieDB::put_batch to take Bytes and adapts helpers. |
| crates/networking/p2p/sync/healing/state.rs | Converts trie writes to Bytes when calling put_batch. |
| crates/networking/p2p/tests/snap_server_tests.rs | Adds snap protocol server tests (Hive-based AccountRange vectors). |
| CHANGELOG.md | Notes the performance change to Bytes trie values. |


let header_value = txn.get(HEADERS, hash_key.as_slice())?;
let mut header = header_value
-    .map(|bytes| BlockHeaderRLP::from_bytes(bytes).to())
+    .map(|bytes| BlockHeaderRLP::from_bytes(bytes.to_vec()).to())
Copilot AI commented Feb 6, 2026

txn.get(HEADERS, ...) returns Bytes, but bytes.to_vec() allocates just to decode the header. This can be avoided by decoding directly from the Bytes slice (e.g., BlockHeader::decode(bytes.as_ref())) or by changing the RLP wrapper to accept Bytes/&[u8] without cloning.

Suggested change
-    .map(|bytes| BlockHeaderRLP::from_bytes(bytes.to_vec()).to())
+    .map(|bytes| BlockHeaderRLP::from_bytes(bytes.as_ref()).to())

Comment on lines +105 to 114
fn put_batch(&self, key_values: Vec<(Nibbles, Bytes)>) -> Result<(), TrieError> {
let mut tx = self.db.begin_write().map_err(|e| {
TrieError::DbError(anyhow::anyhow!("Failed to begin write transaction: {}", e))
})?;
for (key, value) in key_values {
let prefixed_key = self.make_key(key);
let table = self.table_for_key(&prefixed_key);
-        tx.put_batch(table, vec![(prefixed_key, value)])
+        tx.put_batch(table, &[(&prefixed_key, value.as_ref())])
.map_err(|e| TrieError::DbError(anyhow::anyhow!("Failed to write batch: {}", e)))?;
}
Copilot AI commented Feb 6, 2026

BackendTrieDB::put_batch loops and calls tx.put_batch(...) once per key/value pair (with a single-element slice), which largely defeats the purpose of batching and adds per-call overhead. Since table_for_key only routes to either nodes_table or fkv_table, consider grouping the incoming key_values by table and issuing one put_batch call per table.
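
One possible shape for the suggested grouping (names are hypothetical; table_for_key stands in for the routing in the PR): bucket the pairs by destination table, then issue a single batched call per table:

```rust
use std::collections::HashMap;

// Group (key, value) pairs by destination table so the caller can issue
// one put_batch per table instead of one call per pair.
fn group_by_table<'a>(
    pairs: &'a [(Vec<u8>, Vec<u8>)],
    table_for_key: impl Fn(&[u8]) -> &'static str,
) -> HashMap<&'static str, Vec<(&'a [u8], &'a [u8])>> {
    let mut buckets: HashMap<&'static str, Vec<(&[u8], &[u8])>> = HashMap::new();
    for (k, v) in pairs {
        buckets
            .entry(table_for_key(k))
            .or_default()
            .push((k.as_slice(), v.as_slice()));
    }
    buckets
}

fn main() {
    let pairs = vec![
        (vec![0u8, 1], vec![9]),
        (vec![1u8, 2], vec![8]),
        (vec![0u8, 3], vec![7]),
    ];
    // Toy routing: the first key byte selects the table.
    let buckets = group_by_table(&pairs, |k| if k[0] == 0 { "nodes" } else { "fkv" });
    assert_eq!(buckets["nodes"].len(), 2);
    assert_eq!(buckets["fkv"].len(), 1);
    // The caller can now do one batched write per table.
}
```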

@@ -122,12 +123,11 @@ impl<'a> StorageReadView for InMemoryReadTx<'a> {
.map_err(|_| StoreError::Custom("Failed to acquire read lock".to_string()))?;

let table_data = db.get(table).cloned().unwrap_or_default();
Copilot AI commented Feb 6, 2026

This StorageReadView::prefix_iterator impl still takes table: &str, but the trait requires table: &'static str (see crates/storage/api/mod.rs). Update the method signature accordingly (and ensure any other StorageReadView/StorageWriteBatch methods in this file match the trait after the API change).

Comment on lines 102 to 104
impl<'a> StorageReadView for InMemoryReadTx<'a> {
-fn get(&self, table: &str, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError> {
+fn get(&self, table: &str, key: &[u8]) -> Result<Option<Bytes>, StoreError> {
let db = self
Copilot AI commented Feb 6, 2026

StorageReadView::get requires table: &'static str (per crates/storage/api/mod.rs), but this impl uses table: &str, which doesn't match the trait method signature and will not compile. Update the parameter type here (and keep other StorageReadView method signatures consistent).

Comment on lines 150 to 155
node.encode(&mut buf);
let hash = *hash.get_or_init(|| NodeHash::from_encoded(&buf));
if let Node::Leaf(leaf) = node.as_ref() {
-    acc.push((path.concat(&leaf.partial), leaf.value.clone()));
+    acc.push((path.concat(&leaf.partial), leaf.value.to_vec()));
}
acc.push((path, buf));
Copilot AI commented Feb 6, 2026

ValueRLP is now Bytes (cheap to clone), but NodeRef::commit converts leaf values with leaf.value.to_vec(), reintroducing an O(n) copy per leaf. If these leaf values are intended to be written as flat-key-value entries, consider changing the accumulator to store Bytes (or otherwise avoid copying) so commits preserve the intended O(1) clone behavior.

let block_body_opt = txn
.get(BODIES, &hash_key)?
-    .map(|bytes| BlockBodyRLP::from_bytes(bytes).to())
+    .map(|bytes| BlockBodyRLP::from_bytes(bytes.to_vec()).to())
Copilot AI commented Feb 6, 2026

txn.get(BODIES, ...) now returns Bytes, but this code does bytes.to_vec() just to decode the body. This adds an avoidable allocation/copy per body. Prefer decoding directly from the Bytes buffer (e.g., BlockBody::decode(bytes.as_ref())) or adjusting the RLP wrapper to accept Bytes/&[u8] without cloning.

Suggested change
-    .map(|bytes| BlockBodyRLP::from_bytes(bytes.to_vec()).to())
+    .map(|bytes| BlockBodyRLP::from_bytes(bytes.as_ref()).to())

Comment on lines 442 to 446
let Some(block_body) = txn
.get(BODIES, &hash_key)?
-    .map(|bytes| BlockBodyRLP::from_bytes(bytes).to())
+    .map(|bytes| BlockBodyRLP::from_bytes(bytes.to_vec()).to())
.transpose()
.map_err(StoreError::from)?
Copilot AI commented Feb 6, 2026

Same issue as above: bytes.to_vec() is only used to feed BlockBodyRLP::from_bytes, causing an unnecessary allocation/copy. Decode directly from the returned Bytes (or extend the wrapper API) to keep reads zero-copy.

@pablodeymo (Contributor, Author)

pablodeymo commented Feb 9, 2026

Phase Breakdown (from runs 103–113, perf/avoid-copying-trie-leaves)

  │       Phase       │ Avg Time │ % of Sync │        Trend        │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Block Headers     │ ~16 min  │ 5%        │ Stable              │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Account Ranges    │ ~4 min   │ 1%        │ Stable              │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Account Insertion │ ~22 min  │ 7%        │ Stable              │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Storage Ranges    │ ~16 min  │ 5%        │ Stable              │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Storage Insertion │ ~1h 2m   │ 21%       │ Stable              │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ State Healing     │ ~1 min   │ <1%       │ Negligible          │
  ├───────────────────┼──────────┼───────────┼─────────────────────┤
  │ Storage Healing   │ ~1h 41m  │ 34%       │ Dominant bottleneck │
  └───────────────────┴──────────┴───────────┴─────────────────────┘

Key Observations

  1. Storage Healing is still the #1 bottleneck — ~1h 41m average, consuming 34% of total sync time.
    This confirms our earlier analysis and the optimization plan in STORAGE_HEALING_OPTIMIZATION_PLAN.md.
  2. Storage Insertion is #2 — ~1h 2m average (21%). Together with Storage Healing, storage accounts for
    55% of sync time.
  3. No measurable difference from Bytes optimization — perf/avoid-copying-trie-leaves (runs 103–113)
    shows roughly the same times as the baseline branches. The Bytes change only affected state.rs
    (healing path), and state healing is already <1 min, so this is expected.
  4. Bytecodes phase is missing from all breakdowns — the regex might not be matching, or bytecodes
    complete too fast to register.
  5. Run #99 timeout (8h on background-header-download) was a one-off — subsequent runs on the same
    commit are fine.
  6. Total sync time variance: 4h 39m to 6h 26m — likely driven by network conditions and peer quality.

@ElFantasma ElFantasma (Contributor) left a comment

The PR description claims "Trie cache lookups: O(1) Bytes clones instead of O(n) Vec copies" — but TrieLayerCache (in layering.rs) still stores FxHashMap<Vec<u8>, Vec<u8>> and does value.clone() on reads (line 84). The layer cache was not updated, so that claimed benefit is not realized. The actual benefit is within the trie node structures themselves (cloning LeafNode/BranchNode values is cheaper). If updating the layer cache is planned as a follow-up, might be worth noting that.
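
For context on the clone-cost difference the trie node change does realize, here is a minimal sketch using std's `Arc<[u8]>` as a stand-in for `bytes::Bytes` (assumption: both clone by bumping a refcount; `cheap_clone` is a hypothetical helper, not ethrex code):

```rust
use std::sync::Arc;

// `Arc<[u8]>` stands in for `bytes::Bytes`: cloning bumps a refcount
// instead of copying the underlying buffer.
fn cheap_clone(value: &Arc<[u8]>) -> Arc<[u8]> {
    value.clone() // O(1), regardless of buffer length
}

fn main() {
    let value: Arc<[u8]> = vec![0u8; 1024].into();
    let copy = cheap_clone(&value);
    // Both handles alias the same allocation: no bytes were copied.
    assert!(std::ptr::eq(value.as_ptr(), copy.as_ptr()));
    assert_eq!(Arc::strong_count(&value), 2);
}
```

A layer cache keyed `FxHashMap<Vec<u8>, Bytes>` would get this O(1) clone on reads; with `Vec<u8>` values, every hit copies the full value.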

Also, BackendTrieDB::get and BackendTrieDBLocked::get add .to_vec() to convert Bytes back to Vec<u8> (since TrieDB::get returns Vec<u8>). This means the RocksDB zero-copy read benefit doesn't flow through to the trie — it would require changing TrieDB::get to return Bytes as well, which is a larger change.

```diff
 tokio::task::spawn_blocking(move || {
     let txn = backend.begin_read()?;
-    txn.get(table, &key)
+    txn.get(table, &key).map(|opt| opt.map(|b| b.to_vec()))
```

This .to_vec() introduces an extra copy for every non-trie read compared to the previous code. Before: RocksDB returned Vec<u8> directly. Now: RocksDB returns Vec<u8>, wraps in Bytes (zero-copy), then .to_vec() copies it back. Net effect is one additional allocation for all callers of read_async.

Same applies to read() below at line 1238.

This is fine as a transitional step if the plan is to eventually propagate Bytes to callers, but worth acknowledging as a minor regression on the non-trie read path.
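
The extra allocation can be seen in a toy model of this path, with `Arc<[u8]>` standing in for `bytes::Bytes` and `read_async_new` as a hypothetical condensation (note: the real `Bytes::from(Vec<u8>)` takes ownership without copying, whereas the `Arc` stand-in copies once on wrap):

```rust
use std::sync::Arc;

// Toy model of the transitional read path: wrap the backend's Vec in a
// refcounted buffer, then immediately copy it back out with .to_vec().
fn read_async_new(raw: Vec<u8>) -> Vec<u8> {
    let wrapped: Arc<[u8]> = raw.into(); // Bytes::from(Vec) would be zero-copy here
    wrapped.to_vec() // the extra allocation on every non-trie read
}

fn main() {
    let out = read_async_new(vec![1, 2, 3]);
    assert_eq!(out, vec![1, 2, 3]);
}
```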

```diff
 use std::sync::{Arc, RwLock};

-type Table = BTreeMap<Vec<u8>, Vec<u8>>;
+type Table = BTreeMap<Bytes, Bytes>;
```

nit: Using Bytes for BTreeMap keys adds atomic refcount overhead that's never useful (keys are never shared/cloned for zero-copy benefit). Consider keeping keys as Vec<u8> and only using Bytes for values: BTreeMap<Vec<u8>, Bytes>. Very minor since in-memory is testing-only.
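
A sketch of the suggested layout, with `Arc<[u8]>` standing in for `bytes::Bytes` and a hypothetical `get` helper: keys stay plain `Vec<u8>` since they are never shared, while values pay the refcount and clone in O(1).

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Suggested shape: plain Vec keys (never cloned for sharing),
// refcounted values (cloned on every read).
type Table = BTreeMap<Vec<u8>, Arc<[u8]>>;

fn get(table: &Table, key: &[u8]) -> Option<Arc<[u8]>> {
    // BTreeMap<Vec<u8>, _> can be queried by &[u8] via Borrow.
    table.get(key).cloned() // O(1) value clone
}

fn main() {
    let mut table: Table = BTreeMap::new();
    table.insert(b"account".to_vec(), vec![7u8; 32].into());
    let value = get(&table, b"account").expect("present");
    assert_eq!(value.len(), 32);
}
```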

```diff
 pub fn commit(&mut self) -> Result<(), TrieError> {
     let acc = self.commit_without_storing();
+    // Zero-copy conversion: Bytes::from(Vec<u8>) takes ownership
+    let acc = acc.into_iter().map(|(k, v)| (k, v.into())).collect();
```

nit: This .collect() allocates a new Vec just to convert Vec<u8> → Bytes. Since commit_without_storing() already builds the vec internally, you could have it return Vec<(Nibbles, Bytes)> directly (doing the .into() inside), avoiding this intermediate collection. Alternatively, put_batch could accept an iterator.
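
A sketch of the iterator-based alternative (hypothetical `put_batch` signature, not the ethrex API): the conversion happens lazily inside the iterator, so no intermediate Vec is collected before the call.

```rust
use std::sync::Arc;

// Hypothetical put_batch accepting any iterator of key/value pairs; a real
// backend would feed each pair into a RocksDB WriteBatch. Here we just
// count pairs to keep the sketch self-contained.
fn put_batch<I, K, V>(pairs: I) -> usize
where
    I: IntoIterator<Item = (K, V)>,
    K: AsRef<[u8]>,
    V: AsRef<[u8]>,
{
    pairs.into_iter().count()
}

fn main() {
    let acc: Vec<(Vec<u8>, Vec<u8>)> = vec![(b"key".to_vec(), b"value".to_vec())];
    // The Vec<u8> -> refcounted-value conversion happens inside the map
    // closure, with no extra .collect() beforehand.
    let written = put_batch(acc.into_iter().map(|(k, v)| (k, Arc::<[u8]>::from(v))));
    assert_eq!(written, 1);
}
```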

```diff
     TrieError::DbError(anyhow::anyhow!("Failed to begin read transaction: {}", e))
 })?;
 tx.get(table, prefixed_key.as_ref())
+    .map(|opt| opt.map(|b| b.to_vec()))
```

The .to_vec() here converts the Bytes from StorageReadView::get back to Vec<u8> because TrieDB::get returns Option<Vec<u8>>. This means the zero-copy RocksDB read benefit doesn't reach the trie — every trie node read still allocates. To realize the "zero-copy from RocksDB" benefit claimed in the PR description, TrieDB::get would need to return Bytes (or Cow<[u8]>), which is a bigger refactor but would be the logical next step.
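
A sketch of that follow-up, assuming a `TrieDb`-style trait (names are hypothetical; `Arc<[u8]>` stands in for `bytes::Bytes`): once `get` returns the refcounted type, the read path needs no reallocation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical trait shape: get() returns the refcounted buffer directly,
// so a backend's zero-copy read flows through without a .to_vec().
trait TrieDb {
    fn get(&self, key: &[u8]) -> Option<Arc<[u8]>>;
}

struct MemDb(HashMap<Vec<u8>, Arc<[u8]>>);

impl TrieDb for MemDb {
    fn get(&self, key: &[u8]) -> Option<Arc<[u8]>> {
        self.0.get(key).cloned() // refcount bump, no buffer copy
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(b"node".to_vec(), Arc::<[u8]>::from(vec![0xde, 0xad]));
    let db = MemDb(map);
    assert_eq!(db.get(b"node").unwrap().as_ref(), &[0xde, 0xad][..]);
}
```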

@github-project-automation github-project-automation bot moved this from In Review to In Progress in ethrex_l1 Feb 11, 2026

Labels

L1 (Ethereum client) · performance (block execution throughput and performance in general) · snapsync
