Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

### 2026-02-25

- Speed up snap sync validation with parallelism and deduplication [#6191](https://github.com/lambdaclass/ethrex/pull/6191)
- Disable balance check for prewarming to avoid early reverts [#6259](https://github.com/lambdaclass/ethrex/pull/6259)

### 2026-02-24
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions crates/common/trie/Cargo.toml
@@ -22,6 +22,7 @@ smallvec = { version = "1.10.0", features = ["const_generics", "union"] }
digest = "0.10.6"
lazy_static.workspace = true
crossbeam.workspace = true
rayon.workspace = true
rustc-hash.workspace = true
rkyv.workspace = true

80 changes: 80 additions & 0 deletions crates/common/trie/trie.rs
@@ -562,6 +562,86 @@ impl Trie {
}
Ok(())
}

/// Validate the trie structure in parallel by splitting at the root branch node.
/// Each valid child of the root branch node (up to 16 subtrees) is validated independently using rayon.
pub fn validate_parallel(self) -> Result<(), TrieError> {
use rayon::prelude::*;

if !self.root.is_valid() {
return Ok(());
}

let db = &*self.db;
let root_node = self
.root
.get_node_checked(db, Nibbles::default())?
[Contributor comment]
Note: get_node_checked won't detect a hash mismatch if the root node was already loaded.
This shouldn't matter right now given we usually call this on a freshly opened trie, but it might be a good idea to verify.

.ok_or_else(|| TrieError::Verify("Root node not found".to_string()))?;

match &*root_node {
Node::Branch(branch_node) => {
let children: Vec<(Nibbles, NodeRef)> = branch_node
.choices
.iter()
.enumerate()
.filter(|(_, child)| child.is_valid())
.map(|(i, child)| {
let path = Nibbles::default().append_new(i as u8);
(path, child.clone())
})
.collect();

children.par_iter().try_for_each(|(start_path, start_ref)| {
validate_subtree(db, start_path.clone(), start_ref.clone())
})
}
_ => {
// Non-branch root (rare): validate sequentially
validate_subtree(db, Nibbles::default(), self.root.clone())
}
}
}
}

/// Validate a subtree rooted at `start_ref`, checking that all referenced nodes exist
/// and their hashes match.
fn validate_subtree(
db: &dyn TrieDB,
start_path: Nibbles,
start_ref: NodeRef,
) -> Result<(), TrieError> {
let mut expected_count: isize = 1;
[Contributor comment]
It shouldn't be necessary to keep this. validate() only had this because the iterator has no way of distinguishing "ended because of error" and "ended because all nodes were visited".

let mut stack = vec![(start_path, start_ref)];

while let Some((path, node_ref)) = stack.pop() {
let node = node_ref
.get_node_checked(db, path.clone())?
.ok_or_else(|| TrieError::Verify(format!("Missing node at path {path:?}")))?;

expected_count -= 1;
match &*node {
Node::Branch(branch) => {
for (choice, child) in branch.choices.iter().enumerate().rev() {
if child.is_valid() {
expected_count += 1;
stack.push((path.append_new(choice as u8), child.clone()));
}
}
}
Node::Extension(ext) => {
expected_count += 1;
stack.push((path.concat(&ext.prefix), ext.child.clone()));
}
Node::Leaf(_) => {}
}
}

if expected_count != 0 {
return Err(TrieError::Verify(format!(
"Node count mismatch in subtree, expected {expected_count} more"
)));
}
Ok(())
}

impl IntoIterator for Trie {
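The split-at-root strategy above can be sketched in isolation: each child subtree of the root is validated concurrently, and any failure fails the whole validation. This is a minimal stand-in, not the PR's code: it uses `std::thread::scope` in place of rayon so it runs without external crates, and `validate_subtree_mock` (a simple membership check, where a `0` entry plays the role of a missing node) replaces the real `validate_subtree`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Stand-in for validate_subtree: a 0 entry plays the role of a missing node.
fn validate_subtree_mock(subtree: &[u32]) -> Result<(), String> {
    if subtree.contains(&0) {
        Err("Missing node".to_string())
    } else {
        Ok(())
    }
}

// Split-at-root sketch: each child subtree is validated on its own thread,
// mirroring the par_iter().try_for_each over the root branch's children.
fn validate_parallel(subtrees: &[Vec<u32>]) -> Result<(), String> {
    let failed = AtomicBool::new(false);
    thread::scope(|s| {
        for subtree in subtrees {
            let failed = &failed;
            s.spawn(move || {
                if validate_subtree_mock(subtree).is_err() {
                    failed.store(true, Ordering::Relaxed);
                }
            });
        }
    });
    if failed.load(Ordering::Relaxed) {
        Err("Missing node".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    assert!(validate_parallel(&[vec![1, 2], vec![3, 4]]).is_ok());
    assert!(validate_parallel(&[vec![1], vec![0, 2]]).is_err());
    println!("ok");
}
```

Unlike rayon's `try_for_each`, this sketch does not short-circuit remaining work on first failure; it only aggregates the outcome, which is enough to show the shape of the split.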
1 change: 1 addition & 0 deletions crates/guest-program/bin/openvm/Cargo.lock


1 change: 1 addition & 0 deletions crates/guest-program/bin/risc0/Cargo.lock


1 change: 1 addition & 0 deletions crates/guest-program/bin/sp1/Cargo.lock


1 change: 1 addition & 0 deletions crates/guest-program/bin/zisk/Cargo.lock


1 change: 1 addition & 0 deletions crates/l2/tee/quote-gen/Cargo.lock


94 changes: 64 additions & 30 deletions crates/networking/p2p/sync/snap_sync.rs
@@ -21,7 +21,6 @@ use ethrex_rlp::decode::RLPDecode;
use ethrex_storage::Store;
#[cfg(feature = "rocksdb")]
use ethrex_trie::Trie;
use rayon::iter::{ParallelBridge, ParallelIterator};
use tracing::{debug, error, info, warn};

use crate::metrics::{CurrentStepValue, METRICS};
@@ -700,7 +699,7 @@ pub async fn validate_state_root(store: Store, state_root: H256) -> bool {
store
.open_locked_state_trie(state_root)
.expect("couldn't open trie")
.validate()
.validate_parallel()
})
.await
.expect("We should be able to create threads");
@@ -717,21 +716,40 @@
pub async fn validate_storage_root(store: Store, state_root: H256) -> bool {
info!("Starting validate_storage_root");
let is_valid = tokio::task::spawn_blocking(move || {
store
use rayon::prelude::*;
let mut iter = store
.iter_accounts(state_root)
.expect("couldn't iterate accounts")
.par_bridge()
.try_for_each(|(hashed_address, account_state)| {
let store_clone = store.clone();
store_clone
.open_locked_storage_trie(
hashed_address,
state_root,
account_state.storage_root,
)
.expect("couldn't open storage trie")
.validate()
})
.filter(|(_, account_state)| account_state.storage_root != *EMPTY_TRIE_HASH);

const CHUNK_SIZE: usize = 4096;
let mut result: Result<(), ethrex_trie::TrieError> = Ok(());

loop {
let chunk: Vec<_> = iter.by_ref().take(CHUNK_SIZE).collect();
if chunk.is_empty() {
break;
}

result = chunk
.par_iter()
.try_for_each(|(hashed_address, account_state)| {
store
.open_locked_storage_trie(
*hashed_address,
state_root,
account_state.storage_root,
)
.expect("couldn't open storage trie")
.validate()
[Contributor comment]
Why not validate_parallel?

});

if result.is_err() {
break;
}
}

result
})
.await
.expect("We should be able to create threads");
@@ -744,27 +762,43 @@ pub async fn validate_storage_root(store: Store, state_root: H256) -> bool {

pub fn validate_bytecodes(store: Store, state_root: H256) -> bool {
info!("Starting validate_bytecodes");
let mut is_valid = true;
for (account_hash, account_state) in store

// Collect unique code hashes — many contracts share bytecode (proxies, ERC20 clones)
let mut unique_hashes = HashSet::new();
for (_, account_state) in store
.iter_accounts(state_root)
.expect("we couldn't iterate over accounts")
{
if account_state.code_hash != *EMPTY_KECCACK_HASH
&& !store
.get_account_code(account_state.code_hash)
.is_ok_and(|code| code.is_some())
{
error!(
"Missing code hash {:x} for account {:x}",
account_state.code_hash, account_hash
);
is_valid = false
if account_state.code_hash != *EMPTY_KECCACK_HASH {
unique_hashes.insert(account_state.code_hash);
}
}
if !is_valid {

info!(
"Collected {} unique code hashes for validation",
unique_hashes.len()
);

// Validate in parallel using existence-only check
use rayon::prelude::*;
let missing: Vec<_> = unique_hashes
.par_iter()
.filter(|code_hash| match store.code_exists(**code_hash) {
Ok(exists) => !exists,
Err(e) => {
error!("DB error checking code hash {:x}: {e}", code_hash);
true
}
})
.collect();

if !missing.is_empty() {
for hash in &missing {
error!("Missing code hash {:x}", hash);
}
std::process::exit(1);
}
is_valid
true
}

// ============================================================================
@@ -874,7 +908,7 @@ async fn insert_storages(
account_storages_snapshots_dir: &Path,
_: &Path,
) -> Result<(), SyncError> {
use rayon::iter::IntoParallelIterator;
use rayon::iter::{IntoParallelIterator, ParallelIterator};

for entry in std::fs::read_dir(account_storages_snapshots_dir)
.map_err(|_| SyncError::AccountStoragesSnapshotsDirNotFound)?
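The chunked loop in validate_storage_root above bounds memory: at most CHUNK_SIZE account entries are materialized at a time, and a failing chunk stops the scan. A minimal sequential sketch of that control flow follows; it is an illustration only, with a `0` item standing in for an account whose storage trie fails validation (the real code hands each chunk to rayon's `par_iter`).

```rust
// Chunked-consumption sketch: drain the iterator CHUNK_SIZE items at a
// time, process each chunk, and stop at the first chunk that fails.
fn process_chunked<I: Iterator<Item = u32>>(mut iter: I, chunk_size: usize) -> Result<usize, String> {
    let mut processed = 0;
    loop {
        // by_ref() lets take() borrow the iterator without consuming it,
        // so the next loop turn resumes where this chunk left off.
        let chunk: Vec<u32> = iter.by_ref().take(chunk_size).collect();
        if chunk.is_empty() {
            break;
        }
        // Stand-in for the per-account storage-trie validation.
        if let Some(bad) = chunk.iter().find(|&&n| n == 0) {
            return Err(format!("validation failed on {bad}"));
        }
        processed += chunk.len();
    }
    Ok(processed)
}

fn main() {
    assert_eq!(process_chunked(1..=10u32, 4), Ok(10));
    assert!(process_chunked([1u32, 0, 2].into_iter(), 2).is_err());
    println!("ok");
}
```

Compared with the removed `par_bridge()` approach, which feeds an unbounded stream to the thread pool, chunking also gives a natural point to check for failure between batches.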
23 changes: 23 additions & 0 deletions crates/storage/store.rs
@@ -710,6 +710,29 @@ impl Store {
Ok(Some(code))
}

/// Check if account code exists by its hash, without constructing the full `Code` struct.
/// More efficient than `get_account_code` for existence checks since it skips
/// RLP decoding and `Code` struct construction (no `jump_targets` deserialization).
/// Note: The underlying `get()` still reads the value from RocksDB (including blob files).
pub fn code_exists(&self, code_hash: H256) -> Result<bool, StoreError> {
// Check cache first
if self
.account_code_cache
.lock()
.map_err(|_| StoreError::LockError)?
.get(&code_hash)?
.is_some()
{
return Ok(true);
[Copilot AI comment, Feb 12 2026, on lines +718 to +726]
code_exists always takes the account_code_cache mutex even though cache hits are likely rare during bulk validation. When called from a rayon par_iter (as in validate_bytecodes), this mutex can serialize the hot path and reduce parallel speedups. Consider either skipping the code cache for code_exists, or restructuring the cache to allow concurrent reads (e.g., RwLock + a read-only lookup method).

Suggested change:
        // Opportunistically check cache first without blocking other threads.
        match self.account_code_cache.try_lock() {
            Ok(cache) => {
                if cache.get(&code_hash)?.is_some() {
                    return Ok(true);
                }
            }
            Err(std::sync::TryLockError::Poisoned(_)) => {
                // Preserve previous behavior on poisoned lock.
                return Err(StoreError::LockError);
            }
            Err(std::sync::TryLockError::WouldBlock) => {
                // Contended: skip cache and fall back to DB check.
            }
        }
}
// Check DB without reading the full value
Ok(self
.backend
.begin_read()?
.get(ACCOUNT_CODES, code_hash.as_bytes())?
.is_some())
[Copilot AI comment, Feb 12 2026, on lines +728 to +733]
code_exists claims to avoid "blob value reads", but it calls begin_read().get(ACCOUNT_CODES, ...), which on the RocksDB backend is configured with set_enable_blob_files(true) for ACCOUNT_CODES and will still fetch the value (often from blob files). If the goal is an existence-only check without reading bytecode blobs, consider checking ACCOUNT_CODE_METADATA (fixed-size, non-blob CF) first and only falling back to ACCOUNT_CODES if metadata is missing (older DBs).
}

/// Get code metadata (length) by its hash.
///
/// Checks cache first, falls back to database. If metadata is missing,
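The cache-then-DB existence check in `code_exists` can be illustrated with a self-contained sketch. The `CodeStore` type and its two `HashMap`s below are hypothetical stand-ins for the real cache mutex and RocksDB column family; the point is only the lookup order and that neither path decodes the stored bytes.

```rust
use std::collections::HashMap;

// Stand-in store: `cache` mimics the in-memory account code cache and
// `db` mimics the ACCOUNT_CODES column family (values are raw bytes).
struct CodeStore {
    cache: HashMap<u64, Vec<u8>>,
    db: HashMap<u64, Vec<u8>>,
}

impl CodeStore {
    // Existence-only check: consult the cache first, then the backing
    // store. Neither branch decodes the value into a Code struct.
    fn code_exists(&self, code_hash: u64) -> bool {
        if self.cache.contains_key(&code_hash) {
            return true;
        }
        self.db.contains_key(&code_hash)
    }
}

fn main() {
    let mut store = CodeStore { cache: HashMap::new(), db: HashMap::new() };
    store.db.insert(1, vec![0x60, 0x00]); // code present only in "DB"
    store.cache.insert(2, vec![0x60, 0x01]); // code present only in cache
    assert!(store.code_exists(1));
    assert!(store.code_exists(2));
    assert!(!store.code_exists(3));
    println!("ok");
}
```

As the inline review comments note, the real version still pays for the mutex acquisition and (on RocksDB with blob files) the value fetch, so this sketch shows the intended shape rather than the achieved cost.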
1 change: 1 addition & 0 deletions crates/vm/levm/bench/revm_comparison/Cargo.lock


1 change: 1 addition & 0 deletions tooling/Cargo.lock
