perf: process chunks in par for get logs in block range eth_getLogs
#16675
Conversation
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
// adjust based on system capabilities
//TODO: configurable, 4 is a good default for both I/O and CPU balance
let concurrency = std::cmp::min(ranges.len(), 4);
```
The get_logs_in_block_range_inner fn involves both I/O operations (fetching headers and receipts) and CPU work (filtering logs), so I left a value of 4, which provides a balance that works well for this mixed workload.
can we add a constant here for the concurrency value? and move to its definition the comment about why 4.
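A sketch of what this could look like (the constant name is hypothetical; the rationale for 4 moves to the definition as suggested):

```rust
/// Default number of block-range chunks processed concurrently.
///
/// Hypothetical constant name. `get_logs_in_block_range_inner` mixes I/O
/// (fetching headers and receipts) with CPU work (filtering logs), and 4 is a
/// reasonable balance for both. TODO: make configurable.
const DEFAULT_BLOCK_RANGE_CONCURRENCY: usize = 4;

/// Never spawn more concurrent tasks than there are ranges to process.
fn effective_concurrency(num_ranges: usize) -> usize {
    std::cmp::min(num_ranges, DEFAULT_BLOCK_RANGE_CONCURRENCY)
}

fn main() {
    assert_eq!(effective_concurrency(1), 1);
    assert_eq!(effective_concurrency(10), 4);
}
```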
Yes, it is possible, do you want me to do it? Or will you take care of it once PR #16441 is merged?
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
for (from, to) in
    BlockRangeInclusiveIter::new(from_block..=to_block, self.max_headers_range)
```
```rust
// for small ranges (less than 2 chunks)
if ranges.len() <= 1 {
```
still keep the original sequential impl for small ranges, because if there is only one chunk it's better to process it directly.
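This fallback can be sketched as follows (hypothetical helper name, with block-range tuples standing in for the real chunk type):

```rust
/// With at most one chunk there is nothing to fan out, so the original
/// sequential path avoids task-spawning overhead entirely.
fn choose_path<T>(ranges: &[T]) -> &'static str {
    if ranges.len() <= 1 {
        "sequential" // small ranges (less than 2 chunks)
    } else {
        "parallel"
    }
}

fn main() {
    assert_eq!(choose_path(&[(0u64, 999u64)]), "sequential");
    assert_eq!(choose_path(&[(0u64, 999u64), (1000, 1999)]), "parallel");
}
```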
mattsse
left a comment
thanks, all of this makes sense, but we're also working on similar specialization #16441
this pr would introduce conflicts, so I'm parking this for a bit
fgimenez
left a comment
looks great, smol nit
i think the changes in this PR can complement very well the approach proposed in #16441:
- for recent blocks, we can keep using CachedMode
- for old blocks with a small range, use range queries
- for old blocks with a large range, we can combine both range queries and the parallelization introduced in this PR

this way we get the best of both PRs: fewer, more efficient operations executed in parallel
we would need to merge #16441 first though, i can take care of fixing conflicts here when done
@fgimenez Thank you for the awesome work! Having it merged is more than we could ask for. I'll happily rebase and complete this PR afterwards 🙏
@thaodt sure, #16441 introduced two modes for getting logs, CachedMode and RangeMode
nice. I will rework this PR
Force-pushed 02d3934 to 15368ae
Hi @fgimenez, I rebased and pushed updates, can you please take a look whether this is what you meant? 🙏
fgimenez
left a comment
looks very good! just one question about avoiding the cache check
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
let mut chunk_results = Vec::new();

for header in chunk_headers {
    // First check if already cached to avoid unnecessary provider calls
```
i believe at this point we don't need to check if the results are cached, RangeMode will be activated for requests of logs older than what we have in the cache, wdyt? would it make sense to directly fetch from the provider
hey I made changes as per your review, can you please take another look? 🙏
fgimenez
left a comment
lgtm, thx! pending @mattsse
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
let task = async move {
    let mut chunk_results = Vec::new();

    for header in chunk_headers {
        // Fetch directly from provider - RangeMode is used for older blocks unlikely to
        // be cached
        let receipts =
            match filter_inner.provider().receipts_by_block(header.hash().into())? {
                Some(receipts) => Arc::new(receipts),
                None => continue, // No receipts found
            };

        if !receipts.is_empty() {
            chunk_results.push(ReceiptBlockResult {
                receipts,
                recovered_block: None,
                header,
            });
        }
    }
```
this most likely doesn't improve anything, because this isn't actually async: these chunks just end up being processed sequentially, since receipts_by_block is blocking
```rust
match filter_inner.provider().receipts_by_block(header.hash().into())? {
    Some(receipts) => Arc::new(receipts),
    None => continue, // No receipts found
};
```
so we'd need to spawn_blocking this here in particular or the entire async scope
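The effect of the suggestion can be illustrated with std threads (a stand-in for tokio::task::spawn_blocking, which moves blocking work off the async executor's threads); fetch_receipts here is a hypothetical stand-in for provider().receipts_by_block:

```rust
use std::thread;

/// Hypothetical stand-in for provider().receipts_by_block: a blocking call.
fn fetch_receipts(block: u64) -> Option<Vec<u64>> {
    Some(vec![block]) // pretend each block has one receipt
}

/// Run each chunk's blocking fetches on its own thread and join in spawn
/// order, so chunks truly run in parallel instead of serializing on the
/// executor the way a non-yielding async block would.
fn fetch_all(chunks: Vec<Vec<u64>>) -> Vec<Vec<Vec<u64>>> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                chunk.into_iter().filter_map(fetch_receipts).collect::<Vec<_>>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let results = fetch_all(vec![vec![0, 1], vec![2, 3]]);
    assert_eq!(results[0], vec![vec![0], vec![1]]);
    assert_eq!(results[1], vec![vec![2], vec![3]]);
}
```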
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
// Process results and maintain order
let mut ordered_results: Vec<Vec<ReceiptBlockResult<Eth::Provider>>> =
    (0..results.len()).map(|_| Vec::new()).collect();
for result in results {
    match result {
        Ok((chunk_idx, chunk_results)) => {
            ordered_results[chunk_idx] = chunk_results;
        }
        Err(e) => return Err(e),
    }
```
this looks a bit weird, join_all already returns in order?
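For reference, a minimal std-thread illustration of why the manual chunk_idx bookkeeping is redundant: collecting handles and joining them in spawn order already preserves input order, which mirrors the ordering guarantee futures::future::join_all makes for futures:

```rust
use std::thread;

/// Spawn one worker per input, then join in spawn order: the output order
/// matches the input order no matter which worker finishes first, mirroring
/// futures::future::join_all's ordering guarantee.
fn spawn_and_join(inputs: Vec<i32>) -> Vec<i32> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|i| thread::spawn(move || i * 10))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    assert_eq!(spawn_and_join(vec![0, 1, 2, 3]), vec![0, 10, 20, 30]);
}
```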
mattsse
left a comment
last nit
crates/rpc/rpc/src/eth/filter.rs
Outdated
```rust
let mut header_chunks = Vec::new();
for chunk_start in (0..range_headers.len()).step_by(chunk_size) {
    let chunk_end = std::cmp::min(chunk_start + chunk_size, range_headers.len());
    header_chunks.push(range_headers[chunk_start..chunk_end].to_vec());
```
not a fan of using slice indexing here
this is basically just
```rust
use itertools::Itertools;

for chunk_headers in range_headers
    .into_iter()
    .chunks(chunk_size) {}
```
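For comparison, std's slice::chunks already expresses the same loop without index arithmetic (a sketch with plain u64 values standing in for headers):

```rust
/// Split headers into size-bounded chunks; the last chunk may be shorter.
/// slice::chunks does the bounds math that the step_by loop wrote by hand.
fn chunk_headers(headers: &[u64], chunk_size: usize) -> Vec<Vec<u64>> {
    headers.chunks(chunk_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let range_headers: Vec<u64> = (0..10).collect();
    let header_chunks = chunk_headers(&range_headers, 4);
    assert_eq!(header_chunks.len(), 3);
    assert_eq!(header_chunks[2], vec![8, 9]); // trailing partial chunk
}
```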
What do the benchmarks say about the improvements from this PR, and for which workloads?
Here are the benchmark results for 10k blocks (bench run on our RISE devnet):

Before:
----- High Volume Test Statistics -----
Number of tests: 100
Min Latency: 1.052575305s
Mean Latency: 1.423230211s
Median Latency: 1.422722982s
Max Latency: 1.845138906s
Standard Deviation: 185.028061ms

After (@fgimenez's PR and this complement PR):
----- High Volume Test Statistics -----
Number of tests: 100
Min Latency: 273.061434ms
Mean Latency: 407.566407ms
Median Latency: 436.268471ms
Max Latency: 491.721147ms
Standard Deviation: 70.529748ms
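As a sanity check on the numbers above (not part of the PR), the mean-latency improvement works out to roughly 3.5x:

```rust
/// Speedup implied by two mean latencies, both in milliseconds.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

fn main() {
    // 1.423230211s before vs 407.566407ms after, from the benchmark output.
    let s = speedup(1423.230211, 407.566407);
    assert!((3.4..3.6).contains(&s)); // ~3.5x faster on mean latency
}
```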
Description

The current eth_getLogs behavior supports two types of block filtering:
- AtBlockHash: gets logs for a specific block by hash
- Range: gets logs for a range of blocks

For range queries, it processes blocks in chunks of max_headers_range (1000 blocks) and uses bloom filters to quickly skip blocks that can't contain matching logs, then fetches receipts and logs only for blocks that might contain matches.
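The bloom-filter skip can be illustrated with a toy sketch (Ethereum's real logs bloom is 2048 bits; this uses a u64 and a trivial hash purely to show the shape of the check, and single-character strings stand in for addresses):

```rust
/// Toy bloom: map an item to a single bit in a u64 via a trivial hash.
/// (The real logs bloom sets three bits of a 2048-bit field per item.)
fn toy_bloom_bit(item: &str) -> u64 {
    let h = item
        .bytes()
        .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(b as u64));
    1u64 << (h % 64)
}

/// If the filter item's bit is not set in the header's bloom, the block
/// cannot contain a matching log, so its receipts are never fetched.
fn might_match(header_bloom: u64, filter_item: &str) -> bool {
    let bit = toy_bloom_bit(filter_item);
    header_bloom & bit == bit
}

fn main() {
    let header_bloom = toy_bloom_bit("a"); // block emitted a log from "a"
    assert!(might_match(header_bloom, "a")); // maybe-match: fetch receipts
    assert!(!might_match(header_bloom, "b")); // definite miss: skip the block
}
```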
The current implementation of get_logs_in_block_range_inner() processes block ranges sequentially, which can lead to performance bottlenecks when querying logs across large block ranges. This is particularly noticeable in some scenarios.

Solution

This PR introduces parallel processing of block ranges while maintaining the correct ordering of results, and limits concurrency to a reasonable value (4) to avoid overwhelming system resources.
- adds a process_block_range() fn to encapsulate the logic for processing a single block range

Please help me review it. Thanks 🙏