fix(batcher): stop listening to blocks when one of the rpcs disconnects #1961

MarcosNicolau · 2025-06-03T16:19:34Z

Description

This PR improves the batcher's ws connection. The batcher maintains two ws connections: a primary and a fallback. Previously, if either connection failed during listen_to_new_blocks, the entire process would fail.

With this PR:

The batcher only returns an error if both connections fail. If at least one succeeds, the process continues.
Previously, a select call returned as soon as either connection produced an event. When one connection dropped, its error was the first event to come back, so the whole listener failed. The new logic listens to both connections and only exits if both fail.

The process is now the following:

1. Attempts to connect to both nodes.
2. Listens for new blocks from both connections.
3. If one connection fails, it continues listening on the other.
4. If both fail, it retries the connection process from step 1.
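The steps above can be sketched with standard-library threads and channels. This is only an illustration under stated assumptions, not the batcher's code: per the review summary, the real implementation awaits async ws subscriptions with join!. The function `listen_to_both`, the `Vec<u64>` block feeds, and the rule of skipping already-seen block numbers are all hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical model: each Vec stands in for the blocks one ws connection
/// delivers before it disconnects. Returns the distinct block numbers seen.
fn listen_to_both(primary: Vec<u64>, fallback: Vec<u64>) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for blocks in vec![primary, fallback] {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for block in blocks {
                // A send error means the listener is gone, so stop forwarding.
                if tx.send(block).is_err() {
                    break;
                }
            }
            // `tx` is dropped here, which models this connection failing.
        }));
    }
    // Drop the original sender so only the per-connection senders remain.
    drop(tx);

    let mut seen: Vec<u64> = Vec::new();
    // The receive loop ends only once BOTH senders are dropped, i.e. both
    // connections have failed. One failure alone does not stop it.
    for block in rx {
        // Both nodes report the same chain, so skip blocks already seen.
        if seen.last().map_or(true, |last| block > *last) {
            seen.push(block);
        }
    }

    for handle in handles {
        handle.join().expect("forwarder thread panicked");
    }
    seen
}
```

In this model a caller would run `listen_to_both` inside a retry loop: the function returning corresponds to step 4, where both connections are gone and the connection process starts again.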

Note: when one connection fails, we don't try to reconnect it until both have failed. Adding that logic is far from trivial, as we would need to create a new process that handles reconnection in the background and deal with mutexes, etc.

How to test

Start ethereum-package:

make ethereum_package_start

Start batcher:

make batcher_start_ethereum_package

Locate the container IDs of the RPC nodes that ethereum-package runs in Docker:

docker ps

# You should be looking for:
5ae8707b5dbf   ghcr.io/paradigmxyz/reth:latest                       "/usr/local/bin/reth…"   24 minutes ago   Up 15 minutes   0.0.0.0:8549->8549/tcp, 0.0.0.0:8549->8549/udp, 30303/tcp, 30303/udp, 0.0.0.0:8552->8545/tcp, 0.0.0.0:8553->8546/tcp, 0.0.0.0:8550->8551/tcp, 0.0.0.0:8551->9001/tcp   el-2-reth-lighthouse--3e21610d756d4ef588441314a9733ca1
180a0bcae8c2   ghcr.io/paradigmxyz/reth:latest                       "/usr/local/bin/reth…"   24 minutes ago   Up 19 minutes   0.0.0.0:8542->8542/tcp, 0.0.0.0:8545-8546->8545-8546/tcp, 0.0.0.0:8542->8542/udp, 30303/tcp, 30303/udp, 0.0.0.0:8543->8551/tcp, 0.0.0.0:8544->9001/tcp                 el-1-reth-lighthouse--9b08863b6524476092e0770690968857

Play around with starting and stopping the containers; you should see the behavior explained above.

Type of change

Bug fix

Checklist

Copilot

Pull Request Overview

This PR fixes the batcher’s websocket behavior so that it only errors out when both primary and fallback connections fail. It revises the retry constants and reworks the block subscription logic by replacing tokio::select! with a join! based approach to concurrently await both streams.

Updated ETHEREUM_CALL_MAX_RETRY_DELAY from 3600 to 60 seconds.
Modified block subscription logic to use join! for awaiting responses from both primary and fallback providers.
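To illustrate what lowering the maximum retry delay changes, here is a minimal sketch that assumes the constant caps an exponential backoff. Only the 60-second cap comes from this PR. The initial delay, the backoff factor, and the `retry_delay` helper are hypothetical and may not match the SDK's actual retry code.

```rust
use std::time::Duration;

/// Cap taken from this PR (previously 3600 seconds).
const ETHEREUM_CALL_MAX_RETRY_DELAY: u64 = 60; // seconds
/// Hypothetical values, chosen only for the example.
const INITIAL_DELAY_MS: u64 = 500;
const BACKOFF_FACTOR: u64 = 2;

/// Delay before retry number `attempt` (0-based): exponential growth,
/// clamped to the maximum retry delay.
fn retry_delay(attempt: u32) -> Duration {
    let cap_ms = ETHEREUM_CALL_MAX_RETRY_DELAY * 1000;
    let growth = BACKOFF_FACTOR.saturating_pow(attempt);
    let delay_ms = INITIAL_DELAY_MS.saturating_mul(growth).min(cap_ms);
    Duration::from_millis(delay_ms)
}
```

With these assumed values the delay reaches the cap on the eighth attempt, so a batcher that has lost both nodes would retry at least once a minute instead of possibly waiting up to an hour.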

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
batcher/aligned-sdk/src/common/constants.rs	Adjusted constant values supporting Ethereum call retry logic.
batcher/aligned-batcher/src/lib.rs	Changed the mechanism for listening to new blocks with a join! based approach.

Comments suppressed due to low confidence (2)

batcher/aligned-sdk/src/common/constants.rs:44

The reduction of the retry delay from 3600 to 60 seconds is a significant change. Please add a comment or documentation explaining the rationale behind this new value to help maintainers understand its impact.

pub const ETHEREUM_CALL_MAX_RETRY_DELAY: u64 = 60; // seconds

batcher/aligned-batcher/src/lib.rs:374

Using join! here waits for both streams to respond, which might cause delays if one stream is slow or unresponsive. Consider using tokio::select! so that the code can process a block as soon as either stream provides one.

let (block_main, block_fallback) = join!( ... );

batcher/aligned-batcher/src/lib.rs

MauroToscano · 2025-06-03T21:21:35Z

Code seems fine

MarcosNicolau added 2 commits June 2, 2025 18:28

fix: rpc fallback

a3314d6

fix: don't force connection on both nodes

79684a0

MarcosNicolau self-assigned this Jun 3, 2025

fix: lower batcher listen to new blocks max retry delay

7c05c72

JuArce requested a review from Copilot June 3, 2025 18:05

Copilot AI reviewed Jun 3, 2025


MarcosNicolau commented Jun 3, 2025


batcher/aligned-batcher/src/lib.rs (outdated, resolved)

Update batcher/aligned-batcher/src/lib.rs

bbefe00

MauroToscano approved these changes Jun 4, 2025


JuArce approved these changes Jun 4, 2025


MauroToscano added this pull request to the merge queue Jun 4, 2025

Merged via the queue into staging with commit 6ec1dc6 Jun 4, 2025
3 checks passed

MauroToscano deleted the fix/batcher-rpc-fallback branch June 4, 2025 14:39
