Conversation

@sanderpick
Contributor

  • Attempts to address "fendermint fails to submit iroh resolution task" (#588) by:
    • Actually voting "nay" instead of just logging an error if there is some issue with the resolve pool channel. This should at least un-stick pending blobs that are no longer in the pool. I still don't know why the channel would close, but I added some more logging to help figure this out.
    • Hydrating pending blobs from chain state if the local resolve pool is empty. This allows a validator that restarts to vote on currently pending blobs.
    • Ensuring that the local resolve pool does not exceed the pool's configured concurrency. Previously, we used concurrency to determine how many newly added blobs to move to pending without considering the current size of the pool.
  • Reverts the addition of IrohManager to the Object API that was causing the "Iroh node is not running" errors.

Signed-off-by: Sander Pick <sanderpick@gmail.com>
)
}

fn get_pending_read_requests(
Contributor Author

new method

Contributor Author

some rearranging here to match method order

Contributor Author

some rearranging here to match method order

}

pub fn get_open_read_requests<BS: Blockstore>(
pub fn get_read_requests_by_status<BS: Blockstore>(
Contributor Author

new method

Contributor Author

some rearranging here to match method order

}
}
Ok(Err(e)) => {
Err(e) | Ok(Err(e)) => {
Contributor Author

vote "nay" in the case of any error
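A minimal sketch of what this merged match arm buys, with assumed illustrative types (the real code matches on the resolver's nested task result, not on `String` errors):

```rust
// Illustrative sketch only: `Vote` and the nested result type are assumed,
// not fendermint's actual types. The point is the merged arm
// `Err(e) | Ok(Err(e))`, which now votes "nay" on a channel error as well
// as a task error, instead of only logging the channel error.
#[derive(Debug, PartialEq)]
enum Vote {
    Yay,
    Nay,
}

fn vote_on(outcome: Result<Result<(), String>, String>) -> Vote {
    match outcome {
        Ok(Ok(())) => Vote::Yay,
        // Any failure (closed channel or failed resolution) produces a
        // "nay" vote so the pending blob can be rejected rather than
        // left stuck in the pool.
        Err(e) | Ok(Err(e)) => {
            eprintln!("resolve failed: {e}");
            Vote::Nay
        }
    }
}

fn main() {
    assert_eq!(vote_on(Ok(Ok(()))), Vote::Yay);
    assert_eq!(vote_on(Ok(Err("task failed".to_string()))), Vote::Nay);
    assert_eq!(vote_on(Err("channel closed".to_string())), Vote::Nay);
}
```

Both sides of the or-pattern bind `e` to the same type, so the two error paths share one handler.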

Collaborator

It would mean that if the service on the other side is not responding, we will re-enqueue the task, right? Does that mean that if the other service is not responding, the task queue will keep growing?

Contributor Author

the task queue is now bounded by the concurrency config setting
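As a rough analogy for the bound (not the actual fendermint mechanism), a fixed-capacity channel gives the same property: once `concurrency` tasks are queued, new work is rejected instead of growing the backlog.

```rust
use std::sync::mpsc::sync_channel;

// Rough analogy, not fendermint's code: a fixed-capacity channel rejects
// new work once `concurrency` tasks are queued, so an unresponsive
// consumer cannot make the queue grow without bound.
fn main() {
    let concurrency = 2;
    let (tx, _rx) = sync_channel::<u64>(concurrency);

    assert!(tx.try_send(1).is_ok());
    assert!(tx.try_send(2).is_ok());
    // Queue is full: the enqueue fails instead of growing the backlog.
    assert!(tx.try_send(3).is_err());
}
```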

Collaborator

Ah, okay. Sounds good.

Comment on lines +389 to +393
let local_resolving_blobs_count =
local_blobs_count.saturating_sub(local_finalized_blobs.len());
let added_blobs_fetch_count = chain_env
.blob_concurrency
.saturating_sub(local_resolving_blobs_count as u32);
Contributor Author

limit how many added blobs we move to pending
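The limit in the diff above reduces to a small calculation; here is a sketch with assumed names (`blob_concurrency`, `local_resolving`) mirroring the snippet:

```rust
// Sketch of the limit described above, with assumed names: only fetch as
// many "added" blobs as remain under the configured concurrency once
// currently-resolving blobs are counted. `saturating_sub` avoids
// underflow when the pool is already at or over capacity.
fn added_blobs_fetch_count(blob_concurrency: u32, local_resolving: u32) -> u32 {
    blob_concurrency.saturating_sub(local_resolving)
}

fn main() {
    // Capacity 8, 5 already resolving: fetch at most 3 more.
    assert_eq!(added_blobs_fetch_count(8, 5), 3);
    // Pool already full (or over): fetch none rather than underflowing.
    assert_eq!(added_blobs_fetch_count(4, 6), 0);
}
```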

Contributor Author

(all changes are mirrored for read requests)

Comment on lines +306 to +307
// If the local blobs pool is empty and there are pending blobs on-chain,
// we may have restarted the validator. We can hydrate the pool here.
Contributor Author

hydrate from the prepare step if the local pool is empty (maybe from a restart)
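In sketch form, with assumed types (the real code reads pending blobs from chain state, not a `Vec<u64>`):

```rust
// Illustrative sketch with assumed types: if the local pool is empty but
// the chain still reports pending blobs (e.g. the validator restarted and
// lost its in-memory pool), repopulate the pool from chain state so the
// validator can keep voting on those blobs.
fn hydrate_pool(local_pool: &mut Vec<u64>, chain_pending: &[u64]) {
    if local_pool.is_empty() && !chain_pending.is_empty() {
        local_pool.extend_from_slice(chain_pending);
    }
}

fn main() {
    let mut pool: Vec<u64> = Vec::new();
    hydrate_pool(&mut pool, &[7, 8, 9]);
    assert_eq!(pool, vec![7, 8, 9]);

    // A non-empty pool is left untouched.
    hydrate_pool(&mut pool, &[1]);
    assert_eq!(pool, vec![7, 8, 9]);
}
```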

Comment on lines +417 to +424
resolving = local_resolving_blobs_count,
finalized = local_finalized_blobs.len(),
"blob pool counts"
);
tracing::debug!(
added = added_blobs_fetched_count,
pending = pending_blobs_fetched_count,
"blob fetched counts"
Contributor Author

clean up logging

Comment on lines +653 to +654
// Reject the proposal if the current processor is not keeping up with blob
// resolving.
Contributor Author

give slow processors the opportunity to reject

Collaborator

makes sense to do this here but does it mean that new block will be created more slowly if proposals are rejected?

Contributor Author

If the block doesn't have enough votes, yes, it would have to advance to another voting "round". If the rejector doesn't have much relative power, the block is still accepted.

@sanderpick sanderpick requested a review from avichalp March 30, 2025 19:12
let client = FendermintClient::new_http(tendermint_url, None)?;
let iroh_manager = IrohManager::from_addr(Some(iroh_addr));
let iroh_addr = iroh_addr
.to_socket_addrs()?
Collaborator

Makes sense. Why did we start using Iroh Manager earlier? Does it provide connection pooling?

Contributor Author

It provides lazy connection creation. It's something we wrote, and it's getting removed with the iroh upgrade work.

Ok((count, done))
}

/// Count all items and resolved and failed items.
Collaborator

Suggested change
/// Count all items and resolved and failed items.
/// Count all items including resolved and failed items.

);

// Once the read request is closed, we can clean up the votes
let mut request_id = read_request.id.as_bytes().to_vec();
Collaborator

👍

Collaborator

@avichalp left a comment

Looks good to me. Posted some questions for my understanding.

@sanderpick
Contributor Author

@avichalp doesn't look like I'll have time to make changes here today before we make a release for tomorrow. I'll do a follow-up to address the doc comment and shut down fendermint if the resolver service exits.

@sanderpick sanderpick merged commit 22f9d42 into main Mar 31, 2025
13 checks passed
@sanderpick sanderpick deleted the sander/improve-blob-resolve-pool branch March 31, 2025 17:37
@avichalp
Collaborator

Sounds good. Created a new issue here: #593 (comment)
