Conversation

@sanderpick
Contributor

  • Attempts to address "fendermint fails to submit iroh resolution task" (#588) by:
    • Actually voting "nay" instead of just logging an error if there is some issue with the resolve pool channel. This should at least un-stick pending blobs that are no longer in the pool. I still don't know why the channel would close, but I added some more logging to help figure this out.
    • Hydrating pending blobs from chain state if the local resolve pool is empty. This allows a validator that restarts to vote on currently pending blobs.
    • Ensuring that the local resolve pool does not exceed the pool's configured concurrency. Previously, we used concurrency to determine how many newly added blobs to move to pending without considering the current size of the pool.
  • Reverts the addition of IrohManager to the Object API that was causing the "Iroh node is not running" errors.

Signed-off-by: Sander Pick <sanderpick@gmail.com>
)
}

fn get_pending_read_requests(
Contributor Author

new method

Contributor Author

some rearranging here to match method order

Contributor Author

some rearranging here to match method order

}

pub fn get_open_read_requests<BS: Blockstore>(
pub fn get_read_requests_by_status<BS: Blockstore>(
Contributor Author

new method

Contributor Author

some rearranging here to match method order

}
}
Ok(Err(e)) => {
Err(e) | Ok(Err(e)) => {
Contributor Author

vote "nay" in the case of any error
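A minimal sketch of what this merged match arm buys, with assumed illustrative types (the real code matches on the resolver's nested task result, not on `String` errors):

```rust
// Illustrative sketch only: `Vote` and the nested result type are assumed,
// not fendermint's actual types. The point is the merged arm
// `Err(e) | Ok(Err(e))`, which now votes "nay" on a channel error as well
// as a task error, instead of only logging the channel error.
#[derive(Debug, PartialEq)]
enum Vote {
    Yay,
    Nay,
}

fn vote_on(outcome: Result<Result<(), String>, String>) -> Vote {
    match outcome {
        Ok(Ok(())) => Vote::Yay,
        // Any failure (closed channel or failed resolution) produces a
        // "nay" vote so the pending blob can be rejected rather than
        // left stuck in the pool.
        Err(e) | Ok(Err(e)) => {
            eprintln!("resolve failed: {e}");
            Vote::Nay
        }
    }
}

fn main() {
    assert_eq!(vote_on(Ok(Ok(()))), Vote::Yay);
    assert_eq!(vote_on(Ok(Err("task failed".to_string()))), Vote::Nay);
    assert_eq!(vote_on(Err("channel closed".to_string())), Vote::Nay);
}
```

Both sides of the or-pattern bind `e` to the same type, so the two error paths share one handler.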

Collaborator

It would mean that if the service on the other side is not responding, we will re-enqueue the task, right? Does that mean that if the other service is not responding, the task queue will keep growing?

Contributor Author

the task queue is now bounded by the concurrency config setting
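As a rough analogy for the bound (not the actual fendermint mechanism), a fixed-capacity channel gives the same property: once `concurrency` tasks are queued, new work is rejected instead of growing the backlog.

```rust
use std::sync::mpsc::sync_channel;

// Rough analogy, not fendermint's code: a fixed-capacity channel rejects
// new work once `concurrency` tasks are queued, so an unresponsive
// consumer cannot make the queue grow without bound.
fn main() {
    let concurrency = 2;
    let (tx, _rx) = sync_channel::<u64>(concurrency);

    assert!(tx.try_send(1).is_ok());
    assert!(tx.try_send(2).is_ok());
    // Queue is full: the enqueue fails instead of growing the backlog.
    assert!(tx.try_send(3).is_err());
}
```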

Collaborator

Ah, okay. Sounds good.

Comment on lines +389 to +393
let local_resolving_blobs_count =
local_blobs_count.saturating_sub(local_finalized_blobs.len());
let added_blobs_fetch_count = chain_env
.blob_concurrency
.saturating_sub(local_resolving_blobs_count as u32);
Contributor Author

limit how many added blobs we move to pending
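The limit in the diff above reduces to a small calculation; here is a sketch with assumed names (`blob_concurrency`, `local_resolving`) mirroring the snippet:

```rust
// Sketch of the limit described above, with assumed names: only fetch as
// many "added" blobs as remain under the configured concurrency once
// currently-resolving blobs are counted. `saturating_sub` avoids
// underflow when the pool is already at or over capacity.
fn added_blobs_fetch_count(blob_concurrency: u32, local_resolving: u32) -> u32 {
    blob_concurrency.saturating_sub(local_resolving)
}

fn main() {
    // Capacity 8, 5 already resolving: fetch at most 3 more.
    assert_eq!(added_blobs_fetch_count(8, 5), 3);
    // Pool already full (or over): fetch none rather than underflowing.
    assert_eq!(added_blobs_fetch_count(4, 6), 0);
}
```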

Contributor Author

(all changes are mirrored for read requests)

Comment on lines +306 to +307
// If the local blobs pool is empty and there are pending blobs on-chain,
// we may have restarted the validator. We can hydrate the pool here.
Contributor Author

hydrate from the prepare step if the local pool is empty (maybe from a restart)
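In sketch form, with assumed types (the real code reads pending blobs from chain state, not a `Vec<u64>`):

```rust
// Illustrative sketch with assumed types: if the local pool is empty but
// the chain still reports pending blobs (e.g. the validator restarted and
// lost its in-memory pool), repopulate the pool from chain state so the
// validator can keep voting on those blobs.
fn hydrate_pool(local_pool: &mut Vec<u64>, chain_pending: &[u64]) {
    if local_pool.is_empty() && !chain_pending.is_empty() {
        local_pool.extend_from_slice(chain_pending);
    }
}

fn main() {
    let mut pool: Vec<u64> = Vec::new();
    hydrate_pool(&mut pool, &[7, 8, 9]);
    assert_eq!(pool, vec![7, 8, 9]);

    // A non-empty pool is left untouched.
    hydrate_pool(&mut pool, &[1]);
    assert_eq!(pool, vec![7, 8, 9]);
}
```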

Comment on lines +417 to +424
resolving = local_resolving_blobs_count,
finalized = local_finalized_blobs.len(),
"blob pool counts"
);
tracing::debug!(
added = added_blobs_fetched_count,
pending = pending_blobs_fetched_count,
"blob fetched counts"
Contributor Author

clean up logging

Comment on lines +653 to +654
// Reject the proposal if the current processor is not keeping up with blob
// resolving.
Contributor Author

give slow processors the opportunity to reject

Collaborator

makes sense to do this here but does it mean that new block will be created more slowly if proposals are rejected?

Contributor Author

If the block doesn't have enough votes, yes, it would have to advance to another voting "round". If the rejector doesn't have much relative power, the block is still accepted.

@sanderpick sanderpick requested a review from avichalp March 30, 2025 19:12
let client = FendermintClient::new_http(tendermint_url, None)?;
let iroh_manager = IrohManager::from_addr(Some(iroh_addr));
let iroh_addr = iroh_addr
.to_socket_addrs()?
Collaborator

Makes sense. Why did we start using Iroh Manager earlier? Does it provide connection pooling?

Contributor Author

It provides lazy connection creation. It's something we wrote, and it's getting removed with the iroh upgrade work.

Ok((count, done))
}

/// Count all items and resolved and failed items.
Collaborator

Suggested change
/// Count all items and resolved and failed items.
/// Count all items including resolved and failed items.

);

// Once the read request is closed, we can clean up the votes
let mut request_id = read_request.id.as_bytes().to_vec();
Collaborator

👍

Collaborator

@avichalp left a comment

Looks good to me. Posted some questions for my understanding.

@sanderpick
Contributor Author

@avichalp doesn't look like I'll have time to make changes here today before we make a release for tomorrow. I'll do a follow-up to address the doc comment and shut down fendermint if the resolver service exits.

@sanderpick sanderpick merged commit 22f9d42 into main Mar 31, 2025
13 checks passed
@sanderpick sanderpick deleted the sander/improve-blob-resolve-pool branch March 31, 2025 17:37
@avichalp
Collaborator

Sounds good. Created a new issue here: #593 (comment)
