Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eng 766. List bottom up checkpoint status. #748

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Conversation

cryptoAtwill
Copy link
Contributor

@cryptoAtwill cryptoAtwill commented Feb 26, 2024

Closes ENG-766.

Adding a new CLI method that lists the checkpoint status in bottom up checkpoint submission.


This change is Reviewable

Copy link

linear bot commented Feb 26, 2024

// validate signatures and quorum threshold, revert if validation fails
validateActiveQuorumSignatures({signatories: signatories, hash: checkpointHash, signatures: signatures});
if (s.committedCheckpoints[checkpoint.blockHeight].blockHeight != 0) {
revert BottomUpCheckpointAlreadySubmitted();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason this wasn't an error condition is because we wanted to allow multiple relayers to submit the checkpoints and add them to pool of relayers to be rewarded. The difference was that the first relayer caused the checkpoint to be executed, while subsequent ones only went through validation of the payload, and being registered for rewards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a pro or contra argument against turning this into an error. Adding rewards will require migration, and undoing this change would require a migration too, but if we need migration anyway then there is little reason to keep accepting duplicate submissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, we dont have reward now, so I'm not sure if we still want to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I mean, when we'll have rewards, this will need a migration even if you leave it alone, so I don't have an opinion on error vs noop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noop for sure will not kill so many tests

@@ -1076,6 +1076,37 @@ impl BottomUpCheckpointRelayer for EthSubnetManager {

Ok(events)
}

async fn max_quorum_reached_height(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to explain what this method is trying to do.

Comment on lines 1093 to 1107
while l < r {
let mid = (l + r) / 2;

// locate the epoch where checkpoints should have been created
let mid = mid / period * period;

if self.quorum_reached_events(mid).await?.is_empty() {
r = mid - 1;
} else {
maybe_height = Some(mid);
l = mid + 1;
}
}

Ok(maybe_height)
Copy link
Contributor

@aakoshh aakoshh Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth factoring out the bisection algorithm and make some unit tests, because it's not a standard one and looks tricky. Not sure what are the invariants it's trying to adhere to.

I may be wrong but it looks to me like it could get into an infinite loop:

  1. Say our check period is 10, and we call this method with from = 53 and to = 62.
  2. mid = (53+62)/2 = 57
  3. mid = mid / 10 * 10 -> I'm not sure what this should do; mathematically it should be 57, but it might give 50 due to truncation. Let's say after this mid = 50; that's outside from and to. Is that okay for the method to return?
  4. Say quorum_reached_events(50) is not empty. maybe_height = 50, and from = 51.
  5. Goto 2. mid = (51+62)/2 = 56, and we'll enter another loop with mid = 50.
  6. Say quorum_reached_events(50) is empty. in this case to = 49 and we exit with an empty result. Maybe there would have been a result at 60?

I'm also not sure what why we're looking for quorum at fixed multiples of the period? Signatures are submitted asynchronously, they can gather at any height. I'm not even sure why we'd be looking for checkpoints added to the ledger at fixed multiples of the period, since batching could trigger a checkpoint at any height.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there is a max-pending flag now, I dont think this would be needed anymore. Might be difficult to add tests for this one too.

@cryptoAtwill cryptoAtwill requested a review from aakoshh February 28, 2024 08:00
@raulk
Copy link
Contributor

raulk commented Mar 8, 2024

@cryptoAtwill please add a description and title here

@cryptoAtwill cryptoAtwill changed the title Eng 766 Eng 766. List bottom up checkpoint status. Mar 12, 2024
@jsoares
Copy link
Contributor

jsoares commented Mar 13, 2024

This looks close enough to the finish line and it's just non-critical cli code. @aakoshh's comments are all on outdated code, though still open. @aakoshh can we get a re-review to unblock?

Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but we can make the output clearer, and our lives easier by using more specific names for variables.

ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
ipc/cli/src/commands/checkpoint/status.rs Outdated Show resolved Hide resolved
let ending = max_pending as ChainEpoch * period + start;
let mut checkpoints_ahead = 0;
for h in start..=ending {
let c = provider.get_bottom_up_bundle(&subnet, h).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cleaner to make this return an Option?

Copy link
Contributor Author

@cryptoAtwill cryptoAtwill Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is handled in previous PR: #743

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you query the whole bundle at every height to see if there is a potential checkpoint, right? Can't this be queried with the events API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is indeed a PR that I changed the query from height to height to a range query, but somehow that PR is not merged. I can recreate that on a follow up.

cryptoAtwill and others added 10 commits March 22, 2024 15:22
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
@cryptoAtwill cryptoAtwill requested a review from raulk March 22, 2024 08:09
@cryptoAtwill cryptoAtwill changed the base branch from ENG-763 to main March 22, 2024 08:10
Comment on lines +48 to +49
let start = last_checkpointed_height + 1;
let ending = limit_unsubmitted as ChainEpoch * period + start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to explain what's happening.

For example I don't know if last_checkpointed_height is a) height in the subnet of the last created checkpoint or b) the last submitted checkpoint in the parent.

pub subnet: String,
#[arg(
long,
help = "Limit unsubmitted checkpoints to print (looking forward from last submitted), default: 10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it doesn't necessarily limit the number of checkpoints to print, only the number of checkpoint periods to look ahead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find it confusing that the title and the docs say that it "lists checkpoint statuses", but I don't see any listing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically just display a bunch of metrics/fields related to bottom up checkpointing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

4 participants