-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eng 766. List bottom up checkpoint status. #748
base: main
Are you sure you want to change the base?
Conversation
// validate signatures and quorum threshold, revert if validation fails | ||
validateActiveQuorumSignatures({signatories: signatories, hash: checkpointHash, signatures: signatures}); | ||
if (s.committedCheckpoints[checkpoint.blockHeight].blockHeight != 0) { | ||
revert BottomUpCheckpointAlreadySubmitted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason this wasn't an error condition is because we wanted to allow multiple relayers to submit the checkpoints and add them to pool of relayers to be rewarded. The difference was that the first relayer caused the checkpoint to be executed, while subsequent ones only went through validation of the payload, and being registered for rewards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have a pro or contra argument against turning this into an error. Adding rewards will require migration, and undoing this change would require a migration too, but if we need migration anyway then there is little reason to keep accepting duplicate submissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, we dont have reward now, so I'm not sure if we still want to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I mean, when we'll have rewards, this will need a migration even if you leave it alone, so I don't have an opinion on error vs noop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noop for sure will not kill so many tests
@@ -1076,6 +1076,37 @@ impl BottomUpCheckpointRelayer for EthSubnetManager { | |||
|
|||
Ok(events) | |||
} | |||
|
|||
async fn max_quorum_reached_height( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to explain what this method is trying to do.
while l < r { | ||
let mid = (l + r) / 2; | ||
|
||
// locate the epoch where checkpoints should have been created | ||
let mid = mid / period * period; | ||
|
||
if self.quorum_reached_events(mid).await?.is_empty() { | ||
r = mid - 1; | ||
} else { | ||
maybe_height = Some(mid); | ||
l = mid + 1; | ||
} | ||
} | ||
|
||
Ok(maybe_height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth factoring out the bisection algorithm and make some unit tests, because it's not a standard one and looks tricky. Not sure what are the invariants it's trying to adhere to.
I may be wrong but it looks to me like it could get into an infinite loop:
- Say our check period is 10, and we call this method with
from = 53
andto = 62
. mid = (53+62)/2 = 57
mid = mid / 10 * 10
-> I'm not sure what this should do; mathematically it should be 57, but it might give 50 due to truncation. Let's say after thismid = 50
; that's outsidefrom
andto
. Is that okay for the method to return?- Say
quorum_reached_events(50)
is not empty.maybe_height = 50
, andfrom = 51
. - Goto 2.
mid = (51+62)/2 = 56
, and we'll enter another loop withmid = 50
. - Say
quorum_reached_events(50)
is empty. in this caseto = 49
and we exit with an empty result. Maybe there would have been a result at 60?
I'm also not sure what why we're looking for quorum at fixed multiples of the period? Signatures are submitted asynchronously, they can gather at any height. I'm not even sure why we'd be looking for checkpoints added to the ledger at fixed multiples of the period, since batching could trigger a checkpoint at any height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, there is a max-pending
flag now, I dont think this would be needed anymore. Might be difficult to add tests for this one too.
@cryptoAtwill please add a description and title here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but we can make the output clearer, and our lives easier by using more specific names for variables.
let ending = max_pending as ChainEpoch * period + start; | ||
let mut checkpoints_ahead = 0; | ||
for h in start..=ending { | ||
let c = provider.get_bottom_up_bundle(&subnet, h).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be cleaner to make this return an Option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is handled in previous PR: #743
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you query the whole bundle at every height to see if there is a potential checkpoint, right? Can't this be queried with the events API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is indeed a PR that I changed the query from height to height
to a range query, but somehow that PR is not merged. I can recreate that on a follow up.
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
Co-authored-by: raulk <raul@protocol.ai>
let start = last_checkpointed_height + 1; | ||
let ending = limit_unsubmitted as ChainEpoch * period + start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to explain what's happening.
For example I don't know if last_checkpointed_height
is a) height in the subnet of the last created checkpoint or b) the last submitted checkpoint in the parent.
pub subnet: String, | ||
#[arg( | ||
long, | ||
help = "Limit unsubmitted checkpoints to print (looking forward from last submitted), default: 10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it doesn't necessarily limit the number of checkpoints to print, only the number of checkpoint periods to look ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find it confusing that the title and the docs say that it "lists checkpoint statuses", but I don't see any listing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically just display a bunch of metrics/fields related to bottom up checkpointing.
Closes ENG-766.
Adding a new CLI method that lists the checkpoint status in bottom up checkpoint submission.
This change is