[Compatibility] Introduce monitor_synced script #10547
```python
current_time = datetime.now()
while current_epoch < end_epoch:
    # check that we are making progress
    time.sleep(10)
```
Should this be `CHECKPOINT_STUCK_THRESHOLD_SEC`? The constant is defined but never used in the code logic.
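One way to make `CHECKPOINT_STUCK_THRESHOLD_SEC` actually drive the logic: a minimal sketch of the polling loop in which checkpoint progress and epoch progress each have their own stall threshold, and the run fails as soon as either one is exceeded. This is an illustration, not the script's code. `get_local_epoch` and `get_highest_executed_checkpoint` are hypothetical callables standing in for however the script queries the fullnode, `StuckError` is a hypothetical exception, and the checkpoint threshold value is a placeholder (only `EPOCH_STUCK_THRESHOLD_SEC = 2 * 60` appears in this thread). The clock and sleep functions are injected so the loop can be exercised without a running node.

```python
import time
from typing import Callable

# Placeholder thresholds: only EPOCH_STUCK_THRESHOLD_SEC = 2 * 60 is quoted in the review.
CHECKPOINT_STUCK_THRESHOLD_SEC = 60
EPOCH_STUCK_THRESHOLD_SEC = 2 * 60
POLL_INTERVAL_SEC = 10  # matches the time.sleep(10) in the diff


class StuckError(RuntimeError):
    """Hypothetical error raised when the fullnode stops making progress."""


def monitor_until_epoch(
    end_epoch: int,
    get_local_epoch: Callable[[], int],                  # hypothetical fetcher
    get_highest_executed_checkpoint: Callable[[], int],  # hypothetical fetcher
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = POLL_INTERVAL_SEC,
    checkpoint_threshold: float = CHECKPOINT_STUCK_THRESHOLD_SEC,
    epoch_threshold: float = EPOCH_STUCK_THRESHOLD_SEC,
) -> int:
    """Poll until the local epoch reaches end_epoch and return the last checkpoint seen."""
    current_epoch = get_local_epoch()
    current_checkpoint = get_highest_executed_checkpoint()
    epoch_changed_at = checkpoint_changed_at = now()

    while current_epoch < end_epoch:
        sleep(poll_interval)
        t = now()

        # Checkpoint progress: reset the timer on any advance, else check the stall budget.
        checkpoint = get_highest_executed_checkpoint()
        if checkpoint > current_checkpoint:
            current_checkpoint, checkpoint_changed_at = checkpoint, t
        elif t - checkpoint_changed_at > checkpoint_threshold:
            raise StuckError(
                f"checkpoint stuck at {current_checkpoint} for more than {checkpoint_threshold}s"
            )

        # Epoch progress gets its own, independent stall budget.
        epoch = get_local_epoch()
        if epoch > current_epoch:
            current_epoch, epoch_changed_at = epoch, t
        elif t - epoch_changed_at > epoch_threshold:
            raise StuckError(
                f"epoch stuck at {current_epoch} for more than {epoch_threshold}s"
            )

    return current_checkpoint
```

The sketch measures elapsed time with `time.monotonic()` rather than `datetime.now()`, so a wall-clock adjustment on the host cannot fake or hide a stall; that is a swapped-in choice, not necessarily what the script does.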
```python
    print(f'Current local epoch: {current_epoch}')
    current_time = datetime.now()
else:
    # check if we have been stuck for more than 5 minutes
```
The comment says 5 minutes, but `EPOCH_STUCK_THRESHOLD_SEC = 2 * 60`.
Accepting to unblock so we can iterate, but it would be great to get many of the nits addressed, and to turn parameters such as the port into command-line arguments.
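A sketch of what moving parameters onto the command line could look like with `argparse`. `--end-epoch` and `--env` match the test-plan invocation; `--port`, `--poll-interval`, their defaults, and the `fetch_network_epoch` callable (used to target the network's current epoch when no end epoch is given) are hypothetical illustrations, not flags or helpers the script is known to have.

```python
import argparse
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Monitor a fullnode's local sync progress toward a target epoch."
    )
    # --end-epoch and --env match the invocation shown in the test plan.
    parser.add_argument("--end-epoch", type=int, default=None,
                        help="epoch to sync to; defaults to the network's current epoch")
    parser.add_argument("--env", default="testnet",
                        help="network whose current epoch is used when --end-epoch is omitted")
    # Hypothetical flags illustrating the review suggestion; defaults are placeholders.
    parser.add_argument("--port", type=int, default=9000,
                        help="local fullnode port to poll (placeholder default)")
    parser.add_argument("--poll-interval", type=float, default=10.0,
                        help="seconds between progress checks")
    return parser.parse_args(argv)


def resolve_end_epoch(args: argparse.Namespace,
                      fetch_network_epoch: Callable[[str], int]) -> int:
    """Use --end-epoch if given, else the current epoch of --env (fetcher is hypothetical)."""
    if args.end_epoch is not None:
        return args.end_epoch
    return fetch_network_epoch(args.env)
```

With this shape, the test-plan command line parses unchanged, and omitting `--end-epoch` falls back to `fetch_network_epoch(args.env)`.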
## Description

Introduces a script that can be run on a fullnode running from an arbitrary checkpoint/epoch (e.g. genesis, or from a snapshot).

Given an end epoch, the script monitors the local syncing progress of the fullnode towards that epoch and exits successfully when it is reached. If it gets stuck and stops making progress, either by checkpoint or by epoch, it fails.

If no end epoch is provided, the script retrieves the current epoch of the provided network and targets that epoch.

The configurable end epoch allows us to use this for checking syncing over a large period in a sharded manner across many machines, to monitor a single machine syncing from genesis to the current network state, or anywhere in between.

## Test Plan

1. Run a fullnode locally.
2. Run the script and ensure that it correctly tracks the progress of the fullnode, i.e. it fails if stuck, updates if making progress, and succeeds once synced.

```
williamsmith in ~/github/sui on monitor-synced-script
λ scripts/monitor_synced.py --end-epoch=2 --env=testnet
Will attempt to sync to epoch 2
Current local epoch: 0
Locally highest executed checkpoint: 0
New highest executed checkpoint: 3169
New highest executed checkpoint: 7397
New local epoch: 1
New highest executed checkpoint: 12001
New highest executed checkpoint: 15575
New highest executed checkpoint: 20227
New highest executed checkpoint: 24864
New highest executed checkpoint: 29523
New highest executed checkpoint: 34325
New highest executed checkpoint: 38285
New highest executed checkpoint: 42996
New highest executed checkpoint: 47683
New highest executed checkpoint: 52405
New highest executed checkpoint: 57066
New highest executed checkpoint: 61712
New highest executed checkpoint: 66268
New highest executed checkpoint: 70302
New highest executed checkpoint: 74441
New local epoch: 2
New highest executed checkpoint: 79057
-------------------------------
Successfully synced to epoch 2 from epoch 0 (79057 checkpoints) in 3.01 minutes
```

---

If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

### Type of Change (Check all that apply)

- [ ] user-visible impact
- [ ] breaking change for a client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes