[Compatibility] Introduce monitor_synced script #10547

Merged
merged 1 commit into from
Apr 8, 2023

Conversation

@williampsmith williampsmith commented Apr 7, 2023

Description

Introduces a script that can be run on a fullnode running from an arbitrary checkpoint/epoch (e.g. genesis, or from a snapshot).

Given an end epoch, the script monitors the fullnode's local sync progress toward that epoch and exits successfully once it is reached. If the node stops making progress, by either checkpoint or epoch, the script fails.

If no end epoch is provided, the script retrieves the current epoch of the specified network and targets that epoch.

The configurable end epoch allows us to use this for checking syncing over a large period in a sharded manner across many machines, or to monitor a single machine syncing from genesis to the current network state, or anywhere in between.
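
The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `monitor_sync`, `SyncStuckError`, the injected `get_epoch` / `get_checkpoint` callables, and the value of `CHECKPOINT_STUCK_THRESHOLD_SEC` are hypothetical. How the real script queries the fullnode is not shown here.

```python
import time
from typing import Callable

# EPOCH_STUCK_THRESHOLD_SEC matches the value quoted in review;
# the checkpoint threshold value is an assumption for illustration.
CHECKPOINT_STUCK_THRESHOLD_SEC = 10
EPOCH_STUCK_THRESHOLD_SEC = 2 * 60


class SyncStuckError(RuntimeError):
    """Raised when the checkpoint or the epoch stops advancing (hypothetical)."""


def monitor_sync(
    get_epoch: Callable[[], int],
    get_checkpoint: Callable[[], int],
    end_epoch: int,
    poll_interval_sec: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until get_epoch() reaches end_epoch; return checkpoints gained."""
    epoch = get_epoch()
    checkpoint = start_checkpoint = get_checkpoint()
    last_epoch_change = last_checkpoint_change = now()

    while epoch < end_epoch:
        sleep(poll_interval_sec)
        t = now()

        # Checkpoint progress resets the checkpoint timer; otherwise check it.
        new_checkpoint = get_checkpoint()
        if new_checkpoint > checkpoint:
            checkpoint, last_checkpoint_change = new_checkpoint, t
        elif t - last_checkpoint_change > CHECKPOINT_STUCK_THRESHOLD_SEC:
            raise SyncStuckError(f"checkpoint stuck at {checkpoint}")

        # Epoch progress is tracked the same way with its own threshold.
        new_epoch = get_epoch()
        if new_epoch > epoch:
            epoch, last_epoch_change = new_epoch, t
        elif t - last_epoch_change > EPOCH_STUCK_THRESHOLD_SEC:
            raise SyncStuckError(f"epoch stuck at {epoch}")

    return checkpoint - start_checkpoint
```

Passing the clock and sleep functions in as parameters is a design choice made only so the loop can be exercised without waiting in real time.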

Test Plan

  1. Run fullnode locally
  2. Run script and ensure that it correctly tracks the progress of the fullnode, i.e. fails if stuck, updates if making progress, succeeds once synced
williamsmith in ~/github/sui on monitor-synced-script λ scripts/monitor_synced.py --end-epoch=2 --env=testnet
Will attempt to sync to epoch 2
Current local epoch: 0
Locally highest executed checkpoint: 0
New highest executed checkpoint: 3169
New highest executed checkpoint: 7397
New local epoch: 1
New highest executed checkpoint: 12001
New highest executed checkpoint: 15575
New highest executed checkpoint: 20227
New highest executed checkpoint: 24864
New highest executed checkpoint: 29523
New highest executed checkpoint: 34325
New highest executed checkpoint: 38285
New highest executed checkpoint: 42996
New highest executed checkpoint: 47683
New highest executed checkpoint: 52405
New highest executed checkpoint: 57066
New highest executed checkpoint: 61712
New highest executed checkpoint: 66268
New highest executed checkpoint: 70302
New highest executed checkpoint: 74441
New local epoch: 2
New highest executed checkpoint: 79057
-------------------------------
Successfully synced to epoch 2 from epoch 0 (79057 checkpoints) in 3.01 minutes

If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

Type of Change (Check all that apply)

  • user-visible impact
  • breaking change for client SDKs
  • breaking change for FNs (FN binary must upgrade)
  • breaking change for validators or node operators (must upgrade binaries)
  • breaking change for on-chain data layout
  • necessitate either a data wipe or data migration

Release notes


@williampsmith williampsmith force-pushed the monitor-synced-script branch from 9d9799a to f96614c Compare April 7, 2023 20:03
@williampsmith williampsmith removed the request for review from dmitri-perelman April 7, 2023 20:12
current_time = datetime.now()
while current_epoch < end_epoch:
    # check that we are making progress
    time.sleep(10)

Should this be CHECKPOINT_STUCK_THRESHOLD_SEC ?
The const is defined but never used in code logic.

    print(f'Current local epoch: {current_epoch}')
    current_time = datetime.now()
else:
    # check if we have been stuck for more than 5 minutes

But EPOCH_STUCK_THRESHOLD_SEC = 2 * 60, which is 2 minutes, not 5.
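
One way to keep a comment or message from drifting away from the constant it describes is to derive both the check and the text from that constant. The sketch below is an assumption-laden illustration, not the script's actual code: `is_stuck` and `stuck_message` are hypothetical helper names, and only `EPOCH_STUCK_THRESHOLD_SEC = 2 * 60` comes from this review thread.

```python
from datetime import datetime, timedelta

EPOCH_STUCK_THRESHOLD_SEC = 2 * 60  # value quoted in the review comment


def is_stuck(last_progress: datetime, now: datetime, threshold_sec: int) -> bool:
    """Return True if more than threshold_sec elapsed since last progress."""
    return now - last_progress > timedelta(seconds=threshold_sec)


def stuck_message(kind: str, threshold_sec: int) -> str:
    """Build the message from the threshold so the two cannot disagree."""
    return f"{kind} stuck for more than {threshold_sec / 60:g} minutes"


if __name__ == "__main__":
    start = datetime(2023, 4, 7, 12, 0, 0)
    later = start + timedelta(minutes=3)
    if is_stuck(start, later, EPOCH_STUCK_THRESHOLD_SEC):
        print(stuck_message("epoch", EPOCH_STUCK_THRESHOLD_SEC))
```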

@oxade oxade left a comment

Accepting to unblock so we can iterate, but it would be great to make sure the nits are addressed, and to turn params like the port into command-line args.
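
The request to expose params such as the port as command-line args could look something like the sketch below. The `--end-epoch` and `--env` flags appear in the test plan invocation; the `--port` flag, its default of 9000, and the `build_parser` helper are assumptions for illustration and may not match what the script ends up using.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Monitor a fullnode's sync progress toward a target epoch."
    )
    # Flags seen in the test plan invocation.
    parser.add_argument("--end-epoch", type=int, default=None,
                        help="epoch to sync to; defaults to the network's current epoch")
    parser.add_argument("--env", default="testnet",
                        help="network to compare against")
    # Hypothetical flag: the name and default value are assumptions.
    parser.add_argument("--port", type=int, default=9000,
                        help="local fullnode RPC port")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.end_epoch, args.env, args.port)
```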

@williampsmith williampsmith force-pushed the monitor-synced-script branch from f96614c to e163fdb Compare April 8, 2023 00:21
@williampsmith williampsmith merged commit 70c4611 into main Apr 8, 2023
@williampsmith williampsmith deleted the monitor-synced-script branch April 8, 2023 00:29
sblackshear pushed a commit to sblackshear/sui that referenced this pull request Apr 10, 2023