
sui_v1.26.0_1715423643_ci

@arun-koshy tagged this on 11 May 07:26
Issue found via a failing simtest:

```
RUST_LOG=info,consensus=debug \
MSIM_TEST_SEED=161715360837 \
MSIM_TEST_NUM=1 \
MSIM_WATCHDOG_TIMEOUT_MS=60000 \
SIM_STRESS_TEST_DURATION_SECS=300 \
scripts/simtest/cargo-simtest simtest \
    --color always \
    --package sui-benchmark \
    --test-threads 1 \
    --profile simtestnightly \
    test_simulated_load_with_reconfig_and_correlated_crashes
```

Added logging to confirm:
```
2022-03-02T00:44:12.689178Z  INFO node{id=5 name="k#99f25ef6.."}: consensus_core::leader_schedule: consensus/core/src/leader_schedule.rs:75: Recovering LeaderSchedule from store using LeaderSwapTable for CommitRange(21..31), good_nodes:[] with stake:0, bad_nodes:Map { iter: [] } with stake:0
2022-03-02T00:44:12.689178Z  INFO node{id=5 name="k#99f25ef6.."}: consensus_core::leader_schedule: consensus/core/src/leader_schedule.rs:80: There are 10 pending subdags to be scored in DagState.
...
2022-03-02T00:44:46.070364Z DEBUG node{id=5 name="k#99f25ef6.."}: consensus_core::core: consensus/core/src/core.rs:488: Sequenced 136 leaders and 0 commits can be made before next leader schedule change
```

Issue:
If a node crashes exactly on a LeaderSchedule commit boundary, it does
not update the schedule when it recovers. The node then gets stuck and
stops issuing commits, because no more commits can be made under the
current schedule.
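
To make the failure mode concrete, here is a minimal Rust sketch of a simplified model. It assumes a schedule that must be swapped every `window` commits and that is only swapped as a side effect of a commit landing on the boundary. `LeaderSchedule`, `BuggyCore`, `commits_until_update` and `commits_after_boundary_crash` are hypothetical names for illustration, not the actual `consensus_core` API, and the numbers only loosely mirror the log above.

```rust
/// Hypothetical, simplified leader schedule (not the real consensus_core type).
#[derive(Debug, Clone)]
struct LeaderSchedule {
    /// Commit index at which this schedule stops being valid.
    end: u64,
    /// Number of commits covered by each schedule.
    window: u64,
}

impl LeaderSchedule {
    /// How many more commits the current schedule allows.
    fn commits_until_update(&self, last_commit: u64) -> u64 {
        self.end.saturating_sub(last_commit)
    }

    /// Install the next schedule, starting right after `last_commit`.
    fn update(&mut self, last_commit: u64) {
        self.end = last_commit + self.window;
    }
}

/// Hypothetical core with the bug: the schedule is only refreshed after a commit.
struct BuggyCore {
    last_commit: u64,
    schedule: LeaderSchedule,
}

impl BuggyCore {
    /// Commit up to `pending` sequenced leaders; returns how many were committed.
    fn try_commit(&mut self, pending: u64) -> u64 {
        let allowed = self.schedule.commits_until_update(self.last_commit);
        let n = pending.min(allowed);
        self.last_commit += n;
        // Bug: the schedule is swapped only when a commit in *this* call lands
        // on the boundary. If the node crashed after that commit but before the
        // swap, nothing in recovery triggers it again.
        if n > 0 && self.schedule.commits_until_update(self.last_commit) == 0 {
            self.schedule.update(self.last_commit);
        }
        n
    }
}

/// Simulate a restart where the store already has last_commit == schedule end.
fn commits_after_boundary_crash() -> u64 {
    let mut core = BuggyCore {
        last_commit: 30,
        schedule: LeaderSchedule { end: 30, window: 10 },
    };
    // Many leaders are sequenced, but repeated attempts commit nothing.
    let first = core.try_commit(136);
    let second = core.try_commit(136);
    first + second
}

fn main() {
    println!("commits after recovery: {}", commits_after_boundary_crash());
}
```

In this model the node is permanently stuck at zero allowed commits, which is the same symptom as the "0 commits can be made before next leader schedule change" log line.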

Resolution:
Check whether the leader schedule needs to be updated before attempting
to commit in core. Because this check runs on every commit attempt, it
also covers the recovery case.
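
A minimal sketch of the resolution in the same hypothetical model: the boundary check moves to the top of the commit loop, so it runs on every attempt, including the first one after a restart. The names (`Core`, `schedule_updates`, `commits_after_boundary_crash`) are illustrative assumptions; this is not the code from the actual change.

```rust
/// Hypothetical, simplified leader schedule (not the real consensus_core type).
#[derive(Debug, Clone)]
struct LeaderSchedule {
    /// Commit index at which this schedule stops being valid.
    end: u64,
    /// Number of commits covered by each schedule.
    window: u64,
}

impl LeaderSchedule {
    fn commits_until_update(&self, last_commit: u64) -> u64 {
        self.end.saturating_sub(last_commit)
    }

    fn update(&mut self, last_commit: u64) {
        self.end = last_commit + self.window;
    }
}

struct Core {
    last_commit: u64,
    schedule: LeaderSchedule,
    /// Number of schedule swaps performed (tracked for illustration only).
    schedule_updates: u32,
}

impl Core {
    /// Commit up to `pending` sequenced leaders; returns how many were committed.
    fn try_commit(&mut self, mut pending: u64) -> u64 {
        let mut committed = 0;
        while pending > 0 {
            // Fix: decide whether the schedule must change *before* committing.
            // After a restart on a boundary this fires on the first iteration.
            if self.schedule.commits_until_update(self.last_commit) == 0 {
                self.schedule.update(self.last_commit);
                self.schedule_updates += 1;
            }
            let n = pending.min(self.schedule.commits_until_update(self.last_commit));
            if n == 0 {
                break; // only reachable with a zero-sized window
            }
            self.last_commit += n;
            committed += n;
            pending -= n;
        }
        committed
    }
}

/// Same crashed-on-boundary state as before: last_commit == schedule end.
fn commits_after_boundary_crash() -> (u64, u32) {
    let mut core = Core {
        last_commit: 30,
        schedule: LeaderSchedule { end: 30, window: 10 },
        schedule_updates: 0,
    };
    let committed = core.try_commit(136);
    (committed, core.schedule_updates)
}

fn main() {
    let (committed, updates) = commits_after_boundary_crash();
    println!("committed {committed} leaders across {updates} schedule updates");
}
```

Starting from the state that previously deadlocked, all 136 pending leaders are committed and the schedule is swapped 14 times (once before each 10-commit window, the first swap happening immediately on recovery).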

---------

Co-authored-by: Mingwei Tian <mingwei@mystenlabs.com>