sui_v1.14.0_1698863746_ci

@mwtian mwtian tagged this 01 Nov 18:00
## Description 

Currently, certificates fetched during catchup are processed and
accepted one by one, the same way as in normal certificate handling. This
turns out to be a throughput bottleneck for catching up. This PR includes
one major change and a few minor changes to improve catchup throughput.

Main change: take the lock on `state` once per batch of certificates in
`process_certificate_with_lock()`, instead of once per certificate.
`tokio::sync::Mutex` appears to be very slow as the lock on `state`,
taking 5 to 10 s for 2000 lock operations.
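
To illustrate the difference between the two locking patterns, here is a minimal sketch. It is not the code in this PR: `State`, `accept_one_by_one`, and `accept_batch` are hypothetical names, certificates are reduced to `u64` ids, and `std::sync::Mutex` stands in for `tokio::sync::Mutex` so the example has no dependencies.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the shared `state` guarded by the lock.
#[derive(Default)]
struct State {
    accepted: Vec<u64>,
}

/// Old pattern: acquire and release the lock once per certificate.
/// Returns the number of lock acquisitions performed.
fn accept_one_by_one(state: &Mutex<State>, certs: &[u64]) -> usize {
    let mut acquisitions = 0;
    for &cert in certs {
        // The guard is dropped at the end of each iteration.
        let mut guard = state.lock().unwrap();
        acquisitions += 1;
        guard.accepted.push(cert);
    }
    acquisitions
}

/// New pattern: acquire the lock once and accept the whole batch under it.
/// Returns the number of lock acquisitions performed (always 1).
fn accept_batch(state: &Mutex<State>, certs: &[u64]) -> usize {
    let mut guard = state.lock().unwrap();
    guard.accepted.extend_from_slice(certs);
    1
}
```

For a batch of 2000 certificates, `accept_one_by_one` performs 2000 lock operations and `accept_batch` performs one, while both leave `State` with the same accepted certificates in the same order. The trade-off is that the batch version holds the lock for longer at a time.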

Other changes:
- Increase channel size in Narwhal across the board from 1k to 10k, to
avoid filling the channels too often. I did not see a noticeable
increase in memory usage.
- Avoid using a blocking thread for verifying user signatures in
transactions, which should be relatively fast.
- Cleanups.
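
As a rough illustration of why the channel size matters, the sketch below counts how many messages a bounded channel accepts before it reports being full when the consumer is not draining it. This is an assumption-laden toy, not Narwhal code: `accepted_before_full` is a hypothetical helper, and `std::sync::mpsc::sync_channel` stands in for whatever channel type Narwhal uses.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to send `n` messages into a bounded channel of size `capacity`
/// while the receiver is idle, and returns how many were accepted before
/// the channel was full.
fn accepted_before_full(capacity: usize, n: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected, but never read.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // A full channel is where a real sender would start waiting.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With a burst of 5000 messages, `accepted_before_full(1_000, 5_000)` stops after 1000 sends, whereas `accepted_before_full(10_000, 5_000)` absorbs the whole burst. The larger buffer costs memory proportional to the number of queued messages.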

## Test Plan 

### Private testnet

In catchup experiments with 5000 TPS and 150 validators, this seems to
improve catchup speed from ~2/round to ~8/round.

Before:
Catching up after 1 hr of downtime never finished within the epoch:


![image](https://github.com/MystenLabs/sui/assets/81660174/bcdbd727-12f7-46dc-944a-b13f68cfbf10)


![image](https://github.com/MystenLabs/sui/assets/81660174/1933e6a4-e51b-4fcf-8df6-506cba67d6cd)

After:
Catching up after 1 hr of downtime took ~20 min:


![image](https://github.com/MystenLabs/sui/assets/81660174/7dd85846-5074-49a3-b85e-fa6213aebd30)


![image](https://github.com/MystenLabs/sui/assets/81660174/89725256-fb8e-4817-aed5-609d241c57d0)

---
If your changes are not user-facing and not a breaking change, you can
skip the following section. Otherwise, please indicate what changed, and
then add to the Release Notes section as highlighted during the release
process.

### Type of Change (Check all that apply)

- [ ] protocol change
- [ ] user-visible impact
- [ ] breaking change for client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade
binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes