After restarting a CKB mainnet node, the node synced to height 13575428 and then stopped progressing: `ckb_chain_tip` did not increase for nearly 2 hours.
A Grafana screenshot showed `ckb_chain_tip` stalled near 13575428 for the same period.
The relevant entries I could find in `run.log` were:

```
2026-03-11 04:23:12.302 +00:00 init_load_unverified_blocks INFO ckb_chain::init_load_unverified finding unverified blocks, current tip: 13575428-Byte32(0xfa633bf9d6f53150f2269d00d479a617b0ce619f265b99bbc374392ef59720af)
2026-03-11 04:23:50.270 +00:00 init_load_unverified_blocks INFO ckb_chain::init_load_unverified no unverified blocks found after tip, current tip: 13575428-Byte32(0xfc2ed4bf37eea1aed47879aca7a31fc881f108fa90e218ef990e0fee1e438c52)
```
From the symptom, it seems possible that the node was stalled in headers sync, or around a state transition after headers sync. However, this is difficult to confirm from the default `run.log`.
I checked the related code path and found that headers sync is not entirely unlogged, but most of the useful messages are currently at `debug!`/`trace!` level. With the default `logger.filter = "info"`, it is hard to tell:
- whether headers sync actually started for a peer
- whether the peer had already reached the end of headers sync
- whether the node entered retry / timeout / suspend logic
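As a stopgap, these details can sometimes be surfaced today by raising the log level for the sync module only. A sketch of the `ckb.toml` change, assuming the env_logger-style filter syntax and that `ckb-sync` is the relevant target name:

```toml
# ckb.toml -- keep the global level at info, but enable debug
# output for the sync crate only (target name is an assumption)
[logger]
filter = "info,ckb-sync=debug"
```

This is noisier than the targeted `info!` logs proposed below, so it only helps when the problem can be reproduced.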
To keep log volume low, it should be enough to add a few `info!` logs only at key state transitions:
- when headers sync starts for a peer
  - for example: peer, `ibd`, local tip, shared best / better tip
- when `HeadersProcess` reaches a terminal branch
  - only for `headers.is_empty()` or `headers.len() != MAX_HEADERS_LEN`
  - this would help confirm whether the peer has no more headers to provide, and whether it is marked as `tip_synced`
- when the headers sync controller enters retry / timeout / suspend
  - this would help distinguish "stalled while syncing headers" from "stalled after headers sync already ended"
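To make the proposal concrete, here is a rough sketch of what the three `info!` messages could look like. This is not actual CKB code: the function names, parameters, and message wording are invented for illustration, and a local `info!` macro stands in for `ckb_logger::info!` so the snippet compiles on its own. Only `MAX_HEADERS_LEN` and the three state transitions come from the proposal above.

```rust
// Illustrative sketch only: function names and message wording are
// invented; the three transitions mirror the bullet list above.

// Stand-in for `ckb_logger::info!` so the sketch is self-contained.
macro_rules! info {
    ($($arg:tt)*) => { println!($($arg)*) };
}

const MAX_HEADERS_LEN: usize = 2000;

// (1) headers sync starts for a peer
fn log_headers_sync_start(peer: usize, ibd: bool, local_tip: u64, best_known: u64) -> String {
    let msg = format!(
        "start headers sync: peer={}, ibd={}, local_tip={}, best_known={}",
        peer, ibd, local_tip, best_known
    );
    info!("{}", msg);
    msg
}

// (2) `HeadersProcess` reaches a terminal branch: the peer sent fewer
// than MAX_HEADERS_LEN headers, so it has no more headers to provide.
fn log_headers_terminal(peer: usize, headers_len: usize) -> Option<String> {
    if headers_len == 0 || headers_len != MAX_HEADERS_LEN {
        let msg = format!(
            "peer={} sent {} headers (!= MAX_HEADERS_LEN={}), marking tip_synced",
            peer, headers_len, MAX_HEADERS_LEN
        );
        info!("{}", msg);
        Some(msg)
    } else {
        None
    }
}

// (3) the headers sync controller enters retry / timeout / suspend
fn log_headers_sync_state(peer: usize, state: &str) -> String {
    let msg = format!("headers sync controller: peer={} entered {}", peer, state);
    info!("{}", msg);
    msg
}

fn main() {
    log_headers_sync_start(7, true, 13_575_428, 13_580_000);
    log_headers_terminal(7, 731);
    log_headers_sync_state(7, "retry");
}
```

Returning the formatted message (rather than only printing it) keeps the sketch testable; in the real code these would simply be `info!` calls at the corresponding branches.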
I think these few logs would already make this kind of issue much easier to diagnose under the default `info` log level, without adding too much log volume.