Add a few info-level logs for key headers sync state transitions #5143

@sunchengzhu

After restarting a CKB mainnet node, the node synced to height 13575428 and then stopped progressing. ckb_chain_tip did not increase for nearly 2 hours.

A Grafana screenshot showed ckb_chain_tip stalled near 13575428 for nearly 2 hours.

The relevant entries I could find in run.log were:

2026-03-11 04:23:12.302 +00:00 init_load_unverified_blocks INFO ckb_chain::init_load_unverified  finding unverified blocks, current tip: 13575428-Byte32(0xfa633bf9d6f53150f2269d00d479a617b0ce619f265b99bbc374392ef59720af)
2026-03-11 04:23:50.270 +00:00 init_load_unverified_blocks INFO ckb_chain::init_load_unverified  no unverified blocks found after tip, current tip: 13575428-Byte32(0xfc2ed4bf37eea1aed47879aca7a31fc881f108fa90e218ef990e0fee1e438c52)

From the symptom, it seems possible that the node was stalled in headers sync, or around a state transition after headers sync. However, this is difficult to confirm from the default run.log.

I checked the related code path and found that headers sync does emit logs, but most of the useful ones are currently at debug! or trace! level. With the default logger.filter = "info", it is hard to tell:

  • whether headers sync actually started for a peer
  • whether the peer had already reached the end of headers sync
  • whether the node entered retry / timeout / suspend logic

To keep log volume low, it should be enough to add a few info! logs only at key state transitions:

  • when headers sync starts for a peer
    • for example: peer, ibd, local tip, shared best / better tip
  • when HeadersProcess reaches a terminal branch
    • only for headers.is_empty() or headers.len() != MAX_HEADERS_LEN
    • this would help confirm whether the peer has no more headers to provide, and whether it is marked as tip_synced
  • when the headers sync controller enters retry / timeout / suspend
    • this would help distinguish "stalled while syncing headers" from "stalled after headers sync already ended"

I think these few logs would already make this kind of issue much easier to diagnose under the default info log level, without adding too much log volume.
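
To make the proposal concrete, here is a minimal sketch of the three logging points listed above. It is illustration only, not the actual CKB code: apart from MAX_HEADERS_LEN, every name (BatchKind, ControllerEvent, start_line, terminal_line, controller_line) is hypothetical, the value 2_000 is an assumption, and the message wording is only a suggestion. The functions return the message as a String so the sketch compiles with the standard library alone; in the node each string would presumably be passed to info! instead.

```rust
// Hypothetical sketch: apart from MAX_HEADERS_LEN, none of these names come
// from the CKB codebase. The value 2_000 is an assumption for illustration.
const MAX_HEADERS_LEN: usize = 2_000;

/// How a received headers batch relates to the terminal branches in the issue.
#[derive(Debug, PartialEq)]
enum BatchKind {
    Empty,   // headers.is_empty()
    Partial, // non-empty, but headers.len() != MAX_HEADERS_LEN
    Full,    // exactly MAX_HEADERS_LEN: more headers may follow, so no info log
}

fn classify_batch(len: usize) -> BatchKind {
    if len == 0 {
        BatchKind::Empty
    } else if len != MAX_HEADERS_LEN {
        BatchKind::Partial
    } else {
        BatchKind::Full
    }
}

/// Controller states that would each get one info line.
#[derive(Debug, PartialEq)]
enum ControllerEvent {
    Retry,
    Timeout,
    Suspend,
}

/// Message for "headers sync starts for a peer".
fn start_line(peer: u64, ibd: bool, local_tip: u64, shared_best: u64) -> String {
    format!(
        "headers sync started: peer={}, ibd={}, local_tip={}, shared_best={}",
        peer, ibd, local_tip, shared_best
    )
}

/// Message for a terminal batch. Returns None for a full batch, which keeps
/// log volume low during normal headers sync.
fn terminal_line(peer: u64, len: usize, tip_synced: bool) -> Option<String> {
    match classify_batch(len) {
        BatchKind::Full => None,
        kind => Some(format!(
            "headers sync terminal batch: peer={}, received={}, kind={:?}, tip_synced={}",
            peer, len, kind, tip_synced
        )),
    }
}

/// Message for a retry / timeout / suspend transition in the controller.
fn controller_line(peer: u64, event: ControllerEvent) -> String {
    format!("headers sync controller: peer={}, event={:?}", peer, event)
}
```

With this shape, a full batch of MAX_HEADERS_LEN headers produces no info line at all, while an empty or short batch produces exactly one line that includes tip_synced. That is the information needed to tell "stalled while syncing headers" apart from "stalled after headers sync already ended".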
