Description
The backward sync (BWS) is getting stuck in a loop with these recurring log messages:
{"@timestamp":"2024-03-05T13:57:59,199","level":"INFO","thread":"EthScheduler-Timer-0","class":"BackwardSyncContext","message":"Current backward sync session failed, it will be restarted","throwable":""}
{"@timestamp":"2024-03-05T13:58:01,114","level":"INFO","thread":"vert.x-worker-thread-0","class":"BackwardSyncContext","message":"Starting a new backward sync session","throwable":""}
Enough peers are present.
Restarting the node fixes the problem.
Reason
We receive an fcu containing the block hash of a head block. This hash is added to the hashesToAppend queue. The block then gets reorged, and when the BWS tries to retrieve it from our peers, none of them is able to provide it. This causes the BWS to fail. When we receive the next fcu, a new hash might be added to the queue, but the new BWS session that gets started tries to retrieve the same block that we already failed to retrieve.
This happened on 7 out of 8 nodes I started based on 24.2.0-RC4:
dev-elc-besu-teku-mainnet-dev-stefan-rc4-(1,2,4)
dev-elc-besu-teku-mainnet-dev-stefan-ss-(1,2,3,4)
The reorged block had the hash 0x4550b82492bf1738af79efb6140770c5443d368b9512ae8551583909554a040f.
Node rc4-1 has been restarted and finished syncing successfully.
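The failure mode described under Reason can be sketched with a small model. This is not Besu's code: the class `BwsLoopSketch`, its methods, and the short hashes are hypothetical, and it assumes each session always targets the oldest queued hash and leaves it queued on failure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Illustrative model only; none of these names are taken from Besu's code base.
final class BwsLoopSketch {
  // Stand-in for the hashesToAppend queue mentioned above.
  private final Deque<String> hashesToAppend = new ArrayDeque<>();
  // Hashes that at least one peer can serve; a reorged block is not in this set.
  private final Set<String> servedByPeers;

  BwsLoopSketch(Set<String> servedByPeers) {
    this.servedByPeers = servedByPeers;
  }

  // Each fcu appends its head hash to the tail of the queue.
  void onFcu(String headHash) {
    hashesToAppend.addLast(headHash);
  }

  // One session: try the oldest queued hash; on failure it stays queued (assumption).
  String runSession() {
    String target = hashesToAppend.peekFirst();
    if (target == null) {
      return "idle";
    }
    if (servedByPeers.contains(target)) {
      hashesToAppend.pollFirst();
      return "ok:" + target;
    }
    return "failed:" + target;
  }

  // Replays three fcus where the first head (0xaaa) has been reorged away.
  static List<String> simulate() {
    BwsLoopSketch sync = new BwsLoopSketch(Set.of("0xbbb", "0xccc"));
    List<String> outcomes = new ArrayList<>();
    for (String head : List.of("0xaaa", "0xbbb", "0xccc")) {
      sync.onFcu(head);
      outcomes.add(sync.runSession());
    }
    return outcomes;
  }

  public static void main(String[] args) {
    System.out.println(simulate());
  }
}
```

In this model every session fails on `0xaaa`, even though later fcus queue hashes that peers could serve, which mirrors the repeating "failed, it will be restarted" / "Starting a new backward sync session" log pair.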