Skip to content

Chain rewind and failure to sync with err=missing parent after non-graceful restart of nodes #1117

@vdamle

Description

@vdamle

System information

Geth version: Geth/v1.9.7-stable-c6ed3ed2(quorum-v20.10.0)

OS & Version: linux-amd64

Branch, Commit Hash or Release: quorum-v20.10.0

Expected behaviour

On a non-graceful shutdown/restart of a gcmode=full, syncmode=full node, it is expected that the local full block may rewind to a block in the past. However, the node must be able to sync to latest by fetching missing headers/blocks and rebuild any missing state.

Actual behaviour

The node must be able to fetch missing headers/blocks and rebuild state from the previous full block to catch up with the rest of the chain. Instead, we see that the node

WARN [01-19|13:46:48.982] Head state missing, repairing chain      number=2668720 hash=98b961…a8cf87
DEBUG[01-19|13:47:48.961] Ancient blocks frozen already            number=2668720 hash=98b961…a8cf87 frozen=2578720
DEBUG[01-19|13:48:44.456] Privacy metadata root                    hash=000000…000000
INFO [01-19|13:48:44.458] Rewound blockchain to past state         number=2457534 hash=e6aa7c…3019d5
DEBUG[01-19|13:48:44.462] Privacy metadata root                    hash=000000…000000
INFO [01-19|13:48:44.474] Loaded most recent local header          number=2668720 hash=98b961…a8cf87 td=2668721 age=2m13s
INFO [01-19|13:48:44.475] Loaded most recent local full block      number=2457534 hash=e6aa7c…3019d5 td=2457535 age=1w5d14h
INFO [01-19|13:48:44.475] Loaded most recent local fast block      number=2668720 hash=98b961…a8cf87 td=2668721 age=2m13s
DEBUG[01-19|13:48:48.985] Ancient blocks frozen already            number=2457534 hash=e6aa7c…3019d5 frozen=2578720
WARN [01-19|13:49:37.519] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=headers  items=2578720 limit=2457535
WARN [01-19|13:49:38.035] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=hashes   items=2578720 limit=2457535
WARN [01-19|13:49:38.349] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=bodies   items=2578720 limit=2457535
WARN [01-19|13:49:38.588] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=receipts items=2578720 limit=2457535
WARN [01-19|13:49:39.079] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=diffs    items=2578720 limit=2457535
INFO [01-19|13:49:39.497] Rewind ancient data                      number=2457534
WARN [01-19|13:49:39.497] Truncate ancient chain                   from=2668720 to=2457534

INFO [01-19|13:49:49.874] Block synchronisation started
DEBUG[01-19|13:49:49.874] Synchronising with the network           peer=167a43cb1b335504 eth=99 head=98b961…a8cf87 td=2668721 mode=full
INFO [01-19|13:49:49.874] Mining aborted due to sync
DEBUG[01-19|13:49:49.874] Retrieving remote chain height           peer=167a43cb1b335504
DEBUG[01-19|13:49:49.874] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=1 fromhash=98b961…a8cf87 skip=0 reverse=false
DEBUG[01-19|13:49:49.875] Remote head header identified            peer=167a43cb1b335504 number=2668720 hash=98b961…a8cf87
DEBUG[01-19|13:49:49.875] Looking for common ancestor              peer=167a43cb1b335504 local=2457534 remote=2668720
DEBUG[01-19|13:49:49.875] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=12 fromnum=2668543 skip=15 reverse=false
DEBUG[01-19|13:49:49.990] Found common ancestor                    peer=167a43cb1b335504 number=2668719 hash=29d96f…cc4d44
DEBUG[01-19|13:49:49.990] Downloading transaction receipts         origin=2668720
DEBUG[01-19|13:49:49.990] Directing header downloads               peer=167a43cb1b335504 origin=2668720
DEBUG[01-19|13:49:49.990] Downloading block bodies                 origin=2668720
DEBUG[01-19|13:49:49.990] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=128 fromnum=2668911 skip=191 reverse=false
DEBUG[01-19|13:49:49.991] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2668720 skip=0   reverse=false
DEBUG[01-19|13:49:49.991] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2668721 skip=0   reverse=false
DEBUG[01-19|13:49:49.991] Inserting downloaded chain               items=1       firstnum=2668720 firsthash=98b961…a8cf87 lastnum=2668720 lasthash=98b961…a8cf87
DEBUG[01-19|13:49:49.992] No more headers available                peer=167a43cb1b335504
DEBUG[01-19|13:49:49.992] Header download terminated               peer=167a43cb1b335504
DEBUG[01-19|13:49:49.992] Data fetching completed                  type=bodies
DEBUG[01-19|13:49:49.992] Block body download terminated           err=nil
DEBUG[01-19|13:49:49.992] Data fetching completed                  type=receipts
DEBUG[01-19|13:49:49.992] Transaction receipt download terminated  err=nil
DEBUG[01-19|13:49:49.997] Pruned ancestor, inserting as sidechain  number=2668720 hash=98b961…a8cf87

DEBUG[01-19|13:50:17.872] Downloaded item processing failed on sidechain import index=1 err="missing parent"
DEBUG[01-19|13:50:17.873] Reset ancient limit to zero
DEBUG[01-19|13:50:17.873] Synchronisation terminated               elapsed=27.998s
DEBUG[01-19|13:50:17.873] Reset ancient limit to zero

Another instance:

WARN [01-18|14:11:48.465] Head state missing, repairing chain      number=2658019 hash=a6a383…49463b
DEBUG[01-18|14:17:31.749] Privacy metadata root                    hash=000000…000000
INFO [01-18|14:17:31.751] Rewound blockchain to past state         number=1959935 hash=3fd03e…bfc4e2
DEBUG[01-18|14:17:31.762] Privacy metadata root                    hash=000000…000000

INFO [01-18|14:17:31.779] Loaded most recent local header          number=2658019 hash=a6a383…49463b td=2658020 age=5m49s
INFO [01-18|14:17:31.779] Loaded most recent local full block      number=1959935 hash=3fd03e…bfc4e2 td=1959936 age=1mo1w3d
INFO [01-18|14:17:31.779] Loaded most recent local fast block      number=2658019 hash=a6a383…49463b td=2658020 age=5m49s
DEBUG[01-18|14:17:48.485] Ancient blocks frozen already            number=1959935 hash=3fd03e…bfc4e2 frozen=2568019

WARN [01-18|14:19:40.982] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=diffs    items=2568019 limit=1959936
WARN [01-18|14:19:41.129] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=headers  items=2568019 limit=1959936
WARN [01-18|14:19:42.502] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=hashes   items=2568019 limit=1959936
WARN [01-18|14:19:42.670] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=bodies   items=2568019 limit=1959936
WARN [01-18|14:19:42.890] Truncating freezer table                 database=/qdata/ethereum/geth/chaindata/ancient table=receipts items=2568019 limit=1959936
INFO [01-18|14:19:43.215] Rewind ancient data                      number=1959935
WARN [01-18|14:19:43.215] Truncate ancient chain                   from=2658019 to=1959935

DEBUG[01-18|14:19:53.520] Synchronising with the network           peer=167a43cb1b335504 eth=99 head=0e17e2…2c9686 td=2658078 mode=full
INFO [01-18|14:19:53.520] Mining aborted due to sync
DEBUG[01-18|14:19:53.520] Retrieving remote chain height           peer=167a43cb1b335504
DEBUG[01-18|14:19:53.520] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=1 fromhash=0e17e2…2c9686 skip=0 reverse=false
DEBUG[01-18|14:19:53.520] Discarded propagated block, too far away peer=838f19a15fb6d8ef number=2658078 hash=77acc6…11dcf8 distance=698143
DEBUG[01-18|14:19:53.521] Remote head header identified            peer=167a43cb1b335504 number=2658077 hash=0e17e2…2c9686
DEBUG[01-18|14:19:53.521] Looking for common ancestor              peer=167a43cb1b335504 local=1959935 remote=2658077
DEBUG[01-18|14:19:53.521] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=12 fromnum=2657900 skip=15 reverse=false
DEBUG[01-18|14:19:53.550] Found common ancestor                    peer=167a43cb1b335504 number=2658012 hash=36c183…b224c5
DEBUG[01-18|14:19:53.550] Directing header downloads               peer=167a43cb1b335504 origin=2658013
DEBUG[01-18|14:19:53.550] Downloading block bodies                 origin=2658013
DEBUG[01-18|14:19:53.550] Downloading transaction receipts         origin=2658013
DEBUG[01-18|14:19:53.556] Inserting downloaded chain               items=64      firstnum=2658013 firsthash=a28680…c5e3de lastnum=2658076 lasthash=965844…d416f9
DEBUG[01-18|14:19:54.182] Pruned ancestor, inserting as sidechain  number=2658013 hash=a28680…c5e3de

DEBUG[01-18|14:19:54.216] Injected sidechain block                 number=2658075 hash=76283c…2e7a36 diff=1       elapsed=54.231µs  txs=0 gas=0 uncles=0 root=0bb28c…0becd1
DEBUG[01-18|14:19:54.217] Injected sidechain block                 number=2658076 hash=965844…d416f9 diff=1       elapsed=36.668µs  txs=0 gas=0 uncles=0 root=0bb28c…0becd1
DEBUG[01-18|14:19:56.555] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2658077 skip=0   reverse=false
DEBUG[01-18|14:19:59.560] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2658077 skip=0   reverse=false
DEBUG[01-18|14:20:02.563] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2658077 skip=0   reverse=false
DEBUG[01-18|14:20:05.567] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2658077 skip=0   reverse=false
DEBUG[01-18|14:20:08.568] Fetching batch of headers                id=167a43cb1b335504 conn=staticdial count=192 fromnum=2658077 skip=0   reverse=false

DEBUG[01-18|14:20:23.185] Downloaded item processing failed on sidechain import index=64 err="missing parent"
DEBUG[01-18|14:20:23.185] Reset ancient limit to zero
DEBUG[01-18|14:20:23.185] Synchronisation terminated               elapsed=29.665s
DEBUG[01-18|14:20:23.185] Reset ancient limit to zero
WARN [01-18|14:20:23.185] Synchronisation failed, retrying         err="no peers to keep download active"

Steps to reproduce the behaviour

Perform a non-graceful restart of a node with blocks in both Ancients/Freezer and Level DB on Quorum v20.10.0

Backtrace

None

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions