Unexpected trie node error occurs after initial snap sync #28587

Closed
@rjl493456442

Description

System information

Geth version: 1.13.5

Issue description

Ref: original ticket #27983 (comment)

Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.850] Initialized transaction indexer          limit=2,350,000
Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.850] Loaded local transaction journal         transactions=0 dropped=0
Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.851] Regenerated local transaction journal    transactions=0 accounts=0
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: WARN [11-22|12:33:20.370] Switch sync mode from snap sync to full sync reason="snap sync complete"
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:20.370] Chain post-merge, sync via beacon client
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:20.370] Gasprice oracle is ignoring threshold set threshold=2
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: ERROR[11-22|12:33:20.389] Unexpected trie node in disk             owner=5cc0a4..667982 path="[12 5 9 3 7]" expect=8b09b1..e87152 got=99f9a0..b9f78f
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: ERROR[11-22|12:33:20.389] State snapshotter failed to iterate trie err="missing trie node 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152 (owner 5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982) (path 0c05090307) unexpected node, loc: disk, node: (5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982 [12 5 9 3 7]), 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152!=99f9a0c9f954cd0d8cf5bb7df9c2b5e529a1652fcc97824ee446ba9300b9f78f, blob: 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080"

The node is reported as invalid, with the following details:

  • owner: 5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982
  • address: 0x32400084C286CF3E17e7B677ea9583e60a000324
  • path: [12 5 9 3 7]
  • content: 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
  • expected hash: 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152
  • got hash: 99f9a0c9f954cd0d8cf5bb7df9c2b5e529a1652fcc97824ee446ba9300b9f78f
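The log prints the node path in two forms, `[12 5 9 3 7]` and `0c05090307`. Both are the same nibble sequence, one nibble per byte. A minimal Python sketch of the conversion follows; the helper names are hypothetical and are not geth functions. The compact form is only meaningful under the assumption that storage tries are keyed by the hash of the slot key, in which case it is the hex prefix shared by the hashed keys stored below this node.

```python
def nibbles_to_log_hex(path):
    """Render a nibble path the way the error message does: one byte per nibble."""
    if any(not 0 <= n <= 15 for n in path):
        raise ValueError("each path element must be a nibble (0..15)")
    return bytes(path).hex()


def nibbles_to_key_prefix(path):
    """Render a nibble path as a compact hex prefix (one hex digit per nibble)."""
    if any(not 0 <= n <= 15 for n in path):
        raise ValueError("each path element must be a nibble (0..15)")
    return "".join(format(n, "x") for n in path)


# Path copied from the log line above.
node_path = [12, 5, 9, 3, 7]
print(nibbles_to_log_hex(node_path))
print(nibbles_to_key_prefix(node_path))
```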

After retrieving the correct node from our benchmark machine, I ran rlpdump on both nodes:

correct node

(base) ➜  ~ rlpdump -hex 0xf8518080808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
[
  "",
  "",
  "",
  "",
  "",
  "",
  "",
  "",
  b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae,
  "",
  "",
  "",
  2ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e7,
  "",
  "",
  "",
  "",
]

corrupted node

(base) ➜  ~ rlpdump -hex 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
[
  "",
  df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9,
  "",
  "",
  "",
  "",
  "",
  "",
  b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae,
  "",
  "",
  "",
  2ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e7,
  "",
  "",
  "",
  "",
]

The corrupted node has one extra child, at index 1.
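The same comparison can be reproduced without rlpdump. The sketch below is a minimal RLP decoder written for illustration only (it is not geth's implementation and skips the canonical-encoding checks a real decoder would perform). It decodes the two blobs quoted above and lists the child indices at which they differ. The names `rlp_decode_item`, `decode_node` and `differing_children` are hypothetical.

```python
def rlp_decode_item(data, pos=0):
    """Decode one RLP item starting at pos; return (item, next_pos).

    Byte strings decode to bytes, lists decode to Python lists.
    """
    prefix = data[pos]
    if prefix < 0x80:                      # single byte, encoded as itself
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:                      # short string, 0..55 bytes
        n = prefix - 0x80
        return data[pos + 1:pos + 1 + n], pos + 1 + n
    if prefix < 0xC0:                      # long string, length-of-length follows
        ll = prefix - 0xB7
        n = int.from_bytes(data[pos + 1:pos + 1 + ll], "big")
        start = pos + 1 + ll
        return data[start:start + n], start + n
    if prefix < 0xF8:                      # short list, payload of 0..55 bytes
        n = prefix - 0xC0
        start = pos + 1
    else:                                  # long list, length-of-length follows
        ll = prefix - 0xF7
        n = int.from_bytes(data[pos + 1:pos + 1 + ll], "big")
        start = pos + 1 + ll
    end = start + n
    items = []
    while start < end:
        item, start = rlp_decode_item(data, start)
        items.append(item)
    return items, end


def decode_node(hex_blob):
    """Decode a hex-encoded trie node and check that the whole blob was consumed."""
    raw = bytes.fromhex(hex_blob)
    node, end = rlp_decode_item(raw)
    if end != len(raw):
        raise ValueError("trailing bytes after RLP item")
    return node


def differing_children(a, b):
    """Return the indices at which two decoded nodes hold different children."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Blobs copied from the rlpdump invocations above.
CORRECT = "f8518080808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080"
CORRUPTED = "f87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080"

good = decode_node(CORRECT)
bad = decode_node(CORRUPTED)
for i in differing_children(good, bad):
    print(i, good[i].hex() or '""', bad[i].hex() or '""')
```

Both blobs decode to 17-element lists, which is the shape of a full node (16 children plus a value slot), and the only differing position should be index 1, matching the rlpdump output.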


I also dumped the parent nodes of this one. They are all full nodes, with no shortNode anywhere along the path, so the issue is unrelated to the shortNode trick.

This storage trie is very large, with 1.8 million slots.


I analyzed the contract. Two functions can mutate its state:

  • finalizeEthWithdrawal (0x6c0960f9): example from etherscan
  • requestL2Transaction (0xeb672419): example from etherscan

Both of them only create new storage slots and never delete existing ones.


There are a few possible explanations for this situation:

  • The state sync target was on a fork: the transaction that created the trie node at index 1 was reorged out and never accepted.
    I don't think that is the case here. Geth uses head-64 as the sync target, which is very unlikely to be reorged in a proof-of-stake network.

  • A programmatic problem (a bug)?


The log is attached.

Here it is. I had a few false starts in the log, but I purged the DB each time.

sync-log.zip
