meowsbits
Contributor

If a gap in the chain is detected, geth will fail to start.

Fatal: Error starting protocol stack: gap (#494) in the chain between ancients and leveldb

In this case, the only solution is to rm -rf chaindata/ and
resync.

This change introduces a feature which rewinds the KV
chain head, purging all data along the way, to the current
height of the Ancient data.
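
The rewind described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this change: `kvStore` and `rewindToAncients` are hypothetical names, the key-value store is modeled as a plain map from block number to block data, and `frozen` is assumed to be the number of blocks held in the ancient store (blocks 0 through frozen-1).

```go
package main

import "sort"

// kvStore is a hypothetical stand-in for the key-value (leveldb) block data,
// keyed by block number. It is NOT geth's actual database interface.
type kvStore map[uint64]string

// rewindToAncients purges every block numbered >= frozen from kv, walking
// down from the current head, and returns the purged block numbers in
// descending order. After it returns, the highest block still available is
// frozen-1, which lives in the ancient store.
func rewindToAncients(kv kvStore, frozen uint64) []uint64 {
	var purged []uint64
	for n := range kv {
		if n >= frozen {
			purged = append(purged, n)
		}
	}
	// Delete from the head downwards so that an interruption never leaves
	// a hole below the remaining head.
	sort.Slice(purged, func(i, j int) bool { return purged[i] > purged[j] })
	for _, n := range purged {
		delete(kv, n)
	}
	return purged
}
```

For example, with 494 frozen blocks and key-value entries for blocks 500 through 502, the sketch purges 502, 501 and 500 in that order and leaves the key-value store empty.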

Rel #20238

Signed-off-by: meows <b5c6@protonmail.com>

@karalabe
Member

karalabe commented Mar 6, 2020

We deliberately didn't nuke the data out. This error should never occur naturally (if it does, that's the bug that needs to be fixed). More likely than not, the user started Geth with ancient/leveldb flags pointing to datadirs from different networks / sync statuses. The deliberate choice to exit and not fix is because the user can always manually nuke if something went horribly wrong, but if the user accidentally linked a mainnet leveldb with a rinkeby ancient folder, we should definitely not nuke 2 completely valid datasets with one go :)

@karalabe
Member

karalabe commented Mar 6, 2020

Unless I'm misunderstanding something; if so, please explain a bit.

@meowsbits
Contributor Author

I've seen the Fatal: Error starting protocol stack: gap error twice now, and neither occurrence was related to chain switching or varying datadir flags; both times I was running geth under systemd and hadn't modified the chain configuration or client version.

I figured that the issue I was seeing was pretty much the one described in:

If Geth crashes, losing its recent state, it will attempt to repair its chain by rolling back to a block that has the state available. If however that point is in the freezer, Geth will delete the tail of the freezer without deleting the stuff in leveldb. This is an issue, because there will be a block gap in our dataset.
#20238

Agreed that the gap error should never occur naturally, but I think it does sometimes if geth gets harshly killed at an inopportune time.
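
To make the failure mode concrete, here is a rough sketch of a startup contiguity check that would produce the error above. It is a simplified model under assumptions, not geth's actual code: `checkContiguous` is a hypothetical name, the key-value store is modeled as a map keyed by block number, `frozen` is assumed to be the number of blocks in the ancient store, and `head` is the head block number recorded in the key-value store.

```go
package main

import "fmt"

// checkContiguous reports an error when the ancient store ends at block
// frozen-1 but the key-value data (a hypothetical map keyed by block number)
// has no block number frozen, even though its head lies at or beyond it.
// That is the situation left behind when the freezer tail is truncated
// without removing the matching blocks from the key-value store.
func checkContiguous(kv map[uint64]string, frozen, head uint64) error {
	if frozen == 0 {
		// Nothing frozen yet, so there is nothing to be contiguous with.
		return nil
	}
	if _, ok := kv[frozen]; !ok && head >= frozen {
		return fmt.Errorf("gap (#%d) in the chain between ancients and leveldb", frozen)
	}
	return nil
}
```

In this model, truncating the freezer to 494 blocks while the key-value store still holds only blocks 500 and up yields the gap (#494) message, whereas a store that still contains block 494 passes the check.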

@karalabe
Member

Superseded by #21409.

@karalabe karalabe closed this Aug 18, 2020