Description
Describe the bug
Over the last few days we've had issues with Base archive nodes being unable to get back to head from a snapshot. I've read a few similar threads. We're now running the new i7ie AWS instances, which are fast enough to get to head, but I noticed one thing in my travels that may be impacting Base's sync, and I thought it worthwhile to get an expert's opinion.
During my investigation I found that the MerkleExecute stage has a hard-coded threshold constant: MERKLE_STAGE_DEFAULT_CLEAN_THRESHOLD = 5000
https://github.com/paradigmxyz/reth/blob/main/crates/stages/stages/src/stages/merkle.rs#L43C11-L43C47
From what I can tell, this causes Reth to rebuild the Merkle data from scratch whenever it's syncing more than 5000 blocks from head. On Base, that rebuild takes 2-3 hours on AWS NVMe SSDs (MerkleExecute stage_progress=0.07% stage_eta=2h 44m 13s <- 6000 blocks from head), and 5000 Base blocks pass every ~2 hours (compared to ~16 hours on Ethereum). As a result, nodes get "stuck" just beyond the 5000-block threshold, whereas nodes that come in under it finish MerkleExecute very quickly (I just watched an i7ie take 13 minutes for 4000 blocks).
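For context, here's a minimal sketch of the decision as I understand it (names and structure are simplified for illustration, not the actual reth code):

```rust
// Illustrative sketch only: the Merkle stage compares the size of the block
// range it has to process against a clean threshold, and either updates the
// trie incrementally or wipes it and rebuilds from scratch.

const MERKLE_STAGE_DEFAULT_CLEAN_THRESHOLD: u64 = 5_000;

fn merkle_strategy(from_block: u64, to_block: u64, clean_threshold: u64) -> &'static str {
    let range = to_block.saturating_sub(from_block);
    if range > clean_threshold {
        // Large range: drop the trie data and rebuild from scratch.
        // On Base this rebuild takes 2-3 hours on NVMe SSDs.
        "clean rebuild"
    } else {
        // Small range: apply incremental trie updates (minutes, not hours).
        "incremental update"
    }
}

fn main() {
    // ~6000 blocks behind head exceeds the default threshold, so the node
    // takes the slow full-rebuild path.
    println!("{}", merkle_strategy(0, 6_000, MERKLE_STAGE_DEFAULT_CLEAN_THRESHOLD));
}
```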
I don't know enough about the Merkle data to say for sure whether it makes sense to double that threshold (or make it configurable) for Base, but one of you will :) Let me know if I've read the situation incorrectly; either way I thought it might be useful for others who hit this issue.
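Purely for illustration, a chain-specific or user-configurable threshold could look something like this (struct and field names are hypothetical, not reth's actual config API):

```rust
// Hypothetical sketch: struct/field names are made up for illustration and
// do not reflect reth's actual configuration code.

struct MerkleStageConfig {
    /// Block-range size above which the stage rebuilds the trie from scratch
    /// instead of applying incremental updates.
    clean_threshold: u64,
}

impl Default for MerkleStageConfig {
    fn default() -> Self {
        // Today's hard-coded value.
        Self { clean_threshold: 5_000 }
    }
}

fn main() {
    // Base produces ~5000 blocks every ~2 hours, so a chain-specific override
    // would keep nodes on the fast incremental path for longer.
    let base = MerkleStageConfig { clean_threshold: 10_000 };
    println!("clean_threshold = {}", base.clean_threshold);
}
```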
Steps to reproduce
Sync a Base archive node from snapshot on an i3en AWS instance
Node logs
Platform(s)
No response
Container Type
Kubernetes
What version/commit are you on?
op-reth:v1.2.0
What database version are you on?
Unsure
Which chain / network are you on?
Base Mainnet
What type of node are you running?
Archive (default)
What prune config do you use, if any?
No response
If you've built Reth from source, provide the full command you used
No response
Code of Conduct
- I agree to follow the Code of Conduct