-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eth/stagedsync: fixes for mining on devnet #8874
Conversation
return fmt.Errorf("attempting to mine %d, which is behind current head: %d", minedHeadNumber, headNumber) | ||
} | ||
} | ||
// if mine { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need some discussion around this (more details in the PR description - see Issue 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can safely skip this checks as the block anyways goes through this stage (with mine = false
) as if it's syncing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manav2401 bor heimdall stage is called with mine=true
in the mining stage loop. There are 2 separate loops - syncing and mining. My understanding is that when we are in the mine
mode we need to get the header of the mined block from the mining state using cfg.miningState.MiningBlock.Header
. However now that the mine block stage has been moved after the bor heimdall stage that means the mined header will not be available? We need the header in order to fetch and persist the state sync events I believe. This makes me think that we would need 2 stages - 1 before the mine block stage to fetch and persist spans and 1 after the mine block to fetch the state sync events. What are your thoughts & is my understanding so far correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note I believe currently we are not correctly handling the state sync events for the mine=true
case as the mined header is not correctly passed to the fetchAndWriteBorEvents
function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about it more maybe splitting is not necessary since in iteration N the bor heimdall stage will process state sync events for the block created in iteration N-1 since the header for that block will be available
however if we need to persist state sync events then we may need to fix that bug I mentioned with passing the mined header to fetchAndWriteBorEvents - but this is something that 1) I'm not sure if is a requirement for the block producer (do you happen to know?) and 2) can be done in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @taratorio, you're correct that we need header for processing certain things in the bor heimdall stage. I was initially of the view that we'll anyways pass through the same stage when syncing (because when we mine block N, the data like header, bodies, exec results are cached and then it's again sent to all the sync stages where most of them just directly use the cached values and persists the data in DB and perform the remaining operations). That time we do have a header available and that should be sufficient to perform all the validations.
Re. you questions, I don't think we need the whole header looking at the fetchAndWriteBorEvents
function. But, I am a bit confused because I didn't check that part of the code and looks like writing these things (not snapshot, but writing in contracts) should happen via bor consensus and not an external bor heimdall stage. At least, that's how it's done in bor. In this function, we could definitely persist the data (i.e. the state sync events) in snapshot and then bor consensus should be able to utilise it when required. Wdyt?
nextSpanId = snapshotLastSpanId + 1 | ||
} | ||
var endSpanID uint64 | ||
if headNumber > zerothSpanEnd { | ||
endSpanID = 2 + (headNumber-zerothSpanEnd)/spanLength | ||
} else { | ||
endSpanID = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, we can simply set this to 1 so that on the first block itself, it fetches both 0th and 1st span.
@@ -206,12 +202,14 @@ func BorHeimdallForward( | |||
nextSpanId = binary.BigEndian.Uint64(k) + 1 | |||
} | |||
snapshotLastSpanId := cfg.blockReader.(LastFrozen).LastFrozenSpanID() | |||
if snapshotLastSpanId+1 > nextSpanId { | |||
if s.BlockNumber != 0 && snapshotLastSpanId+1 > nextSpanId { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because even though we don't have 0th span, the function LastFrozenSpanID
will return 0 which makes this check assume that we have 0th span and need the 1st span. This is not the case when we start to mine from scratch in devnets.
Also, not sure why lint is failing. |
@manav2401 merge devel |
this has been fixed as part of the refactor to split apart bor<->heimdall sync stage responsibilities and bor<->heimdall mining stage responsibilities #9149 - @manav2401 thank you for your help with this |
Due to the recent changes with the snapshots, there were some issues with mining on devnets. This PR attempts to fix that but also needs some more discussion before merging.
Issue 1
With the latest devel, when erigon tries to mine with bor consensus, the
engine.Prepare
call fails as it tries to get the 0th span required for it to proceed. On further investigation, looks like it is reaching our toeth/stagedsync/chain_reader.go:BorSpan
for the span which simply panics. Ideally it should reach out to the chain reader in consensus which has the implementation of loading span from snapshots. I wanted to know why panic was implemented in this function in the first place? For now, I have added the implementation.Issue 2
With this change, it no longer panics but now is unable to find span in snapshot. This is expected because in the mining stages, the
MiningCreateBlock
stage is beforeBorHeimdall
and hence when we start from genesis, we don't have the span loaded. This PR moves theBorHeimdall
stage to 1st stage.Issue 3
On moving this stage to the top, we no longer have access to the header and hence we can't validate it against whitelisted milestones. For now, this PR comments that check but it needs some workaround. A simple suggestion might be to skip this check while mining and only rely on syncing validation as it anyways will be checked when the block is actually processed post mining through the same stage.
Moving this stage to top also requires some more small changes to handle the 0th span which are done in this PR.
Edit: Do not merge, this breaks a few more things, trying to test it out on a devnet.
CC @AskAlexSharov