Fix: 2771 #2773
Conversation
Thanks for the fix, @lgalabru!
Looks pretty good, assuming the test passes. Just had a few requests, mostly for comments and readability.
Hello... Mostly my comments are small wording suggestions for comments.
But I have one bigger question, which is: can we really add this test to the automated tests? Is that what you're doing in the Dockerfile change?
Because I thought this test doesn't actually "fail" on failure; it just kind of hangs. Is that OK with the GitHub Actions framework?
Thanks for the fix!
// a reorg happened, and the last header fetched
// is on a smaller fork than the one we just
// invalidated. Wait for more blocks.
while end_block < db_height {
My only concern is the use of a while loop here (in case the new blocks take a while to reach db_height). It's my impression that elsewhere in the codebase we "fail" and try again, for example by returning burnchain_error::TrySyncAgain.
I'm still approving, but wondering what your thoughts are on this.
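For context, a rough sketch of the fail-and-retry shape being referred to. The error enum, heights, and helpers below are hypothetical stand-ins for illustration, not the repo's actual burnchain types:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::{thread, time::Duration};

// Hypothetical stand-in for the repo's burnchain error type.
#[derive(Debug)]
enum BurnchainError {
    TrySyncAgain,
}

// Simulated remote tip that grows as the new fork is mined.
static REMOTE_TIP: AtomicU64 = AtomicU64::new(207);
const DB_HEIGHT: u64 = 210;

// One sync attempt: if the headers fetched are still below the height
// already recorded in the DB, give up and ask the caller to retry.
fn sync_once() -> Result<u64, BurnchainError> {
    let end_block = REMOTE_TIP.fetch_add(1, Ordering::SeqCst) + 1;
    if end_block < DB_HEIGHT {
        return Err(BurnchainError::TrySyncAgain);
    }
    Ok(end_block)
}

fn main() {
    // The caller owns the retry loop and can back off between attempts.
    loop {
        match sync_once() {
            Ok(height) => {
                println!("synced up to height {}", height);
                break;
            }
            Err(BurnchainError::TrySyncAgain) => thread::sleep(Duration::from_millis(100)),
        }
    }
}
```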
There are a few places in the codebase where we block on operations.
This patch clearly addresses an edge case that we saw on Bitcoin Testnet (which is much more unstable) and will probably never see on Mainnet. TrySyncAgain was my initial approach; however, on the next retry we would lose the information that we just reorg'd and are waiting for more blocks from the new canonical fork, and we would be trying to fetch blocks using the burnchain tip of the previous fork as the point of reference (which is the real issue).
This approach limits the footprint of the patch. I think the clean solution would be a DB schema change, where we would update the BurnchainDB when we reorg and mark the blocks from the former canonical fork as such.
The approach I've just described is much more invasive and prone to introducing regressions, which is something I'd like to avoid for the next few weeks :).
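To make the trade-off concrete, here is a rough sketch of the shape of the blocking wait, with hypothetical helpers (names, heights, and poll interval are placeholders) standing in for the node's header download and burnchain DB accessors; the actual change is the small loop shown in the diff above:

```rust
use std::{thread, time::Duration};

// Placeholder: in the real code this re-downloads headers from bitcoind.
// Here we pretend the new fork has already caught up.
fn fetch_last_header_height() -> u64 {
    210
}

// Placeholder: height already recorded in the burnchain DB before the reorg.
fn burnchain_db_height() -> u64 {
    210
}

/// After a reorg, the last header fetched may sit on a fork that is still
/// shorter than the one just invalidated. Blocking here keeps the
/// "we just reorg'd, wait for the new canonical fork to catch up" context,
/// which would be lost if we returned TrySyncAgain and started the next
/// pass from the previous fork's tip.
fn wait_for_new_fork(mut end_block: u64) -> u64 {
    let db_height = burnchain_db_height();
    while end_block < db_height {
        thread::sleep(Duration::from_secs(5));
        end_block = fetch_last_header_height();
    }
    end_block
}

fn main() {
    println!("caught up at height {}", wait_for_new_fork(208));
}
```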
Addresses #2771.
The test added consists of producing a canonical chain of 210 blocks. Once done, we invalidate block 208 and, instead of directly rebuilding a longer fork with N blocks (as done in bitcoind_forking_test), we slowly add some more blocks. Without the patch, this behavior ends up crashing the node with errors like:
And the burnchain DB ends up in the same state we ended up in while investigating #2771.
With this patch, the node is able to fully register the new canonical fork and then make progress.
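For reference, a minimal sketch of the test's shape, assuming the bitcoincore-rpc crate (a recent version, hence the assume_checked call) as the client. The endpoint, credentials, number of extra blocks, and sleep interval are placeholders; the actual test lives in the neon integration suite and drives the regtest bitcoind through the repo's own helpers:

```rust
use bitcoincore_rpc::{Auth, Client, RpcApi};
use std::{thread, time::Duration};

fn main() -> bitcoincore_rpc::Result<()> {
    // Placeholder endpoint/credentials for the regtest bitcoind spawned by
    // the integration-test harness.
    let rpc = Client::new(
        "http://127.0.0.1:18443",
        Auth::UserPass("user".into(), "pass".into()),
    )?;
    let addr = rpc.get_new_address(None, None)?.assume_checked();

    // 1. Produce a canonical chain of 210 blocks.
    rpc.generate_to_address(210, &addr)?;

    // 2. Invalidate block 208, orphaning blocks 208..=210.
    let hash_208 = rpc.get_block_hash(208)?;
    rpc.invalidate_block(&hash_208)?;

    // 3. Instead of immediately mining a longer replacement fork (as
    //    bitcoind_forking_test does), grow the new fork one block at a
    //    time, so the node spends a while on a fork that is shorter than
    //    the height it had already recorded.
    for _ in 0..6 {
        rpc.generate_to_address(1, &addr)?;
        thread::sleep(Duration::from_secs(30));
    }
    Ok(())
}
```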