Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: 2771 #2773

Merged
merged 20 commits into from
Jul 21, 2021
Merged

Fix: 2771 #2773

merged 20 commits into from
Jul 21, 2021

Conversation

lgalabru
Copy link
Contributor

@lgalabru lgalabru commented Jul 16, 2021

Address #2771.
The test added consists in producing a canonical chain with 210 blocks. Once done, we invalidate the block 208, and instead of rebuilding directly a longer fork with N blocks (at done in the bitcoind_forking_test) we slowly add some more blocks.
Without the patch, this behavior ends up crashing the node with errors like:

WARN [1626791307.078061] [src/chainstate/coordinator/mod.rs:535] [chains-coordinator] ChainsCoordinator: could not retrieve  block burnhash=40bdbf0dda349642bdf4dd30dd31af4f0c9979ce12a7c17485245d0a6ddd970b
WARN [1626791307.078098] [src/chainstate/coordinator/mod.rs:308] [chains-coordinator] Error processing new burn block: NonContiguousBurnchainBlock(UnknownBlock(40bdbf0dda349642bdf4dd30dd31af4f0c9979ce12a7c17485245d0a6ddd970b))

And the burnchain db end up in the same state we ended up while investing 2771.

With this patch, the node is able to entirely register this new canonical fork, and then able to make some progress.

@lgalabru lgalabru changed the base branch from master to develop July 16, 2021 20:46
@lgalabru lgalabru marked this pull request as ready for review July 20, 2021 03:44
Copy link
Contributor

@gregorycoppola gregorycoppola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @lgalabru !

Looks pretty good assuming that the test passes.. Just had a few requests mostly for comments and readability.

testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
src/burnchains/burnchain.rs Outdated Show resolved Hide resolved
src/burnchains/burnchain.rs Outdated Show resolved Hide resolved
src/burnchains/burnchain.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@gregorycoppola gregorycoppola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello... Mostly my comments are small wording suggestions for comments.

But, I have one bigger questions, which is: can we really add this test to automated tests? Is that what you're doing in the Dockerfile change?

Because, I thought this test doesn't actually "fail" on failure.. it just kind of hangs.. is that OK with the GitHub actions framework?

testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
testnet/stacks-node/src/tests/neon_integrations.rs Outdated Show resolved Hide resolved
@lgalabru lgalabru requested a review from gregorycoppola July 20, 2021 19:51
Copy link
Contributor

@gregorycoppola gregorycoppola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix!

src/burnchains/burnchain.rs Outdated Show resolved Hide resolved
// a reorg happened, and the last header fetched
// is on a smaller fork than the one we just
// invalidated. Wait for more blocks.
while end_block < db_height {
Copy link
Contributor

@pavitthrap pavitthrap Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my only concern is the use of a while loop here (in case the new blocks take a while to reach the db_height). It's my impression that elsewhere in the code base we "fail" and try again - for example, by returning burnchain_error::TrySyncAgain.

I'm still approving, but wondering what your thoughts are on this.

Copy link
Contributor Author

@lgalabru lgalabru Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places in the codebase where we block on operations.
This patch is clearly an edge case that we saw on Bitcoin Testnet (way more instable), and will probably never see on Mainnet. The TrySyncAgain was my initial approach, however on the next retry, we would be loosing the information "we just reorgd and waiting for more blocks from the new canonical fork", and would be trying to fetch blocks using the burnchain tip of the previous fork as a point of reference (which is the real issue).

This approach is limiting the patching footprint, I think the clean solution would be introducing a DB schema modification, where we would be updating the BurnchainDB when we reorg, and marking the blocks from former canonical fork as such.
The approach that I've just described is way more invasive and subject to introduce regression, which is something I'd like to avoid for the next few weeks :).

@blockstack-devops
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants