Fix/2725 #2746

jcnelson · 2021-06-29T00:48:25Z

This PR fixes a lot of issues underpinning #2725. Namely, it fixes #2701, #2728, #2730, and #2738 as well.

The gist of the changes is that when the block inventory state machine determines that the node is in the initial block download (IBD), and one of its bootstrap nodes NACKs its request for a PoxInv or BlocksInv message at a given reward cycle, it will only refresh its view of the last INV_REWARD_CYCLES reward cycles from its bootstrap peers instead of all prior reward cycles. This situation arises more often than not -- for example, whenever the anchor block is "late" relative to the PoX sync watchdog -- and refreshing all prior reward cycles each time leads to a significant slow-down in synchronization as the node gets closer to the testnet chain tip (which has thousands of reward cycles).

In addition, if this happens, the download state machine will begin scanning the local chain state from the sortition height at the start of the highest reward cycle synchronized with the bootstrap peer (instead of from the very first sortition). This significantly speeds up block downloads because, like the inventory state machine, the download state machine only processes one reward cycle at a time, so starting it right where it is bound to find new blocks gets the node to the chain tip faster.
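For readers skimming the thread, here is a minimal sketch of the two starting-point rules, as spelled out in the later commits on this branch. It is not the code from the PR: the function names and the value given to INV_REWARD_CYCLES are placeholders chosen for illustration, and the real logic lives inside the inventory and download state machines.

```rust
/// Placeholder value for illustration only; the real constant is defined
/// in the node's source and may differ.
const INV_REWARD_CYCLES: u64 = 2;

/// Reward cycle at which to restart an inventory scan of a bootstrap peer
/// that reported a divergence at `diverged_rc` (hypothetical helper).
/// Takes the lower of (a) INV_REWARD_CYCLES cycles before the divergence
/// and (b) the cycle containing the highest processed Stacks block.
fn inv_rescan_start_cycle(diverged_rc: u64, highest_processed_rc: u64) -> u64 {
    // saturating_sub avoids underflow when the divergence is near cycle 0
    diverged_rc
        .saturating_sub(INV_REWARD_CYCLES)
        .min(highest_processed_rc)
}

/// Sortition height at which a downloader pass starts (hypothetical helper).
/// Takes the lower of the highest processed block's sortition height and the
/// hint recorded by the inventory state machine.
fn download_start_sortition(highest_processed_height: u64, inv_hint_height: u64) -> u64 {
    highest_processed_height.min(inv_hint_height)
}
```

With these placeholder values, a divergence reported at reward cycle 100 restarts the scan at cycle 98 unless the highest processed block sits in an earlier cycle, in which case that earlier cycle is used.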

The PR also fixes a few odds and ends, like making it so /v2/info doesn't require any I/O to complete and making it so the inventory state machine doesn't have to load the PoX bitvector on each pass (which is surprisingly slow).
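As a rough illustration of the /v2/info change, the idea is to keep a small in-memory snapshot of the chain view, refresh it once per network pass, and have the handler read only from that snapshot. The types, field names, and hand-built JSON below are assumptions made for this sketch, not the node's actual structures.

```rust
/// Hypothetical snapshot of the values an info-style response needs.
#[derive(Clone, Debug, PartialEq)]
struct ChainViewCache {
    burn_block_height: u64,
    stacks_tip_height: u64,
}

/// Stand-in for the peer network state; this is not the real PeerNetwork.
struct Network {
    cached_view: ChainViewCache,
}

impl Network {
    /// Refresh the snapshot. In this sketch it is assumed to be called once
    /// per pass, before any HTTP request is serviced, so handlers act on the
    /// current view rather than a stale one.
    fn refresh_view(&mut self, burn_block_height: u64, stacks_tip_height: u64) {
        self.cached_view = ChainViewCache {
            burn_block_height,
            stacks_tip_height,
        };
    }

    /// Build the response body from memory only; no database or disk access.
    fn handle_info(&self) -> String {
        format!(
            "{{\"burn_block_height\":{},\"stacks_tip_height\":{}}}",
            self.cached_view.burn_block_height, self.cached_view.stacks_tip_height
        )
    }
}
```

The ordering matters: refreshing the snapshot after servicing requests would let a handler answer with the previous pass's view.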

I'm in the process of testing this on a local testnet follower; will report back when it's ready.

gregorycoppola · 2021-06-29T17:52:15Z

@jcnelson Are there any tests that we should add as part of the release process in order to test this change?

jcnelson · 2021-06-29T18:01:35Z

Yes -- per the blockchain engineering meeting, it's imperative that we spin up a release candidate from genesis on both mainnet and testnet and verify that it reaches the chain tip.

jcnelson · 2021-06-29T18:02:16Z

@wileyj Want to take this branch for a spin?

wileyj · 2021-06-30T21:08:07Z

@jcnelson ready for a second spin-up test with the commits above?

jcnelson · 2021-07-01T00:42:27Z

Not yet -- still getting stuck (investigating)

jcnelson · 2021-07-02T03:38:56Z

Okay, this time it managed to reach the testnet chain tip. Going to spin up from testnet genesis again to verify that it can do so without getting stuck even once.

wileyj · 2021-07-02T13:52:17Z

> Okay, this time it managed to reach the testnet chain tip. Going to spin up from testnet genesis again to verify that it can do so without getting stuck even once.

👍 spinning up our node from genesis now

gregorycoppola · 2021-07-02T15:07:23Z

> Okay, this time it managed to reach the testnet chain tip. Going to spin up from testnet genesis again to verify that it can do so without getting stuck even once.

Nice work, @jcnelson . Thanks :).

jcnelson · 2021-07-02T16:05:54Z

Okay, my node reached the chain tip on testnet and is in steady state.

jcnelson · 2021-07-06T16:59:55Z

Both my testnet and mainnet nodes sync'ed with this patch, without issue.

kantai

This looks good to me, but the allowed warning should be re-enabled.

jcnelson · 2021-07-06T17:07:18Z

> This looks good to me, but the allowed warning should be re-enabled.

As in 2a1e8f5?

gregorycoppola

Hey Jude

I was taking a look at this PR to review.

Even though I lack a lot of context, I was planning to just review based on the testing.

I was surprised to see that no tests were affected, or added.

Is it possible to add any unit tests that would exercise some of the new behavior? Perhaps this could be in a future PR.

Otherwise, can we document in the PR description how this was tested and why we think that is sufficient?

jcnelson · 2021-07-06T17:48:48Z

In terms of lines changed, the bulk of the delta in this PR comes from refactoring. Namely, we now wrap access to the HTTP network state with the new PeerNetwork::with_http() method, and we avoid reloading PoX and burnchain state in the RPC handler and the inventory state machine if we don't need to. The existing test battery already covers both areas of the codebase.
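A minimal sketch of the kind of wrapper described here, assuming the HTTP peer is held in an Option so it can be moved out while the rest of the network state is borrowed. The struct fields and the exact signature are hypothetical; the real PeerNetwork::with_http() may look different.

```rust
/// Stand-in for the HTTP peer; the field is illustrative only.
struct HttpPeer {
    requests_handled: u64,
}

/// Stand-in for the network state; this is not the real PeerNetwork.
struct PeerNetwork {
    http: Option<HttpPeer>,
    chain_tip_height: u64,
}

impl PeerNetwork {
    /// Temporarily move the HTTP peer out of `self`, hand both the network
    /// and the peer to `f` as separate mutable borrows, then put it back.
    /// Returns None if the peer is absent (for example, on a nested call).
    fn with_http<F, R>(&mut self, f: F) -> Option<R>
    where
        F: FnOnce(&mut PeerNetwork, &mut HttpPeer) -> R,
    {
        let mut http = self.http.take()?;
        let result = f(self, &mut http);
        self.http = Some(http);
        Some(result)
    }
}
```

Taking the peer out of the Option is what lets a handler receive the whole network and the HTTP peer mutably at the same time, which the borrow checker would otherwise reject. Returning None for an absent peer is a choice made for this sketch.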

The truly new functionality -- which is simply concerned with making sure the block downloader will start downloading blocks at the right reward cycle -- is really only testable against a live system. You'd need a live Bitcoin node and a Stacks node that not only has all the blocks, but also will occasionally disconnect from bootstrapping nodes and render them in a state where they have no choice but to process reward cycles for which they don't have the anchor block. You'd also need to instrument the Stacks node so that it can be stopped and restarted while it is bootstrapping (clearing its inventory state), in order to verify that the node does not spend an unreasonable amount of time bootstrapping. Implementing such an integration test that faithfully simulates this condition would not only take more work than this PR, but also would be redundant -- this environment is exactly what the live testnet is supposed to provide. The release procedure explicitly calls for booting up a new node from genesis on both testnet and mainnet in order to give PRs like this their due testing.

gregorycoppola

@jcnelson If you think this has had the full test-suite run on it then LGTM.

jcnelson · 2021-07-06T19:33:39Z

The tests are failing because bitcoin.org is down right now. However, the suite previously passed with the current patches, with the sole exception of size_overflow_unconfirmed_stream_microblocks_integration_test, which passed locally for me.

blockstack-devops · 2024-11-21T00:23:00Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

jcnelson added 19 commits June 28, 2021 14:26

fix: silence non-fmt panic warnings

dce90da

fix: remove potential runtime panic that can arise when a sortition …

fe5a741

…is queried while it is in the process of being invalidated

fix: disable non-fmt panic warnings

dd32d3c

fix: disable non-fmt panic warnings

e1343d2

refactor: use PeerNetwork::with_http() to expose the underlying HttpPeer

1d9d61d

fix: when loading block headers, if we encounter an InvalidPoxSortiti…

316a13a

…on error from the GetBlocksInv handler, reply a NACK instead of erroring out

feat: query the list of bootstrap nodes distinct from always-allowed …

1022227

…nodes

fix: address #2730 by considering whether or not we are in the IBD ph…

2d97365

…ase when choosing where to start scanning for new block downloads. Also, use the PeerNetwork::with_http() interface to access the inner HTTP peer when downloading blocks and microblocks.

fix: address #2728 by only rescanning a neighbor's inventory from jus…

6203370

…t before the reward cycle it reported that the local peer diverged. Also, record this sortition height for the block downloader's consumption.

fix: address #2719 by introducing a Transient(..) error type which in…

0624e53

…dicates that a network error can be resolved by simply re-trying the poll loop.

refactor: make the HTTP peer an Optional<..> so the PeerNetwork insta…

8a4aed3

…nce can be passed to the HTTP handler code. Also, do a better job at propagating hints from the inventory state-machine to the downloader as part of #2730.

refactor: use new test API for adding peers

9be7dfa

fix: take a PeerNetwork as an argument for handling requests, and add…

2bae344

…ress #2738 by way of using cached data in the PeerNetwork to construct a /v2/info response without doing any I/O

refactor: take a PeerNetwork instance as an argument instead of a hod…

08594fa

…ge-podge of fields from it, so it can be passed into the HTTP request handler

fix: address #2701 by catching Transient(..) errors and simply loggin…

0f26551

…g them

fix: address #2713 by starting processing from the first burnchain bl…

c95bbbf

…ock of the reward cycle in which the canonical Stacks chain state starts

refactor: use latest rust-ism for spin loops

80151e9

fix: fix compiler errors in unit tests

62703d4

fix: more compile-time errors

b16e06f

jcnelson added the release 2.0.11.2.0 label Jun 29, 2021

fix: update the cached burnchain view *before* servicing HTTP request…

473da85

…s, so /v2/info acts on the *current* chain view

jcnelson requested review from kantai, pavitthrap, lgalabru and wileyj June 29, 2021 18:01

kantai reviewed Jun 29, 2021

src/blockstack_cli.rs (outdated)

kantai reviewed Jun 29, 2021

src/net/mod.rs (outdated)

jcnelson added 2 commits June 30, 2021 16:27

fix: always panic on network error, now that the network masks transi…

882a914

…ent errors

fix: re-enable non_fmt_panic lint

2a1e8f5

jcnelson added 4 commits July 1, 2021 20:40

fix: only set the block/microblock start sortition if the downloader …

e0ae8f9

…isn't in the middle of processing blocks

fix: when starting a block inventory scan from a remote bootstrap pee…

82553af

…r that's diverged from us, start at either the reward cycle that contains the highest processed Stacks block, or INV_REWARD_CYCLES fewer reward cycles than the diverged reward cycle -- whichever is lower

fix: when starting a downloader pass, start from the sortition height…

a3aac0d

… of either the highest processed Stacks block, or the inventory sortition height hint from the inv state machine -- whichever is lower

Merge branch 'develop' into fix/2725

c29b692

kantai reviewed Jul 6, 2021

Merge branch 'develop' into fix/2725

ebe2a51

kantai approved these changes Jul 6, 2021

gregorycoppola reviewed Jul 6, 2021

gregorycoppola approved these changes Jul 6, 2021

jcnelson merged commit 3e2da2d into develop Jul 6, 2021

gregorycoppola mentioned this pull request Jul 13, 2021

Release 2.0.11.2.0 #2750

Merged

lgalabru mentioned this pull request Jul 20, 2021

Feat/health check endpoint #2768

Closed

4 tasks

gregorycoppola mentioned this pull request Jul 30, 2021

Release 2.0.11.2.0 #2793

Closed

jcnelson mentioned this pull request Feb 7, 2022

Network dispatch doesn't handle PoX invalidations #2701

Closed

blockstack-devops added the locked label Nov 21, 2024

stacks-network locked as resolved and limited conversation to collaborators Nov 21, 2024

wileyj deleted the fix/2725 branch March 11, 2025 21:36
