fix(state): Fix minute-long delays in block verification after a chain fork #6122

Merged
16 commits merged on Feb 13, 2023

Conversation

teor2345
Contributor

@teor2345 teor2345 commented Feb 9, 2023

Motivation

Zebra can take multiple minutes to rebuild note commitment trees after a chain fork in the non-finalized chain. This also delays mining template updates and verifying mined blocks.

It only affects mainnet at the moment; it is caused by a large number of shielded transactions in blocks.

Closes #4794.

Specifications

This is a data structure and processing refactor. The consensus rule checks don't change, but the data they use must be stored correctly for those checks to implement the consensus rules correctly.

Complex Code or Requirements

This PR deletes a lot of complex code.

It temporarily stores the finalized tip trees and anchors in the tree indexes at the finalized tip height. These temporary entries are deleted when the first root block in the chain is finalized.
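The temporary finalized tip entry can be illustrated with a height-indexed map. This is a minimal sketch, not Zebra's code: `TreesByHeight`, `Tree`, `push`, and `pop_root` are hypothetical names, heights are plain `u32`s, and a "tree" is just a label. It assumes that finalizing a root block removes every stored entry at or below the root height, which also removes the temporary entry at the old finalized tip height.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a note commitment tree (a real tree holds nodes).
#[derive(Clone, Debug, PartialEq)]
struct Tree(String);

/// Hypothetical height-indexed tree storage for one non-finalized chain.
struct TreesByHeight {
    trees: BTreeMap<u32, Tree>,
}

impl TreesByHeight {
    /// Starts a chain on top of the finalized tip by temporarily storing
    /// the finalized tip tree at the finalized tip height.
    fn new(finalized_tip_height: u32, finalized_tip_tree: Tree) -> Self {
        let mut trees = BTreeMap::new();
        trees.insert(finalized_tip_height, finalized_tip_tree);
        TreesByHeight { trees }
    }

    /// Stores the tree as it was after the block at `height` was added.
    fn push(&mut self, height: u32, tree: Tree) {
        self.trees.insert(height, tree);
    }

    /// Called when the root block at `root_height` is finalized: drops all
    /// entries at or below that height, including the temporary tip entry.
    fn pop_root(&mut self, root_height: u32) {
        self.trees.retain(|height, _tree| *height > root_height);
    }

    /// Returns the heights that currently have a stored tree, in order.
    fn heights(&self) -> Vec<u32> {
        self.trees.keys().copied().collect()
    }
}

fn main() {
    // Finalized tip at height 100, non-finalized blocks at 101 (root) and 102.
    let mut chain = TreesByHeight::new(100, Tree("finalized tip".to_string()));
    chain.push(101, Tree("root".to_string()));
    chain.push(102, Tree("tip".to_string()));
    assert_eq!(chain.heights(), vec![100, 101, 102]);

    // Finalizing the root block also drops the temporary entry at height 100.
    chain.pop_root(101);
    assert_eq!(chain.heights(), vec![102]);
    println!("{:?}", chain.heights());
}
```

Because the finalized tip tree is already present in the map, a chain that forks right at the finalized tip can look it up like any other height instead of rebuilding it.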

Solution

  • Replace sprout, sapling, orchard, and history tree fields with lookup methods
  • Remove unused tree rebuild code
  • Replace a custom clone function with derive(Clone)
  • Remove unused arguments
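The first and third bullets can be sketched together: once every tree is stored by height, the tip tree becomes a lookup rather than a separate field, and forking reduces to a derived `clone()` followed by dropping everything above the fork point. The `Chain`, `SaplingTree`, `sapling_tree`, and `fork` names below are hypothetical and much simpler than Zebra's real types. The sketch assumes a lookup returns the tree stored at the requested height, or the nearest one below it.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a Sapling note commitment tree.
#[derive(Clone, Debug, PartialEq)]
struct SaplingTree(u64);

/// Hypothetical, simplified non-finalized chain. There is no separate tip
/// tree field to keep in sync, so `derive(Clone)` is enough to copy it.
#[derive(Clone, Debug)]
struct Chain {
    /// Block heights in this chain, lowest first.
    heights: Vec<u32>,
    /// Sapling trees indexed by the height of the block that produced them.
    sapling_trees_by_height: BTreeMap<u32, SaplingTree>,
}

impl Chain {
    fn tip_height(&self) -> Option<u32> {
        self.heights.last().copied()
    }

    /// Lookup method that replaces a tree field: returns the tree at
    /// `height`, or the most recent tree stored below it.
    fn sapling_tree(&self, height: u32) -> Option<&SaplingTree> {
        self.sapling_trees_by_height
            .range(..=height)
            .next_back()
            .map(|(_height, tree)| tree)
    }

    fn sapling_tip_tree(&self) -> Option<&SaplingTree> {
        self.sapling_tree(self.tip_height()?)
    }

    /// Forks at `fork_height` by cloning and truncating. No tree is rebuilt,
    /// because the tree at the fork height is already stored.
    fn fork(&self, fork_height: u32) -> Chain {
        let mut forked = self.clone();
        forked.heights.retain(|height| *height <= fork_height);
        forked
            .sapling_trees_by_height
            .retain(|height, _tree| *height <= fork_height);
        forked
    }
}

fn main() {
    let chain = Chain {
        heights: vec![1, 2, 3],
        sapling_trees_by_height: BTreeMap::from([
            (1, SaplingTree(10)),
            (2, SaplingTree(20)),
            (3, SaplingTree(30)),
        ]),
    };

    let forked = chain.fork(2);
    assert_eq!(forked.sapling_tip_tree(), Some(&SaplingTree(20)));
    // The original chain is unchanged.
    assert_eq!(chain.sapling_tip_tree(), Some(&SaplingTree(30)));
    println!("forked tip tree: {:?}", forked.sapling_tip_tree());
}
```

The design trade-off in this sketch is memory for time: keeping one tree per height costs extra storage, but a fork becomes a cheap truncation instead of a replay of every shielded output above the fork point.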

Related changes:

  • Refactor tree and anchor addition and removal functions
  • Print sprout and sapling tree Nodes as hex when debugging
  • Show full debug info when tests fail because chains aren't equal
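Printing tree nodes as hex can be done with a manual `Debug` implementation. In this minimal sketch, `Node` is a hypothetical 32-byte wrapper standing in for a sprout or sapling tree node, and the hex encoding uses only the standard library. A derived `Debug` would print the 32 bytes as a decimal array, which is much harder to compare against hashes shown by other tools.

```rust
use std::fmt;

/// Hypothetical 32-byte tree node, standing in for a sprout or sapling `Node`.
struct Node([u8; 32]);

impl fmt::Debug for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Encode each byte as two lowercase hex digits.
        let hex: String = self.0.iter().map(|byte| format!("{:02x}", byte)).collect();
        // Prints as `Node("<64 hex digits>")`.
        f.debug_tuple("Node").field(&hex).finish()
    }
}

fn main() {
    let mut bytes = [0u8; 32];
    bytes[0] = 0xab;
    println!("{:?}", Node(bytes));
}
```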

Testing

We already have good test coverage for this code. I am also running a partial and full sync locally on both mainnet and testnet.

Review

There is an initial refactor to get the height indexes working in 98dfa87.
Then this PR repeats similar changes for sprout, sapling, and orchard. The best example is probably commit 3b0abd4.
The history tree change is slightly different; it's in commit 09e5148.

This is a blocker for mining pools on mainnet, so it is a routine bug fix.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

We could do further refactors, but this seemed like enough for now.

@teor2345 teor2345 added C-bug Category: This is a bug A-consensus Area: Consensus rule updates P-Medium ⚡ C-security Category: Security issues A-rpc Area: Remote Procedure Call interfaces A-state Area: State / database changes labels Feb 9, 2023
@teor2345 teor2345 requested a review from a team as a code owner February 9, 2023 05:01
@teor2345 teor2345 self-assigned this Feb 9, 2023
@teor2345 teor2345 requested review from natalieesk and oxarbitrage and removed request for a team February 9, 2023 05:01
@teor2345 teor2345 removed the request for review from natalieesk February 9, 2023 05:02

codecov bot commented Feb 9, 2023

Codecov Report

Merging #6122 (0e2335e) into main (4f28929) will decrease coverage by 0.12%.
The diff coverage is 84.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6122      +/-   ##
==========================================
- Coverage   78.14%   78.03%   -0.12%     
==========================================
  Files         304      304              
  Lines       39087    39148      +61     
==========================================
+ Hits        30546    30549       +3     
- Misses       8541     8599      +58     

Contributor

@arya2 arya2 left a comment

Looks good! I really like the simplified fork()/parent_chain() methods.

zebra-chain/src/sapling/tree.rs (resolved)
zebra-state/src/service/non_finalized_state/chain.rs (outdated, resolved)
zebra-state/src/service/non_finalized_state.rs (outdated, resolved)
zebra-state/src/service/non_finalized_state/chain.rs (outdated, resolved)
zebra-state/src/service/non_finalized_state/chain.rs (outdated, resolved)
@teor2345 teor2345 force-pushed the remove-duplicate-chain-fields branch from 05607ac to 0e2335e Compare February 9, 2023 23:42
Contributor

@arya2 arya2 left a comment

Thank you!

(I left a do-not-merge label in case you'd like a second reviewer for the consensus rules)

@arya2 arya2 added the do-not-merge Tells Mergify not to merge this PR label Feb 9, 2023
@teor2345
Contributor Author

teor2345 commented Feb 9, 2023

I am also running a partial and full sync locally on both mainnet and testnet.

The partial syncs on both networks seem normal, as does the full sync on testnet. I am still waiting for the full sync on mainnet to fully checkpoint; this is also normal, given the network latency to where I am.

Before the transaction spam, about 0.3% of mainnet blocks had a chain fork. So if there was anything seriously wrong with this PR, I'd expect my mainnet partial sync to have failed by now.

@teor2345
Contributor Author

(I left a do-not-merge label in case you'd like a second reviewer for the consensus rules)

I think @upbqdn or @conradoplg changed this code last, can one of you review this PR?

@mpguerra mpguerra requested a review from upbqdn February 10, 2023 09:05
@upbqdn
Member

upbqdn commented Feb 10, 2023

I'm reviewing this PR.

Member

@upbqdn upbqdn left a comment

This PR looks great. It was rewarding to see the gradual simplification in each commit.

@arya2 arya2 removed the do-not-merge Tells Mergify not to merge this PR label Feb 13, 2023
mergify bot added a commit that referenced this pull request Feb 13, 2023
@mergify mergify bot merged commit 9452487 into main Feb 13, 2023
@mergify mergify bot deleted the remove-duplicate-chain-fields branch February 13, 2023 21:44
mpguerra added a commit that referenced this pull request May 19, 2023
mergify bot pushed a commit that referenced this pull request May 23, 2023
* ZIPs were updated to remove ambiguity; this was tracked in #1267.

* #2105 was fixed by #3039 and #2379 was closed by #3069

* #2230 was a duplicate of #2231 which was closed by #2511

* #3235 was obsoleted by #2156 which was fixed by #3505

* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969

* We migrated to Rust 2021 edition in Jan 2022 with #3332

* #1631 was closed as not needed

* #338 was fixed by #3040 and #1162 was fixed by #3067

* #2079 was fixed by #2445

* #4794 was fixed by #6122

* #1678 stopped being an issue

* #3151 was fixed by #3934

* #3204 was closed as not needed

* #1213 was fixed by #4586

* #1774 was closed as not needed

* #4633 was closed as not needed

* Clarify behaviour of difficulty spacing

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour when retrying block downloads

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify when we might want to fix

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify what we might want to change in future

Co-authored-by: teor <teor@riseup.net>

* Clarify benefits of how we do block verification

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt errors

---------

Co-authored-by: teor <teor@riseup.net>
Labels
A-consensus Area: Consensus rule updates A-rpc Area: Remote Procedure Call interfaces A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues
Development

Successfully merging this pull request may close these issues.

Revert note commitment and history trees when forking non-finalized chains
3 participants