Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(state): Send treestate from non-finalized state to finalized state #4721

Merged
merged 10 commits into from
Sep 6, 2022

Conversation

upbqdn
Copy link
Member

@upbqdn upbqdn commented Jul 25, 2022

Motivation

When Zebra commits a finalized block, it re-calculates all the note commitment and history trees. This uses a lot of extra CPU. Zebra also fetches these structures from the database for each block coming from the non-finalized state.

The non-finalized state computes the treestate for each block, and it also stores it in RAM. It could pass the treestate over to the finalized state.

Solution

I created a new structure FinalizedWithTrees that wraps the
treestate and the finalized block. I did that because the original
FinalizedBlock is Eq, but HistoryTree can't be Eq.

This PR makes Zebra faster because:

  1. The finalized state doesn't retrieve the treestate from the disk if
    the non-finalized state supplies it.

  2. The finalized state doesn't recompute the treestate if the
    non-finalized state supplies it.

Close #4824.

@teor2345 teor2345 added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage I-slow Problems with performance or responsiveness P-Optional ✨ A-state Area: State / database changes labels Jun 29, 2022
@teor2345
Copy link
Contributor Author

We're seeing some issues with note commitment tree CPU usage, so I've changed the priority to medium.

(It's not a complete fix, because we still have to calculate the note commitment trees when checkpointing.)

@upbqdn upbqdn self-assigned this Jul 20, 2022
@upbqdn upbqdn removed the S-needs-triage Status: A bug report needs triage label Jul 20, 2022
@teor2345
Copy link
Contributor Author

teor2345 commented Jul 20, 2022

This is a high priority because it will save us about 1 second per large non-finalized block.

@teor2345 teor2345 changed the title Add note commitment and history trees to FinalizedBlock Send note commitment and history trees from the non-finalized state to the finalized state Jul 22, 2022
@teor2345
Copy link
Contributor Author

Marek and I talked about this ticket, the name is confusing and technically incorrect, so I fixed it.

@upbqdn upbqdn requested a review from a team as a code owner July 25, 2022 21:23
@upbqdn upbqdn requested review from teor2345 and removed request for a team July 25, 2022 21:23
@teor2345
Copy link
Contributor Author

I'm not quite sure how this happened, but it looks like this ticket got transformed into a PR?
(I'm seeing comments from a ticket from last week, and code for the PR that was just opened.)

@codecov
Copy link

codecov bot commented Jul 25, 2022

Codecov Report

Merging #4721 (42b137c) into main (bc7294f) will increase coverage by 0.22%.
The diff coverage is 96.61%.

@@            Coverage Diff             @@
##             main    #4721      +/-   ##
==========================================
+ Coverage   79.09%   79.31%   +0.22%     
==========================================
  Files         310      310              
  Lines       38900    38983      +83     
==========================================
+ Hits        30768    30920     +152     
+ Misses       8132     8063      -69     

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and it's a lot simpler than I thought!

I'm going to admin-merge this PR now, so other developers can get accurate timings, and we can move on with ticket #4823.

I made some optional code cleanup suggestions, feel free to submit them in another PR.

zebra-state/src/service/finalized_state.rs Show resolved Hide resolved
zebra-state/src/service/non_finalized_state.rs Outdated Show resolved Hide resolved
@teor2345
Copy link
Contributor Author

Actually I'm going to wait for CI before I merge this!

@upbqdn
Copy link
Member

upbqdn commented Jul 25, 2022

I'm not quite sure how this happened, but it looks like this ticket got transformed into a PR?
(I'm seeing comments from a ticket from last week, and code for the PR that was just opened.)

You're right. I tried a new feature in Emacs: create pull-request from issue. I thought nothing happened because it seemed like there was no "new" PR. What actually happened was that the issue really got converted into this PR. I won't do that anymore.

I'll open a new ticket that matches this one, and I'll edit the description of this PR.

@teor2345
Copy link
Contributor Author

I tried a new feature in Emacs: create pull-request from issue. I thought nothing happened because it seemed like there was no "new" PR. What actually happened was that the issue really got converted into this PR. I won't do that anymore.

I'll open a new ticket that matches this one, and I'll edit the description of this PR.

Thanks, it's not a big deal, I mainly commented because I thought it might confuse other developers.
(I can't imagine why GitHub has that feature, it looked like a bug to me!)

@upbqdn
Copy link
Member

upbqdn commented Jul 25, 2022

I'm going to admin-merge this PR now, so other developers can get accurate timings, and we can move on with ticket #4823.

Actually I'm going to wait for CI before I merge this!

All regular tests pass locally, but I didn't add any new tests. I'm running a full sync now. I'm currently at block 500086. I'd like to wait until tomorrow and see if I can sync Zebra with this PR. I think it might be good to wait until that with the merge as well to make sure we can sync.

@upbqdn
Copy link
Member

upbqdn commented Jul 25, 2022

I tried a new feature in Emacs: create pull-request from issue. I thought nothing happened because it seemed like there was no "new" PR. What actually happened was that the issue really got converted into this PR. I won't do that anymore.
I'll open a new ticket that matches this one, and I'll edit the description of this PR.

Thanks, it's not a big deal, I mainly commented because I thought it might confuse other developers. (I can't imagine why GitHub has that feature, it looked like a bug to me!)

At least we know it's possible.

@upbqdn
Copy link
Member

upbqdn commented Aug 24, 2022

I rebased this PR onto #4928, and I'm using the correct treestate now.
I resolved all issues, so the PR is ready for another review.
I'm running a fresh sync on mainnet to see how it performs.

@upbqdn upbqdn requested a review from teor2345 August 24, 2022 13:51
@teor2345 teor2345 removed the request for review from a team August 24, 2022 21:48
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

This also contains most of the work we need to skip rebuilding the treestates on a chain fork, which is our longest state update at the moment. (It's still a low priority, but it will be easy to do later.)

@upbqdn
Copy link
Member

upbqdn commented Aug 25, 2022

I could sync Zebra with this PR, but it still took ~15 hours, so no speedup.

@teor2345
Copy link
Contributor Author

I could sync Zebra with this PR, but it still took ~15 hours, so no speedup.

I don't expect to see a speedup until we finish checkpointing, because this PR only affects full verification.
Is the full verification stage faster?

(We can do some test runs in CI once the critical CI fixes merge, they split out checkpointing and full verification into different GitHub actions jobs.)

@upbqdn
Copy link
Member

upbqdn commented Aug 25, 2022

Is the full verification stage faster?

From observing the output log, the last two percent of the sync were still pretty slow.

@teor2345
Copy link
Contributor Author

That's interesting, I think I'll run a full sync with and without this PR, and see what the timings are. I just need to check it's not slower.

It's possible there are multiple things slowing down Zebra. So we might just want to merge this PR and come back to performance later, when it's a priority again.

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR needs a full sync test before it can merge, and we probably want to wait for other performance fixes to merge first.

Edit: we need to wait for PR #4933 to merge, because it changes the performance of full syncs.

Edit 2: we need to wait for #4926 to merge, it will have performance changes. Maybe #5032 will as well.

@teor2345 teor2345 assigned teor2345 and upbqdn and unassigned upbqdn and teor2345 Aug 29, 2022
@teor2345
Copy link
Contributor Author

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 30, 2022

update

✅ Branch has been successfully updated

@upbqdn
Copy link
Member

upbqdn commented Sep 1, 2022

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Sep 1, 2022

update

✅ Branch has been successfully updated

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 2, 2022

Sorry this is taking a while, but we can do the full syncs here once #4926 merges, because it will have performance changes.

Maybe #5032 will as well, but maybe not, so I think we'd be ok either way.

@upbqdn
Copy link
Member

upbqdn commented Sep 4, 2022

On Friday, I started a full-sync using beta 14 with the following times:

  • from genesis to block 1705779 (~95.019% of the block chain): 5 hours and 21 minutes
  • from block 1705779 to block 1795348: 14 hours and 12 minutes.

I'll post sync times for this PR tomorrow, I'm at 97% at the moment. I had to restart the sync after 15 hours, unfortunately.

@teor2345
Copy link
Contributor Author

teor2345 commented Sep 5, 2022

On Friday, I started a full-sync using beta 14 with the following times:

* from genesis to block 1705779 (~95.019% of the block chain): 5 hours and 21 minutes

* from block 1705779 to block 1795348: 14 hours and 12 minutes.

I'll post sync times for this PR tomorrow, I'm at 97% at the moment. I had to restart the sync after 15 hours, unfortunately.

This PR only impacts sync times for the non-finalized state, so the most important time will be the "end" job. (The job that comes after the final checkpoint.)

@upbqdn
Copy link
Member

upbqdn commented Sep 5, 2022

Here are times for syncing after the last checkpoint, the height of which is 1789180:

This PR:

  • from block 1789180 to block 1795401: 1 hour, 20 minutes;
  • the height difference is 6221 blocks.

Beta 14:

  • from block 1789180 to block 1795405: 2 hours, 20 minutes;
  • the height difference is 6225 blocks.

I consider this a noticeable speed-up.

Edit: the times in the original comment were not correct. I realized I included a few blocks for which beta 14 was waiting in a synced state.

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for this, and thanks for your patience with this PR.

mergify bot added a commit that referenced this pull request Sep 6, 2022
@mergify mergify bot merged commit b8712d9 into main Sep 6, 2022
@mergify mergify bot deleted the send-trees-to-from-nonfin-to-fin-state branch September 6, 2022 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-state Area: State / database changes C-enhancement Category: This is an improvement I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Send note commitment and history trees from the non-finalized state to the finalized state
2 participants