-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(state): Send treestate from non-finalized state to finalized state #4721
Conversation
We're seeing some issues with note commitment tree CPU usage, so I've changed the priority to medium. (It's not a complete fix, because we still have to calculate the note commitment trees when checkpointing.) |
This is a high priority because it will save us about 1 second per large non-finalized block. |
Marek and I talked about this ticket, the name is confusing and technically incorrect, so I fixed it. |
I'm not quite sure how this happened, but it looks like this ticket got transformed into a PR? |
Codecov Report
@@ Coverage Diff @@
## main #4721 +/- ##
==========================================
+ Coverage 79.09% 79.31% +0.22%
==========================================
Files 310 310
Lines 38900 38983 +83
==========================================
+ Hits 30768 30920 +152
+ Misses 8132 8063 -69 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, and it's a lot simpler than I thought!
I'm going to admin-merge this PR now, so other developers can get accurate timings, and we can move on with ticket #4823.
I made some optional code cleanup suggestions, feel free to submit them in another PR.
Actually I'm going to wait for CI before I merge this! |
You're right. I tried a new feature in Emacs: I'll open a new ticket that matches this one, and I'll edit the description of this PR. |
Thanks, it's not a big deal, I mainly commented because I thought it might confuse other developers. |
All regular tests pass locally, but I didn't add any new tests. I'm running a full sync now. I'm currently at block 500086. I'd like to wait until tomorrow and see if I can sync Zebra with this PR. I think it might be good to wait until that with the merge as well to make sure we can sync. |
At least we know it's possible. |
I rebased this PR onto #4928, and I'm using the correct treestate now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good!
This also contains most of the work we need to skip rebuilding the treestates on a chain fork, which is our longest state update at the moment. (It's still a low priority, but it will be easy to do later.)
I could sync Zebra with this PR, but it still took ~15 hours, so no speedup. |
I don't expect to see a speedup until we finish checkpointing, because this PR only affects full verification. (We can do some test runs in CI once the critical CI fixes merge, they split out checkpointing and full verification into different GitHub actions jobs.) |
From observing the output log, the last two percent of the sync were still pretty slow. |
That's interesting, I think I'll run a full sync with and without this PR, and see what the timings are. I just need to check it's not slower. It's possible there are multiple things slowing down Zebra. So we might just want to merge this PR and come back to performance later, when it's a priority again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR needs a full sync test before it can merge, and we probably want to wait for other performance fixes to merge first.
Edit: we need to wait for PR #4933 to merge, because it changes the performance of full syncs.
Edit 2: we need to wait for #4926 to merge, it will have performance changes. Maybe #5032 will as well.
@Mergifyio update |
✅ Branch has been successfully updated |
@Mergifyio update |
✅ Branch has been successfully updated |
On Friday, I started a full-sync using beta 14 with the following times:
I'll post sync times for this PR tomorrow, I'm at 97% at the moment. I had to restart the sync after 15 hours, unfortunately. |
This PR only impacts sync times for the non-finalized state, so the most important time will be the "end" job. (The job that comes after the final checkpoint.) |
Here are times for syncing after the last checkpoint, the height of which is 1789180: This PR:
Beta 14:
I consider this a noticeable speed-up. Edit: the times in the original comment were not correct. I realized I included a few blocks for which beta 14 was waiting in a synced state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for this, and thanks for your patience with this PR.
Motivation
When Zebra commits a finalized block, it re-calculates all the note commitment and history trees. This uses a lot of extra CPU. Zebra also fetches these structures from the database for each block coming from the non-finalized state.
The non-finalized state computes the treestate for each block, and it also stores it in RAM. It could pass the treestate over to the finalized state.
Solution
I created a new structure
FinalizedWithTrees
that wraps thetreestate and the finalized block. I did that because the original
FinalizedBlock
isEq
, butHistoryTree
can't beEq
.This PR makes Zebra faster because:
The finalized state doesn't retrieve the treestate from the disk if
the non-finalized state supplies it.
The finalized state doesn't recompute the treestate if the
non-finalized state supplies it.
Close #4824.