Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak reconstruction batch size #6668

Merged
merged 2 commits into from
Dec 11, 2024

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Dec 9, 2024

Issue Addressed

Address a performance issue reported by @mcdee which boils down to reconstruction taking too long and starving the finalization migration.

Proposed Changes

Jim's node with per-slot diffs spent 110 epochs doing reconstruction before picking up another finalization migration. This is very slow, and causes severe issues for the caches (similar to 110 epochs of non-finality).

To address this I've reduced the maximum number of blocks processed in a single reconstruction batch by a factor of 32, down to 1024. This should mean that a node like Jim's spends around 110/32 = 3.5 epochs doing reconstruction before it does another finalization migration. This is much more manageable for the hot state cache (4 epochs * 32 slots = 128 states, which fits inside the cache with default sizing).

@michaelsproul michaelsproul added ready-for-review The code is ready for review optimization Something to make Lighthouse run more efficiently. database v6.0.1 Bugfix for v6.0.0 labels Dec 9, 2024
michaelsproul added a commit that referenced this pull request Dec 9, 2024
Squashed commit of the following:

commit e5f2be5
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Dec 9 11:00:55 2024 +1100

    Tweak reconstruction batch size
@michaelsproul michaelsproul mentioned this pull request Dec 9, 2024
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me.

Did this also happen to our node running per slot diffs? How did you find out the node spent 110 epochs doing reconstruction, is there a metric?

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 10, 2024
@michaelsproul
Copy link
Member Author

How did you find out the node spent 110 epochs doing reconstruction, is there a metric?

There's no metric, although it might be good to add one. There is a log:

Dec 08 03:35:18.385 DEBG Starting database pruning               new_finalized_epoch: 98244, old_finalized_epoch: 98134, service: beacon

Did this also happen to our node running per slot diffs?

I didn't notice if it did, by the time I went to check it had already run out of space and been re-synced

@michaelsproul
Copy link
Member Author

@mergify queue

Copy link

mergify bot commented Dec 11, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at b2590bc

@mergify mergify bot merged commit b2590bc into sigp:release-v6.0.1 Dec 11, 2024
29 checks passed
@michaelsproul michaelsproul deleted the reconstruction-batch-size branch December 11, 2024 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database optimization Something to make Lighthouse run more efficiently. ready-for-merge This PR is ready to merge. v6.0.1 Bugfix for v6.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants