More historic state cache optimisations #6485

Open
michaelsproul opened this issue Oct 14, 2024 · 0 comments
Labels: blocked, optimization (Something to make Lighthouse run more efficiently.), tree-states (Upcoming state and database overhaul)

Description

With the move to tree-states, we've ensured that performance doesn't regress for sequential state loads using this PR:

However, there are still a few sub-optimal things that we could work on over time to get even better performance.

What's still slow?

Profiling and logging show that building the caches on beacon states is slow. In particular, the committee caches (via shuffle_list) and the pubkey caches are the slowest to build. In my measurements, the total time to build all caches is around 1.8s.

Problem 1: unnecessary cache rebuilding

In #6475 the time to build BeaconState caches is amortised slightly by building them at slot 1 in the epoch rather than slot 0. Slot 0 in the epoch usually takes more time to construct because it involves loading a snapshot or applying diffs, and the resulting state has no caches built due to the diff process. Constructing the state at the next slot can be done by replaying one block on top of the slot 0 state, but this requires most of the caches to be built (for state processing). Our current approach lazily initialises the beacon state caches if/when the slot 1 state is requested. This means the load time for states in an epoch goes something like: [2s (diff application), 2s (cache build), 0.5s (everything already cached), 0.5s, 0.5s, ...].

The problem with this approach arises when the caller requesting the slot 0 state requires some or all of the caches. In this case, the caller will build the caches on its clone of the state from the historic state cache, but will not store the updated state with these caches back into the historic state cache. This sub-optimality was left in #6475 in order to keep the code relatively simple, and because it usually only represents a one-off waste of ~2s of cache building.
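
To make the wasted work concrete, here is a minimal sketch of the clone-then-build pattern. Everything in it is hypothetical: `ToyState`, `ToyHistoricCache` and `count_builds` are stand-ins invented for illustration and are not Lighthouse types or APIs, and the "cache" is just a reversed list rather than a real shuffling.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a beacon state (not Lighthouse's `BeaconState`).
#[derive(Clone)]
struct ToyState {
    slot: u64,
    /// `None` means the cache has not been built yet.
    committee_cache: Option<Vec<u64>>,
}

impl ToyState {
    /// Build the toy committee cache if it is missing.
    /// Returns `true` if the build actually ran.
    fn build_committee_cache(&mut self) -> bool {
        if self.committee_cache.is_some() {
            return false;
        }
        // Placeholder for the expensive shuffling work.
        self.committee_cache = Some((0..8).rev().collect());
        true
    }
}

/// Hypothetical historic state cache keyed by slot.
#[derive(Default)]
struct ToyHistoricCache {
    states: HashMap<u64, ToyState>,
}

impl ToyHistoricCache {
    fn insert(&mut self, state: ToyState) {
        self.states.insert(state.slot, state);
    }

    /// Callers receive a clone, so their mutations never reach the stored copy.
    fn get(&self, slot: u64) -> Option<ToyState> {
        self.states.get(&slot).cloned()
    }
}

/// Count how many times the cache gets built across `requests` loads of `slot`.
fn count_builds(cache: &ToyHistoricCache, slot: u64, requests: usize) -> usize {
    let mut builds = 0;
    for _ in 0..requests {
        let mut state = cache.get(slot).expect("state should be cached");
        if state.build_committee_cache() {
            builds += 1;
        }
        // `state` is dropped here; the built cache is never written back.
    }
    builds
}
```

In this toy model every request that needs the cache rebuilds it, and the stored copy stays cache-less. In the real code the cost is bounded because the slot 1 state is stored with its caches built, which is why the issue describes it as a one-off waste.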

Problem 2: unnecessary pubkey cache builds

The building of the pubkey cache is pretty much unnecessary, and could be avoided by reusing existing caches. See:

Implementation strategies

Mutable caches on beacon states (problem 1)

One way to solve problem 1 would be to lazily initialise the caches on a BeaconState using interior mutability. In this paradigm building a cache at slot 0 from caller code would build it for all copies of the state, including the one in the historic state cache. This would mean that when loading the state at slot 1, caches would be already built.

The complexity of this approach is that it requires putting the caches inside the beacon state behind something like Arc<RwLock<..>> or ArcSwap<..>. This could be quite an invasive change, especially relative to the benefit.
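
A rough sketch of the interior-mutability idea, using only the standard library. `SharedCacheState` and `CommitteeCache` are hypothetical names for illustration, not the actual Lighthouse layout, and the cache contents are a placeholder.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical cache payload standing in for a real committee cache.
type CommitteeCache = Vec<u64>;

/// Hypothetical state whose cache slot is shared between clones.
#[derive(Clone, Default)]
struct SharedCacheState {
    /// Cloning the state clones the `Arc`, so every copy sees the same slot.
    committee_cache: Arc<RwLock<Option<Arc<CommitteeCache>>>>,
}

impl SharedCacheState {
    /// Lazily build the cache through `&self`.
    /// Returns the cache and whether this call performed the build.
    fn committee_cache(&self) -> (Arc<CommitteeCache>, bool) {
        // Fast path: the cache was already built by this or another copy.
        {
            let guard = self.committee_cache.read().unwrap();
            if let Some(cache) = guard.as_ref() {
                return (Arc::clone(cache), false);
            }
        }
        let mut guard = self.committee_cache.write().unwrap();
        // Re-check under the write lock in case another thread won the race.
        if let Some(cache) = guard.as_ref() {
            return (Arc::clone(cache), false);
        }
        // Placeholder for the expensive build.
        let built = Arc::new((0..8).rev().collect::<CommitteeCache>());
        *guard = Some(Arc::clone(&built));
        (built, true)
    }
}
```

With this shape, a build triggered on a caller's clone is immediately visible to the copy held in the historic state cache. The sketch leaves out a hard part: a real implementation would also need to detach or invalidate the shared cache when one copy is mutated (for example, advanced to a later slot), and every cache accessor would have to go through the lock.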

Per-route optimisation (problem 1)

Another, more bespoke way of handling problem 1 would be to optimise each caller (HTTP route, realistically) to be smarter about its cache management. For example, the block rewards API could build just the caches that it needs, and then update the historic state cache with these built caches. The disadvantage of this approach is that it needs to be done for every route, and it leads to states in the cache with all sorts of combinations of their caches built.
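
As an illustration of the write-back pattern, here is a minimal sketch of a single route that builds only the cache it needs and then stores the updated state. `RouteState`, `RouteHistoricCache` and `handle_committee_route` are hypothetical names, not Lighthouse code, and the route is a generic stand-in rather than the block rewards API.

```rust
use std::collections::HashMap;

/// Hypothetical state with two independently built caches.
#[derive(Clone, Default)]
struct RouteState {
    committee_cache: Option<Vec<u64>>,
    pubkey_cache: Option<Vec<u64>>,
}

/// Hypothetical historic state cache keyed by slot.
#[derive(Default)]
struct RouteHistoricCache {
    states: HashMap<u64, RouteState>,
}

/// Hypothetical route handler that needs only the committee cache.
/// Returns the cache length, or `None` if the slot is not cached.
fn handle_committee_route(cache: &mut RouteHistoricCache, slot: u64) -> Option<usize> {
    let mut state = cache.states.get(&slot)?.clone();
    let mut updated = false;
    if state.committee_cache.is_none() {
        // Placeholder for the expensive build; the pubkey cache is left alone.
        state.committee_cache = Some((0..8).rev().collect());
        updated = true;
    }
    let len = state.committee_cache.as_ref().map(Vec::len);
    if updated {
        // Write the state back so later loads can reuse the built cache.
        cache.states.insert(slot, state);
    }
    len
}
```

After one call, the stored state has its committee cache built and its pubkey cache still empty, which is exactly the mixed-cache situation described above as the downside of this approach.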

Blocked on

I think we should block this on:
