Improve block processing performance during re-org #2805

michaelsproul · 2021-11-12T06:23:26Z

Description

Consider the following re-org that frustrates Lighthouse's attempts to process blocks quickly:

Let n be a slot on an epoch boundary (n % 32 == 0).

1. Immediately prior to slot n, the preemptive state advance occurs as normal.
2. The block from slot n arrives super late (12s+), consuming the advanced state.
3. The block from slot n + 1 arrives on time, but builds upon the parent at slot n - 1. It's going to be super slow to process because its parent state is missing from the cache, meaning:
a) We need to load the full state for slot n - 1 from disk (a few hundred ms)
b) We need to transition that state through an epoch boundary (200ms)
c) We need to store the state for slot n on disk. It differs from the slot n state with block n applied, and at present we store every epoch boundary state
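The sequence above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyAdvancedStateCache` and `pre_state_steps` are hypothetical names and do not correspond to Lighthouse's actual cache or API. The model assumes a single pre-advanced state keyed by the parent block root, which is removed from the cache when a block consumes it.

```python
SLOTS_PER_EPOCH = 32


class ToyAdvancedStateCache:
    """Hypothetical cache of pre-advanced states, keyed by parent block root."""

    def __init__(self):
        self._advanced = {}

    def advance(self, parent_root, target_slot):
        # Preemptive state advance: store the parent's state advanced to target_slot.
        self._advanced[parent_root] = target_slot

    def take(self, parent_root):
        # Consuming the state removes it, so a second block on the same parent misses.
        return self._advanced.pop(parent_root, None)


def pre_state_steps(cache, parent_root, parent_slot, block_slot):
    """Return the slow-path steps needed to obtain the pre-state for a block."""
    if cache.take(parent_root) is not None:
        return []  # cache hit: the advanced state is ready to use
    steps = ["load parent state from disk"]
    # An epoch boundary is crossed if the block is in a later epoch than its parent.
    if block_slot // SLOTS_PER_EPOCH > parent_slot // SLOTS_PER_EPOCH:
        steps.append("epoch transition")
        steps.append("store epoch boundary state")
    return steps


n = 2485472  # slot on an epoch boundary (n % 32 == 0)
cache = ToyAdvancedStateCache()
cache.advance("root_n_minus_1", n)  # step 1: advance the head state to slot n

# Step 2: the late block at slot n consumes the advanced state.
late_block_steps = pre_state_steps(cache, "root_n_minus_1", n - 1, n)
# Step 3: the block at slot n + 1 builds on slot n - 1 and misses the cache.
reorg_block_steps = pre_state_steps(cache, "root_n_minus_1", n - 1, n + 1)

print(late_block_steps)
print(reorg_block_steps)
```

In this model the late block pays nothing extra, while the re-orging block pays for all three of (a), (b) and (c).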

Example

Here's an instance of this behaviour that I observed at slot n=2485472 on mainnet, resulting in block processing taking 2.5s instead of the usual 80ms (median) or 456ms (99th percentile) (metrics from sigp/lighthouse-metrics#31).

Nov 11 16:55:03.815 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 2485473, new_head: 0xb17f…a572, new_head_parent: 0x0f98…9b22, previous_slot: 2485472, previous_head: 0x1c4d…c94b, service: beacon
Nov 11 16:55:03.818 DEBG Delayed head block                      set_as_head_delay: Some(222.219889ms), imported_delay: Some(2.545278411s), observed_delay: Some(2.051036927s), block_delay: 4.818535227s, slot: 2485473, proposer_index: 52065, block_root: 0xb17fe52ce55315713a9e3eb28858a1a53039daf9e1f6406aa2c8d0d8ae11a572, service: beacon

Even though the block arrived on time (observed about 2.05s into the slot), taking 2.5s to process it meant that any attestations at this slot would have missed the head vote (had validators been running on this node).
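As a quick check of the numbers in the `Delayed head block` log line, the three component delays add up exactly to `block_delay`. The sketch below works in integer nanoseconds to avoid floating-point rounding. It assumes the three delays are consecutive phases (which the exact sum suggests) and uses the mainnet attestation deadline of one third of a 12s slot; the variable names are illustrative only.

```python
NANOS_PER_SEC = 1_000_000_000
SECONDS_PER_SLOT = 12
# Attestations are due one third of the way into the slot (4s on mainnet).
ATTESTATION_DEADLINE_NS = SECONDS_PER_SLOT * NANOS_PER_SEC // 3

# Values copied from the log line for slot 2485473.
observed_delay_ns = 2_051_036_927     # block first seen, relative to slot start
imported_delay_ns = 2_545_278_411     # time spent processing and importing
set_as_head_delay_ns = 222_219_889    # time to set the block as head
block_delay_ns = 4_818_535_227        # total reported by the log

total_ns = observed_delay_ns + imported_delay_ns + set_as_head_delay_ns
missed_by_ns = total_ns - ATTESTATION_DEADLINE_NS

print(total_ns == block_delay_ns)
print(missed_by_ns / NANOS_PER_SEC)
```

Under these assumptions the block became head roughly 0.82s after the attestation deadline, whereas at the usual 80ms median processing time it would have been ready well before it.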

Additional Info

Note that this behaviour should be quite rare, due to the infrequency of re-orgs and late blocks on mainnet (at worst ~4% of blocks are late, with very few being 12s+ late). However, if proposer boosting is adopted we may see more re-orgs of this type, where a proposer intentionally orphans the previous block despite it having been published.


michaelsproul added the major-task, optimization and A1 labels Nov 12, 2021

michaelsproul mentioned this issue Nov 12, 2021

Persistent copy-on-write beacon states #2806

Open

michaelsproul mentioned this issue Jan 25, 2022

Avoid hashing while loading states from disk #2954

Closed

michaelsproul mentioned this issue May 23, 2022

Upgrade in-memory and on-disk state representation with tree states #3206

Open

8 tasks
