
Expensive fork-choice queued attestation mutation #6206

Closed

dapplion opened this issue Jul 30, 2024 · 7 comments
Labels

bug (Something isn't working), optimization (Something to make Lighthouse run more efficiently), v5.3.0 (Q3 2024 release with database changes!)

Comments

@dapplion (Collaborator)

Description

Debugging a Holesky node, we noted that the `split_off` here was moving ~1 GB of data every time we process an attestation:

let remaining = queued_attestations.split_off(
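For context, the pattern around that `split_off` looks roughly like the sketch below (a simplified approximation, not the exact Lighthouse source). Because `split_off` copies the entire remaining tail into a fresh allocation, a single attestation stuck near the front of the queue with a far-future slot means the whole, ever-growing Vec gets copied on every call:

```rust
// Simplified sketch of the dequeue pattern; names and shape are approximate.
struct QueuedAttestation {
    slot: u64,
}

fn dequeue_attestations(
    current_slot: u64,
    queued_attestations: &mut Vec<QueuedAttestation>,
) -> Vec<QueuedAttestation> {
    // Everything before the first not-yet-eligible attestation is dequeued.
    let remaining = queued_attestations.split_off(
        queued_attestations
            .iter()
            .position(|a| a.slot >= current_slot)
            .unwrap_or(queued_attestations.len()),
    );

    // The tail is copied into a new allocation; if that tail is ~1 GB, this
    // copy happens on every attestation processed.
    std::mem::replace(queued_attestations, remaining)
}
```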

Questions:

  1. Why did the queued attestations Vec grow so large?
  2. Why is it being mutated every call?
  3. Do we have to use a Vec for this?

Tackling 2 and 3, we could switch to a `HashMap<Slot, Vec<QueuedAttestation>>`: since time strictly advances forward, each per-slot Vec is strictly append-only and whole slots can be dropped once they are processed.

WIP of this approach: https://github.com/sigp/lighthouse/compare/stable...dapplion:lighthouse:fork-choice-queued-attestations?expand=1
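As a rough illustration of the per-slot bucketing idea (placeholder names only; the WIP branch above is the authoritative version), queueing becomes a cheap append and dequeueing removes whole buckets instead of copying one large Vec:

```rust
use std::collections::HashMap;

// Placeholder type for illustration; the real struct has more fields.
struct QueuedAttestation {
    slot: u64,
}

#[derive(Default)]
struct AttestationQueue {
    by_slot: HashMap<u64, Vec<QueuedAttestation>>,
}

impl AttestationQueue {
    /// Queueing is an append into the bucket for the attestation's slot.
    fn queue(&mut self, attestation: QueuedAttestation) {
        self.by_slot
            .entry(attestation.slot)
            .or_default()
            .push(attestation);
    }

    /// Dequeue everything whose slot is now in the past by removing whole
    /// buckets; nothing else in the map is touched or copied.
    fn dequeue(&mut self, current_slot: u64) -> Vec<QueuedAttestation> {
        let ready: Vec<u64> = self
            .by_slot
            .keys()
            .copied()
            .filter(|slot| *slot < current_slot)
            .collect();

        ready
            .into_iter()
            .flat_map(|slot| self.by_slot.remove(&slot).unwrap_or_default())
            .collect()
    }
}
```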

However, we should answer question 1 too, as there may be some other lingering issue.

@michaelsproul (Member)

Nice detective work @jimmygchen!

michaelsproul added the bug, v5.3.0, and optimization labels on Jul 30, 2024
@jimmygchen (Member)

I've done some further investigation and noticed an invalid attestation (bad `data.slot`) in fork choice causing this strange behaviour. However, I think it shouldn't be possible for this to happen, as the validation here prevents such an invalid attestation from being queued at all.

I think it might be worth running a memtest on the machine.

@jimmygchen (Member)

jimmygchen commented Jul 31, 2024

Further investigation shows that this cannot happen (see the link above), and @michaelsproul found that the bad attestation's slot actually has 1 bit flipped compared to its correct slot, so it could very well be a memory/disk issue.
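As an aside, confirming a single-bit flip between two slot values is straightforward: XOR them and check the population count. The values below are made up purely for illustration:

```rust
fn is_single_bit_flip(a: u64, b: u64) -> bool {
    // Exactly one differing bit <=> the XOR has exactly one bit set.
    (a ^ b).count_ones() == 1
}

fn main() {
    // Hypothetical slots: the "corrupted" value is the correct slot with bit 30
    // flipped, which pushes it far into the future.
    let correct_slot: u64 = 2_000_000;
    let corrupted_slot = correct_slot ^ (1 << 30);
    assert!(is_single_bit_flip(correct_slot, corrupted_slot));
}
```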

I couldn't get anything from memtests on that machine. I think we could deploy v5.3.0 to more testnet nodes and see if it happens elsewhere (including slasher nodes); if it doesn't, we can probably conclude this bug is likely a hardware issue and move this issue out of v5.3.0.

@michaelsproul (Member)

If this happens more often with nodes that have been running slashers, maybe it is some UB in the slasher code, potentially the C database code!

We could try running a slasher node under Valgrind.

@jimmygchen (Member)

This undefined behaviour might have surfaced since the recent refactor in #4529, which revealed a bug in the LMDB bindings. See #6211 for a workaround fix.

Under normal circumstances the queued attestations vec should be quite small, as there are validations to prevent attestations for future slots from being queued. It would be useful to add a metric for this, though.
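A minimal sketch of what such a gauge could look like, using the generic `prometheus` crate API for illustration (Lighthouse has its own metrics wrappers, and the eventual metric name may differ):

```rust
use prometheus::{register_int_gauge, IntGauge};

// Hypothetical metric name, for illustration only.
fn queued_attestations_gauge() -> IntGauge {
    register_int_gauge!(
        "beacon_fork_choice_queued_attestations",
        "Number of attestations currently queued in fork choice"
    )
    .expect("gauge can be registered")
}

// Update the gauge whenever the queue is mutated (enqueue/dequeue).
fn record_queue_length(gauge: &IntGauge, queue_len: usize) {
    gauge.set(queue_len as i64);
}
```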

@dapplion (Collaborator, Author)

dapplion commented Aug 1, 2024

@jimmygchen this PR adds a metric for the queued attestations vec:

Closing this issue as won't fix

dapplion closed this as not planned on Aug 1, 2024
@michaelsproul (Member)

For reference, the metric to check for this issue is beacon_fork_choice_process_attestation_seconds_*. Processing attestations in fork choice should be very quick (<5ms per attestation), but if this issue is occurring then that metric will blow way up (we've seen it >100ms per attestation).

cc @chong-he
