Investigate all uses of canonical_header_hash #251

Closed
telackey opened this issue Jun 16, 2023 · 7 comments

@telackey
Contributor

telackey commented Jun 16, 2023

It is very easy to shoot yourself in the foot when using canonical_header_hash(), such that it gets executed in a nested loop thousands or millions of times.

We need to make sure that all current uses are efficient and, if they are not, either fix them or make a more fundamental change so that we no longer need canonical_header_hash() at all, for example by marking blocks as canonical/non-canonical rather than by executing a function.

The latter would be ideal long term, because we don't really want to rely on complicated DB functions; they are very difficult to observe or debug. How urgent that change is, though, depends at least in part on determining how much of a problem this is right now.
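
For illustration, a query shaped roughly like this can hit the problem (table and column names here are only illustrative, not taken from the actual queries):

```sql
-- Illustrative only: with the canonical check inside the join condition,
-- Postgres may evaluate canonical_header_hash() once per candidate row of the
-- join rather than once per height, i.e. thousands or millions of times.
SELECT t.*
FROM eth.transaction_cids t
JOIN eth.header_cids h
  ON h.block_number = t.block_number
 AND h.block_hash   = canonical_header_hash(h.block_number)
WHERE t.block_number BETWEEN $1 AND $2;
```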

@telackey telackey self-assigned this Jun 16, 2023
@i-norden
Collaborator

i-norden commented Jun 16, 2023

I wonder how feasible it would be to modify the existing Postgres queries so that they only apply their canonical_header_hash(block_number) condition if e.g. block_number >= MAX(block_number) - 64, and what the performance overhead would then be like for queries that span both regions and for queries that span or target only the region below MAX - 64.
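
A rough sketch of what that could look like (the 64-block cutoff and the exact query shape are assumptions, not the current queries):

```sql
-- Hypothetical sketch: only pay for the canonical check in the reorg-prone
-- region near the head; below the cutoff, rows are accepted as-is (or handled
-- by a separate marking/cleanup mechanism).
WITH head AS (
    SELECT MAX(block_number) AS max_bn FROM eth.header_cids
)
SELECT h.*
FROM eth.header_cids h, head
WHERE h.block_number <= $1
  AND (h.block_number < head.max_bn - 64                        -- settled region: skip the check
       OR h.block_hash = canonical_header_hash(h.block_number)  -- frothy region: keep the check
      );
```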

@i-norden
Collaborator

This way we could continue to use it to weigh canonicity in the region of the chain that is still subject to reorgs (2 epochs, each with 32 slots, although slots don't necessarily have to be populated, so it's not a perfect map to 64 execution layer blocks), and below that cutoff we would have a mechanism for marking/cleaning out non-canonical data.

@i-norden
Collaborator

i-norden commented Jun 16, 2023

If we are okay with no longer persisting non-canon blocks for analytical purposes, we might be able to simply add a UC to eth.header_cids.block_number and let statediffing geth handle reorgs at the source by UPSERTing the old blocks away.

In this context, we would no longer use canonical_header_hash when querying the DB as whatever the single current state of the DB is (including within the 64 block "frothy" region) would be considered canonical (and would be consistent with what geth considers canonical).

The statediffing service simply subscribes to a stream of latest blocks from geth's core.BlockChain. This subscription includes blocks from reorgs, so the service can overwrite (or tag/shuffle somewhere else, if that is more desirable than simply UPSERTing) non-canon blocks. With a UC on eth.header_cids.block_number this would occur passively; I don't think we'd need to change anything other than making some minor modifications to the current INSERT statements to add the upsert logic. But if we wanted to, we could also do something more sophisticated in order to continue to persist the non-canon blocks for analytical purposes in some form/place.
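
A minimal sketch of the constraint and upsert, assuming the existing eth.header_cids columns (the constraint name and the column list in the INSERT are illustrative):

```sql
-- Hypothetical sketch: one row per height; reorged blocks are overwritten in place.
ALTER TABLE eth.header_cids
    ADD CONSTRAINT header_cids_block_number_uc UNIQUE (block_number);

-- The statediffing service's existing INSERT gains upsert logic, roughly:
INSERT INTO eth.header_cids (block_number, block_hash, parent_hash, cid)
VALUES ($1, $2, $3, $4)
ON CONFLICT (block_number) DO UPDATE
    SET block_hash  = EXCLUDED.block_hash,
        parent_hash = EXCLUDED.parent_hash,
        cid         = EXCLUDED.cid;
```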

In the latter case, where we don't rely on a DB constraint, we need additional logic in the statediffing service to track when a block coming off the feed is a reorg block (the feed doesn't signal this itself). Since reorgs can't go deeper than 2 epochs from head, I think one way to do this would be to maintain an in-memory map of the latest ~64 (block_hash, block_number) tuples statediffed into the database; if we get a header off the subscription channel that has the same number but a different hash than one of these processed blocks, we know it is a reorg/replacement.

This all relies on the assumption that geth is handling reorgs as expected.

@i-norden
Collaborator

Somewhat related: https://github.com/vulcanize/ops/issues/46#issuecomment-1474229179. We haven't seen any post-merge reorgs longer than 1 block.

@i-norden
Collaborator

Since we can't have FKs in place in TimescaleDB, we cannot cascade the UPSERT to remove all linked rows, so relying on the unique constraint doesn't work: it would leave non-canonical records in the other tables.

@dboreham

My gut thought is not to upsert, but instead to retain the "dup" rows and have a field that denotes the most favored row.
Regular queries can sort on that field with LIMIT 1 (this gives you the good block if there is one, but still gives you a young block if it's too early for it to be canonical). Queries that are interested in the chaff blocks can omit that and get them all.
I can't think of a performance impact from not having the UC, and modifying previously inserted data gives me a queasy feeling from a consistency perspective (e.g. what if we crash in the middle of fixing up the old data? We'd need to be careful to use a transaction with the correct lifetime).
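
A sketch of that approach, using a hypothetical favored column (the name, type, and query shape are placeholders, not a settled design):

```sql
-- Hypothetical sketch: keep duplicate rows per height and mark the preferred one.
ALTER TABLE eth.header_cids
    ADD COLUMN favored BOOLEAN NOT NULL DEFAULT TRUE;

-- A regular query takes the most favored row at a height; a young block is
-- still returned even if it is too early to call it canonical.
SELECT *
FROM eth.header_cids
WHERE block_number = $1
ORDER BY favored DESC
LIMIT 1;

-- Queries interested in the "chaff" blocks simply omit the ORDER BY / LIMIT.
```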

@telackey
Contributor Author

telackey commented Oct 4, 2023

Fixed.

@telackey telackey closed this as completed Oct 4, 2023