Investigate all uses of canonical_header_hash #251

Closed
telackey opened this issue Jun 16, 2023 · 7 comments

@telackey
Contributor

telackey commented Jun 16, 2023

It is very easy to shoot yourself in the foot when using canonical_header_hash(), such that it gets executed in a nested loop thousands or millions of times.

We need to make sure that all current uses are efficient and, if they are not, either fix them or make a more fundamental change so that we no longer need canonical_header_hash() at all, for example by marking blocks as canonical/non-canonical rather than by executing a function.

The latter would be ideal long term, because we don't really want to rely on complicated DB functions; they are very difficult to observe or debug. How urgent that change is, though, depends at least in part on determining how much of a problem this is right now.
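
For illustration, a query shaped roughly like this can hit the problem (table and column names here are only illustrative, not taken from the actual queries):

```sql
-- Illustrative only: with the canonical check inside the join condition,
-- Postgres may evaluate canonical_header_hash() once per candidate row of the
-- join rather than once per height, i.e. thousands or millions of times.
SELECT t.*
FROM eth.transaction_cids t
JOIN eth.header_cids h
  ON h.block_number = t.block_number
 AND h.block_hash   = canonical_header_hash(h.block_number)
WHERE t.block_number BETWEEN $1 AND $2;
```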

@telackey telackey self-assigned this Jun 16, 2023
@i-norden
Collaborator

i-norden commented Jun 16, 2023

I wonder how feasible it would be to modify the existing Postgres queries so that they only apply their canonical_header_hash(block_number) condition if e.g. block_number >= MAX(block_number) - 64, and what the performance overhead would then be like for queries that span both regions and for queries that span or target only the region below MAX - 64.
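
A rough sketch of what that could look like (the 64-block cutoff and the exact query shape are assumptions, not the current queries):

```sql
-- Hypothetical sketch: only pay for the canonical check in the reorg-prone
-- region near the head; below the cutoff, rows are accepted as-is (or handled
-- by a separate marking/cleanup mechanism).
WITH head AS (
    SELECT MAX(block_number) AS max_bn FROM eth.header_cids
)
SELECT h.*
FROM eth.header_cids h, head
WHERE h.block_number <= $1
  AND (h.block_number < head.max_bn - 64                        -- settled region: skip the check
       OR h.block_hash = canonical_header_hash(h.block_number)  -- frothy region: keep the check
      );
```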

@i-norden
Collaborator

This way we could continue to use it to weigh canonicity in the region of the chain that is still subject to reorgs (2 epochs, each with 32 slots, although slots don't necessarily have to be populated, so it's not a perfect map to 64 execution layer blocks), and below that cutoff we would have a mechanism for marking/cleaning out non-canonical data.

@i-norden
Collaborator

i-norden commented Jun 16, 2023

If we are okay with no longer persisting non-canon blocks for analytical purposes, we might be able to simply add a UC to eth.header_cids.block_number and let statediffing geth handle reorgs at the source by UPSERTing the old blocks away.

In this context, we would no longer use canonical_header_hash when querying the DB as whatever the single current state of the DB is (including within the 64 block "frothy" region) would be considered canonical (and would be consistent with what geth considers canonical).

The statediffing service simply subscribes to a stream of latest blocks from geth's core.BlockChain. This subscription includes blocks from reorgs, so the service can overwrite (or tag/shuffle somewhere else, if that is more desirable than simply UPSERTing) non-canon blocks. With a UC on eth.header_cids.block_number this would occur passively; I don't think we'd need to change anything other than making some minor modifications to the current INSERT statements to add the upsert logic. But if we wanted to, we could also do something more sophisticated in order to continue to persist the non-canon blocks for analytical purposes in some form/place.
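
A minimal sketch of the constraint and upsert, assuming the existing eth.header_cids columns (the constraint name and the column list in the INSERT are illustrative):

```sql
-- Hypothetical sketch: one row per height; reorged blocks are overwritten in place.
ALTER TABLE eth.header_cids
    ADD CONSTRAINT header_cids_block_number_uc UNIQUE (block_number);

-- The statediffing service's existing INSERT gains upsert logic, roughly:
INSERT INTO eth.header_cids (block_number, block_hash, parent_hash, cid)
VALUES ($1, $2, $3, $4)
ON CONFLICT (block_number) DO UPDATE
    SET block_hash  = EXCLUDED.block_hash,
        parent_hash = EXCLUDED.parent_hash,
        cid         = EXCLUDED.cid;
```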

In the latter case, where we don't rely on a DB constraint, we need additional logic in the statediffing service to track when a block coming off the feed is a reorg block (the feed doesn't signal this itself). Since reorgs can't go deeper than 2 epochs from head, I think one way to do this would be to maintain an in-memory map of the latest ~64 (block_hash, block_number) tuples statediffed into the database; if we get a header off the subscription channel that has the same number but a different hash than one of these processed blocks, we know it is a reorg/replacement.

This all relies on the assumption that geth is handling reorgs as expected.

@i-norden
Collaborator

Somewhat related: https://github.com/vulcanize/ops/issues/46#issuecomment-1474229179. We haven't seen any post-merge reorgs longer than 1 block.

@i-norden
Collaborator

Since we can't have FKs in place in TimescaleDB, we cannot cascade the UPSERT to remove all linked rows, so relying on the unique constraint doesn't work: it would leave non-canonical records in the other tables.

@dboreham

My gut thought is not to upsert, but instead to retain the "dup" rows and have a field that denotes the most favored row.
Regular queries can sort on that field with LIMIT 1 (this gives you the good block if there is one, but still gives you a young block if it's too early for it to be canonical). Queries that are interested in the chaff blocks can omit that and get them all.
I can't think of a performance impact from not having the UC, and modifying previously inserted data gives me a queasy feeling from a consistency perspective (e.g. what if we crash in the middle of fixing up the old data? We'd need to be careful to use a transaction with the correct lifetime).
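
A sketch of that approach, using a hypothetical favored column (the name, type, and query shape are placeholders, not a settled design):

```sql
-- Hypothetical sketch: keep duplicate rows per height and mark the preferred one.
ALTER TABLE eth.header_cids
    ADD COLUMN favored BOOLEAN NOT NULL DEFAULT TRUE;

-- A regular query takes the most favored row at a height; a young block is
-- still returned even if it is too early to call it canonical.
SELECT *
FROM eth.header_cids
WHERE block_number = $1
ORDER BY favored DESC
LIMIT 1;

-- Queries interested in the "chaff" blocks simply omit the ORDER BY / LIMIT.
```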

@telackey
Contributor Author

telackey commented Oct 4, 2023

Fixed.

@telackey telackey closed this as completed Oct 4, 2023