enhancement: add block store #1369

Closed
wants to merge 14 commits into from

Conversation

lostman
Contributor

@lostman lostman commented Sep 21, 2023

Description

Closes #990.

This PR adds an index_block_data table where we store serialized BlockData structs. We can use these to get a new indexer up to speed without re-fetching blocks from the client.

The table has a trigger that enforces a contiguous range of blocks: no block_height may be missing.
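
To illustrate the contiguity rule outside the database, here is a minimal Rust sketch. It is not the trigger from this PR: `BlockStore`, `insert`, and `GapError` are hypothetical names, and the rule it encodes ("a new row must have height max + 1, and an empty store accepts any starting height") is an assumption about what the trigger enforces.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the `index_block_data` table.
#[derive(Default)]
pub struct BlockStore {
    rows: BTreeMap<u32, Vec<u8>>,
}

/// Returned when an insert would leave a hole in the stored range.
#[derive(Debug, PartialEq)]
pub struct GapError {
    pub expected: u32,
    pub got: u32,
}

impl BlockStore {
    /// Accepts a block only if it extends the stored range by exactly one.
    /// Assumption: an empty store accepts any starting height.
    pub fn insert(&mut self, height: u32, data: Vec<u8>) -> Result<(), GapError> {
        if let Some((&max, _)) = self.rows.iter().next_back() {
            let expected = max + 1;
            if height != expected {
                return Err(GapError { expected, got: height });
            }
        }
        self.rows.insert(height, data);
        Ok(())
    }

    /// Number of stored blocks.
    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

With this rule, inserting heights 1 and 2 succeeds, while inserting 4 next is rejected because 3 is missing.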

This PR adds two CLI flags: --enable-block-store and --remove-block-data.

Because a single Block Sync task handles all block fetching when --enable-block-store is on, removing the stored block data with the --remove-block-data CLI flag has a side effect: an indexer that has already indexed some blocks must wait "a while", until Block Sync has re-fetched blocks up to the indexer's resume height, before it can fetch any new blocks from the database.

An example log output is presented below.

2023-09-22T10:53:16.658464Z  INFO fuel_indexer::commands::run: 107: Removing stored blocks.
2023-09-22T10:53:16.807424Z  INFO fuel_indexer::commands::run: 110: Succesfully removed 420 blocks.
2023-09-22T10:53:16.818583Z  INFO fuel_indexer::service: 419: Block Sync: starting from Block#1
2023-09-22T10:53:16.858618Z  INFO fuel_indexer::service: 392: Resuming Indexer(fuellabs.explorer) from block 245661
2023-09-22T10:53:17.487850Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 1-20.
2023-09-22T10:53:18.538448Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 21-40.
2023-09-22T10:53:18.923075Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 41-60.
2023-09-22T10:53:19.250824Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 61-80.
2023-09-22T10:53:19.546749Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 81-100.
2023-09-22T10:53:19.836243Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 101-120.
2023-09-22T10:53:20.146883Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 121-140.
2023-09-22T10:53:20.450078Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 141-160.
2023-09-22T10:53:20.780480Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 161-180.
2023-09-22T10:53:21.073758Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 181-200.
2023-09-22T10:53:21.371959Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 201-220.
2023-09-22T10:53:21.726376Z  INFO fuel_indexer::database: 219: Database loading schema for Indexer(fuellabs.explorer) with Version(40801efcb4845b78465ceb83ec94d87c3596b5d109a73ded7ef18315412f3701).
2023-09-22T10:53:21.752420Z  INFO fuel_indexer::service: 205: Registered Indexer(fuellabs.explorer)
2023-09-22T10:53:21.752452Z  INFO fuel_indexer::executor: 95: Indexer(fuellabs.explorer) subscribing to Fuel node at beta-4.fuel.network:80
2023-09-22T10:53:21.752545Z  WARN fuel_indexer::executor: 104: No end_block specified in the manifest. Indexer(fuellabs.explorer) will run forever.
2023-09-22T10:53:21.755446Z  INFO fuel_indexer_lib::utils: 144: Parsed SocketAddr '127.0.0.1:29987' from 'localhost:29987'
2023-09-22T10:53:21.782014Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 221-240.
2023-09-22T10:53:21.796199Z  INFO fuel_indexer::service: 392: Resuming Indexer(fuellabs.explorer) from block 245661
2023-09-22T10:53:21.797012Z  INFO fuel_indexer::executor: 196: Indexer(fuellabs.explorer) has no new blocks to process, sleeping zzZZ. (Empty response #1)
2023-09-22T10:53:22.095679Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 241-260.
2023-09-22T10:53:22.411824Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 261-280.
2023-09-22T10:53:22.742317Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 281-300.
2023-09-22T10:53:22.814914Z  INFO fuel_indexer::executor: 196: Indexer(fuellabs.explorer) has no new blocks to process, sleeping zzZZ. (Empty response #2)
2023-09-22T10:53:23.050226Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 301-320.
2023-09-22T10:53:23.371112Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 321-340.
2023-09-22T10:53:23.689745Z  INFO fuel_indexer::service: 441: Block Sync: retrieved blocks: 341-360.
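
The length of that wait can be estimated from the log timestamps. The sketch below is only a back-of-the-envelope calculation: the function names are hypothetical, and it assumes Block Sync keeps a constant rate, which may not hold for later, larger blocks.

```rust
/// Blocks per second between two log timestamps, given as seconds
/// within the same minute (for example 17.487850 for 10:53:17.487850).
pub fn sync_rate(blocks: u32, t_start_s: f64, t_end_s: f64) -> f64 {
    blocks as f64 / (t_end_s - t_start_s)
}

/// Seconds until the store reaches `target`, assuming a constant rate.
pub fn eta_seconds(current: u32, target: u32, rate: f64) -> f64 {
    target.saturating_sub(current) as f64 / rate
}
```

Between the "retrieved blocks: 1-20" line (10:53:17.487850) and the "341-360" line (10:53:23.689745), 340 more blocks arrive, so `sync_rate(340, 17.487850, 23.689745)` is roughly 55 blocks per second. At that rate, `eta_seconds(360, 245_661, rate)` comes to on the order of 75 minutes before the indexer resuming from block 245661 sees new data.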

Checking BlockData format changes

In this PR, I have implemented a serialization check. When starting up, the Block Sync task fetches blocks from the client, serializes them, and compares them against the same blocks already in the database. If they do not match, the blocks in the database are purged and re-synced.

The check is here:

https://github.com/FuelLabs/fuel-indexer/pull/1369/files#diff-c08046bf0074327316a6b1016f142fdcccc38eec1171f6e43dccd338228cb2f1R499-R518
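
For readers who do not want to open the diff, the idea of the check can be sketched as follows. This is not the code from the PR: `check_format` and `CheckResult` are hypothetical names, a `BTreeMap` stands in for the database table, and `fresh` is assumed to hold blocks that were fetched from the client and serialized with the currently running code.

```rust
use std::collections::BTreeMap;

/// Outcome of the startup check.
#[derive(Debug, PartialEq)]
pub enum CheckResult {
    /// Stored bytes match a fresh serialization; the store can be reused.
    Compatible,
    /// At least one block differs; the store was purged.
    Purged { first_mismatch: u32 },
}

/// Compares freshly serialized sample blocks with the stored ones.
/// Heights that are not in the store yet are skipped, not treated as a mismatch.
pub fn check_format(
    store: &mut BTreeMap<u32, Vec<u8>>,
    fresh: &[(u32, Vec<u8>)],
) -> CheckResult {
    let mismatch = fresh.iter().find(|(height, bytes)| {
        store.get(height).map_or(false, |stored| stored != bytes)
    });
    match mismatch {
        Some((height, _)) => {
            let height = *height;
            // Stale format detected: drop everything so Block Sync starts over.
            store.clear();
            CheckResult::Purged { first_mismatch: height }
        }
        None => CheckResult::Compatible,
    }
}
```

Comparing bytes instead of a version number means nobody has to remember to bump anything when BlockData changes, at the cost of a few client requests at startup.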

Testing steps

Run Fuel Indexer with block store enabled:

cargo run -p fuel-indexer -- run --fuel-node-host beta-4.fuel.network --fuel-node-port 80 --replace-indexer --run-migrations --manifest examples/fuel-explorer/fuel-explorer/fuel_explorer.manifest.yaml --enable-block-store

Restart and remove data:

cargo run -p fuel-indexer -- run --fuel-node-host beta-4.fuel.network --fuel-node-port 80 --replace-indexer --run-migrations --manifest examples/fuel-explorer/fuel-explorer/fuel_explorer.manifest.yaml --enable-block-store --remove-block-data

Changelog

  • Add --enable-block-store flag, which enables saving blocks in the database.
  • Add --remove-block-data flag, which causes Fuel Indexer to remove all saved blocks from the database.
  • Add Block Sync task. Only the Block Sync task queries the Fuel Client when --enable-block-store is on. The individual indexers fetch blocks from the database.
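
The division of labour in the last item can be sketched as a single writer and many readers. The names below (`SharedStore`, `save_batch`, `load_batch`) are hypothetical, and a mutex-protected map stands in for the database table; the real code uses SQL queries.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared store standing in for the `index_block_data` table.
pub type SharedStore = Arc<Mutex<BTreeMap<u32, Vec<u8>>>>;

/// Block Sync side: the only writer. Saves a batch fetched from the client.
pub fn save_batch(store: &SharedStore, batch: Vec<(u32, Vec<u8>)>) {
    let mut rows = store.lock().unwrap();
    for (height, data) in batch {
        rows.insert(height, data);
    }
}

/// Indexer side: reads up to `limit` blocks starting at `start`.
/// An empty result means the store has not reached `start` yet,
/// so the indexer should sleep and retry.
pub fn load_batch(store: &SharedStore, start: u32, limit: usize) -> Vec<(u32, Vec<u8>)> {
    let rows = store.lock().unwrap();
    let out: Vec<(u32, Vec<u8>)> = rows
        .range(start..)
        .take(limit)
        .map(|(height, data)| (*height, data.clone()))
        .collect();
    out
}
```

An indexer resuming from block 245661 while the store only holds blocks 1-20 gets an empty batch, which corresponds to the "no new blocks to process, sleeping" lines in the log.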

Stats

psql -c "SELECT MAX(block_height) FROM index_block_data;"
   max
---------
 3885359
(1 row)
psql -c "SELECT pg_size_pretty(pg_total_relation_size('index_block_data'));"
 pg_size_pretty
----------------
 2703 MB
(1 row)
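
These two numbers give a rough storage cost per block. The helper below is a hypothetical one-liner; it assumes the table holds one row per height up to MAX(block_height) and that pg_size_pretty reports binary megabytes. Note that pg_total_relation_size includes indexes and TOAST data, not just the table heap.

```rust
/// Average stored bytes per block, given the total relation size in
/// binary megabytes and the number of stored blocks.
pub fn bytes_per_block(total_mb: f64, blocks: u64) -> f64 {
    total_mb * 1024.0 * 1024.0 / blocks as f64
}
```

`bytes_per_block(2703.0, 3_885_359)` comes to roughly 730 bytes per stored block.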

QA script with --enable-block-store:

system: Darwin 22.1.0 arm64
date: 2023-10-02
host: MBP.Home
branch: maciej/blockstore-v2
runtime: 35 minutes
missing blocks: 0
avg memory: 51.5kB
avg cpu: 0.3%
avg blocks/sec: 148.6
index size: 8.1kB per block

----------------


run: 1
    runtime:        1.4 minutes
    start block:    0
    end block:      25000
    avg memory:     80.6kB
    stdv memory:    51.2kB
    avg cpu:        0.5%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     297.6
    index size:     8.206kB per block

run: 2
    runtime:        1.6 minutes
    start block:    385876
    end block:      410876
    avg memory:     105.2kB
    stdv memory:    45.9kB
    avg cpu:        0.6%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     268.8
    index size:     8.025kB per block

run: 3
    runtime:        1.9 minutes
    start block:    771752
    end block:      796752
    avg memory:     67.9kB
    stdv memory:    54.4kB
    avg cpu:        0.4%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     215.5
    index size:     8.022kB per block

run: 4
    runtime:        3.0 minutes
    start block:    1157628
    end block:      1182628
    avg memory:     43.8kB
    stdv memory:    43.4kB
    avg cpu:        0.3%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     138.9
    index size:     7.994kB per block

run: 5
    runtime:        3.7 minutes
    start block:    1543504
    end block:      1568504
    avg memory:     48.7kB
    stdv memory:    45.7kB
    avg cpu:        0.3%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     113.6
    index size:     7.997kB per block

run: 6
    runtime:        3.8 minutes
    start block:    1929380
    end block:      1954380
    avg memory:     48.6kB
    stdv memory:    46.2kB
    avg cpu:        0.3%
    stdv cpu:       0.3%
    missing blocks: 0
    blocks/sec:     110.1
    index size:     8.084kB per block

run: 7
    runtime:        4.2 minutes
    start block:    2315256
    end block:      2340256
    avg memory:     36.0kB
    stdv memory:    35.5kB
    avg cpu:        0.2%
    stdv cpu:       0.2%
    missing blocks: 0
    blocks/sec:     98.0
    index size:     8.036kB per block

run: 8
    runtime:        4.8 minutes
    start block:    2701132
    end block:      2726132
    avg memory:     33.1kB
    stdv memory:    36.0kB
    avg cpu:        0.2%
    stdv cpu:       0.2%
    missing blocks: 0
    blocks/sec:     87.7
    index size:     8.057kB per block

run: 9
    runtime:        5.1 minutes
    start block:    3087008
    end block:      3112008
    avg memory:     29.2kB
    stdv memory:    29.9kB
    avg cpu:        0.2%
    stdv cpu:       0.2%
    missing blocks: 0
    blocks/sec:     82.2
    index size:     8.047kB per block

run: 10
    runtime:        5.7 minutes
    start block:    3472884
    end block:      3497884
    avg memory:     21.7kB
    stdv memory:    16.3kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     73.3
    index size:     8.276kB per block

QA script without block store:


system: Darwin 22.1.0 arm64
date: 2023-10-03
host: Maciejs-MBP.Home
branch: maciej/blockstore-v2
runtime: 125 minutes
missing blocks: 0
avg memory: 28.9kB
avg cpu: 0.1%
avg blocks/sec: 33.6
index size: 8.0kB per block

----------------


run: 1
    runtime:        9.3 minutes
    start block:    0
    end block:      25000
    avg memory:     30.2kB
    stdv memory:    44.8kB
    avg cpu:        0.1%
    stdv cpu:       0.0%
    missing blocks: 0
    blocks/sec:     44.8
    index size:     7.988kB per block

run: 2
    runtime:        10.7 minutes
    start block:    395337
    end block:      420337
    avg memory:     48.0kB
    stdv memory:    72.0kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     38.9
    index size:     7.948kB per block

run: 3
    runtime:        12.7 minutes
    start block:    790674
    end block:      815674
    avg memory:     45.4kB
    stdv memory:    80.0kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     32.8
    index size:     7.947kB per block

run: 4
    runtime:        13.9 minutes
    start block:    1186011
    end block:      1211011
    avg memory:     30.2kB
    stdv memory:    52.6kB
    avg cpu:        0.1%
    stdv cpu:       0.0%
    missing blocks: 0
    blocks/sec:     30.0
    index size:     7.927kB per block

run: 5
    runtime:        14.2 minutes
    start block:    1581348
    end block:      1606348
    avg memory:     37.6kB
    stdv memory:    68.6kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     29.4
    index size:     7.937kB per block

run: 6
    runtime:        13.0 minutes
    start block:    1976685
    end block:      2001685
    avg memory:     17.8kB
    stdv memory:    11.3kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     32.1
    index size:     8.028kB per block

run: 7
    runtime:        12.9 minutes
    start block:    2372022
    end block:      2397022
    avg memory:     17.3kB
    stdv memory:    5.2kB
    avg cpu:        0.0%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     32.3
    index size:     8.032kB per block

run: 8
    runtime:        12.9 minutes
    start block:    2767359
    end block:      2792359
    avg memory:     23.5kB
    stdv memory:    28.0kB
    avg cpu:        0.1%
    stdv cpu:       0.2%
    missing blocks: 0
    blocks/sec:     32.3
    index size:     8.05kB per block

run: 9
    runtime:        13.0 minutes
    start block:    3162696
    end block:      3187696
    avg memory:     17.8kB
    stdv memory:    15.5kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     32.1
    index size:     8.04kB per block

run: 10
    runtime:        13.4 minutes
    start block:    3558033
    end block:      3583033
    avg memory:     21.3kB
    stdv memory:    16.5kB
    avg cpu:        0.1%
    stdv cpu:       0.1%
    missing blocks: 0
    blocks/sec:     31.2
    index size:     8.273kB per block
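
To put a number on the difference, the blocks/sec figures from the two reports can be compared directly. The sketch below only copies those figures into arrays; the names are hypothetical. The runs in the two reports start at slightly different block heights, so the per-run ratios are approximate.

```rust
/// Ratio of two throughput figures in blocks/sec.
pub fn speedup(with_store: f64, without_store: f64) -> f64 {
    with_store / without_store
}

/// Per-run blocks/sec copied from the QA report with --enable-block-store.
pub const WITH_STORE: [f64; 10] =
    [297.6, 268.8, 215.5, 138.9, 113.6, 110.1, 98.0, 87.7, 82.2, 73.3];

/// Per-run blocks/sec copied from the QA report without block store.
pub const WITHOUT_STORE: [f64; 10] =
    [44.8, 38.9, 32.8, 30.0, 29.4, 32.1, 32.3, 32.3, 32.1, 31.2];

/// Speedup for each of the ten runs, in report order.
pub fn per_run_speedups() -> Vec<f64> {
    WITH_STORE
        .iter()
        .zip(WITHOUT_STORE.iter())
        .map(|(with, without)| speedup(*with, *without))
        .collect()
}
```

On the overall averages, `speedup(148.6, 33.6)` is about 4.4x. Per run, the ratio falls from about 6.6x at the start of the chain to about 2.3x in the last run, because throughput with the block store drops as block height grows while throughput without it stays close to 32 blocks/sec.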

@lostman lostman force-pushed the maciej/blockstore-v2 branch 3 times, most recently from a1fd7e0 to bd10230 Compare September 22, 2023 13:48
@lostman lostman marked this pull request as ready for review September 22, 2023 16:03
Contributor

@deekerno deekerno left a comment

Did a first pass, left a few comments.

I have one suggestion and one question:

  • "We need a mechanism for Fuel Indexer to recognize this situation, remove the stale data, and proceed with syncing blocks from scratch."
    -- I'm thinking that since there's no way to do reflection, we can serialize the types of each field of the BlockData struct signature into an arbitrary byte string and store that in a single row of a table. Then, at start-up, we serialize the BlockData type in use and compare it against the persisted byte string. If they don't match, then a re-sync of formerly saved blocks should start.
  • Has any testing been done as to whether latency will be added by forcing indexers to wait for blocks by hitting a database on a loop? If so, how significant is the slowdown?
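
The suggestion in the first item could look roughly like the sketch below. Everything here is an assumption for illustration: the `BlockData` shown is a stand-in with made-up fields, and `block_data_signature` and `needs_resync` are hypothetical names. The field list is maintained by hand, so a real version would probably generate it with a macro. Also note that the output of `std::any::type_name` is not guaranteed to be stable across compiler versions, so a toolchain upgrade could trigger a spurious re-sync.

```rust
use std::any::type_name;

/// Hypothetical stand-in; the real `BlockData` has different fields.
#[allow(dead_code)]
pub struct BlockData {
    pub height: u32,
    pub id: [u8; 32],
    pub time: i64,
}

/// Builds a byte string from the field names and their type names.
/// Must be kept in sync with the struct by hand in this sketch.
pub fn block_data_signature() -> Vec<u8> {
    let fields = [
        ("height", type_name::<u32>()),
        ("id", type_name::<[u8; 32]>()),
        ("time", type_name::<i64>()),
    ];
    fields
        .iter()
        .map(|(name, ty)| format!("{name}:{ty};"))
        .collect::<String>()
        .into_bytes()
}

/// True when the persisted signature is missing or differs from the
/// current one, meaning the stored blocks should be re-synced.
pub fn needs_resync(persisted: Option<&[u8]>) -> bool {
    persisted != Some(block_data_signature().as_slice())
}
```

At start-up, the indexer would load the persisted byte string from its single-row table, call `needs_resync`, and store the fresh signature after purging.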

packages/fuel-indexer/src/executor.rs Outdated Show resolved Hide resolved
packages/fuel-indexer/src/executor.rs Outdated Show resolved Hide resolved
packages/fuel-indexer/src/service.rs Outdated Show resolved Hide resolved
packages/fuel-indexer-database/postgres/src/lib.rs Outdated Show resolved Hide resolved
@lostman lostman self-assigned this Sep 26, 2023
Contributor

@ra0x3 ra0x3 left a comment

  • I see I'm still ping'd for review on this.
  • However given @deekerno's feedback and the conflicts I'll just defer my review :)

@lostman lostman force-pushed the maciej/blockstore-v2 branch 2 times, most recently from e2bb077 to cd75175 Compare September 27, 2023 13:29
@lostman
Contributor Author

lostman commented Sep 27, 2023

@deekerno, I've added a function that checks for BlockData serialization changes:

https://github.com/FuelLabs/fuel-indexer/pull/1369/files#diff-c08046bf0074327316a6b1016f142fdcccc38eec1171f6e43dccd338228cb2f1R479-R484

No versioning. Pull some blocks, serialize, and compare. Once, at startup. I couldn't think of any other way to check this automatically. Versioning could work, but it would be manual—we would have to bump the version number. This way feels nicer.

@ra0x3
Contributor

ra0x3 commented Sep 28, 2023

@lostman

  • Some questions I have about this that will help it get merged faster -- having hard numbers for these would be helpful:
    • What effect does this have on resource usage? (CPU, memory, etc)
    • How much faster exactly does this index?
    • What effect does this have on database/index size?
    • What is the mitigation plan for if something in BlockData changes?
    • Ideally we shouldn't have to really do anything other than just update BlockData (the literal struct) when anything in BlockData changes - everything else downstream from that should just automagically update itself

@lostman
Contributor Author

lostman commented Sep 29, 2023

@ra0x3

What effect does this have on resource usage? (CPU, memory, etc.)

I'll sync some blocks and see if I can run the QA tests with block store enabled. That will give us some numbers.

I'll update the description once I have something.

@lostman lostman requested a review from ra0x3 October 4, 2023 08:54
@lostman
Contributor Author

lostman commented Oct 5, 2023

@deekerno

Has any testing been done as to whether latency will be added by forcing indexers to wait for blocks by hitting a database on a loop? If so, how significant is the slowdown?

The logic is the same as when the indexers fetch the blocks from the client. The only difference is that they do a SELECT instead of an HTTP call.

Or do you refer to a situation when multiple indexers are fast-forwarded and fetching blocks from the database?

I haven't tested the latency, but I doubt it would be an issue. All the indexers do is a simple SELECT.

@deekerno
Contributor

deekerno commented Oct 5, 2023

@deekerno

Has any testing been done as to whether latency will be added by forcing indexers to wait for blocks by hitting a database on a loop? If so, how significant is the slowdown?

The logic is the same as when the indexers fetch the blocks from the client. The only difference is that they do a SELECT instead of an HTTP call.

Or do you refer to a situation when multiple indexers are fast-forwarded and fetching blocks from the database?

I haven't tested the latency, but I doubt it would be an issue. All the indexers do is a simple SELECT.

I think my concern was around the following situation:

  • Currently, each individual indexer hits the client on a loop and receives a response with blocks when they're available.
  • This would change it so that a task is retrieving the blocks, adding them to the database, and then the indexers are hitting the database on a loop.

I was wondering if this solution introduced any noticeable increase in latency, specifically at the head of the chain. But honestly, I think my concerns are probably unfounded given the benefits shown by the QA results.

@ra0x3
Contributor

ra0x3 commented Oct 6, 2023

@lostman

  • Oh wow the QA run results are quite telling in terms of indexing speed 😅

  • My question is, what's the use case here? When would I want to use --enable-block-store, and when would I want to use --remove-block-data?

  • I think we should include docs

  • Overall this is merge-able 👌🏼 - but we're not in a rush (just want to be clear about that)

@deekerno deekerno closed this Feb 6, 2024
Development

Successfully merging this pull request may close these issues.

feature: internal block store for backfilling newly-deployed indexers
3 participants