Split the block cache into block pointer cache and block data cache #6037

Open · wants to merge 6 commits into base: master from filipe/chain-store-rework2

Conversation

@mangas (Contributor) commented May 28, 2025

Split the block cache into block pointer cache and block data cache

  • Introduce a new block_pointers table that keeps hash, number, parent_hash, and timestamp
  • Remove the number and parent_hash columns from the old block cache table
  • Cache truncation now removes all the block data but not the pointers (a rough schema sketch follows below).
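
For illustration only, here is a rough sketch of the schema shape this split implies, written in the style of the chain store's make_ddl helper that appears further down in the diff. Column types and the layout of the trimmed blocks table are assumptions, not the PR's actual DDL.

fn make_ddl_sketch(nsp: &str) -> String {
    format!(
        "
    CREATE TABLE {nsp}.block_pointers (
        hash        BYTEA  NOT NULL PRIMARY KEY,
        number      INT8   NOT NULL,
        parent_hash BYTEA,
        timestamp   INT8
    );
    -- the blocks table keeps only the hash and the (large) block data;
    -- truncating the cache clears blocks but leaves block_pointers intact
    CREATE TABLE {nsp}.blocks (
        hash BYTEA NOT NULL PRIMARY KEY,
        data JSONB NOT NULL
    );
"
    )
}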

@mangas mangas force-pushed the filipe/chain-store-rework2 branch 4 times, most recently from a3d1291 to 4d76568 Compare May 29, 2025 10:39
@mangas mangas force-pushed the filipe/chain-store-rework2 branch from 4d76568 to a2acdaa Compare May 29, 2025 10:47
@mangas mangas changed the title Filipe/chain store rework2 Filipe/chain store rework May 29, 2025
@mangas mangas marked this pull request as ready for review May 29, 2025 10:57
@mangas mangas changed the title Filipe/chain store rework Split the block cache into block pointer cache and block data cache May 29, 2025
@mangas mangas requested a review from lutter May 29, 2025 10:59
@lutter (Collaborator) left a comment:

Nice! This should enable a much better/logical block caching strategy

@@ -579,7 +579,7 @@ pub trait ChainStore: ChainHeadStore {
async fn block_number(
&self,
hash: &BlockHash,
- ) -> Result<Option<(String, BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+ ) -> Result<Option<(String, BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
@lutter (Collaborator):

Are all these Option still justified? I think they will all always be Some. It would also be nicer to have a struct for this. Maybe call it BlockPointer since it's one row from that table (and BlockPtr is then a small excerpt from that).

Also, this method should be renamed to block_pointer
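
A minimal sketch of what the suggested struct and renamed method could look like; the struct name comes from the comment above, while the field names and the meaning of the String in the current tuple (taken here to be the network name) are assumptions.

// BlockNumber, BlockTime, BlockHash and StoreError are the existing graph types.
pub struct BlockPointer {
    pub network: String,
    pub number: BlockNumber,
    pub timestamp: Option<BlockTime>,
    pub parent_hash: Option<BlockHash>,
}

// The trait method would then become something like:
// async fn block_pointer(&self, hash: &BlockHash)
//     -> Result<Option<BlockPointer>, StoreError>;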

@mangas (Contributor, Author):

There's not always a timestamp; on the shared storage model it can still be None.

The Option<BlockTime> is a little weird, but I kept it because there is a difference between Some(epoch time) and None. It's more idiomatic to have Option than to check BlockTime == BlockTime::NONE or MIN, which are in fact the same value (I didn't really get why).
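
To make the contrast concrete, here is a tiny self-contained illustration of the two styles being discussed, with u64 standing in for BlockTime and all names hypothetical.

// With Option, the "no timestamp" case is explicit at every call site.
fn describe(timestamp: Option<u64>) -> String {
    match timestamp {
        Some(ts) => format!("block produced at {ts}"),
        None => "no timestamp recorded for this block".to_string(),
    }
}

// With a sentinel, callers must remember the magic constant
// (a stand-in for BlockTime::NONE / BlockTime::MIN).
const NO_TIMESTAMP: u64 = 0;

fn describe_with_sentinel(timestamp: u64) -> String {
    if timestamp == NO_TIMESTAMP {
        "no timestamp recorded for this block".to_string()
    } else {
        format!("block produced at {timestamp}")
    }
}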

@@ -668,7 +668,7 @@ pub trait QueryStore: Send + Sync {
async fn block_number_with_timestamp_and_parent_hash(
&self,
block_hash: &BlockHash,
- ) -> Result<Option<(BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+ ) -> Result<Option<(BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
@lutter (Collaborator):

And this could also just be called block_pointer

@@ -0,0 +1,40 @@
DATABASE_TEST_VAR_NAME := "THEGRAPH_STORE_POSTGRES_DIESEL_URL"
DATABASE_URL := "postgresql://graph-node:let-me-in@localhost:5432/graph-node"

@lutter (Collaborator):

What's a justfile? This should be your local file, not something in the repo

@mangas (Contributor, Author):

This is similar to a Makefile; it's intentionally in the repo and provides shortcuts for common operations. You don't need to use it yourself, but it's useful to have for others.


# Requires test-deps to be running, see test-deps-up
it-test *ARGS:
just _run_in_bash cargo test --test integration_tests -- --nocapture {{ ARGS }}
@lutter (Collaborator):

These can be just aliases in ~/.cargo/config.toml. I have e.g.

[alias]
store = "test -p graph-store-postgres"
tst = "test --workspace --exclude graph-tests"
docs = "doc --workspace --document-private-items"
gm = "install --bin graphman --path node --locked"
gmt = "install --bin graphman --path node --locked --root /var/tmp/cargo"
rt = "test -p graph-tests --test runner_tests"
it = "test -p graph-tests --test integration_tests -- --nocapture"

@mangas (Contributor, Author):

And that's local; this works for everyone.

INSERT INTO {nsp}.version VALUES ({version}) ON CONFLICT DO NOTHING;
",
nsp = nsp,
version = Storage::CHAINS_SCHEMA_VERSION,
@lutter (Collaborator):

You don't need this version table and mechanism, and in a way it's a denormalization.

You can find out from information_schema.tables whether the block_pointers table exists and decide based on that whether the migration needs to be run. Since everything this migration does happens in one transaction, you can be sure that the changes to the blocks table also happened and don't need to check for that.
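
A hedged sketch of the kind of check being suggested, not the PR's code; the helper name is made up and only the query against information_schema matters.

// Returns SQL that is true when the migration still has to run for this
// chain's namespace, i.e. when block_pointers does not exist yet.
fn needs_block_pointers_migration_sql(nsp: &str) -> String {
    format!(
        "SELECT NOT EXISTS (
            SELECT 1
              FROM information_schema.tables
             WHERE table_schema = '{nsp}'
               AND table_name = 'block_pointers'
        ) AS needs_migration"
    )
}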

@mangas (Contributor, Author):

I thought about doing it this way, but it's entirely possible there will be other changes in the future. Having a version makes it easy to figure out the current version of the schema and implement the different changes sequentially; it's much simpler than trying to figure out each step through pg metadata.

@lutter (Collaborator):

The version table is completely unnecessary; if there are more changes in the future, they can also look at the information_schema to determine whether they have been applied or not. Plus, over time, people will forget what these version numbers mean. In any event, it would be good if the comment on this method actually explained what the migration is doing.

@mangas (Contributor, Author):

The argument was never that it is necessary; it is that it is simpler to use and understand (and portable too). But whatever, I'll change it to use the pg metadata tables...

@@ -53,6 +53,7 @@ lazy_static! {
/// The id of the sole publisher in the test data
static ref PUB1: IdVal = IdType::Bytes.parse("0xb1");
/// The chain we actually put into the chain store, blocks 0 to 3
// static ref CHAIN: Vec<FakeBlock> = vec![GENESIS_BLOCK.clone(), BLOCK_ONE.clone(), BLOCK_TWO.clone(), BLOCK_THREE.clone()];
@lutter (Collaborator):

Leftover from testing?

pub static ref BLOCK_SIX_NO_PARENT: FakeBlock = FakeBlock::make_no_parent(6, "6b834521bb753c132fdcf0e1034803ed9068e324112f8750ba93580b393a986b");
}

// Hash indicating 'no parent'
pub const NO_PARENT: &str = "0000000000000000000000000000000000000000000000000000000000000000";
/// The parts of an Ethereum block that are interesting for these tests:
/// the block number, hash, and the hash of the parent block
- #[derive(Clone, Debug, PartialEq)]
+ #[derive(Default, Clone, Debug, PartialEq)]
@lutter (Collaborator):

This doesn't need to be Default (and there's not really a sensible default for a block)

@mangas (Contributor, Author) commented Jun 2, 2025:

The default here allows you to use { number: x, ..Default::default() }; it's really just to make the tests a little less verbose, but it turns out I didn't actually use it 😆
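
For reference, the pattern being described is Rust's struct update syntax, which requires Default on the type; FakeBlock's fields beyond number and hash are not shown here.

// Only compiles when FakeBlock derives (or implements) Default;
// all unspecified fields fall back to their default values.
let block = FakeBlock {
    number: 7,
    ..Default::default()
};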

@@ -216,7 +216,7 @@ impl DataSource {
data_source::MappingTrigger::Offchain(trigger.clone()),
self.mapping.handler.clone(),
BlockPtr::new(Default::default(), self.creation_block.unwrap_or(0)),
- BlockTime::NONE,
+ BlockTime::MIN,
@lutter (Collaborator):

Why that change here?

@mangas (Contributor, Author):

From testing; I'll revert. It's the exact same value, not sure why either.

}
}

impl FromStr for BlockTime {
@lutter (Collaborator):

This impl is very unintuitive to me: parsing a string will try to interpret it as a hex/decimal number.
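
For readers following along, this is roughly the behaviour being questioned, reduced to a self-contained sketch: the string is read as hex when it has a 0x prefix and as decimal otherwise. The actual impl wraps this in FromStr for BlockTime; the constructor it uses is not shown here.

fn parse_block_time_secs(ts: &str) -> Result<u64, std::num::ParseIntError> {
    if let Some(hex) = ts.strip_prefix("0x") {
        // hexadecimal, e.g. "0x5f5e100"
        u64::from_str_radix(hex, 16)
    } else {
        // decimal, e.g. "1700000000"
        ts.parse::<u64>()
    }
}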

@mangas (Contributor, Author):

That's how it was used; I just moved the implementation somewhere that's easier to find. The previous function was try_parse_timestamp or something similar. If it's the naming, I can change it to a method?

@mangas (Contributor, Author):

renamed function

/// have a timestamp
pub const NONE: Self = Self(Timestamp::NONE);
// /// A timestamp from a long long time ago used to indicate that we don't
// /// have a timestamp
@lutter (Collaborator):

Seems like some extra comment signs snuck in

@mangas mangas force-pushed the filipe/chain-store-rework2 branch from 1e10c68 to 1f7a117 Compare June 2, 2025 11:51
fn make_ddl(nsp: &str) -> String {
format!(
"
CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
@lutter (Collaborator):

There's no need to make this idempotent. You run all this in one transaction, so either it all succeeds or none of it does. There's no way that this table gets created but other statements later on do not succeed.
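
A hedged sketch of the point being made, not the PR's code: when the DDL runs inside one transaction, plain CREATE TABLE is enough, because a failure anywhere rolls back everything, including the table creation.

use diesel::pg::PgConnection;
use diesel::prelude::*;

fn migrate(conn: &mut PgConnection, nsp: &str) -> Result<(), diesel::result::Error> {
    conn.transaction(|conn| {
        diesel::sql_query(format!(
            "CREATE TABLE {nsp}.block_pointers (hash BYTEA NOT NULL PRIMARY KEY)"
        ))
        .execute(conn)?;
        // ... the rest of the migration; if any statement fails,
        // the CREATE TABLE above is rolled back as well
        Ok(())
    })
}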


format!(
"
CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
hash BYTEA not null primary key,
@lutter (Collaborator):

Yes, you can't use the number as a pk; I was talking about a synthetic pk, like an auto-incrementing counter. But thinking about this more, what we want in the fullness of time, to avoid storing block hashes redundantly, is to move the data column to the block_pointers table. Really, the main point of this PR is to add a timestamp column to the blocks table without requiring a rewrite/truncation of that table. The PR is a good first step toward that, and we'll address the duplication by figuring out how to get the data into the block_pointers table at some point.

@mangas mangas requested a review from lutter June 3, 2025 13:11