Split the block cache into block pointer cache and block data cache #6037
a3d1291
to
4d76568
Compare
4d76568
to
a2acdaa
Compare
Nice! This should enable a much better/logical block caching strategy
@@ -579,7 +579,7 @@ pub trait ChainStore: ChainHeadStore {
     async fn block_number(
         &self,
         hash: &BlockHash,
-    ) -> Result<Option<(String, BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+    ) -> Result<Option<(String, BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
Are all these `Option` still justified? I think they will all always be `Some`. It would also be nicer to have a struct for this. Maybe call it `BlockPointer` since it's one row from that table (and `BlockPtr` is then a small excerpt from that).
Also, this method should be renamed to `block_pointer`.
There's not always a timestamp; with the shared storage model it can still be `None`.
The `Option<BlockTime>` is a little weird, but I kept it because there is a difference between `Some(epoch time)` and `None`. It's more idiomatic to have an `Option` than to check `BlockTime == BlockTime::NONE` or `MIN`, which are in fact the same value (I didn't really get why).
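For illustration, the struct suggested above might look something like the following minimal sketch. The stand-in types, the field names, and the reading of the tuple's `String` as a chain name are all assumptions made for this example, not the definitions in graph-node. The only points taken from the discussion are that the value is one row of the block pointer table and that the timestamp and parent hash stay optional.

```rust
// Stand-in types for illustration only; the real `BlockHash`, `BlockNumber`
// and `BlockTime` are defined elsewhere in graph-node and look different.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockHash(pub Vec<u8>);
pub type BlockNumber = i32;
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BlockTime(pub i64); // seconds since the epoch (assumption)

/// One row of the block pointer table (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
pub struct BlockPointer {
    pub hash: BlockHash,
    /// Assumed meaning of the `String` element of the original tuple.
    pub chain: String,
    pub number: BlockNumber,
    pub timestamp: Option<BlockTime>,
    pub parent_hash: Option<BlockHash>,
}

impl BlockPointer {
    /// `None` means "no timestamp recorded", which stays distinct from
    /// `Some(BlockTime(0))`, a block stamped exactly at the epoch.
    pub fn has_timestamp(&self) -> bool {
        self.timestamp.is_some()
    }
}
```

With named fields, call sites read `ptr.timestamp` instead of destructuring a four-element tuple, and the `Option` keeps "missing" apart from any sentinel value.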
@@ -668,7 +668,7 @@ pub trait QueryStore: Send + Sync {
     async fn block_number_with_timestamp_and_parent_hash(
         &self,
         block_hash: &BlockHash,
-    ) -> Result<Option<(BlockNumber, Option<u64>, Option<BlockHash>)>, StoreError>;
+    ) -> Result<Option<(BlockNumber, Option<BlockTime>, Option<BlockHash>)>, StoreError>;
And this could also just be called block_pointer
@@ -0,0 +1,40 @@
+DATABASE_TEST_VAR_NAME := "THEGRAPH_STORE_POSTGRES_DIESEL_URL"
+DATABASE_URL := "postgresql://graph-node:let-me-in@localhost:5432/graph-node"
What's a justfile? This should be your local file, not something in the repo
This is similar to a Makefile. It's intentionally in the repo and provides shortcuts for common operations; you don't need to use it yourself, but it's useful to have for others.
# Requires test-deps to be running, see test-deps-up
it-test *ARGS:
    just _run_in_bash cargo test --test integration_tests -- --nocapture {{ ARGS }}
These can be just aliases in `~/.cargo/config.toml`. I have e.g.
[alias]
store = "test -p graph-store-postgres"
tst = "test --workspace --exclude graph-tests"
docs = "doc --workspace --document-private-items"
gm = "install --bin graphman --path node --locked"
gmt = "install --bin graphman --path node --locked --root /var/tmp/cargo"
rt = "test -p graph-tests --test runner_tests"
it = "test -p graph-tests --test integration_tests -- --nocapture"
And that's local; this works for everyone.
store/postgres/src/chain_store.rs
Outdated
INSERT INTO {nsp}.version VALUES ({version}) ON CONFLICT DO NOTHING;
",
nsp = nsp,
version = Storage::CHAINS_SCHEMA_VERSION,
You don't need this version table and mechanism, and in a way it's a denormalization.
You can find out from `information_schema.tables` whether the `block_pointers` table exists and decide based on that whether the migration needs to be run. Since everything this migration does happens in one transaction, you can be sure that the changes to the `blocks` table also happened and don't need to check for that.
I thought about doing it this way, but it's entirely possible there will be other changes in the future. Having a version makes it easy to figure out the current version of the schema and implement the different changes sequentially; it's much simpler than trying to figure out each step through pg metadata.
The `version` table is completely unnecessary; if there are more changes in the future, they can also look at the `information_schema` to determine whether they have been applied or not. Plus, over time, people will forget what these version numbers mean. In any event, it would be good if the comment on this method actually explained what the migration is doing.
The argument was never that it is necessary; it is that it is simpler to use and understand (portable too). But whatever, I'll change it to use psql tables...
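The catalog-based check the reviewer describes could be sketched roughly as below. The `Catalog` trait, `needs_block_pointers_migration`, and the in-memory `FakeCatalog` are hypothetical names invented for this illustration; a real implementation would run a query like `TABLE_EXISTS_SQL` against Postgres inside the migration's transaction rather than consult a `HashSet`.

```rust
use std::collections::HashSet;

/// Query a real implementation might run, binding the schema and table name
/// as parameters. `information_schema.tables` is the standard catalog view.
pub const TABLE_EXISTS_SQL: &str = "\
    select exists (\
        select 1 from information_schema.tables \
        where table_schema = $1 and table_name = $2)";

/// Hypothetical abstraction over "does this table exist?".
pub trait Catalog {
    fn table_exists(&self, schema: &str, table: &str) -> bool;
}

/// The migration is needed exactly when `block_pointers` is missing from the
/// chain's namespace; because the migration is transactional, no other
/// table needs to be inspected.
pub fn needs_block_pointers_migration(catalog: &dyn Catalog, nsp: &str) -> bool {
    !catalog.table_exists(nsp, "block_pointers")
}

/// In-memory stand-in for the database catalog, for illustration and tests.
pub struct FakeCatalog(pub HashSet<(String, String)>);

impl Catalog for FakeCatalog {
    fn table_exists(&self, schema: &str, table: &str) -> bool {
        self.0.contains(&(schema.to_string(), table.to_string()))
    }
}
```

The trade-off debated in the thread remains: this check needs no extra bookkeeping table, while a version number gives one place to read the schema state.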
@@ -53,6 +53,7 @@ lazy_static! {
     /// The id of the sole publisher in the test data
     static ref PUB1: IdVal = IdType::Bytes.parse("0xb1");
     /// The chain we actually put into the chain store, blocks 0 to 3
+    // static ref CHAIN: Vec<FakeBlock> = vec![GENESIS_BLOCK.clone(), BLOCK_ONE.clone(), BLOCK_TWO.clone(), BLOCK_THREE.clone()];
Leftover from testing?
store/test-store/src/block_store.rs
Outdated
     pub static ref BLOCK_SIX_NO_PARENT: FakeBlock = FakeBlock::make_no_parent(6, "6b834521bb753c132fdcf0e1034803ed9068e324112f8750ba93580b393a986b");
 }

 // Hash indicating 'no parent'
 pub const NO_PARENT: &str = "0000000000000000000000000000000000000000000000000000000000000000";
 /// The parts of an Ethereum block that are interesting for these tests:
 /// the block number, hash, and the hash of the parent block
-#[derive(Clone, Debug, PartialEq)]
+#[derive(Default, Clone, Debug, PartialEq)]
This doesn't need to be `Default` (and there's not really a sensible default for a block)
The default here allows you to use `{ number: x, ..Default::default() }`. It's really just to make the tests a little less verbose, but it turns out I didn't actually use it 😆
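As a quick illustration of the struct update syntax mentioned here, the sketch below uses a simplified stand-in for `FakeBlock`. The field names and types are assumptions based on the doc comment ("the block number, hash, and the hash of the parent block"), and `block_with_number` is a hypothetical helper.

```rust
/// Simplified stand-in for the test helper; field names are assumptions.
#[derive(Default, Clone, Debug, PartialEq)]
pub struct FakeBlock {
    pub number: u64,
    pub hash: String,
    pub parent_hash: String,
}

/// Builds a block with only the number set; the remaining fields fall back
/// to their `Default` values (empty strings here).
pub fn block_with_number(number: u64) -> FakeBlock {
    FakeBlock {
        number,
        ..Default::default()
    }
}
```

This also shows the reviewer's concern: the defaulted `hash` and `parent_hash` are empty strings, which is not a meaningful block.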
graph/src/data_source/offchain.rs
Outdated
@@ -216,7 +216,7 @@ impl DataSource {
     data_source::MappingTrigger::Offchain(trigger.clone()),
     self.mapping.handler.clone(),
     BlockPtr::new(Default::default(), self.creation_block.unwrap_or(0)),
-    BlockTime::NONE,
+    BlockTime::MIN,
Why that change here?
From testing; I'll revert. It's the exact same value, not sure why either.
graph/src/blockchain/types.rs
Outdated
    }
}

impl FromStr for BlockTime {
This impl is very unintuitive to me, that parsing a string will try to interpret the string as a hex/decimal number.
That's how it was used; I just moved the implementation somewhere that was easier to find. The previous function was `try_parse_timestamp` or something similar. If it's the naming, I can change it to a method?
renamed function
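To show why a named function can read more clearly than a `FromStr` impl here, the following is a minimal sketch of hex-or-decimal parsing. `parse_hex_or_decimal` is a hypothetical name, and returning a plain `u64` is a simplification for this example; it is not the function from this PR.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses a timestamp given either as `0x`-prefixed
/// hex or as plain decimal. The name states the accepted formats, which a
/// generic `"...".parse::<BlockTime>()` call would hide.
pub fn parse_hex_or_decimal(s: &str) -> Result<u64, ParseIntError> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => s.parse::<u64>(),
    }
}
```

A caller then writes `parse_hex_or_decimal("0x10")`, and the dual interpretation is visible at the call site.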
graph/src/blockchain/types.rs
Outdated
/// have a timestamp
pub const NONE: Self = Self(Timestamp::NONE);
// /// A timestamp from a long long time ago used to indicate that we don't
// /// have a timestamp
Seems like some extra comment signs snuck in
1e10c68
to
1f7a117
Compare
store/postgres/src/chain_store.rs
Outdated
fn make_ddl(nsp: &str) -> String {
    format!(
        "
        CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
There's no need to make this idempotent. You run all this in one transaction, so either it all succeeds or none of it succeeds. There's no way that this table gets created but other statements later on do not succeed.
format!(
    "
    CREATE TABLE IF NOT EXISTS {nsp}.block_pointers (
        hash BYTEA not null primary key,
Yes, you can't use the number as a pk; I was talking about a synthetic pk, like an auto-incrementing counter. But thinking about this more, what we want in the fullness of time to avoid storing block hashes redundantly is to move the `data` column to the `block_pointers` table. Really, the main point of this PR is to add a `timestamp` column to the `blocks` table without requiring a rewrite/truncation of that table. The PR is a good first step to that, and we'll address the duplication by figuring out how to get the `data` into the `block_pointers` table at some point.