
[1/n][dialect_support]Generalize indexer code to support different sql dialects #17167

Merged 1 commit into main on Apr 23, 2024

Conversation

@sadhansood (Contributor) commented Apr 16, 2024

Description

This PR generalizes the code in the sui-indexer crate so that different diesel backends can be enabled through feature flags when compiling. The default backend is still postgres, which means nothing changes in terms of how we build our binaries today.

  1. Most importantly, all db operations go through macros that are feature-gated, so we only pull in the relevant db-specific dependencies.
  2. In Cargo.toml we do not add any db-specific dependencies by default; they are only pulled in through features.
  3. In sui-indexer, all instances of diesel::PgConnection are gated behind postgres-feature, and similarly all instances of diesel::MysqlConnection are gated behind mysql-feature.
  4. run_query and run_query_async are replaced with their corresponding macro invocations.
  5. We do not try to make other dependencies of sui-indexer generic in this PR. This includes sui-graphql-rpc, which just works with IndexerReader<PgConnection> for now (in the future we will refactor reads for other db backends, but for now the focus is on making writes compatible first).
  6. Usages of #[cached(..)] had to be replaced with explicit usage of SizedCache, as this macro does not seem to work for functions that take type parameters.
  7. Next: we will introduce migrations for mysql and build sui-indexer for mysql.
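The feature-gating described in points 1 and 3 can be sketched as follows. This is a hypothetical, std-only illustration, not the actual sui-indexer macros: one macro per backend behind a feature gate, with postgres as the default arm, mirroring the PR's default backend.

```rust
// Hypothetical sketch: a blocking-read macro defined once per backend behind
// a feature gate, so only the selected backend's dependencies are pulled in.
// With no feature enabled, the postgres arm is the default.

#[cfg(not(feature = "mysql-feature"))]
macro_rules! read_only_blocking {
    ($pool:expr, $query:expr) => {{
        // The real macro would check out a diesel::PgConnection from $pool
        // and run $query on it; here we only tag the backend for illustration.
        format!("postgres: {}", $query($pool))
    }};
}

#[cfg(feature = "mysql-feature")]
macro_rules! read_only_blocking {
    ($pool:expr, $query:expr) => {{
        format!("mysql: {}", $query($pool))
    }};
}

fn main() {
    let pool = 42u32; // stand-in for a connection pool handle
    let out = read_only_blocking!(pool, |p: u32| p.to_string());
    assert_eq!(out, "postgres: 42");
    println!("{out}");
}
```

Because the two definitions are mutually exclusive, call sites stay identical across backends; only the compiled expansion changes.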

Test plan

Running this locally works: cargo run --bin sui-indexer -- --db-url "postgres://sadhansood:sadhansood@localhost/test" --rpc-client-url "<>" --fullnode-sync-worker --reset-db


.read_only()
.run($query)
.map_err(|e| IndexerError::PostgresReadError(e.to_string()))
($pool:expr, $query:expr, $repeatable_read:expr) => {{
Contributor:

nit: could we have the repeatable_read bool come before the query:expr? it took me a hot sec looking at where we use this macro before realizing a bool was added
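The reordering this reviewer suggests could look like the hypothetical sketch below (names and body are illustrative, not the real sui-indexer macro): putting the bool before the query makes it stand out at the call site.

```rust
// Hypothetical sketch of the suggested signature: the repeatable_read flag
// comes before the query, so the extra bool is obvious where the macro is used.
macro_rules! transactional_blocking {
    ($pool:expr, $repeatable_read:expr, $query:expr) => {{
        let mode = if $repeatable_read { "repeatable read" } else { "read committed" };
        // The real macro would open a transaction at this isolation level
        // and run $query inside it; here we only report the mode.
        format!("{mode}: {}", $query($pool))
    }};
}

fn main() {
    let out = transactional_blocking!(1u8, true, |p: u8| p.to_string());
    assert_eq!(out, "repeatable read: 1");
}
```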

@@ -193,14 +208,8 @@ impl GovernanceReadApi {

/// Cached exchange rates for validators for the given epoch, the cache size is 1, it will be cleared when the epoch changes.
Contributor:

suggestion: thoughts on lifting this up to the struct field?

Contributor (Author):

yeah it's possible now, will do

crates/sui-indexer/src/db.rs (comment thread resolved)
{
pool: ConnectionPool<T>,
package_resolver: Arc<Resolver<PackageStoreWithLruCache<IndexerStorePackageResolver<T>>>>,
package_obj_type_cache: Arc<Mutex<SizedCache<String, Option<ObjectID>>>>,
Contributor:

is this mainly intended for CoinMetadata and TreasuryCap?

Contributor (Author):

Yeah, it is meant to replace this usage of memoization specifically:

#[cached(
    type = "SizedCache<String, Option<ObjectID>>",
    create = "{ SizedCache::with_size(10000) }",
    convert = r#"{ format!("{}{}", package_id, obj_type) }"#,
    result = true
)]
fn get_single_obj_id_from_package_publish(
    reader: &IndexerReader,
    package_id: ObjectID,
    obj_type: String,
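The explicit replacement for that attribute can be sketched with std types only. This is a hypothetical illustration of the pattern, not the crate's code: the real version uses cached::SizedCache behind Arc<Mutex<..>> with a 10,000-entry bound, and ObjectID instead of the u64 stand-in used here.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

// Hypothetical std-only sketch of the explicit cache that replaces the
// #[cached(...)] attribute: entries keyed by "{package_id}{obj_type}",
// bounded in size, with the oldest entry evicted when full.
struct BoundedCache {
    state: Mutex<(HashMap<String, Option<u64>>, VecDeque<String>)>,
    capacity: usize,
}

impl BoundedCache {
    fn new(capacity: usize) -> Self {
        Self { state: Mutex::new((HashMap::new(), VecDeque::new())), capacity }
    }

    fn get_or_insert_with<F: FnOnce() -> Option<u64>>(
        &self,
        package_id: &str,
        obj_type: &str,
        compute: F,
    ) -> Option<u64> {
        let key = format!("{package_id}{obj_type}"); // mirrors the convert expression
        let mut guard = self.state.lock().unwrap();
        if let Some(v) = guard.0.get(&key) {
            return *v; // cache hit: compute is never called
        }
        if guard.1.len() == self.capacity {
            if let Some(oldest) = guard.1.pop_front() {
                guard.0.remove(&oldest); // evict the oldest entry to stay bounded
            }
        }
        let value = compute();
        guard.0.insert(key.clone(), value);
        guard.1.push_back(key);
        value
    }
}

fn main() {
    let cache = BoundedCache::new(2);
    assert_eq!(cache.get_or_insert_with("0x2", "Coin", || Some(7)), Some(7));
    // second lookup hits the cache; the closure result is ignored
    assert_eq!(cache.get_or_insert_with("0x2", "Coin", || None), Some(7));
}
```

Unlike the attribute macro, an explicit cache like this works fine inside functions that take type parameters, which is the limitation point 6 of the description ran into.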

.downcast_mut::<PoolConnection<diesel::MysqlConnection>>()
.unwrap()
.transaction($query)
.map_err(|e| IndexerError::PostgresReadError(e.to_string()))
Contributor:

nit: MysqlReadError? Or a more generic DbReadError?

Contributor (Author):

Yeah a more generic error type makes sense
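The generic variant agreed on here might look like the hypothetical sketch below; the variant names are illustrative, not what the crate ended up with.

```rust
// Hypothetical sketch: a backend-agnostic error instead of
// PostgresReadError / MysqlReadError, so both diesel backends
// map their errors into the same variants.
#[allow(dead_code)]
#[derive(Debug)]
enum IndexerError {
    DbReadError(String),
    DbWriteError(String),
}

fn main() {
    // either backend's read failure lands in the same variant
    let e = IndexerError::DbReadError("connection refused".to_string());
    assert!(matches!(e, IndexerError::DbReadError(_)));
    println!("{e:?}");
}
```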

Contributor:

Note that I'm about to put up a change that shifts this package cache to using GraphQL's Db type (an alias for its PgExecutor type), which should make this change unnecessary.

Contributor:

#17179 (and #17178)

@gegaowp (Contributor) commented Apr 19, 2024

I skimmed through the DB setup & writing side of the change and it looks great overall! I have mod.rs and indexer_reader.rs left, and will review them later today.
cc @phoenix-o as some changes might overlap with the data ingestion framework change.

@@ -16,8 +16,14 @@ chrono.workspace = true
serde_with.workspace = true
clap.workspace = true
tap.workspace = true
diesel.workspace = true
diesel-derive-enum.workspace = true
diesel = { features = [
Contributor:

reading the doc of i-implement-a-third-party-backend-and-opt-into-breaking-changes, seems like it's recommended to pin to a minor version?
https://github.com/diesel-rs/diesel/blob/master/guide_drafts/migration_guide.md#removed-most-of-the-non-public-reachable-api
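Following the guide this reviewer links, the pinning might look like the hypothetical Cargo.toml fragment below (the tilde requirement keeps diesel on one minor version line; the exact version shown is an assumption, not taken from the PR):

```toml
# Hypothetical sketch: pin diesel to a minor version, as the migration guide
# recommends when the breaking-changes opt-in feature is enabled.
diesel = { version = "~2.1", features = [
    "i-implement-a-third-party-backend-and-opt-into-breaking-changes",
] }
```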

Contributor:

also curious what functionalities do we use of this feature?

Contributor (Author):

I'll try to remove this from here, and just use the workspace (but I copied it from root level Cargo.toml)

IndexerError::PostgresResetError(db_err_msg)
})?;
}
let (_registry_service, registry) = start_prometheus_server(
Contributor:

nit: let's keep prometheus in main.rs as they are not exactly db setup?

use diesel::upsert::excluded;
use diesel::ExpressionMethods;
use diesel::OptionalExtension;
use diesel::{QueryDsl, RunQueryDsl};
use downcast::Any;
Contributor:

is this used?

Contributor (Author):

Yes, this gets used in the macro, but I'll double-check (and remove it if not)

@@ -93,24 +96,27 @@ where
metrics,
indexed_checkpoint_sender,
package_buffer: IndexingPackageBuffer::start(package_tx),
_phantom: Default::default(),
Contributor:

what's this and how is this used?

Contributor (Author):

This is because we added _phantom: PhantomData<T> to the struct definition; we need to remember the connection type so that we can later return it from the connection pool. See the function fn pg_blocking_cp(&self) -> Result<ConnectionPool<T>, IndexerError>
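The pattern described in this reply can be sketched in isolation. This is a hypothetical illustration: Writer, db_url, and pool are stand-in names, and u32 stands in for the connection type that ConnectionPool<T> would carry in the real code.

```rust
use std::marker::PhantomData;

// Hypothetical sketch: the struct stores no value of type T, but
// PhantomData<T> keeps T in its type so a later method can return
// a T-typed handle (ConnectionPool<T> in the real code).
struct Writer<T> {
    db_url: String,
    _phantom: PhantomData<T>, // remembers the connection type
}

impl<T: Default> Writer<T> {
    fn new(db_url: &str) -> Self {
        Writer { db_url: db_url.to_string(), _phantom: PhantomData }
    }

    // stands in for pg_blocking_cp returning ConnectionPool<T>
    fn pool(&self) -> T {
        T::default()
    }
}

fn main() {
    let w: Writer<u32> = Writer::new("postgres://localhost/test");
    assert_eq!(w.pool(), 0u32); // the remembered connection type is u32
    assert_eq!(w.db_url, "postgres://localhost/test");
}
```

Without the PhantomData field, the compiler would reject the unused type parameter T on the struct.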

@gegaowp (Contributor) left a comment:

looks great other than the existing comments, thanks for working this out!

@sadhansood force-pushed the sadhan/mysql_dialect branch from a8cc33b to a83ba36 on April 23, 2024 at 20:20
@sadhansood sadhansood merged commit e305b4c into main Apr 23, 2024
46 of 47 checks passed
@sadhansood sadhansood deleted the sadhan/mysql_dialect branch April 23, 2024 20:46
4 participants