[ENH]: Optimize GetCollections and remove usage of raw gorm #5274

tanujnay112 · 2025-08-14T04:28:49Z

Description of changes

An earlier change attempted to optimize the GetCollections query with a CTE for the case where the database's pkey is fully specified. That required raw gorm, which caused a few unexpected binding issues when converting from gorm SQL to raw SQL, so that change is reverted here. This change achieves the same optimization of GetCollections but avoids CTEs entirely: we get the required database_id from databases within a predicate subquery. If the predicate subquery has a LIMIT 1 in it, Postgres is known to treat it like a CTE. Since gorm plays much more nicely with subqueries than with CTEs, this is a much better fix.

Improvements & Bug fixes
- ...
New functionality
- ...

Test plan

How are these changes tested?

Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

tanujnay112 · 2025-08-14T04:29:04Z

[ENH]: Optimize GetCollections and remove usage of raw gorm #5274
main

This stack of pull requests is managed by Graphite.

github-actions · 2025-08-14T04:29:09Z

propel-code-bot · 2025-08-14T04:33:34Z

Optimize GetCollections: Remove Raw GORM, Use Subqueries for Efficient Querying

This PR refactors the GetCollections logic within the collection DAO to optimize query performance and improve maintainability. The main enhancement replaces the previous Common Table Expression (CTE) and raw GORM SQL usage (reverted due to binding issues) with a subquery-based approach that stays within GORM's standard query-building capabilities. This optimizes for the case where the database's primary key is fully specified (i.e., database name and tenant ID provided), using a predicate subquery to fetch the correct database_id, which integrates more cleanly with GORM and avoids the pitfalls of manual SQL string handling. The PR also improves the organization and clarity of the query construction code and applies a Postgres random_page_cost setting during the optimized query path.

Key Changes

• Replaced raw GORM SQL/CTE approach with a predicate subquery for acquiring database_id when optimizing GetCollections queries.
• Refactored query selection logic to use the optimized subquery path only when both databaseName and tenantID are specified and the feature flag is set.
• Set random_page_cost to 1.1 at the transaction level for the optimized query path to improve performance on SSD-based Postgres installations.
• Normalized result mapping and post-processing code for collection and metadata rows.
• Removed prior raw SQL and CTE construction logic, fully reverting to GORM-native query generation.
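As a hedged illustration of the path selection and the transaction-scoped setting listed above, the sketch below uses only the standard library. `statementRunner`, `recordingTx`, `useOptimizedPath`, and `prepareTx` are hypothetical names, and treating an empty string as "not specified" is an assumption (the real code may use pointers or another representation). `SET LOCAL` is the Postgres form that limits a setting to the current transaction.

```go
package main

import "fmt"

// statementRunner is a hypothetical stand-in for a transaction handle.
type statementRunner interface {
	Exec(sql string) error
}

// recordingTx records statements instead of talking to a database,
// so the sketch runs without a Postgres instance.
type recordingTx struct {
	statements []string
}

func (r *recordingTx) Exec(sql string) error {
	r.statements = append(r.statements, sql)
	return nil
}

// useOptimizedPath mirrors the rule in the summary: both databaseName and
// tenantID must be specified and the feature flag must be set.
// Assumption: an empty string means "not specified".
func useOptimizedPath(databaseName, tenantID string, flagEnabled bool) bool {
	return flagEnabled && databaseName != "" && tenantID != ""
}

// prepareTx applies the planner setting only on the optimized path.
// SET LOCAL scopes the change to the enclosing transaction.
func prepareTx(tx statementRunner, databaseName, tenantID string, flagEnabled bool) (bool, error) {
	if !useOptimizedPath(databaseName, tenantID, flagEnabled) {
		return false, nil
	}
	if err := tx.Exec("SET LOCAL random_page_cost = 1.1"); err != nil {
		return false, fmt.Errorf("set random_page_cost: %w", err)
	}
	return true, nil
}

func main() {
	tx := &recordingTx{}
	optimized, err := prepareTx(tx, "default_database", "default_tenant", true)
	fmt.Println(optimized, err, tx.statements)
}
```

Because the setting is applied with `SET LOCAL`, it would revert when the transaction commits or rolls back, leaving other queries on the connection unaffected.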

Affected Areas

• go/pkg/sysdb/metastore/db/dao/collection.go (GetCollections and related methods)

This summary was automatically generated by @propel-code-bot

propel-code-bot · 2025-08-14T04:36:34Z