Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: push predicates into virtual datasets #31486

Merged
merged 1 commit into from
Jan 9, 2025
Merged

Conversation

betodealmeida
Copy link
Member

SUMMARY

In a perfect world database optimizers would know how to push predicates into inner queries, but in reality many databases don't do that. This can be a huge performance issue for charts built on top of virtual datasets, where the SQL is wrapped by the controls from the chart builder.

This PR introduces a new feature flag OPTIMIZE_SQL. When the feature is on, and the database engine has a supported dialect in sqlglot, SQL queries on virtual datasets will be optimized automatically, pushing the predicates into the inner SELECT is possible.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro
Copy link
Member

LGTM at quick glance. It would be nice to have a roadmap in place for this, so that the feature flag isn't left dangling. Something like:

  • 5.0: feature flag introduced, defaulting to False
  • 6.0: feature flag defaults to True, is deprecated
  • 7.0: feature flag is removed and feature is always enabled.

Were you planning something similar?

@betodealmeida betodealmeida marked this pull request as ready for review December 17, 2024 01:39
@dosubot dosubot bot added change:backend Requires changing the backend data Namespace | Anything related to data, including databases configurations, datasets, etc. labels Dec 17, 2024
@rusackas rusackas requested review from villebro, michael-s-molina and mistercrunch and removed request for villebro December 17, 2024 18:22
@betodealmeida
Copy link
Member Author

betodealmeida commented Dec 17, 2024

@villebro there are roughly two reasons why we parse SQL in Superset:

  1. To extract referenced tables (enforcing data access roles, for example); and
  2. To modify the query (enforcing RLS, for example).

The feature from this PR falls in category 2. I think that any functionality in that category should always be optional, since there are always security and correctness concerns when modifying a query. So I would propose the following:

  • 5.0: add the feature flag, default to false.
  • 6.0: move the feature flag to a configuration option, default to false.

@betodealmeida
Copy link
Member Author

Actually, I think we should have a list of engines in superset/config.py where this should be applied, it gives more flexibility.

@betodealmeida betodealmeida changed the title feat: push predicates feat: push predicates into virtual datasets Dec 18, 2024
@Vitor-Avila
Copy link
Contributor

@betodealmeida would this also push RLS rules to the query predicate as well?

@betodealmeida
Copy link
Member Author

@betodealmeida would this also push RLS rules to the query predicate as well?

I think so, but let me double check.

@betodealmeida
Copy link
Member Author

Yeah, the RLS is applied in get_sqla_query in superset/models/helpers.py, which happens before the query optimization.

@betodealmeida betodealmeida merged commit e4b3ecd into master Jan 9, 2025
56 of 61 checks passed
@sadpandajoe sadpandajoe deleted the optimize-sql branch January 9, 2025 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:backend Requires changing the backend data Namespace | Anything related to data, including databases configurations, datasets, etc. preset-io size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants