-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(planner): Add MaterializedViewScanNode #26492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR adds support for non-legacy materialized views by capturing view metadata in the analyzer, introducing a new MaterializedViewScanNode, extending the statement and relation planners to produce unified plans over both data table and view query, and adding a rewrite rule that chooses between fresh and materialized data at runtime, with SPI, optimizer, and test updates. Sequence diagram for planning and rewriting a MaterializedViewScanNodesequenceDiagram
participant Analyzer
participant StatementAnalyzer
participant RelationPlanner
participant Optimizer
participant MaterializedViewRewrite
Analyzer->>StatementAnalyzer: setMaterializedViewInfo(table, mvInfo)
StatementAnalyzer->>RelationPlanner: process(table)
RelationPlanner->>RelationPlanner: planMaterializedView(table, mvInfo, context)
RelationPlanner->>RelationPlanner: create MaterializedViewScanNode
Optimizer->>MaterializedViewRewrite: apply(MaterializedViewScanNode, ...)
MaterializedViewRewrite->>MaterializedViewScanNode: getDataTablePlan()/getViewQueryPlan()
MaterializedViewRewrite->>Optimizer: return rewritten ProjectNode
Class diagram for the new MaterializedViewScanNode and related changesclassDiagram
class MaterializedViewScanNode {
+PlanNode dataTablePlan
+PlanNode viewQueryPlan
+QualifiedObjectName materializedViewName
+Map<VariableReferenceExpression, VariableReferenceExpression> dataTableMappings
+Map<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappings
+List<VariableReferenceExpression> outputVariables
+getDataTablePlan()
+getViewQueryPlan()
+getMaterializedViewName()
+getDataTableMappings()
+getViewQueryMappings()
+getOutputVariables()
+getSources()
+sourceOutputLayout(int)
+replaceChildren(List<PlanNode>)
+accept(PlanVisitor, context)
+assignStatsEquivalentPlanNode(Optional<PlanNode>)
}
MaterializedViewScanNode --|> PlanNode
class Analysis {
+Map<NodeRef<Table>, MaterializedViewInfo> materializedViewInfoMap
+setMaterializedViewInfo(Table, MaterializedViewInfo)
+getMaterializedViewInfo(Table): Optional<MaterializedViewInfo>
}
class Analysis.MaterializedViewInfo {
+QualifiedObjectName materializedViewName
+QualifiedName storageTableName
+Table dataTable
+Query viewQuery
+MaterializedViewDefinition materializedViewDefinition
+getMaterializedViewName()
+getStorageTableName()
+getDataTable()
+getViewQuery()
+getMaterializedViewDefinition()
}
Analysis "1" *-- "*" Analysis.MaterializedViewInfo
class MaterializedViewRewrite {
+apply(MaterializedViewScanNode, Captures, Context): Result
}
MaterializedViewRewrite ..> MaterializedViewScanNode
class PlanVisitor {
+visitMaterializedViewScan(MaterializedViewScanNode, context)
}
PlanVisitor ..> MaterializedViewScanNode
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:187-188` </location>
<code_context>
@Override
protected RelationPlan visitTable(Table node, SqlPlannerContext context)
{
+ Optional<Analysis.MaterializedViewInfo> materializedViewInfo = analysis.getMaterializedViewInfo(node);
+ if (materializedViewInfo.isPresent()) {
+ return planMaterializedView(node, materializedViewInfo.get(), context);
+ }
</code_context>
<issue_to_address>
**issue:** Consider handling the case where the viewQueryPlan and dataTablePlan have mismatched field counts.
Adding a check or assertion for matching field counts and orderings between dataTablePlan and viewQueryPlan would help prevent runtime errors and ensure query correctness.
</issue_to_address>
### Comment 2
<location> `presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java:70` </location>
<code_context>
+ }
+
+ @Test
+ public void testMaterializedViewRefreshed()
+ {
+ assertUpdate("CREATE TABLE base_table (id BIGINT, value BIGINT)");
</code_context>
<issue_to_address>
**suggestion (testing):** Suggestion to add error condition tests for querying dropped materialized views.
Please add a test that verifies querying a dropped materialized view results in the expected error and no execution plan is generated.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
Show resolved
Hide resolved
...emory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java
Show resolved
Hide resolved
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
- Using NodeRef as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
- The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
- Using NodeRef<Table> as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
- The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:338-339` </location>
<code_context>
+ ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> dataTableMappingsBuilder = ImmutableMap.builder();
+ ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappingsBuilder = ImmutableMap.builder();
+
+ for (Field field : dataTableDescriptor.getVisibleFields()) {
+ int fieldIndex = dataTableDescriptor.indexOf(field);
+ VariableReferenceExpression dataTableVar = dataTableVariables.get(fieldIndex);
+ VariableReferenceExpression viewQueryVar = viewQueryVariables.get(fieldIndex);
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential mismatch if visible fields differ from output variable order.
This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
</issue_to_address>
### Comment 2
<location> `presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java:123-132` </location>
<code_context>
+ return unmodifiableList(sources);
+ }
+
+ public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
+ {
+ if (sourceIndex == 0) {
+ // Data table plan - use dataTableMappings
+ return outputVariables.stream()
+ .map(dataTableMappings::get)
+ .collect(toList());
+ }
+ else if (sourceIndex == 1) {
+ // View query plan - use viewQueryMappings
+ return outputVariables.stream()
+ .map(viewQueryMappings::get)
+ .collect(toList());
+ }
+ else {
+ throw new IllegalArgumentException("MaterializedViewScanNode only has 2 sources (index 0 and 1), but got index " + sourceIndex);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** sourceOutputLayout may return nulls if mappings are incomplete.
Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:47-48` </location>
<code_context>
+public class TestMaterializedViewRewrite
+ extends BaseRuleTest
+{
+ @Test
+ public void testUseFreshDataWhenFullyMaterialized()
+ {
+ QualifiedObjectName materializedViewName = QualifiedObjectName.valueOf("catalog.schema.mv");
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
</issue_to_address>Sourcery is free for open source - if you like our reviews please consider sharing them ✨
| for (Field field : dataTableDescriptor.getVisibleFields()) { | ||
| int fieldIndex = dataTableDescriptor.indexOf(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential mismatch if visible fields differ from output variable order.
This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
| public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex) | ||
| { | ||
| if (sourceIndex == 0) { | ||
| // Data table plan - use dataTableMappings | ||
| return outputVariables.stream() | ||
| .map(dataTableMappings::get) | ||
| .collect(toList()); | ||
| } | ||
| else if (sourceIndex == 1) { | ||
| // View query plan - use viewQueryMappings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): sourceOutputLayout may return nulls if mappings are incomplete.
Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
| @Test | ||
| public void testUseFreshDataWhenFullyMaterialized() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this feature. Overall looks good to me! Just one nit and a few points for discussion.
| } | ||
|
|
||
| @Override | ||
| public Map<String, com.facebook.presto.spi.ColumnHandle> getColumnHandles(TableHandle tableHandle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public Map<String, com.facebook.presto.spi.ColumnHandle> getColumnHandles(TableHandle tableHandle) | |
| public Map<String, ColumnHandle> getColumnHandles(TableHandle tableHandle) |
nit: use static import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| builder.add(new IterativeOptimizer( | ||
| metadata, | ||
| ruleStats, | ||
| statsCalculator, | ||
| estimatedExchangesCostCalculator, | ||
| ImmutableSet.of(new MaterializedViewRewrite(metadata)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little curious — is there a specific reason this rule needs to be wrapped in an IterativeOptimizer? I was wondering if it could simply work as a standalone PlanOptimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, although this isn't written down, I believe it's preferable to use the iterative optimizer framework, and at one point in time we were trying to deprecate direct usage of PlanOptimizer. In the future, I'll combine this rule and other rules to simplify expressions into this single IterativeOptimizer, as partition stitching will create potentially complicated expression trees which can be simplified. I also appreciate that the scope of the Rules are confined to the nodes we are interested in (i.e. I only see MaterializedViewScanNode in MaterializedViewRewrite, not the entire plan).
| MaterializedViewStatus status = metadataResolver.getMaterializedViewStatus(node.getMaterializedViewName()); | ||
|
|
||
| boolean useDataTable = status.isFullyMaterialized(); | ||
| PlanNode chosenPlan = useDataTable ? node.getDataTablePlan() : node.getViewQueryPlan(); | ||
| Map<VariableReferenceExpression, VariableReferenceExpression> chosenMappings = | ||
| useDataTable ? node.getDataTableMappings() : node.getViewQueryMappings(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, when the status is PARTIALLY_MATERIALIZED, should the stitching logic be supported here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. For now, if it's anything other than FULLY_MATERIALIZED, then we use the view plan. In subsequent PRs, I'll refine this with the techniques outlined in RFC-0016.
Description
Implements the planning-oriented materialized view framework from RFC-0016. Adds MaterializedViewScanNode that holds both the materialized data plan and view query plan, letting the optimizer choose based on freshness during planning instead of at analysis time.
New code path is opt-in via
legacy_materialized_views=falsesession property. Legacy SQL stitching path is preserved as default.Motivation and Context
See RFC-0016.
Impact
No user-facing changes with default settings.
Test Plan
Tests are included in this PR, existing Memory connector end to end tests validate the behavior for unpartitioned tables.
Contributor checklist
Release Notes