Skip to content

Conversation

@tdcmeehan
Copy link
Contributor

@tdcmeehan tdcmeehan commented Oct 31, 2025

Description

Implements the planning-oriented materialized view framework from RFC-0016. Adds MaterializedViewScanNode that holds both the materialized data plan and view query plan, letting the optimizer choose based on freshness during planning instead of at analysis time.

New code path is opt-in via legacy_materialized_views=false session property. Legacy SQL stitching path is preserved as default.

Motivation and Context

See RFC-0016.

Impact

No user-facing changes with default settings.

Test Plan

Tests are included in this PR, existing Memory connector end to end tests validate the behavior for unpartitioned tables.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 31, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 31, 2025

Reviewer's Guide

This PR adds support for non-legacy materialized views by capturing view metadata in the analyzer, introducing a new MaterializedViewScanNode, extending the statement and relation planners to produce unified plans over both data table and view query, and adding a rewrite rule that chooses between fresh and materialized data at runtime, with SPI, optimizer, and test updates.

Sequence diagram for planning and rewriting a MaterializedViewScanNode

sequenceDiagram
    participant Analyzer
    participant StatementAnalyzer
    participant RelationPlanner
    participant Optimizer
    participant MaterializedViewRewrite

    Analyzer->>StatementAnalyzer: setMaterializedViewInfo(table, mvInfo)
    StatementAnalyzer->>RelationPlanner: process(table)
    RelationPlanner->>RelationPlanner: planMaterializedView(table, mvInfo, context)
    RelationPlanner->>RelationPlanner: create MaterializedViewScanNode
    Optimizer->>MaterializedViewRewrite: apply(MaterializedViewScanNode, ...)
    MaterializedViewRewrite->>MaterializedViewScanNode: getDataTablePlan()/getViewQueryPlan()
    MaterializedViewRewrite->>Optimizer: return rewritten ProjectNode
Loading

Class diagram for the new MaterializedViewScanNode and related changes

classDiagram
    class MaterializedViewScanNode {
        +PlanNode dataTablePlan
        +PlanNode viewQueryPlan
        +QualifiedObjectName materializedViewName
        +Map<VariableReferenceExpression, VariableReferenceExpression> dataTableMappings
        +Map<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappings
        +List<VariableReferenceExpression> outputVariables
        +getDataTablePlan()
        +getViewQueryPlan()
        +getMaterializedViewName()
        +getDataTableMappings()
        +getViewQueryMappings()
        +getOutputVariables()
        +getSources()
        +sourceOutputLayout(int)
        +replaceChildren(List<PlanNode>)
        +accept(PlanVisitor, context)
        +assignStatsEquivalentPlanNode(Optional<PlanNode>)
    }
    MaterializedViewScanNode --|> PlanNode

    class Analysis {
        +Map<NodeRef<Table>, MaterializedViewInfo> materializedViewInfoMap
        +setMaterializedViewInfo(Table, MaterializedViewInfo)
        +getMaterializedViewInfo(Table): Optional<MaterializedViewInfo>
    }
    class Analysis.MaterializedViewInfo {
        +QualifiedObjectName materializedViewName
        +QualifiedName storageTableName
        +Table dataTable
        +Query viewQuery
        +MaterializedViewDefinition materializedViewDefinition
        +getMaterializedViewName()
        +getStorageTableName()
        +getDataTable()
        +getViewQuery()
        +getMaterializedViewDefinition()
    }
    Analysis "1" *-- "*" Analysis.MaterializedViewInfo

    class MaterializedViewRewrite {
        +apply(MaterializedViewScanNode, Captures, Context): Result
    }
    MaterializedViewRewrite ..> MaterializedViewScanNode

    class PlanVisitor {
        +visitMaterializedViewScan(MaterializedViewScanNode, context)
    }
    PlanVisitor ..> MaterializedViewScanNode
Loading

File-Level Changes

Change Details Files
Capture and propagate materialized view metadata in analysis
  • Added materializedViewInfoMap map with set/get methods
  • Introduced immutable MaterializedViewInfo class to hold view metadata
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
Extend StatementAnalyzer to plan non-legacy materialized views
  • Branched on legacy_materialized_views session property to choose new planning path
  • Parsed original view SQL and storage table into MaterializedViewInfo
  • Processed view query and data table to build scopes and unregister temporary named query
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
Implement MaterializedViewScanNode planning in RelationPlanner
  • Checked for MaterializedViewInfo when visiting Table nodes
  • Built data and view query plans, mapped output variables, and created MaterializedViewScanNode
presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
Define new MaterializedViewScanNode in SPI
  • Added new PlanNode class with JSON annotations and mapping logic
  • Implemented getSources, replaceChildren, visitor accept, and sourceOutputLayout
presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java
Add rewrite rule to choose between data table or view query
  • Implemented MaterializedViewRewrite rule using MetadataResolver.getMaterializedViewStatus
  • Integrated rule into PlanOptimizers and enabled in ApplyConnectorOptimization
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MaterializedViewRewrite.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/Patterns.java
presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanVisitor.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Ensure validator and rule infrastructure supports the new scan node
  • Updated ValidateDependenciesChecker to validate MaterializedViewScanNode mappings
  • Extended test PlanBuilder to construct MaterializedViewScanNode
  • Added default overload in MetadataResolver SPI
presto-main-base/src/main/java/com/facebook/presto/sql/planner/sanity/ValidateDependenciesChecker.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/test/PlanBuilder.java
presto-spi/src/main/java/com/facebook/presto/spi/analyzer/MetadataResolver.java
Add end-to-end tests for materialized view planning and rewrite
  • Created TestMaterializedViewRewrite for full/partial materialization scenarios
  • Added TestMemoryMaterializedViewPlanner for connector-level planning verification
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java
presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@tdcmeehan
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:187-188` </location>
<code_context>
     @Override
     protected RelationPlan visitTable(Table node, SqlPlannerContext context)
     {
+        Optional<Analysis.MaterializedViewInfo> materializedViewInfo = analysis.getMaterializedViewInfo(node);
+        if (materializedViewInfo.isPresent()) {
+            return planMaterializedView(node, materializedViewInfo.get(), context);
+        }
</code_context>

<issue_to_address>
**issue:** Consider handling the case where the viewQueryPlan and dataTablePlan have mismatched field counts.

Adding a check or assertion for matching field counts and orderings between dataTablePlan and viewQueryPlan would help prevent runtime errors and ensure query correctness.
</issue_to_address>

### Comment 2
<location> `presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java:70` </location>
<code_context>
+    }
+
+    @Test
+    public void testMaterializedViewRefreshed()
+    {
+        assertUpdate("CREATE TABLE base_table (id BIGINT, value BIGINT)");
</code_context>

<issue_to_address>
**suggestion (testing):** Suggestion to add error condition tests for querying dropped materialized views.

Please add a test that verifies querying a dropped materialized view results in the expected error and no execution plan is generated.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@tdcmeehan
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
  • Using NodeRef as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
  • The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
  • Prompt for AI Agents
    Please address the comments from this code review:
    
    ## Overall Comments
    - There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
    - Using NodeRef<Table> as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
    - The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
    
    ## Individual Comments
    
    ### Comment 1
    <location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:338-339` </location>
    <code_context>
    +        ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> dataTableMappingsBuilder = ImmutableMap.builder();
    +        ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappingsBuilder = ImmutableMap.builder();
    +
    +        for (Field field : dataTableDescriptor.getVisibleFields()) {
    +            int fieldIndex = dataTableDescriptor.indexOf(field);
    +            VariableReferenceExpression dataTableVar = dataTableVariables.get(fieldIndex);
    +            VariableReferenceExpression viewQueryVar = viewQueryVariables.get(fieldIndex);
    </code_context>
    
    <issue_to_address>
    **issue (bug_risk):** Potential mismatch if visible fields differ from output variable order.
    
    This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
    </issue_to_address>
    
    ### Comment 2
    <location> `presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java:123-132` </location>
    <code_context>
    +        return unmodifiableList(sources);
    +    }
    +
    +    public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
    +    {
    +        if (sourceIndex == 0) {
    +            // Data table plan - use dataTableMappings
    +            return outputVariables.stream()
    +                    .map(dataTableMappings::get)
    +                    .collect(toList());
    +        }
    +        else if (sourceIndex == 1) {
    +            // View query plan - use viewQueryMappings
    +            return outputVariables.stream()
    +                    .map(viewQueryMappings::get)
    +                    .collect(toList());
    +        }
    +        else {
    +            throw new IllegalArgumentException("MaterializedViewScanNode only has 2 sources (index 0 and 1), but got index " + sourceIndex);
    +        }
    </code_context>
    
    <issue_to_address>
    **issue (bug_risk):** sourceOutputLayout may return nulls if mappings are incomplete.
    
    Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
    </issue_to_address>
    
    ### Comment 3
    <location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:47-48` </location>
    <code_context>
    +public class TestMaterializedViewRewrite
    +        extends BaseRuleTest
    +{
    +    @Test
    +    public void testUseFreshDataWhenFullyMaterialized()
    +    {
    +        QualifiedObjectName materializedViewName = QualifiedObjectName.valueOf("catalog.schema.mv");
    </code_context>
    
    <issue_to_address>
    **suggestion (testing):** Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
    
    Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
    </issue_to_address>

    Sourcery is free for open source - if you like our reviews please consider sharing them ✨
    Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +338 to +339
for (Field field : dataTableDescriptor.getVisibleFields()) {
int fieldIndex = dataTableDescriptor.indexOf(field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Potential mismatch if visible fields differ from output variable order.

This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.

Comment on lines +123 to +132
public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
{
if (sourceIndex == 0) {
// Data table plan - use dataTableMappings
return outputVariables.stream()
.map(dataTableMappings::get)
.collect(toList());
}
else if (sourceIndex == 1) {
// View query plan - use viewQueryMappings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): sourceOutputLayout may return nulls if mappings are incomplete.

Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.

Comment on lines +47 to +48
@Test
public void testUseFreshDataWhenFullyMaterialized()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.

Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.

@tdcmeehan tdcmeehan changed the title Add MaterializedViewScanNode feat(planner): Add MaterializedViewScanNode Oct 31, 2025
@tdcmeehan tdcmeehan requested a review from zation99 October 31, 2025 19:54
@tdcmeehan tdcmeehan marked this pull request as ready for review October 31, 2025 19:54
@prestodb-ci prestodb-ci requested review from a team, NivinCS and nishithakbhaskaran and removed request for a team October 31, 2025 19:55
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this feature. Overall looks good to me! Just one nit and a few points for discussion.

}

@Override
public Map<String, com.facebook.presto.spi.ColumnHandle> getColumnHandles(TableHandle tableHandle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Map<String, com.facebook.presto.spi.ColumnHandle> getColumnHandles(TableHandle tableHandle)
public Map<String, ColumnHandle> getColumnHandles(TableHandle tableHandle)

nit: use static import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +318 to +323
builder.add(new IterativeOptimizer(
metadata,
ruleStats,
statsCalculator,
estimatedExchangesCostCalculator,
ImmutableSet.of(new MaterializedViewRewrite(metadata))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little curious — is there a specific reason this rule needs to be wrapped in an IterativeOptimizer? I was wondering if it could simply work as a standalone PlanOptimizer.

Copy link
Contributor Author

@tdcmeehan tdcmeehan Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, although this isn't written down, I believe it's preferable to use the iterative optimizer framework, and at one point in time we were trying to deprecate direct usage of PlanOptimizer. In the future, I'll combine this rule and other rules to simplify expressions into this single IterativeOptimizer, as partition stitching will create potentially complicated expression trees which can be simplified. I also appreciate that the scope of the Rules are confined to the nodes we are interested in (i.e. I only see MaterializedViewScanNode in MaterializedViewRewrite, not the entire plan).

Comment on lines +63 to +68
MaterializedViewStatus status = metadataResolver.getMaterializedViewStatus(node.getMaterializedViewName());

boolean useDataTable = status.isFullyMaterialized();
PlanNode chosenPlan = useDataTable ? node.getDataTablePlan() : node.getViewQueryPlan();
Map<VariableReferenceExpression, VariableReferenceExpression> chosenMappings =
useDataTable ? node.getDataTableMappings() : node.getViewQueryMappings();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, when the status is PARTIALLY_MATERIALIZED, should the stitching logic be supported here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. For now, if it's anything other than FULLY_MATERIALIZED, then we use the view plan. In subsequent PRs, I'll refine this with the techniques outlined in RFC-0016.

@tdcmeehan tdcmeehan requested a review from aaneja November 3, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants