[multistage][feature] RelDistribution-based optimization 

Trackers
===
- [x] enable trait-based partition optimization for more accurate leaf-stage direct exchange/shuffle (#11976)
- [x] extend current direct exchange/shuffle logic to the entire query plan https://github.com/apache/pinot/pull/12079
- [ ] revisit long-term distribution (#12012)

Details
===

Trait-based optimziation
------
Goals of the first enablement of trait-based optimization:

1. ensure traits are propagated properly across the rel-tree
2. ensure the leaf-stage optimization for direct exchange is correct --> previously it was blindly done via tableOptions
3. ensure that no regression occurs for existing, correct direct exchange

Extended partition optimization beyond leaf-stage
------
in additional to dealing with RelDistribution trait we also need to deal with physical information passed in via hints, specifically

- colocated join hint: ensure colocated join hints are enforced 
    - when colocated hint exist and table are not colocated --> issue warning or fail with excpeiton
    - when no colocated join hints are passed, colocation is best-effort, unless explicitly disabled via hint
- partition key and functions: data can be partitioned by the same key but not the same function 
    - ensure partition/colocate are checked against both function and name

- allow auto TableOptions
    - table partition key, size, function can be read from tableConfigs;
    - with RelDistribution and partition/colocated hint, table Options are no longer needed inside query, but it is also used as a hint to "assigning worker/server based on table partition" which should be preserved

Execute long-term optimization
------
Several goals are listed down but can be discussed further

- stage parallelism 
    - direct exchanges are not currently tested against stage parallelism, they are all 1-1
    - should support per-partition fan-out
- relDistribution trait insertion before exchange rules
    - relDistribution trait can be walked before exchange thus exchange insertion can be done when expected and actual trait differs between input node result relDistribution and current node expected result relDistribution
- best relDistribution preservation
    - partition compatibililty optimization, e.g. 
        - whether to partition by [a, b] combine or partition by [a] when doing a GROUP BY a, b --> the former is trivial selection; but latter could be beneficial if downstream stages are joining on [a]
        - push repartition later --> e.g. GROUP BY a, b can be done on a pre-partitioned-by-a stage until a repartition is absolutely necessary.

See more discussion in #12012 for detail impl plan



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[multistage][feature] RelDistribution-based optimization #12015

Trackers

Details

Trait-based optimziation

Extended partition optimization beyond leaf-stage

Execute long-term optimization

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[multistage][feature] RelDistribution-based optimization #12015

Description

Trackers

Details

Trait-based optimziation

Extended partition optimization beyond leaf-stage

Execute long-term optimization

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions