Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PruningPredicate Use Physical not Logical Predicate #4695

Closed
tustvold opened this issue Dec 21, 2022 · 7 comments · Fixed by #5419
Closed

PruningPredicate Use Physical not Logical Predicate #4695

tustvold opened this issue Dec 21, 2022 · 7 comments · Fixed by #5419
Labels
enhancement New feature or request

Comments

@tustvold
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Related to #4693, currently PruningPredicate accepts a logical Expr which it then converts to a PhysicalExpr. The comments appear to indicate this at one point related to serialization. However, looking at the code in datafusion-proto, we appear to be able to serialize physical expressions.

Describe the solution you'd like

I would like PruningPredicate to be created with a PhysicalExpr instead of Expr, this would in turn mean ParquetExec would be created with PhysicalExpr instead of Expr.

Not only would this avoid issues like #4693 but would also be more consistent for the physical plan to contain PhysicalExpr, and not logical Expr

Describe alternatives you've considered

Additional context

This relates to #4629 which tracks making ExecutionProps a trait to ensure all stages of planning/execution take place against the same configuration.

FYI @alamb in case there is something fundamental I have missed here

@tustvold tustvold added the enhancement New feature or request label Dec 21, 2022
@alamb
Copy link
Contributor

alamb commented Dec 21, 2022

I think using PhysicalExpr would be fine for pruning predicates.

If we are going to rewrite pruning anyways, I think we should consider changing the approach -- rather than doing algebriac rewrite I would love to change it to a range analysis -- using the base that @isidentical has created (for predicate selectivity analysis)

@tustvold
Copy link
Contributor Author

This wouldn't require a rewrite, it already uses PhysicalExpr internally, it is just its external interface is Expr. This is the cause of the issue 😅

@tustvold
Copy link
Contributor Author

I didn't dig deep enough into the code, it is actually performing a rewrite of the Expr which would require a non-trivial rewrite to port to PhysicalExpr

@tustvold tustvold closed this as not planned Won't fix, can't repro, duplicate, stale Dec 22, 2022
@tustvold tustvold reopened this Feb 22, 2023
@tustvold
Copy link
Contributor Author

I believe @crepererum plans to pick this one up

@alamb
Copy link
Contributor

alamb commented Feb 23, 2023

If anyone is going to significantly revamp the pruning predicate logic, I recommend to please coordinate with @mingmwang @ozankabak @metesynnada -- it turns out there is already another range analysis implementation (other than PruningPredicate) and we are in discussions about possibly adding a third: #5322 (comment)

If possible it would be great to avoid investing a lot of time into PruningPredicate only to then have to do some other major surgery on it to avoid duplication with the range analysis

The other two range analysis are based on PhysicalExpr

@crepererum
Copy link
Contributor

It's not really a revamp. It's mostly a straight-forward porting of the PruningPredicate and ParquetExec from logical to phys. expressions. I would like to perform this port first before compressing/de-duping the two implementations. I agree that ideally there should be a single way to express ranges etc.

@ozankabak
Copy link
Contributor

Sounds good, if it is a straightforward porting it makes sense to do it early. We can subsequently merge them in the upcoming weeks as we mature the interval arithmetic library.

crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Feb 24, 2023
avantgardnerio pushed a commit that referenced this issue Feb 24, 2023
* fix: `TRY_CAST` phys. expr display

* refactor: move precedence from `BinaryExpr` to `Operator`

This is useful in other contexts as well, e.g. for formatting phys.
expressions.

* fix: display of nested phys. binary expr should use parenthesis

Found while working on #4695.
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Feb 27, 2023
Use `Arc<dyn PhysicalExpr>` instead of `Expr` within `ParquetExec` and
move lowering from logical to physical expression into plan lowering
(e.g. `ListingTable`).

This is in line w/ all other physical plan nodes (e.g. `FilterExpr`) and
simplifies reasoning within physical optimizer but also allows correct
passing of `ExecutionProps` into the conversion.

Closes apache#4695.
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Feb 27, 2023
Use `Arc<dyn PhysicalExpr>` instead of `Expr` within `ParquetExec` and
move lowering from logical to physical expression into plan lowering
(e.g. `ListingTable`).

This is in line w/ all other physical plan nodes (e.g. `FilterExpr`) and
simplifies reasoning within physical optimizer but also allows correct
passing of `ExecutionProps` into the conversion.

Closes apache#4695.
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Mar 1, 2023
Use `Arc<dyn PhysicalExpr>` instead of `Expr` within `ParquetExec` and
move lowering from logical to physical expression into plan lowering
(e.g. `ListingTable`).

This is in line w/ all other physical plan nodes (e.g. `FilterExpr`) and
simplifies reasoning within physical optimizer but also allows correct
passing of `ExecutionProps` into the conversion.

Closes apache#4695.
alamb pushed a commit that referenced this issue Mar 1, 2023
* feat: `get_phys_expr_columns` util

* feat: add `reassign_predicate_columns` util

* feat: `PhysicalExprRef` type alias

* refactor: `ParquetExec` logical expr. => phys. expr.

Use `Arc<dyn PhysicalExpr>` instead of `Expr` within `ParquetExec` and
move lowering from logical to physical expression into plan lowering
(e.g. `ListingTable`).

This is in line w/ all other physical plan nodes (e.g. `FilterExpr`) and
simplifies reasoning within physical optimizer but also allows correct
passing of `ExecutionProps` into the conversion.

Closes #4695.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants