-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make PruningPredicate's rewrite public #12850
Conversation
01274f9
to
9c49413
Compare
cc @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb -- this looks like a great idea to me -- I left a comment on how we can potentially improve the API slightly. Let me know what you think
); | ||
|
||
// an expression referencing an unknown column (that is not in the schema) gets passed to the hook | ||
let input = col("b").eq(lit(ScalarValue::Int32(Some(12)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can write literals more concisely like this if you wanted
let input = col("b").eq(lit(ScalarValue::Int32(Some(12)))); | |
let input = col("b").eq(lit(12i32)); |
/// Returns the pruning predicate as an [`PhysicalExpr`] | ||
/// | ||
/// Notice: Does not handle [`phys_expr::InListExpr`] greater than 20, which will be rewritten to TRUE | ||
/// Notice: Does not handle [`phys_expr::InListExpr`] greater than 20, which will fall back to calling `unhandled_hook` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a new API I think it might be worth thinking a bit more about. What would you think about moving this logic into its own structure?
For example, instead of
let known_expression = col("a").eq(lit(ScalarValue::Int32(Some(12))));
let known_expression_transformed = rewrite_predicate_to_statistics_predicate(
&logical2physical(&known_expression, &schema),
&schema,
None,
);
Something like
let known_expression = col("a").eq(lit(ScalarValue::Int32(Some(12))));
let known_expression_transformed = StatisticsPredicateRewriter::new()
.rewrite(
&logical2physical(&known_expression, &schema)
&schema)
);
Or with a rewrite hook
let expr = logical2physical(&expr, &schema_with_b);
StatisticsPredicateRewriter::new()
.with_unhandledhook(|expr| {
// closure that handles expr
})
.rewrite(
&expr,
&schema,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that sounds good to me, I'll push that change tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually just did it, it was easy.
My main question is if it should include the schema or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb -- this looks good to me.
My main question is if it should include the schema or not.
I think callers could probably get the schema in other ways. However, it might be nice to pass it in too.
I just meant: let known_expression_transformed = StatisticsPredicateRewriter::new()
.rewrite(
&logical2physical(&known_expression, &schema)
&schema)
); vs let known_expression_transformed = StatisticsPredicateRewriter::new(schema)
.rewrite(&logical2physical(&known_expression, &schema)); I think both work, I don't see a reason to change it but was just double checking. |
Here are some small suggestions: pydantic#3 |
Improve documentation and add default to ConstantUnhandledPredicatehook
Thanks Andrew, merged your suggestions 😄 |
Doh, looks like I broke clippy https://github.com/apache/datafusion/actions/runs/11294824353/job/31416484653?pr=12850
|
I can fix it or feel free to just push to this branch directly. |
I don't think I can as it is the pydanitic fork and I don't have access to do so |
Ah right I think you would have been able to if it was my personal fork but you're right it's the pydantic one. I think I fixed it now. |
Thanks again @adriangb |
* Make PruningPredicate's rewrite public * feedback * Improve documentation and add default to ConstantUnhandledPredicatehook * Update pruning.rs --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Replaces #12606, #12835
As per #12606 (comment) the plan was to split the rewrite logic from the rest of PruningPredicate. It turns out that was as easy as making a function public, kudos to whoever wrote this originally for organizing it nicely 🥳.