Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PruningPredicate's rewrite public #12850

Merged
merged 5 commits into from
Oct 13, 2024
Merged

Conversation

adriangb
Copy link
Contributor

Replaces #12606, #12835

As per #12606 (comment) the plan was to split the rewrite logic from the rest of PruningPredicate. It turns out that was as easy as making a function public, kudos to whoever wrote this originally for organizing it nicely 🥳.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 10, 2024
@adriangb
Copy link
Contributor Author

cc @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adriangb -- this looks like a great idea to me -- I left a comment on how we can potentially improve the API slightly. Let me know what you think

);

// an expression referencing an unknown column (that is not in the schema) gets passed to the hook
let input = col("b").eq(lit(ScalarValue::Int32(Some(12))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can write literals more concisely like this if you wanted

Suggested change
let input = col("b").eq(lit(ScalarValue::Int32(Some(12))));
let input = col("b").eq(lit(12i32));

/// Returns the pruning predicate as an [`PhysicalExpr`]
///
/// Notice: Does not handle [`phys_expr::InListExpr`] greater than 20, which will be rewritten to TRUE
/// Notice: Does not handle [`phys_expr::InListExpr`] greater than 20, which will fall back to calling `unhandled_hook`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new API I think it might be worth thinking a bit more about. What would you think about moving this logic into its own structure?

For example, instead of

        let known_expression = col("a").eq(lit(ScalarValue::Int32(Some(12))));
        let known_expression_transformed = rewrite_predicate_to_statistics_predicate(
            &logical2physical(&known_expression, &schema),
            &schema,
            None,
        );

Something like

        let known_expression = col("a").eq(lit(ScalarValue::Int32(Some(12))));
        let known_expression_transformed = StatisticsPredicateRewriter::new()
           .rewrite(
             &logical2physical(&known_expression, &schema)
             &schema)
           );

Or with a rewrite hook

            let expr = logical2physical(&expr, &schema_with_b);
            StatisticsPredicateRewriter::new()
              .with_unhandledhook(|expr| {
                 // closure that handles expr
              })
           .rewrite(
                &expr,
                &schema,
            )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that sounds good to me, I'll push that change tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually just did it, it was easy.

My main question is if it should include the schema or not.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adriangb -- this looks good to me.

My main question is if it should include the schema or not.

I think callers could probably get the schema in other ways. However, it might be nice to pass it in too.

@adriangb
Copy link
Contributor Author

I think callers could probably get the schema in other ways. However, it might be nice to pass it in too.

I just meant:

        let known_expression_transformed = StatisticsPredicateRewriter::new()
           .rewrite(
             &logical2physical(&known_expression, &schema)
             &schema)
           );

vs

        let known_expression_transformed = StatisticsPredicateRewriter::new(schema)
           .rewrite(&logical2physical(&known_expression, &schema));

I think both work, I don't see a reason to change it but was just double checking.

@alamb
Copy link
Contributor

alamb commented Oct 11, 2024

Here are some small suggestions: pydantic#3

Improve documentation and add default to ConstantUnhandledPredicatehook
@adriangb
Copy link
Contributor Author

Thanks Andrew, merged your suggestions 😄

@alamb
Copy link
Contributor

alamb commented Oct 11, 2024

Doh, looks like I broke clippy

https://github.com/apache/datafusion/actions/runs/11294824353/job/31416484653?pr=12850

505 | impl ConstantUnhandledPredicateHook {
    | ----------------------------------- associated function in this implementation
506 |     fn new(default: Arc<dyn PhysicalExpr>) -> Self {
    |        ^^^

@adriangb
Copy link
Contributor Author

I can fix it or feel free to just push to this branch directly.

@alamb
Copy link
Contributor

alamb commented Oct 11, 2024

I can fix it or feel free to just push to this branch directly.

I don't think I can as it is the pydanitic fork and I don't have access to do so

@adriangb
Copy link
Contributor Author

Ah right I think you would have been able to if it was my personal fork but you're right it's the pydantic one.

I think I fixed it now.

@alamb alamb merged commit 1b10c9f into apache:main Oct 13, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 13, 2024

Thanks again @adriangb

hailelagi pushed a commit to hailelagi/datafusion that referenced this pull request Oct 14, 2024
* Make PruningPredicate's rewrite public

* feedback

* Improve documentation and add default to ConstantUnhandledPredicatehook

* Update pruning.rs

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants