Support simplify not for physical expr#18970
Conversation
|
Thanks for the review @martin-g |
alamb
left a comment
There was a problem hiding this comment.
Thank you @xudong963 and @martin-g
I think prior to releasing this code we should make the following changes:
- Avoid explicit recursion in not simplification (and instead use the tree node API)
- Avoid the potential infinite loop
It would be nice to make them as part of this PR, but I also think it would be ok to do it as a follow on
Other things I think would be good to do but not strictly necessary:
- Move the tests to the public interface (PhysicalExprSimplifier)
- Avoid duplications of
Operator::negate
| ); | ||
| Ok(unwrapped) | ||
| // Combine transformation results | ||
| let final_transformed = transformed || unwrapped.transformed; |
There was a problem hiding this comment.
I think you can use transform_data here instead: https://docs.rs/datafusion/latest/datafusion/common/tree_node/struct.Transformed.html#method.transform_data
So something like
// Apply NOT expression simplification first
let rewritten =
simplify_not_expr(&node, self.schema)?.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;That handles combining the transformed flag for you
There was a problem hiding this comment.
neat, done in be077db.
Need to catch up on the tree node APIs, haha
| ]) | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Since this function is not really run in isolation (it is always run as part of PhysicalExprSimplifier) I think it would be better if these tests were in datafusion/physical-expr/src/simplifier/mod.rs rather than here.
I don't think this change is required
| } | ||
|
|
||
| /// Returns the negated version of a comparison operator, if possible | ||
| fn negate_operator(op: &Operator) -> Option<Operator> { |
There was a problem hiding this comment.
This looks the same as Operator::negate: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.negate
I recommend using that code, or if there is a reason not to use Operator::negate add a comment explaining why it is not used
| Arc::new(NotExpr::new(Arc::clone(binary_expr.right()))); | ||
|
|
||
| // Recursively simplify the NOT expressions | ||
| let simplified_left = simplify_not_expr(¬_left, schema)?; |
There was a problem hiding this comment.
I think since simplify_not_expr is used in PhysicalExprSimplifier which is doing a walk up the plan (using f_up) there is no reason to also recurse explicitly in this rule.
You should be able to apply the rewrite rules only NOT(A OR B) --> NOT A AND NOT B rather than also changing the exprs
| let mut current_expr = Arc::clone(expr); | ||
| let mut overall_transformed = false; | ||
|
|
||
| loop { |
There was a problem hiding this comment.
I am also somewhat concerned about this loop -- it seems like it may be trying to handle a non complete recursion and it could potentially lead to infinite recursion if a rewrite flip/flopped
If the loop is needed I suggest:
- Put it in the higher level PhysicalExprSimplifier
- add a counter that breaks after some number of iterations (e.g. 5 or something)
There was a problem hiding this comment.
I put it into PhysicalExprSimplifier with a counter to limit the loop
|
|
||
| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
What is the reason for using to_string()?
I tried comparing the two directly and it seems to work
assert_eq!(&result.data, &inner_expr);There was a problem hiding this comment.
yeah, I didn't notice the expr is comparable
| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
It might also make these queries easier to read if you made a helper like
fn assert_transformed(expr, expected_expr);and
fn assert_not_transformed(expr, expected_expr);| if let Some(literal) = result.data.as_any().downcast_ref::<Literal>() { | ||
| assert_eq!(literal.value(), &ScalarValue::Boolean(Some(false))); | ||
| } else { | ||
| panic!("Expected literal result"); | ||
| } |
There was a problem hiding this comment.
You could potentially make this a function to make the tests easier to read - something like
let literal = as_literal(&result);And handle the panic internally in as_literal
That way the mechics of the test wouldn't obscure the test logic so much
I think you could do something similar with BinaryExpr
| let schema = test_schema(); | ||
|
|
||
| // Create a deeply nested NOT expression: NOT(NOT(NOT(...NOT(b > 5)...))) | ||
| // This tests that we don't get stack overflow with many nested NOTs |
d749da7 to
be077db
Compare
e9731f3 to
27c4ef1
Compare
Thanks @alamb. I addressed all suggestions in the latest commit 27c4ef1 |
|
Looks great @xudong963 -- thank you |
Part of the #18868