-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support normalized expr in CSE #13315
base: main
Are you sure you want to change the base?
Conversation
cc @peter-toth |
Thanks @alamb for pinging me and @zhuliquan for the PR. Node normalization was indeed missing from CSE. I am happy to review the change next week. |
datafusion/expr/src/expr.rs
Outdated
| Operator::NotEq | ||
) { | ||
let (l_expr, r_expr) = | ||
if format!("{normalized_left}") < format!("{normalized_right}") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this slowdown the query? How about checking the equality for (left, right) and (right, left) to determine the order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm, This code runs in CSE phase, instead of the statement execution phase. It stands to reason that there should be no impact on execution, but are you referring specifically to the scenario where you use datafusion-cli
to run statements? This function will only be invoked on Eq
when the hash value of the node is the same, and the frequency should not be high, and the normalized comparison should be the same time complexity as the original node direct comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I share the concerns of @jayzhan211. Also, the other case in this method contains .clone()
of Expr
, which can be also costly.
I wonder if we could implement and use something like fn eq_normalized(&self, other: &Self) -> bool
instead of the current fn normalize(&self) -> Expr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I got it. Thanks for @peter-toth @jayzhan211 suggestion. I improve this code as below
datafusion/datafusion/expr/src/expr.rs
Lines 1697 to 1738 in dda0848
impl NormalizeEq for Expr { | |
fn normalize_eq(&self, other: &Self) -> bool { | |
match (self, other) { | |
( | |
Expr::BinaryExpr(BinaryExpr { | |
left: self_left, | |
op: self_op, | |
right: self_right, | |
}), | |
Expr::BinaryExpr(BinaryExpr { | |
left: other_left, | |
op: other_op, | |
right: other_right, | |
}), | |
) => { | |
if self_op != other_op { | |
return false; | |
} | |
if matches!( | |
self_op, | |
Operator::Plus | |
| Operator::Multiply | |
| Operator::BitwiseAnd | |
| Operator::BitwiseOr | |
| Operator::BitwiseXor | |
| Operator::Eq | |
| Operator::NotEq | |
) { | |
(self_left.normalize_eq(other_left) | |
&& self_right.normalize_eq(other_right)) | |
|| (self_left.normalize_eq(other_right) | |
&& self_right.normalize_eq(other_left)) | |
} else { | |
self_left.normalize_eq(other_left) | |
&& self_right.normalize_eq(other_right) | |
} | |
} | |
(_, _) => self == other, | |
} | |
} | |
} |
4e60434
to
dda0848
Compare
impl<'n, N: HashNode> Identifier<'n, N> { | ||
impl<N: NormalizeEq> PartialEq for Identifier<'_, N> { | ||
fn eq(&self, other: &Self) -> bool { | ||
self.node.normalize_eq(other.node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's compare hash
es as well. I know that we use Identifier
in hashmap keys and there hashes need to match first to evaluate eq
, but still this would look more eq
like...
&& self_right.normalize_eq(other_right) | ||
} | ||
} | ||
(_, _) => self == other, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this actually means that if there is an other expr above the cumulative BinaryExpr
operation, then we don't normalize? E.g. Not(a == b)
shouldn't be semantically equal to Not(b == a)
?
Which issue does this PR close?
Rationale for this change
I notice that some expressions are semantically equivalent. (i.g.
a + b
andb + a
is equivalent). I think their expressions can be optimized in CSE (i.e. common subexper eliminate) process. For example: we can optimize below logical plan.before:
after:
In example, we extract common expr
test.a + test.b
, astest.a + test.b
is equal totest.b + test.a
semantically.What changes are included in this PR?
extract_common_nodes
.NormalizeNode
and implement this trait for enumExpr
.BinaryExpr
(includingPlus
/Multi
/BitsetAnd
/BitsetOr
/BitsetXor
/Eq
/NotEq
expressions)Are these changes tested?
yes, I add below test cases.
and, I modify below test cases.
Are there any user-facing changes?
No