Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support normalized expr in CSE #13315

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zhuliquan
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

I notice that some expressions are semantically equivalent. (i.g. a + b and b + a is equivalent). I think their expressions can be optimized in CSE (i.e. common subexper eliminate) process. For example: we can optimize below logical plan.

before:

Filter: (test.a + test.b) * (test.b + test.a) = Int32(30)
  TableScan: test

after:

Projection: test.a, test.b, test.c
  Filter: __common_expr_1 * __common_expr_1 = Int32(30)
    Projection: test.a + test.b AS __common_expr_1, test.a, test.b, test.c
      TableScan: test

In example, we extract common expr test.a + test.b, as test.a + test.b is equal to test.b + test.a semantically.

What changes are included in this PR?

  1. adding code of normalizing node before CSE in function extract_common_nodes.
  2. adding trait NormalizeNode and implement this trait for enum Expr.
  3. adding some normalize rule for some specific BinaryExpr (including Plus / Multi / BitsetAnd / BitsetOr / BitsetXor / Eq / NotEq expressions)

Are these changes tested?

yes, I add below test cases.

  1. test_normalize_add_expression
  2. test_normalize_multi_expression
  3. test_normalize_bitset_and_expression
  4. test_normalize_bitset_or_expression
  5. test_normalize_bitset_xor_expression
  6. test_normalize_eq_expression
  7. test_normalize_ne_expression

and, I modify below test cases.

  1. subexpr_in_different_order

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules common Related to common crate labels Nov 8, 2024
@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

cc @peter-toth

@peter-toth
Copy link
Contributor

peter-toth commented Nov 9, 2024

Thanks @alamb for pinging me and @zhuliquan for the PR. Node normalization was indeed missing from CSE. I am happy to review the change next week.

| Operator::NotEq
) {
let (l_expr, r_expr) =
if format!("{normalized_left}") < format!("{normalized_right}") {
Copy link
Contributor

@jayzhan211 jayzhan211 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this slowdown the query? How about checking the equality for (left, right) and (right, left) to determine the order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm, This code runs in CSE phase, instead of the statement execution phase. It stands to reason that there should be no impact on execution, but are you referring specifically to the scenario where you use datafusion-cli to run statements? This function will only be invoked on Eq when the hash value of the node is the same, and the frequency should not be high, and the normalized comparison should be the same time complexity as the original node direct comparison.

Copy link
Contributor

@peter-toth peter-toth Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I share the concerns of @jayzhan211. Also, the other case in this method contains .clone() of Expr, which can be also costly.
I wonder if we could implement and use something like fn eq_normalized(&self, other: &Self) -> bool instead of the current fn normalize(&self) -> Expr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I got it. Thanks for @peter-toth @jayzhan211 suggestion. I improve this code as below

impl NormalizeEq for Expr {
fn normalize_eq(&self, other: &Self) -> bool {
match (self, other) {
(
Expr::BinaryExpr(BinaryExpr {
left: self_left,
op: self_op,
right: self_right,
}),
Expr::BinaryExpr(BinaryExpr {
left: other_left,
op: other_op,
right: other_right,
}),
) => {
if self_op != other_op {
return false;
}
if matches!(
self_op,
Operator::Plus
| Operator::Multiply
| Operator::BitwiseAnd
| Operator::BitwiseOr
| Operator::BitwiseXor
| Operator::Eq
| Operator::NotEq
) {
(self_left.normalize_eq(other_left)
&& self_right.normalize_eq(other_right))
|| (self_left.normalize_eq(other_right)
&& self_right.normalize_eq(other_left))
} else {
self_left.normalize_eq(other_left)
&& self_right.normalize_eq(other_right)
}
}
(_, _) => self == other,
}
}
}

impl<'n, N: HashNode> Identifier<'n, N> {
impl<N: NormalizeEq> PartialEq for Identifier<'_, N> {
fn eq(&self, other: &Self) -> bool {
self.node.normalize_eq(other.node)
Copy link
Contributor

@peter-toth peter-toth Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's compare hashes as well. I know that we use Identifier in hashmap keys and there hashes need to match first to evaluate eq, but still this would look more eq like...

&& self_right.normalize_eq(other_right)
}
}
(_, _) => self == other,
Copy link
Contributor

@peter-toth peter-toth Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this actually means that if there is an other expr above the cumulative BinaryExpr operation, then we don't normalize? E.g. Not(a == b) shouldn't be semantically equal to Not(b == a)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants