Optimizer now simplifies multiplication, division, modulo when an arg is a literal Decimal zero or one #3782

drrtuy · 2022-10-10T18:47:58Z

…

Which issue does this PR close?

Closes #3643.

Rationale for this change

To improve expression simplification efficiency.

What changes are included in this PR?

Are there any user-facing changes?

No user-facing changes are expected.
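To make the change concrete, here is a minimal sketch of the kind of rewrite involved, on a toy expression type. `Expr`, `is_one` and `simplify` below are hypothetical stand-ins, not DataFusion's API. A Decimal128 literal is modelled as (unscaled value, precision, scale), so it equals one exactly when the unscaled value is 10^scale. Only the one-rewrites (x * 1 → x, 1 * x → x, x / 1 → x) are shown. Rewriting `x * 0` to `0` is only sound when `x` cannot be NULL (NULL * 0 is NULL in SQL), and this toy type carries no nullability information.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// Decimal literal: unscaled value, precision, scale (value = v / 10^scale).
    Decimal128(i128, u8, u8),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// True when the expression is a Decimal literal equal to exactly 1.
fn is_one(e: &Expr) -> bool {
    match e {
        // v / 10^s == 1 exactly when v == 10^s; checked_pow avoids overflow.
        Expr::Decimal128(v, _p, s) => 10i128.checked_pow(*s as u32) == Some(*v),
        _ => false,
    }
}

/// Bottom-up simplification applying x * 1 -> x, 1 * x -> x and x / 1 -> x.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Mul(l, r) => {
            let (l, r) = (simplify(*l), simplify(*r));
            if is_one(&r) {
                l
            } else if is_one(&l) {
                r
            } else {
                Expr::Mul(Box::new(l), Box::new(r))
            }
        }
        Expr::Div(l, r) => {
            let (l, r) = (simplify(*l), simplify(*r));
            if is_one(&r) {
                l
            } else {
                Expr::Div(Box::new(l), Box::new(r))
            }
        }
        other => other,
    }
}
```

For example, with `1.00` encoded as `Expr::Decimal128(100, 10, 2)`, the expression `1.00 * (a / 1.00)` simplifies to the bare column `a`, while `a * 2.00` is left unchanged.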

andygrove · 2022-10-10T19:59:53Z

datafusion/optimizer/src/simplify_expressions.rs

+            if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
+                true
+            } else {
+                false
+            }


Suggested change

-            if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
-                true
-            } else {
-                false
-            }
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

A way more laconic expression.
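The suggestion relies on the fact that an `if` whose branches are the literals `true` and `false` is just its condition (Clippy reports this pattern as `needless_bool`). A small sketch showing the two forms agree; the slice parameter `pows` is a hypothetical stand-in for `POWS_OF_TEN`, and `pows.len()` stands in for `DECIMAL128_MAX_PRECISION`:

```rust
/// Verbose form, as in the original diff: branches return literal booleans.
fn is_one_verbose(s: u8, v: i128, pows: &[i128]) -> bool {
    if (s as usize) < pows.len() && pows[s as usize] == v {
        true
    } else {
        false
    }
}

/// Suggested form: the condition already is the `bool` to return.
/// `&&` short-circuits, so the index is only evaluated when it is in bounds.
fn is_one_concise(s: u8, v: i128, pows: &[i128]) -> bool {
    (s as usize) < pows.len() && pows[s as usize] == v
}
```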

I updated the PR.


alamb

Looks great to me -- thank you @drrtuy

@liukun4515 can you please confirm this is ok with you as well?

liukun4515 · 2022-10-11T13:39:10Z

> Looks great to me -- thank you @drrtuy
>
> @liukun4515 can you please confirm this is ok with you as well?

Thanks @alamb, I will take a look at this issue and PR tomorrow.

isidentical · 2022-10-11T23:03:27Z

datafusion/optimizer/src/simplify_expressions.rs

@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
        | Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
        Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
        Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
+        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
+            *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
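For context on the Decimal128 arm above: a Decimal128 literal stores an unscaled integer v together with a precision p and a scale s, and denotes the value v · 10^(-s). The check therefore rests on:

```latex
v \cdot 10^{-s} = 1 \iff v = 10^{s},
\qquad
v \cdot 10^{-s} = 0 \iff v = 0 .
```

Because v = 10^s has s + 1 digits and must fit within a precision of at most 38, a literal equal to one always has s ≤ 37. Assuming DECIMAL128_MAX_PRECISION is 38 (its value in arrow-rs), the guard `*_s < DECIMAL128_MAX_PRECISION` therefore excludes no valid "one" literal.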


Question: Is there a specific reason to embed a POWS_OF_TEN array instead of computing the power at runtime? (The same applies to taking log10() of v instead of raising 10 to the power _s.) E.g.

i128::pow(10, *_s as u32) == *v
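A sketch comparing the two checks discussed here, assuming the scale is a `u8` and using 38 as a stand-in for arrow's `DECIMAL128_MAX_PRECISION`. The `POWS_OF_TEN` table below is illustrative and built at compile time by a `const` block; the PR's actual table may be declared differently. `checked_pow` replaces the suggested `pow` so that a scale of 39 or more returns `false` instead of overflowing.

```rust
/// Stand-in for arrow's `DECIMAL128_MAX_PRECISION`.
const DECIMAL128_MAX_PRECISION: u8 = 38;

/// Illustrative table of 10^0 ..= 10^37, evaluated entirely at compile time.
const POWS_OF_TEN: [i128; DECIMAL128_MAX_PRECISION as usize] = {
    let mut table = [0i128; DECIMAL128_MAX_PRECISION as usize];
    let mut i = 0;
    let mut p: i128 = 1;
    while i < DECIMAL128_MAX_PRECISION as usize {
        table[i] = p;
        p *= 10; // reaches 10^38 on the last pass, which still fits in i128
        i += 1;
    }
    table
};

/// Lookup-table check: one bounds check plus one load from a constant table.
fn is_decimal_one_table(v: i128, scale: u8) -> bool {
    scale < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[scale as usize] == v
}

/// Runtime-power check: computes 10^scale on every call.
fn is_decimal_one_pow(v: i128, scale: u8) -> bool {
    10i128.checked_pow(scale as u32) == Some(v)
}
```

The two agree for every scale below 38. They can only differ at scale 38 with an unscaled value of 10^38, which needs 39 digits and so cannot appear in a valid Decimal128 literal.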

A pow call would make the patch smaller, but it performs real multiplications instead of a single array lookup. This is a performance degradation IMHO. In the worst case there will be 6 such computations (2 for Multiply, 2 for Divide, 2 for Modulo), and there might be multiple scalar binary ops in the expression.

On the other hand, I don't see much benefit from using pow, apart from not having a static array. That is not a big win compared with the potential performance gain.

What is your argument in favor of using pow(), though?

I think this code will run roughly once per query / expression, so I don't expect any performance difference to be measurable.

For what it is worth, I think both approaches are reasonable. Thank you both for the discussion.

If the function or logic will be called many times at runtime, it is better to use a static or const value, which gives good performance.

liukun4515

LGTM. @drrtuy, please resolve the conflict.


alamb · 2022-10-12T16:09:10Z

I took the liberty of resolving the conflict in 47a13f5

drrtuy · 2022-10-12T16:25:38Z

> I took the liberty of resolving the conflict in 47a13f5

Much appreciated.

alamb · 2022-10-12T16:28:54Z

I plan to merge this PR when CI passes (we are a bit backed up now)

alamb · 2022-10-12T18:30:35Z

Thanks again @drrtuy and @isidentical and @liukun4515 for the discussion and review

ursabot · 2022-10-12T18:32:57Z

Benchmark runs are scheduled for baseline = b0f58dd and contender = a226587. a226587 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

drrtuy marked this pull request as ready for review October 10, 2022 18:48

github-actions bot added the optimizer Optimizer rules label Oct 10, 2022

drrtuy changed the title ~~Optimizer now simplifies multiplication when either left or right arg is a literal zero or one and division, modulo when right arg is one~~ Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one Oct 10, 2022

andygrove reviewed Oct 10, 2022


Optimizer now simplifies multiplication when either left or right arg is a literal zero or one and division, modulo when right arg is one (c531481)

drrtuy force-pushed the simpl_mul_div_mod_decimal_arg branch from 01d56d0 to c531481 October 10, 2022 21:46

alamb approved these changes Oct 11, 2022


isidentical reviewed Oct 11, 2022


liukun4515 approved these changes Oct 12, 2022


Merge remote-tracking branch 'apache/master' into simpl_mul_div_mod_decimal_arg (47a13f5)

alamb merged commit a226587 into apache:master Oct 12, 2022
