Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one #3782

Merged
merged 2 commits into from
Oct 12, 2022

Conversation

drrtuy
Copy link
Contributor

@drrtuy drrtuy commented Oct 10, 2022

Which issue does this PR close?

Closes #3643.

Rationale for this change

To improve expression simplification efficiency.

What changes are included in this PR?

Are there any user-facing changes?

No user-facing changes are expected.

@drrtuy drrtuy marked this pull request as ready for review October 10, 2022 18:48
@github-actions github-actions bot added the optimizer Optimizer rules label Oct 10, 2022
@drrtuy drrtuy changed the title Optimizer now simplifies multiplication when either left or right arg is a literal zero or one and division, modulo when right arg is one Optimizer now simplifies multiplication, division, module arg is a literal Decimal zero or one Oct 10, 2022
Comment on lines 183 to 187
if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
true
} else {
false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v {
true
} else {
false
}
*_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way more laconic expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR.

… is a literal zero or one and division, modulo when right arg is one
@drrtuy drrtuy force-pushed the simpl_mul_div_mod_decimal_arg branch from 01d56d0 to c531481 Compare October 10, 2022 21:46
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @drrtuy

@liukun4515 can you please confirm this is ok with you as well?

@liukun4515
Copy link
Contributor

Looks great to me -- thank you @drrtuy

@liukun4515 can you please confirm this is ok with you as well?

Thanks @alamb I will take a look this issue and pr tomorrow.

@@ -137,6 +179,9 @@ fn is_one(s: &Expr) -> bool {
| Expr::Literal(ScalarValue::UInt64(Some(1))) => true,
Expr::Literal(ScalarValue::Float32(Some(v))) if *v == 1. => true,
Expr::Literal(ScalarValue::Float64(Some(v))) if *v == 1. => true,
Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
*_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
Copy link
Contributor

@isidentical isidentical Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is there a specific reason to embed a POWS_OF_TEN array instead of dynamically calculating it at the runtime? (or the same goes to taking to log10() of the v, instead of raising _s) E.g.

i128::pow(10, *_s as u32) == *v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pow call make the patch lesser but will do a real multiplication instead of a pointer deref.
This is perf degradaton IMHO. In the worst case there will be 6 such multiplications(2 for Multiply, 2 for Divide, 2 for Modulo) and there might be multiple scalars binary ops in the expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other side I don't see much benefit from using pow except there will be no static array. Not a big profit given a potential perf gain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your argument in support for using pow() though?

Copy link
Contributor

@alamb alamb Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code will run ~1 per query / expression, so I don't expect the any performance difference to be measurable.

For what it is worth, I think both approaches are reasonable. Thank you both for the discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function or the logic will be call many time in the runtime, it is better to use static value or const value which will get good performance.

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @drrtuy Please resolve the conflict.

@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

I took the liberty of resolving the conflict in 47a13f5

@drrtuy
Copy link
Contributor Author

drrtuy commented Oct 12, 2022

I took the liberty of resolving the conflict in 47a13f5

Much appreciated.

@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

I plan to merge this PR when CI passes (we are a bit backed up now)

@alamb alamb merged commit a226587 into apache:master Oct 12, 2022
@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

Thanks again @drrtuy and @isidentical and @liukun4515 for the discussion and review

@ursabot
Copy link

ursabot commented Oct 12, 2022

Benchmark runs are scheduled for baseline = b0f58dd and contender = a226587. a226587 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
6 participants