feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256
#16831
Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256
datafusion/common/src/scalar/mod.rs
Outdated
    DataType::Float32 => ScalarValue::Float32(Some(1.0)),
    DataType::Float64 => ScalarValue::Float64(Some(1.0)),
    DataType::Decimal128(precision, scale) => {
        ScalarValue::Decimal128(Some(1), *precision, *scale)
If we create a new_one() for type Decimal128(3,3), the result in the natural scale will be 0.001:
datafusion/datafusion/sql/src/expr/value.rs
Line 467 in 3869857
    ("0.001", ScalarValue::Decimal128(Some(1), 3, 3)),
I think this function is supposed to construct 1 in the natural scale? So in this example it should be converted to Decimal128(Some(1000), 3, 3)?
Added support for scale, input verification and some tests. It should match Arrow's decimal semantics now.
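The scale question in this thread comes down to how a decimal stores its value: the stored integer is the logical value multiplied by 10^scale, so a logical 1 at scale s is stored as 10^s. Below is a minimal sketch of that rule using plain i128 instead of the Arrow types. The name `unscaled_one`, the rejection of negative scales, and the digit-count check are assumptions made for illustration, not the code merged in this PR.

```rust
/// Hypothetical helper: the unscaled i128 that represents the logical value 1
/// for a decimal with the given precision and scale.
/// Returns None when 1 cannot be represented under this sketch's rules.
fn unscaled_one(precision: u8, scale: i8) -> Option<i128> {
    // Assumption of this sketch: negative scales are rejected, because 1 is
    // not a multiple of 10^|scale|.
    if scale < 0 {
        return None;
    }
    let scale = scale as u32;
    // 1 followed by `scale` zeros needs `scale + 1` decimal digits.
    if scale + 1 > precision as u32 {
        return None;
    }
    // None if 10^scale does not fit into i128.
    10i128.checked_pow(scale)
}
```

Under these rules `unscaled_one(10, 3)` gives `Some(1000)`, which reads as 1.000, whereas a stored value of 1 at scale 3 reads as 0.001.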
        Expr, Like, Operator,
    };

    pub static POWS_OF_TEN: [i128; 38] = [
I guess this lookup table is used for performance? We can do some measurements to check if it's useful.
Let me run some tests. It could also have been introduced for clarity.
i128 and i256 pow are not hardware-backed (with i256 introducing non-trivial low/high logic), so it's probably better to precompute lists via a const function.
datafusion/common/src/scalar/mod.rs
Outdated
    DataType::Float32 => ScalarValue::Float32(Some(-1.0)),
    DataType::Float64 => ScalarValue::Float64(Some(-1.0)),
    DataType::Decimal128(precision, scale) => {
        ScalarValue::Decimal128(Some(-1), *precision, *scale)
datafusion/common/src/scalar/mod.rs
Outdated
        Self::Decimal256(Some(r), rprecision, rscale),
    ) => {
        if lprecision == rprecision && lscale == rscale {
            // l.checked_sub(*r).and_then( |v| v.checked_abs() ).and_then(|v| v.to_usize() )
Remove this commented-out line.
datafusion/common/src/scalar/mod.rs
Outdated
        Self::Decimal128(Some(r), rprecision, rscale),
    ) => {
        if lprecision == rprecision && lscale == rscale {
            l.checked_sub(*r)?.abs().to_usize()
to_usize returns None on overflow.
Shouldn't we return None when checked_sub overflows too?
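The overflow question above can be illustrated with a small standalone sketch in which every step that can overflow yields None. The function name `decimal128_distance` is hypothetical, and `usize::try_from` from the standard library stands in for the `to_usize` call in the diff; this is not the code merged in the PR.

```rust
/// Hypothetical stand-in for the Decimal128 branch of `distance`:
/// the absolute difference of two unscaled values that share the same
/// precision and scale, or None if any step overflows.
fn decimal128_distance(l: i128, r: i128) -> Option<usize> {
    // None if the subtraction overflows i128.
    let diff = l.checked_sub(r)?;
    // None if the difference is i128::MIN, whose absolute value does not fit.
    let abs = diff.checked_abs()?;
    // None if the result does not fit into usize.
    usize::try_from(abs).ok()
}
```

Using `checked_abs` rather than `abs` also covers the case where the subtraction itself succeeds but produces `i128::MIN`.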
    Expr::Literal(ScalarValue::Float64(Some(v)), _) if *v == 1. => true,
    Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s), _) => {
        *s >= 0
            && POWS_OF_TEN
Why were powers of 10 precomputed?
I think the initial idea is to mirror Arrow's approach https://github.com/apache/arrow-rs/blob/123045cc766d42d1eb06ee8bb3f09e39ea995ddc/arrow-data/src/decimal.rs
i128::pow and i256::pow take time logarithmic in the exponent (the scale in our case), which is usually small. A lookup in the precomputed array takes constant time.
My other idea, a const function that precomputes this array, works only for i128 because its methods are const, which is not the case for arrow-buffer's i256. So the const function for i256 cannot be written without resorting to from_parts manipulations.
    const fn calculate_pows_of_ten_decimal128() -> [i128; DECIMAL128_MAX_PRECISION as usize] {
        let mut result = [0i128; DECIMAL128_MAX_PRECISION as usize];
        result[0] = 1;
        let mut i = 0;
        // `for` loops are not allowed in const fn, hence the `while` loop
        while i < (DECIMAL128_MAX_PRECISION - 1) as usize {
            result[i + 1] = result[i] * 10;
            i += 1;
        }
        result
    }
Maybe, since we don't have measurements one way or the other to justify this change, we revert it and keep the original approach?
Other than this particular change, this PR looks good to me.
Rolled back to the original lookup map. The new calculation method is used only for Decimal256.
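To make the two approaches in this thread concrete, here is a standalone sketch of the "decimal equals one" check written both ways: against a table built at compile time and with `checked_pow`. The names `POWS_OF_TEN_SKETCH`, `is_one_lookup` and `is_one_pow` are hypothetical, and plain i128 values are used instead of `ScalarValue`, so this only approximates the code under review.

```rust
/// Number of table entries (10^0 through 10^37), matching the `[i128; 38]`
/// size of the table quoted in this thread.
const TABLE_LEN: usize = 38;

/// Build the powers-of-ten table at compile time.
const fn build_pows_of_ten() -> [i128; TABLE_LEN] {
    let mut table = [1i128; TABLE_LEN];
    let mut i = 1;
    while i < TABLE_LEN {
        table[i] = table[i - 1] * 10;
        i += 1;
    }
    table
}

/// Hypothetical table for this sketch.
static POWS_OF_TEN_SKETCH: [i128; TABLE_LEN] = build_pows_of_ten();

/// Lookup variant: a decimal equals one when its unscaled value is 10^scale.
fn is_one_lookup(value: i128, scale: i8) -> bool {
    scale >= 0 && POWS_OF_TEN_SKETCH.get(scale as usize) == Some(&value)
}

/// Computation variant: the same check via checked_pow.
fn is_one_pow(value: i128, scale: i8) -> bool {
    scale >= 0 && 10i128.checked_pow(scale as u32) == Some(value)
}
```

The two variants agree for scales 0 through 37. They differ at scale 38, where 10^38 still fits into i128 but lies outside the 38-entry table.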
I have checked that the lookup table approach is faster; perhaps it's better to implement such a table in Arrow instead.
    fn bench_println(c: &mut Criterion) {
        c.bench_function("pow-lookup-table", |b| {
            b.iter(|| {
                let precision = 30;
                let max_scale = 25;
                for s in 1..max_scale {
                    is_one(&lit(ScalarValue::Decimal128(
                        Some(i128::from(1)),
                        precision,
                        max_scale,
                    )));
                }
            })
        });
        // Decimal256 doesn't have a pre-computed power table
        c.bench_function("pow-with-calculation", |b| {
            b.iter(|| {
                let precision = 30;
                let max_scale = 25;
                for s in 1..max_scale {
                    is_one(&lit(ScalarValue::Decimal256(
                        Some(i256::from(1)),
                        precision,
                        max_scale,
                    )));
                }
            })
        });
    }

    pow-lookup-table        time:   [159.16 ns 161.86 ns 166.03 ns]
                            change: [-0.9307% +0.0846% +1.1886%] (p = 0.90 > 0.05)
                            No change in performance detected.
    Found 10 outliers among 100 measurements (10.00%)
      5 (5.00%) low mild
      2 (2.00%) high mild
      3 (3.00%) high severe
    pow-with-calculation    time:   [673.14 ns 674.23 ns 675.36 ns]
                            change: [-0.3838% -0.1634% +0.0709%] (p = 0.18 > 0.05)
                            No change in performance detected.
    Found 4 outliers among 100 measurements (4.00%)
      1 (1.00%) low mild
      3 (3.00%) high mild
- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance
alamb
left a comment
Thanks @theirix and @findepi and @2010YOUY01
datafusion/common/src/scalar/mod.rs
Outdated
    DataType::Float32 => ScalarValue::Float32(Some(1.0)),
    DataType::Float64 => ScalarValue::Float64(Some(1.0)),
    DataType::Decimal128(precision, scale) => {
        if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
What is the reason to add new InternalError wrappers here?
As in why not just
    - if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
    + validate_decimal_precision_and_scale::<Decimal128Type>(*precision, *scale)?;
Agreed, there is no need for it; updated. I forgot about the auto-conversion from ArrowError.
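The auto-conversion mentioned here is the standard `From`-based behaviour of the `?` operator: when the function's error type implements `From` for the inner error type, `?` converts the error without an explicit wrapper. The sketch below shows the mechanism with stand-in types. `ArrowLikeError`, `FusionLikeError`, `validate_precision_and_scale` and `new_one_unscaled` are all hypothetical names, and the validation rules are simplified assumptions rather than Arrow's actual checks.

```rust
/// Hypothetical stand-in for Arrow's error type.
#[derive(Debug, PartialEq)]
struct ArrowLikeError(String);

/// Hypothetical stand-in for the outer (DataFusion-style) error type.
#[derive(Debug, PartialEq)]
enum FusionLikeError {
    Arrow(ArrowLikeError),
}

/// This impl is what lets `?` convert the inner error automatically.
impl From<ArrowLikeError> for FusionLikeError {
    fn from(e: ArrowLikeError) -> Self {
        FusionLikeError::Arrow(e)
    }
}

/// Hypothetical validator with simplified rules (an assumption of this sketch).
fn validate_precision_and_scale(precision: u8, scale: i8) -> Result<(), ArrowLikeError> {
    if precision == 0 || precision > 38 {
        return Err(ArrowLikeError(format!("invalid precision {precision}")));
    }
    if scale < 0 || scale as u8 > precision {
        return Err(ArrowLikeError(format!(
            "invalid scale {scale} for precision {precision}"
        )));
    }
    Ok(())
}

/// Returns the unscaled representation of 1 after validation.
fn new_one_unscaled(precision: u8, scale: i8) -> Result<i128, FusionLikeError> {
    // No `if let Err(err) = ...` wrapper needed: `?` applies `From` for us.
    validate_precision_and_scale(precision, scale)?;
    // scale is in 0..=38 here, so 10^scale fits into i128.
    Ok(10i128.pow(scale as u32))
}
```

A failed validation surfaces as `FusionLikeError::Arrow(..)` even though `new_one_unscaled` never names that variant itself.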
ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256
This reverts commit ba23e8c.
alamb
left a comment
Thanks @theirix - this makes sense to me
cc @berkaysynnada as you have been working on this recently as well I think
…for `Decimal128` and `Decimal256` (apache#16831)
* Add missing ScalarValue impls for large decimals
  Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256
* Support expr simplification for Decimal256
* Replace lookup table with i128::pow
* Support different scales for Decimal constructors
  - Allow to construct one and ten with different scales
  - Add tests for new_one, new_ten
  - Add test for distance
* Revert "Replace lookup table with i128::pow"
  This reverts commit ba23e8c.
* Use Arrow error directly
Which issue does this PR close?
Rationale for this change
Enhancing support for ScalarValue::Decimal128 and ScalarValue::Decimal256
What changes are included in this PR?
- Decimal128 and Decimal256 to ScalarValue in functions: new_one, new_zero, new_ten, distance
- Decimal256: is_zero and is_one
Are these changes tested?
Unit tests for optimiser utils
Are there any user-facing changes?
No