Skip to content

Conversation

@Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Nov 6, 2025

Which issue does this PR close?

Part of #14763 and #14760

Rationale for this change

Current log() signature has some drawbacks:

Numeric(1),
Numeric(2),
Exact(vec![DataType::Float32, DataType::Float32]),
Exact(vec![DataType::Float64, DataType::Float64]),
Exact(vec![
DataType::Int64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Int64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),

  • A bit nasty to look at: mixes numeric with exact float/int with exact decimal (of exact precision and scale)
  • Can't accommodate arbitrary decimals of any precision/scale (this is true for other functions too)

Aim of this PR is to refactor it to use the coercion API, uplifting the API where necessary to make this possible. This simplifies the signature in code, whilst not losing flexibility.

Also other minor refactors are included to log.

What changes are included in this PR?

New TypeSignatureClass variants: Float, Decimal & Numeric

Refactor log() signature to be more in line with it's supported implementations.

Fix issue in log() where ColumnarValue::Scalars were being lost as ColumnarValue::Arrays for the base.

Support null propagation in simplify() for log().

Fix issue with calculate_binary_math where it wasn't casting scalars.

Are these changes tested?

Added new tests.

  • Tests for float16, decimal32, decimal64, decimals with different scales/precisions
  • Test for null propagation (ensure use array input to avoid function inlining)

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation labels Nov 6, 2025
Copy link
Contributor Author

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: add the tests Done

logical_plan
01)Sort: log_c11_base_c12 ASC NULLS LAST
02)--Projection: log(aggregate_test_100.c12, CAST(aggregate_test_100.c11 AS Float64)) AS log_c11_base_c12
02)--Projection: log(aggregate_test_100.c12, aggregate_test_100.c11) AS log_c11_base_c12
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c12 is f64, c11 is f32; now log() will handle casting them to the right types internally for it's implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is probably more efficient than trying to cast externally anyways

Comment on lines 861 to 865
# TODO: this should be 116.267483321058, error with native decimal log impl
query R
select log(2.0, 100000000000000000000000000000000000::decimal(38,0));
----
116.267483321058
116
Copy link
Contributor Author

@Jefffrey Jefffrey Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting; we weren't actually making use of the native decimal implementation for log here apparently. The decimal value was actually being casted to float. Fixing the signature to have decimals take precedence (so they don't get casted to float) reveals a bug in our decimal log implementation.

Note to self: raise separate issue for this

Comment on lines +270 to +275
// Null propagation
if arg_types.iter().any(|dt| dt.is_null()) {
return Ok(ExprSimplifyResult::Simplified(lit(
ScalarValue::Null.cast_to(&return_type)?
)));
}
Copy link
Contributor Author

@Jefffrey Jefffrey Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So any log(null, x), log(x, null) or log(null) should be immediately simplified to null.

I'll add a test case to show this actually has an effect.


impl LogFunc {
pub fn new() -> Self {
// Converts decimals & integers to float64, accepting other floats as is
Copy link
Contributor Author

@Jefffrey Jefffrey Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main fix here for signature; to me this is more readable than the previous signature, and importantly it accepts any decimals regardless of precision/scale.

Note to self: add tests for decimals of different precision/scale if they don't exist


// Support overloaded log(base, x) and log(x) which defaults to log(10, x)
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
let args = ColumnarValue::values_to_arrays(&args.args)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was converting all the args to array, losing optimization opportunity if the base was a scalar. Fix so we maintain the scalar throughout. the value will need to be an array because calculate_binary_math requires its left argument (value for us) to be an array anyway:

pub fn calculate_binary_math<L, R, O, F>(
left: &dyn Array,
right: &ColumnarValue,
fun: F,
) -> Result<Arc<PrimitiveArray<O>>>

Float64Type,
_,
>(value, &base, |x, b| Ok(f64::log(x, b)))?,
DataType::Int32 => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed arms for Int32 & Int64 since they should now be casted to Float64 by the new signature; this is functionally equivalent to what the implementations for Int32 & Int64 did anyway

Comment on lines +709 to +718
# Null propagation for log
query TT
EXPLAIN SELECT log(NULL, c2) from aggregate_simple;
----
logical_plan
01)Projection: Float64(NULL) AS log(NULL,aggregate_simple.c2)
02)--TableScan: aggregate_simple projection=[]
physical_plan
01)ProjectionExec: expr=[NULL as log(NULL,aggregate_simple.c2)]
02)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_simple.csv]]}, file_type=csv, has_header=true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On main this fails like so:

1. query result mismatch:
[SQL] EXPLAIN SELECT log(NULL, c2) from aggregate_simple;
[Diff] (-expected|+actual)
    logical_plan
-   01)Projection: Float64(NULL) AS log(NULL,aggregate_simple.c2)
-   02)--TableScan: aggregate_simple projection=[]
+   01)Projection: log(Float64(NULL), aggregate_simple.c2) AS log(NULL,aggregate_simple.c2)
+   02)--TableScan: aggregate_simple projection=[c2]
    physical_plan
-   01)ProjectionExec: expr=[NULL as log(NULL,aggregate_simple.c2)]
-   02)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_simple.csv]]}, file_type=csv, has_header=true
+   01)ProjectionExec: expr=[log(NULL, c2@0) as log(NULL,aggregate_simple.c2)]
+   02)--RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
+   03)----DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_simple.csv]]}, projection=[c2], file_type=csv, has_header=true
at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/math.slt:710
  • On main it still computes the log; in this PR the log function gets optimized away entirely

@Jefffrey Jefffrey marked this pull request as ready for review November 7, 2025 06:57
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over this PR and it looks really nice to me @Jefffrey - i don't have anything to add -- it is well tested, well commented, and well structured 🏆

Thank you @martin-g for the review 🙏

logical_plan
01)Sort: log_c11_base_c12 ASC NULLS LAST
02)--Projection: log(aggregate_test_100.c12, CAST(aggregate_test_100.c11 AS Float64)) AS log_c11_base_c12
02)--Projection: log(aggregate_test_100.c12, aggregate_test_100.c11) AS log_c11_base_c12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is probably more efficient than trying to cast externally anyways

@Jefffrey Jefffrey added this pull request to the merge queue Nov 8, 2025
Merged via the queue into apache:main with commit c826009 Nov 8, 2025
33 checks passed
@Jefffrey Jefffrey deleted the refactor-log-sig branch November 8, 2025 05:37
hsiang-c pushed a commit to hsiang-c/datafusion that referenced this pull request Nov 9, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

Part of apache#14763 and apache#14760

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Current `log()` signature has some drawbacks:


https://github.com/apache/datafusion/blob/a5eb9121ccf802dda547897155403b08a4fbf774/datafusion/functions/src/math/log.rs#L78-L105

- A bit nasty to look at: mixes numeric with exact float/int with exact
decimal (of exact precision and scale)
- Can't accommodate arbitrary decimals of any precision/scale (this is
true for other functions too)

Aim of this PR is to refactor it to use the coercion API, uplifting the
API where necessary to make this possible. This simplifies the signature
in code, whilst not losing flexibility.

Also other minor refactors are included to log.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

New `TypeSignatureClass` variants: Float, Decimal & Numeric

Refactor `log()` signature to be more in line with it's supported
implementations.

Fix issue in `log()` where `ColumnarValue::Scalar`s were being lost as
`ColumnarValue::Array`s for the base.

Support null propagation in `simplify()` for `log()`.

~~Fix issue with `calculate_binary_math` where it wasn't casting
scalars.~~

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added new tests.

- Tests for float16, decimal32, decimal64, decimals with different
scales/precisions
- Test for null propagation (ensure use array input to avoid function
inlining)

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

Part of apache#14763 and apache#14760

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Current `log()` signature has some drawbacks:


https://github.com/apache/datafusion/blob/a5eb9121ccf802dda547897155403b08a4fbf774/datafusion/functions/src/math/log.rs#L78-L105

- A bit nasty to look at: mixes numeric with exact float/int with exact
decimal (of exact precision and scale)
- Can't accommodate arbitrary decimals of any precision/scale (this is
true for other functions too)

Aim of this PR is to refactor it to use the coercion API, uplifting the
API where necessary to make this possible. This simplifies the signature
in code, whilst not losing flexibility.

Also other minor refactors are included to log.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

New `TypeSignatureClass` variants: Float, Decimal & Numeric

Refactor `log()` signature to be more in line with it's supported
implementations.

Fix issue in `log()` where `ColumnarValue::Scalar`s were being lost as
`ColumnarValue::Array`s for the base.

Support null propagation in `simplify()` for `log()`.

~~Fix issue with `calculate_binary_math` where it wasn't casting
scalars.~~

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added new tests.

- Tests for float16, decimal32, decimal64, decimals with different
scales/precisions
- Test for null propagation (ensure use array input to avoid function
inlining)

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate functions Changes to functions implementation logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants