Skip to content

Conversation

@rluvaton
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

make stuff faster

What changes are included in this PR?

  1. updated Boolean scalar to array to use existing functions for creating BooleanArray with the same value n times
  2. Update None Binary/BinaryView/FixedSizeBinary/LargeBinary to use new_null_array instead of repeat

Are these changes tested?

Existing tests

Are there any user-facing changes?

Nope

@alamb
Copy link
Contributor

alamb commented Oct 20, 2025

Here is a related item: #18159

The CI test failure https://github.com/apache/datafusion/actions/runs/18657662446/job/53190612662?pr=18180 looks like it may be a flaky test (there is guarantee that shuffle does not leave the input array the same)

I will restart it

running 1 test
test engines::conversion::tests::test_big_decimal_to_str ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running bin/sqllogictests.rs (target/ci/deps/sqllogictests-1b770706aacdb3f8)
External error: 1 errors in file /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/shuffle.slt

1. query result mismatch:
[SQL] SELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL];
[Diff] (-expected|+actual)
-   true
+   false
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/shuffle.slt:24

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rluvaton

@adriangb
Copy link
Contributor

I wonder if this will benefit from #16789 or if we could expand that sort of approach to more places.

@rluvaton rluvaton added this pull request to the merge queue Oct 20, 2025
Merged via the queue into apache:main with commit 54fff60 Oct 20, 2025
51 of 52 checks passed
@rluvaton rluvaton deleted the improve-scalar-perf branch October 20, 2025 19:55
@rluvaton
Copy link
Member Author

I wonder if this will benefit from #16789 or if we could expand that sort of approach to more places.

I'm not familiar with that PR, do you plan to replace all nulls creation?

@adriangb
Copy link
Contributor

The basic idea is that the num_rows parameter is largely constant (the batch size). IIRC I only implemented caching in that PR for dictionary keys and arrays of type null but there's not reason we couldn't expand this to more data types and call sites. Ultimately any call to array_of_size where the ScalarValue is null should be able to re-use the same underlaying buffer of all null values I think...

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
…ll values (apache#18180)

## Which issue does this PR close?

N/A

## Rationale for this change

make stuff faster

## What changes are included in this PR?

1. updated Boolean scalar to array to use existing functions for
creating `BooleanArray` with the same value n times
2. Update None `Binary`/`BinaryView`/`FixedSizeBinary`/`LargeBinary` to
use `new_null_array` instead of repeat

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

Nope
EeshanBembi pushed a commit to EeshanBembi/datafusion that referenced this pull request Nov 24, 2025
…ll values (apache#18180)

## Which issue does this PR close?

N/A

## Rationale for this change

make stuff faster

## What changes are included in this PR?

1. updated Boolean scalar to array to use existing functions for
creating `BooleanArray` with the same value n times
2. Update None `Binary`/`BinaryView`/`FixedSizeBinary`/`LargeBinary` to
use `new_null_array` instead of repeat

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

Nope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants