Optimize `date_trunc` function by avoiding allocations by alamb · Pull Request #18360 · apache/datafusion

alamb · 2025-10-29T18:28:38Z

Which issue does this PR close?

Related to Non-constant DATE_TRUNC expression regression for values before epoch #18334
Follow on to fix: correct date_trunc for times before the epoch #18356

Rationale for this change

@mhilton's change to date_trunc to make it correct also potentially slows it down

Let's try and recover the performance (and then some) so DataFusion 51 can be both more correct and faster

What changes are included in this PR?

Use try_unary_op methods to avoid allocating intermediate arrays to improve performance

Are these changes tested?

functionally by CI

I will run benchmarks on this PR as well

Are there any user-facing changes?

The array-based implementation of date_trunc can produce incorrect results for negative timestamps (i.e. dates before 1970-01-01). Check for any such incorrect values and compensate accordingly.
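To illustrate why pre-epoch values need compensation: Rust integer division rounds toward zero, so dividing a negative timestamp by the unit and multiplying back rounds *up* to the next boundary instead of down. The sketch below uses plain `i64` values rather than Arrow arrays, and the function names are hypothetical (not taken from the PR); it only assumes `unit` is positive.

```rust
/// Truncate `value` to a multiple of `unit` using ordinary integer division,
/// then compensate when division rounded a negative value up toward zero.
/// Returns `None` on overflow or when `unit` is zero.
fn truncate_with_compensation(value: i64, unit: i64) -> Option<i64> {
    let truncated = value.checked_div(unit)?.checked_mul(unit)?;
    if truncated > value {
        // Only happens for negative, non-aligned values: step back one unit.
        truncated.checked_sub(unit)
    } else {
        Some(truncated)
    }
}

/// Equivalent result for a positive `unit`, using Euclidean division,
/// which rounds toward negative infinity when the divisor is positive.
fn truncate_floor(value: i64, unit: i64) -> Option<i64> {
    value.checked_div_euclid(unit)?.checked_mul(unit)
}
```

For example, with a unit of 60, the value -1 naively truncates to 0, while both functions above return -60.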

alamb · 2025-10-29T18:34:53Z

datafusion/functions/src/datetime/date_trunc.rs

-        (Nanosecond, "minute") => Some(Int64Array::new_scalar(60_000_000_000)),
-        (Nanosecond, "hour") => Some(Int64Array::new_scalar(3_600_000_000_000)),
-        (Nanosecond, "day") => Some(Int64Array::new_scalar(86_400_000_000_000)),
+    let unit: Option<i64> = match (tu, granularity) {


The key idea here is to make this code faster by reusing the allocation and operating in place rather than allocating new arrays
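As a rough model of that idea (using `Vec<i64>` in place of Arrow arrays, with hypothetical function names), the first function below builds a fresh buffer for every step, while the second takes ownership of the input and overwrites it:

```rust
/// Allocating style: each step collects into a brand-new Vec.
fn truncate_allocating(values: &[i64], unit: i64) -> Option<Vec<i64>> {
    let divided: Vec<i64> = values
        .iter()
        .map(|v| v.checked_div(unit))
        .collect::<Option<_>>()?;
    divided.iter().map(|v| v.checked_mul(unit)).collect()
}

/// In-place style: take ownership of the buffer and overwrite each slot,
/// so no intermediate buffers are created.
fn truncate_in_place(mut values: Vec<i64>, unit: i64) -> Option<Vec<i64>> {
    for v in values.iter_mut() {
        *v = v.checked_div(unit)?.checked_mul(unit)?;
    }
    Some(values)
}
```

Both return the same values; the difference is that `truncate_in_place` hands back the very buffer it was given. This sketch ignores the negative-value compensation discussed elsewhere in the PR.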

findepi · 2025-10-29T21:25:24Z

datafusion/functions/src/datetime/date_trunc.rs

+            i.checked_div(unit)
+                .ok_or_else(|| exec_datafusion_err!("division overflow"))
+        })?;
+        let array = try_unary_mut_or_clone(array, |i| {


technically speaking, only the first try_unary_mut_or_clone is needed
on the second transformation, we're guaranteed to hold the only pointer into the array, so taking the or_clone path would be an error

This is true, though I don't know how to represent this in code.

Maybe I could make a second function try_unary_mut_or_error that throws a runtime error 🤔
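One way to picture the two variants discussed here, modelled on `Arc<Vec<i64>>` from the standard library rather than on Arrow's `PrimitiveArray` (so this is not the PR's code, and both function names are hypothetical): the `_or_clone` version copies the data when the buffer is shared, and the `_or_error` version treats sharing as a bug.

```rust
use std::sync::Arc;

/// Mutate in place when we hold the only reference; otherwise copy first.
fn try_map_mut_or_clone<F>(buf: Arc<Vec<i64>>, op: F) -> Result<Arc<Vec<i64>>, String>
where
    F: Fn(i64) -> Result<i64, String>,
{
    // try_unwrap succeeds only when this is the sole strong reference.
    let mut values = match Arc::try_unwrap(buf) {
        Ok(owned) => owned,
        Err(shared) => (*shared).clone(),
    };
    for v in values.iter_mut() {
        *v = op(*v)?;
    }
    Ok(Arc::new(values))
}

/// Same transformation, but a shared buffer is reported as a runtime error
/// instead of being silently copied.
fn try_map_mut_or_error<F>(buf: Arc<Vec<i64>>, op: F) -> Result<Arc<Vec<i64>>, String>
where
    F: Fn(i64) -> Result<i64, String>,
{
    let mut values = Arc::try_unwrap(buf)
        .map_err(|_| "buffer is unexpectedly shared".to_string())?;
    for v in values.iter_mut() {
        *v = op(*v)?;
    }
    Ok(Arc::new(values))
}
```

In this model, the result of the first call is always uniquely owned, so a second step can use the `_or_error` variant without ever hitting its error path.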

findepi · 2025-10-29T21:26:59Z

datafusion/functions/src/datetime/date_trunc.rs

+fn try_unary_mut_or_clone<F>(
+    array: PrimitiveArray<Int64Type>,
+    op: F,
+) -> Result<PrimitiveArray<Int64Type>>
+where
+    F: Fn(i64) -> Result<i64>,


not really date_trunc specific. can this be made more flexible with a more generous use of generics?
perhaps it could even be in arrow-rs. it makes try_unary_mut significantly more approachable

yes, I agree -- the try_unary_mut is quite awkward to use. I will see if I can port some of these changes upstream / see what they look like

I filed Hard to use PrimitiveArray::unary_mut, PrimitiveArray:try_unary_mut, etc arrow-rs#8808 to track this idea
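To give a sense of what a more generic helper might look like, here is a sketch over `Arc<Vec<T>>` using the standard library's `Arc::make_mut`, which copies the data only when the buffer is shared. It is an illustration of the shape of such an API, not a proposal for arrow-rs; the name `try_map_in_place` is hypothetical.

```rust
use std::sync::Arc;

/// Apply a fallible element-wise operation, reusing the buffer when possible.
/// Generic over the element type `T` and the error type `E`.
fn try_map_in_place<T, E, F>(mut buf: Arc<Vec<T>>, op: F) -> Result<Arc<Vec<T>>, E>
where
    T: Copy,
    F: Fn(T) -> Result<T, E>,
{
    // make_mut gives mutable access, cloning the Vec first if other
    // strong references to it exist.
    for v in Arc::make_mut(&mut buf).iter_mut() {
        *v = op(*v)?;
    }
    Ok(buf)
}
```

The caller no longer has to handle the "could not get unique access" case explicitly, which is the awkward part of the nested-`Result` style that the comments above refer to.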

alamb · 2025-10-30T08:24:46Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/issue-18344-opt (dc2b5f7) to 6cc73fa diff
BENCH_NAME=date_trunc
BENCH_COMMAND=cargo bench --bench date_trunc
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_issue-18344-opt
Results will be posted here when complete

alamb · 2025-10-30T08:29:44Z

🤖: Benchmark completed

Details

group                     alamb_issue-18344-opt                  main
-----                     ---------------------                  ----
date_trunc_minute_1000    1.10      5.2±0.02µs        ? ?/sec    1.00      4.8±0.01µs        ? ?/sec

alamb · 2025-11-07T15:05:17Z

We went with a different approach, and I have filed the follow-on ticket

Hard to use PrimitiveArray::unary_mut, PrimitiveArray:try_unary_mut, etc arrow-rs#8808

mhilton and others added 2 commits October 29, 2025 15:59

fix: correct date_trunc for times before the epoch

f70abf1

The array-based implementation of date_trunc can produce incorrect results for negative timestamps (i.e. dates before 1970-01-01). Check for any such incorrect values and compensate accordingly.

Optimized version of date_trunc

f4893c8

alamb added the performance Make DataFusion faster label Oct 29, 2025

github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Oct 29, 2025

alamb changed the title ~~Alamb/issue 18344 opt~~ Optimize version of date_trunc Oct 29, 2025

alamb changed the title ~~Optimize version of date_trunc~~ Optimize date_trunc more Oct 29, 2025

alamb mentioned this pull request Oct 29, 2025

fix: correct date_trunc for times before the epoch #18356

Merged

alamb commented Oct 29, 2025


Comments / drop

dc2b5f7

alamb changed the title ~~Optimize date_trunc more~~ Optimize date_trunc function by avoiding allocations Oct 29, 2025

findepi reviewed Oct 29, 2025


alamb mentioned this pull request Nov 7, 2025

Hard to use PrimitiveArray::unary_mut, PrimitiveArray:try_unary_mut, etc apache/arrow-rs#8808

Open

alamb closed this Nov 7, 2025

alamb wants to merge 3 commits into apache:main from alamb:alamb/issue-18344-opt
