Skip to content

Format Date32 to string given timestamp specifiers #15361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

friendlymatthew
Copy link
Contributor

@friendlymatthew friendlymatthew commented Mar 22, 2025

Rationale for this change

Datafusion currently errs when attempting to format a date using time-related specifiers.

> select to_char('2023-09-04'::date, '%Y-%m-%dT%H:%M:%S%.3f');
Execution error: Cast error: Format error

However, Postgres supports this feature as it implicitly treats the date as a timestamp.

Rather than eagerly casting every Date32 to a Date64 when calling to_char, this commit attempts to first format a Date32 with the supplied format string. If the formatting fails, we try to reformat as a Date64. This way, only format strings with time-related specifiers endure the intermediary cast.

All changes are tested, specifically for two different call sites: _to_char_scalar and _to_char_array.

@github-actions github-actions bot added the functions Changes to functions implementation label Mar 22, 2025
@friendlymatthew friendlymatthew force-pushed the friendlymatthew/date32-with-ts-format branch from 52e1d29 to 1e57ed1 Compare March 22, 2025 21:24
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @friendlymatthew -- something seems strange to me about this PR in that it is catching errors and reformatting. It seems like the allowable format options should be known based on the input data type 🤔

@@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) -> Result<ColumnarValue> {
let result = formatter.value(idx).try_to_string();
match result {
Ok(value) => results.push(Some(value)),
Err(e) => return exec_err!("{}", e),
Err(e) => {
if data_type == &Date32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is handling errors -- shouldn't we just dispach on type when building the format options?

Copy link
Contributor Author

@friendlymatthew friendlymatthew Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is handling errors -- shouldn't we just dispach on type when building the format options?

Sorry.

I assume you mean detect whether the format options contain time specifiers ahead of time and if so cast the Date32 into a Date64.

Or are you saying avoid the cast altogether? A FormatOption.with_datetime_format() won't work with Date32, as Arrow doesn't consider the datetime_format field when formatting a Date32.

#[test]
fn unaffected_format() {
    let date_format = "%Y-%m %H";
    
    let cast_options = CastOptions {
        safe: false,
        format_options: FormatOptions::default().with_datetime_format(Some(date_format)),
    };
    
    let array = Date32Array::from(vec![10000, 17890]);
    let res = cast_with_options(&array, &DataType::Utf8, &cast_options);
    dbg!(&res); // StringArray(["1997-05-19", "2018-12-25"])
}

Initially, I figured if the initial format attempt fails with a Date32, it'll retry as a Date64. This is similar to how Postgres tries to parse an interval string under the normal standard, and if that errs, it tries to parse the string again using the ISO8601 standard.

Regardless, I'll be happy to refactor this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I do find it very hard to understand all the nested error handling

Perhaps you could pull the retry logic into a function that handles the retry and is well documented (I think your description above makes a lot of sense, but it is not at all clear from the code in my mind)

Copy link
Contributor Author

@friendlymatthew friendlymatthew Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look and reached a different conclusion from above. I fear 1e57ed1 is performing way too many casts in the retry logic, and it is better to implicitly treat Date32 as timestamps ahead of time.

to_char invokes either to_char_scalar or to_char_array based on the number of formats provided in the arguments (i.e. N dates with 1 format string or N dates with N format strings). If any of the dates err when formatting, we'd cast all N dates as Date64s and redo the body of work.

While this would be fine with to_char_scalar because a format string with a time-specifier would err for any Date32, so at worst it'd err, cast, and retry after the first format attempt. But in to_char_array where we're iterating through N format strings, at worst we'd need to recast the entire Date32 array N times*.

Since we can't avoid the Date32 to Date64 cast to support this feature, it's much simpler to check ahead of time and cast the input Date32 array into a Date64 array. Plus, all date-specifiers are valid in timestamp formatting, so existing features work as expected.

I pushed ec2373a as a proof of concept. But I'd be happy to rework it as you see fit.


*The entire Date32 array because for every format string, we create a new ArrayFormatter by supplying the entire array and specifying the index for the formatter to format:

// this isn't ideal but this can't use ValueFormatter as it isn't independent
// from ArrayFormatter
let formatter = ArrayFormatter::try_new(arrays[0].as_ref(), &format_options)?;
let result = formatter.value(idx).try_to_string();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in to_char_array where we're iterating through N format strings, at worst we'd need to recast the entire Date32 array N times.

We could make the Date32 array mut and keep it outside of the loop. Then when we encounter the first failed format attempt, we'd cast the date array to a Date64 array and treat the subsequent dates as Date64s. This way, we limit the cast to only once per batch.

But this adds a lot of unnecessary code complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit and was mentally toying with the idea of testing the format string(s) for time formats vs just casting always ... and generally I don't think it would be worth it at all. Casting Date32 to Date64 should be incredibly cheap I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*The entire Date32 array because for every format string, we create a new ArrayFormatter by supplying the entire array and specifying the index for the formatter to format:

// this isn't ideal but this can't use ValueFormatter as it isn't independent
// from ArrayFormatter
let formatter = ArrayFormatter::try_new(arrays[0].as_ref(), &format_options)?;
let result = formatter.value(idx).try_to_string();

Yeah, this is a limitation of arrow-rs currently. I never got around to pushing a PR to arrow-rs to change that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting Date32 to Date64 should be incredibly cheap I think

I agree, converting involves a cast between primitives and a mult.

https://github.com/apache/arrow-rs/blob/206fe024cdd9b9de770714baa27b58a1442fcc75/arrow-cast/src/cast/mod.rs#L1423-L1427

Yeah, this is a limitation of arrow-rs currently. I never got around to pushing a PR to arrow-rs to change that.

I'd be happy to work on this next, if you'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a limitation of arrow-rs currently. I never got around to pushing a PR to arrow-rs to change that.

I'd be happy to work on this next, if you'd like.

It would clean things up but I'm not sure if it would be a measurable performance improvement or not. Leaning to it not being all that noticeable but I'm often wrong :)

@friendlymatthew friendlymatthew marked this pull request as draft March 26, 2025 16:19
@friendlymatthew friendlymatthew marked this pull request as ready for review March 27, 2025 03:00
@friendlymatthew friendlymatthew force-pushed the friendlymatthew/date32-with-ts-format branch from 88d304e to ec2373a Compare March 27, 2025 03:24
@friendlymatthew friendlymatthew requested a review from alamb March 29, 2025 13:59
@@ -220,6 +221,13 @@ fn _to_char_scalar(
}
}

// eagerly cast Date32 values to Date64 to support date formatting with time-related specifiers
// without error.
if data_type == &Date32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @Omega359 says, this will now penalize performance for all existing Date32 columns

Is there any way we can check if the format string contains any time related specifiers before doing this conversion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to check for time formats in the format string however I'm not sure that cost would be any less than just doing the conversion up front. I think we need a benchmark to have quantified data vs assumptions.

Copy link
Contributor

@Omega359 Omega359 Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some data from a quick little benchmark I wrote

test_date32_to_date64_cast_array_1000
                        time:   [311.06 ns 314.48 ns 318.16 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

test_parse_performance_with_time_specifiers_array_1000
                        time:   [343.79 ns 351.64 ns 359.98 ns]
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

test_parse_performance_without_time_specifiers_array_1000
                        time:   [196.59 µs 201.06 µs 206.45 µs]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe

test_date32_to_date64_cast_array_1000: just casts from date32 to date64

test_parse_performance_with_time_specifiers_array_1000: parses the formats looking for time specifiers and when found does the cast from date32 to date64

test_parse_performance_without_time_specifiers_array_1000: parses the formats looking for time specifiers (doesn't find anything), no cast

Note that results on my machine will vary +-10% between runs. The check for time specifiers in the format is simple and could be improved but I think is enough to show general performance. Note that the list of time specifiers is not complete

const TIME_SPECIFIERS: &[&str] = &[
    "%H", "%M", "%S", "%c", "%f", "%k", "%I", "%l", "%p", "%R", "%T", "%x", "%r", "%Z",
];

fn has_time_specifiers(str: &str) -> bool {
    for specifier in TIME_SPECIFIERS {
        if str.contains(specifier) {
            return true;
        }
    }

    false
}

A couple of takeaways from this. casting is not as cheap as I thought however parsing is seems to be more expensive than that but perhaps with some really good optimizations it could be reduced.

I don't see a good way to make this feature have no cost with Date32 && no time specifiers in the format.

@Omega359
Copy link
Contributor

Omega359 commented Apr 2, 2025

I went ahead and did a quick poc using format parsing, you can see it @ https://github.com/Omega359/arrow-datafusion/tree/to_char_date32_with_time_formats (main...Omega359:arrow-datafusion:to_char_date32_with_time_formats)

I haven't done a comparison with the benchmark results from main yet.

@friendlymatthew
Copy link
Contributor Author

I went ahead and did a quick poc using format parsing, you can see it @ https://github.com/Omega359/arrow-datafusion/tree/to_char_date32_with_time_formats (main...Omega359:arrow-datafusion:to_char_date32_with_time_formats)

I haven't done a comparison with the benchmark results from main yet.

I was planning on getting to this during the weekend, but wow, this is super awesome @Omega359! I love how has_time_specifier will early return true when encountering the first time-related specifier.

Let me know how you want to proceed, I'd be happy to help out with benchmarking.

@Omega359
Copy link
Contributor

Omega359 commented Apr 3, 2025

@friendlymatthew - Running the benchmarks between main and the sidebranch would be very helpful - it should give us an idea as to the overhead of the format parsing for date32 data.

@friendlymatthew
Copy link
Contributor Author

@friendlymatthew - Running the benchmarks between main and the sidebranch would be very helpful - it should give us an idea as to the overhead of the format parsing for date32 data.

Sounds great. I'll have some free time tomorrow to work on this.

@friendlymatthew
Copy link
Contributor Author

I went ahead and did a quick poc using format parsing, you can see it @ https://github.com/Omega359/arrow-datafusion/tree/to_char_date32_with_time_formats (main...Omega359:arrow-datafusion:to_char_date32_with_time_formats)

I haven't done a comparison with the benchmark results from main yet.

@Omega359 would you be open to committing to this branch? This way, I could base the benchmarking work on top of your work.

If we decide to on a different approach than the experimented one, we could just roll back to a prior commit.

@Omega359
Copy link
Contributor

Omega359 commented Apr 4, 2025

Your branch you mean? I can try and push a PR to your branch once I figure out how :)

@friendlymatthew
Copy link
Contributor Author

friendlymatthew commented Apr 4, 2025

Your branch you mean? I can try and push a PR to your branch once I figure out how :)

I just sent an invite to collaborate on my fork. From there you should be able to directly push a commit to this branch!

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 5, 2025
@Omega359
Copy link
Contributor

Omega359 commented Apr 5, 2025

I messed up and pushed directly to your branch vs a PR, sorry about that @friendlymatthew

@friendlymatthew
Copy link
Contributor Author

I messed up and pushed directly to your branch vs a PR, sorry about that @friendlymatthew

Not at all, thank you and happy to collaborate on this together!

@friendlymatthew
Copy link
Contributor Author

friendlymatthew commented Apr 6, 2025

Some benchmark results when comparing main with this branch:

(TLDR) This branch will make the existing to_char paths slower, at the cost of adding the ability to format time-specifiers.

I compared two benchmark functions: to_char_array_date_only_patterns_1000 and to_char_scalar_date_only_pattern_1000.

These functions are concerned with formatting dates with date-only specifiers.

From main:

to_char_array_date_only_patterns_1000
                        time:   [136.39 µs 136.68 µs 137.03 µs]

to_char_scalar_date_only_pattern_1000
                        time:   [92.084 µs 95.999 µs 100.12 µs]

And when compared to this branch:

to_char_array_date_only_patterns_1000
                        time:   [173.46 µs 174.21 µs 175.10 µs]
                        change: [+25.159% +25.881% +26.651%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild


to_char_scalar_date_only_pattern_1000
                        time:   [105.04 µs 109.46 µs 113.80 µs]
                        change: [+6.6648% +11.410% +16.429%] (p = 0.00 < 0.05)
                        Performance has regressed.

To reproduce:

You can run the two benchmark functions by:

cargo bench --package datafusion-functions "to_char_array_date_only_patterns_1000|to_char_scalar_date_only_pattern_1000"

Since the benchmark function names clash between main and this branch, you can consider this branch as main. (This branch only contains the refactor benchmark logic).

cc/ @Omega359

@alamb
Copy link
Contributor

alamb commented Apr 7, 2025

(TLDR) This branch will make the existing to_char paths slower, at the cost of adding the ability to format time-specifiers.

Slowing down existing functionality for current users for some new functionality doesn't sound like a great tradeoff to me -- if users want this behavior they can always override the to_char implementation

If we can retain the same performance for existing to_char but add the new feature I think that is much more reasonable

@Omega359
Copy link
Contributor

Omega359 commented Apr 7, 2025

It's only slower when date32 is the input type. It's going to be hard to have this functionality for formats with timestamp specifiers but not have some sort of impact on those with just date specifiers. The only way I can think of to handle that is via catching the error when the parsing fails and retrying with date64.

To put some context on this:

  • The 25% is for the case where the use has an array of 1000 dates and 1000 formats (the table case)
  • The 7% is for the far more common case of 1000 dates and a single format. That is the overhead of chrono parsing a format string and doing matches on that.
  • This does not affect any other input types.

@friendlymatthew
Copy link
Contributor Author

It may be beneficial to leave the existing code paths undisturbed and decide whether to retry on err.

It avoids the overhead of custom chrono::Strftime::parse validation. Plus, the retry logic for to_char_scalar is simple to implement. Something like:

to_char_scalar
  let formatter = ArrayFormatter::try_new(array.as_ref(), &format_options)?;
  let formatted: Result<Vec<Option<String>>, ArrowError> = (0..array.len())
      .map(|i| {
          if array.is_null(i) {
              Ok(None)
          } else {
              formatter.value(i).try_to_string().map(Some)
          }
      })
      .collect(); /*
                           will stop iterating upon the first err and 
                           since we're dealing with one format string, 
                           it'll halt after the first format attempt
                    */

  if let Ok(formatted) = formatted {
      if is_scalar_expression {
          Ok(ColumnarValue::Scalar(ScalarValue::Utf8(
              formatted.first().unwrap().clone(),
          )))
      } else {
          Ok(ColumnarValue::Array(
              Arc::new(StringArray::from(formatted)) as ArrayRef
          ))
      }
  } else {
      // if the format attempt is with a Date32, it's possible the attempt failed because
      // the format string contained time-specifiers, so we'll retry as Date64s
      if data_type == &Date32 {
          return to_char_scalar(expression.clone().cast_to(&Date64, None)?, format);
      }

      exec_err!("{}", formatted.unwrap_err())
  }

to_char_array is a bit more complex (see #15361 (comment)), but I'll think it through.

FWIW, it's worth implementing the retry logic and benchmarking the performance. I'm happy to do the work. The decision between eager casting and selective retry likely comes down to whether we're willing to slightly slow down existing behavior in exchange for better performance in the new case. But I'm just spitballing.

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/date32-with-ts-format branch 2 times, most recently from a855bb0 to c69180b Compare April 10, 2025 15:10
@friendlymatthew friendlymatthew force-pushed the friendlymatthew/date32-with-ts-format branch from 76919c4 to 01b1bc0 Compare April 10, 2025 15:11
@friendlymatthew
Copy link
Contributor Author

friendlymatthew commented Apr 10, 2025

Note: I think I found a better solution. I'll update this report when ready

f906af5 is a proof of concept that involves selective retry on err. There's benchmark results below!

Quick overview

The to_char_scalar is quite trivial-- we deal with 1 format string for N Date32s, the first format err will trigger a retry, treating the Date32 array as a Date64 array.

The to_char_array is trickier since we deal with N format strings for N Date32s. When a format error occurs at the ith Date32, we'll retry that specific date as a Date64. And rather than recast the entire input Date32 array as a Date64 array, we slice into the array and retrieve + cast the faulty Date32 into a Date64.

Another method I considered was casting the input Date32 array at most once. When the first occurrence of a format err occurs, we'll just recast the entire input array as a Date64 array. So that subsequent format strings with time-specifiers will format without err.

I don't really like this, because I'd like to cast from Date32 to Date64 very sparingly. Also, if we have a 1000 Date32 and a 1000 format strings, and it just happens that the last format string contains time-specifiers, we'd endure the pain of casting the entire input array as a Date64 array just to format the last element.

Benchmarks

Selective retry vs. main

The selective retry doesn't disturb the existing code paths. When benchmarking with main, we see little to no variance.

# The selective retry code was benched first. This is the benchmark result when running on main.

to_char_array_date_only_patterns_1000
                        time:   [136.87 µs 137.13 µs 137.49 µs]
                        change: [-0.9784% -0.7511% -0.4721%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

to_char_scalar_date_only_pattern_1000
                        time:   [101.26 µs 104.97 µs 108.40 µs]
                        change: [-1.5505% +2.7529% +7.4144%] (p = 0.21 > 0.05)
                        No change in performance detected.

Eager cast vs. selective retry

Since selective retry doesn't disturb the existing paths, those cases also run much faster when compared to the eager cast approach. Plus, the selective retry approach sees a 3% improvement in to_char_scalar than the eager cast approach with datetime format strings.

The tradeoff here is that the new datetime formatting features will suffer and is slower than the eager cast approach for the contrived case that @Omega359 mentioned here. (1000 Date32s with 1000 datetime format specifiers).

But this isn't the worst thing in the world. In a case where someone wants to format a 1000 Date32s with a 1000 different datetime format strings, they could easily perform an intermediate cast before running to_char.

# The eager cast code was benched first. 
# This is the benchmark result when running the selective retry code. 

to_char_array_date_only_patterns_1000
                        time:   [137.99 µs 138.20 µs 138.44 µs]
                        change: [-21.564% -20.959% -20.538%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

to_char_array_datetime_patterns_1000
                        time:   [503.48 µs 506.55 µs 509.87 µs]
                        change: [+110.95% +112.69% +114.12%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

to_char_array_mixed_patterns_1000
                        time:   [323.05 µs 325.66 µs 328.11 µs]
                        change: [+65.744% +66.705% +67.701%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

to_char_scalar_date_only_pattern_1000
                        time:   [96.153 µs 100.02 µs 103.88 µs]
                        change: [-12.058% -8.2654% -4.3595%] (p = 0.00 < 0.05)
                        Performance has improved.

to_char_scalar_datetime_pattern_1000
                        time:   [177.38 µs 186.04 µs 194.64 µs]
                        change: [-11.383% -7.3528% -3.2635%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  14 (14.00%) low mild
  6 (6.00%) high mild

to_char_scalar_1000     time:   [312.03 ns 316.56 ns 320.91 ns]
                        change: [-4.3628% -3.1215% -1.7750%] (p = 0.00 < 0.05)
                        Performance has improved.

If you want to run these benchmarks yourself, you can run:

# use this to compare the alternate approaches (eager cast vs. selective retry)
cargo bench --package datafusion-functions "to_char"
# use this to compare approaches to main
cargo bench --package datafusion-functions "to_char_array_date_only_patterns_1000|to_char_scalar_date_only_pattern_1000"

cc @Omega359 @alamb

@Omega359
Copy link
Contributor

Thanks for the additional work on this @friendlymatthew ! I think this approach is solid - the overhead for the casting is limited to only the cases where the format string includes time type specifiers and otherwise the performance is unchanged.

@friendlymatthew
Copy link
Contributor Author

friendlymatthew commented Apr 11, 2025

Note: I'm sorry about the super long write ups. I'm not trying to bike shed.

I was thinking about f906af5 and realized we could do more.

The selective retry approach will extract every failed Date32 and cast it to a Date64. If we have M failed Date32's, we'll still recast M times. It's just that every cast will have a single element in the array.

We can use the cast even more sparingly, by keeping track of consecutive failed Date32s.

Suppose you had a 1000 format attempts, and a contiguous block of those format attempts erred because they contained datetime specifiers:

|----------|xxxxxxxxxxxxx|-------|
   valid      invalid       valid

For every invalid Date32, we'll create an array of size 1 and recast that array to a Date64 array.

b379fe1 will keep track of consecutive failed Date32 values. By batching the failed Date32's, we can call cast once. And of course, we preserve the order.

This approach also helps with readability and saves LOC, to_char_inner gets recursively invoked with the newly casted timestamp array.

Benchmarks

She leaves the existing code paths undisturbed like usual. While performance has slightly regressed, I really like how this approach looks and would prefer to call cast sparingly and make use of the batching.

I'm also a bit surprised that calling cast with a single element multiple times runs faster than calling cast with multiple elements once. I figured batching would improve perf.

# I ran the initial selective retry approach first. 
# This run includes the new logic mentioned above.

to_char_array_date_only_patterns_1000
                        time:   [136.92 µs 137.15 µs 137.39 µs]
                        change: [-3.1659% -2.3990% -1.6897%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

to_char_array_datetime_patterns_1000
                        time:   [543.24 µs 543.84 µs 544.47 µs]
                        change: [+4.4460% +6.0837% +7.4847%] (p = 0.00 < 0.05)
                        Performance has regressed.

to_char_array_mixed_patterns_1000
                        time:   [339.68 µs 342.60 µs 345.66 µs]
                        change: [-5.3101% -2.9305% -0.8615%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

to_char_scalar_date_only_pattern_1000
                        time:   [95.107 µs 98.972 µs 102.90 µs]
                        change: [-5.9031% -1.2243% +3.4337%] (p = 0.60 > 0.05)
                        No change in performance detected.

to_char_scalar_datetime_pattern_1000
                        time:   [171.09 µs 179.74 µs 188.39 µs]
                        change: [-4.0895% +1.3179% +7.1836%] (p = 0.64 > 0.05)
                        No change in performance detected.

to_char_scalar_1000     time:   [315.63 ns 319.69 ns 323.50 ns]
                        change: [-7.9240% -6.5685% -5.1341%] (p = 0.00 < 0.05)
                        Performance has improved.

cc @Omega359 @alamb

@friendlymatthew friendlymatthew requested a review from alamb April 13, 2025 22:56
@Omega359
Copy link
Contributor

To put my notes here I prefer the version before the addition of the consecutive failures check - simpler and faster.

@xudong963 xudong963 self-requested a review April 28, 2025 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: support to_char(date, timstamp format)
3 participants