[SQL][TEST] Re-run collation benchmark #47030

uros-db · 2024-06-19T12:17:11Z

What changes were proposed in this pull request?

Re-running the collation benchmark with two modifications:

- UTF8_BINARY_LCASE has been renamed to UTF8_LCASE in [SPARK-48576][SQL] Rename UTF8_BINARY_LCASE to UTF8_LCASE #46924
- UTF8_BINARY should appear first in the collation benchmark results, so performance is relative to it
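
To illustrate why the ordering matters, here is a minimal sketch of a "relative to the first row" calculation. It assumes the relative figure is the baseline's best time divided by each case's best time; the function name `relative_speeds` and the timings are hypothetical and are not taken from the actual benchmark results.

```python
def relative_speeds(best_times_ms):
    """Return each case's speed relative to the first (baseline) case.

    Assumption: relative = baseline_time / case_time, so the baseline is 1.0
    and slower cases come out below 1.0.
    """
    names = list(best_times_ms)           # dicts preserve insertion order
    baseline = best_times_ms[names[0]]    # first row acts as the baseline
    return {name: baseline / best_times_ms[name] for name in names}


# Hypothetical timings in milliseconds, with UTF8_BINARY listed first.
timings = {"UTF8_BINARY": 100.0, "UTF8_LCASE": 250.0}
relative = relative_speeds(timings)

for name, factor in relative.items():
    print(f"{name:<12} {factor:.1f}X")
```

With UTF8_BINARY first, every other row reads directly as "how much slower than binary comparison", which is the comparison the second modification is after.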

Why are the changes needed?

We've changed the meaning of LCASE collation in Spark, and also modified how equality checks, hashing, and expressions work with this collation, so we need to re-run the benchmarks and identify areas for improvement.
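
As a rough illustration of why equality and hashing are benchmarked together, the sketch below models a lowercase-style collation with a simplified key function. This is an assumption made for illustration only: `lcase_key`, `lcase_equals`, and `lcase_hash` are hypothetical names, and Python's `str.lower()` is a stand-in, not Spark's actual UTF8_LCASE algorithm.

```python
def lcase_key(s: str) -> str:
    # Simplified stand-in for a lowercase collation key (not Spark's implementation).
    return s.lower()


def lcase_equals(a: str, b: str) -> bool:
    # Two strings are equal under the collation when their keys match.
    return lcase_key(a) == lcase_key(b)


def lcase_hash(s: str) -> int:
    # Hash the key, so strings that compare equal also hash to the same value.
    return hash(lcase_key(s))


print(lcase_equals("Spark", "sPARK"))
print(lcase_hash("Spark") == lcase_hash("sPARK"))
```

Both operations pay for computing the key, so a change in how the collation is defined can shift the cost of comparisons, hash-based grouping, and joins at once.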

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

uros-db

adding @mkaravel to review

uros-db · 2024-06-20T16:28:24Z

also adding @dbatomic and @cloud-fan to review

cloud-fan · 2024-06-21T01:40:18Z

thanks, merging to master!

### What changes were proposed in this pull request?
Following up on #47030, re-running the collation benchmark for NonASCII.

### Why are the changes needed?
We've changed the meaning of LCASE collation in Spark, and also modified how equality checks, hashing, and expressions work with this collation, so we need to re-run the benchmarks and identify areas for improvement.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47054 from uros-db/collation-benchmarks-nonascii.

Authored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

### What changes were proposed in this pull request?
Re-running the collation benchmark with two modifications:
- UTF8_BINARY_LCASE has been renamed to UTF8_LCASE in apache#46924
- UTF8_BINARY should appear first in the collation benchmark results, so performance is relative to it

### Why are the changes needed?
We've changed the meaning of LCASE collation in Spark, and also modified how equality checks, hashing, and expressions work with this collation, so we need to re-run the benchmarks and identify areas for improvement.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47030 from uros-db/collation-benchmarks.

Authored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Re-run benchmark

507cf34

github-actions bot added the SQL label Jun 19, 2024

Update results

a42449d

cloud-fan approved these changes Jun 21, 2024

cloud-fan changed the title ~~[SQL] Re-run collation benchmark~~ [SQL][TEST] Re-run collation benchmark Jun 21, 2024

cloud-fan closed this in 6eb7978 Jun 21, 2024

uros-db mentioned this pull request Jun 21, 2024

[SQL][TEST][FOLLOWUP] Re-run collation benchmark (NonASCII) #47054

Closed
