
Commit aaaf90d

uros-db authored and attilapiros committed
[SQL][TEST] Re-run collation benchmark
### What changes were proposed in this pull request?
Re-running the collation benchmark with two modifications:
- UTF8_BINARY_LCASE has been renamed to UTF8_LCASE in apache#46924
- UTF8_BINARY should appear first in the collation benchmark results, so performance is relative to it

### Why are the changes needed?
We've changed the meaning of LCASE collation in Spark, and also modified how equality checks / hashing / expressions work with this collation, so we need to re-run the benchmarks and identify areas for improvement.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47030 from uros-db/collation-benchmarks.

Authored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent b015d73 commit aaaf90d
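
The effect of listing UTF8_BINARY first can be checked by hand against the results in this commit. The sketch below assumes the Relative column is the first row's Best Time divided by each row's Best Time, formatted to one decimal place; that assumption is consistent with the figures shown here but is not taken from the benchmark source. The names `RelativeColumn`, `relative`, and `relativeColumn` are illustrative and not part of Spark.

```scala
import java.util.Locale

object RelativeColumn {
  // Assumed formula: baseline best time / case best time, one decimal place.
  def relative(baselineBestMs: Double, caseBestMs: Double): String =
    String.format(Locale.ROOT, "%.1fX", Double.box(baselineBestMs / caseBestMs))

  // Rows are (collation name, Best Time in ms) in run order.
  // The first row is treated as the baseline.
  def relativeColumn(rows: Seq[(String, Double)]): Seq[(String, String)] = {
    val baseline = rows.head._2
    rows.map { case (name, bestMs) => name -> relative(baseline, bestMs) }
  }

  def main(args: Array[String]): Unit = {
    // Best Time(ms) from the new equalsFunction results on OpenJDK 21.0.3.
    val newOrder = Seq(
      "UTF8_BINARY" -> 1355.0,
      "UTF8_LCASE" -> 4983.0,
      "UNICODE" -> 18212.0,
      "UNICODE_CI" -> 17568.0)
    relativeColumn(newOrder).foreach { case (name, rel) => println(s"$name $rel") }
  }
}
```

With these timings the sketch yields 1.0X, 0.3X, 0.1X, and 0.1X, which matches the new equalsFunction table. Under the old ordering the baseline was the LCASE row, so every other figure was expressed relative to a case-insensitive collation rather than to plain binary comparison.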

3 files changed: +61 -61 lines changed

Lines changed: 30 additions & 30 deletions
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      2948          2958         13        0.0      29483.6      1.0X
-UNICODE                                                2040          2042          3        0.0      20396.6      1.4X
-UTF8_BINARY                                            2043          2043          0        0.0      20426.3      1.4X
-UNICODE_CI                                            16318         16338         28        0.0     163178.4      0.2X
+UTF8_BINARY                                            1355          1358          4        0.1      13551.1      1.0X
+UTF8_LCASE                                             4983          4984          3        0.0      49826.4      0.3X
+UNICODE                                               18212         18220         12        0.0     182120.9      0.1X
+UNICODE_CI                                            17568         17577         14        0.0     175677.2      0.1X

-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      3227          3228          1        0.0      32272.1      1.0X
-UNICODE                                               16637         16643          9        0.0     166367.7      0.2X
-UTF8_BINARY                                            3132          3137          7        0.0      31319.2      1.0X
-UNICODE_CI                                            17816         17829         18        0.0     178162.4      0.2X
+UTF8_BINARY                                            1772          1774          3        0.1      17722.3      1.0X
+UTF8_LCASE                                             4365          4365          0        0.0      43649.6      0.4X
+UNICODE                                               16538         16544          9        0.0     165375.5      0.1X
+UNICODE_CI                                            16296         16305         12        0.0     162961.9      0.1X

-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      4824          4824          0        0.0      48243.7      1.0X
-UNICODE                                               69416         69475         84        0.0     694158.3      0.1X
-UTF8_BINARY                                            3806          3808          2        0.0      38062.8      1.3X
-UNICODE_CI                                            60943         60975         45        0.0     609426.2      0.1X
+UTF8_BINARY                                            7279          7280          1        0.0      72791.2      1.0X
+UTF8_LCASE                                            18538         18543          6        0.0     185381.0      0.4X
+UNICODE                                               71514         71520          8        0.0     715144.6      0.1X
+UNICODE_CI                                            60488         60488          0        0.0     604880.9      0.1X

-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     11979         11980          1        0.0     119790.4      1.0X
-UNICODE                                                6469          6474          7        0.0      64694.8      1.9X
-UTF8_BINARY                                            7253          7253          1        0.0      72528.3      1.7X
-UNICODE_CI                                           319124        319881       1070        0.0    3191244.0      0.0X
+UTF8_BINARY                                            7516          7519          4        0.0      75162.9      1.0X
+UTF8_LCASE                                           120330        120338         12        0.0    1203299.2      0.1X
+UNICODE                                              371784        371946        228        0.0    3717840.7      0.0X
+UNICODE_CI                                           427401        427547        207        0.0    4274009.0      0.0X

-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     11584         11595         15        0.0     115841.4      1.0X
-UNICODE                                                6155          6156          2        0.0      61548.7      1.9X
-UTF8_BINARY                                            6979          6982          5        0.0      69785.6      1.7X
-UNICODE_CI                                           318228        318726        705        0.0    3182275.2      0.0X
+UTF8_BINARY                                            6504          6507          3        0.0      65044.6      1.0X
+UTF8_LCASE                                            60331         60359         40        0.0     603313.9      0.1X
+UNICODE                                              369394        369404         13        0.0    3693943.0      0.0X
+UNICODE_CI                                           427382        427421         55        0.0    4273819.7      0.0X

-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     11655         11664         12        0.0     116552.8      1.0X
-UNICODE                                                6235          6239          5        0.0      62350.8      1.9X
-UTF8_BINARY                                            7066          7069          5        0.0      70658.1      1.6X
-UNICODE_CI                                           313515        313999        685        0.0    3135149.1      0.0X
+UTF8_BINARY                                            6600          6601          1        0.0      66002.7      1.0X
+UTF8_LCASE                                            58723         58751         39        0.0     587230.1      0.1X
+UNICODE                                              379668        379789        172        0.0    3796677.7      0.0X
+UNICODE_CI                                           437119        437194        106        0.0    4371189.5      0.0X

Lines changed: 30 additions & 30 deletions
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      3571          3576          7        0.0      35708.8      1.0X
-UNICODE                                                2235          2240          7        0.0      22349.2      1.6X
-UTF8_BINARY                                            2237          2242          6        0.0      22371.7      1.6X
-UNICODE_CI                                            18733         18817        118        0.0     187333.8      0.2X
+UTF8_BINARY                                            1370          1370          1        0.1      13698.4      1.0X
+UTF8_LCASE                                             4836          4836          0        0.0      48359.5      0.3X
+UNICODE                                               19239         19271         45        0.0     192391.8      0.1X
+UNICODE_CI                                            18936         18954         25        0.0     189362.4      0.1X

-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      4260          4290         41        0.0      42602.6      1.0X
-UNICODE                                               19536         19624        124        0.0     195360.2      0.2X
-UTF8_BINARY                                            3582          3612         43        0.0      35818.5      1.2X
-UNICODE_CI                                            20381         20454        103        0.0     203814.1      0.2X
+UTF8_BINARY                                            1726          1727          1        0.1      17260.4      1.0X
+UTF8_LCASE                                             6293          6304         16        0.0      62927.1      0.3X
+UNICODE                                               18677         18679          4        0.0     186768.3      0.1X
+UNICODE_CI                                            18488         18504         23        0.0     184879.6      0.1X

-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                      7347          7349          3        0.0      73467.1      1.0X
-UNICODE                                               73462         73608        206        0.0     734623.2      0.1X
-UTF8_BINARY                                            5775          5815         57        0.0      57746.0      1.3X
-UNICODE_CI                                            57543         57619        108        0.0     575425.2      0.1X
+UTF8_BINARY                                            3028          3029          1        0.0      30283.4      1.0X
+UTF8_LCASE                                            19773         19830         81        0.0     197726.4      0.2X
+UNICODE                                               68565         68594         41        0.0     685646.9      0.0X
+UNICODE_CI                                            53100         53101          2        0.0     530996.0      0.1X

-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     15415         15424         13        0.0     154147.1      1.0X
-UNICODE                                                8091          8108         25        0.0      80907.9      1.9X
-UTF8_BINARY                                            8964          8979         21        0.0      89643.5      1.7X
-UNICODE_CI                                           469123        474822       8060        0.0    4691227.7      0.0X
+UTF8_BINARY                                            7024          7026          3        0.0      70244.6      1.0X
+UTF8_LCASE                                           118693        118703         15        0.0    1186926.5      0.1X
+UNICODE                                              385409        386299       1257        0.0    3854093.7      0.0X
+UNICODE_CI                                           434618        435527       1285        0.0    4346181.0      0.0X

-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     13064         13080         23        0.0     130635.2      1.0X
-UNICODE                                                6836          6851         22        0.0      68360.1      1.9X
-UTF8_BINARY                                            7693          7719         36        0.0      76933.9      1.7X
-UNICODE_CI                                           488919        495530       9349        0.0    4889190.5      0.0X
+UTF8_BINARY                                            6069          6090         29        0.0      60691.9      1.0X
+UTF8_LCASE                                            61809         61828         27        0.0     618094.5      0.1X
+UNICODE                                              370523        371729       1705        0.0    3705229.7      0.0X
+UNICODE_CI                                           435805        436945       1612        0.0    4358051.5      0.0X

-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     13097         13112         21        0.0     130970.4      1.0X
-UNICODE                                                6960          6985         34        0.0      69603.9      1.9X
-UTF8_BINARY                                            7766          7768          3        0.0      77663.5      1.7X
-UNICODE_CI                                           456956        470733      19485        0.0    4569556.7      0.0X
+UTF8_BINARY                                            6725          6732         10        0.0      67247.9      1.0X
+UTF8_LCASE                                            54990         55010         28        0.0     549896.0      0.1X
+UNICODE                                              380872        383258       3375        0.0    3808722.0      0.0X
+UNICODE_CI                                           443911        444111        283        0.0    4439112.3      0.0X

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ import org.apache.spark.unsafe.types.UTF8String
 abstract class CollationBenchmarkBase extends BenchmarkBase {
   protected val collationTypes: Seq[String] =
-    Seq("UTF8_LCASE", "UNICODE", "UTF8_BINARY", "UNICODE_CI")
+    Seq("UTF8_BINARY", "UTF8_LCASE", "UNICODE", "UNICODE_CI")
 
   def generateSeqInput(n: Long): Seq[UTF8String]
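
To illustrate why the order of `collationTypes` matters, here is a minimal, self-contained sketch of a harness that runs one case per entry and reports each case relative to the first. It does not use Spark's benchmark classes or its collation implementations: `MiniCollationBench`, `comparatorFor`, `bestTimeNs`, and `run` are hypothetical names, and the comparators are stand-ins built from `String` methods and `java.text.Collator`, chosen only so the example runs on its own.

```scala
import java.text.Collator
import java.util.Locale

object MiniCollationBench {
  // Same order as the updated collationTypes: the baseline comes first.
  val collationTypes: Seq[String] =
    Seq("UTF8_BINARY", "UTF8_LCASE", "UNICODE", "UNICODE_CI")

  // Keeps the comparison results observable so the loop is not trivially dead code.
  @volatile private var sink: Int = 0

  // Hypothetical stand-ins, NOT Spark's collation implementations.
  def comparatorFor(name: String): (String, String) => Int = name match {
    case "UTF8_BINARY" => (a, b) => a.compareTo(b)
    case "UTF8_LCASE" =>
      (a, b) => a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT))
    case "UNICODE" =>
      val collator = Collator.getInstance(Locale.ROOT)
      collator.setStrength(Collator.TERTIARY)
      (a, b) => collator.compare(a, b)
    case "UNICODE_CI" =>
      val collator = Collator.getInstance(Locale.ROOT)
      collator.setStrength(Collator.SECONDARY)
      (a, b) => collator.compare(a, b)
    case other => throw new IllegalArgumentException(s"Unknown collation: $other")
  }

  // Runs the body `iters` times and returns the fastest run in nanoseconds.
  def bestTimeNs(iters: Int)(body: => Unit): Long =
    (1 to iters).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }.min

  // Returns (name, best time in ns, speed relative to the first case).
  def run(input: Seq[String], iters: Int = 3): Seq[(String, Long, Double)] = {
    val timings = collationTypes.map { name =>
      val cmp = comparatorFor(name)
      val best = bestTimeNs(iters) {
        var acc = 0
        for (a <- input; b <- input) acc += cmp(a, b)
        sink = acc
      }
      name -> math.max(best, 1L) // guard against a zero reading
    }
    val baseline = timings.head._2.toDouble
    timings.map { case (name, best) => (name, best, baseline / best) }
  }

  def main(args: Array[String]): Unit =
    run(Seq("apple", "Apple", "banana", "Banana")).foreach {
      case (name, best, rel) => println(f"$name%-12s $best%10d ns $rel%.1fX")
    }
}
```

Because the ratio is taken against the head of the sequence, moving "UTF8_BINARY" to the front changes only the reference point of the relative figures, not the measured times. Timings from a sketch like this are noisy and say nothing about Spark's actual collation performance.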
