[SPARK-54693][CORE][TESTS] Add LZ4TPCDSDataBenchmark#53453
[SPARK-54693][CORE][TESTS] Add LZ4TPCDSDataBenchmark#53453pan3793 wants to merge 9 commits intoapache:masterfrom
Conversation
|
TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and is about 5% slower on lz4 decompression even with migrating to suggested safeDecompressor (#53454) Here is my local test result: lz4-java 1.8.0 with fastDecompressor (state before aa65bda) lz4-java 1.10.1 with fastDecompressor (current master state, significant decompress perf drop!!!) lz4-java 1.10.1 with safeDecompressor (#53454) |
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
…21, Scala 2.13, split 1 of 1)
…17, Scala 2.13, split 1 of 1)
… (JDK 21, Scala 2.13, split 1 of 1)
… (JDK 17, Scala 2.13, split 1 of 1)
|
@LuciferYang, I did a refactor to make |
| /** | ||
| * Any code before running any benchmark, e.g., data preparation | ||
| */ | ||
| def beforeAll(): Unit = {} |
There was a problem hiding this comment.
My suggestion is not to add the beforeAll() method in the current pr If we find it to be widely applicable, we can add this method in a subsequent pr and also extract beforeAll() for more microbenchmark scenarios.
|
Merged into master. Thanks @pan3793 |
|
Although I didn't get a chance to take a look at the detail, thank you so much for adding additional benchmark, @pan3793 and @LuciferYang ! |
What changes were proposed in this pull request?
Add
LZ4TPCDSDataBenchmark, testLZ4CompressionCodecagainst TPCDScatalog_sales.dat(SF1), the size is about 283M.Why are the changes needed?
Add a benchmark to measure the perf impact of lz4 security upgrading.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added benchmark result. Since the change has a refactor that touched
ZSTDTPCDSDataBenchmark, also updated its benchmark results.Was this patch authored or co-authored using generative AI tooling?
No.