[SPARK-25846][SQL][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark to use main method #22842

Closed
wants to merge 1 commit into from
@@ -0,0 +1,39 @@
================================================================================================
Benchmark for ArrayBuffer and UnsafeRowArray
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Array with 1000 rows:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ArrayBuffer                                    8694 / 8743         30.2          33.2       1.0X
ExternalAppendOnlyUnsafeRowArray             19827 / 19946         13.2          75.6       0.4X

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Array with 30000 rows:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ArrayBuffer                                  24793 / 27135         19.8          50.4       1.0X
ExternalAppendOnlyUnsafeRowArray             24877 / 24963         19.8          50.6       1.0X

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Array with 100000 rows:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ArrayBuffer                                    6255 / 6263         16.4          61.1       1.0X
ExternalAppendOnlyUnsafeRowArray               5608 / 6170         18.3          54.8       1.1X

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Spilling with 1000 rows:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
UnsafeExternalSorter                         15767 / 15790         16.6          60.1       1.0X
ExternalAppendOnlyUnsafeRowArray             10013 / 10036         26.2          38.2       1.6X

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Spilling with 10000 rows:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
UnsafeExternalSorter                                 5 / 6         29.2          34.3       1.0X
ExternalAppendOnlyUnsafeRowArray                     6 / 7         26.2          38.1       0.9X
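
The derived columns in these tables can be recomputed from the best time and the number of rows processed (`numRows * iterations`, for example 1000 * (1 << 18) for the first table, per the calls in `runBenchmarkSuite`). A minimal sketch, assuming Rate is rows per second in millions, Per Row is the best time divided by the row count, and Relative is the first case's best time divided by the current case's; `BenchmarkColumns` and its methods are hypothetical names used for illustration, not part of Spark:

```scala
// Hypothetical helper, not part of Spark: recomputes the derived columns
// of a result table from a case's best time and its processed row count.
object BenchmarkColumns {
  /** Rows processed per second, in millions. */
  def rateMillionsPerSec(rows: Long, bestMs: Double): Double =
    rows / (bestMs * 1000.0)

  /** Average time spent on one row, in nanoseconds. */
  def perRowNs(rows: Long, bestMs: Double): Double =
    bestMs * 1e6 / rows

  /** Speed relative to the baseline, i.e. the first case in a table. */
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    // "Array with 1000 rows" runs 1 << 18 iterations over 1000 rows.
    val rows = 1000L * (1 << 18)
    println(s"ArrayBuffer rate (M/s):   ${rateMillionsPerSec(rows, 8694)}")
    println(s"ArrayBuffer per row (ns): ${perRowNs(rows, 8694)}")
    println(s"UnsafeRowArray relative:  ${relative(8694, 19827)}")
  }
}
```

With the first table's figures (8694 ms and 19827 ms over 262,144,000 rows) this gives roughly 30.2 M/s, 33.2 ns per row, and 0.4X, which agrees with the reported values after rounding to one decimal place.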

@@ -20,13 +20,27 @@ package org.apache.spark.sql.execution
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.{SparkConf, SparkContext, SparkEnv, TaskContext}
-import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.internal.config
import org.apache.spark.memory.MemoryTestingUtils
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter

-object ExternalAppendOnlyUnsafeRowArrayBenchmark {
+/**
+ * Synthetic Benchmark for ArrayBuffer and UnsafeRowArray
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to
+ *      "benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt".
+ * }}}
+ */
+
+object ExternalAppendOnlyUnsafeRowArrayBenchmark extends BenchmarkBase {
Member

Sorry @heary-cao, this seems to be duplicate work? See #22617.

Member

Please check all our refactors here: https://issues.apache.org/jira/browse/SPARK-25475


   def testAgainstRawArrayBuffer(numSpillThreshold: Int, numRows: Int, iterations: Int): Unit = {
     val random = new java.util.Random()
@@ -37,7 +51,8 @@ object ExternalAppendOnlyUnsafeRowArrayBenchmark {
       row
     })

-    val benchmark = new Benchmark(s"Array with $numRows rows", iterations * numRows)
+    val benchmark =
+      new Benchmark(s"Array with $numRows rows", iterations * numRows, output = output)

     // Internally, `ExternalAppendOnlyUnsafeRowArray` will create an
     // in-memory buffer of size `numSpillThreshold`. This will mimic that
@@ -108,7 +123,8 @@ object ExternalAppendOnlyUnsafeRowArrayBenchmark {
       row
     })

-    val benchmark = new Benchmark(s"Spilling with $numRows rows", iterations * numRows)
+    val benchmark =
+      new Benchmark(s"Spilling with $numRows rows", iterations * numRows, output = output)

     benchmark.addCase("UnsafeExternalSorter") { _: Int =>
       var sum = 0L
@@ -171,67 +187,15 @@ object ExternalAppendOnlyUnsafeRowArrayBenchmark {
     sc.stop()
   }

-  def main(args: Array[String]): Unit = {
-
-    // ========================================================================================= //
-    // WITHOUT SPILL
-    // ========================================================================================= //
-
-    val spillThreshold = 100 * 1000
-
-    /*
-    Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
-
-    Array with 1000 rows:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------------------
-    ArrayBuffer                                    7821 / 7941         33.5          29.8       1.0X
-    ExternalAppendOnlyUnsafeRowArray               8798 / 8819         29.8          33.6       0.9X
-    */
-    testAgainstRawArrayBuffer(spillThreshold, 1000, 1 << 18)
-
-    /*
-    Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
-
-    Array with 30000 rows:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------------------
-    ArrayBuffer                                  19200 / 19206         25.6          39.1       1.0X
-    ExternalAppendOnlyUnsafeRowArray             19558 / 19562         25.1          39.8       1.0X
-    */
-    testAgainstRawArrayBuffer(spillThreshold, 30 * 1000, 1 << 14)
-
-    /*
-    Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
-
-    Array with 100000 rows:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------------------
-    ArrayBuffer                                    5949 / 6028         17.2          58.1       1.0X
-    ExternalAppendOnlyUnsafeRowArray               6078 / 6138         16.8          59.4       1.0X
-    */
-    testAgainstRawArrayBuffer(spillThreshold, 100 * 1000, 1 << 10)
-
-    // ========================================================================================= //
-    // WITH SPILL
-    // ========================================================================================= //
-
-    /*
-    Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
-
-    Spilling with 1000 rows:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------------------
-    UnsafeExternalSorter                           9239 / 9470         28.4          35.2       1.0X
-    ExternalAppendOnlyUnsafeRowArray               8857 / 8909         29.6          33.8       1.0X
-    */
-    testAgainstRawUnsafeExternalSorter(100 * 1000, 1000, 1 << 18)
-
-    /*
-    Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
-
-    Spilling with 10000 rows:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    ------------------------------------------------------------------------------------------------
-    UnsafeExternalSorter                                 4 / 5         39.3          25.5       1.0X
-    ExternalAppendOnlyUnsafeRowArray                     5 / 6         29.8          33.5       0.8X
-    */
-    testAgainstRawUnsafeExternalSorter(
-      config.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD.defaultValue.get, 10 * 1000, 1 << 4)
+  override def runBenchmarkSuite(): Unit = {
+    runBenchmark("Benchmark for ArrayBuffer and UnsafeRowArray") {
+      val spillThreshold = 100 * 1000
+      testAgainstRawArrayBuffer(spillThreshold, 1000, 1 << 18)
+      testAgainstRawArrayBuffer(spillThreshold, 30 * 1000, 1 << 14)
+      testAgainstRawArrayBuffer(spillThreshold, 100 * 1000, 1 << 10)
+      testAgainstRawUnsafeExternalSorter(100 * 1000, 1000, 1 << 18)
+      testAgainstRawUnsafeExternalSorter(
+        config.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD.defaultValue.get, 10 * 1000, 1 << 4)
+    }
   }
 }
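
The refactor in this diff swaps a hand-written `main` for `runBenchmarkSuite` and passes `output` to each `Benchmark`, so one run can either print results or write them to the results file. A minimal sketch of that pattern, assuming a much simplified base class: `MiniBenchmarkBase` and `DemoBenchmark` are hypothetical names, and Spark's actual `BenchmarkBase` may differ in detail. The environment variable and file name pattern are taken from the Scaladoc in the diff.

```scala
import java.io.{File, FileOutputStream, OutputStream, PrintStream}

// Simplified, hypothetical stand-in for the pattern used in the diff.
// This is not Spark's BenchmarkBase; it only illustrates the control flow.
abstract class MiniBenchmarkBase {
  // None means "print to stdout"; Some(stream) means "write results there".
  var output: Option[OutputStream] = None

  // Subclasses list their benchmark cases here instead of defining main().
  def runBenchmarkSuite(): Unit

  // Printer for whichever destination is currently selected.
  protected def printer: PrintStream =
    new PrintStream(output.getOrElse(System.out), true)

  // Prints a banner for a group of cases, then runs the group.
  final def runBenchmark(name: String)(body: => Any): Unit = {
    val out = printer
    val line = "=" * 96
    out.println(line)
    out.println(name)
    out.println(line)
    out.println()
    body
  }

  // Shared entry point: picks the destination, then runs the suite.
  def main(args: Array[String]): Unit = {
    if (sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")) {
      val name = getClass.getSimpleName.stripSuffix("$")
      val file = new File(s"benchmarks/$name-results.txt")
      Option(file.getParentFile).foreach(_.mkdirs())
      output = Some(new FileOutputStream(file))
    }
    try runBenchmarkSuite() finally output.foreach(_.close())
  }
}

// A suite only declares what to run; the base class owns main().
object DemoBenchmark extends MiniBenchmarkBase {
  override def runBenchmarkSuite(): Unit = {
    runBenchmark("Demo suite") {
      printer.println("case results would be written here")
    }
  }
}
```

Keeping `main` in the base class means every benchmark object handles the results file the same way, and the per-machine result tables no longer need to be pasted into source comments.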