Utilities for writing tests that use Apache Spark.
## `SparkSuite`: a `SparkContext` for each test suite
Add configuration options in subclasses using `sparkConf(…)`; cf. `KryoSparkSuite`:
```scala
sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" → getClass.getCanonicalName,
  "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" → referenceTracking.toString,
  "spark.kryo.registrationRequired" → registrationRequired.toString
)
```
### `PerCaseSuite`: a `SparkContext` for each test case
## `KryoSparkSuite`

A `SparkSuite` implementation that provides hooks for Kryo registration:
```scala
register(
  classOf[Foo],
  "org.foo.Bar",
  classOf[Bar] → new BarSerializer
)
```

Also useful for subclassing once per project, filling in that project's default Kryo registrar, and having concrete tests subclass that; cf. hammerlab/guacamole and hammerlab/pageant for examples.
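As a sketch of that pattern (names like `ProjectKryoSuite` and `Foo` are illustrative, and the exact `KryoSparkSuite` API may differ from what's shown):

```scala
// Hypothetical per-project base suite: register the project's commonly-serialized
// classes once, then have concrete test suites extend it.
class ProjectKryoSuite extends KryoSparkSuite {
  register(
    classOf[Foo],
    classOf[Array[Foo]],
    "scala.collection.mutable.WrappedArray$ofRef"
  )
}

// Concrete tests inherit the project-wide registrations:
class MyAlgorithmSuite extends ProjectKryoSuite {
  // test cases…
}
```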
## `rdd.Util`

Make an RDD with specific elements in specific partitions.

## `NumJobsUtil`

Verify the number of Spark jobs that have been run.

## `RDDSerialization`

An interface for verifying that a serialization+deserialization round-trip on an RDD results in the same RDD.
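For context, here is a plain-Spark sketch of the kind of round-trip check `RDDSerialization` encapsulates; this is illustrative only, not the library's API, and `roundTripPreservesContents` and its path argument are made-up names:

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Illustrative stand-in for the round-trip idea: write the RDD out, read it
// back, and compare contents as a multiset (partitioning/order aside).
def roundTripPreservesContents[T: ClassTag](sc: SparkContext, rdd: RDD[T], path: String): Boolean = {
  rdd.saveAsObjectFile(path)             // serialize
  val reloaded = sc.objectFile[T](path)  // deserialize
  rdd.countByValue() == reloaded.countByValue()
}
```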