[SPARK-6368][SQL] Build a specialized serializer for Exchange operator. #5497

Closed
wants to merge 14 commits into from
@yhuai yhuai commented Apr 13, 2015

SparkQA commented Apr 13, 2015

Test build #30194 has finished for PR 5497 at commit 39704ab.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA commented Apr 13, 2015

Test build #30196 has finished for PR 5497 at commit 2379eeb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.


val key = if (keySchema != null) new SpecificMutableRow(keySchema) else null
val value = if (valueSchema != null) new SpecificMutableRow(valueSchema) else null
val readKey = SparkSqlSerializer2.createDeserializationFunction(keySchema, rowIn, key)

Shouldn't readKey always be () => {} if keySchema is null? And the same for readValue?

If the schema is null, we just have a function that does nothing. Is that what you were asking for?
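
To illustrate the behaviour discussed in this thread, here is a minimal sketch of a reader factory that returns a no-op function when the schema is null. It is not the code from this PR: `NullSchemaSketch`, `FieldType`, `IntField`, `LongField`, `createReader`, and `demo` are hypothetical stand-ins, and a plain `Array[Any]` is used in place of a mutable row.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object NullSchemaSketch {
  // Hypothetical stand-in for a schema: one type tag per field.
  sealed trait FieldType
  case object IntField extends FieldType
  case object LongField extends FieldType

  // Builds a reader that fills `row` from `in`.
  // For a null schema the returned function does nothing at all.
  def createReader(schema: Array[FieldType], in: DataInputStream, row: Array[Any]): () => Unit =
    if (schema == null) {
      () => ()
    } else {
      () => {
        var i = 0
        while (i < schema.length) {
          schema(i) match {
            case IntField  => row(i) = in.readInt()
            case LongField => row(i) = in.readLong()
          }
          i += 1
        }
      }
    }

  // Writes one (Int, Long) value with no key, then reads it back.
  def demo(): Array[Any] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(7)
    out.writeLong(42L)
    out.flush()

    val in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val valueRow = new Array[Any](2)
    val readKey = createReader(null, in, null) // no key schema: no-op reader
    val readValue = createReader(Array[FieldType](IntField, LongField), in, valueRow)
    readKey()   // consumes nothing from the stream
    readValue() // fills valueRow from the stream
    valueRow
  }
}
```

Because the null-schema reader never touches the stream, the caller can invoke `readKey()` and `readValue()` unconditionally without checking which side of the pair is present.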

@chenghao-intel

@yhuai this is a really cool improvement and should definitely improve performance a lot. I have some comments about future improvements (of course we can leave them for later). My main concern is that Seq[Expression] is probably a better input than Array[DataType] for constructing the serializer, as it would let us skip serializing Literal values.
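
One possible reading of this suggestion, sketched under heavy assumptions: if the serializer is built from expressions rather than data types, it can tell which fields are constants and leave them out of the shuffle payload, restoring them on the read side. `LiteralSkipSketch`, `Expr`, `Lit`, `Col`, `write`, `read`, and `roundTrip` below are hypothetical names for illustration, not Catalyst classes, and rows are simplified to `Array[Int]`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object LiteralSkipSketch {
  // Hypothetical, simplified expression model (not Catalyst's classes).
  sealed trait Expr
  final case class Lit(value: Int) extends Expr // constant known from the plan
  case object Col extends Expr                  // value that varies per row

  // Writes only the fields whose expression is not a literal.
  def write(exprs: Seq[Expr], row: Array[Int], out: DataOutputStream): Unit =
    exprs.zipWithIndex.foreach {
      case (Col, i)    => out.writeInt(row(i))
      case (Lit(_), _) => () // nothing written: the reader already knows the value
    }

  // Reads the non-literal fields and restores literals from the expressions.
  def read(exprs: Seq[Expr], in: DataInputStream): Array[Int] =
    exprs.map {
      case Col    => in.readInt()
      case Lit(v) => v
    }.toArray

  // Serializes one row and reads it back; also returns the payload size in bytes.
  def roundTrip(exprs: Seq[Expr], row: Array[Int]): (Array[Int], Int) = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    write(exprs, row, out)
    out.flush()
    val payload = bytes.toByteArray
    (read(exprs, new DataInputStream(new ByteArrayInputStream(payload))), payload.length)
  }
}
```

With three Int fields of which one is a literal, `roundTrip` writes two Ints (8 bytes) instead of three while still returning the full row.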

SparkQA commented Apr 14, 2015

Test build #30215 has finished for PR 5497 at commit c9373c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA commented Apr 14, 2015

Test build #30255 has finished for PR 5497 at commit 8297732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.1.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.6.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

SparkQA commented Apr 14, 2015

Test build #30259 has finished for PR 5497 at commit 43b9fb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

SparkQA commented Apr 15, 2015

Test build #30288 has finished for PR 5497 at commit 3e09655.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • snappy-java-1.1.1.7.jar
  • This patch removes the following dependencies:
    • snappy-java-1.1.1.6.jar

yhuai added 3 commits April 15, 2015 16:24
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala

SparkQA commented Apr 16, 2015

Test build #30382 has finished for PR 5497 at commit 791b96a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAttribute(nameParts: Seq[String])
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
    • case class Exchange(
    • case class SortMergeJoin(
  • This patch does not change any dependencies.

SparkQA commented Apr 16, 2015

Test build #30384 has finished for PR 5497 at commit 09e587a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAttribute(nameParts: Seq[String])
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
    • case class Exchange(
    • case class SortMergeJoin(
  • This patch does not change any dependencies.

SparkSqlSerializer2.support(valueSchema)

val serializer = if (useSqlSerializer2) {
  logInfo("Use SparkSqlSerializer2.")

nit: Using

same below

SparkQA commented Apr 18, 2015

Test build #30513 has finished for PR 5497 at commit da562c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@marmbrus

Thanks! Merged to master.

@asfgit asfgit closed this in ce7ddab Apr 21, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
JIRA: https://issues.apache.org/jira/browse/SPARK-6368

Author: Yin Huai <yhuai@databricks.com>

Closes apache#5497 from yhuai/serializer2 and squashes the following commits:

da562c5 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
50e0c3d [Yin Huai] When no filed is emitted to shuffle, use SparkSqlSerializer for now.
9f1ed92 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
6d07678 [Yin Huai] Address comments.
4273b8c [Yin Huai] Enabled SparkSqlSerializer2.
09e587a [Yin Huai] Remove TODO.
791b96a [Yin Huai] Use UTF8String.
60a1487 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
3e09655 [Yin Huai] Use getAs for Date column.
43b9fb4 [Yin Huai] Test.
8297732 [Yin Huai] Fix test.
c9373c8 [Yin Huai] Support DecimalType.
2379eeb [Yin Huai] ASF header.
39704ab [Yin Huai] Specialized serializer for Exchange.
4 participants