[SPARK-6368][SQL] Build a specialized serializer for Exchange operator. #5497
Test build #30194 has finished for PR 5497 at commit
Test build #30196 has finished for PR 5497 at commit
```scala
val key = if (keySchema != null) new SpecificMutableRow(keySchema) else null
val value = if (valueSchema != null) new SpecificMutableRow(valueSchema) else null
val readKey = SparkSqlSerializer2.createDeserializationFunction(keySchema, rowIn, key)
```
Should `readKey` always be `() => {}` if `keySchema` is null? The same for `readValue`?
If the schema is null, we just return a function that does nothing. Is that what you were asking?
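As a rough illustration of the pattern discussed here (a read function built once per schema that fills a mutable row in place and degenerates to a no-op when the schema is null), the following is a minimal, self-contained Scala sketch with its matching write function. It does not use Spark: `SchemaCodec`, `FieldType`, and the `Array[Any]` row are hypothetical stand-ins for Spark SQL's data types and `SpecificMutableRow`, and only three field types are handled.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Minimal sketch, NOT Spark's implementation. SchemaCodec, FieldType and the
// Array[Any] row are hypothetical stand-ins for Spark SQL types and SpecificMutableRow.
object SchemaCodec {
  sealed trait FieldType
  case object IntField extends FieldType
  case object LongField extends FieldType
  case object StringField extends FieldType

  // Returns a writer specialized to `schema`; a null schema yields a no-op writer.
  def createSerializationFunction(
      schema: Array[FieldType],
      out: DataOutputStream): Array[Any] => Unit =
    if (schema == null) {
      (_: Array[Any]) => ()
    } else {
      (row: Array[Any]) => {
        var i = 0
        while (i < schema.length) {
          schema(i) match {
            case IntField    => out.writeInt(row(i).asInstanceOf[Int])
            case LongField   => out.writeLong(row(i).asInstanceOf[Long])
            case StringField => out.writeUTF(row(i).asInstanceOf[String])
          }
          i += 1
        }
      }
    }

  // Returns a reader that fills `mutableRow` in place; a null schema yields a no-op reader.
  def createDeserializationFunction(
      schema: Array[FieldType],
      in: DataInputStream,
      mutableRow: Array[Any]): () => Unit =
    if (schema == null) {
      () => ()
    } else {
      () => {
        var i = 0
        while (i < schema.length) {
          mutableRow(i) = schema(i) match {
            case IntField    => in.readInt()
            case LongField   => in.readLong()
            case StringField => in.readUTF()
          }
          i += 1
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val schema: Array[FieldType] = Array(IntField, StringField, LongField)
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    val write = createSerializationFunction(schema, out)
    write(Array[Any](1, "a", 2L))
    out.flush()

    // Reuse one mutable row for every record read from the stream.
    val in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val row = new Array[Any](schema.length)
    val read = createDeserializationFunction(schema, in, row)
    read()
    println(row.mkString(", ")) // prints: 1, a, 2
  }
}
```

Returning a no-op closure for a null schema lets the caller invoke `readKey()`/`readValue()` unconditionally instead of branching per record. `writeUTF` is used here only for brevity; it caps a string at 65535 encoded bytes, so a real implementation would need a length-prefixed encoding instead.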
@yhuai This is a really cool improvement and should definitely improve performance a lot. I have some comments about future improvements (of course, we can leave them for later); the main concern is using the …
Test build #30215 has finished for PR 5497 at commit
Test build #30255 has finished for PR 5497 at commit
Test build #30259 has finished for PR 5497 at commit
Test build #30288 has finished for PR 5497 at commit
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
Test build #30382 has finished for PR 5497 at commit
Test build #30384 has finished for PR 5497 at commit
```scala
  SparkSqlSerializer2.support(valueSchema)

val serializer = if (useSqlSerializer2) {
  logInfo("Use SparkSqlSerializer2.")
```
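The rule visible in this hunk (use the specialized serializer only when `SparkSqlSerializer2.support` accepts both the key and the value schema) can be sketched in isolation as below. This is a hedged, Spark-free illustration: `SerializerChoice`, `FieldType`, the set of supported types, and treating a null schema as "this side is not shuffled" are all assumptions. The empty-schema fallback follows the squashed commit message, which says SparkSqlSerializer is used for now when no field is emitted to shuffle.

```scala
// Hypothetical sketch of the serializer-selection rule; not Spark's code.
object SerializerChoice {
  sealed trait FieldType
  case object IntField extends FieldType
  case object StringField extends FieldType
  case object MapField extends FieldType // assumed unsupported by the specialized path

  sealed trait Choice
  case object Specialized extends Choice // plays the role of SparkSqlSerializer2
  case object Generic extends Choice     // plays the role of the fallback SparkSqlSerializer

  private val supportedTypes: Set[FieldType] = Set(IntField, StringField)

  // Assumption: a null schema means that side (key or value) is not shuffled at all.
  def support(schema: Array[FieldType]): Boolean =
    schema == null || schema.forall(supportedTypes.contains)

  private def noFields(schema: Array[FieldType]): Boolean =
    schema == null || schema.isEmpty

  def choose(keySchema: Array[FieldType], valueSchema: Array[FieldType]): Choice = {
    // Fall back to the generic serializer when no field is emitted to shuffle.
    val noFieldEmitted = noFields(keySchema) && noFields(valueSchema)
    val useSpecialized = !noFieldEmitted && support(keySchema) && support(valueSchema)
    if (useSpecialized) Specialized else Generic
  }

  def main(args: Array[String]): Unit = {
    val choice = choose(Array(IntField), Array(StringField, IntField))
    println(s"Using $choice serializer.") // prints: Using Specialized serializer.
  }
}
```

Keeping the decision in a single predicate means an unsupported column type degrades to the generic serializer instead of failing at shuffle time.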
nit: "Using" instead of "Use". Same below.
Test build #30513 has finished for PR 5497 at commit
Thanks! Merged to master.
JIRA: https://issues.apache.org/jira/browse/SPARK-6368

Author: Yin Huai <yhuai@databricks.com>

Closes apache#5497 from yhuai/serializer2 and squashes the following commits:

da562c5 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
50e0c3d [Yin Huai] When no field is emitted to shuffle, use SparkSqlSerializer for now.
9f1ed92 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
6d07678 [Yin Huai] Address comments.
4273b8c [Yin Huai] Enabled SparkSqlSerializer2.
09e587a [Yin Huai] Remove TODO.
791b96a [Yin Huai] Use UTF8String.
60a1487 [Yin Huai] Merge remote-tracking branch 'upstream/master' into serializer2
3e09655 [Yin Huai] Use getAs for Date column.
43b9fb4 [Yin Huai] Test.
8297732 [Yin Huai] Fix test.
c9373c8 [Yin Huai] Support DecimalType.
2379eeb [Yin Huai] ASF header.
39704ab [Yin Huai] Specialized serializer for Exchange.