[SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 #22179
Conversation
Test build #95072 has finished for PR 22179 at commit
cc @srowen
That looks like a major version bump; the usual question here: what are the key changes we need, and what are the possible incompatible changes?
Thanks @srowen, SPARK-25176 has a detailed description:
Yes, I think the question is whether anything breaks as well. Kryo 4.x is a breaking change IIRC. Is there a 3.x version that has these fixes?
https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0 This has a number of breaking changes. However, Spark doesn't actually use kryo itself, and excludes it. It uses kryo-shaded, via chill. chill otherwise has almost no changes from 0.8 to 0.9 except to accommodate Kryo 4. Therefore the update shouldn't force downstream apps to use a different kryo, which is most of the potential problem here. The changes in serialized format should not be visible outside Spark, if I am not mistaken: Kryo controls serialization within Spark components, and that isn't compatible across versions anyway. I'm inclined to merge this for 2.4, as it resolves a number of issues users are hitting.
If it doesn't expose any breaking changes to users, that seems reasonable. I assume there would be no integration problems with Scala 2.12?
Kryo 3.0.3 and Chill 0.8.2 already work with 2.12, so no particular issue there. I don't see that this change would be visible to users, and tests suggest it doesn't affect Spark. I'm cautious that I may be overlooking something, though, and am weighing the risk against the upside of fixing some user issues and potential performance bumps.
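For context on the dependency chain discussed above (Kryo arrives in Spark transitively via chill as the kryo-shaded artifact), a downstream build picking up the new versions would look roughly like this. This is an illustrative pom.xml fragment, not Spark's actual pom; only the version numbers come from this PR.

```xml
<!-- Sketch: Twitter chill 0.9.3 depends on com.esotericsoftware:kryo-shaded:4.0.2, -->
<!-- so bumping chill is what pulls in the new Kryo. Coordinates are illustrative. -->
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>chill_2.11</artifactId>
  <version>0.9.3</version>
</dependency>
```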
@@ -1770,6 +1770,10 @@
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-storage-api</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.esotericsoftware</groupId>
        <artifactId>kryo-shaded</artifactId>
why this change?
CC @wangyum, yes, good point. ORC also uses kryo-shaded, at 3.0.3. In theory that could cause a break, but the tests pass. That's a positive sign but not bulletproof. @dongjoon-hyun, do you have any insight into how ORC uses Kryo? Is it a code path that wouldn't matter to Spark?
Thank you for pinging me, @srowen. ORC uses Kryo only for writing/reading one ORC configuration, `orc.kryo.sarg`. The following are Spark's indirect code paths. I guess Kryo provides forward compatibility at least, but I'll take a look at this PR today.
- For the new OrcFileFormat: `OrcInputFormat.setSearchArgument`
- For the old OrcFileFormat: `SearchArgument.toKryo`
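As context for those code paths: the `orc.kryo.sarg` mechanism amounts to serializing the pushed-down predicate, encoding the bytes as a string, and carrying them through the job configuration to the read side. The sketch below mirrors that round trip using plain `java.io` serialization and Base64 instead of Kryo, so it runs without the `kryo-shaded` dependency; `SargStandIn`, `toConfValue`, and `fromConfValue` are hypothetical names for illustration, not ORC API.

```java
import java.io.*;
import java.util.Base64;

// Hypothetical stand-in for a pushed-down predicate; the real ORC class is
// SearchArgumentImpl, and ORC serializes it with Kryo rather than java.io.
class SargStandIn implements Serializable {
    final String column;
    final String op;
    SargStandIn(String column, String op) { this.column = column; this.op = op; }
}

public class KryoSargRoundTrip {
    // Mirrors the write side: serialize the predicate and Base64-encode it
    // into a configuration value (ORC's orc.kryo.sarg plays this role).
    static String toConfValue(SargStandIn sarg) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(sarg);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Mirrors the read side: decode the config value and deserialize. A
    // serializer version mismatch between writer and reader would surface here.
    static SargStandIn fromConfValue(String conf) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(conf);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SargStandIn) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String conf = toConfValue(new SargStandIn("id", ">= 10"));
        SargStandIn back = fromConfValue(conf);
        System.out.println(back.column + " " + back.op); // prints "id >= 10"
    }
}
```

Because writer and reader run inside the same Spark job, they always share one Kryo version, which is the point made in the follow-up comments below.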
@srowen In short, the current Spark always uses the same Kryo version for reading/writing `SearchArgument`, and it's used only at runtime.
- The old OrcFileFormat always uses `org.spark-project.hive:hive-exec:1.2.1.spark2`, which uses the Kryo shaded into `hive-exec` (`com.esotericsoftware.kryo:kryo:2.21`).
- The new OrcFileFormat uses `org.apache.orc`, which uses the one provided by Spark: `com.esotericsoftware:kryo-shaded:3.0.3`. (All of Spark/ORC/Hive use this version for now.)
- The new OrcFileFormat (in this PR) uses `org.apache.orc`, which uses the one provided by Spark: `com.esotericsoftware:kryo-shaded:4.0.2`.

So, (1) is unchanged by this PR, and (2) and (3) don't use mixed versions of Kryo either. It should be fine, because Apache Spark doesn't allow mixed Spark versions (master and executor). BTW, during the investigation I found a performance issue in `createFilter`; I'll file a new JIRA for that.
I guess my question is whether ORC compiled against kryo-shaded 3.x necessarily works with kryo-shaded 4.x, because there were API and behavior changes. Our current tests don't seem to surface any such problems, but who knows if there's something they don't test.
I agree that our distribution won't include two versions; the issue I'm wondering about is whether ORC will like the later version it gets bundled with.
Yes, I checked that, @srowen. `org.apache.orc` only uses the Kryo constructor, `writeObject`, and `readObject` from the `kryo-shaded` library. There is no change to them.

WRITE:

```java
(new Kryo()).writeObject(out, sarg);
```

READ:

```java
... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class);
```
OK, if that's the extent of the usage, then I believe ORC is OK with Kryo 4. That much is not a problem. Edit: hmm, on second thought, if the serialization format changes from 3 to 4, does the config it writes become unreadable? Does that config matter to Spark?
I was also worried about that part, but it's used only for run-time `SearchArgument` serialization. There is no usage with ORC files.
For maximum breaking-change forgiveness, the "safe" option would be to punt this to 3.0, which seems likely to be the next release after this one.
@wangyum, I know this is designed as a build-only PR. However, since it's claimed to fix bugs (SPARK-20389, SPARK-23131, and SPARK-25176), could you add test cases for them in order to verify those issues are really resolved?
Ping, @wangyum. Could you add the test cases for SPARK-20389, SPARK-23131, and SPARK-25176?
@dongjoon-hyun I'm trying to add test cases.
Thanks, @wangyum!
Tests are great of course; those three might be hard to test. The first and last one don't have info on the reproduction. The second one looks possibly reproducible, though, as it shows the custom serialization code used to write a GeneralizedLinearRegression model.
I pinged the first one, SPARK-20389, to find any chance to test on the original cluster. Although it's a very old issue, there is no harm in pinging once more. For the third one, SPARK-25176, the reporter provided sample code. Although that code is available to us, it's not Apache-licensed, so we can make our own test case for type inheritance.
Sorry @dongjoon-hyun, I could only reproduce one of them:

```scala
val ser = new KryoSerializer(new SparkConf).newInstance().asInstanceOf[KryoSerializerInstance]

class BaseType[R] {}
class CollectionType(val child: BaseType[_]*) extends BaseType[Boolean] {
  val children: List[BaseType[_]] = child.toList
}
class ValueType[R](val v: R) extends BaseType[R] {}

val value = new CollectionType(new ValueType("hello"))
ser.serialize(value)
```

SPARK-23131 may be related to the data. I can't reproduce it:

```scala
def modelToString(model: GeneralizedLinearRegressionModel): (String, String) = {
  val os: ByteArrayOutputStream = new ByteArrayOutputStream()
  val zos = new GZIPOutputStream(os)
  val oo: ObjectOutputStream = new ObjectOutputStream(zos)
  oo.writeObject(model)
  oo.close()
  zos.close()
  os.close()
  (model.uid, DatatypeConverter.printBase64Binary(os.toByteArray))
}
```
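The `modelToString` snippet from SPARK-23131 layers object serialization under gzip and Base64. As a self-contained illustration of that round trip, and of where a serialization-format change would surface as an unreadable payload, here is a hedged Java sketch; `ModelStandIn`, its fields, and `modelFromString` are hypothetical stand-ins, since the real `GeneralizedLinearRegressionModel` isn't available outside Spark.

```java
import java.io.*;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the GLM model; any Serializable object
// exercises the same serialize -> gzip -> Base64 path.
class ModelStandIn implements Serializable {
    final String uid;
    final double[] coefficients;
    ModelStandIn(String uid, double[] coefficients) {
        this.uid = uid;
        this.coefficients = coefficients;
    }
}

public class ModelStringRoundTrip {
    // Same shape as modelToString above: serialization -> gzip -> Base64 string.
    static String modelToString(ModelStandIn model) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (ObjectOutputStream oo = new ObjectOutputStream(new GZIPOutputStream(os))) {
            oo.writeObject(model);
        }
        return Base64.getEncoder().encodeToString(os.toByteArray());
    }

    // The reverse direction; a format change between writer and reader
    // versions would fail here during deserialization.
    static ModelStandIn modelFromString(String s) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(s);
        try (ObjectInputStream oi = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return (ModelStandIn) oi.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String encoded = modelToString(new ModelStandIn("glm_1", new double[]{1.5, -2.0}));
        ModelStandIn back = modelFromString(encoded);
        System.out.println(back.uid); // prints "glm_1"
    }
}
```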
Test build #95613 has finished for PR 22179 at commit
retest this please
Test build #95623 has finished for PR 22179 at commit
retest this please
@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
     assert(!ser2.getAutoReset)
   }

   test("ClassCastException when writing a Map after previously " +
Since this is a bug-fix test case, could you add `SPARK-25176` to the test name, like `SPARK-25176 ClassCastException ...`?
Test build #95643 has finished for PR 22179 at commit
Test build #95646 has finished for PR 22179 at commit
I made a PR to your branch. Could you review and merge it? The contents are the following.
After merging the PR, please add
For the remaining SPARK-20389 (NegativeArraySizeException), there is no response on JIRA, so let's simply exclude it from this PR description, because we cannot claim that it's resolved.
Thanks, @dongjoon-hyun
Test build #95673 has finished for PR 22179 at commit
Although this will give Spark a different Kryo version from Hive and ORC, the newly added test cases show the benefit clearly. Also, I checked the two new test cases with and without this PR. That looks sufficient to me.
And, @wangyum, please add
Also, the PR description had better include the following. (We exclude only SPARK-20389.)
Do we have any compatibility issues here? Seems fine to me as we already shaded kryo. |
It's not quite that we shade Kryo; rather, we use a shaded artifact called kryo-shaded. ORC does too, but it looks like that can be ruled out as a compatibility issue. I don't see other ways the version change could leak into user-visible contexts, like through code or serialized data formats. I can't see evidence of a compatibility issue, and likewise all tests pass.
I think this is OK to merge, given the upside.
Merged to master |
What changes were proposed in this pull request?
Upgrade chill to 0.9.3, Kryo to 4.0.2, to get bug fixes and improvements.
The resolved tickets include:
More details:
https://github.com/twitter/chill/releases/tag/v0.9.3
twitter/chill@cc3910d
How was this patch tested?
Existing tests.