
[SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 #22179

Closed
wants to merge 4 commits into from

Conversation

wangyum
Member

@wangyum wangyum commented Aug 22, 2018

What changes were proposed in this pull request?

Upgrade chill to 0.9.3 and Kryo to 4.0.2 to pick up bug fixes and improvements.

The resolved tickets include:

  • SPARK-25258 Upgrade kryo package to version 4.0.2
  • SPARK-23131 Kryo raises StackOverflow during serializing GLR model
  • SPARK-25176 Kryo fails to serialize a parametrised type hierarchy

More details:
https://github.com/twitter/chill/releases/tag/v0.9.3
twitter/chill@cc3910d

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Aug 22, 2018

Test build #95072 has finished for PR 22179 at commit 8c28e07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Aug 22, 2018

cc @srowen

@srowen
Member

srowen commented Aug 22, 2018

That looks like a major version bump -- the usual question here -- what are the key changes we need, what are possible incompatible changes?

@wangyum
Member Author

wangyum commented Aug 22, 2018

Thanks @srowen. SPARK-25176 has a detailed description:

I'm using the latest spark version spark-core_2.11:2.3.1 which
transitively depends on com.esotericsoftware:kryo-shaded:3.0.3 via the
com.twitter:chill_2.11:0.8.0 dependency. This exact version of kryo serializer contains an issue [1,2] which results in throwing ClassCastExceptions when serialising parameterised type hierarchy.
This issue has been fixed in kryo version 4.0.0 [3]. It would be great to have this update in Spark as well. Could you please upgrade the version of com.twitter:chill_2.11 dependency from 0.8.0 up to 0.9.2?
You can find a simple test to reproduce the issue [4].
[1] EsotericSoftware/kryo#384
[2] EsotericSoftware/kryo#377
[3] https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0
[4] https://github.com/mpryahin/kryo-parametrized-type-inheritance
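Not the Kryo reproduction itself, but a self-contained sketch of the kind of parameterised type hierarchy involved (the class names are hypothetical, shaped after the linked reproduction; plain java.io serialization stands in, since Kryo is assumed unavailable here):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical hierarchy: a value type and a collection type
// both extending a parameterised base type.
class BaseType[R] extends Serializable
class ValueType[R](val v: R) extends BaseType[R]
class CollectionType(val children: List[BaseType[_]]) extends BaseType[Boolean]

object HierarchyRoundTrip {
  // Serialise a value through the parameterised base and read it back.
  def roundTrip(value: CollectionType): CollectionType = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    try in.readObject().asInstanceOf[CollectionType] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new CollectionType(List(new ValueType("hello"))))
    assert(copy.children.head.asInstanceOf[ValueType[String]].v == "hello")
  }
}
```

With kryo-shaded 3.0.3 on the classpath, the same shape serialised through Kryo is what triggered the reported ClassCastException.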

@srowen
Member

srowen commented Aug 22, 2018

Yes, I think the question is whether anything breaks as well. Kryo 4.x is a breaking change IIRC. Is there a 3.x version that has these fixes?

@srowen
Member

srowen commented Aug 30, 2018

https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0

This has a number of breaking changes. However, Spark actually doesn't use kryo itself, and excludes it. It uses kryo-shaded, via chill. chill otherwise has almost no changes from 0.8 to 0.9 except to accommodate Kryo 4. Therefore the update shouldn't actually force downstream apps to use a different kryo, which is most of the potential problem here.

The changes in serialized format should not be visible outside Spark, if I am not mistaken. It controls serialization within Spark components and this isn't compatible across versions anyway.

I'm inclined to merge this for 2.4 as it resolves a number of issues users are hitting.

@erikerlandson
Contributor

If it doesn't expose any breaking changes to users, that seems reasonable. I assume there would be no integration problems with Scala 2.12?

@srowen
Member

srowen commented Aug 30, 2018

Kryo 3.0.3 and Chill 0.8.2 already work with 2.12, so no particular issue there. I don't see that this change would be visible to users, and tests suggest it doesn't affect Spark. I'm cautious that I'm maybe overlooking something, though, and am weighing the risk against the upside of fixing some user issues and potential performance bumps.

@@ -1770,6 +1770,10 @@
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
</exclusion>
<exclusion>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo-shaded</artifactId>
Contributor

why this change?

Member

CC @wangyum yes good point. ORC also uses kryo-shaded, and uses 3.0.3. In theory that could cause a break, but the tests pass. That's a positive sign but not bulletproof. @dongjoon-hyun do you have any insight into how ORC uses kryo? Is it a code path that wouldn't matter to Spark?

Member

@dongjoon-hyun dongjoon-hyun Aug 30, 2018

Thank you for pinging me, @srowen. ORC uses Kryo only for writing/reading one ORC configuration, orc.kryo.sarg. The following is Spark's indirect code path. I guess Kryo provides forward compatibility at least, but I'll take a look at this PR today.

Member

@srowen In short, the current Spark always uses the same Kryo version for reading/writing SearchArgument, and it's used only at runtime.

  1. Old OrcFileFormat always uses org.spark-project.hive:hive-exec:1.2.1.spark2 which uses the shaded one in hive-exec.

    • com.esotericsoftware.kryo:kryo:2.21.
  2. New OrcFileFormat uses org.apache.orc which uses the one provided by Spark.

    • com.esotericsoftware:kryo-shaded:3.0.3 (All Spark/Orc/Hive uses this version for now)
  3. New OrcFileFormat (in this PR) uses org.apache.orc which uses the one provided by Spark.

    • com.esotericsoftware:kryo-shaded:4.0.2

So, (1) is unchanged by this PR. (2) and (3) also don't use a mixed version of Kryo. So, it should be fine, because Apache Spark doesn't allow mixed Spark versions (master and executor). BTW, during the investigation, there was a performance issue in createFilter. I'll file a new JIRA for that.

Member

I guess my question is whether ORC compiled against kryo-shaded 3.x necessarily works with kryo-shaded 4.x, because there were API and behavior changes. Our current tests don't seem to surface any such problems, but who knows if there's something they don't test.

I agree that our distribution won't include two versions, but the issue I'm wondering about is whether ORC will like the later version that it gets bundled with.

Member

Yes. I checked that, @srowen. org.apache.orc only uses the Kryo constructor, writeObject, and readObject from the kryo-shaded library. There is no change to them.

WRITE

(new Kryo()).writeObject(out, sarg);

READ

... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class);

Member

@srowen srowen Sep 1, 2018

OK, if that's the extent of the usage, then I believe that ORC is OK with Kryo 4. That much is not a problem. Edit: Hm, on second thought, if the serialization format changes from 3 to 4, I wonder if it means the config it writes becomes unreadable? Does this config matter to Spark?

Member

I was also worried about that part, but it's used only for run-time SearchArgument serialization. There is no usage with ORC files.

@erikerlandson
Contributor

For maximum breaking-change forgiveness, the "safe" option would be to punt it to 3.0, which seems likely to be the next release after this.

@dongjoon-hyun
Member

@wangyum . I know this is designed as a build-only PR. However, since it claims bug fixes (SPARK-20389, SPARK-23131, and SPARK-25176), could you add test cases for them in order to verify those issues are really resolved?

@dongjoon-hyun
Member

Ping, @wangyum . Could you add the test cases for SPARK-20389, SPARK-23131, and SPARK-25176?

@wangyum
Member Author

wangyum commented Sep 2, 2018

@dongjoon-hyun I'm trying to add test cases.

@dongjoon-hyun
Member

Thanks, @wangyum !

@srowen
Member

srowen commented Sep 2, 2018

Tests are great, of course; those three might be hard to test. The first and last one don't have info on the reproduction. The second one looks possibly reproducible, though, as it shows the custom serialization code used to write a GeneralizedLinearRegression model.

@dongjoon-hyun
Member

I pinged the first one, SPARK-20389, to find any chance to test on the original cluster. Although it's a very old issue, there is no harm in pinging once more.

For the third one, SPARK-25176, the reporter provided sample code. Although the code is open to us, it's not under the Apache license, so we can write our own test case for type inheritance.

You can find a simple test to reproduce the issue [4].

@wangyum
Member Author

wangyum commented Sep 3, 2018

Sorry @dongjoon-hyun, I could only reproduce one test.
kryo-parametrized-type-inheritance is related to the language. It seems Scala can't reproduce it:

    val ser = new KryoSerializer(new SparkConf).newInstance().asInstanceOf[KryoSerializerInstance]

    class BaseType[R] {}
    class CollectionType(val child: BaseType[_]*) extends BaseType[Boolean] {
      val children: List[BaseType[_]] = child.toList
    }
    class ValueType[R](val v: R) extends BaseType[R] {}

    val value = new CollectionType(new ValueType("hello"))

    ser.serialize(value)

SPARK-23131 may be related to the data. I can't reproduce it:

  import java.io.{ByteArrayOutputStream, ObjectOutputStream}
  import java.util.zip.GZIPOutputStream
  import javax.xml.bind.DatatypeConverter

  def modelToString(model: GeneralizedLinearRegressionModel): (String, String) = {
    val os: ByteArrayOutputStream = new ByteArrayOutputStream()
    val zos = new GZIPOutputStream(os)
    val oo: ObjectOutputStream = new ObjectOutputStream(zos)
    oo.writeObject(model)
    oo.close()
    zos.close()
    os.close()
    (model.uid, DatatypeConverter.printBase64Binary(os.toByteArray))
  }
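For reference, the matching read side of that snippet's gzip+Base64 encoding would look roughly like this (a sketch: a plain java.io.Serializable payload stands in for the GLR model, and java.util.Base64 for DatatypeConverter, since neither the model class nor javax.xml.bind is assumed available here):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipBase64 {
  // Mirror of modelToString's encoding: Java-serialise, gzip, then Base64.
  def toBase64(obj: java.io.Serializable): String = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(new GZIPOutputStream(bos))
    oos.writeObject(obj)
    oos.close() // closing also finishes the gzip stream
    Base64.getEncoder.encodeToString(bos.toByteArray)
  }

  // The matching decode: Base64, gunzip, then Java-deserialise.
  def fromBase64[T](s: String): T = {
    val in = new ObjectInputStream(
      new GZIPInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(s))))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

For example, `GzipBase64.fromBase64[String](GzipBase64.toBase64("model"))` returns `"model"`.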

@SparkQA

SparkQA commented Sep 3, 2018

Test build #95613 has finished for PR 22179 at commit f2fb28d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@wangyum
Member Author

wangyum commented Sep 3, 2018

retest this please

@SparkQA

SparkQA commented Sep 3, 2018

Test build #95623 has finished for PR 22179 at commit f2fb28d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@wangyum
Member Author

wangyum commented Sep 3, 2018

retest this please

@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
assert(!ser2.getAutoReset)
}

test("ClassCastException when writing a Map after previously " +
Member

Since this is a bug-fix test case, could you add the SPARK-25176 prefix to the test name, like SPARK-25176 ClassCastException ...?

@SparkQA

SparkQA commented Sep 4, 2018

Test build #95643 has finished for PR 22179 at commit f2fb28d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@SparkQA

SparkQA commented Sep 4, 2018

Test build #95646 has finished for PR 22179 at commit 0d78113.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

@wangyum

I made a PR to your branch. Could you review and merge it? The contents are the following:

  • Add a new test case for SPARK-23131 (Kryo raises StackOverflow during serializing GLR model)
  • Update your test case for SPARK-25176 (ClassCastException when reading/writing a Map)

After merging the PR, please add [SPARK-25176] after [SPARK-23131] because we have test cases. We can resolve both issues under your name.

For the remaining SPARK-20389 (NegativeArraySizeException), there is no response on JIRA. So, let's simply exclude it from this PR description because we cannot claim that it's resolved.

@wangyum
Member Author

wangyum commented Sep 4, 2018

Thanks, @dongjoon-hyun

@wangyum wangyum changed the title [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2 [SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 Sep 4, 2018
@SparkQA

SparkQA commented Sep 4, 2018

Test build #95673 has finished for PR 22179 at commit d1dac1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Although this will give us a Kryo version different from Hive's and ORC's, the newly added test cases show the benefit clearly. Also, I checked the two new test cases with and without this PR. It looks sufficient to me.
Could you take a look again, @srowen, @cloud-fan and @erikerlandson?

@dongjoon-hyun
Member

And, @wangyum , please add [SPARK-25258] to the PR title, like [SPARK-25258][SPARK-23131][SPARK-25176]. SPARK-25258 is the one you created for this PR.

Also, the PR description should include the following. (We exclude only SPARK-20389.)

- SPARK-25258 Upgrade kryo package to version 4.0.2
- SPARK-23131 Kryo raises StackOverflow during serializing GLR model
- SPARK-25176 Kryo fails to serialize a parametrised type hierarchy

@wangyum wangyum changed the title [SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 [SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 Sep 4, 2018
@cloud-fan
Contributor

Do we have any compatibility issues here? Seems fine to me as we already shaded kryo.

Member

@srowen srowen left a comment

It's not quite that we shade Kryo; rather, we use a shaded artifact called kryo-shaded. ORC does too, but it looks like that can be ruled out as a compatibility issue. I don't see other ways the version change could leak into user-visible contexts, like through code or serialized data formats. I can't see evidence of a compatibility issue, and likewise all tests pass.

I think this is OK to merge, given the upside.

@srowen
Copy link
Member

srowen commented Sep 5, 2018

Merged to master
