
[SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 #22179

Closed
wants to merge 4 commits into from

Conversation

wangyum
Member

@wangyum wangyum commented Aug 22, 2018

What changes were proposed in this pull request?

Upgrade chill to 0.9.3 and Kryo to 4.0.2 to pick up bug fixes and improvements.

The resolved tickets include:

  • SPARK-25258 Upgrade kryo package to version 4.0.2
  • SPARK-23131 Kryo raises StackOverflow during serializing GLR model
  • SPARK-25176 Kryo fails to serialize a parametrised type hierarchy

More details:
https://github.com/twitter/chill/releases/tag/v0.9.3
twitter/chill@cc3910d

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Aug 22, 2018

Test build #95072 has finished for PR 22179 at commit 8c28e07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Aug 22, 2018

cc @srowen

@srowen
Member

srowen commented Aug 22, 2018

That looks like a major version bump -- the usual question here -- what are the key changes we need, what are possible incompatible changes?

@wangyum
Member Author

wangyum commented Aug 22, 2018

Thanks @srowen. SPARK-25176 has a detailed description:

I'm using the latest spark version spark-core_2.11:2.3.1 which
transitively depends on com.esotericsoftware:kryo-shaded:3.0.3 via the
com.twitter:chill_2.11:0.8.0 dependency. This exact version of kryo serializer contains an issue [1,2] which results in throwing ClassCastExceptions when serialising parameterised type hierarchy.
This issue has been fixed in kryo version 4.0.0 [3]. It would be great to have this update in Spark as well. Could you please upgrade the version of com.twitter:chill_2.11 dependency from 0.8.0 up to 0.9.2?
You can find a simple test to reproduce the issue [4].
[1] EsotericSoftware/kryo#384
[2] EsotericSoftware/kryo#377
[3] https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0
[4] https://github.com/mpryahin/kryo-parametrized-type-inheritance
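Not the Kryo reproduction itself, but a self-contained sketch of the kind of parameterised type hierarchy involved (the class names are hypothetical, shaped after the linked reproduction; plain java.io serialization stands in, since Kryo is assumed unavailable here):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical hierarchy: a value type and a collection type
// both extending a parameterised base type.
class BaseType[R] extends Serializable
class ValueType[R](val v: R) extends BaseType[R]
class CollectionType(val children: List[BaseType[_]]) extends BaseType[Boolean]

object HierarchyRoundTrip {
  // Serialise a value through the parameterised base and read it back.
  def roundTrip(value: CollectionType): CollectionType = {
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    try in.readObject().asInstanceOf[CollectionType] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new CollectionType(List(new ValueType("hello"))))
    assert(copy.children.head.asInstanceOf[ValueType[String]].v == "hello")
  }
}
```

With kryo-shaded 3.0.3 on the classpath, the same shape serialised through Kryo is what triggered the reported ClassCastException.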

@srowen
Member

srowen commented Aug 22, 2018

Yes, I think the question is whether anything breaks as well. Kryo 4.x is a breaking change IIRC. Is there a 3.x version that has these fixes?

@srowen
Member

srowen commented Aug 30, 2018

https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0

This has a number of breaking changes. However, Spark actually doesn't use kryo itself, and excludes it. It uses kryo-shaded, via chill. chill otherwise has almost no changes from 0.8 to 0.9 except to accommodate Kryo 4. Therefore the update shouldn't actually force downstream apps to use a different kryo, which is most of the potential problem here.

The changes in serialized format should not be visible outside Spark, if I am not mistaken. It controls serialization within Spark components and this isn't compatible across versions anyway.

I'm inclined to merge this for 2.4 as it resolves a number of issues users are hitting.

@erikerlandson
Contributor

If it doesn't expose any breaking changes to users, that seems reasonable. I assume there would be no integration problems with Scala 2.12?

@srowen
Member

srowen commented Aug 30, 2018

Kryo 3.0.3 and Chill 0.8.2 already work with 2.12, so no particular issue there. I don't see that this change would be visible to users, and tests suggest it doesn't affect Spark. I'm cautious that I'm maybe overlooking something, though, and am weighing the risk against the upside of fixing some user issues and potential performance bumps.

@@ -1770,6 +1770,10 @@
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
</exclusion>
<exclusion>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo-shaded</artifactId>
Contributor

why this change?

Member

CC @wangyum yes good point. ORC also uses kryo-shaded, and uses 3.0.3. In theory that could cause a break, but the tests pass. That's a positive sign but not bulletproof. @dongjoon-hyun do you have any insight into how ORC uses kryo? Is it a code path that wouldn't matter to Spark?

Member

@dongjoon-hyun dongjoon-hyun Aug 30, 2018

Thank you for pinging me, @srowen. ORC uses Kryo only for writing/reading one ORC configuration, orc.kryo.sarg. The following is Spark's indirect code path. I guess Kryo provides forward compatibility at least, but I'll take a look at this PR today.

Member

@srowen In short, the current Spark always uses the same Kryo version for reading/writing SearchArgument, and it's used only at runtime.

  1. Old OrcFileFormat always uses org.spark-project.hive:hive-exec:1.2.1.spark2 which uses the shaded one in hive-exec.

    • com.esotericsoftware.kryo:kryo:2.21.
  2. New OrcFileFormat uses org.apache.orc which uses the one provided by Spark.

    • com.esotericsoftware:kryo-shaded:3.0.3 (All Spark/Orc/Hive uses this version for now)
  3. New OrcFileFormat (in this PR) uses org.apache.orc which uses the one provided by Spark.

    • com.esotericsoftware:kryo-shaded:4.0.2

So, (1) is unchanged by this PR. (2) and (3) also don't use a mixed version of Kryo. So, it should be fine, because Apache Spark doesn't allow mixed Spark versions (master and executor). BTW, during the investigation, there was a performance issue in createFilter. I'll file a new JIRA for that.

Member

I guess my question is whether ORC compiled against kryo-shaded 3.x necessarily works with kryo-shaded 4.x, because there were API and behavior changes. Our current tests don't seem to surface any such problems, but who knows if there's something they don't test.

I agree that our distribution won't include two versions, but the issue I'm wondering about is whether ORC will like the later version that it gets bundled with.

Member

Yes. I checked that, @srowen. org.apache.orc only uses the Kryo constructor, writeObject, and readObject from the kryo-shaded library. There is no change to them.

WRITE

(new Kryo()).writeObject(out, sarg);

READ

... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class);

Member

@srowen srowen Sep 1, 2018

OK, if that's the extent of the usage, then I believe that ORC is OK with Kryo 4. That much is not a problem. Edit: Hm, on second thought, if the serialization format changes from 3 to 4, I wonder if it means the config it writes becomes unreadable? Does this config matter to Spark?

Member

I was also worried about that part, but it's used only for run-time SearchArgument serialization. There is no usage with ORC files.

@erikerlandson
Contributor

For maximum breaking-change forgiveness, the "safe" option would be to punt it to 3.0, which seems likely to be the next release after this.

@dongjoon-hyun
Member

@wangyum . I know this is designed as a build-only PR. However, since it claims bug fixes (SPARK-20389, SPARK-23131, and SPARK-25176), could you add test cases for them in order to verify those issues are really resolved?

@dongjoon-hyun
Member

Ping, @wangyum . Could you add the test cases for SPARK-20389, SPARK-23131, and SPARK-25176?

@wangyum
Member Author

wangyum commented Sep 2, 2018

@dongjoon-hyun I'm trying to add test cases.

@dongjoon-hyun
Member

Thanks, @wangyum !

@srowen
Member

srowen commented Sep 2, 2018

Tests are great, of course; those three might be hard to test. The first and last one don't have info on the reproduction. The second one looks possibly reproducible, though, as it shows the custom serialization code used to write a GeneralizedLinearRegression model.

@dongjoon-hyun
Member

I pinged the first one, SPARK-20389, to find any chance to test on the original cluster. Although it's a very old issue, there is no harm in pinging once more.

For the third one, SPARK-25176, the reporter provided sample code. Although the code is open to us, it's not under the Apache license, so we can write our own test case for type inheritance.

You can find a simple test to reproduce the issue [4].

@wangyum
Member Author

wangyum commented Sep 3, 2018

Sorry @dongjoon-hyun, I could only reproduce one test.
kryo-parametrized-type-inheritance is related to the language. It seems Scala can't reproduce it:

    val ser = new KryoSerializer(new SparkConf).newInstance().asInstanceOf[KryoSerializerInstance]

    class BaseType[R] {}
    class CollectionType(val child: BaseType[_]*) extends BaseType[Boolean] {
      val children: List[BaseType[_]] = child.toList
    }
    class ValueType[R](val v: R) extends BaseType[R] {}

    val value = new CollectionType(new ValueType("hello"))

    ser.serialize(value)

SPARK-23131 may be related to the data. I can't reproduce it:

  import java.io.{ByteArrayOutputStream, ObjectOutputStream}
  import java.util.zip.GZIPOutputStream
  import javax.xml.bind.DatatypeConverter

  def modelToString(model: GeneralizedLinearRegressionModel): (String, String) = {
    val os: ByteArrayOutputStream = new ByteArrayOutputStream()
    val zos = new GZIPOutputStream(os)
    val oo: ObjectOutputStream = new ObjectOutputStream(zos)
    oo.writeObject(model)
    oo.close()
    zos.close()
    os.close()
    (model.uid, DatatypeConverter.printBase64Binary(os.toByteArray))
  }
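For reference, the matching read side of that snippet's gzip+Base64 encoding would look roughly like this (a sketch: a plain java.io.Serializable payload stands in for the GLR model, and java.util.Base64 for DatatypeConverter, since neither the model class nor javax.xml.bind is assumed available here):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipBase64 {
  // Mirror of modelToString's encoding: Java-serialise, gzip, then Base64.
  def toBase64(obj: java.io.Serializable): String = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(new GZIPOutputStream(bos))
    oos.writeObject(obj)
    oos.close() // closing also finishes the gzip stream
    Base64.getEncoder.encodeToString(bos.toByteArray)
  }

  // The matching decode: Base64, gunzip, then Java-deserialise.
  def fromBase64[T](s: String): T = {
    val in = new ObjectInputStream(
      new GZIPInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(s))))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

For example, `GzipBase64.fromBase64[String](GzipBase64.toBase64("model"))` returns `"model"`.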

@SparkQA

SparkQA commented Sep 3, 2018

Test build #95613 has finished for PR 22179 at commit f2fb28d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@wangyum
Member Author

wangyum commented Sep 3, 2018

retest this please

@SparkQA

SparkQA commented Sep 3, 2018

Test build #95623 has finished for PR 22179 at commit f2fb28d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@wangyum
Member Author

wangyum commented Sep 3, 2018

retest this please

@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
assert(!ser2.getAutoReset)
}

test("ClassCastException when writing a Map after previously " +
Member

Since this is a bug-fix test case, could you add the SPARK-25176 prefix to the test name, like SPARK-25176 ClassCastException ...?

@SparkQA

SparkQA commented Sep 4, 2018

Test build #95643 has finished for PR 22179 at commit f2fb28d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapHolder

@SparkQA

SparkQA commented Sep 4, 2018

Test build #95646 has finished for PR 22179 at commit 0d78113.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

@wangyum

I made a PR to your branch. Could you review and merge it? The contents are the following:

  • Add a new test case for SPARK-23131 (Kryo raises StackOverflow during serializing GLR model)
  • Update your test case for SPARK-25176 (ClassCastException when reading/writing a Map)

After merging the PR, please add [SPARK-25176] after [SPARK-23131] because we have test cases. We can resolve both issues under your name.

For the remaining SPARK-20389 (NegativeArraySizeException), there is no response on JIRA. So, let's simply exclude it from this PR description because we cannot claim that it's resolved.

@wangyum
Member Author

wangyum commented Sep 4, 2018

Thanks, @dongjoon-hyun

@wangyum wangyum changed the title [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2 [SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 Sep 4, 2018
@SparkQA

SparkQA commented Sep 4, 2018

Test build #95673 has finished for PR 22179 at commit d1dac1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Although this will give us a Kryo version different from Hive's and ORC's, the newly added test cases show the benefit clearly. Also, I checked the two new test cases with and without this PR. It looks sufficient to me.
Could you take a look again, @srowen, @cloud-fan and @erikerlandson?

@dongjoon-hyun
Member

And, @wangyum , please add [SPARK-25258] to the PR title, like [SPARK-25258][SPARK-23131][SPARK-25176]. SPARK-25258 is the one you created for this PR.

Also, the PR description should include the following. (We exclude only SPARK-20389.)

- SPARK-25258 Upgrade kryo package to version 4.0.2
- SPARK-23131 Kryo raises StackOverflow during serializing GLR model
- SPARK-25176 Kryo fails to serialize a parametrised type hierarchy

@wangyum wangyum changed the title [SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 [SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade Kryo to 4.0.2 Sep 4, 2018
@cloud-fan
Contributor

Do we have any compatibility issues here? Seems fine to me as we already shaded kryo.

Member

@srowen srowen left a comment

It's not quite that we shade Kryo; rather, we use a shaded artifact called kryo-shaded. ORC does too, but it looks like that can be ruled out as a compatibility issue. I don't see other ways the version change could leak into user-visible contexts, like through code or serialized data formats. I can't see evidence of a compatibility issue, and likewise all tests pass.

I think this is OK to merge, given the upside.

@srowen
Copy link
Member

srowen commented Sep 5, 2018

Merged to master
