Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34023][BUILD] Updated to Kryo 5.0.3 #31059

Closed
wants to merge 2 commits into from

Conversation

nicknezis
Copy link

What changes were proposed in this pull request?

Updating Kryo to a newer version. Release information can be found here: https://github.com/EsotericSoftware/kryo/releases

Why are the changes needed?

Kryo 4.0.2 was released in March 2018. This 5.X release has many new features and performance improvements.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn clean test

@dongjoon-hyun
Copy link
Member

ok to test

@dongjoon-hyun
Copy link
Member

Thank you for making a PR, @nicknezis .

@dongjoon-hyun dongjoon-hyun changed the title Updated to Kryo 5.0.3 [SPARK-34023][BUILD] Updated to Kryo 5.0.3 Jan 6, 2021
@dongjoon-hyun
Copy link
Member

I created SPARK-34023 for this.

@dongjoon-hyun
Copy link
Member

cc @srowen

@LuciferYang
Copy link
Contributor

@nicknezis Are there any exciting new features in Kryo 5.x?

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133724 has finished for PR 31059 at commit 206bd63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this actually updates the kryo version used? the pom.xml files were not changed.
This could be fine, but we'll have to see if it breaks anything once Spark actually uses it.

The migration guide is at https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5
For example requiring registration by default may be a breaking change that we have to explicitly disable.

docs/tuning.md Outdated Show resolved Hide resolved
@nicknezis
Copy link
Author

I don't think this actually updates the kryo version used? the pom.xml files were not changed.
This could be fine, but we'll have to see if it breaks anything once Spark actually uses it.

The migration guide is at https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5
For example requiring registration by default may be a breaking change that we have to explicitly disable.

Yes I was a little bit unsure about this. The pom.xml files didn't seem to reference a version. And the previous PR #22179 that updated the version to 4.0.2 seemed to only touch these files. If you can let me know where the version information is actually stored, I can update that and redo any testing and follow-up code edits.

@srowen
Copy link
Member

srowen commented Jan 6, 2021

Yeah I'm pretty sure because it only comes in transitively via chill. Updating it might be a good thing, but, might break chill. Note that chill uses a shaded version of kryo. You can try to override the kryo-shaded dependency in the pom.xml file, but maybe better to see if there is a newer chill release that already uses it?

@nicknezis
Copy link
Author

Ahh OK perfect. And is Chill the only thing that's bringing it in as a transitive dependency? I'll do more research down that path and look into upgrading it. I recently did the upgrade for Apache Heron and it wasn't that big of a change. If it isn't a simple chill version bump, should we close this pull request and keep SPARK-34023 for me to provide updates?

@srowen
Copy link
Member

srowen commented Jan 6, 2021

It looks like it's just chill. Spark doesn't use it directly though depends on its behavior. 0.9.5 looks like the latest chill release though. It may Just Work if you bump the kryo version, and could be worth a shot. If so that could be fine; only downside is now we manually manage the kryo version too, but, maybe that's OK. Yes keep the JIRA open / update the title/description.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38345/

@nicknezis
Copy link
Author

Oh, I think I'm going to try to update Chill itself to support the newer Kryo. Then we can just bump the version of Chill in Spark. I don't think we should add the specific version of Kryo in Spark if it's not needed. I assume those dep files I manually updated are dynamically generated by some process? Is this something I would need to do as part of the process in updating Chill? This would explain why I was mistaken when referencing the older pull request that mentioned updating Kryo to 4.0.2.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38345/

@srowen
Copy link
Member

srowen commented Jan 6, 2021

Yes they're generated by the build process to detect unexpected changes in dependencies. There is a script in dev/ that can update it. But the file itself doesn't do anything beyond that check.

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133757 has finished for PR 31059 at commit 0d96273.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jan 6, 2021

Thanks for working on this, @nicknezis !

Kryo 4.0.2 was released in March 2018. This 5.X release has many new features and performance improvements.

I'd like to see if this change can actually improves performance. If possible, could you run the kryo benchmark?

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/serializer/KryoBenchmark.scala

@nicknezis
Copy link
Author

Thanks for working on this, @nicknezis !

Kryo 4.0.2 was released in March 2018. This 5.X release has many new features and performance improvements.

I'd like to see if this change can actually improves performance. If possible, could you run the kryo benchmark?

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/serializer/KryoBenchmark.scala

Until I get the opportunity to run the Spark benchmark, this Kryo benchmark might give some insight into what the changes might look like.

@nicknezis
Copy link
Author

@srowen Sharing a link to the Google Groups discussion about what approach to take to update to Kyro 5 in Chill + Spark. I would really appreciate your input.

@srowen
Copy link
Member

srowen commented Jan 8, 2021

If updating chill means updating to kryo 5 and that means changing some imports in Spark, that's probably OK. Yes, kryo ends up in the user app namespace and that's a change that could be visible, but I don't think we expect users to rely on kryo directly. Still that might be a point up for discussion.

@srowen srowen closed this Mar 1, 2021
@nicknezis
Copy link
Author

I'm sorry I lost steam on this. I plan to revisit, but got stuck in a morass of updating Chill, Storm and other dependencies. I think I have the Chill dependency updated. Working through the Spark changes now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants