-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core | Spark | Integrations] : Fix kryo serialization failure for FileIO #5437
Conversation
b7cc12c
to
3532aa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM.
You might want to put a small comment in the code above the creation of tableSpecificCatalogProperties
and mention that ImmutableMap
isn't used to support Kryo serde as that's the normal style.
Totally up to you though.
e058629
to
95bc117
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right way to fix the problem.
The approach in this PR is to avoid passing ImmutableMap
into initializeFileIO
, but that makes S3FileIO vulnerable to this problem any time the caller doesn't know not to pass that type of map. Instead, S3FileIO
should be responsible for making a copy of the incoming map that can be serialized with Kryo.
aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
Outdated
Show resolved
Hide resolved
95bc117
to
ecedfc7
Compare
This looks like the right fix to me. Thanks, @singhpk234! Could you also add tests for the other FileIO implementations? |
8f40d83
to
159e84a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I would define the version number for com.twitter:chill-java:0.10.0
in versions.props
and have build.gradle
only use the actual artifact (without the version): testImplementation 'com.twitter:chill-java
spark/v3.3/spark/src/test/java/org/apache/iceberg/TestFileIOSerialization.java
Outdated
Show resolved
Hide resolved
1f3d860
to
3aef768
Compare
3aef768
to
6118aa8
Compare
6118aa8
to
58e2082
Compare
spark/v3.3/spark/src/test/java/org/apache/iceberg/TestFileIOSerialization.java
Outdated
Show resolved
Hide resolved
Thanks, @singhpk234! |
Thanks @rdblue @kbendick @nastra @amogh-jahagirdar for your awesome reviews :) !!! |
@singhpk234 I am still facing this issue using |
@akshay-kokate-06 what issue are you seeing exactly? Could you provide the stack trace please? |
About the change
presently spark queries fails when using S3fileIO & GlueCatalog when being used with KryoSerializer ref #5414 (comment). This happens because Immutable map part of S3FileIO properties is not serializable with Kryo and seralizers for it not available in Twitter Chill lib as well (which also spark uses). This PR attemps to fix this by using java collections instead which spark will be able to ser/de as it's serializer is present in Chill lib.
Solves #5414
Testing Done
Manual Test
UT to validate the change fix the issue, without the fix the UT fails with exception mentioned in ticket #5414 (comment)