-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix when reading struct-type data without an id in iceberg-parquet #11378
base: main
Are you sure you want to change the base?
Conversation
@nastra Can you help review this PR? Thanks a lot. |
@joyCurry30 I'll take a look early next week but this should at least have a test that reproduces the issue you're trying to fix |
Ok, I will add a test for this issue. |
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
@joyCurry30 please ping me once you added the test and I'll take a look |
73776cc
to
3bc6636
Compare
@nastra Sorry for replying so late. I've added the related test cases. Please review the code for me. If you have any suggestions, I would be happy to make the changes. |
…t returns null values.
d927204
to
2108f6e
Compare
@Fokko could you review this one please? |
@@ -49,7 +49,7 @@ class ParquetWriter<T> implements FileAppender<T>, Closeable { | |||
private final long targetRowGroupSize; | |||
private final Map<String, String> metadata; | |||
private final ParquetProperties props; | |||
private final CodecFactory.BytesCompressor compressor; | |||
private final CodecFactory.BytesInputCompressor compressor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be an unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the issue, this will enable reading files that have the field-id missing for the StructType
. However, Field-IDs allow for safe evolution of an Iceberg table. With this patch, you will be able to read the table, but once you rename the column-name of the struct, it will break and brick the table again.
if (field.getId() != null) { | ||
return field.getId().intValue(); | ||
} | ||
return expected.field(field.getName()).fieldId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially falling back to name-based schema resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field-ids are null, how can I find the expected field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also left a comment on the issue itself #11214 (comment). I don't think this is a good solution since it hides the problem, as mentioned before, if you rename a struct field, it will break the table. Instead, I would leverage name mapping as suggested on the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments, I will reconsider and try to re-address the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any time, thank you for the PR, but I'm afraid that it will cause issues downstream
Fix issue: #11214
Fix when reading struct-type data without an id in iceberg-parquet, it returns null values.