-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Make Metadata tables serializable #2046
Conversation
@marton-bod, @lcspinter: Could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (non-binding)
Just a couple of observations.
We store the TableOperations, Table and name redundantly in each child class of BaseMetadataTable? Is there any reason we cannot move them one level higher?
Should we consider the introduction of a new ancestor class (SerializableBaseTable) for BaseTable and BaseMetadataTable? In my opinion, it would improve the readability of the code.
@rdblue What do you think?
@openinx, @aokolnychyi: After talking with @rdblue, he said you might be interested in reviewing this change. Would you be so kind to review? Thanks, |
@pvary I skimmed this PR, seems I need more background to understand this change. Let me see the previous committed PRs. |
Thanks @openinx for taking the time to check the PR! I would like to give some context - hope this helps: In nutshell what we are trying to archive here to have a way to Serialize/Deserialize not only BaseTable-s, but every MetadataTable as well. Thanks, |
core/src/main/java/org/apache/iceberg/util/SerializationUtil.java
Outdated
Show resolved
Hide resolved
@@ -186,4 +194,34 @@ public Transaction newTransaction() { | |||
public String toString() { | |||
return name(); | |||
} | |||
|
|||
abstract Object writeReplace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This object has access to table()
, io()
, and name()
already. Would it be easier to expose protected metadataLocation()
and metadataTableType()
methods instead of writeReplace()
in each implementation? Then this method could be implemented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed your recommendation. LGTM
The thing is that metadataLocation()
is exactly the same for every implementation, but we need this if we do not want to relax the private
restriction on ops
. We might be further simplify the code if we move ops
to BaseMetadataTable
.
What do you think? Or it would be a bigger change which do no worth to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make ops
a protected method. Let's get this in and we can clean that up later.
dc21ead
to
8088631
Compare
When used from Hive queries it would be useful to serialize the tables at the time of the query compilation for multiple reasons:
The Serialization is implemented for BaseTables in #1920. This PR aims to do the same for the Metadata tables too.
Things which might worth to check:
writeReplace()
method is exactly the same for the specific types for metadata tables. Might worth to consider movingops
,table
,name
to theBaseMetadataTable
. Did not do this because the change I did is not significant compared to the other quasi duplicated code, and there might be other reasons I am not aware of which would prevent this refactor.