-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Use Immutables for TableMetadata #3580
Conversation
dc4d48c
to
5124f87
Compare
@@ -63,8 +63,8 @@ subprojects { | |||
pluginManager.withPlugin('com.palantir.baseline-error-prone') { | |||
tasks.withType(JavaCompile).configureEach { | |||
options.errorprone.errorproneArgs.addAll ( | |||
// error-prone is slow, don't run on tests and generated src | |||
'-XepExcludedPaths:.*/(test|generated-src)/.*', | |||
// error-prone is slow, don't run on tests/generated-src/generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so that generated immutable classes aren't checked by errorprone
e14a4fb
to
ecbe310
Compare
this.sortOrdersById = indexSortOrders(sortOrders); | ||
@Value.Check | ||
protected void check() { | ||
Preconditions.checkArgument(specs() != null && !specs().isEmpty(), "Partition specs cannot be null or empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left these checks untouched, but these fields cannot be null, since they don't have a Nullable
annotation and the Immutables lib will throw an error if a these fields receive a null
value
9913dea
to
ecbe310
Compare
I don't think that this is a good alternative to the other builder. The things we want to fix with the other builder are to make the builder smarter. It doesn't really help just to have a builder, we want to put more logic in there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for my own understanding around Immutables
and your experience with them:
Are there any commonly followed practices for checking in (or not checking in) generated code? Is that something people usually vendor or not in your experience?
And what about for things that get stored long term for state? If the object definition changes, will the immutable library give us the ability to support / migrate definitions when users read things that might have been built from earlier versions of the objects? Have you experienced this to be an issue in practice?
Thanks for demonstrating this alternative approach. I don't like having to write a lot of boilerplate, but I do also have some concerns about maintaining serdes for persisted objects that are serialized / deserialized with immutables
(or any other annotation based serde generation).
I admittedly don't have a strong opinion in this area as I come from Scala where things like this are very standard (generated code, heavy usage of implicits, etc), though people choose to use them to varying degrees. I know it has definitely bitten me in the past when somebody changed a class in a way that wasn't fully compatible and codegen somewhat hid that (the need to provide a schema upgrade path for persisted objects).
Looking at the docs, it looks like we can write resolvers for classes to control how things are read via derived attribute, which is like a one time setter method (for having logic to compute a field). There seems to be a lot of things a builder would need. Using a library like this would be nice. But would this solve the overriding concerns over there? If we used it, I'd maybe prefer to see it start being used in newer places or with less critical things. Again, in Scala case classes and then associated objects with apply and unapply methods are more my background. TableMetadata is a really core part of the API and it has a builder. Though I wold love to use a library like this. It would save a lot of time. |
The generated code is not checked in. All you have is your class definition that have the Immutables-specific annotations.
This isn't usually a big concern. When something like this evolves, one can switch from the generated JSON serializers to custom serializers and mark fields as optional in order to have greater flexibility in what & how things are being serialized/deserialized. This for example uses auto-generated JSON serdes, but you always have the option to provide your own serializer and handle object evolution there. Here and here is an example for using a custom serializer with Immutables. |
I'm not exactly sure I understand what you mean by writing resolvers. Are you referring to https://immutables.github.io/immutable.html#derived-attributes?
I'm perfectly fine with using Immutables at a less critical place. I rather wanted to show an alternative approach that doesn't require manually writing Builders. |
@rdblue could you be more specific in what you'd like to see in such a Builder? Then we could check and see whether Immutable builders would support that. The Immutables lib provides more than just generated builders, but if the generated builders don't do what we want them to do, then there are also alternative ways of dealing with this (constructing objects via constructors and not builders, ...). Looking at https://github.com/apache/iceberg/pull/2957/files#diff-c55bb00afe1e6529dc13f2421f18bbb557de1a8111573400d4d3d22df573a9b9R28 I see a straight-forward builder implementation that just sets things. The downside I see with this approach is that you also have to write tests for that builder, since you'd like to make sure that all your fields are properly set and such. I've seen examples in the past where hand-made Builders didn't have unit tests and had bugs in them, so to me it seems it's better to have builders be generated. I also understand that changing something like |
I quite like immutables and I think its very flexible. Especially for serde stuff. This would simplify a lot of our data objects and would introduce a bit of certainty/good practices into them w/ the immutability guarantees. However, it probably makes sense to start w/ a less critical object than |
ecbe310
to
afa1cca
Compare
The motivation for this change is * `TableMetadata` is truly immutable * we get a Builder for free, which allows easy creation of new `TableMetadata` instances
c7323f9
to
b8bdd5b
Compare
Closing in favor of #3664 |
Motivation
This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing.
This is an alternative implementation to #2957 and makes
TableMetadata
truly immutable.