Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Use Immutables for TableMetadata #3580

Closed
wants to merge 1 commit into from

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Nov 19, 2021

Motivation

  • Use true immutable objects that are type-safe, thread-safe, null-safe
  • Get builder classes for free

This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing.

  • Immutable objects are serialization ready (including JSON and its binary forms)
  • Supports lazy, derived and optional attributes
  • Immutable objects are constructed once, in a consistent state, and can be safely shared
    • Will fail if mandatory attributes are missing
    • Cannot be sneakily modified when passed to other code
  • Immutable objects are naturally thread-safe and can therefore be safely shared among threads
    • No excessive copying
    • No excessive synchronization
  • Object definitions are pleasant to write and read
    • No boilerplate setter and getters
    • No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.

This is an alternative implementation to #2957 and makes TableMetadata truly immutable.

@nastra nastra marked this pull request as draft November 19, 2021 10:13
@nastra nastra force-pushed the tablemetadatabuilder branch 2 times, most recently from dc4d48c to 5124f87 Compare November 19, 2021 10:54
@@ -63,8 +63,8 @@ subprojects {
pluginManager.withPlugin('com.palantir.baseline-error-prone') {
tasks.withType(JavaCompile).configureEach {
options.errorprone.errorproneArgs.addAll (
// error-prone is slow, don't run on tests and generated src
'-XepExcludedPaths:.*/(test|generated-src)/.*',
// error-prone is slow, don't run on tests/generated-src/generated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so that generated immutable classes aren't checked by errorprone

@nastra nastra force-pushed the tablemetadatabuilder branch 2 times, most recently from e14a4fb to ecbe310 Compare November 19, 2021 13:43
this.sortOrdersById = indexSortOrders(sortOrders);
@Value.Check
protected void check() {
Preconditions.checkArgument(specs() != null && !specs().isEmpty(), "Partition specs cannot be null or empty");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left these checks untouched, but these fields cannot be null, since they don't have a Nullable annotation and the Immutables lib will throw an error if a these fields receive a null value

@nastra nastra force-pushed the tablemetadatabuilder branch from 9913dea to ecbe310 Compare November 19, 2021 14:36
@nastra nastra marked this pull request as ready for review November 19, 2021 15:18
@rdblue
Copy link
Contributor

rdblue commented Nov 19, 2021

I don't think that this is a good alternative to the other builder. The things we want to fix with the other builder are to make the builder smarter. It doesn't really help just to have a builder, we want to put more logic in there.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for my own understanding around Immutables and your experience with them:

Are there any commonly followed practices for checking in (or not checking in) generated code? Is that something people usually vendor or not in your experience?

And what about for things that get stored long term for state? If the object definition changes, will the immutable library give us the ability to support / migrate definitions when users read things that might have been built from earlier versions of the objects? Have you experienced this to be an issue in practice?

Thanks for demonstrating this alternative approach. I don't like having to write a lot of boilerplate, but I do also have some concerns about maintaining serdes for persisted objects that are serialized / deserialized with immutables (or any other annotation based serde generation).

I admittedly don't have a strong opinion in this area as I come from Scala where things like this are very standard (generated code, heavy usage of implicits, etc), though people choose to use them to varying degrees. I know it has definitely bitten me in the past when somebody changed a class in a way that wasn't fully compatible and codegen somewhat hid that (the need to provide a schema upgrade path for persisted objects).

@kbendick
Copy link
Contributor

kbendick commented Nov 21, 2021

Looking at the docs, it looks like we can write resolvers for classes to control how things are read via derived attribute, which is like a one time setter method (for having logic to compute a field). There seems to be a lot of things a builder would need. Using a library like this would be nice. But would this solve the overriding concerns over there?

If we used it, I'd maybe prefer to see it start being used in newer places or with less critical things. Again, in Scala case classes and then associated objects with apply and unapply methods are more my background.

TableMetadata is a really core part of the API and it has a builder. Though I wold love to use a library like this. It would save a lot of time.

@nastra
Copy link
Contributor Author

nastra commented Nov 22, 2021

Question for my own understanding around Immutables and your experience with them:

Are there any commonly followed practices for checking in (or not checking in) generated code? Is that something people usually vendor or not in your experience?

The generated code is not checked in. All you have is your class definition that have the Immutables-specific annotations.

And what about for things that get stored long term for state? If the object definition changes, will the immutable library give us the ability to support / migrate definitions when users read things that might have been built from earlier versions of the objects? Have you experienced this to be an issue in practice?

This isn't usually a big concern. When something like this evolves, one can switch from the generated JSON serializers to custom serializers and mark fields as optional in order to have greater flexibility in what & how things are being serialized/deserialized. This for example uses auto-generated JSON serdes, but you always have the option to provide your own serializer and handle object evolution there. Here and here is an example for using a custom serializer with Immutables.

@nastra
Copy link
Contributor Author

nastra commented Nov 22, 2021

Looking at the docs, it looks like we can write resolvers for classes to control how things are read via derived attribute, which is like a one time setter method (for having logic to compute a field). There seems to be a lot of things a builder would need. Using a library like this would be nice. But would this solve the overriding concerns over there?

I'm not exactly sure I understand what you mean by writing resolvers. Are you referring to https://immutables.github.io/immutable.html#derived-attributes?

If we used it, I'd maybe prefer to see it start being used in newer places or with less critical things. Again, in Scala case classes and then associated objects with apply and unapply methods are more my background.

I'm perfectly fine with using Immutables at a less critical place. I rather wanted to show an alternative approach that doesn't require manually writing Builders.

@nastra
Copy link
Contributor Author

nastra commented Nov 22, 2021

I don't think that this is a good alternative to the other builder. The things we want to fix with the other builder are to make the builder smarter. It doesn't really help just to have a builder, we want to put more logic in there.

@rdblue could you be more specific in what you'd like to see in such a Builder? Then we could check and see whether Immutable builders would support that. The Immutables lib provides more than just generated builders, but if the generated builders don't do what we want them to do, then there are also alternative ways of dealing with this (constructing objects via constructors and not builders, ...).

Looking at https://github.com/apache/iceberg/pull/2957/files#diff-c55bb00afe1e6529dc13f2421f18bbb557de1a8111573400d4d3d22df573a9b9R28 I see a straight-forward builder implementation that just sets things. The downside I see with this approach is that you also have to write tests for that builder, since you'd like to make sure that all your fields are properly set and such. I've seen examples in the past where hand-made Builders didn't have unit tests and had bugs in them, so to me it seems it's better to have builders be generated.

I also understand that changing something like TableMetadata to using Immutables is kind of a critical place with a certain level of uncertainty. I'm perfectly fine with starting at other places in the code if you think the Immutables lib is something that's useful.
My personal opinion on using Immutables is that it forces you to adhere to good coding practices, since you're working with Immutable data structures, which brings its own (positive) implications. Additionally, it saves you from writing manual and repetitive code (builders, serializers/deserializers, ...).

@rymurr
Copy link
Contributor

rymurr commented Nov 26, 2021

I quite like immutables and I think its very flexible. Especially for serde stuff. This would simplify a lot of our data objects and would introduce a bit of certainty/good practices into them w/ the immutability guarantees. However, it probably makes sense to start w/ a less critical object than TableMetadata as a proof of concept.

@nastra nastra force-pushed the tablemetadatabuilder branch from ecbe310 to afa1cca Compare December 2, 2021 08:53
The motivation for this change is
* `TableMetadata` is truly immutable
* we get a Builder for free, which allows easy creation of new `TableMetadata` instances
@nastra nastra force-pushed the tablemetadatabuilder branch from c7323f9 to b8bdd5b Compare December 3, 2021 08:15
@nastra
Copy link
Contributor Author

nastra commented Dec 7, 2021

Closing in favor of #3664

@nastra nastra closed this Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants