Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to use a NestedField builder and remove nested defaults. #1

Merged
merged 4 commits into from
Oct 1, 2024

Conversation

rdblue
Copy link

@rdblue rdblue commented Sep 3, 2024

While reviewing apache#9502, I accumulated some changes:

  • Added a builder instead of the new methods to create NestedField. I think this is a bit cleaner since we are now at the point where the combinations (with/without docs, with/without default, etc.) are going to get much larger.
  • Try to validate the default values when creating NestedField. To do this, I create a Literal and convert to the field's type using to. This gives behavior for interpreting default values that is identical to the expressions library. The drawback is that it doesn't support nested types.
  • To make the validation above work, I removed test cases for nested types. That allowed also removing the code for Iceberg internal to Avro generics conversion.
  • Converted the tests to JUnit5

return new Builder(true, name);
}

public static class Builder {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wmoustafa, I'm interested to get your feedback on the builder vs new factory methods.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ryan. Left comment in the main conversation.

private static Object castDefault(Object defaultValue, Type type) {
if (type.isNestedType() && defaultValue != null) {
throw new IllegalArgumentException(
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we don't support non-primitive literals so this removes support for nested types. We can add that back but I think this is a good start.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we support non-primitive types, this would be heavy-weight. Do we want to do such a heavy-weight casting in the constructor? This was the main reason I did not include it in the original PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The reason not to extend this to non-primitives right now is that we will need to do even more heavy-weight work for nested types

Comment on lines +502 to +506
public Builder withDefault(Object defaultValue) {
this.initialDefault = defaultValue;
this.writeDefault = defaultValue;
return this;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would skip this method since semantics of initial and write default are too different. Further, initial cannot change but write can. Also, we may get unexpected results of chaining this with the other two methods.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about these issues originally, but ended up not adding validations to mirror the behavior of the original version that used factory methods. I think it's fine to have a method that sets both as a convenience.

The question is whether to enforce default value behavior here or elsewhere. I think the validation should be done elsewhere because:

  1. As I noted above, the builder is doing the same thing as the factory method -- there was no validation before
  2. There isn't enough information to validate here. This could only allow setting the initialDefault once, but what does that accomplish? It only limits the builder API by assuming what the caller is doing when constructing a field. It would be perfectly valid to use this builder to copy another field (which sets initialDefault) and then change the initial default to create a new field with a different one. Schema validation is not the builder's responsibility.

Comment on lines +477 to +479
public Builder withId(int id) {
this.id = id;
return this;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already withFieldId(). How about deprecating/removing it before it spreads?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of the public API. We can deprecate it, but we can't remove it until v2.

Copy link
Owner

@wmoustafa wmoustafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rdblue for the suggestions. Overall I am fine with adding a Builder. I also thought that the NestedField constructor started to become bloated, but also did not feel strongly about exposing both constructor and builder.

My first preference is to continue consistently with the existing API (e.g., NestedField.of) and revise the API in a major version, but I am okay with the current refactor in this PR as well. I see value in both.

I left a few comments around Literals and non-primitives. It is okay to postpone non-primitives as long as we are clear on the path forward.

private static Object castDefault(Object defaultValue, Type type) {
if (type.isNestedType() && defaultValue != null) {
throw new IllegalArgumentException(
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we support non-primitive types, this would be heavy-weight. Do we want to do such a heavy-weight casting in the constructor? This was the main reason I did not include it in the original PR.

* @return a Literal for the given value
* @throws IllegalArgumentException if the value has no literal implementation
*/
public static <T> Literal<T> lit(T value) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite clear on the need for literals. Literals are part of expressions but not sure if default values are first-class expressions. I see we could use them in expressions, but as part of the NestedField API, I do not see a need to bring in expressions yet.

Also note that the input value comes from SingleValueParser which returns Iceberg data. I am guessing the input to Literals is weakly defined now (since it is constrained to primitives) and it is not clear if it will be Iceberg data or Java data, etc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this is to support the same set of values as expressions do.

Adding the default value methods to the API creates a weird situation in which we need to coerce values from Object to a value of the appropriate type and internal representation. Rather than create completely different code to do that, I'm reusing what we already support for expressions by creating a literal (which handles coercing from Object) and then converting it to a specific type using to (which handles conversion to the right Iceberg representation).

I think it's important to have consistency here.

throw new IllegalArgumentException(
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue));
} else if (defaultValue != null) {
return Expressions.lit(defaultValue).to(type).value();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the description you mentioned that literals help with validations. Cannot we validated without bringing in literals?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should support the same values and coercion that we have implemented for literals, so I think this is the right way forward.

@wmoustafa
Copy link
Owner

wmoustafa commented Oct 1, 2024

Thanks @rdblue for the PR. Let us follow up with nested types in a separate PR. For now I will merge this PR to my branch and apply minor fixes to the Builder API as discussed in the comments.

@wmoustafa wmoustafa merged commit 58957e7 into wmoustafa:default-value Oct 1, 2024
44 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants