Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Common metadata for TableMetadata and ViewMetadata #9682

Closed
wants to merge 1 commit into from

Conversation

nk1506
Copy link
Contributor

@nk1506 nk1506 commented Feb 8, 2024

Ref Issue #9514

TableMetadata and ViewMetadata have some common methods , Which have been abstracted out to IcebergMetadata now. It should help to have common code for Iceberg Table and View implementations with different catalog.

@github-actions github-actions bot added the core label Feb 8, 2024
@nk1506
Copy link
Contributor Author

nk1506 commented Feb 8, 2024

cc: @szehon-ho , @nastra , @pvary

@nk1506 nk1506 changed the title Common metadata for TableMetadata and ViewMetadata Core: Common metadata for TableMetadata and ViewMetadata Feb 8, 2024
@nastra
Copy link
Contributor

nastra commented Feb 8, 2024

Sorry for the delay. I have some feedback and I'll take a closer look at this (and the Hive + View support PR) once 1.5.0 is out

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks better. Left some comment.

For reference, we already had some existing discussion on this PR on : nk1506#9


@Value.Parameter(order = 1002)
int formatVersion();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, can we put Override and Value.parameter here, and remove it from TableMetadata? It will be more clear.

Copy link
Contributor Author

@nk1506 nk1506 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add more context? i couldn't understand this part. Do you mean something like all the Members of BaseMetadata should be with ViewMetadata too with Override? In that case I dont think we need
@Value.Parameter(order=?)

Copy link
Collaborator

@szehon-ho szehon-ho Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea exactly, is it possible to remove @Value.Parameter in the BaseMetadata, and keep the annotations all in the ViewMetadata.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nk1506 let me know if the question makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho , Since we are planning to keep common members with ViewMetadata too with @Override. Do we have a need of keeping @Value.Parameter in ViewMetadata. I think only override should works.
WDYT ? I have updated the patch with only @Override and removed @Value.Parameter from everywhere.


/**
* A base class for {@link TableMetadata} and {@link org.apache.iceberg.view.ViewMetadata} It
* abstracts out all the common metadata for table ane view.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo for ane. I would actually may be renmove the first sentence, it looks like it does not add that much more info than the first sentence. So just

A base class for {@link TableMetadata} and {@link org.apache.iceberg.view.ViewMetadata}.

* A base class for {@link TableMetadata} and {@link org.apache.iceberg.view.ViewMetadata} It
* abstracts out all the common metadata for table ane view.
*/
public interface BaseMetadata extends Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question. Is this refactoring enough?

If we Introduce BaseMetadata, I expect the TableOperations and ViewOperations to have a base class called BaseOperations which uses this BaseMetadata for interfaces like
void commit(BaseMetadata base, BaseMetadata metadata);

Without that I don't see much value in this refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be we can have an end to end refactoring DRAFT PR and take a decision based on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, But i think those changes we can do later once this interface pattern gets approved. Or may be with this patch itself.
WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can have a separate draft patch. I think doing that refactoring can help us know whether the current interfaces in BaseMetadata is enough or do we need more changes here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think the main refactor is over at #8907 (including baseOperations and everything). I do agree that it is a bit hard to read, I have to search over there to find what is needed and not, and so may miss something :).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually sorry the refactor of the Operations is split in #9461, #8907 is the whole patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is draft PR . with #9461 and #9682 .

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me as a refactor. I wonder if we need more input from @ajantha-bhat or @nastra , else Ill merge it in a day or two. It does need https://github.com/nk1506/iceberg/pull/10/files for context.


Map<String, String> properties();

@Nullable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly prefer not to have it, as most Iceberg classes don't have it. But understand its from the old class.

@nastra
Copy link
Contributor

nastra commented Feb 14, 2024

let's hold off on this one until 1.5.0 is out and the Hive view PR is reviewed so that we can better decide in which direction we'd want to go

@szehon-ho
Copy link
Collaborator

Can we please continue this effort , it has been stalled 2 weeks?

Let's have a PR that is the end to end version (please not on your own github, but against Iceberg github).

@nk1506
Copy link
Contributor Author

nk1506 commented Feb 29, 2024

Thanks @szehon-ho for looking into this again. I will raise another combined PR with CommonMetadata, CommonHiveOperation and Hive-View

@nk1506
Copy link
Contributor Author

nk1506 commented Mar 2, 2024

Can we please continue this effort , it has been stalled 2 weeks?

Let's have a PR that is the end to end version (please not on your own github, but against Iceberg github).

Hi @szehon-ho , I have created a new PR here #9852 .

@nastra
Copy link
Contributor

nastra commented Mar 7, 2024

I reviewed #9852 and also took a look at the metadata abstraction but I'm not convinced it actually makes stuff easier. Most of the time certain Hive methods only require 1-2 params (properties, schema, ...) from table/view metadata, so passing those directly makes more sense.

Additionally, this abstraction can actually hide errors, where we check for certain table properties by providing BaseMetadata. These properties don't exist for views, so what will happen if we check TableProperties.ENGINE_HIVE_ENABLED on a table and on view?

@szehon-ho
Copy link
Collaborator

szehon-ho commented Mar 7, 2024

Thanks, i commented on the other thread in #9852 (comment) the original reason (saw that one first)

Can you clarify this concern?

Additionally, this abstraction can actually hide errors, where we check for certain table properties by providing BaseMetadata. These properties don't exist for views, so what will happen if we check TableProperties.ENGINE_HIVE_ENABLED on a table and on view?

This is just a class hiearchy. The view now gets methods like String property(String property, String defaultValue), which is just shorthand for viewMetadata.getProperties().getOrDefault(property, defaultValue).

I believe its not strictly necessary that these methods to be added here (it seems just for convenience), all we really need to be able to share View/TableCatalog code is properties().

@nk1506
Copy link
Contributor Author

nk1506 commented Apr 4, 2024

Common code abstraction on Hive-Table/View has been parked for later. BaseMetadata for Table and View might not be required. We can fulfil the same using different strategy.

@nk1506 nk1506 closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants