Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] API/Core: View support #4657

Closed
wants to merge 2 commits into from
Closed

[WIP] API/Core: View support #4657

wants to merge 2 commits into from

Conversation

jzhuge
Copy link
Member

@jzhuge jzhuge commented Apr 28, 2022

Support view.

The view spec has been added in #3188.

TODO:

  • Fix schema handling
  • Add unit tests

Follow-ups:

  • Add transactions for replace or createOrReplace
  • Update field comments
  • Support Spark view catalog

@jzhuge jzhuge changed the title [WIP] View support [WIP] API/Core: View support Apr 28, 2022
@melin
Copy link

melin commented Apr 28, 2022

PR support spark sql?

@jzhuge
Copy link
Member Author

jzhuge commented Apr 28, 2022

Yes, will follow up with PR to support Spark ViewCatalog once the commit for ViewCatalog interfaces in apache/spark#35636 is merged.

@nastra nastra self-requested a review April 29, 2022 07:20
@jzhuge jzhuge force-pushed the view-support branch 4 times, most recently from 343c899 to eda62c2 Compare May 6, 2022 19:31
jzhuge added 2 commits May 11, 2022 16:08
Co-authored-by: anjalinorwood@gmail.com
Co-authored-by: anjalinorwood@gmail.com
* @return this catalog's name
*/
default String name() {
return toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be default, otherwise it will show the Java object string by default that is unique for each instance.

* @return this catalog's name
*/
default String name() {
return toString();
Copy link
Contributor

@jackye1995 jackye1995 May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be default, otherwise it will show the Java object string by default that is unique for each instance. We can just leave this to implementation

* @return instance of {@link View} implementation referred by {@code tableIdentifier}
* @throws NoSuchViewException if the view does not exist
*/
View loadView(TableIdentifier identifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thould throws NoSuchViewException be a part of API declaration?

* @return instance of {@link View} implementation referred by {@code tableIdentifier}
* @throws NoSuchViewException if the view does not exist
*/
View loadView(TableIdentifier identifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thould throws NoSuchViewException be a part of API declaration?

*/
View createView(
TableIdentifier identifier,
List<ViewRepresentation> representations,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a set?

*/
View createView(
TableIdentifier identifier,
List<ViewRepresentation> representations,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a set?

}

/**
* Drop a view and delete all data and metadata files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought view is only for logical view. Are we going to make this also materialized view?

}

/**
* Drop a view and delete all data and metadata files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought view is only for logical view. Are we going to make this also materialized view?

void renameView(TableIdentifier from, TableIdentifier to);

/**
* Invalidate cached view metadata from current catalog.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unclear to me why this is a part of the API, it feels like an engine-side cache, how is data actually cached?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this

void renameView(TableIdentifier from, TableIdentifier to);

/**
* Invalidate cached view metadata from current catalog.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unclear to me why this is a part of the API, it feels like an engine-side cache, how is data actually cached?

* @param definition a view definition
* @param properties a string map of view properties
*/
default View createView(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth using a builder style like createTable?

* @param definition a view definition
* @param properties a string map of view properties
*/
default View createView(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth using a builder style like createTable?

import java.util.List;
import org.apache.iceberg.Schema;

public interface ViewDefinition extends ViewRepresentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want this extension? I thought we just want different view representation implementations

import java.util.List;
import org.apache.iceberg.Schema;

public interface ViewDefinition extends ViewRepresentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want this extension? I thought we just want different view representation implementations

* When committing, these changes will be applied to the current view metadata.
* Commit conflicts will be resolved by applying the pending changes to the new view metadata.
*/
public interface ViewUpdateProperties extends PendingUpdate<Map<String, String>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateViewProperties?

* When committing, these changes will be applied to the current view metadata.
* Commit conflicts will be resolved by applying the pending changes to the new view metadata.
*/
public interface ViewUpdateProperties extends PendingUpdate<Map<String, String>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateViewProperties?

* @param key a String key
* @param value a String value
* @return this for method chaining
* @throws NullPointerException If either the key or value is null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the standard is generally IllegalArgumentException if either the key or value is null.

*
* @return a long ID for this version's parent, or null if it has no parent
*/
Integer parentId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Long?

*
* @return a long ID
*/
int versionId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a long?

}

public Builder sql(String value) {
sql = value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud: If I'm not mistaken the spec basically lets the SQL be free form and with a dialect, an engine can do whatever parsing/analysis it needs to translate to its desires. However, certain aspects such as a view cannot reference itself (either directly or indirectly) seem universal across engines and it seems like we would want some layer of analysis within Iceberg to prevent that. In other words some semantic analysis should be baked into the spec.

But I haven't fully thought this through, maybe there's something I'm missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind on this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up later.

@nastra nastra mentioned this pull request Jun 1, 2022
Copy link

github-actions bot commented Aug 8, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 8, 2024
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants