Core: Fix non-setting row-lineage from table properties on initial table creation #12307
@RussellSpitzer I believe you're working on the row-lineage feature. When you have a chance, could you check this issue and review the change?
This was intentional. We are avoiding writing the property so that Table Metadata Versions that do not include row-lineage at all will not populate the fields.
Ah wait, I understand this more from your code than from this description. You are just saying that you can't enable row-lineage during a Create statement. That sounds like it's fine to fix.
@@ -146,6 +151,7 @@ static TableMetadata newTableMetadata(
        .setDefaultSortOrder(freshSortOrder)
        .setLocation(location)
        .setProperties(properties)
        .setRowLineage(rowLineage)
We don't want to set RowLineage to false if it is unset. So we have 2 options here:
- Set it to false but only if the table is V3
- Set it to true only if it is true and leave it absent otherwise.
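The second option amounts to treating the property as a tri-state value: true, false, or absent. A minimal sketch of that lookup in plain Java follows. The class and method names are hypothetical and are not Iceberg's API; only the property key `row-lineage` comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Iceberg.
final class RowLineageProperty {
  // Property key taken from the PR discussion.
  static final String ROW_LINEAGE = "row-lineage";

  private RowLineageProperty() {}

  // Returns null when the property is absent, so the caller can leave the
  // metadata field unset instead of writing an explicit false.
  static Boolean fromProperties(Map<String, String> properties) {
    String value = properties.get(ROW_LINEAGE);
    if (value == null) {
      return null;
    }
    return Boolean.valueOf(value);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(fromProperties(props)); // property absent
    props.put(ROW_LINEAGE, "true");
    System.out.println(fromProperties(props)); // property present
  }
}
```

Returning a boxed `Boolean` rather than a primitive keeps "absent" distinguishable from "false", which is the distinction the review comment cares about.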
Sure, thanks for checking. Let me fix this.
You actually don't have to do anything! I forgot I already did this in "setRowLineage". It won't set the field unless it's getting set to true.
@RussellSpitzer If you don't set row-lineage in the table properties, the field stays unset (it's kept empty), because setRowLineage handles a null argument:
iceberg/core/src/main/java/org/apache/iceberg/TableMetadata.java
Lines 1525 to 1528 in bcbbd03
private Builder setRowLineage(Boolean newRowLineage) {
  if (newRowLineage == null) {
    return this;
  }
Oh yes, it is. Thanks for letting me know. So I believe this part should be fine.
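The behavior agreed on in this thread (a null argument is a no-op, and the field is only written when it is set to true) can be modeled with a small stand-alone builder. This is a toy sketch under those two stated rules, not the real TableMetadata.Builder; the class name and accessor are hypothetical, and any further checks the real method performs are omitted.

```java
// Toy model of the setter semantics discussed above; not Iceberg's class.
final class MetadataBuilderSketch {
  // null means the field is absent from the metadata.
  private Boolean rowLineage;

  MetadataBuilderSketch setRowLineage(Boolean newRowLineage) {
    if (newRowLineage == null) {
      // Property was not provided: leave the field untouched.
      return this;
    }
    if (newRowLineage) {
      // Only an explicit true writes the field.
      rowLineage = Boolean.TRUE;
    }
    // An explicit false on an unset field writes nothing.
    return this;
  }

  // Hypothetical accessor used to inspect the modeled field.
  Boolean rowLineage() {
    return rowLineage;
  }
}
```

With this shape, passing the value read from the table properties straight into the builder is safe for tables that never mention row-lineage, since the field stays absent.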
Thanks so much for the quick review! Yes, currently, it's not possible to enable row-lineage in a CreateTable statement, and it's always required to set Additionally, after creating an Iceberg table without setting This fix enables the row lineage feature once
Thanks @tomtongue!
Thanks so much for the quick review! @RussellSpitzer
Overview
Fix the row-lineage table property reflection on enableRowLineage.
Issue
Currently to enable the Row Lineage feature from the Iceberg table properties, it's required to run the following operations:
At the first step, "Create an Iceberg table", even if you set row-lineage to true in the table properties, the property isn't reflected in the Iceberg table's metadata.json. Therefore, to enable the feature, you need to run an additional table properties update after creating the Iceberg table.
Details
Tested two cases: Spark and the Java API.
Spark case
When you create an Iceberg table using Spark like the following query,
The relevant metadata.json is stored in the specified bucket and path as below:
At this point, the (partial) metadata content is below. The content doesn't have row-lineage even though the parameter is in the properties part.
Then, update the table property with the same value, like ALTER TABLE db.rowlin SET TBLPROPERTIES('row-lineage'= 'true').
After the query completes, the content of the new metadata.json is below. row-lineage and next-row-id are added.
Here's the diff between the two metadata files:
Java API case
Example script:
After running the script, two versions of the metadata.json file are created in the specified S3 bucket:
The content of each metadata file is below: