Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Push Iceberg table property values to HMS table properties #2123

Merged
merged 6 commits into from
Jan 25, 2021

Conversation

marton-bod
Copy link
Collaborator

As agreed with the community:

  • Iceberg table properties are the canonical source of truth
  • HMS table properties should be maintained as much as possible to be in sync with the Iceberg table, but it can only happen on a best effort basis

This PR makes the following changes:

  • Ensures that all Iceberg table properties are propagated to the HMS table during HiveTableOperations commit
  • All HMS table properties are pushed down to Iceberg as well during table creation (except for metadata location and spec props)
  • Refactors the various property check assertions scattered throughout various test cases into a single property-focused unit test case

What is left out and should be done in the future:

  • Push property changes occurring via Hive DDL (ALTER TABLE SET TBLPROPERTIES) down to Iceberg as well. Currently this can't be done reliably because the HiveMetaHook interface only contains a preAlterTable method, but no commitAlterTable method. We'll need to extend this interface and include the change in an upcoming Hive upstream release.

@marton-bod
Copy link
Collaborator Author

Build failure is due to the flaky Spark tests:

org.apache.iceberg.spark.extensions.TestCopyOnWriteDelete > testDeleteWithSerializableIsolation[catalogName = spark_catalog, implementation = org.apache.iceberg.spark.SparkSessionCatalog, config = {type=hive, default-namespace=default, clients=1, parquet-enabled=false, cache-enabled=false}, format = avro, vectorized = false] FAILED
    java.lang.RuntimeException: Failed to get table info from metastore default.table

        Caused by:
        org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)

            Caused by:
            java.net.SocketException: Broken pipe (Write failed)

@pvary
Copy link
Contributor

pvary commented Jan 25, 2021

What do you think about changing the title @marton-bod?

I think the title below describes the change better:

Hive: Push Iceberg table property values to HMS table properties

Thanks,
Peter

@marton-bod
Copy link
Collaborator Author

marton-bod commented Jan 25, 2021

Overall I agree, I think that describes the crux of the change better. I went with the original title because we also relax the initial set of properties we push down from HMS to Iceberg in this patch. However, until we have a pre/postAlter hook on the HiveMetaHook, this is mostly a one-way street indeed.

@marton-bod marton-bod changed the title Hive: Configuration sync between HMS and Iceberg table properties Hive: Push Iceberg table property values to HMS table properties Jan 25, 2021
@pvary pvary merged commit c7ff6d5 into apache:master Jan 25, 2021
@pvary
Copy link
Contributor

pvary commented Jan 25, 2021

Thanks @marton-bod for the change!

autumnust added a commit to autumnust/iceberg-1 that referenced this pull request Feb 1, 2022
Iceberg table properties are the canonical source of truth
HMS table properties should be maintained as much as possible to be in sync with the Iceberg table, but it can only happen on a best effort basis
This PR makes the following changes:

Ensures that all Iceberg table properties are propagated to the HMS table during HiveTableOperations commit
All HMS table properties are pushed down to Iceberg as well during table creation (except for metadata location and spec props)
Refactors the various property check assertions scattered throughout various test cases into a single property-focused unit test case
What is left out and should be done in the future:

Push property changes occurring via Hive DDL (ALTER TABLE SET TBLPROPERTIES) down to Iceberg as well. Currently this can't be done reliably because the HiveMetaHook interface only contains a preAlterTable method, but no commitAlterTable method. We'll need to extend this interface and include the change in an upcoming Hive upstream release.

Author: Marton Bod <marton.bod@gmail.com>
PR: apache/iceberg#2123
Backport Reason: To accomdate(I) for fix apache/iceberg#2328
autumnust added a commit to autumnust/iceberg-1 that referenced this pull request Feb 3, 2022
Iceberg table properties are the canonical source of truth
HMS table properties should be maintained as much as possible to be in sync with the Iceberg table, but it can only happen on a best effort basis
This PR makes the following changes:

Ensures that all Iceberg table properties are propagated to the HMS table during HiveTableOperations commit
All HMS table properties are pushed down to Iceberg as well during table creation (except for metadata location and spec props)
Refactors the various property check assertions scattered throughout various test cases into a single property-focused unit test case
What is left out and should be done in the future:

Push property changes occurring via Hive DDL (ALTER TABLE SET TBLPROPERTIES) down to Iceberg as well. Currently this can't be done reliably because the HiveMetaHook interface only contains a preAlterTable method, but no commitAlterTable method. We'll need to extend this interface and include the change in an upcoming Hive upstream release.

Author: Marton Bod <marton.bod@gmail.com>
PR: apache/iceberg#2123
Backport Reason: To accomdate(I) for fix apache/iceberg#2328
autumnust added a commit to linkedin/iceberg that referenced this pull request Feb 8, 2022
Iceberg table properties are the canonical source of truth
HMS table properties should be maintained as much as possible to be in sync with the Iceberg table, but it can only happen on a best effort basis
This PR makes the following changes:

Ensures that all Iceberg table properties are propagated to the HMS table during HiveTableOperations commit
All HMS table properties are pushed down to Iceberg as well during table creation (except for metadata location and spec props)
Refactors the various property check assertions scattered throughout various test cases into a single property-focused unit test case
What is left out and should be done in the future:

Push property changes occurring via Hive DDL (ALTER TABLE SET TBLPROPERTIES) down to Iceberg as well. Currently this can't be done reliably because the HiveMetaHook interface only contains a preAlterTable method, but no commitAlterTable method. We'll need to extend this interface and include the change in an upcoming Hive upstream release.

Author: Marton Bod <marton.bod@gmail.com>
PR: apache/iceberg#2123
Backport Reason: To accomdate(I) for fix apache/iceberg#2328
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants