-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set Input and OutputFormat in the hive meta hook #2025
Conversation
// For a table created by Hive DDL to be readable by Impala, we need the Input and OutputFormat set explicitly | ||
hmsTable.getSd().setInputFormat(HiveIcebergInputFormat.class.getCanonicalName()); | ||
hmsTable.getSd().setOutputFormat(HiveIcebergOutputFormat.class.getCanonicalName()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are going to be overwritten by HiveTableOperations
depending on whether Hive is enabled. I think the right fix is to set the property to enable Hive on the table when creating an Iceberg table with the meta hook. @pvary, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with @marton-bod.
After @boroknagyz's PR (#1751) hive.engine.enabled
drives the value for these properties for HiveCatalog tables.
For Hive tables above Iceberg tables stored in other Iceberg Catalogs these values are still not set. So it might worth to move this change for the non-HiveCatalog codepath here:
iceberg/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Lines 78 to 81 in 0575296
if (!Catalogs.hiveCatalog(conf)) { | |
// If not using HiveCatalog check for existing table | |
try { | |
this.icebergTable = Catalogs.loadTable(conf, catalogProperties); |
This would allow Impala to read tables created through Hive but stored in non-HiveCatalogs.
For tables created from Hive and stored in HiveCatalog we set hive.engine.enabled
to true
here:
iceberg/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Lines 133 to 137 in 0575296
public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) { | |
if (icebergTable == null) { | |
if (Catalogs.hiveCatalog(conf)) { | |
catalogProperties.put(TableProperties.ENGINE_HIVE_ENABLED, true); | |
} |
Setting hive.engine.enabled
for other Iceberg Catalogs does not have any affect, so I think we are good there.
Thanks,
Peter
Thanks for solving this, @marton-bod and @pvary! |
Co-authored-by: Marton Bod <mbod@cloudera.com>
Impala retrieves the Input and OutputFormat classes directly from the HMS properties, and not from the StorageHandler. Therefore, we need to explicitly set these values during the Hive metahook operations, so that we have those present even when the table was created using Hive DDL. The analogous change in the Iceberg API has been added in this commit: 45caac4
@rdblue, @pvary could you please take a look? Thank you