Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set Input and OutputFormat in the hive meta hook #2025

Merged
merged 3 commits into from
Jan 5, 2021

Conversation

marton-bod
Copy link
Collaborator

@marton-bod marton-bod commented Jan 4, 2021

Impala retrieves the Input and OutputFormat classes directly from the HMS properties, and not from the StorageHandler. Therefore, we need to explicitly set these values during the Hive metahook operations, so that we have those present even when the table was created using Hive DDL. The analogous change in the Iceberg API has been added in this commit: 45caac4

@rdblue, @pvary could you please take a look? Thank you

@github-actions github-actions bot added the MR label Jan 4, 2021
// For a table created by Hive DDL to be readable by Impala, we need the Input and OutputFormat set explicitly
hmsTable.getSd().setInputFormat(HiveIcebergInputFormat.class.getCanonicalName());
hmsTable.getSd().setOutputFormat(HiveIcebergOutputFormat.class.getCanonicalName());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are going to be overwritten by HiveTableOperations depending on whether Hive is enabled. I think the right fix is to set the property to enable Hive on the table when creating an Iceberg table with the meta hook. @pvary, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @marton-bod.

After @boroknagyz's PR (#1751) hive.engine.enabled drives the value for these properties for HiveCatalog tables.

For Hive tables above Iceberg tables stored in other Iceberg Catalogs these values are still not set. So it might worth to move this change for the non-HiveCatalog codepath here:

if (!Catalogs.hiveCatalog(conf)) {
// If not using HiveCatalog check for existing table
try {
this.icebergTable = Catalogs.loadTable(conf, catalogProperties);

This would allow Impala to read tables created through Hive but stored in non-HiveCatalogs.

For tables created from Hive and stored in HiveCatalog we set hive.engine.enabled to true here:

public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
if (icebergTable == null) {
if (Catalogs.hiveCatalog(conf)) {
catalogProperties.put(TableProperties.ENGINE_HIVE_ENABLED, true);
}

Setting hive.engine.enabled for other Iceberg Catalogs does not have any affect, so I think we are good there.

Thanks,
Peter

@rdblue rdblue merged commit 8286843 into apache:master Jan 5, 2021
@rdblue
Copy link
Contributor

rdblue commented Jan 5, 2021

Thanks for solving this, @marton-bod and @pvary!

XuQianJin-Stars pushed a commit to XuQianJin-Stars/iceberg that referenced this pull request Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants