-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions #1917
Conversation
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandler.java
Outdated
Show resolved
Hide resolved
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Outdated
Show resolved
Hide resolved
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
@rdblue Could you please take a look at this as well, whenever suitable? :) Thank you! |
@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas | |||
if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) { | |||
return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA)); | |||
} else { | |||
return HiveSchemaUtil.convert(hmsTable.getSd().getCols()); | |||
if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would isSetPartitionKeys()
be true and getPartitionKeys()
empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically it could be set to an empty list. Checked this way to keep on the safe side.
@@ -542,7 +542,7 @@ project(':iceberg-mr') { | |||
|
|||
test { | |||
// testJoinTables / testScanTable | |||
maxHeapSize '1500m' | |||
maxHeapSize '2500m' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this needed? Additional tasks because of partitioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The extra tasks eat more memory :(
Thanks @marton-bod for reviewing, and @pvary for working on this! It looks good to me. I noted a few nits to fix, but I'm also fine merging this if you don't have time to fix them. I'll wait a day or two and then merge if I don't hear back. |
After consulting with the field folks they convinced me that it would be beneficial to have the first version of conversion in place for creating partitioned Iceberg tables from Hive. They suggested that even in this limited form this feature can boost adoption by allowing to try out Iceberg tables with partitions without changing the actual SQL commands.